How to Find Broken Links in WordPress Posts Using Python


Introduction:

When managing a WordPress website, it’s important to keep the content valid and up to date, and that includes checking posts for broken links. The script below reads posts directly from the WordPress MySQL database, which sidesteps hosting limitations such as pagination limits, and reports the posts that contain invalid links.

Summary:

This Python script connects to the WordPress MySQL database and performs the following tasks:

  1. Connects to the WordPress database using MySQL credentials.
  2. Retrieves published posts from the WordPress database.
  3. Parses the HTML content of each post to find links.
  4. Checks each link for validity by sending an HTTP request.
  5. If a link is invalid, prints the post ID, the post title, and a link to the post, along with the invalid link. Only the first invalid link found in each post is reported.
  6. Continues through the remaining posts in the same way.

This approach allows for efficient identification of posts that require attention due to the presence of invalid links, and it bypasses hosting limitations related to pagination.

import mysql.connector
import requests
from bs4 import BeautifulSoup

# Connect to the database
conn = mysql.connector.connect(host='your-database-host',
                               database='your-database-name',
                               user='your-username',
                               password='your-password',)

# Create a cursor
cursor = conn.cursor()

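# Retrieve published posts. Replace the base URL in CONCAT with your own site;
# the query assumes the default wp_ table prefix and /%postname%/ permalinks.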
sql_query = """
    SELECT
        wp_posts.ID AS post_id,
        wp_posts.post_title AS post_title,
        wp_posts.post_content AS post_content,
        CONCAT('https://www.electronics-engineering.com/', wp_posts.post_name) AS post_url
    FROM
        wp_posts
    WHERE
        wp_posts.post_type = 'post'
        AND wp_posts.post_status = 'publish';
"""
cursor.execute(sql_query)


# Function to check if a link is valid
def is_valid_link(link):
    # Links containing these substrings are always treated as valid,
    # so return early instead of sending a request for them
    if ("risorse_dinamiche" in link) or ("electronics-engineering.com" in link):
        return True
    try:
        # A timeout keeps one unresponsive server from stalling the whole run
        response = requests.get(link, timeout=10)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Connection errors, timeouts, and malformed URLs all count as invalid
        return False


# Process the results
for result in cursor.fetchall():
    post_id, post_title, post_content, post_url = result

    # Parse the HTML content
    soup = BeautifulSoup(post_content, 'html.parser')

    # Find all links in the content
    links = [a['href'] for a in soup.find_all('a', href=True)]

    # Check each link for validity
    for link in links:
        if not is_valid_link(link):
            print(f"Post ID: {post_id}")
            print(f"Title: {post_title}")
            print(f"Link: {post_url}")
            print(f"Invalid Link: {link}")
            print()
            break

# Close cursor and connection
cursor.close()
conn.close()
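
Post content often contains links that cannot be fetched over HTTP at all, such as `mailto:` addresses, `#anchor` fragments, or relative paths. `requests.get` raises an exception for these, so the script reports them as invalid. Below is a minimal sketch of a pre-filter, assuming that only absolute `http`/`https` URLs should be checked. The helper name `is_checkable` is hypothetical and not part of the original script.

```python
from urllib.parse import urlparse


def is_checkable(link):
    """Return True only for absolute http(s) URLs that include a host."""
    parsed = urlparse(link.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Example: keep only the links worth sending a request to
links = [
    "https://example.com/page",
    "mailto:someone@example.com",
    "#section-2",
    "/relative/path",
    "http://example.org",
]
checkable = [link for link in links if is_checkable(link)]
print(checkable)
```

In the main loop, the same condition could be added to the list comprehension that collects `href` values, so that skipped links are never passed to `is_valid_link`.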

This script provides a practical solution for WordPress site administrators to maintain content quality by identifying and addressing posts with invalid links.
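
One possible refinement: the same URL often appears in many posts, and the script requests it again each time. The sketch below wraps any checker function so that each distinct URL is checked only once. The names `make_cached_checker` and `fake_check` are illustrative assumptions; in the real script the wrapped function would be `is_valid_link`.

```python
def make_cached_checker(check):
    """Wrap a link checker so each distinct URL is checked only once."""
    cache = {}

    def cached(link):
        if link not in cache:
            cache[link] = check(link)
        return cache[link]

    return cached


# Demo with a stand-in checker that records its calls instead of using the network
calls = []


def fake_check(link):
    calls.append(link)
    return not link.endswith("/missing")


cached_check = make_cached_checker(fake_check)
urls = [
    "https://example.com/a",
    "https://example.com/missing",
    "https://example.com/a",  # repeated URL, answered from the cache
]
results = [cached_check(url) for url in urls]
print(results)     # [True, False, True]
print(len(calls))  # 2
```

With the original script, `cached_check = make_cached_checker(is_valid_link)` could be created once before the loop and used in place of `is_valid_link`.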
