Introduction:
When managing a WordPress website, it’s important to keep the content valid and up to date. One critical part of that is checking posts for invalid or broken links. This script queries the WordPress MySQL database directly, which avoids hosting limitations such as pagination limits, and identifies the posts that contain invalid links.
Summary:
This Python script connects to the WordPress MySQL database and performs the following tasks:
- Connects to the WordPress database using MySQL credentials.
- Retrieves published posts from the WordPress database.
- Parses the HTML content of each post to find links.
- Checks each link by sending an HTTP GET request. A link counts as valid only if the response status is 200; links that point to the site itself are always treated as valid.
- When it finds an invalid link, prints the post ID, the post title, and the post URL, along with the invalid link, then stops checking that post.
- Moves on to the next post and repeats the check until all published posts have been processed.
This approach allows for efficient identification of posts that require attention due to the presence of invalid links, and it bypasses hosting limitations related to pagination.
import mysql.connector
import requests
from bs4 import BeautifulSoup
# Connect to the database
conn = mysql.connector.connect(
    host='your-database-host',
    database='your-database-name',
    user='your-username',
    password='your-password',
)
# Create a cursor
cursor = conn.cursor()
# Execute the SQL query to retrieve posts
sql_query = """
    SELECT
        wp_posts.ID AS post_id,
        wp_posts.post_title AS post_title,
        wp_posts.post_content AS post_content,
        CONCAT('https://www.electronics-engineering.com/', wp_posts.post_name) AS post_url
    FROM
        wp_posts
    WHERE
        wp_posts.post_type = 'post'
        AND wp_posts.post_status = 'publish';
"""
cursor.execute(sql_query)
# Function to check if a link is valid
def is_valid_link(link):
    # Links containing these markers (the site's own pages and dynamic
    # resources) are always treated as valid, so no request is needed.
    if ("risorse_dinamiche" in link) or ("electronics-engineering.com" in link):
        return True
    try:
        # The timeout keeps one unresponsive server from stalling the whole run
        response = requests.get(link, timeout=10)
        return response.status_code == 200
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, and malformed URLs
        return False
# Process the results
for result in cursor.fetchall():
    post_id, post_title, post_content, post_url = result
    # Parse the HTML content
    soup = BeautifulSoup(post_content, 'html.parser')
    # Find all links in the content
    links = [a['href'] for a in soup.find_all('a', href=True)]
    # Check each link for validity
    for link in links:
        if not is_valid_link(link):
            print(f"Post ID: {post_id}")
            print(f"Title: {post_title}")
            print(f"Link: {post_url}")
            print(f"Invalid Link: {link}")
            print()
            # Report only the first invalid link in each post
            break
# Close cursor and connection
cursor.close()
conn.close()
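One gap worth noting in the check above: `requests.get` raises an exception for hrefs that are not absolute HTTP URLs, such as `mailto:` addresses, `#section` anchors, or relative paths, so those links would be reported as invalid. The following is a minimal sketch of a pre-filter that uses only the standard library. The helper name `normalize_link` and the example base URL are hypothetical and not part of the original script.

```python
from urllib.parse import urljoin, urlparse


def normalize_link(href, base_url):
    """Return an absolute http(s) URL for href, or None if it cannot be checked over HTTP."""
    href = href.strip()
    # Empty hrefs and in-page anchors point nowhere that can be requested
    if not href or href.startswith("#"):
        return None
    # Resolve relative paths such as "/about" against the post's own URL
    absolute = urljoin(base_url, href)
    # Only http and https links can be fetched with requests
    if urlparse(absolute).scheme not in ("http", "https"):
        return None
    return absolute


# Hypothetical base URL, for illustration only
base = "https://example.com/my-post"
print(normalize_link("/about", base))
print(normalize_link("mailto:someone@example.com", base))
print(normalize_link("#top", base))
```

In the main loop, one option would be to call `normalize_link(link, post_url)` first, skip the link when the result is `None`, and pass the returned URL to `is_valid_link` otherwise.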
This script provides a practical solution for WordPress site administrators to maintain content quality by identifying and addressing posts with invalid links.
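The same URL can appear in several posts, and each repeat costs another HTTP request. A small cache around the checker is one way to avoid that. The sketch below is an assumption about how this could be done, not part of the original script: `make_cached_checker` is a hypothetical helper, and `fake_check` stands in for `is_valid_link` so the example runs without network access.

```python
def make_cached_checker(check):
    """Wrap a link-checking function so each distinct URL is checked only once."""
    cache = {}

    def cached(link):
        # Call the underlying checker only the first time a URL is seen
        if link not in cache:
            cache[link] = check(link)
        return cache[link]

    return cached


# Stand-in for is_valid_link that records how often it is called
calls = []


def fake_check(link):
    calls.append(link)
    return not link.endswith("/broken")


checker = make_cached_checker(fake_check)
results = [checker(url) for url in (
    "https://example.com/ok",
    "https://example.com/broken",
    "https://example.com/ok",
)]
print(results)     # one result per link, in order
print(len(calls))  # number of real checks performed
```

With the real script, this would amount to writing `is_valid_link = make_cached_checker(is_valid_link)` once before the loop over posts.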