HTML Cleaner
Clean and sanitize HTML content
What does this node do?
Removes unwanted parts of an HTML document: script and style tags, comments, and any elements matching the CSS selectors you list (for example, ads).
Configuration
html (string, required): HTML content to clean.
remove_scripts (boolean, default: true): Remove <script> tags.
remove_styles (boolean, default: true): Remove <style> tags and style attributes.
remove_comments (boolean, default: true): Remove HTML comments.
selectors_to_remove (array): CSS selectors of elements to remove (e.g., .ads, #sidebar).
Output
{
  "cleaned_html": "<div>Clean content...</div>",
  "removed_elements": 15
}
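To make the options and the output shape above concrete, here is a minimal Python sketch of how a cleaner along these lines might behave. It is not the node's actual implementation. The function name clean_html and the helper class are hypothetical, it uses only the standard library, and it makes several assumptions: selectors are limited to plain tag, .class, and #id forms, removed elements are dropped together with their contents, and removed comments count toward removed_elements.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def _matches(selector, tag, attrs):
    """Assumption: only tag, .class and #id selectors are supported."""
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return selector.lower() == tag


class _Cleaner(HTMLParser):
    def __init__(self, remove_scripts, remove_styles, remove_comments, selectors):
        super().__init__(convert_charrefs=False)
        self.drop_tags = set()
        if remove_scripts:
            self.drop_tags.add("script")
        if remove_styles:
            self.drop_tags.add("style")
        self.remove_styles = remove_styles
        self.remove_comments = remove_comments
        self.selectors = selectors
        self.out = []          # pieces of the cleaned document
        self.removed = 0       # assumption: counts removed elements and comments
        self.skip_stack = []   # open tags inside a subtree being removed

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        drop = bool(self.skip_stack)
        if not drop and (tag in self.drop_tags
                         or any(_matches(s, tag, attr_map) for s in self.selectors)):
            drop = True
            self.removed += 1  # count only the root of a removed subtree
        if drop:
            if tag not in VOID_TAGS:
                self.skip_stack.append(tag)
            return
        if self.remove_styles:
            attrs = [(k, v) for k, v in attrs if k != "style"]
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if self.skip_stack:
            if tag in self.skip_stack:
                # Pop through the matching tag; tolerates unclosed children.
                while self.skip_stack.pop() != tag:
                    pass
            return
        if tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_stack:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_stack:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_stack:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self.skip_stack:
            return
        if self.remove_comments:
            self.removed += 1
        else:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep declarations such as DOCTYPE


def clean_html(html, remove_scripts=True, remove_styles=True,
               remove_comments=True, selectors_to_remove=None):
    """Return a dict shaped like the node's documented output."""
    cleaner = _Cleaner(remove_scripts, remove_styles, remove_comments,
                       selectors_to_remove or [])
    cleaner.feed(html)
    cleaner.close()
    return {"cleaned_html": "".join(cleaner.out),
            "removed_elements": cleaner.removed}


if __name__ == "__main__":
    sample = ('<div><script>track()</script><!-- note -->'
              '<p style="color:red">Hi &amp; bye</p>'
              '<div class="ads big">Buy</div></div>')
    print(clean_html(sample, selectors_to_remove=[".ads"]))
```

A streaming parser keeps the sketch dependency-free, at the cost of selector support. Handling full CSS selectors (descendant combinators, attribute selectors, and so on) would realistically call for a DOM-based HTML library instead.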
Use case
Clean scraped content before AI processing:
graph LR
A[Web Scraper] --> B[HTML Cleaner]
B --> C[HTML to Markdown]
C --> D[LLM]