HTML Cleaner

Clean and sanitize HTML content

What does this node do?

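Removes unwanted parts of an HTML document, such as scripts, styles, comments, and any elements matching the CSS selectors you supply (for example, ads or sidebars), and returns the cleaned markup.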

Configuration

html string required

HTML content to clean.

remove_scripts boolean default: true

Remove script tags.

remove_styles boolean default: true

Remove style tags and inline style attributes.

remove_comments boolean default: true

Remove HTML comments.

selectors_to_remove array

CSS selectors to remove (e.g., .ads, #sidebar).

Output

{
  "cleaned_html": "<div>Clean content...</div>",
  "removed_elements": 15
}
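
To make the options and the output shape concrete, here is a minimal Python sketch of comparable behavior using only the standard library. It is an illustration under stated assumptions, not the node's actual implementation: the function name clean_html is hypothetical, only simple .class and #id selectors are supported, and removed_elements is assumed to count removed elements (including those matched by selectors) but not comments.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot open a subtree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _Cleaner(HTMLParser):
    def __init__(self, drop_tags, strip_style_attr, drop_comments, selectors):
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.drop_tags = drop_tags
        self.strip_style_attr = strip_style_attr
        self.drop_comments = drop_comments
        # Assumption: only ".class" and "#id" selectors are handled here.
        self.classes = {s[1:] for s in selectors if s.startswith(".")}
        self.ids = {s[1:] for s in selectors if s.startswith("#")}
        self.out = []
        self.removed = 0
        self.skip_stack = []  # open tags inside a subtree being dropped

    def _matches(self, tag, attrs):
        a = dict(attrs)
        return (tag in self.drop_tags or a.get("id") in self.ids
                or bool(self.classes & set((a.get("class") or "").split())))

    def _emit(self, text):
        if not self.skip_stack:
            self.out.append(text)

    def _render(self, tag, attrs, closer=">"):
        if self.strip_style_attr:
            attrs = [(k, v) for k, v in attrs if k != "style"]
        parts = [tag] + [k if v is None else f'{k}="{escape(v, quote=True)}"'
                         for k, v in attrs]
        return "<" + " ".join(parts) + closer

    def handle_starttag(self, tag, attrs):
        dropping = bool(self.skip_stack)
        if not dropping and self._matches(tag, attrs):
            self.removed += 1  # count only the root of each dropped subtree
            dropping = True
        if dropping:
            if tag not in VOID_TAGS:
                self.skip_stack.append(tag)
            return
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        if self.skip_stack:
            return
        if self._matches(tag, attrs):
            self.removed += 1
            return
        self.out.append(self._render(tag, attrs, closer=" />"))

    def handle_endtag(self, tag):
        if self.skip_stack:
            if tag in self.skip_stack:  # unwind to the matching open tag
                while self.skip_stack.pop() != tag:
                    pass
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        if not self.drop_comments:
            self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def clean_html(html, remove_scripts=True, remove_styles=True,
               remove_comments=True, selectors_to_remove=None):
    """Hypothetical stand-in for the node; returns the documented output keys."""
    drop_tags = set()
    if remove_scripts:
        drop_tags.add("script")
    if remove_styles:
        drop_tags.add("style")
    parser = _Cleaner(drop_tags, remove_styles, remove_comments,
                      selectors_to_remove or [])
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {"cleaned_html": "".join(parser.out),
            "removed_elements": parser.removed}


sample = ('<div class="post" style="color:red"><!-- note -->'
          '<script>alert(1)</script><p>Hello &amp; bye</p>'
          '<div class="ads"><img src="a.png"><p>Buy</p></div>'
          '<aside id="sidebar">x</aside></div>')
result = clean_html(sample, selectors_to_remove=[".ads", "#sidebar"])
print(result)
```

In this sketch the sample call drops the script, the .ads block, and the #sidebar element, so removed_elements is 3 and only the outer div and its paragraph remain. A production cleaner would more likely rely on a full HTML parser with complete CSS selector support; the stack-based skipping here is only meant to show how the four options interact.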

Use case

Clean scraped content before AI processing:

graph LR
    A[Web Scraper] --> B[HTML Cleaner]
    B --> C[HTML to Markdown]
    C --> D[LLM]