HTML Cleaner

Node Description

The HTML Cleaner node processes HTML content and removes specified elements, tags, and attributes based on your configuration. This is useful for simplifying HTML, removing unnecessary metadata, or extracting readable content.

Node Inputs

Required Fields

HTML
The raw HTML content to be cleaned.
Example:

<html>
<head><title>Example</title></head>
<body><h1>Hello, World!</h1><script>console.log("Hi!")</script></body>
</html>

Optional Fields

You can enable or disable the removal of specific HTML elements.

Remove <iframe>
Remove all <iframe> elements from the HTML.
Default: Enabled

Remove <header>
Remove all <header> elements.
Default: Enabled

Remove <nav>
Remove all <nav> elements.
Default: Enabled

Remove <footer>
Remove all <footer> elements.
Default: Enabled

Remove Attributes
Strip every attribute from the remaining tags, leaving only the bare tags.
Default: Enabled

Remove Additional Tags
Each of the following tags has its own removal toggle:
- <script>
- <meta>
- <link>
- <style>
- <noscript>
- <head>
- <img> and <svg>
- <video>
All of these toggles are enabled by default; disable any whose content you want to keep.
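
To make the effect of these toggles concrete, here is a minimal Python sketch of the same idea using only the standard library. It is an illustration under stated assumptions, not the node's actual implementation: the names TagStripper, clean_html, DEFAULT_REMOVED, and VOID_TAGS are hypothetical, and the sketch leaves wrapper tags such as <html> and <body> in place, which may differ from what the node does.

```python
from html import escape
from html.parser import HTMLParser

# Assumed defaults mirroring the toggles listed above (all enabled).
DEFAULT_REMOVED = frozenset({
    "iframe", "header", "nav", "footer", "script", "meta", "link",
    "style", "noscript", "head", "img", "svg", "video",
})
# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = frozenset({"meta", "link", "img", "br", "hr", "input"})


class TagStripper(HTMLParser):
    """Drops selected elements (with their contents) and, optionally, attributes."""

    def __init__(self, removed_tags=DEFAULT_REMOVED, remove_attributes=True):
        # convert_charrefs=False keeps entities such as &amp; exactly as written.
        super().__init__(convert_charrefs=False)
        self.removed_tags = removed_tags
        self.remove_attributes = remove_attributes
        self.skip_depth = 0  # greater than 0 while inside a removed element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.removed_tags:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if self.remove_attributes or not attrs:
            self.parts.append(f"<{tag}>")
        else:
            rendered = " ".join(
                name if value is None else f'{name}="{escape(value)}"'
                for name, value in attrs
            )
            self.parts.append(f"<{tag} {rendered}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag in self.removed_tags:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.parts.append(f"&#{name};")


def clean_html(markup, **options):
    """Return markup with the configured elements and attributes removed."""
    parser = TagStripper(**options)
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

In this sketch, disabling a toggle corresponds to passing a smaller set, for example clean_html(markup, removed_tags=DEFAULT_REMOVED - {"img", "svg"}) to keep images, or remove_attributes=False to keep attributes.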

Output Format

Output Type
- Text: Extracts clean text content only.
- HTML: Returns the cleaned HTML structure.
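
The Text output type can be pictured as collecting only the visible text nodes. The following Python sketch shows one way to do that with the standard library; it is an assumption about the general technique, not the node's code. TextExtractor, to_text, and the SKIPPED set are hypothetical names, and joining text chunks with newlines is a choice made for this example.

```python
from html.parser import HTMLParser

# Assumed containers whose text is not visible content.
SKIPPED = frozenset({"script", "style", "noscript", "head"})


class TextExtractor(HTMLParser):
    """Collects text nodes that are not inside a skipped container."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def to_text(markup):
    """Return the visible text of markup, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html>\n"
    "<head><title>Example</title></head>\n"
    '<body><h1>Hello, World!</h1><script>console.log("Hi!")</script></body>\n'
    "</html>"
)
print(to_text(sample))
```

Run against the example input from the Required Fields section, the sketch keeps only the heading text, matching the Text example output shown under Node Output.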

Node Output

The HTML Cleaner node provides the following output:

Output: Cleaned HTML or plain text depending on the selected format.

Example Output (Text):

Hello, World!

Example Output (HTML):

<h1>Hello, World!</h1>

Example Usage

1. Extract Clean Text

HTML Input:

<html>
<header></header>
<body>
   <h1>Welcome!</h1>
   <script>console.log('Hi')</script>
</body>
</html>

Configuration:
- Remove <header>: Enabled
- Remove <script>: Enabled
- Output Type: Text

Output:

Welcome!

2. Simplify HTML Structure

HTML Input:

<html>
<body>
   <div>
      <h1>Main Title</h1>
      <footer>Footer Content</footer>
   </div>
</body>
</html>

Configuration:
- Remove <footer>: Enabled
- Output Type: HTML

Output:

<div>
   <h1>Main Title</h1>
</div>

Node Functionality

The HTML Cleaner node is well suited to:

- Simplifying raw HTML before text extraction.
- Removing clutter such as ads, metadata, or scripts from web-scraped content.
- Preprocessing content for downstream workflows such as NLP or data analysis.

Using it early in a workflow means later steps receive only the content they need.
