Web Scraper
This document explains the Web Scraper node, which extracts content and HTML code from web pages using specified settings and templates.
Node Inputs
Required Fields
URL:
The web address to scrape.
Example: "https://www.example.com"
Optional Fields
Default Content Type:
Specifies the type of content to extract based on predefined templates. Options:
- No Template: No predefined structure is used.
- Article: Extracts article-specific content.
- ArticleList: Extracts a list of articles from the page.
- Product: Extracts product-specific details.
- ProductList: Extracts a list of products.
XPath 1, XPath 2, XPath 3:
Custom XPath expressions for targeted data extraction.
Example: "//div[@class='content']"
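To show what an expression like the one above selects, here is a minimal sketch using Python's standard library. It does not reflect the node's actual XPath engine, which this document does not specify. The sample markup and the helper name extract_text are assumptions made for illustration, and ElementTree supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page (assumption); kept well-formed so the
# standard-library XML parser accepts it.
SAMPLE_HTML = """<html><body>
<div class="header">Site title</div>
<div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>"""


def extract_text(markup: str, xpath: str) -> list:
    """Return the whitespace-normalized text of every element matching xpath."""
    root = ET.fromstring(markup)
    # ElementTree only accepts relative paths, so "//div" becomes ".//div".
    if xpath.startswith("//"):
        xpath = "." + xpath
    return [" ".join(" ".join(el.itertext()).split()) for el in root.findall(xpath)]


matches = extract_text(SAMPLE_HTML, "//div[@class='content']")
print(matches)  # ['First paragraph. Second paragraph.']
```

Only the div whose class attribute equals "content" is matched; the header div is ignored.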
Node Output
Output:
The extracted content or the page's HTML code, depending on the URL and settings provided.
Example Output:
Node Functionality
The Web Scraper node:
- Connects to the provided URL and retrieves HTML content.
- Supports structured extraction using predefined templates like Article or Product.
- Accepts up to three custom XPath expressions (XPath 1, XPath 2, XPath 3) for precise data targeting.
- Outputs raw HTML or parsed content for further analysis or integration.
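The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the node's implementation: the names run_scraper, fetch_html, and fake_fetch are hypothetical, template-based extraction is left out because its rules are not documented here, and the standard-library parser only handles well-formed markup, so real pages would need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Option names copied from the "Default Content Type" field.
CONTENT_TYPES = ("No Template", "Article", "ArticleList", "Product", "ProductList")


def fetch_html(url):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def run_scraper(url, content_type="No Template", xpaths=(), fetch=fetch_html):
    """Hypothetical stand-in for the node: returns raw HTML or XPath matches."""
    if content_type not in CONTENT_TYPES:
        raise ValueError("unknown content type: " + content_type)
    if content_type != "No Template":
        # Assumption: template behavior is undocumented, so it is not sketched.
        raise NotImplementedError("template extraction is not covered by this sketch")
    if len(xpaths) > 3:
        raise ValueError("at most three XPath expressions are supported")

    html = fetch(url)
    if not xpaths:
        return html  # no expressions given: pass the raw HTML through

    root = ET.fromstring(html)
    results = {}
    for xpath in xpaths:
        # ElementTree only accepts relative paths, so "//h1" becomes ".//h1".
        path = "." + xpath if xpath.startswith("//") else xpath
        results[xpath] = [
            " ".join(" ".join(el.itertext()).split()) for el in root.findall(path)
        ]
    return results


# Usage with a stubbed fetcher, so the example runs without network access.
PAGE = "<html><body><h1>Title</h1><div class='content'>Hello</div></body></html>"


def fake_fetch(url):
    return PAGE


parsed = run_scraper(
    "https://www.example.com",
    xpaths=["//h1", "//div[@class='content']"],
    fetch=fake_fetch,
)
print(parsed)
```

Passing the fetcher as a parameter keeps retrieval separate from extraction, which makes the extraction step easy to try against saved pages.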