Linea Docs

Extract Node

Scrape and extract content from web pages using Firecrawl or a native fetch fallback.

The extract node fetches and parses web content from one or more URLs. It uses the Firecrawl API when a FIRECRAWL_API_KEY is configured in workspace secrets, and falls back to native fetch + HTML stripping otherwise.
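The backend choice described above can be pictured as a small decision function. This is only a sketch: `choose_backend` and the `secrets` mapping are hypothetical names, and treating a blank key as "not configured" is an assumption, not documented behavior.

```python
from typing import Mapping


def choose_backend(secrets: Mapping[str, str]) -> str:
    """Return 'firecrawl' when an API key is present, else 'native'.

    Hypothetical helper: the real node reads workspace secrets itself.
    """
    key = secrets.get("FIRECRAWL_API_KEY", "")
    # Assumption: a blank or whitespace-only key counts as not configured.
    return "firecrawl" if key.strip() else "native"
```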

Configuration Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| scrapeUrl | string | | Single URL to scrape (supports {{}} templates) |
| batchUrls | string | | Comma-separated list of URLs for batch mode |
| mapUrl | string | | URL to map; discovers all internal links |
| scrapeFormats | string[] | ['markdown'] | Output formats: markdown, html, text |
| outputField | markdown \| html \| text \| full | markdown | Which field to return as lastOutput |

Only one of scrapeUrl, batchUrls, or mapUrl should be set per node.
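If you generate node configs programmatically, a pre-flight check along these lines can catch configs that set more than one mode field. `select_mode` is a hypothetical helper, not part of Linea, and rejecting a config with none of the three fields set is an assumption.

```python
MODE_FIELDS = ("scrapeUrl", "batchUrls", "mapUrl")


def select_mode(config: dict) -> str:
    """Return the single mode field that is set, or raise ValueError."""
    # A field counts as set only if it holds a non-blank value.
    set_fields = [
        name for name in MODE_FIELDS if str(config.get(name) or "").strip()
    ]
    if len(set_fields) != 1:
        raise ValueError(
            f"expected exactly one of {MODE_FIELDS}, got {set_fields or 'none'}"
        )
    return set_fields[0]
```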

Modes

Single Scrape (scrapeUrl)

Fetches one URL and returns content in the requested format.

With Firecrawl: Returns cleaned markdown, HTML, or text from the Firecrawl /scrape endpoint.

Without Firecrawl (native fallback): Strips <script>, <style>, and all HTML tags, extracting the title and plain text.
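To give a feel for what the native fallback produces, here is a rough approximation of the stripping step. It is not the node's actual implementation: `strip_html` is a hypothetical name, the regular expressions are a simplification, and the fetch itself is left out.

```python
import html
import re


def strip_html(page: str) -> dict:
    """Extract the title and plain text from an HTML string."""
    title_match = re.search(
        r"<title[^>]*>(.*?)</title>", page, re.IGNORECASE | re.DOTALL
    )
    title = html.unescape(title_match.group(1)).strip() if title_match else ""

    # Drop <script> and <style> elements together with their contents.
    body = re.sub(
        r"<(script|style)\b[^>]*>.*?</\1\s*>",
        " ",
        page,
        flags=re.IGNORECASE | re.DOTALL,
    )
    # Remove every remaining tag, decode entities, and collapse whitespace.
    body = re.sub(r"<[^>]+>", " ", body)
    text = re.sub(r"\s+", " ", html.unescape(body)).strip()
    return {"title": title, "text": text}
```

In this sketch the title text also appears at the start of `text`, because only the tags are removed, not the contents of `<title>`.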

Batch Scrape (batchUrls)

Sends multiple URLs to the Firecrawl /batch/scrape endpoint in one call.

{ "results": [...], "count": 3 }

Batch mode requires Firecrawl. The native fallback only handles single URLs.
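Because batchUrls is a single comma-separated string, it has to be split into individual URLs before the batch call. The sketch below shows one way that could look, along with the `{ results, count }` wrapper. Both helper names are hypothetical, and setting `count` to the number of results is inferred from the sample output rather than documented.

```python
from typing import Any, Dict, List


def parse_batch_urls(batch_urls: str) -> List[str]:
    """Split a comma-separated string into trimmed, non-empty URLs."""
    return [part.strip() for part in batch_urls.split(",") if part.strip()]


def wrap_batch_results(results: List[Any]) -> Dict[str, Any]:
    """Build the output shape shown above (assumed: count == len(results))."""
    return {"results": results, "count": len(results)}
```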

Map (mapUrl)

Discovers all links on the page using Firecrawl's /map endpoint.

{ "links": ["https://example.com/page1", "..."], "count": 42 }
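The output shape is easier to work with once you have a concrete notion of an "internal" link. The sketch below treats a link as internal when it resolves to the same hostname as the mapped URL. That definition is an assumption, `internal_links` is a hypothetical helper, and none of this reflects how Firecrawl's /map endpoint works internally.

```python
from typing import Any, Dict, Iterable, List
from urllib.parse import urljoin, urlparse


def internal_links(base_url: str, hrefs: Iterable[str]) -> Dict[str, Any]:
    """Resolve hrefs against base_url and keep same-host http(s) links."""
    base_host = urlparse(base_url).hostname
    links: List[str] = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        is_web = parsed.scheme in ("http", "https")
        # Assumption: "internal" means the hostname matches exactly.
        if is_web and parsed.hostname == base_host and absolute not in links:
            links.append(absolute)
    return {"links": links, "count": len(links)}
```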

Output

| outputField | Content |
| --- | --- |
| markdown | Cleaned markdown representation |
| html | Raw HTML |
| text | Plain text (scripts/styles stripped) |
| full | Object with all fields: { url, title, html, text, markdown } |
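Conceptually, outputField picks one key out of the full result object. A minimal sketch of that selection follows. `select_output` is a hypothetical name, and returning an empty string for a missing field is an assumption.

```python
from typing import Any, Dict

SINGLE_FIELDS = ("markdown", "html", "text")


def select_output(result: Dict[str, Any], output_field: str = "markdown") -> Any:
    """Return the whole result for 'full', otherwise one named field."""
    if output_field == "full":
        return result
    if output_field not in SINGLE_FIELDS:
        raise ValueError(f"unknown outputField: {output_field!r}")
    # Assumption: a field the scrape did not produce yields an empty string.
    return result.get(output_field, "")
```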

Example Config

{
  "scrapeUrl": "{{input.articleUrl}}",
  "scrapeFormats": ["markdown"],
  "outputField": "markdown"
}
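The `{{input.articleUrl}}` placeholder is filled in from workflow data before the scrape runs. Linea's template engine is not described here, so the following is only an illustration of dotted-path substitution. `render_template` and the `context` layout are assumptions.

```python
import re
from typing import Any, Dict


def render_template(template: str, context: Dict[str, Any]) -> str:
    """Replace {{a.b.c}} placeholders with values looked up in context."""

    def lookup(match: "re.Match[str]") -> str:
        value: Any = context
        # Walk the dotted path one key at a time; KeyError if a key is missing.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)
```

For example, `render_template("{{input.articleUrl}}", {"input": {"articleUrl": "https://example.com/post"}})` yields the plain URL string.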

Each request times out after 30 seconds. Very large pages may be truncated by Firecrawl. SSRF protection is not applied to this node, so make sure URLs come from trusted sources.
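Since the node does not guard against SSRF, you may want to vet user-supplied URLs in an earlier step. The check below is a partial sketch and not a complete defense: `looks_safe` is a hypothetical helper, it only inspects the URL string, and it does not resolve DNS names, so a hostname that points at a private address would still pass.

```python
import ipaddress
from urllib.parse import urlparse


def looks_safe(url: str) -> bool:
    """Reject non-http(s) URLs, localhost, and non-public IP literals."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. DNS names are NOT resolved in this sketch.
        return True
    # is_global is False for private, loopback, and link-local addresses.
    return ip.is_global
```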
