The Extract Data block is the core of most data collection workflows. It’s where you tell the AI Agent precisely what new information to pull from the current webpage and how to structure it.
In its Advanced (JSON) mode, this block can also merge data: it takes existing variables from a previous Extract Data step and combines them with newly extracted information into a single, structured output.
Purpose
Use the Extract Data block to:
- Define a schema to extract new data like text, numbers, or other details from a page.
- Scrape data from a single item or a list of items on a page.
- (Advanced) Merge data from a previous Extract Data step with new data extracted from the current page into a unified result.
Open Websites, Follow Links, or Explore Content.
Configuration
Simple Mode
- Defining Attributes to Extract:
- In the default Simple mode, you define the new information you want the AI to find on the current page using a table.
- NAME (Column 1): The name for the new piece of data (e.g., price, stock_status). This becomes a new variable.
- DESCRIPTION (Column 2): A clear instruction or example for the AI on what to find (e.g., <the current product price>).
- Other Options:
- A list of items / A single item: Choose whether you’re extracting multiple items from the current page or just one.
- Add additional instructions: Provide global context or rules for the extraction process.

Screenshot: Extract Data block configuration showing item type selection and attribute table
Advanced (JSON) Mode
Clicking Advanced switches to a JSON editor for full control over the output schema. This mode unlocks the merging capability.
- Defining Extraction and Merging in JSON:
- You define a JSON object where keys represent the final output fields.
- For new data, the value for a key is a descriptive prompt in angle brackets <...> instructing the AI what to find on the page.
- For old data from a previous step, the value for a key is the corresponding {{variable_name}}. The agent will not search for this on the page, but will pull the existing value from its memory.
- Merging Previous Columns:
- Click the + Merge previous columns link to reveal the merge section.
- A new field appears: “And merge these columns from previous results”.
- Here, you specify the names of existing variables from a preceding Extract Data block that you want to carry over and include in the final output for this item.
- The agent does not search for this data on the current page; it simply pulls the existing value from its memory for the current item’s loop.

Screenshot: Extract Data block configuration showing Advanced mode

Screenshot: Extract Data block configuration showing JSON Editor
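To make the two kinds of JSON values concrete, here is a minimal Python sketch. It is not part of the product. The schema is hypothetical (the stock_status prompt is invented for illustration), and the classification rule is an assumption based on the description above rather than the agent's actual parser: a value that is exactly {{variable_name}} is treated as carried-over data, and a value wrapped in angle brackets is treated as a prompt for new data.

```python
import json
import re

# Hypothetical Advanced-mode schema: keys are the final output fields.
SCHEMA_TEXT = """
{
  "product": "{{product_name}}",
  "price": "<the current product price>",
  "stock_status": "<whether the product is in stock>"
}
"""

# Assumption: a carried-over value is exactly one {{variable_name}} placeholder.
VARIABLE_ONLY = re.compile(r"^\{\{\s*(\w+)\s*\}\}$")
# Assumption: a prompt for new data is any text wrapped in angle brackets.
PROMPT = re.compile(r"^<(.+)>$", re.DOTALL)


def classify_fields(schema: dict) -> dict:
    """Label each field as "merge" (from memory) or "extract" (from the page)."""
    kinds = {}
    for field, value in schema.items():
        text = value.strip()
        if VARIABLE_ONLY.match(text):
            kinds[field] = "merge"
        elif PROMPT.match(text):
            kinds[field] = "extract"
        else:
            kinds[field] = "unknown"
    return kinds


schema = json.loads(SCHEMA_TEXT)
field_kinds = classify_fields(schema)
print(field_kinds)
```

Running the sketch prints {'product': 'merge', 'price': 'extract', 'stock_status': 'extract'}: one field is pulled from memory and two are looked up on the current page.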
How It Works
- A workflow processes a series of items. In an early Extract Data step, initial data is collected (e.g., product_name).
- After more steps (like Follow Links), the workflow arrives at a new page.
- A second Extract Data block (in JSON mode) executes.
- The agent looks at the JSON schema. For keys with descriptive prompts (e.g., "price": "<the price of the product>"), it scrapes the page for that new information.
- For keys with variable placeholders (e.g., "product": "<put {{product_name}} as value of this field>"), it retrieves the existing value from the previous extraction step.
- Finally, it combines the newly extracted data and the merged data into a single output row matching the JSON structure.
Example: Merging List Data with Detail Page Data
Imagine a workflow that scrapes product names from a category page and then gets the price for each from their individual detail pages.
Open Websites Block:
- Opens a product category page.
First Extract Data Block:
- Mode: Simple
- What information…?: A list of items.
- Attributes: Extracts product_name and url for each product in the list.
Follow Links Block:
- Configured to “Follow each link”.
Second Extract Data Block (on the detail page):
- Mode: Advanced (JSON)
- JSON Schema:
- Additional instructions: You are on a product detail page. Extract the price of {{product_name}}
- Merge previous columns: product_name
- Result: The final output will be a table with two columns: product_name and price. For each product, the product_name is carried over from the first extraction step via the merge, and the price is newly extracted from the detail page.
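As a rough illustration of how this example produces its final table, the Python sketch below walks through the loop. Several things here are assumptions: the JSON schema for the second block is not reproduced in this example, so the one-field schema used below is a guess based on the described result; the product names, URLs, and prices are made-up sample data; and pretend_pages is a hypothetical stand-in for what the agent would read on each detail page.

```python
# Made-up rows standing in for the first Extract Data block's output.
first_step_rows = [
    {"product_name": "Example Widget", "url": "https://example.com/widget"},
    {"product_name": "Example Gadget", "url": "https://example.com/gadget"},
]

# Assumed schema for the second block: only the new field is requested.
second_schema = {"price": "<the price of the product>"}

# Columns listed in the "Merge previous columns" field.
merge_columns = ["product_name"]

# Hypothetical detail-page contents, keyed by URL, then by output field.
pretend_pages = {
    "https://example.com/widget": {"price": "19.99"},
    "https://example.com/gadget": {"price": "34.50"},
}


def run_detail_step(rows, schema, merge_cols, pages):
    """One output row per item: merged columns first, then new fields."""
    table = []
    for item in rows:  # "Follow each link": one iteration per product
        out = {col: item[col] for col in merge_cols}   # carried over
        detail_page = pages[item["url"]]
        for field in schema:
            out[field] = detail_page[field]            # newly extracted
        table.append(out)
    return table


table = run_detail_step(first_step_rows, second_schema, merge_columns, pretend_pages)
for line in table:
    print(line)
```

Because url is not listed under the merged columns, it does not appear in the output; each row contains only product_name and price.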
Key Considerations
- Merge is an Advanced Feature: The ability to merge data from previous extraction steps is only available in the Advanced (JSON) Mode. The Simple mode only defines new data to be extracted.
- Variable Name Matching: When merging in JSON mode, the variable_name must exactly match the name of a variable from a previous extraction (case-sensitive).
- No Redundant Extraction: Use the merge feature to avoid asking the AI to re-extract information you already have, which makes your workflow more efficient and reliable.
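Because variable names are case-sensitive, it can help to check a schema before running a workflow. The following Python sketch shows one way to do that outside the product. The function name find_unknown_variables and the sample schema are hypothetical, and the placeholder pattern assumes variable names consist of letters, digits, and underscores.

```python
import re

# Assumption: variable names use only letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_unknown_variables(schema: dict, known_variables: set) -> list:
    """Return placeholder names that do not exactly match a known variable."""
    unknown = []
    for value in schema.values():
        for name in PLACEHOLDER.findall(value):
            # The comparison is case-sensitive, like the merge itself.
            if name not in known_variables and name not in unknown:
                unknown.append(name)
    return unknown


# Variables produced by an earlier Extract Data step (from the example).
known = {"product_name", "url"}

# Hypothetical schema with one deliberately mis-cased variable name.
schema = {
    "product": "{{Product_Name}}",
    "link": "{{url}}",
    "price": "<the price of the product>",
}

unknown = find_unknown_variables(schema, known)
print(unknown)
```

Here the check reports Product_Name, since it differs from product_name only by case and would therefore not match.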
The Extract Data block, with its combined extraction and merging capabilities, is a powerful tool for creating comprehensive and well-structured datasets.
