Define a schema to extract new information from a webpage and, in advanced mode, merge it with data from previous steps.
The Extract Data block is the core of most data collection workflows. It's where you tell the AI Agent precisely what new information to pull from the current webpage and how to structure it.
In its advanced JSON mode, this block can also merge data: it takes existing variables from a previous Extract Data step and combines them with newly extracted information into a single, structured output.
Use the Extract Data block to:
Merge data from a previous Extract Data step with new data extracted from the current page into a unified result.
Open Websites, Follow Links, or Explore Content.
(e.g., price, stock_status). This becomes a new variable.
(e.g., <the current product price>).
Screenshot: Extract Data block configuration showing item type selection and attribute table
<...> instructing the AI what to find on the page.
{{variable_name}}. The agent will not search for this on the page, but will pull the existing value from its memory.
Click the + Merge previous columns link to reveal the merge section.
Select the existing variables from a preceding Extract Data block that you want to carry over and include in the final output for this item.
Screenshot: Extract Data block configuration showing Advanced mode
Screenshot: Extract Data block configuration showing JSON Editor
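To make the two placeholder styles concrete, here is a minimal Python sketch that writes out a JSON-mode schema using the field instructions quoted in this guide and lists which fields reference an existing variable. It is only an illustration: the helper name `referenced_variables` and the regular expression are assumptions made for this example, not part of the product, and the agent's actual parsing rules may differ.

```python
import json
import re

# Assumed pattern for a {{variable_name}} reference inside an instruction.
VARIABLE_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Field instructions taken from the examples in this guide.
schema = {
    "product": "<put {{product_name}} as value of this field>",
    "price": "<the price of the product>",
}


def referenced_variables(instruction: str) -> list[str]:
    """Return the variable names referenced by one instruction string."""
    return VARIABLE_REF.findall(instruction)


if __name__ == "__main__":
    # The schema as it might be pasted into the JSON editor.
    print(json.dumps(schema, indent=2))
    # Report, per field, which existing variables the instruction refers to.
    for field, instruction in schema.items():
        print(field, referenced_variables(instruction))
```

A field whose instruction references no variable is one the agent has to find on the page; a field that references `{{product_name}}` reuses a value collected earlier.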
In a first Extract Data step, initial data is collected (e.g., product_name).
After a navigation step (e.g., Follow Links), the workflow arrives at a new page.
A second Extract Data block (in JSON mode) executes.
For a new field (e.g., "price": "<the price of the product>"), it scrapes the page for that new information.
For a merged field (e.g., "product": "<put {{product_name}} as value of this field>"), it retrieves the existing value from the previous extraction step.
Open Websites Block:
First Extract Data Block:
A list of items.
product_name and url for each product in the list.
Follow Links Block:
Second Extract Data Block (on the detail page):
You are on a product detail page. Extract the price of {{product_name}}
product_name
product_name and price. For each product, the product_name is carried over from the first extraction step via the merge, and the price is newly extracted from the detail page.
variable_name must exactly match the name of a variable from a previous extraction (case-sensitive).
The Extract Data block, with its combined extraction and merging capabilities, is a powerful tool for creating comprehensive and well-structured datasets.
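The product example above can be modeled with a short Python sketch. It is a simplified simulation under stated assumptions, not the agent's real implementation: `merge_record` is a hypothetical helper, the `extracted` dictionary stands in for whatever the agent scrapes from the detail page, and the sample values are made up for illustration.

```python
import re

# Assumed pattern for a {{variable_name}} reference inside an instruction.
VARIABLE_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def merge_record(schema: dict, previous: dict, extracted: dict) -> dict:
    """Build one output row from a schema (hypothetical helper).

    schema:    field name -> instruction string
    previous:  variables collected by an earlier Extract Data step
    extracted: stand-in for values scraped from the current page
    """
    row = {}
    for field, instruction in schema.items():
        names = VARIABLE_REF.findall(instruction)
        if names:
            name = names[0]
            # Dictionary lookup is case-sensitive, mirroring the rule that
            # the variable name must match a previous variable exactly.
            if name not in previous:
                raise KeyError(f"unknown variable: {name}")
            row[field] = previous[name]
        else:
            # No variable reference: use the newly extracted value.
            row[field] = extracted[field]
    return row


if __name__ == "__main__":
    schema = {
        "product_name": "<put {{product_name}} as value of this field>",
        "price": "<the price of the product>",
    }
    # Illustrative values only.
    previous = {"product_name": "Example Widget", "url": "https://example.com/widget"}
    extracted = {"price": "19.99"}
    print(merge_record(schema, previous, extracted))
```

In this sketch, a reference such as `{{Product_Name}}` raises an error when only `product_name` exists, so a casing mistake surfaces immediately instead of producing an empty column.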