Handling Complexity
Strategies for tackling complex websites, nested data, and dynamic content in your Jsonify workflows.
As you build more sophisticated automations, you’ll encounter websites with complex structures, dynamic content, and nested data. This section provides strategies for handling these complexities effectively with Jsonify.
1. Dealing with Complex Website Structures
- Break Down the Problem: Instead of trying to do everything in one giant workflow, break the task into smaller, manageable parts.
  - If different types of pages in your workflow (e.g., a product listing page vs. a product detail page reached after a click) have very different data structures, use a separate, appropriately configured Extract Data block after navigating to each page type. Jsonify processes one Extract Data configuration per page view.
  - Chain workflows if the overall process is very long or involves distinct stages that are logically separate.
- Iterative Navigation: For sites where content is spread across many linked pages or requires specific interactions to reveal, use a combination of Find Links, Follow Links, and Interact with Page blocks methodically to reach the desired page before extraction.
- Targeting Specific Sections: When extracting data with an Extract Data block, use very specific instructions to target only the relevant parts of a complex page, ignoring sidebars, footers, or unrelated content. Use negative constraints (e.g., “Do not extract from the ‘related articles’ section”).
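If you split a task into a listing-page stage and a detail-page stage, you may want to combine the two outputs afterwards. The sketch below is plain Python post-processing, not a Jsonify feature: it assumes each stage's results have been exported as a list of records and that both stages extract the product URL as a shared key. All field names (product_name, product_url, sku, and so on) and the sample data are hypothetical.

```python
# Hypothetical outputs from two separately configured Extract Data blocks:
# one run on the listing page, one run on each product detail page.
listing_rows = [
    {"product_name": "Desk Lamp", "product_url": "https://example.com/p/1", "price": "$25"},
    {"product_name": "Office Chair", "product_url": "https://example.com/p/2", "price": "$140"},
    {"product_name": "Bookshelf", "product_url": "https://example.com/p/3", "price": "$80"},
]
detail_rows = [
    {"product_url": "https://example.com/p/1", "description": "Adjustable LED lamp", "sku": "LMP-01"},
    {"product_url": "https://example.com/p/2", "description": "Ergonomic mesh chair", "sku": "CHR-07"},
]


def merge_by_key(listing: list[dict], details: list[dict], key: str) -> list[dict]:
    """Attach detail-page fields to each listing row that shares the same key."""
    details_by_key = {row[key]: row for row in details}
    merged = []
    for row in listing:
        # Rows with no matching detail page keep only their listing fields.
        extra = details_by_key.get(row[key], {})
        merged.append({**row, **extra})
    return merged


products = merge_by_key(listing_rows, detail_rows, key="product_url")
for product in products:
    print(product["product_name"], product.get("sku", ""))
```

Keeping each stage's extraction small and joining on a stable key like the URL is one way to keep the stages independent, so a change to the detail-page configuration does not affect the listing-page one.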
2. Extracting Nested Data (JSON Output)
Many websites present data in a hierarchical or nested fashion (e.g., an author with multiple books, each book with multiple reviews).
- Using the Extract Data Block’s Advanced Mode (Edit data shape - JSON):
  - This is the most powerful way to define nested structures. You can write a JSON schema directly that mirrors the desired output. The values in your schema should be descriptive prompts within angle brackets <...> that guide the AI.
  - Example: To extract a list of articles, each with an author object (name, profile URL) and a list of tags, the schema would describe a list of article objects, each holding article_title, publication_date, an author object with author_name and author_profile_url, and a tags array.
  - In your “Additional Instructions” for the Extract Data block, you would then guide the AI on how to populate this structure (e.g., “For each article, populate article_title with the main title and publication_date with the publishing date. For the author object, fill author_name and author_profile_url. For tags, collect all associated tags as an array of strings.”).
- Flattening Data (Simpler Approach):
  - If the nesting isn’t too deep or critical, you can “flatten” it by creating combined field names in the table view (e.g., author_name and author_profile_url instead of a nested author object). This is simpler to set up but provides a less structured output.
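A template for the articles example might look roughly like the sketch below. It is an illustrative reconstruction built from the field names in the Additional Instructions example; the top-level articles key, the prompt wording, and the sample result are assumptions. The template is written as a Python dict, which serializes to JSON of the kind you would enter in Advanced Mode, and it is paired with a small post-processing function showing what the flattened, table-view shape of the same record looks like.

```python
import json

# Hypothetical nested template: keys mirror the desired output, and each
# value is a <...> prompt describing what the AI should extract there.
ARTICLE_TEMPLATE = {
    "articles": [
        {
            "article_title": "<the main title of the article>",
            "publication_date": "<the date the article was published>",
            "author": {
                "author_name": "<the author's name>",
                "author_profile_url": "<the URL of the author's profile page>",
            },
            "tags": ["<a tag associated with the article>"],
        }
    ]
}

# Made-up nested result shaped like the template.
nested_result = {
    "articles": [
        {
            "article_title": "Scaling Scrapers",
            "publication_date": "2024-03-01",
            "author": {"author_name": "A. Writer", "author_profile_url": "https://example.com/u/a"},
            "tags": ["scraping", "automation"],
        }
    ]
}


def flatten_article(article: dict) -> dict:
    """Lift the nested author fields to the top level and join tags into one string."""
    author = article.get("author", {})
    return {
        "article_title": article.get("article_title", ""),
        "publication_date": article.get("publication_date", ""),
        "author_name": author.get("author_name", ""),
        "author_profile_url": author.get("author_profile_url", ""),
        "tags": ", ".join(article.get("tags", [])),
    }


print(json.dumps(ARTICLE_TEMPLATE, indent=2))
flat_rows = [flatten_article(a) for a in nested_result["articles"]]
print(flat_rows[0])
```

Joining tags into one comma-separated string is only one way to fit a list into a flat row; it also illustrates what the flat approach gives up, since the tags are no longer a structured array.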
3. Handling Dynamic Content (JavaScript-Loaded Content)
Some websites load content dynamically using JavaScript after the initial page load (e.g., infinite scroll, content appearing after a “Load More” button click, or in pop-up/modal windows). The AI Agent perceives all visible content on the page at any given time.
- Interact with Page Block (for “Load More” buttons, pop-ups, etc.):
  - Use this block to trigger dynamic content by clicking specific elements, such as “Load More” buttons, tabs that reveal more data, or buttons that open modal/pop-up windows without a full page reload.
  - If content loads progressively over several such clicks, list the click actions sequentially.
  - Important: After an interaction that loads new content (like opening a modal), the AI Agent sees the updated page state. Subsequent Extract Data or Find Links blocks operate on this new state.
  - Sequential Extraction with Interactions: You can first extract data from the main page content, then use Interact with Page to trigger an event (e.g., click a button that opens a pop-up window), and then use another Extract Data block to get information specifically from the newly appeared pop-up/modal content. All of these operations happen within the same initial page view, because opening the modal does not change the primary URL.
- Paginate a list Block (for Infinite Scroll & Button-Based Pagination):
  - This block is designed to handle common dynamic content patterns, including:
    - Infinite Scroll: New content loads as the user scrolls down. The Paginate a list block simulates these scroll actions to load new viewports of content.
    - “Next Page” / Numbered Page Buttons: It can also handle clicking traditional pagination buttons.
  - Configure it with the number of “pages” (scroll or click iterations) to perform.
- Waiting for Content (Implicit):
  - Jsonify’s AI agents are designed to wait for pages to load fully, including a reasonable amount of time for initial JavaScript execution. Content that appears only after specific user interactions (clicks, scrolls) must be triggered explicitly with Interact with Page or Paginate a list.
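When results are gathered over several scroll or click iterations, you may end up combining them into a single list afterwards. This section does not say whether Paginate a list returns each iteration separately or already removes repeats, so treat the sketch below as optional post-processing on exported results. It assumes, hypothetically, that successive infinite-scroll viewports can overlap and that each item has a stable url field to de-duplicate on; the data is made up.

```python
# Hypothetical per-iteration results: each inner list is what one scroll or
# "Next Page" iteration returned. Overlapping viewports are assumed here, so
# the same item can appear in more than one batch.
batches = [
    [{"title": "Item A", "url": "https://example.com/a"},
     {"title": "Item B", "url": "https://example.com/b"}],
    [{"title": "Item B", "url": "https://example.com/b"},
     {"title": "Item C", "url": "https://example.com/c"}],
    [{"title": "Item D", "url": "https://example.com/d"}],
]


def combine_batches(batches: list[list[dict]], key: str) -> list[dict]:
    """Concatenate batches in order, keeping only the first record seen for each key."""
    seen: set[str] = set()
    combined = []
    for batch in batches:
        for record in batch:
            if record[key] in seen:
                continue  # already collected from an earlier iteration
            seen.add(record[key])
            combined.append(record)
    return combined


items = combine_batches(batches, key="url")
print([item["title"] for item in items])  # ['Item A', 'Item B', 'Item C', 'Item D']
```

Keeping the first occurrence preserves page order, which usually matches the order in which the site presented the items.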
4. Dealing with Variations in Page Layout
Sometimes, similar pages (e.g., product pages from the same site) can have slight variations in layout.
- Flexible Instructions: Write your Extract Data descriptions to be somewhat flexible. Instead of “Extract the text from the third paragraph,” try “Extract the paragraph that starts with ‘Product Overview:’.”
- Focus on Semantic Meaning: Instruct the AI based on the meaning of the data (e.g., “the main product image,” “the discounted price”) rather than its exact position or HTML structure. Jsonify’s AI is designed to understand content semantically and will attempt to adapt to minor layout variations.
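The AI interprets instructions semantically, so the following is only an analogy in plain Python: it contrasts a position-based rule with a content-anchored rule on two made-up variants of the same page, to show why “the paragraph that starts with ‘Product Overview:’” survives a layout shift that breaks “the third paragraph.” The page contents and function names are hypothetical.

```python
# Two made-up variants of the same product page: variant B has an extra
# shipping notice, which shifts every later paragraph down by one position.
variant_a = [
    "Free returns within 30 days.",
    "In stock.",
    "Product Overview: A compact, rechargeable desk lamp.",
]
variant_b = [
    "Free returns within 30 days.",
    "Ships in 2-3 business days.",
    "In stock.",
    "Product Overview: A compact, rechargeable desk lamp.",
]


def third_paragraph(paragraphs: list[str]) -> str:
    """Position-based rule: returns whatever happens to be third."""
    return paragraphs[2] if len(paragraphs) > 2 else ""


def overview_paragraph(paragraphs: list[str]) -> str:
    """Content-anchored rule: finds the paragraph by its leading label."""
    for text in paragraphs:
        if text.startswith("Product Overview:"):
            return text
    return ""


for page in (variant_a, variant_b):
    print(repr(third_paragraph(page)), "|", repr(overview_paragraph(page)))
```

On variant B the position-based rule returns the stock notice, while the content-anchored rule still finds the overview.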
5. Error Tolerance and Robustness (Conceptual)
- Missing Data: In your Extract Data instructions, always specify how to handle missing fields (e.g., “If the discount price is not available, leave the discount_price field empty”). This prevents errors and ensures a consistent output structure.
- Iterative Testing: Complex sites require more iterative testing. Test with various example pages to ensure your workflow handles common variations.
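During iterative testing it can help to check that the results from several example pages really share one shape. A minimal sketch, assuming you have exported test results as a list of records; the field names, the EXPECTED_FIELDS list, and the sample data are hypothetical. It reports which records lack expected fields and then normalizes every record so missing fields become empty strings, mirroring the “leave the discount_price field empty” instruction.

```python
# Hypothetical outputs from test runs on three different example pages.
# On the second page there was no discount, and the field was omitted.
EXPECTED_FIELDS = ["product_name", "price", "discount_price"]

sample_outputs = [
    {"product_name": "Desk Lamp", "price": "$25", "discount_price": "$20"},
    {"product_name": "Office Chair", "price": "$140"},
    {"product_name": "Monitor Arm", "price": "$60", "discount_price": ""},
]


def missing_fields(records: list[dict], fields: list[str]) -> dict[int, list[str]]:
    """Report which expected fields each record lacks, keyed by record index."""
    report = {}
    for i, record in enumerate(records):
        absent = [f for f in fields if f not in record]
        if absent:
            report[i] = absent
    return report


def normalize(record: dict, fields: list[str]) -> dict:
    """Return a record with exactly the expected fields; missing ones become ''."""
    return {field: record.get(field, "") for field in fields}


print(missing_fields(sample_outputs, EXPECTED_FIELDS))  # {1: ['discount_price']}
normalized = [normalize(r, EXPECTED_FIELDS) for r in sample_outputs]
print(normalized[1])  # {'product_name': 'Office Chair', 'price': '$140', 'discount_price': ''}
```

If the report is non-empty, that is a signal to tighten the missing-data wording in your Extract Data instructions and rerun the test pages.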
By anticipating these complexities and using Jsonify’s blocks strategically, you can build robust workflows capable of handling a wide range of websites and data structures.