As you build more sophisticated automations, you’ll encounter websites with complex structures, dynamic content, and nested data. This section provides strategies for handling these complexities effectively with Jsonify.

1. Dealing with Complex Website Structures

  • Break Down the Problem: Instead of trying to do everything in one giant workflow, break the task into smaller, manageable parts.
    • If different types of pages in your workflow (e.g., a product listing page vs. a product detail page accessed after a click) have vastly different data structures, you will use a separate, appropriately configured Extract Data block after navigating to each respective page type. Jsonify processes one Extract Data configuration per page view.
    • Chain workflows if the overall process is very long or involves distinct stages that are logically separate.
  • Iterative Navigation: For sites where content is spread across many linked pages or requires specific interactions to reveal, use a combination of Find Links, Follow Links, and Interact with Page blocks methodically to reach the desired page before extraction.
  • Targeting Specific Sections: When extracting data with an Extract Data block, use very specific instructions to target only the relevant parts of a complex page, ignoring sidebars, footers, or unrelated content. Use negative constraints (e.g., “Do not extract from the ‘related articles’ section”).

2. Extracting Nested Data (JSON Output)

Many websites present data in a hierarchical or nested fashion (e.g., an author with multiple books, each book with multiple reviews).

  • Using the Extract Data Block’s Advanced Mode (Edit data shape - JSON):
    • This is the most powerful way to define nested structures. You can directly write a JSON schema that mirrors the desired output. The values in your schema should be descriptive prompts within angle brackets <...> to guide the AI.
    • Example: To extract a list of articles, each with an author object (name, profile URL) and a list of tags:
      [
        {
          "article_title": "<The main title of the article>",
          "publication_date": "<Date the article was published, e.g., YYYY-MM-DD>",
          "author": {
            "author_name": "<Full name of the article's author>",
            "author_profile_url": "<URL to the author's profile page>"
          },
          "tags": [
            "<tag for the article>" // AI will try to find all relevant tags and return them as a list of strings
          ]
        }
      ]
    • In your “Additional Instructions” for the Extract Data block, you would then guide the AI on how to populate this structure (e.g., “For each article, populate the article_title with the main title, and publication_date with the publishing date. For the author object, fill author_name and author_profile_url. For tags, collect all associated tags as an array of strings.”).
  • Flattening Data (Simpler Approach):
    • If the nesting isn’t too deep or critical, you can “flatten” it by creating combined field names in the table view (e.g., author_name, author_profile_url instead of a nested author object). This is simpler to set up but provides a less structured output.

3. Handling Dynamic Content (JavaScript-Loaded Content)

Some websites load content dynamically using JavaScript after the initial page load (e.g., infinite scroll, content appearing after a “Load More” button click, or in pop-up/modal windows). The AI Agent perceives all visible content on the page at any given time.

  • Interact with Page Block (for “Load More” buttons, pop-ups, etc.):

    • Use this to simulate actions that trigger dynamic content by clicking specific elements, such as “Load More” buttons, tabs that reveal more data, or buttons that open modal/pop-up windows without a full page reload.
    • If content loads progressively through multiple such clicks, you would list multiple click actions sequentially.
    • Important: After an interaction that loads new content (like opening a modal), the AI Agent will see the updated page state. Subsequent Extract Data or Find Links blocks will operate on this new state.
    • Sequential Extraction with Interactions: It’s possible to perform an initial data extraction from the main page content, then use Interact with Page to trigger an event (e.g., click a button to open a pop-up window), and then use another Extract Data block to get information specifically from this newly appeared pop-up/modal content. All these operations are considered to be within the context of the same initial page view, as the primary URL does not change for the modal itself.
  • Paginate a list Block (for Infinite Scroll & Button-Based Pagination):

    • This block is designed to handle common dynamic content patterns, including:
      • Infinite Scroll: Where new content loads as the user would typically scroll down. The Paginate a list block simulates these scroll actions to load new viewports of content.
      • “Next Page” / Numbered Page Buttons: It can also handle clicking traditional pagination buttons.
    • Configure it with the number of “pages” or scroll/click iterations to perform.
  • Waiting for Content (Implicit):

    • Jsonify’s AI agents are designed to wait for pages to load fully, including a reasonable amount of time for initial JavaScript execution. For content loaded by specific user interactions (clicks, scrolls), these interactions must be explicitly defined using Interact with Page or Paginate a list.

4. Dealing with Variations in Page Layout

Sometimes, similar pages (e.g., product pages from the same site) can have slight variations in layout.

  • Flexible Instructions: Write your Extract Data descriptions to be somewhat flexible. Instead of “Extract the text from the third paragraph,” try “Extract the paragraph that starts with ‘Product Overview:’.”
  • Focus on Semantic Meaning: Instruct the AI based on the meaning of the data (e.g., “the main product image,” “the discounted price”) rather than its exact position or HTML structure, as the AI is designed to understand content semantically. Jsonify’s AI will attempt to adapt to minor layout variations.

5. Error Tolerance and Robustness (Conceptual)

  • Missing Data: In your Extract Data instructions, always specify how to handle missing fields (e.g., “If the discount price is not available, leave the discount_price field empty”). This prevents errors and ensures consistent output structure.
  • Iterative Testing: Complex sites require more iterative testing. Test with various example pages to ensure your workflow handles common variations.

By anticipating these complexities and using Jsonify’s blocks strategically, you can build robust workflows capable of handling a wide range of websites and data structures.