Develop a web data scraper application with manual list/grid selection and work profiles #70

@Scumpic

Create a new application inspired by the Instant Data Scraper Chrome extension, with the following additional functionality:

  • Automatically detect and extract structured data (tables, lists, grids) from arbitrary web pages, using heuristics or AI analysis.
  • Allow the user to manually select lists, tables, or grid elements on the page for data extraction, in cases where automatic detection is insufficient or inaccurate.
  • Enable users to save their manual selections as named "work profiles" that can be loaded and reused on similar or recurring web pages.
  • Include export options for CSV, Excel, JSON, and integration hooks (e.g., Google Sheets, Airtable, Zapier).
  • Ensure user privacy by processing data locally in the browser whenever possible.
  • Optionally support scraping of paginated or dynamically loaded content (AJAX/infinite scroll).
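
As an illustration of the "work profiles" item above, here is a minimal sketch of what a saved profile and its lookup might look like. The `WorkProfile` fields, the `*` glob syntax, and the function names are all assumptions made for this example; the issue does not specify a profile format.

```typescript
// Hypothetical shape of a saved work profile; none of these field names come from the issue.
interface WorkProfile {
  name: string;                    // user-chosen profile name
  urlPattern: string;              // glob-like pattern, "*" matches any run of characters
  rowSelector: string;             // CSS selector for one repeated row or card
  columns: Record<string, string>; // column name -> CSS selector relative to the row
}

// Convert a "*" glob into an anchored regular expression.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + "$");
}

// Return the profiles whose pattern matches the page URL, longest (most specific) pattern first.
function matchProfiles(profiles: WorkProfile[], url: string): WorkProfile[] {
  return profiles
    .filter((p) => patternToRegExp(p.urlPattern).test(url))
    .sort((a, b) => b.urlPattern.length - a.urlPattern.length);
}

// Example data (made up for illustration).
const savedProfiles: WorkProfile[] = [
  {
    name: "Any example.com page",
    urlPattern: "https://*.example.com/*",
    rowSelector: "li",
    columns: { text: "span" },
  },
  {
    name: "Shop listing",
    urlPattern: "https://shop.example.com/category/*",
    rowSelector: "div.product-card",
    columns: { title: "h2", price: ".price", link: "a" },
  },
];

const profileMatches = matchProfiles(
  savedProfiles,
  "https://shop.example.com/category/shoes?page=2"
);
```

In an extension, profiles like these could be persisted with `chrome.storage.local`, which would keep them on the user's machine in line with the privacy goal above; that choice is a suggestion, not something the issue prescribes.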

This feature will improve usability for users dealing with complex or inconsistent web pages and streamline repetitive scraping tasks.

How it works:

The extension uses AI-based heuristics to analyze the HTML structure of web pages and identify sections containing structured or tabular data.
It does not require custom scripts or site-specific modules, but instead automatically scans the page to find tables or lists.

Data identification and selection:

Automatic scanning:

  • Analyzes HTML table elements (<table>)
  • Identifies lists (<ul>, <ol>)
  • Detects grids built with <div> elements
  • Recognizes repeating data blocks
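
One possible way to recognize repeating data blocks is to look for containers whose direct children share the same tag and class names. The sketch below is a simple hand-written heuristic, not the detection method of any existing extension. `NodeLike` is a hypothetical stand-in for a DOM element so the logic can run outside a browser; in an extension the same walk would be done over real elements.

```typescript
// Minimal stand-in for a DOM element (an assumption for this sketch).
interface NodeLike {
  tag: string;
  classes: string[];
  children: NodeLike[];
}

interface RepeatCandidate {
  container: NodeLike;
  signature: string; // tag plus sorted class names shared by the repeated children
  count: number;     // number of direct children with that signature
}

// Build a comparable signature such as "li.item" from tag and classes.
function signatureOf(node: NodeLike): string {
  return [node.tag.toLowerCase()].concat(node.classes.slice().sort()).join(".");
}

// Walk the tree and record every child signature that repeats at least
// minCount times under the same container, most frequent first.
function findRepeatingBlocks(root: NodeLike, minCount: number = 3): RepeatCandidate[] {
  const found: RepeatCandidate[] = [];
  const stack: NodeLike[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const counts: Record<string, number> = {};
    for (const child of node.children) {
      const sig = signatureOf(child);
      counts[sig] = (counts[sig] || 0) + 1;
      stack.push(child);
    }
    for (const signature of Object.keys(counts)) {
      if (counts[signature] >= minCount) {
        found.push({ container: node, signature, count: counts[signature] });
      }
    }
  }
  return found.sort((a, b) => b.count - a.count);
}

// Small helper for building example trees.
function el(tag: string, classes: string[] = [], children: NodeLike[] = []): NodeLike {
  return { tag, classes, children };
}

// Example page: a two-link nav and a result list with four items.
const examplePage = el("body", [], [
  el("nav", [], [el("a"), el("a")]),
  el("ul", ["results"], [
    el("li", ["item"]),
    el("li", ["item"]),
    el("li", ["item"]),
    el("li", ["item"]),
  ]),
]);

const repeatingBlocks = findRepeatingBlocks(examplePage);
```

Here only the `ul.results` container qualifies, because the nav has fewer than three matching children. The `minCount` threshold of 3 is an arbitrary default chosen for the example.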

Smart selection:

  • Uses AI to determine which tables or sections are most likely to contain useful information
  • Allows preview of detected data
  • Offers the option for manual adjustment if the automatic prediction is not accurate
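
The issue leaves open how candidates are ranked. As a placeholder for the AI step, the sketch below uses a plain scoring rule: candidates with more filled cells and a consistent column count rank higher. The scoring formula and the names `scoreTable` and `rankCandidates` are assumptions made for illustration only.

```typescript
type CellTable = string[][]; // rows of cell text

// Score = number of non-empty cells x share of rows having the most common width.
function scoreTable(rows: CellTable): number {
  if (rows.length === 0) return 0;
  const widthCounts: Record<number, number> = {};
  let filled = 0;
  for (const row of rows) {
    widthCounts[row.length] = (widthCounts[row.length] || 0) + 1;
    for (const cell of row) {
      if (cell.trim() !== "") filled += 1;
    }
  }
  let modalRows = 0;
  for (const key of Object.keys(widthCounts)) {
    modalRows = Math.max(modalRows, widthCounts[Number(key)]);
  }
  const consistency = modalRows / rows.length;
  return filled * consistency;
}

// Rank labelled candidates from most to least promising.
function rankCandidates(
  candidates: { label: string; rows: CellTable }[]
): { label: string; score: number }[] {
  return candidates
    .map((c) => ({ label: c.label, score: scoreTable(c.rows) }))
    .sort((a, b) => b.score - a.score);
}

// Example candidates (made-up data).
const ranked = rankCandidates([
  { label: "nav", rows: [["Home"], ["About"], ["Contact"], ["Login"]] },
  {
    label: "products",
    rows: [
      ["Desk", "120", "In stock"],
      ["Lamp", "35", "In stock"],
      ["Chair", "80", "Sold out"],
    ],
  },
  { label: "ragged", rows: [["a", "b", "c"], ["d"], ["e", "f"], ["g", "h", "i"]] },
]);
```

The ranked list could drive the preview, with the top candidate preselected and the others offered as alternatives for manual adjustment.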

Technical features:

  • Supports dynamic content (infinite scroll or AJAX tables)
  • Can extract data from multiple pages (pagination)
  • Collects not only text but also links and image sources
  • Runs entirely in the browser using:
      • JavaScript for DOM manipulation
      • Chrome extension APIs for data access
  • Does not send data externally for processing
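
Multi-page extraction can be sketched as a loop that follows "next" links until none remain, skipping rows it has already seen. In the sketch below, `loadPage` is a hypothetical callback that would wrap whatever the extension does to load and parse one page; the `maxPages` cap and the duplicate check are assumptions added to keep the loop from running away.

```typescript
interface ScrapedPage {
  rows: string[][];
  nextUrl: string | null; // null when there is no further page
}

type LoadPage = (url: string) => Promise<ScrapedPage>;

// Follow nextUrl links, collecting unique rows, until the chain ends,
// a URL repeats, or maxPages pages have been loaded.
async function collectAllPages(
  startUrl: string,
  loadPage: LoadPage,
  maxPages: number = 50
): Promise<string[][]> {
  const all: string[][] = [];
  const seenUrls = new Set<string>();
  const seenRows = new Set<string>();
  let url: string | null = startUrl;
  while (url !== null && seenUrls.size < maxPages && !seenUrls.has(url)) {
    seenUrls.add(url);
    const page: ScrapedPage = await loadPage(url);
    for (const row of page.rows) {
      const key = JSON.stringify(row);
      if (!seenRows.has(key)) {
        seenRows.add(key);
        all.push(row);
      }
    }
    url = page.nextUrl;
  }
  return all;
}

// In-memory stand-in for a paginated site; page p2 links back to p1 on purpose.
const fakePages: Record<string, ScrapedPage> = {
  p1: { rows: [["a"], ["b"]], nextUrl: "p2" },
  p2: { rows: [["b"], ["c"]], nextUrl: "p1" },
};
const fakeLoader: LoadPage = async (url) => fakePages[url];

const allRowsPromise = collectAllPages("p1", fakeLoader);
```

The same loop shape would fit infinite scroll if `loadPage` scrolled the page and returned only the newly rendered rows, since the duplicate check drops anything already collected.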

Export and processing:

  • Export formats include:
      • CSV
      • Excel
      • JSON
  • Direct export to Google Sheets
  • Integration with Airtable and Zapier
  • Allows renaming and filtering of columns before export
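
Renaming and filtering columns before a CSV export could work roughly as follows. The `ColumnSpec` type and the `toCsv` function are hypothetical names for this sketch: only the listed columns are written, each under its new header, and fields are quoted in the RFC 4180 style.

```typescript
type ScrapedRow = Record<string, string>;

interface ColumnSpec {
  source: string; // key in the scraped row
  header: string; // name written to the CSV header (this is the rename)
}

// Quote a field when it contains a comma, a double quote, or a line break.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// Serialize only the requested columns, in the requested order.
function toCsv(rows: ScrapedRow[], columns: ColumnSpec[]): string {
  const lines = [columns.map((c) => escapeCsv(c.header)).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => escapeCsv(row[c.source] || "")).join(","));
  }
  return lines.join("\r\n");
}

// Example: drop "internalId" and rename the remaining columns.
const csvText = toCsv(
  [
    { title: "Desk, oak", price: "120", internalId: "x1" },
    { title: 'The "Big" lamp', price: "35", internalId: "x2" },
  ],
  [
    { source: "title", header: "Product" },
    { source: "price", header: "Price" },
  ]
);
```

The same `ColumnSpec` list could feed the JSON and Excel exporters, so the user's column choices apply to every format.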
