Docs Scraper

A Python-based web scraping tool that converts web content to markdown format using the Firecrawl API.

Features

- Converts web pages to clean markdown format
- Supports JavaScript rendering for dynamic content
- Configurable wait times for dynamic content loading
- Custom HTTP headers support
- Interactive or command-line output file selection
- Error handling for failed scrapes and file operations

Requirements

- Python 3.6 or higher
- Firecrawl API key (sign up at firecrawl.dev)
- Required Python packages (installed via requirements.txt):
  - python-dotenv
  - firecrawl
  - requests

Setup

Clone this repository
Create a .env file in the root directory with your Firecrawl API key:
```
FIRECRAWL_API_KEY=your_api_key_here
```
Install dependencies:
```
pip install -r requirements.txt
```
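
For reference, here is a minimal sketch of how the key from the .env file might be read at startup. It assumes python-dotenv (listed in the requirements) does the loading; `get_api_key` is a hypothetical helper name and is not taken from main.py.

```python
import os

try:
    # python-dotenv is listed in requirements.txt; load_dotenv() copies
    # entries from a .env file into the process environment.
    from dotenv import load_dotenv
except ImportError:
    # Fallback so the sketch still runs when the package is not installed.
    load_dotenv = None


def get_api_key(env=None):
    """Return the Firecrawl API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; pass a dict as `env` to bypass
    the real environment (useful in tests).
    """
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # loads variables from a .env file if one is found
        env = os.environ
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIRECRAWL_API_KEY is not set; add it to .env or export it"
        )
    return key
```

Failing early with an explicit message keeps a missing key from surfacing later as an opaque API error.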

Usage

Run the script with a URL:

python main.py <url>

Optional arguments:

--wait <seconds>: Time to wait for dynamic content
--js: Enable JavaScript rendering
--headers "key1:value1,key2:value2": Custom headers
--output <file>: Output markdown file path (if not provided, will prompt)
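
The options above could be declared with the standard-library `argparse` module roughly as follows. This is a sketch, not the parser from main.py: the defaults, the `float` type for `--wait`, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Scrape a URL and save the result as markdown.",
    )
    parser.add_argument("url", help="page to scrape")
    # Assumption: seconds may be fractional and default to no wait.
    parser.add_argument("--wait", type=float, default=0, metavar="seconds",
                        help="time to wait for dynamic content")
    parser.add_argument("--js", action="store_true",
                        help="enable JavaScript rendering")
    parser.add_argument("--headers", default=None, metavar="key:value,...",
                        help="custom headers as comma-separated key:value pairs")
    parser.add_argument("--output", default=None, metavar="file",
                        help="output markdown file path (prompted for if omitted)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```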

Examples:

# Basic usage with output prompt
python main.py https://example.com

# Enable JavaScript rendering and specify output file
python main.py https://example.com --js --output result.md

# Wait for dynamic content and add custom headers
python main.py https://example.com --wait 5 --headers "User-Agent:Mozilla/5.0"
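
When `--output` is omitted, as in the first example, the script prompts for a file name. One way this fallback might look is sketched below; `resolve_output_path` is a hypothetical name, and appending `.md` to names without that extension is an assumption rather than documented behavior.

```python
def resolve_output_path(output_arg, prompt=input):
    """Return the output path, prompting only when --output was not given.

    `prompt` defaults to the built-in input() and can be replaced in tests.
    """
    path = output_arg
    while not path:
        # Keep asking until the user enters a non-empty name.
        path = prompt("Output file (e.g. result.md): ").strip()
    # Assumption: ensure the file carries a markdown extension.
    if not path.lower().endswith(".md"):
        path += ".md"
    return path
```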

Error Handling

The script handles various error scenarios:

- Invalid URLs or connection errors
- Missing API key
- Invalid header format
- File write permission issues
- Failed JavaScript rendering
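
As an illustration of the invalid-header case, the `--headers` string could be validated like this. The sketch assumes the documented `key1:value1,key2:value2` format; `parse_headers` is a hypothetical name, and because pairs are split on commas, header values that themselves contain commas are not supported by this simple approach.

```python
def parse_headers(raw):
    """Parse "key1:value1,key2:value2" into a dict.

    Raises ValueError when a pair lacks a colon, a name, or a value.
    """
    headers = {}
    if not raw:
        return headers
    for pair in raw.split(","):
        # Split on the first colon only, so values such as URLs keep theirs.
        name, sep, value = pair.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name or not value:
            raise ValueError(f"Invalid header {pair!r}; expected key:value")
        headers[name] = value
    return headers
```

For example, `parse_headers("User-Agent:Mozilla/5.0")` yields a one-entry dict, while `parse_headers("broken")` raises `ValueError`.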

Output

The script generates a markdown (.md) file containing:

- Converted web content in markdown format
- Preserved heading structure
- Formatted links and images
- Tables and lists (if present in source)
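
Writing the converted markdown to disk might be handled along these lines. This is a sketch under assumptions: `save_markdown` is a hypothetical name, and creating missing parent directories and using UTF-8 are choices made for the example, not confirmed behavior of main.py.

```python
import sys
from pathlib import Path


def save_markdown(markdown, path):
    """Write markdown text to path; return True on success, False on an OS error."""
    try:
        target = Path(path)
        # Assumption: create missing parent directories instead of failing.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(markdown, encoding="utf-8")
    except OSError as exc:
        # OSError covers permission problems and paths that are directories.
        print(f"Could not write {path}: {exc}", file=sys.stderr)
        return False
    return True
```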

Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request

License

This project is open source and available under the MIT License.
