Status: Closed
Overview
Create a Robots License Guard Plugin that respects robots.txt files and enforces license restrictions for web scraping and content access.
Plugin Requirements
Plugin Details
- Name: RobotsLicenseGuardPlugin
- Type: Self-contained (native) plugin
- File Location: `plugins/robots_license_guard/`
- Complexity: Medium
Functionality
- Parse and respect robots.txt files
- Enforce crawl delays and rate limits
- Check content licensing restrictions
- Support user-agent specific rules
- Cache robots.txt data
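As a minimal sketch of the parsing, user-agent matching, and caching items above, assuming Python's standard-library `urllib.robotparser` is acceptable as the parsing library: the fetcher is injected as a callable because the issue does not name an HTTP client, and `RobotsCache`, `Fetcher`, and `ttl_seconds` are hypothetical names. Note that `RobotFileParser` matches groups using the part of the user agent before the first `/` and only accepts integer `Crawl-delay` values.

```python
import time
import urllib.robotparser
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical fetcher type: takes an origin such as "https://example.com"
# and returns the text of that origin's /robots.txt.
Fetcher = Callable[[str], str]


@dataclass
class RobotsCache:
    fetch: Fetcher
    ttl_seconds: float = 3600.0  # assumed default; the issue gives no TTL
    _entries: Dict[str, Tuple[float, urllib.robotparser.RobotFileParser]] = field(
        default_factory=dict
    )

    def _parser_for(self, url: str) -> urllib.robotparser.RobotFileParser:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"  # robots.txt is per scheme + host + port
        now = time.monotonic()
        cached = self._entries.get(origin)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # fresh cache hit: no refetch
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(self.fetch(origin).splitlines())  # parse() also marks the parser as checked
        self._entries[origin] = (now, parser)
        return parser

    def can_fetch(self, user_agent: str, url: str) -> bool:
        # The first User-agent group naming this agent applies; otherwise the "*" group does.
        return self._parser_for(url).can_fetch(user_agent, url)

    def crawl_delay(self, user_agent: str, url: str) -> Optional[int]:
        # Crawl-delay in seconds for the matching group, or None if none is set.
        return self._parser_for(url).crawl_delay(user_agent)
```

For a robots.txt that disallows `/private/` for `ExampleBot` and disallows everything for `*`, `ExampleBot/1.0` may fetch `/public` but not `/private/x`, while any other agent is blocked everywhere; repeated checks against the same origin reuse the cached parser until the TTL expires.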
Hook Integration
- Primary Hooks: `resource_pre_fetch`, `tool_pre_invoke`
- Purpose: Enforce robots.txt and licensing compliance
- Behavior: Block or delay requests based on robots.txt rules
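The block-or-delay behavior could be modeled as a small decision function that a hook calls before each fetch. This is a sketch under assumptions: `CrawlDelayGate`, `Action`, and `Decision` are hypothetical names, the caller is expected to wait `wait_seconds` and check again after a `DELAY`, and how a decision maps onto the framework's `resource_pre_fetch` / `tool_pre_invoke` result objects is not specified in this issue and is left out. The clock is injectable so the timing logic can be unit tested without sleeping.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Action(Enum):
    ALLOW = "allow"
    DELAY = "delay"
    BLOCK = "block"


@dataclass(frozen=True)
class Decision:
    action: Action
    wait_seconds: float = 0.0
    reason: str = ""


class CrawlDelayGate:
    """Hypothetical gate: one timestamp per key (e.g. origin + user agent)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_request: Dict[str, float] = {}

    def check(self, key: str, allowed: bool, crawl_delay: Optional[float]) -> Decision:
        # Disallowed paths are blocked outright, regardless of timing.
        if not allowed:
            return Decision(Action.BLOCK, reason="disallowed by robots.txt")
        now = self._clock()
        last = self._last_request.get(key)
        if crawl_delay and last is not None:
            remaining = crawl_delay - (now - last)
            if remaining > 0:
                # Too soon since the last allowed request: tell the caller how long to wait.
                return Decision(Action.DELAY, wait_seconds=remaining,
                                reason=f"crawl-delay {crawl_delay}s")
        # Only allowed requests update the timestamp.
        self._last_request[key] = now
        return Decision(Action.ALLOW)
```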
Acceptance Criteria
- Plugin implements the `RobotsLicenseGuardPlugin` class
- Robots.txt parsing and compliance
- User-agent specific rule handling
- Crawl delay enforcement
- License restriction checking
- Plugin manifest and documentation created
- Unit tests with >90% coverage
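The issue does not define how license restrictions are detected. One possible approach, shown here purely as an assumption, is to read the license URLs a page declares through HTML `rel="license"` links and compare them with a configured deny list. `LicenseLinkParser`, `LicensePolicy`, `blocked_license_urls`, `require_license`, and `check_license` are hypothetical names, and URLs are compared by exact string match (no normalization of trailing slashes or schemes).

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import FrozenSet, List, Optional


class LicenseLinkParser(HTMLParser):
    """Collects href values of <link rel="license"> and <a rel="license"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.licenses: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list, e.g. rel="license nofollow".
        rels = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rels and href:
            self.licenses.append(href)


@dataclass(frozen=True)
class LicensePolicy:
    # Hypothetical config: license URLs the deployment refuses to ingest.
    blocked_license_urls: FrozenSet[str] = frozenset()
    # If True, content that declares no license at all is rejected.
    require_license: bool = False


def check_license(html: str, policy: LicensePolicy) -> Optional[str]:
    """Return a violation reason, or None if the content passes the policy."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    if policy.require_license and not parser.licenses:
        return "no license declared"
    for url in parser.licenses:
        if url in policy.blocked_license_urls:
            return f"blocked license: {url}"
    return None
```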
Priority
Medium - Compliance feature
Dependencies
- Robots.txt parsing libraries
- HTTP client utilities
Security Considerations
- Respect website access policies
- Proper user-agent identification
- Audit logging for compliance
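A minimal sketch of the identification and audit points, assuming the standard `logging` module and one JSON object per audit line. `USER_AGENT`, the `robots_license_guard.audit` logger name, and `audit()` are hypothetical, and `https://example.com/bot-info` is a placeholder contact URL. The product token before the first `/` is what site owners would name in a `User-agent:` line, so it should stay stable across versions.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical identifier: a stable product token, a version, and a contact URL.
USER_AGENT = "RobotsLicenseGuard/0.1 (+https://example.com/bot-info)"

audit_logger = logging.getLogger("robots_license_guard.audit")


def audit(event: str, url: str, user_agent: str, decision: str, reason: str = "") -> str:
    """Emit one JSON audit record per decision and return the serialized line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # timezone-aware UTC timestamp
        "event": event,            # e.g. the hook name that made the decision
        "url": url,
        "user_agent": user_agent,
        "decision": decision,      # e.g. "allow", "delay", "block"
        "reason": reason,
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line
```

Logging every decision (including allows) keeps the audit trail complete; sites that object to the crawler can then be matched against concrete records.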