[Plugin] Create Robots License Guard Plugin using Plugin Framework #1066

@crivetimihai

Description

Overview

Create a Robots License Guard Plugin that respects robots.txt files and enforces license restrictions for web scraping and content access.

Plugin Requirements

Plugin Details

  • Name: RobotsLicenseGuardPlugin
  • Type: Self-contained (native) plugin
  • File Location: plugins/robots_license_guard/
  • Complexity: Medium

Functionality

  • Parse and respect robots.txt files
  • Enforce crawl delays and rate limits
  • Check content licensing restrictions
  • Support user-agent specific rules
  • Cache robots.txt data
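The parsing, user-agent and caching items above could be covered largely by Python's standard-library `urllib.robotparser`. The sketch below is a minimal illustration under assumptions, not the plugin's actual design: `RobotsCache`, `parser_for`, the injected `fetch_robots_txt` callable, the one-hour default TTL and the `MCPGatewayBot` user agent are all hypothetical names chosen for this example.

```python
import time
import urllib.robotparser
from typing import Callable, Dict, Tuple


class RobotsCache:
    """Hypothetical helper: parses robots.txt per origin and caches it for ttl_seconds."""

    def __init__(
        self,
        fetch_robots_txt: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_robots_txt  # origin -> raw robots.txt body
        self._ttl = ttl_seconds
        self._clock = clock
        # origin -> (time the entry was parsed, parsed rules)
        self._entries: Dict[str, Tuple[float, urllib.robotparser.RobotFileParser]] = {}

    def parser_for(self, origin: str) -> urllib.robotparser.RobotFileParser:
        now = self._clock()
        cached = self._entries.get(origin)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: reuse the already parsed rules
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(self._fetch(origin).splitlines())  # cache miss or expired entry
        self._entries[origin] = (now, parser)
        return parser


# Example robots.txt with a group for a specific user agent and a catch-all group.
ROBOTS_TXT = """\
User-agent: MCPGatewayBot
Disallow: /private
Crawl-delay: 5

User-agent: *
Disallow: /
"""
```

With this file, `can_fetch("MCPGatewayBot/1.0", ...)` is evaluated against the `MCPGatewayBot` group, while any other agent falls through to the `*` group and is disallowed everywhere. Injecting the fetch function and clock keeps the cache testable without network access. Note that `RobotFileParser.crawl_delay()` only reports integer `Crawl-delay` values; the standard-library parser ignores fractional ones.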

Hook Integration

  • Primary Hooks: resource_pre_fetch, tool_pre_invoke
  • Purpose: Enforce robots.txt and licensing compliance
  • Behavior: Block requests that robots.txt rules or license restrictions forbid, and delay requests that would violate a crawl delay
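One way to model "block or delay" is a small decision object that a `resource_pre_fetch` or `tool_pre_invoke` handler could consult before letting a request through. This is a sketch under assumptions: `CrawlGuard`, `GuardDecision` and the `allow`/`delay`/`block` actions are hypothetical, and the plugin framework's real hook signatures are deliberately not reproduced here.

```python
import time
import urllib.robotparser
from dataclasses import dataclass
from typing import Callable, Dict
from urllib.parse import urlsplit


@dataclass
class GuardDecision:
    action: str  # hypothetical actions: "allow", "delay", or "block"
    wait_seconds: float = 0.0
    reason: str = ""


class CrawlGuard:
    """Hypothetical decision core that a pre-fetch / pre-invoke hook could call."""

    def __init__(
        self,
        user_agent: str,
        robots_for_origin: Callable[[str], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.user_agent = user_agent
        self._robots_for_origin = robots_for_origin  # origin -> robots.txt body
        self._clock = clock
        self._parsers: Dict[str, urllib.robotparser.RobotFileParser] = {}
        self._last_request: Dict[str, float] = {}  # origin -> time of last allowed request

    def _parser(self, origin: str) -> urllib.robotparser.RobotFileParser:
        if origin not in self._parsers:
            parser = urllib.robotparser.RobotFileParser()
            parser.parse(self._robots_for_origin(origin).splitlines())
            self._parsers[origin] = parser
        return self._parsers[origin]

    def check(self, url: str) -> GuardDecision:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        parser = self._parser(origin)
        if not parser.can_fetch(self.user_agent, url):
            return GuardDecision("block", reason="disallowed by robots.txt")
        delay = parser.crawl_delay(self.user_agent) or 0
        now = self._clock()
        last = self._last_request.get(origin)
        if last is not None and now - last < delay:
            # Too soon for this origin: report how long the caller should wait.
            return GuardDecision("delay", wait_seconds=delay - (now - last), reason="crawl-delay")
        self._last_request[origin] = now
        return GuardDecision("allow")
```

A `delay` result does not record the request, so the caller is expected to wait `wait_seconds` and call `check()` again. Per-origin timestamps live in process memory only; a gateway running several workers would need shared state to enforce the delay globally.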

Acceptance Criteria

  • Plugin implements RobotsLicenseGuardPlugin class
  • Robots.txt parsing and compliance
  • User-agent specific rule handling
  • Crawl delay enforcement
  • License restriction checking
  • Plugin manifest and documentation created
  • Unit tests with >90% coverage
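The issue does not say how licensing information is discovered. Assuming the fetched content or resource metadata carries a single SPDX license identifier, license restriction checking could reduce to an allow/deny policy like the hypothetical `LicensePolicy` and `check_license` below. The identifiers listed are illustrative, not a recommended policy, and compound SPDX expressions such as `MIT OR Apache-2.0` are not handled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class LicensePolicy:
    """Hypothetical policy over SPDX license identifiers."""

    allowed: FrozenSet[str] = frozenset({"CC0-1.0", "CC-BY-4.0", "MIT", "Apache-2.0"})
    denied: FrozenSet[str] = frozenset({"CC-BY-NC-4.0", "CC-BY-ND-4.0"})
    allow_unknown: bool = False  # what to do when no license, or an unlisted one, is declared


def check_license(declared: Optional[str], policy: LicensePolicy) -> Tuple[bool, str]:
    """Return (allowed, reason) for a single declared SPDX identifier."""
    if declared is None or not declared.strip():
        return policy.allow_unknown, "no license declared"
    license_id = declared.strip()
    key = license_id.casefold()  # compare identifiers case-insensitively
    if key in {d.casefold() for d in policy.denied}:
        return False, f"license {license_id} is denied"
    if key in {a.casefold() for a in policy.allowed}:
        return True, f"license {license_id} is allowed"
    return policy.allow_unknown, f"license {license_id} is not covered by the policy"
```

Deny rules are checked before allow rules so that an identifier accidentally listed in both is blocked, and the `allow_unknown` flag makes the fail-open or fail-closed choice explicit in configuration.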

Priority

Medium - Compliance feature

Dependencies

  • Robots.txt parsing libraries
  • HTTP client utilities
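For the HTTP client dependency, note that `RobotFileParser.read()` fetches with urllib's default User-Agent header, so it does not identify the plugin. A possible workaround, sketched below with hypothetical `fetch_robots` and `parser_from_response` helpers, is to fetch robots.txt with `urllib.request` and an explicit User-Agent, then map the HTTP status onto rules. The mapping mirrors what `read()` does for 401/403 and other 4xx responses; treating 5xx responses and network errors as "disallow everything" is an assumption (fail closed).

```python
import urllib.error
import urllib.request
import urllib.robotparser


def parser_from_response(status: int, body: str) -> urllib.robotparser.RobotFileParser:
    """Turn an HTTP status and body into parsed robots.txt rules."""
    parser = urllib.robotparser.RobotFileParser()
    if 200 <= status < 300:
        parser.parse(body.splitlines())
    elif 400 <= status < 500 and status not in (401, 403):
        parser.parse([])  # no robots.txt: no rules, so everything is allowed
    else:
        # 401/403, 5xx, or no response at all: fail closed.
        parser.parse(["User-agent: *", "Disallow: /"])
    return parser


def fetch_robots(origin: str, user_agent: str, timeout: float = 5.0) -> urllib.robotparser.RobotFileParser:
    """Fetch <origin>/robots.txt while identifying the plugin via User-Agent."""
    request = urllib.request.Request(
        origin.rstrip("/") + "/robots.txt", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return parser_from_response(response.status, body)
    except urllib.error.HTTPError as err:  # must precede URLError (it is a subclass)
        return parser_from_response(err.code, "")
    except urllib.error.URLError:
        return parser_from_response(0, "")  # unreachable host: fail closed
```

Keeping the status mapping in a pure function makes the compliance rules unit-testable without network access; only `fetch_robots` touches the network.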

Security Considerations

  • Respect website access policies
  • Proper user-agent identification
  • Audit logging for compliance
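The user-agent and audit-logging points could look roughly like the sketch below. `build_user_agent`, `audit_decision` and the `robots_license_guard.audit` logger name are hypothetical; the `Product/version (+contact URL)` format is a common crawler convention rather than something this issue prescribes.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; route it to whatever audit sink the gateway uses.
AUDIT_LOGGER = logging.getLogger("robots_license_guard.audit")


def build_user_agent(product: str, version: str, contact_url: str) -> str:
    """Identify the crawler and give site owners a way to reach its operator."""
    # urllib.robotparser matches rules against the text before the first "/",
    # so the product token is what robots.txt "User-agent:" lines must name.
    return f"{product}/{version} (+{contact_url})"


def audit_decision(hook: str, url: str, user_agent: str, action: str, reason: str) -> str:
    """Log one JSON line per enforcement decision and return that line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "hook": hook,
        "url": url,
        "user_agent": user_agent,
        "action": action,
        "reason": reason,
    }
    line = json.dumps(entry, sort_keys=True)
    AUDIT_LOGGER.info(line)
    return line
```

One JSON object per line keeps the audit trail easy to ship to a log pipeline and to filter by `action` or `hook` when reviewing compliance.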
