High-fidelity, Rust-powered browser observation system for bot detection with forensic granularity. Multi-layer fingerprinting, behavioral modeling, and ML-driven anomaly detection.

copyleftdev/scrybe

🦉 Scrybe

Scrybe - The Vigilant Observer

"Welcome, traveler. I am Scrybe. You have just gifted me a fingerprint.
My task is to remember it, enrich it, and test its truth."


🎯 Vision

Scrybe is a high-fidelity, Rust-powered browser observation system designed to detect and understand automation with forensic granularity. It is equal parts data collector, behavior profiler, and session fingerprint historian, engineered to act as a sophisticated anti-bot detection engine and training ground for resilient bot defenses.

More than a passive observer, Scrybe is a vigilant system that watches browsers with contextual memory and scientific rigor. Its mission is not just to block bots but to understand them, adapt to them, and learn from every interaction.


🦉 Meet Scrybe

Species: Autonomous Rust Intelligence
Personality: Scholarly, curious, and unflinchingly meticulous

Scrybe documents all who visit its domain, not to judge, but to remember. Every movement, header, and anomaly becomes a piece of a broader behavioral mosaic.

  • Humans find Scrybe charming
  • Bots find it uncanny

✨ Key Features

Canvas, WebGL, and audio fingerprinting:

  • Multi-layer canvas tests (anti-spoofing)
  • Font enumeration patterns
  • DOM feature detection
  • WebDriver presence analysis

🎯 Per-Session Anomaly Detection

ML-driven behavioral baselines:

  • Percentile-based thresholds (adaptive)
  • Deviation vector flagging
  • Fingerprint similarity clustering (MinHash)
  • Real-time anomaly scoring
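
As an illustration of the MinHash-based similarity clustering listed above, here is a minimal sketch, not Scrybe's actual implementation. The function names, the signature length of 64, and the feature strings in the usage note are assumptions made for the example, and the standard library's `DefaultHasher` stands in for whatever hash family a real deployment would choose.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of hash functions per signature (an illustrative choice).
const NUM_HASHES: usize = 64;

/// Hash a feature together with a seed to simulate a family of hash functions.
/// DefaultHasher's algorithm is not guaranteed stable across Rust releases,
/// so signatures built this way should not be persisted.
fn seeded_hash(feature: &str, seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    feature.hash(&mut hasher);
    hasher.finish()
}

/// Build a MinHash signature: for each seed, keep the smallest hash over all features.
fn minhash_signature(features: &[&str]) -> [u64; NUM_HASHES] {
    let mut signature = [u64::MAX; NUM_HASHES];
    for (slot, min) in signature.iter_mut().enumerate() {
        for feature in features {
            let h = seeded_hash(feature, slot as u64);
            if h < *min {
                *min = h;
            }
        }
    }
    signature
}

/// Estimate Jaccard similarity as the fraction of signature slots that match.
fn estimated_similarity(a: &[u64; NUM_HASHES], b: &[u64; NUM_HASHES]) -> f64 {
    let matches = a.iter().zip(b.iter()).filter(|(x, y)| x == y).count();
    matches as f64 / NUM_HASHES as f64
}
```

Two sessions whose feature sets are identical (for example the hypothetical `["canvas:ab12", "webgl:cd34"]`) produce identical signatures and a similarity of 1.0, while unrelated sets score close to 0. Clustering would then group sessions whose estimated similarity exceeds a chosen cutoff.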

🔐 Privacy by Design

GDPR-compliant from the ground up:

  • Zero PII collection
  • Salted hash fingerprints
  • Explicit consent for EU visitors
  • Data Processing Agreement templates
  • 90-day retention with automatic cleanup

🏗️ Architecture

┌────────────┐     ┌───────────────┐     ┌──────────────────┐     ┌───────────────┐
│  Browser   │ ──> │  Ingestion    │ ──> │  Enrichment & ML │ ──> │  ClickHouse   │
│  (JS SDK)  │     │  Gateway/API  │     │  Fingerprinting  │     │   Storage     │
└────────────┘     └───────────────┘     └──────────────────┘     └───────────────┘
                             │                      │
                             ▼                      ▼
                   ┌────────────────┐     ┌────────────────┐
                   │ Session Cache  │     │  Analyst UI    │
                   │   (Redis)      │     │  Dashboard     │
                   └────────────────┘     └────────────────┘

Tech Stack

  • Core Engine: Rust (TigerStyle compliant)
  • JavaScript SDK: TypeScript with bounded collections
  • Storage: ClickHouse (columnar analytics)
  • Session Cache: Redis (sub-millisecond lookups)
  • ML Pipeline: Percentile-based anomaly detection
  • Security: HMAC-SHA256 auth, TLS 1.3, nonce validation
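
To make the "percentile-based anomaly detection" entry concrete, the sketch below flags an observation that exceeds a chosen percentile of a baseline. It is an assumption-laden illustration rather than the project's pipeline: the function names and the nearest-rank percentile method are choices made for this example.

```rust
/// Return the value at percentile `p` (0.0..=100.0) using the nearest-rank method.
/// Returns None for an empty baseline or an out-of-range (or NaN) percentile.
fn percentile(baseline: &[f64], p: f64) -> Option<f64> {
    if baseline.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = baseline.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: ceil(p / 100 * n), clamped to [1, n], then made 0-based.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let index = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[index])
}

/// Flag an observation that exceeds the baseline's `p`-th percentile.
/// Returns None when no threshold can be computed.
fn is_anomalous(baseline: &[f64], observed: f64, p: f64) -> Option<bool> {
    percentile(baseline, p).map(|threshold| observed > threshold)
}
```

Because the threshold is recomputed from the baseline, it adapts as the baseline changes; a fixed cutoff would not. Returning `Option` instead of panicking on an empty baseline keeps the sketch in line with the no-panic rule stated elsewhere in this README.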

📊 Performance Targets

| Metric | Target | Status |
| --- | --- | --- |
| Ingestion throughput | 100k sessions/sec | 🎯 Designed |
| Query latency (p99) | < 100ms | 🎯 Designed |
| Fingerprint generation | < 5ms | 🎯 Designed |
| Redis lookup | < 1ms | 🎯 Designed |
| Storage compression | 10-20:1 ratio | 🎯 Designed |

🛡️ Security & Privacy

Security First

  • ✅ HMAC-SHA256 API authentication
  • ✅ Anti-replay protection (nonce validation)
  • ✅ Bounded collections (DoS prevention)
  • ✅ Rate limiting per IP and session
  • ✅ Security headers (HSTS, CSP, X-Frame-Options)
  • ✅ Graceful degradation (circuit breakers)
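
Two of the items above, nonce validation and bounded collections, combine naturally. The following is a minimal sketch under stated assumptions: `NonceCache`, `NonceError`, and the oldest-first eviction policy are hypothetical names and choices for illustration, not the project's API.

```rust
use std::collections::{HashSet, VecDeque};

/// Reason a nonce was rejected.
#[derive(Debug, PartialEq)]
enum NonceError {
    Replayed,
}

/// Remembers recently seen nonces, evicting the oldest once `capacity` is reached,
/// so memory use stays bounded no matter how many requests arrive.
struct NonceCache {
    capacity: usize,
    seen: HashSet<String>,
    order: VecDeque<String>,
}

impl NonceCache {
    /// Create a cache holding at most `capacity` nonces (minimum of one).
    fn new(capacity: usize) -> Self {
        NonceCache {
            capacity: capacity.max(1),
            seen: HashSet::new(),
            order: VecDeque::new(),
        }
    }

    /// Accept a nonce the first time it is seen; reject it as a replay afterwards.
    fn check_and_insert(&mut self, nonce: &str) -> Result<(), NonceError> {
        if self.seen.contains(nonce) {
            return Err(NonceError::Replayed);
        }
        // Evict the oldest entry before inserting so the cache never exceeds capacity.
        if self.order.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.seen.insert(nonce.to_owned());
        self.order.push_back(nonce.to_owned());
        Ok(())
    }
}
```

One trade-off of this design is that an evicted nonce would be accepted again. A real deployment would likely pair the cache with a timestamp window so that anything old enough to have been evicted is rejected on age alone.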

Privacy by Default

  • ✅ IP hashing (salted SHA-256)
  • ✅ No PII collection
  • ✅ GDPR Article 6(1)(a) compliance
  • ✅ Explicit consent for EU visitors
  • ✅ Data Processing Agreement templates
  • ✅ Right to erasure (delete by fingerprint)
  • ✅ 90-day TTL with automatic cleanup
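
The 90-day TTL rule can be stated as a small predicate. This is only a sketch of the rule itself: in practice the storage layer may enforce expiry natively, and the names `RETENTION` and `is_expired` are assumptions for the example.

```rust
use std::time::{Duration, SystemTime};

/// Retention window: 90 days, expressed in seconds.
const RETENTION: Duration = Duration::from_secs(90 * 24 * 60 * 60);

/// A record is expired once its age reaches the retention window.
fn is_expired(created_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(created_at) {
        Ok(age) => age >= RETENTION,
        // `created_at` lies in the future (clock skew); treat as not expired.
        Err(_) => false,
    }
}
```

Passing `now` as a parameter rather than calling `SystemTime::now()` inside the function keeps the check deterministic and easy to test.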

📚 Documentation

This repository contains comprehensive RFC documentation (v0.2.0):

Additional Resources:


🎨 Design Philosophy: TigerStyle

Scrybe follows TigerStyle principles:

  1. Safety First - No panics, all errors via Result
  2. Simplicity - Clear over clever, explicit over implicit
  3. Correctness - Type-driven design, >90% test coverage
  4. Performance - Fast by default, profile before optimizing
  5. Minimal Dependencies - Each dependency justified
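
A short sketch of how principles 1 and 3 might look together: a validated newtype whose constructor returns `Result` instead of panicking. `SessionId`, its 64-character limit, and the allowed character set are hypothetical and chosen only for illustration.

```rust
use std::fmt;

/// Reasons a raw string cannot become a `SessionId`.
#[derive(Debug, PartialEq)]
enum SessionIdError {
    Empty,
    TooLong { len: usize, max: usize },
    InvalidChar(char),
}

impl fmt::Display for SessionIdError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SessionIdError::Empty => write!(f, "session id is empty"),
            SessionIdError::TooLong { len, max } => {
                write!(f, "session id has {} bytes, maximum is {}", len, max)
            }
            SessionIdError::InvalidChar(c) => write!(f, "session id contains {:?}", c),
        }
    }
}

impl std::error::Error for SessionIdError {}

/// A session identifier that can only be constructed through validation,
/// so any `SessionId` value in the program is known to be well-formed.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SessionId(String);

impl SessionId {
    const MAX_LEN: usize = 64;

    /// Validate `raw` and wrap it; every failure is reported as an error value.
    fn parse(raw: &str) -> Result<Self, SessionIdError> {
        if raw.is_empty() {
            return Err(SessionIdError::Empty);
        }
        if raw.len() > Self::MAX_LEN {
            return Err(SessionIdError::TooLong { len: raw.len(), max: Self::MAX_LEN });
        }
        if let Some(bad) = raw.chars().find(|c| !(c.is_ascii_alphanumeric() || *c == '-')) {
            return Err(SessionIdError::InvalidChar(bad));
        }
        Ok(SessionId(raw.to_owned()))
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

Functions that accept a `SessionId` rather than a `&str` never need to re-validate their input, which is the practical payoff of type-driven design.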

💰 Cost Model

At 10,000 requests/second sustained:

| Component | Monthly Cost | Optimization Potential |
| --- | --- | --- |
| ClickHouse (90-day retention) | $3,200 | 66% with 30-day retention |
| Redis (1-hour session cache) | $1,200 | Optimized |
| Data Transfer | $270 | 90% with 10% sampling |
| Backups (S3) | $700 | - |
| Total | $7,264/month | $2,200/month (optimized) |

🚀 Current Status

Version: v0.2.0 (RFC Phase)
Status: 🎯 Design Complete - Ready for Implementation

Completed

  • ✅ Complete RFC suite (7 documents)
  • ✅ Multi-disciplinary review (10 expert perspectives)
  • ✅ All critical blockers addressed
  • ✅ Security hardening (authentication, replay protection)
  • ✅ GDPR compliance (consent, DPA templates)
  • ✅ Production readiness (health checks, disaster recovery)

Next Steps

  • 🔨 Phase 1: Core infrastructure (Weeks 1-2)
  • 🔐 Phase 2: Security features (Weeks 3-4)
  • 🧪 Phase 3: SDK & enrichment (Weeks 5-6)
  • 💾 Phase 4: Storage & reliability (Weeks 7-8)
  • ✅ Phase 5: Testing & hardening (Weeks 9-10)

Timeline: 10 weeks to production-ready system


🤝 Contributing

This is a private repository. Contributions are welcome from authorized collaborators.

Development Principles

  • Follow TigerStyle guidelines
  • Maintain >90% test coverage
  • Document all public APIs
  • No unwrap() or panic!() in production code
  • Explicit error handling with context
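
The last two rules could look like the following in practice. This is a hedged sketch: `ConfigError` and `parse_port` are hypothetical names, and the example only shows one way to attach context to a failure instead of calling `unwrap()`.

```rust
use std::fmt;
use std::num::ParseIntError;

/// An error that records which field failed and what value was supplied,
/// in addition to the underlying parse failure.
#[derive(Debug)]
struct ConfigError {
    field: &'static str,
    value: String,
    source: ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "invalid value {:?} for field `{}`: {}",
            self.value, self.field, self.source
        )
    }
}

impl std::error::Error for ConfigError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&self.source)
    }
}

/// Parse a port number without `unwrap()`: failures become a `ConfigError`
/// carrying enough context to diagnose the problem from a log line.
fn parse_port(raw: &str) -> Result<u16, ConfigError> {
    raw.trim().parse::<u16>().map_err(|source| ConfigError {
        field: "port",
        value: raw.to_owned(),
        source,
    })
}
```

A caller can propagate the error with `?` and still end up with a message naming the offending field and value, which a bare `ParseIntError` would not provide.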

📜 License

Private & Proprietary


🦉 Philosophy

"The best defense is not to be invisible, but to be understood."

Scrybe doesn't just detect bots; it studies them. Every fingerprint, every behavioral anomaly, every timing quirk becomes part of a living knowledge base. The system learns, adapts, and evolves.

As its name suggests, Scrybe is both scribe (recorder of truth) and scryer (diviner of hidden meaning). It sees not just what browsers do, but what they are.


Built with Rust 🦀 | Powered by Curiosity 🦉 | Guided by TigerStyle 🐯
