
feat: Implement CursorReplayStrategy with Visual Feedback and Self-Correction #952


Open · wants to merge 1 commit into main

Conversation

TanCodeX

/claim #760

What kind of change does this PR introduce?

Feature: New cursor replay strategy with visual feedback and self-correction

Summary

This PR addresses #760 by introducing a new cursor replay strategy that improves targeting accuracy using visual feedback and AI-powered self-correction.
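As a rough illustration of the self-correction idea (not the PR's actual implementation), the loop can be thought of as: propose a point, paint the red dot there, ask a vision model how far off the dot is, and adjust. The `self_correct` function and the `evaluate` callback below are hypothetical; in the PR, the evaluator role is played by an OpenAI vision model:

```python
# Hypothetical sketch of the self-correction loop. `evaluate(point)`
# stands in for the OpenAI visual-feedback call: it returns a correction
# vector (dx, dy) toward the true target, or (0, 0) when the painted
# dot already covers it.
def self_correct(point, evaluate, max_iters=5):
    x, y = point
    for _ in range(max_iters):
        dx, dy = evaluate((x, y))
        if dx == 0 and dy == 0:
            break  # evaluator is satisfied with the current point
        x, y = x + dx, y + dy
    return (x, y)
```

Bounding the loop with `max_iters` keeps API cost predictable even when the evaluator never fully converges.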

Key Features:

  • Red dot visual feedback system for suggested target points
  • AI-powered accuracy analysis via OpenAI models
  • Self-correction mechanism based on visual feedback
  • Grid-based movement with recursive refinement for higher precision
  • Robust testing framework to measure accuracy, actions, and performance

This strategy sets the groundwork for improving OpenAdapt’s cursor control system in complex screen environments.
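To make the grid-based refinement concrete, here is a minimal, hypothetical sketch (function name and signature are illustrative, not the PR's API): the region is split into an N×N grid, the cell containing the target is selected, and the search recurses into that cell until it is small enough.

```python
# Hypothetical sketch of grid-based recursive refinement.
def refine_target(region, target, grid_size=4, min_size=8):
    """Narrow `region` (x, y, w, h) toward `target` (tx, ty) by repeated
    grid subdivision. Returns the final cell's center and the number of
    refinement steps taken."""
    x, y, w, h = region
    steps = 0
    while w > min_size or h > min_size:
        cw, ch = w / grid_size, h / grid_size
        tx, ty = target
        # Pick the grid cell whose bounds contain the target point.
        col = min(int((tx - x) / cw), grid_size - 1)
        row = min(int((ty - y) / ch), grid_size - 1)
        x, y, w, h = x + col * cw, y + row * ch, cw, ch
        steps += 1
    return (x + w / 2, y + h / 2), steps
```

Each step shrinks the search region by the grid factor, so the error bound decreases geometrically with the number of actions, matching the accuracy-vs-actions tradeoff visible in the evaluation results below.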

Checklist

  • My code follows OpenAdapt's style guidelines
      • Follows PEP 8
      • Uses consistent naming conventions
      • Maintains existing project structure
  • Self-reviewed my code
      • Verified edge cases
      • Validated parameter types
      • Checked error handling
  • Added tests
      • test_grid.py evaluates grid strategy
      • Metrics for accuracy, actions, and time
      • Test cases for various screen regions
  • Linted code
      • Used flake8 for Python linting
      • Fixed all issues
      • Removed unused imports
  • Commented the code
      • Explained AI logic
      • Documented grid algorithm
      • Clarified self-correction behavior
  • Updated documentation
      • Added docstrings for all methods/classes
      • Updated requirements.txt
      • Included usage examples in comments
  • All new and existing tests pass locally
      • Visual feedback tests
      • Grid strategy accuracy checks
      • OpenAI API integration tests

How can your code be run and tested?

  1. Install dependencies:
     pip install -r requirements.txt
  2. Run the grid evaluation:
     python -m experiments.cursor.test_grid

Example Output:

Grid Strategy Evaluation Results:
---------------------------------
Total test cases: 45
Average distance error: 5.2 pixels
Average actions per target: 4.3
Average time per target: 0.82 seconds

Results by grid size:
Grid size: 2x2
  Average error: 8.4 pixels
  Average actions: 3.0
  Average time: 0.65 seconds

Grid size: 4x4
  Average error: 4.2 pixels
  Average actions: 4.5
  Average time: 0.85 seconds

Grid size: 8x8
  Average error: 3.1 pixels
  Average actions: 5.5
  Average time: 0.96 seconds
  3. Test specific components:
from openadapt.strategies.cursor import CursorReplayStrategy
from experiments.cursor.grid import GridCursorStrategy

# `recording`, `screenshot`, and `window_event` come from an existing
# OpenAdapt capture session.

# Visual feedback: paint the suggested target point as a red dot
strategy = CursorReplayStrategy(recording)
img_with_dot = strategy.paint_dot(screenshot, x=100, y=100)

# Grid approach: refine toward the target on a 4x4 grid
grid_strategy = GridCursorStrategy(recording, grid_size=(4, 4))
action = grid_strategy.get_next_action_event(screenshot, window_event)
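For intuition, here is a dependency-light stand-in for what a dot-painting helper might do; the PR's `paint_dot` presumably draws with OpenCV, while this numpy-only version is purely illustrative:

```python
import numpy as np

# Hypothetical numpy-only sketch of the red-dot feedback: paint a filled
# red circle of radius `radius` at (x, y) on an RGB image array, without
# mutating the input. Pixels outside the image bounds are simply ignored.
def paint_dot(image, x, y, radius=5, color=(255, 0, 0)):
    img = image.copy()
    h, w = img.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    # Boolean mask of pixels inside the circle around (x, y).
    mask = (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    img[mask] = color
    return img
```

Returning a copy (rather than drawing in place) keeps the original screenshot available for the next evaluation round.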

Dependencies:

  • opencv-python for visual processing
  • numpy for grid calculations
  • openai for visual feedback evaluation
