@cristiand391 commented Oct 2, 2025

What does this PR do?

Adds a tool prediction scorer based on Sentry's, with the following changes:

  • updated the system prompt to avoid over-indexing on eval expectations (this was causing too many false positives with a score of 1)
  • enforces that the expected tools are available, so that availability influences the final score (this will be blocked in code later)
  • the scorer now sends the full tool metadata (Sentry's sends only the tool name and the first line of its description), so scoring can take the available parameters and examples into account
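A minimal TypeScript sketch of the last two changes: how a "name + first description line" summary differs from rendering the full tool metadata for the judge prompt, and how expected tools that are missing from the available set could be detected. The `ToolMetadata` shape, the function names and the `list_items` tool are hypothetical illustrations, not the PR's actual code or Sentry's API.

```typescript
// Hypothetical tool metadata shape; the real scorer's types may differ.
interface ToolParameter {
  name: string;
  type: string;
  description?: string;
  required?: boolean;
}

interface ToolMetadata {
  name: string;
  description: string;
  parameters?: ToolParameter[];
  examples?: string[];
}

// Sentry-style summary: tool name plus only the first line of the description.
function summarizeTool(tool: ToolMetadata): string {
  const firstLine = (tool.description.split("\n")[0] ?? "").trim();
  return `${tool.name}: ${firstLine}`;
}

// Full-metadata rendering: whole description, parameters and examples,
// so the judge can reason about arguments, not just the tool name.
function renderToolForJudge(tool: ToolMetadata): string {
  const lines: string[] = [`### ${tool.name}`, tool.description.trim()];
  if (tool.parameters && tool.parameters.length > 0) {
    lines.push("Parameters:");
    for (const p of tool.parameters) {
      const req = p.required ? "required" : "optional";
      const desc = p.description ? ` - ${p.description}` : "";
      lines.push(`- ${p.name} (${p.type}, ${req})${desc}`);
    }
  }
  if (tool.examples && tool.examples.length > 0) {
    lines.push("Examples:");
    for (const ex of tool.examples) {
      lines.push(`- ${ex}`);
    }
  }
  return lines.join("\n");
}

// Returns the expected tool names that are not in the available tool list.
function missingExpectedTools(expected: string[], available: ToolMetadata[]): string[] {
  const names = new Set(available.map((t) => t.name));
  return expected.filter((name) => !names.has(name));
}

// Usage with a hypothetical tool definition.
const tools: ToolMetadata[] = [
  {
    name: "list_items",
    description: "List items in a project.\nReturns at most limit results.",
    parameters: [
      { name: "project", type: "string", description: "Project ID", required: true },
      { name: "limit", type: "number", description: "Maximum number of items" },
    ],
    examples: ["List the first 10 items in project abc"],
  },
];

console.log(summarizeTool(tools[0]));
console.log(renderToolForJudge(tools[0]));
console.log(missingExpectedTools(["list_items", "delete_item"], tools));
```

The summary drops everything after the first description line, which is why a judge given only that form cannot tell whether the predicted arguments make sense; the full rendering keeps the parameter list and examples in the prompt.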

Example of a test failure:
[Screenshot: test failure output, 2025-10-02 17:25:46]

What issues does this PR fix or reference?

cristiand391 changed the title from "test: add tool predicion scorer for light evals" to "test: add tool prediction scorer for light evals" on Oct 2, 2025