# Grade the Grader runbook
This runbook describes a utility that reads the grade configuration file (grade_config.yaml), loads the referenced data files from the input folder, and then grades a model's responses to grading prompts, each of which produces a grade from a given and an expected value. Grades are written to the output folder as yyyy-mm-ddThh:mm:ss-grades.json.
When you run the utility you will specify folders to mount for /config, /input, and /output:
```
/📁 config
├── 📝 grade_config.yaml                   # Grade configuration
/📁 input
├── 📁 grader                              # Simple LLM message list with grading prompts and keys
│   ├── ✏️ grader1.csv                     # Grade prompts
│   └── ✏️ grader_key.csv                  # Grading keys (given, expected, min, max)
/📁 output
└── 📀 yyyy-mm-ddThh:mm:ss-grades.json     # Grades from running the evaluation
```
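For reference, each row of the grading key pairs a given value with an expected value and min/max values (presumably the range the grade should fall within). A minimal sketch of grader_key.csv with hypothetical rows; the exact semantics depend on your grading prompts:

```csv
given,expected,min,max
"Hello, world!","Hello, world!",0.9,1.0
"Goodbye","Hello, world!",0.0,0.2
```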
Adjust the command below to use appropriate values for your Echo Bot project, and add it to your pipenv scripts; a sketch of a Pipfile [scripts] entry follows the command. See the [test_data] folder for sample files. Grades will be written to a file called {datetime}-grades.json in the output folder when you run the tool.
```bash
docker run --rm \
  -v ./my_model/input:/input \
  -v ./my_model/config:/config \
  -v ./my_model/output:/output \
  ghcr.io/agile-learning-institute/stage0-echo-grade:latest
```
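To register the command as a pipenv script, it can go in the [scripts] section of your project's Pipfile. A minimal sketch, assuming a script name of `grade` and the mount paths shown above:

```toml
[scripts]
# Hypothetical script name; the docker command is collapsed onto a single line.
grade = "docker run --rm -v ./my_model/input:/input -v ./my_model/config:/config -v ./my_model/output:/output ghcr.io/agile-learning-institute/stage0-echo-grade:latest"
```

With that entry in place, running `pipenv run grade` from the project root invokes the container.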
Ensure the following tools are installed:
- Docker
- pipenv
All testing uses config/input/output folders in ./test_data.
- `pipenv install`
- `pipenv run grade`
- `pipenv run debug` - runs locally with the logging level set to DEBUG
- `pipenv run model` - see `Gary.modelfile`: from `llama3.2:latest`, with the temperature turned all the way down to 0
- `pipenv run build`
- `pipenv run container`
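One possible local testing pass, assuming the local model described by `Gary.modelfile` needs to be created before grading:

```bash
pipenv install      # install dependencies
pipenv run model    # create the local test model from Gary.modelfile
pipenv run debug    # run the grader locally with logging at DEBUG
```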