This repository provides a framework for running evaluations, including OpenAI's SimpleQA evaluation. This code was used to evaluate the APIs in this You.com blog post. If you would like to reproduce those numbers or add new samplers, follow the instructions below for installing and running the code (a sketch of a new sampler appears at the end of this section).
- Clone this repository:

  ```bash
  git clone https://github.com/youdotcom-oss/evals.git
  cd evals
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  pip install -e .
  ```

- Set your API keys, either as environment variables or in a `.env` file (a sample `.env` is sketched after this list):

  ```bash
  export OPENAI_API_KEY=your_openai_api_key
  export YOU_API_KEY=your_you_api_key
  export TAVILY_API_KEY=your_tavily_api_key
  export EXA_API_KEY=your_exa_api_key
  export SERP_API_KEY=your_serp_api_key
  ```
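If you prefer a `.env` file to shell exports, the same keys go one per line. This is a minimal sketch; placing the file at the repository root is an assumption, so verify where the runner expects it:

```bash
# .env — same keys as above, one KEY=value per line, no `export` keyword
OPENAI_API_KEY=your_openai_api_key
YOU_API_KEY=your_you_api_key
TAVILY_API_KEY=your_tavily_api_key
EXA_API_KEY=your_exa_api_key
SERP_API_KEY=your_serp_api_key
```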
To run a SimpleQA evaluation, run `simpleqa_runner.py` with your desired arguments.

View available arguments and samplers:

```bash
python src/simpleqa/simpleqa_runner.py --help
```

Run the SimpleQA evaluation on the entire problem set for all available samplers with default settings:

```bash
python src/simpleqa/simpleqa_runner.py
```

Run the SimpleQA evaluation on just You.com for 5 random problems:

```bash
python src/simpleqa/simpleqa_runner.py --samplers you --limit 5
```

Results files are placed in `simpleqa/results` after a successful run. Files following the pattern `raw_results_{sampler}.csv` contain the raw results for each individual sampler, and `simpleqa_results.csv` contains aggregated results with various metrics useful for analysis.
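The results CSVs can be inspected programmatically with pandas. A minimal sketch, assuming the default output location `simpleqa/results` and a run that included the `you` sampler; the exact columns depend on what the runner wrote, so this only prints the data rather than assuming a schema:

```python
import pandas as pd

# Aggregated metrics across all samplers.
summary = pd.read_csv("simpleqa/results/simpleqa_results.csv")
print(summary.to_string(index=False))

# Raw per-question results for one sampler, following the
# raw_results_{sampler}.csv naming pattern (here: sampler "you").
raw = pd.read_csv("simpleqa/results/raw_results_you.csv")
print(raw.head())
```

To add a new sampler, the usual pattern in frameworks built around OpenAI's SimpleQA evaluation is a callable that turns a list of chat messages into a completion string. The sketch below is hypothetical: the class name, method signature, endpoint, and response schema are all assumptions, so check the repository's sampler code for the actual interface before copying it:

```python
import os

import requests  # assumption: the new backend is a plain HTTP API


class MySearchSampler:
    """Hypothetical sampler wrapping a search-augmented answer API."""

    def __init__(self, api_key: str | None = None):
        # MY_API_KEY is a hypothetical variable, named like the keys above.
        self.api_key = api_key or os.environ["MY_API_KEY"]

    def __call__(self, message_list: list[dict]) -> str:
        # SimpleQA asks one question per sample; take the last user message.
        question = message_list[-1]["content"]
        resp = requests.post(
            "https://api.example.com/answer",  # hypothetical endpoint
            headers={"X-API-Key": self.api_key},
            json={"query": question},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["answer"]  # hypothetical response field
```

How the sampler is then made selectable via `--samplers` depends on how `simpleqa_runner.py` builds its sampler list; typically it is a matter of mapping a name to an instance wherever the existing samplers are registered.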