Stanett-77: Local LLM evaluation results #49
evaluation_results/version_2/open_models/configs/dev_open_models+retrieval.yaml
This comment was marked as resolved.
@Aleksis99 The file Qwen3-Coder-30B-A3B-Instruct_local/chat_responses_dev.jsonl contains only lines such as
Also, I think the results are not directly comparable, because some experiments contain error samples.
There is also a discrepancy between the model names in the evaluation results table https://github.com/statnett/Talk2PowerSystem/wiki/Evaluation-Results#open-source-llms and the folder names, i.e. we have a folder named
Table name -> Folder name
@nelly-hateva, so we need to either rerun the tests or at least recompute the values including error samples as failures?
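For reference, a minimal sketch of what "recompute the values including error samples as failures" could look like. The record layout here is an assumption for illustration only: the `status` and `score` field names are hypothetical and are not taken from the evaluation library or from the files in this PR.

```python
from typing import Any, Iterable, Mapping


def mean_score(
    samples: Iterable[Mapping[str, Any]],
    count_errors_as_failures: bool = True,
) -> float:
    """Average per-sample score.

    Assumes (hypothetically) that each sample has a "status" field and,
    when it did not error, a numeric "score" field.
    """
    total = 0.0
    counted = 0
    for sample in samples:
        if sample.get("status") == "error":
            if count_errors_as_failures:
                # The error sample stays in the denominator with a score of 0.
                counted += 1
            # Otherwise the error sample is skipped entirely.
            continue
        total += float(sample["score"])
        counted += 1
    return total / counted if counted else 0.0


# Hypothetical records, only to show how the two averages diverge.
example_samples = [
    {"status": "success", "score": 1.0},
    {"status": "success", "score": 0.5},
    {"status": "error"},
    {"status": "success", "score": 0.0},
]

with_errors = mean_score(example_samples, count_errors_as_failures=True)
without_errors = mean_score(example_samples, count_errors_as_failures=False)
```

With these made-up records the average is 0.375 when the error sample counts as a failure and 0.5 when it is dropped, which is why runs with and without error samples are not directly comparable.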
Given that we keep
@Aleksis99 Please update the results using the new version of the library, and update the results table here: https://github.com/statnett/Talk2PowerSystem/wiki/Evaluation-Results#open-source-llms