EleutherAI / lm-evaluation-harness Public

Pull requests: EleutherAI/lm-evaluation-harness

165 Open 1,660 Closed

Pull requests list

Math 500

#3381 opened Nov 1, 2025 by seldereyy

[fix] crows_pairs dataset

#3378 opened Oct 31, 2025 by jannalulu

[feat] add graphwalks

#3377 opened Oct 31, 2025 by jannalulu

Fix descriptions in the Moral Stories and Histoires Morales tasks.

#3374 opened Oct 28, 2025 by upunaprosk

Fix: Prevent infinite loop when max_seq_lengths < 4096 in prepare_niah.py

#3372 opened Oct 28, 2025 by vnayakde

Add support for configurable chrF metric parameters in task YAML, fix…

#3363 opened Oct 23, 2025 by augustlakia

fix trust_remote_code=True for longbench

#3361 opened Oct 22, 2025 by jannalulu

Longbench group fix

#3359 opened Oct 22, 2025 by jannalulu

Fix issue 3355 assertion error

#3356 opened Oct 20, 2025 by marksverdhei

Add gsm_symbolic and gsm_symbolic_cot tasks

#3354 opened Oct 19, 2025 by MengAiDev

[AIME24 | AIME25] Enable Multiple Generation Repeats with Pass@k and Majority@k Metrics

#3351 opened Oct 17, 2025 by ihebchaa

fix(tasks):pin correct MMLUSR version

#3350 opened Oct 16, 2025 by christinaexyou

added azure openai support

#3349 opened Oct 16, 2025 by zinccat

Delegate BOS to the tokenizer; add_bos_token defaults to None

#3347 opened Oct 15, 2025 by baberabb

Added ULQA benchmark

#3340 opened Oct 13, 2025 by keramjan

Fix PIL image hashing to use actual bytes instead of object repr

#3331 opened Oct 7, 2025 by tboerstad

feat: Add support for accelerate-wrapped models in simple_evaluate()

#3313 opened Sep 26, 2025 by DhruvaKashyap

Add MATH500

#3311 opened Sep 26, 2025 by jannalulu

Support empty response for Completions and ChatCompletions API

#3309 opened Sep 22, 2025 by tboerstad

Adding New Task SLR-Bench : Scalable Logical Reasoning Benchmark

#3305 opened Sep 20, 2025 by Ahmad21Omar

Support torchrun vllm DP

#3304 opened Sep 19, 2025 by luccafong

Gemini evaluation support

#3300 opened Sep 15, 2025 by IsraelAbebe

Fix lambada_multilingual_stablelm

#3294 opened Sep 11, 2025 by jmichaelov

Adding SPaRC to lm eval harness

#3262 opened Aug 25, 2025 by lkaesberg

Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)

#3256 opened Aug 21, 2025 by Mariani-code
