Pull requests: EleutherAI/lm-evaluation-harness
Fix descriptions in the Moral Stories and Histoires Morales tasks.
#3374 opened Oct 28, 2025 by upunaprosk

Fix: Prevent infinite loop when max_seq_lengths < 4096 in prepare_niah.py
#3372 opened Oct 28, 2025 by vnayakde

Add support for configurable chrF metric parameters in task YAML, fix…
#3363 opened Oct 23, 2025 by augustlakia

[AIME24 | AIME25] Enable Multiple Generation Repeats with Pass@k and Majority@k Metrics
#3351 opened Oct 17, 2025 by ihebchaa

Delegate BOS to the tokenizer; add_bos_token defaults to None
#3347 opened Oct 15, 2025 by baberabb

Fix PIL image hashing to use actual bytes instead of object repr
#3331 opened Oct 7, 2025 by tboerstad

feat: Add support for accelerate-wrapped models in simple_evaluate()
#3313 opened Sep 26, 2025 by DhruvaKashyap

Support empty response for Completions and ChatCompletions API
#3309 opened Sep 22, 2025 by tboerstad

Adding New Task SLR-Bench: Scalable Logical Reasoning Benchmark
#3305 opened Sep 20, 2025 by Ahmad21Omar

Add long-context evaluation benchmarks (LongBench v2, Babilong, InfiniteBench, Phonebook)
#3256 opened Aug 21, 2025 by Mariani-code