An MNIST-like fashion product database. Benchmark 👇
OpenMMLab Pose Estimation Toolbox and Benchmark.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Benchmarks of approximate nearest neighbor libraries in Python
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
A series of large language models developed by Baichuan Intelligent Technology
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Python package for the evaluation of odometry and SLAM
A 13B large language model developed by Baichuan Intelligent Technology
SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues?
A unified evaluation framework for large language models
MTEB: Massive Text Embedding Benchmark
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
A machine learning toolkit for log parsing [ICSE'19, DSN'16]
Reference implementations of MLPerf™ training benchmarks
Efficient Retrieval Augmentation and Generation Framework
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024