Explore Projects

Discover 230 open source projects

Active filters (1):
Search: benchmarking

Showing 61-80 of 230 projects

microsoft/PhiCookBook

An open-source cookbook for getting started with Phi, a family of high-performance small language models from Microsoft.

3.7K
Active
Jupyter Notebook
LLM Frameworks
Books & Guides
#language-model #phi-models #small-language-model

devMEremenko/XcodeBenchmark

XcodeBenchmark measures the compilation time of a large codebase on different Mac devices to help developers optimize their build times.

3.6K
Stable
Swift
CLI Tools
Build Tools
Swift
#benchmark #xcode #swift

sergiomarotco/Network-segmentation-cheat-sheet

This repository provides best practices for segmenting corporate networks for improved security.

3.4K
Experimental
Network Security
#firewall-segmentation #network-isolation #network-security

open-mmlab/mmyolo

An open-source toolbox for implementing state-of-the-art object detection models like YOLOv5, YOLOv6, YOLOv7, and RTMDet.

3.4K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#object-detection #yolo #rtmdet

minitest/minitest

minitest provides a complete suite of testing facilities supporting TDD, BDD, and benchmarking for Ruby developers.

3.4K
Active
Ruby
Testing
Ruby
#testing #ruby #tdd

google/BIG-bench

A collaborative benchmark for measuring and extrapolating the capabilities of language models.

3.2K
Archived
Python
LLM Frameworks
#language-models #benchmarking #evaluation

THUDM/AgentBench

A comprehensive benchmark to evaluate large language models (LLMs) as agents for various tasks.

3.2K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#chatgpt #gpt-4 #llm

embeddings-benchmark/mteb

MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.

3.2K
Active
Python
LLM Wrappers & SDKs
Search
Python
#benchmark #text-embedding #multilingual-nlp

Tencent/AI-Infra-Guard

A comprehensive AI Red Teaming platform for security researchers and developers.

3.0K
Active
Python
LLM Frameworks
Security Research
#ai-security #red-teaming #security-tools

phoronix-test-suite/phoronix-test-suite

Open-source, cross-platform automated testing and benchmarking software for developers.

3.0K
Active
PHP
Testing
API Frameworks
#benchmark #benchmarking #linux

FreedomIntelligence/LLMZoo

Provides data, models, and an evaluation benchmark for large language models.

3.0K
Archived
Python
React
#LLM #Large Language Models #Benchmarking

baichuan-inc/Baichuan-13B

A large language model developed by Baichuan Intelligent Technology for AI-powered applications and research.

2.9K
Archived
Python
LLM Frameworks
LLM Wrappers & SDKs
Hugging Face
#large-language-model #chinese #gpt-4

PlummersSoftwareLLC/Primes

A collection of prime number projects in over 100 programming languages to compare their speed and cleverness.

2.9K
Stable
C
Benchmark
Tutorials & Courses
#benchmark #primes #programming-languages

kostya/benchmarks

A set of language benchmarks to compare performance across different programming languages.

2.9K
Active
Makefile
CLI Tools
API Frameworks
#benchmarks #performance #languages

EZLippi/WebBench

A simple and lightweight web benchmarking tool for Linux, useful for testing website performance under load.

2.8K
Archived
C
Backend & APIs
CLI Tools
#performance-testing #web-benchmarking #linux-tools

microsoft/promptbench

A unified evaluation framework for large language models, focused on prompt engineering and model robustness.

2.8K
Active
Python
LLM Frameworks
Testing
Python
#large-language-models #prompt-engineering #evaluation

FranxYao/chain-of-thought-hub

This repository provides a benchmark for evaluating the complex reasoning ability of large language models using chain-of-thought prompting.

2.8K
Archived
Jupyter Notebook
LLM Frameworks
Tutorials & Courses
Jupyter Notebook
#llm #benchmarking #reasoning

soumith/convnet-benchmarks

Easy benchmarking of publicly available convolutional neural network implementations.

2.7K
Archived
Python
Computer Vision
PyTorch
#benchmarking #neural-networks #computer-vision

xlang-ai/OSWorld

A benchmark for multimodal AI agents to tackle open-ended tasks in real computer environments.

2.6K
Active
Python
Agents & Orchestration
Benchmark
Python
#multimodal-ai #agent-benchmarking #open-ended-tasks

haosulab/ManiSkill

An open-source GPU-accelerated robotics simulator and benchmark for manipulation skill learning.

2.6K
Active
Python
Robotics
Computer Vision
Python
#robotics-simulation #computer-vision #reinforcement-learning
