Explore Projects

Discover 230 open source projects

Active filters (1):

Search: benchmarking

Showing 61-80 of 230 projects

microsoft/PhiCookBook

An open-source cookbook for getting started with Phi, a family of high-performance small language models from Microsoft.

3.7K

Active

Jupyter Notebook

LLM Frameworks

Books & Guides

#language-model #phi-models #small-language-model

devMEremenko/XcodeBenchmark

XcodeBenchmark measures the compilation time of a large codebase on different Mac devices to help developers optimize their build times.

3.6K

Stable

Swift

CLI Tools

Build Tools

Swift

#benchmark #xcode #swift

sergiomarotco/Network-segmentation-cheat-sheet

This repository provides best practices for segmenting corporate networks for improved security.

3.4K

Experimental

Network Security

#firewall-segmentation #network-isolation #network-security

open-mmlab/mmyolo

An open-source toolbox for implementing state-of-the-art object detection models like YOLOv5, YOLOv6, YOLOv7, and RTMDet.

3.4K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#object-detection #yolo #rtmdet

minitest/minitest

minitest provides a complete suite of testing facilities supporting TDD, BDD, and benchmarking for Ruby developers.

3.4K

Active

Ruby

Testing

Ruby

#testing #ruby #tdd

google/BIG-bench

A collaborative benchmark for measuring and extrapolating the capabilities of language models.

3.2K

Archived

Python

LLM Frameworks

#language-models #benchmarking #evaluation

THUDM/AgentBench

A comprehensive benchmark to evaluate large language models (LLMs) as agents for various tasks.

3.2K

Stable

Python

LLM Frameworks

Agents & Orchestration

Python

#chatgpt #gpt-4 #llm

embeddings-benchmark/mteb

MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.

3.2K

Active

Python

LLM Wrappers & SDKs

Python

#benchmark #text-embedding #multilingual-nlp

Tencent/AI-Infra-Guard

A comprehensive AI Red Teaming platform for security researchers and developers.

3.0K

Active

Python

LLM Frameworks

Security Research

#ai-security #red-teaming #security-tools

phoronix-test-suite/phoronix-test-suite

Open-source, cross-platform automated testing and benchmarking software for developers.

3.0K

Active

PHP

Testing

API Frameworks

#benchmark #benchmarking #linux

FreedomIntelligence/LLMZoo

Provides data, models, and an evaluation benchmark for large language models.

3.0K

Archived

Python

React

#LLM #Large Language Models #Benchmarking

baichuan-inc/Baichuan-13B

A large language model developed by Baichuan Intelligent Technology for AI-powered applications and research.

2.9K

Archived

Python

LLM Frameworks

LLM Wrappers & SDKs

Hugging Face

#large-language-model #chinese #gpt-4

PlummersSoftwareLLC/Primes

A collection of prime number projects in over 100 programming languages to compare their speed and cleverness.

2.9K

Stable

Benchmark

Tutorials & Courses

#benchmark #primes #programming-languages

kostya/benchmarks

A set of language benchmarks to compare performance across different programming languages.

2.9K

Active

Makefile

CLI Tools

API Frameworks

#benchmarks #performance #languages

EZLippi/WebBench

A simple and lightweight web benchmarking tool for Linux, useful for testing website performance under load.

2.8K

Archived

Backend & APIs

CLI Tools

#performance-testing #web-benchmarking #linux-tools

microsoft/promptbench

A unified evaluation framework for large language models, focused on prompt engineering and model robustness.

2.8K

Active

Python

LLM Frameworks

Testing

Python

#large-language-models #prompt-engineering #evaluation

FranxYao/chain-of-thought-hub

This repository provides a benchmark for evaluating the complex reasoning ability of large language models using chain-of-thought prompting.

2.8K

Archived

Jupyter Notebook

LLM Frameworks

Tutorials & Courses

Jupyter Notebook

#llm #benchmarking #reasoning

soumith/convnet-benchmarks

Easy benchmarking of publicly available convolutional neural network implementations.

2.7K

Archived

Python

Computer Vision

PyTorch

#benchmarking #neural-networks #computer-vision

xlang-ai/OSWorld

A benchmark for multimodal AI agents to tackle open-ended tasks in real computer environments.

2.6K

Active

Python

Agents & Orchestration

Benchmark

Python

#multimodal-ai #agent-benchmarking #open-ended-tasks

haosulab/ManiSkill

An open-source GPU-accelerated robotics simulator and benchmark for manipulation skill learning.

2.6K

Active

Python

Robotics

Computer Vision

Python

#robotics-simulation #computer-vision #reinforcement-learning
