Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluation

Showing 41-60 of 222 projects

jeinlee1991/chinese-llm-benchmark

A comprehensive benchmark for evaluating the capabilities of Chinese large language models (LLMs)

5.6K
Active
LLM Frameworks
LLM Evaluation
#ai-benchmarking #chinese-llms #model-evaluation

OpenBMB/ToolBench

An open platform for training, serving, and evaluating large language models for tool learning.

5.5K
Experimental
Python
LLM Frameworks
Agents & Orchestration
#large-language-models #tool-learning #open-source

showlab/Awesome-Video-Diffusion

A curated collection of recent diffusion models for video generation, editing, and various other applications.

5.5K
Active
Computer Vision
AI Image & Video
#diffusion-models #video-generation #video-editing

coze-dev/coze-loop

CozeLoop is an open-source platform that provides full-lifecycle management for AI agent development, debugging, and monitoring.

5.3K
Active
Go
Agents & Orchestration
LLM Frameworks
#agent #agent-observability #agent-optimization

AgentOps-AI/agentops

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more.

5.3K
Stable
Python
LLM Frameworks
Agents & Orchestration
#ai #agents #cost-tracking

killme2008/aviatorscript

AviatorScript is a high-performance scripting language hosted on the JVM for developers who want a powerful expression evaluator.

5.2K
Stable
Java
API Frameworks
CLI Tools
#expression-evaluator #scripting-language #jvm

Giskard-AI/giskard-oss

An open-source evaluation and testing library for LLM agents.

5.1K
Active
Python
LLM Frameworks
React
#evaluation #testing #LLM

rafaelpadilla/Object-Detection-Metrics

A Python library that provides the most popular metrics used to evaluate object detection algorithms.

5.1K
Experimental
Python
Computer Vision
API Frameworks
#object-detection #metrics #computer-vision

PacktPublishing/LLM-Engineers-Handbook

A practical guide for LLM engineers, covering fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices.

4.8K
Stable
Python
LLM Frameworks
ML Ops
AWS
#llm #mlops #aws

pytorch/ignite

A high-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

4.7K
Active
Python
ML Ops
API Frameworks
PyTorch
#deep-learning #machine-learning #neural-network

Kiln-AI/Kiln

Build, Evaluate, and Optimize AI Systems

4.7K
Active
Python
AI Editors/Agents/Copilot
#AI #chain-of-thought #collaboration

lm-sys/RouteLLM

Framework for routing LLM requests to optimize costs while maintaining response quality

4.7K
Archived
Python
LLM Frameworks
AI Model Serving
#llm-router #cost-optimization #inference-framework

datawhalechina/tiny-universe

An open-source guide for building a 'Tiny-Universe' of large language models and AI tools.

4.6K
Stable
Jupyter Notebook
LLM Frameworks
Agents & Orchestration
#large-language-models #diffusion #evaluation-metrics

openai/simple-evals

Simple-evals is OpenAI's lightweight Python library for running its model evaluation benchmarks.

4.4K
Experimental
Python
LLM Frameworks
API Frameworks
#openai #llm #model-evaluation

facebook/duckling

A language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

4.3K
Stable
Haskell
API Frameworks
CLI Tools
#language-processing #text-parsing #composable-rules

CLUEbenchmark/CLUE

CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.

4.2K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#chinese #nlp #bert

princeton-vl/RAFT

RAFT (Recurrent All-Pairs Field Transforms) is a PyTorch implementation of a deep network for optical flow estimation in computer vision.

4.0K
Stable
Python
Computer Vision
PyTorch
#computer-vision #optical-flow #pytorch

modelscope/ClearerVoice-Studio

An open-source toolkit for speech processing, supporting enhancement, separation, and target speaker extraction.

4.0K
Stable
Python
AI Voice & Speech
PyTorch
#speech-enhancement #speech-separation #speaker-extraction

Knetic/govaluate

Govaluate is a Go library that allows arbitrary expression evaluation, useful for building dynamic, configurable applications.

3.9K
Experimental
Go
API Clients & Testing
API Frameworks
#expression-evaluation #parsing #go-lang

latitude-dev/latitude-llm

Latitude is an open-source platform for building, evaluating, and refining prompts for large language models.

3.9K
Active
TypeScript
LLM Frameworks
Prompt Engineering
#prompt-engineering #llm #ai-coding-tools
