Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluation

Showing 41-60 of 222 projects

jeinlee1991/chinese-llm-benchmark

A comprehensive benchmark for evaluating the capabilities of Chinese large language models (LLMs)

5.6K
Active
LLM Frameworks
LLM Evaluation
#ai-benchmarking #chinese-llms #model-evaluation

OpenBMB/ToolBench

An open platform for training, serving, and evaluating large language models for tool learning.

5.5K
Experimental
Python
LLM Frameworks
Agents & Orchestration
#large-language-models #tool-learning #open-source

showlab/Awesome-Video-Diffusion

A curated collection of recent diffusion models for video generation, editing, and various other applications.

5.5K
Active
Computer Vision
AI Image & Video
#diffusion-models #video-generation #video-editing

coze-dev/coze-loop

CozeLoop is an open-source platform that provides full-lifecycle management for AI agent development, debugging, and monitoring.

5.3K
Active
Go
Agents & Orchestration
LLM Frameworks
#agent #agent-observability #agent-optimization

AgentOps-AI/agentops

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more.

5.3K
Stable
Python
LLM Frameworks
Agents & Orchestration
#ai #agents #cost-tracking

killme2008/aviatorscript

AviatorScript is a high-performance scripting language hosted on the JVM for developers who want a powerful expression evaluator.

5.2K
Stable
Java
API Frameworks
CLI Tools
#expression-evaluator #scripting-language #jvm

Giskard-AI/giskard-oss

An open-source evaluation and testing library for LLM agents.

5.1K
Active
Python
LLM Frameworks
React
#evaluation #testing #LLM

rafaelpadilla/Object-Detection-Metrics

A Python library that provides the most popular metrics used to evaluate object detection algorithms.

5.1K
Experimental
Python
Computer Vision
API Frameworks
#object-detection #metrics #computer-vision

PacktPublishing/LLM-Engineers-Handbook

A practical guide for LLM engineers, covering fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices.

4.8K
Stable
Python
LLM Frameworks
ML Ops
AWS
#llm #mlops #aws

pytorch/ignite

A high-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

4.7K
Active
Python
ML Ops
API Frameworks
PyTorch
#deep-learning #machine-learning #neural-network

Kiln-AI/Kiln

Build, Evaluate, and Optimize AI Systems

4.7K
Active
Python
AI Editors/Agents/Copilot
#AI #chain-of-thought #collaboration

lm-sys/RouteLLM

Framework for routing LLM requests to optimize costs while maintaining response quality

4.7K
Archived
Python
LLM Frameworks
AI Model Serving
#llm-router #cost-optimization #inference-framework

datawhalechina/tiny-universe

An open-source guide for building a 'Tiny-Universe' of large language models and AI tools.

4.6K
Stable
Jupyter Notebook
LLM Frameworks
Agents & Orchestration
#large-language-models #diffusion #evaluation-metrics

openai/simple-evals

Simple-evals is OpenAI's lightweight Python library for running its model evaluation benchmarks.

4.4K
Experimental
Python
LLM Frameworks
API Frameworks
#openai #llm #model-evaluation

facebook/duckling

A language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

4.3K
Stable
Haskell
API Frameworks
CLI Tools
#language-processing #text-parsing #composable-rules

CLUEbenchmark/CLUE

CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.

4.2K
Stable
Python
LLM Frameworks
Datasets
PyTorch
#chinese #nlp #bert

princeton-vl/RAFT

RAFT (Recurrent All-Pairs Field Transforms) is a PyTorch implementation of a deep network for optical flow estimation in computer vision.

4.0K
Stable
Python
Computer Vision
PyTorch
#computer-vision #optical-flow #pytorch

modelscope/ClearerVoice-Studio

An open-source toolkit for speech processing, supporting enhancement, separation, and target speaker extraction.

4.0K
Stable
Python
AI Voice & Speech
PyTorch
#speech-enhancement #speech-separation #speaker-extraction

Knetic/govaluate

Govaluate is a Go library that allows arbitrary expression evaluation, useful for building dynamic, configurable applications.

3.9K
Experimental
Go
API Clients & Testing
API Frameworks
#expression-evaluation #parsing #go-lang

latitude-dev/latitude-llm

Latitude is an open-source platform for building, evaluating, and refining prompts for large language models.

3.9K
Active
TypeScript
LLM Frameworks
Prompt Engineering
#prompt-engineering #llm #ai-coding-tools
