An open platform for training, serving, and evaluating LLM chatbots with Vicuna and Chatbot Arena.
A collection of PyTorch image encoders/backbones with training, evaluation, and inference scripts.
Reverse interview questions for job applicants to evaluate companies.
MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.
An LLM engineering platform for observability, evaluation, and prompt management.
An open-source Python toolkit for building, evaluating, and deploying sophisticated AI agents.
A comprehensive library for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows.
A framework for evaluating large language models (LLMs) and an open-source registry of benchmarks.
A high-performance and SEO-friendly lazy loader for images, iframes, and more that detects visibility changes without configuration.
A Python SDK for agent AI observability, monitoring, and evaluation with features like tracing, debugging, and analytics.
An extensive math library for JavaScript and Node.js, providing a wide range of mathematical functions and capabilities.
A Python framework for evaluating and benchmarking large language models (LLMs) and their capabilities.
A Go-based framework for deep document understanding, semantic retrieval, and context-aware question answering using the RAG paradigm.
Open-source infrastructure for AI agents that can control full desktops (macOS, Linux, Windows).
A Python library for evaluating large language model applications.
Gorilla is a Python tool for training and evaluating large language models (LLMs) on API and function calling.
A framework for few-shot evaluation of language models.
An open-source LLM DevOps platform for building enterprise AI applications, with features including GenAI workflows, RAG, agents, and model management.
Open-source stack for industrial-grade LLM applications, including LLM gateway, observability, optimization, evaluation, and experimentation.
A semantic segmentation benchmark and evaluation framework.