Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluator

Showing 1-20 of 222 projects

lm-sys/FastChat

Open platform for training, serving, and evaluating LLM chatbots with Vicuna and Chatbot Arena

39.4K
Experimental
Python
LLM Frameworks
Inference
Python
#llm #vicuna #chatbot

huggingface/pytorch-image-models

A collection of PyTorch image encoders/backbones with training, evaluation, and inference scripts.

36.4K
Active
Python
LLM Frameworks
Full-Stack Frameworks
Next.js
#PyTorch #Image Models #Deep Learning

viraptor/reverse-interview

Reverse interview questions for job applicants to evaluate companies

28.5K
Archived
Interview Prep
#interview-questions #job-hunting #career-development

mlflow/mlflow

MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.

24.6K
Active
Python
ML Ops
Agent Coordination
LangChain
#mlflow #ai-models #experiment-tracking

langfuse/langfuse

LLM engineering platform for observability, evaluation, and prompt management

22.7K
Active
TypeScript
LLM Frameworks
LLM Wrappers & SDKs
LangChain
#llm-observability #llm-evaluation #prompt-management

google/adk-python

An open-source Python toolkit for building, evaluating, and deploying sophisticated AI agents.

18.2K
Active
Python
Agents & Orchestration
#agents #agentic-ai #ai-agents

comet-ml/opik

A comprehensive library for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows.

18.0K
Active
Python
LLM Frameworks
Python
#llm-evaluation #llm-observability #llm-monitoring

openai/evals

A framework for evaluating large language models (LLMs) and an open-source registry of benchmarks.

17.9K
Stable
Python
LLM Frameworks
Python
#llm #evaluation #benchmarking

aFarkas/lazysizes

A high-performance and SEO-friendly lazy loader for images, iframes, and more that detects visibility changes without configuration.

17.7K
Archived
JavaScript
React
#lazyload #performance #responsive-images

raga-ai-hub/RagaAI-Catalyst

A Python SDK for agent AI observability, monitoring, and evaluation with features like tracing, debugging, and analytics.

16.1K
Active
Python
Agents & Orchestration
Python
#agentic-ai #ai-application-debugging #ai-evaluation-tools

josdejong/mathjs

An extensive math library for JavaScript and Node.js, providing a wide range of mathematical functions and capabilities.

15.0K
Active
JavaScript
Utilities & Libraries
Node.js
#math #javascript #bignumbers

confident-ai/deepeval

A Python framework for evaluating and benchmarking large language models (LLMs) and their capabilities.

13.9K
Active
Python
LLM Frameworks
Python
#llm-evaluation #benchmarking #python-framework

Tencent/WeKnora

A Go-based framework for deep document understanding, semantic retrieval, and context-aware question answering using the RAG paradigm.

13.3K
Active
Go
LLM Frameworks
#llm #question-answering #semantic-search

trycua/cua

Open-source infrastructure for AI agents that can control full desktops (macOS, Linux, Windows).

12.9K
Active
Python
Agents & Orchestration
Python
#agent #ai-agent #desktop-automation

vibrantlabsai/ragas

A Python library that helps supercharge the evaluation of large language model applications.

12.8K
Active
Python
LLM Frameworks
Python
#llm #evaluation #testing

ShishirPatil/gorilla

Gorilla is a Python tool for training and evaluating large language models (LLMs) for API/function calls.

12.7K
Active
Python
LLM Frameworks
#llm #api #chatgpt

EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of language models.

11.6K
Active
Python
LLM Frameworks
Python
#language-model #evaluation-framework #transformer

dataelement/bisheng

An open LLM DevOps platform for building next-gen enterprise AI applications, with features such as GenAI workflows, RAG, agents, and model management.

11.1K
Active
TypeScript
LLM Frameworks
React
#ai #llm #genai

tensorzero/tensorzero

Open-source stack for industrial-grade LLM applications, including LLM gateway, observability, optimization, evaluation, and experimentation.

11.0K
Active
Rust
LLM Frameworks
Rust
#ai #large-language-models #llms

mrgloom/awesome-semantic-segmentation

A curated list of semantic segmentation resources, including benchmarks and evaluation tools.

10.8K
Archived
React
#semantic-segmentation #deeplearning #evaluation
