Explore Projects

19 open source projects matching the search "llm-evaluation".

mlflow/mlflow

MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.

24.6K stars · Active · Python · ML Ops · Agent Coordination · LangChain
#mlflow #ai-models #experiment-tracking

langfuse/langfuse

LLM engineering platform for observability, evaluation, and prompt management.

22.7K stars · Active · TypeScript · LLM Frameworks · LLM Wrappers & SDKs · LangChain
#llm-observability #llm-evaluation #prompt-management

comet-ml/opik

A comprehensive library for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows.

18.0K stars · Active · Python · LLM Frameworks
#llm-evaluation #llm-observability #llm-monitoring

confident-ai/deepeval

A Python framework for evaluating and benchmarking large language models (LLMs) and their capabilities.

13.9K stars · Active · Python · LLM Frameworks
#llm-evaluation #benchmarking #python-framework

promptfoo/promptfoo

A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.

10.8K stars · Active · TypeScript · LLM Frameworks
#llm-evaluation #prompt-engineering #red-teaming
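promptfoo evaluations are driven by a declarative YAML config; a minimal sketch (the provider name, prompt, and assertion values are illustrative):

```yaml
# promptfooconfig.yaml
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: "MLflow is an open-source platform for the ML lifecycle."
    assert:
      - type: contains
        value: "MLflow"
```

Running `npx promptfoo@latest eval` executes each test case against each provider and reports pass/fail per assertion.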

Arize-ai/phoenix

AI observability and evaluation tooling for developers building with large language models and AI agents.

8.8K stars · Active · Jupyter Notebook · LLM Frameworks · Agents & Orchestration
#ai-monitoring #ai-observability #llm-evaluation

NVIDIA/garak

An LLM vulnerability scanner: a Python-based tool for probing large language models for security weaknesses.

7.1K stars · Active · Python · LLM Frameworks · Security Research
#llm-security #vulnerability-assessment #security-scanning

jeinlee1991/chinese-llm-benchmark

A comprehensive benchmark for evaluating the capabilities of Chinese large language models (LLMs).

5.6K stars · Active · LLM Frameworks · LLM Evaluation
#ai-benchmarking #chinese-llms #model-evaluation

Giskard-AI/giskard-oss

Open-source evaluation and testing library for LLM agents.

5.1K stars · Active · Python · LLM Frameworks · React
#evaluation #testing #LLM

PacktPublishing/LLM-Engineers-Handbook

A practical guide for LLM engineers, covering fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices.

4.8K stars · Stable · Python · LLM Frameworks · ML Ops · AWS
#llm #mlops #aws

Agenta-AI/agenta

An open-source LLMOps platform for prompt playground, prompt management, LLM evaluation, and LLM observability.

3.9K stars · Active · TypeScript · LLM Frameworks · LLM Wrappers & SDKs · React
#llm-platform #prompt-engineering #prompt-management

lmnr-ai/lmnr

Laminar is an open-source observability platform purpose-built for AI agents and workflows.

2.7K stars · Active · TypeScript · Agents & Orchestration · LLM Observability
#ai #observability #llm

genieincodebottle/generative-ai

Comprehensive resources for developers working with Generative AI, including projects, use cases, and interview prep.

1.9K stars · Active · Jupyter Notebook · LLM Frameworks · Tutorials & Courses
#generative-ai #llm #ai-coding

msoedov/agentic_security

An AI-powered security toolkit for LLM vulnerability scanning and red teaming.

1.8K stars · Active · Python · LLM Frameworks · Security Research
#llm-security #llm-vulnerability-scanner #llm-fuzzing

huggingface/aisheets

Build, enrich, and transform datasets using AI models, with no code required.

1.6K stars · Stable · TypeScript · LLM Frameworks · AI SDKs & Wrappers
#ai #llms #nocode

cyberark/FuzzyAI

A powerful tool for automated LLM fuzzing to help developers and security researchers identify and mitigate potential jailbreaks.

1.2K stars · Stable · Jupyter Notebook · LLM Frameworks · Security Research
#ai #fuzzing #jailbreak

microsoft/prompty

Prompty is a Python library that makes it easy to create, manage, debug, and evaluate LLM prompts for AI applications.

1.2K stars · Active · Python · LLM Frameworks · Prompt Engineering
#generative-ai #llm-evaluation #prompt-engineering

cvs-health/uqlm

A Python package for uncertainty quantification and hallucination detection in large language models (LLMs).

1.1K stars · Active · Python · LLM Frameworks · LLM Wrappers & SDKs
#ai-safety #confidence-estimation #hallucination-detection

JudgmentLabs/judgeval

An open-source post-building layer for AI agents, providing environment data and evaluations to power agent post-training and monitoring.

1.0K stars · Active · Python · Agents & Orchestration · LLM Frameworks
#agent #agentic-ai #llm-evaluation
