Explore Projects

19 open source projects matching the search "llm-evaluation".

mlflow/mlflow

MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.

24.6K stars · Active · Python · ML Ops · Agent Coordination · LangChain
#mlflow #ai-models #experiment-tracking

langfuse/langfuse

LLM engineering platform for observability, evaluation, and prompt management.

22.7K stars · Active · TypeScript · LLM Frameworks · LLM Wrappers & SDKs · LangChain
#llm-observability #llm-evaluation #prompt-management

comet-ml/opik

A comprehensive library for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows.

18.0K stars · Active · Python · LLM Frameworks
#llm-evaluation #llm-observability #llm-monitoring

confident-ai/deepeval

A Python framework for evaluating and benchmarking large language models (LLMs) and their capabilities.

13.9K stars · Active · Python · LLM Frameworks
#llm-evaluation #benchmarking #python-framework

promptfoo/promptfoo

A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.

10.8K stars · Active · TypeScript · LLM Frameworks
#llm-evaluation #prompt-engineering #red-teaming
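promptfoo evaluations are driven by a declarative YAML config; a minimal sketch (the provider name, prompt, and assertion values are illustrative):

```yaml
# promptfooconfig.yaml
prompts:
  - "Summarize in one sentence: {{text}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      text: "MLflow is an open-source platform for the ML lifecycle."
    assert:
      - type: contains
        value: "MLflow"
```

Running `npx promptfoo@latest eval` executes each test case against each provider and reports pass/fail per assertion.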

Arize-ai/phoenix

AI observability and evaluation tooling for developers building with large language models and AI agents.

8.8K stars · Active · Jupyter Notebook · LLM Frameworks · Agents & Orchestration
#ai-monitoring #ai-observability #llm-evaluation

NVIDIA/garak

An LLM vulnerability scanner: a Python-based tool for probing large language models for security weaknesses.

7.1K stars · Active · Python · LLM Frameworks · Security Research
#llm-security #vulnerability-assessment #security-scanning

jeinlee1991/chinese-llm-benchmark

A comprehensive benchmark for evaluating the capabilities of Chinese large language models (LLMs).

5.6K stars · Active · LLM Frameworks · LLM Evaluation
#ai-benchmarking #chinese-llms #model-evaluation

Giskard-AI/giskard-oss

Open-source evaluation and testing library for LLM agents.

5.1K stars · Active · Python · LLM Frameworks · React
#evaluation #testing #LLM

PacktPublishing/LLM-Engineers-Handbook

A practical guide for LLM engineers, covering fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices.

4.8K stars · Stable · Python · LLM Frameworks · ML Ops · AWS
#llm #mlops #aws

Agenta-AI/agenta

An open-source LLMOps platform for prompt playground, prompt management, LLM evaluation, and LLM observability.

3.9K stars · Active · TypeScript · LLM Frameworks · LLM Wrappers & SDKs · React
#llm-platform #prompt-engineering #prompt-management

lmnr-ai/lmnr

Laminar is an open-source observability platform purpose-built for AI agents and workflows.

2.7K stars · Active · TypeScript · Agents & Orchestration · LLM Observability
#ai #observability #llm

genieincodebottle/generative-ai

Comprehensive resources for developers working with Generative AI, including projects, use cases, and interview prep.

1.9K stars · Active · Jupyter Notebook · LLM Frameworks · Tutorials & Courses
#generative-ai #llm #ai-coding

msoedov/agentic_security

An AI-powered security toolkit for LLM vulnerability scanning and red teaming.

1.8K stars · Active · Python · LLM Frameworks · Security Research
#llm-security #llm-vulnerability-scanner #llm-fuzzing

huggingface/aisheets

Build, enrich, and transform datasets using AI models, with no code required.

1.6K stars · Stable · TypeScript · LLM Frameworks · AI SDKs & Wrappers
#ai #llms #nocode

cyberark/FuzzyAI

A powerful tool for automated LLM fuzzing to help developers and security researchers identify and mitigate potential jailbreaks.

1.2K stars · Stable · Jupyter Notebook · LLM Frameworks · Security Research
#ai #fuzzing #jailbreak

microsoft/prompty

Prompty is a Python library that makes it easy to create, manage, debug, and evaluate LLM prompts for AI applications.

1.2K stars · Active · Python · LLM Frameworks · Prompt Engineering
#generative-ai #llm-evaluation #prompt-engineering

cvs-health/uqlm

A Python package for uncertainty quantification and hallucination detection in large language models (LLMs).

1.1K stars · Active · Python · LLM Frameworks · LLM Wrappers & SDKs
#ai-safety #confidence-estimation #hallucination-detection

JudgmentLabs/judgeval

An open-source post-building layer for AI agents, providing environment data and evaluations to power agent post-training and monitoring.

1.0K stars · Active · Python · Agents & Orchestration · LLM Frameworks
#agent #agentic-ai #llm-evaluation
