Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluator

Showing 181-200 of 222 projects

Scale3-Labs/langtrace

An open-source, end-to-end observability tool for LLM applications with real-time tracing, evaluations and metrics.

1.2K
Stable
TypeScript
LLM Frameworks
CLI Tools
TypeScript
#ai #observability #tracing

google/adk-docs

An open-source toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

1.2K
Active
Shell
Agents & Orchestration
CLI Tools
Shell
#ai-development #agent-based-systems #cli-tools

chakki-works/seqeval

A Python framework for sequence labeling evaluation, useful for named-entity recognition and POS tagging.

1.2K
Archived
Python
Computer Vision
API Clients & Testing
Python
#named-entity-recognition #sequence-labeling #natural-language-processing
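
A minimal usage sketch for seqeval's entity-level scoring; the label sequences below are illustrative, and the metric functions are the ones the library documents:

```python
from seqeval.metrics import classification_report, f1_score

# Gold and predicted tag sequences in IOB2 format (illustrative data).
y_true = [["O", "B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["O", "B-PER", "I-PER", "O", "O"]]

# seqeval scores at the entity level, not the token level:
# the missed B-LOC entity costs recall, so F1 here is 0.667.
print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```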

toshas/torch-fidelity

High-fidelity performance metrics for generative models in PyTorch.

1.2K
Stable
Python
Inference
Metrics
PyTorch
#generative-models #evaluation #reproducibility
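
A sketch of torch-fidelity's one-call metrics entry point; the folder paths are placeholders, and the keyword flags follow the calculate_metrics API from the project README:

```python
import torch_fidelity

# Both inputs are directories of RGB image samples (paths are placeholders).
metrics = torch_fidelity.calculate_metrics(
    input1="generated_images/",
    input2="real_images/",
    cuda=True,
    isc=True,   # Inception Score
    fid=True,   # Frechet Inception Distance
    kid=True,   # Kernel Inception Distance
)
print(metrics)  # dict mapping metric names to float values
```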

run-house/kubetorch

Distribute and run AI workloads on Kubernetes through a Python-based infrastructure toolkit with a PyTorch-like interface.

1.2K
Active
Python
ML Ops
Containerization
PyTorch
#kubernetes #distributed-computing #data-science

plurai-ai/intellagent

A Python framework for comprehensive diagnosis and optimization of AI agents using simulated, realistic synthetic interactions.

1.2K
Stable
Python
Agents & Orchestration
Simulator
#agent-evaluation #agent-optimization #llmops

microsoft/prompty

Prompty is a Python library that makes it easy to create, manage, debug, and evaluate LLM prompts for AI applications.

1.2K
Active
Python
LLM Frameworks
Prompt Engineering
Python
#generative-ai #llm-evaluation #prompt-engineering
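
A minimal sketch of executing a prompt asset with the prompty Python package, assuming its execute helper and the Azure OpenAI invoker; the .prompty file name and input key are hypothetical:

```python
import prompty
import prompty.azure  # registers the Azure OpenAI invoker (other providers exist)

# "basic.prompty" is a hypothetical asset bundling model config,
# a prompt template, and declared inputs.
response = prompty.execute(
    "basic.prompty",
    inputs={"question": "What does lazy evaluation mean?"},
)
print(response)
```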

Mathics3/mathics-core

An open-source, Mathematica-compatible kernel written in Python, with built-in functions, variables, and a parser/evaluator.

1.2K
Active
Python
LLM Frameworks
CLI Tools
Python
#computer-algebra-system #mathematica #wolfram-language
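
A short sketch of evaluating a Wolfram-language expression programmatically, assuming the MathicsSession helper shown in the project README:

```python
from mathics.session import MathicsSession

session = MathicsSession()

# Evaluate a Wolfram-language expression: differentiate Sin[x]^2.
result = session.evaluate("D[Sin[x]^2, x]")
print(result)  # 2 Cos[x] Sin[x]
```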

rafaelpadilla/review_object_detection_metrics

A comprehensive Python library for evaluating object detection models using various metrics like mAP, AR, and STT-AP.

1.2K
Stable
Python
Computer Vision
API Frameworks
Python
#object-detection #metrics #average-precision

marpple/FxTS

A functional programming library for TypeScript/JavaScript developers with concurrency, lazy evaluation, and other FP features.

1.2K
Active
TypeScript
Frontend Frameworks
API Frameworks
JavaScript
#functional-programming #concurrency #lazy-evaluation

Escheee/TBCF

A benchmark library for evaluating correlation filter-based visual tracking algorithms.

1.1K
Archived
Objective-C
Tracking
Benchmarks
#visual-tracking #correlation-filters #benchmark

Tencent/AICGSecEval

A.S.E (AICGSecEval) is a repository-level security evaluation benchmark for AI-generated code, developed by the Tencent Wukong Code Security Team.

1.1K
Active
Python
LLM Frameworks
API Frameworks
Python
#benchmark #codesecurity #llm

inclusionAI/AWorld

An open-source framework for building, evaluating, and training general multi-agent assistance systems using AI tools.

1.1K
Active
Python
Agents & Orchestration
LLM Frameworks
Python
#agent-framework #agent-learning #agent-runtime

thu-coai/Safety-Prompts

A collection of Chinese safety prompts for evaluating and improving the safety of large language models (LLMs).

1.1K
Archived
LLM Frameworks
Prompt Engineering
#attack-defense #chatgpt #chinese-language

easystats/performance

An R package for evaluating the quality and performance of statistical models.

1.1K
Active
R
ML Ops
Analytics & Modeling
R
#r #statistics #model-evaluation

sierra-research/tau-bench

Tau-Bench is a Python benchmark for evaluating AI agents' tool use in simulated user interactions.

1.1K
Stable
Python
LLM Frameworks
CLI Tools
#benchmarking #evaluation #language-models

cvs-health/uqlm

A Python package for uncertainty quantification and hallucination detection in large language models (LLMs).

1.1K
Active
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#ai-safety #confidence-estimation #hallucination-detection
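
A hypothetical sketch of black-box uncertainty scoring with uqlm. The BlackBoxUQ class, scorer name, and generate_and_score method are written from memory of the project docs and should be verified, and the LangChain model backend is an assumption:

```python
import asyncio
from langchain_openai import ChatOpenAI  # assumed backend; uqlm wraps LangChain chat models
from uqlm import BlackBoxUQ  # class name as recalled from the docs; verify before use

async def main():
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=1.0)
    # Black-box UQ samples several responses per prompt and scores
    # their mutual consistency as a confidence signal.
    uq = BlackBoxUQ(llm=llm, scorers=["semantic_negentropy"])
    results = await uq.generate_and_score(
        prompts=["Who wrote the novel Dune?"], num_responses=5
    )
    print(results.to_df())

asyncio.run(main())
```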

mozilla-ai/any-agent

A Python library that provides a single interface to use and evaluate different AI agent frameworks.

1.1K
Active
Python
Agents & Orchestration
CLI Tools
Python
#ai #agents #testing

THUDM/LongBench

LongBench is a benchmark for evaluating large language models on long-context tasks.

1.1K
Archived
Python
LLM Frameworks
Benchmarks
Python
#benchmark #llm #long-context
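
The tasks are distributed through the Hugging Face hub; a minimal loading sketch, assuming the THUDM/LongBench dataset, the narrativeqa subset, and the field names from the dataset card:

```python
from datasets import load_dataset

# Load one LongBench task split; "narrativeqa" is one of the long-context tasks.
data = load_dataset("THUDM/LongBench", "narrativeqa", split="test")
print(data[0]["context"][:200])  # the long input context
print(data[0]["input"])          # the question to answer
```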

THU-LYJ-Lab/T3Bench

A Python benchmark suite for evaluating text-to-3D generation models and techniques.

1.1K
Archived
Python
Computer Vision
Inference
Python
#3d-generation #text-to-3d #diffusion
