Showing 181-200 of 222 projects
An open-source, end-to-end observability tool for LLM applications with real-time tracing, evaluations, and metrics.
An open-source toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
A Python framework for sequence labeling evaluation, useful for named-entity recognition and POS tagging.
High-fidelity performance metrics for generative models in PyTorch.
Distribute and run AI workloads on Kubernetes using a Python-based infrastructure toolkit with a PyTorch-like interface.
A Python framework for comprehensive diagnosis and optimization of AI agents using simulated, realistic synthetic interactions.
Prompty is a Python library that makes it easy to create, manage, debug, and evaluate LLM prompts for AI applications.
An open-source Mathematica Kernel written in Python with built-in functions, variables, and a parser/evaluator.
A comprehensive Python library for evaluating object detection models using various metrics like mAP, AR, and STT-AP.
A functional programming library for TypeScript/JavaScript developers with concurrency, lazy evaluation, and other FP features.
A benchmark library for evaluating correlation filter-based visual tracking algorithms.
A.S.E (AICGSecEval) is a repository-level AI-generated code security evaluation benchmark developed by Tencent Wukong Code Security Team.
An open-source framework for building, evaluating, and training general multi-agent assistance systems using AI tools.
Collection of Chinese safety prompts for evaluating and improving the safety of large language models (LLMs).
An R package for evaluating the quality and performance of statistical models.
Tau-Bench is a Python library for benchmarking and evaluating AI language models and tools.
A Python package for uncertainty quantification and hallucination detection in large language models (LLMs).
A Python library that provides a single interface to use and evaluate different AI agent frameworks.
LongBench is a benchmark for evaluating large language models on long-context tasks.
A Python benchmark suite for evaluating text-to-3D generation models and techniques.