Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluation

Showing 21-40 of 222 projects

promptfoo/promptfoo

A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.

10.8K
Active
TypeScript
LLM Frameworks
TypeScript
#llm-evaluation #prompt-engineering #red-teaming

facebookresearch/ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

10.6K
Archived
Python
LLM Frameworks
PyTorch
#ai #machine-learning #dialogue

Theano/Theano

Theano is a powerful Python library for defining, optimizing, and evaluating mathematical expressions efficiently.

10.0K
Archived
Python
LLM Frameworks
Python
#machine-learning #numerical-computing #optimization

oumi-ai/oumi

Easily fine-tune, evaluate and deploy open-source large language models like GPT-OSS and Llama.

8.9K
Active
Python
LLM Frameworks
Inference
Python
#llms #fine-tuning #evaluation

aden-hive/hive

An agent development framework that evolves to drive outcomes, with support for Anthropic's Claude and OpenAI's models.

8.8K
Active
Python
Agents & Orchestration
LLM Frameworks
Python
#agent #ai-evaluation #automation

Arize-ai/phoenix

AI observability and evaluation tooling for developers building with large language models and AI agents.

8.8K
Active
Jupyter Notebook
LLM Frameworks
Agents & Orchestration
Jupyter Notebook
#ai-monitoring #ai-observability #llm-evaluation

LianjiaTech/BELLE

Open-source Chinese language model BELLE for building AI-powered chatbots and conversational applications.

8.3K
Archived
HTML
LLM Frameworks
Agents & Orchestration
React
#chinese-nlp #instruction-set #llama

kernc/backtesting.py

Backtesting.py is a Python library for backtesting and evaluating trading strategies in the financial markets.

8.0K
Stable
Python
API Frameworks
Databases
#algo-trading #algorithmic-trading #backtesting

expr-lang/expr

A powerful expression language and evaluator for Go, enabling developers to build rule-based systems and configuration languages.

7.7K
Active
Go
API Frameworks
CLI Tools
#expression-language #rule-engine #configuration-language

evidentlyai/evidently

Evidently is an open-source ML and LLM observability framework to evaluate, test, and monitor AI-powered systems.

7.3K
Active
Jupyter Notebook
MLOps
Data Validation
Jupyter Notebook
#data-quality #data-validation #model-monitoring

NVIDIA/garak

The LLM vulnerability scanner, a Python-based tool for identifying security vulnerabilities in large language models.

7.1K
Active
Python
LLM Frameworks
Security Research
#llm-security #vulnerability-assessment #security-scanning

google/adk-go

An open-source Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

7.1K
Active
Go
LLM Frameworks
Agents & Orchestration
#ai #agents #llm

open-compass/opencompass

OpenCompass is a comprehensive LLM evaluation platform supporting a wide range of models and datasets.

6.7K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#benchmark #chatgpt #evaluation

flutter/gallery

Flutter Gallery is a resource to help developers evaluate and use the Flutter framework.

6.6K
Archived
Dart
Component Libraries (Flutter)
Tutorials & Courses
Flutter
#flutter #dart #ui-components

openai/consistency_models

Official repo for consistency models, a framework for training and evaluating AI models.

6.5K
Archived
Python
LLM Frameworks
CLI Tools
Python
#machine-learning #natural-language-processing #language-models

alibaba/OpenSandbox

General-purpose sandbox platform providing multi-language SDKs and Docker/K8s runtimes for AI agents.

6.4K
Active
Python
AI Agent Runtimes
AI SDKs & Wrappers
Docker
#agent-sandbox #kubernetes-runtime #isolated-execution

allenai/OLMo

Modeling, training, evaluation, and inference code for OLMo, a large language model.

6.3K
Stable
Python
LLM Frameworks
Python
#language-model #llm #ai

tensortrade-org/tensortrade

An open-source reinforcement learning framework for training, evaluating, and deploying robust trading agents.

6.0K
Archived
Python
Agents & Orchestration
API Frameworks
Python
#reinforcement-learning #trading #agents

GoogleCloudPlatform/agent-starter-pack

Production-ready templates to quickly ship AI agents to Google Cloud with built-in CI/CD, evaluation, and observability.

5.9K
Active
Python
Agents & Orchestration
CI/CD
Python
#ai-agents #mlops #observability

Ackites/KillWxapkg

A tool for automated decompiling and security evaluation of WeChat mini-programs, supporting decryption, unpacking, and code modification.

5.7K
Archived
Go
Security Research
CLI Tools
#wechat #mini-program #decompiler
