A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
Theano is a powerful Python library for defining, optimizing, and evaluating mathematical expressions efficiently.
Easily fine-tune, evaluate, and deploy open-source large language models like GPT-OSS and Llama.
An agent development framework that evolves to drive outcomes, with support for Anthropic's Claude and OpenAI's models.
AI observability and evaluation tooling for developers building with large language models and AI agents.
Open-source Chinese language model BELLE for building AI-powered chatbots and conversational applications.
Backtesting.py is a Python library for backtesting and evaluating trading strategies in the financial markets.
A powerful expression language and evaluator for Go, enabling developers to build rule-based systems and configuration languages.
Evidently is an open-source ML and LLM observability framework to evaluate, test, and monitor AI-powered systems.
The LLM vulnerability scanner, a Python-based tool for identifying security vulnerabilities in large language models.
An open-source Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
OpenCompass is a comprehensive LLM evaluation platform supporting a wide range of models and datasets.
Flutter Gallery is a resource to help developers evaluate and use the Flutter framework.
Official repo for consistency models, a framework for training and evaluating AI models.
General-purpose sandbox platform providing multi-language SDKs and Docker/K8s runtimes for AI agents.
Modeling, training, evaluation, and inference code for OLMo, a large language model.
An open-source reinforcement learning framework for training, evaluating, and deploying robust trading agents.
Production-ready templates to quickly ship AI agents to Google Cloud with built-in CI/CD, evaluation, and observability.
A tool for automated decompiling and security evaluation of WeChat mini-programs, supporting decryption, unpacking, and code modification.