A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
Theano is a powerful Python library for defining, optimizing, and evaluating mathematical expressions efficiently.
Easily fine-tune, evaluate, and deploy open-source large language models like GPT-OSS and Llama.
An agent development framework that evolves to drive outcomes, with support for Anthropic's Claude and OpenAI's models.
AI observability and evaluation tooling for developers building with large language models and AI agents.
Open-source Chinese language model BELLE for building AI-powered chatbots and conversational applications.
Backtesting.py is a Python library for backtesting and evaluating trading strategies in the financial markets.
A powerful expression language and evaluator for Go, enabling developers to build rule-based systems and configuration languages.
Evidently is an open-source ML and LLM observability framework to evaluate, test, and monitor AI-powered systems.
The LLM vulnerability scanner, a Python-based tool for identifying security vulnerabilities in large language models.
An open-source Go toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
OpenCompass is a comprehensive LLM evaluation platform supporting a wide range of models and datasets.
Flutter Gallery is a resource to help developers evaluate and use the Flutter framework.
Official repo for consistency models, a framework for training and evaluating AI models.
General-purpose sandbox platform providing multi-language SDKs and Docker/K8s runtimes for AI agents.
Modeling, training, evaluation, and inference code for OLMo, a large language model.
An open-source reinforcement learning framework for training, evaluating, and deploying robust trading agents.
Production-ready templates to quickly ship AI agents to Google Cloud with built-in CI/CD, evaluation, and observability.
A tool for automated decompiling and security evaluation of WeChat mini-programs, supporting decryption, unpacking, and code modification.