A collection of PyTorch image encoders/backbones with training, evaluation, and inference scripts.
An LLM engineering platform for observability, evaluation, and prompt management.
An AI-powered app & agent framework for TypeScript.
A framework for evaluating large language models (LLMs) and an open-source registry of benchmarks.
A powerful JavaScript tool for creating datasets for fine-tuning large language models (LLMs) and retrieval-augmented generation (RAG).
A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.
A REPL (Read-Eval-Print Loop) for PHP, providing a powerful interactive environment for developers.
AI observability and evaluation tooling for developers building with large language models and AI agents.
A powerful REPL (Read-Eval-Print Loop) for the Laravel PHP framework.
Modeling, training, evaluation, and inference code for OLMo, a large language model.
A feature-rich, interactive Python REPL (Read-Eval-Print Loop) that improves the developer experience.
A Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more.
An open-source evaluation and testing library for LLM agents.
A framework for building, evaluating, and optimizing AI systems.
A Python library for running OpenAI's model evaluation scripts.
An AI observability platform for production LLM and agent systems, built with Python and Pydantic.
A Python library of reinforcement learning environments and evaluations for AI developers.
A multimodal evaluation toolkit for assessing AI models across text, image, video, and audio tasks.
A Python library for evaluating the capabilities of large language models trained on code.
A free and open-source application for reading manga and novels and watching anime across multiple platforms.