A Python framework for evaluating and benchmarking large language models (LLMs) and their capabilities.
A framework for few-shot evaluation of language models, suited to developers building with AI coding tools.
A framework for testing and evaluating large language models, prompts, and AI agents for security and performance.
A platform for building, evaluating, and optimizing AI systems.