An open platform for training, serving, and evaluating LLM chatbots with Vicuna and Chatbot Arena.
A collection of PyTorch image encoders/backbones with training, evaluation, and inference scripts.
Reverse interview questions for job applicants to evaluate companies.
MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.
An LLM engineering platform for observability, evaluation, and prompt management.
An open-source Python toolkit for building, evaluating, and deploying sophisticated AI agents.
A comprehensive library for debugging, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows.
A framework for evaluating large language models (LLMs) and an open-source registry of benchmarks.
A high-performance and SEO-friendly lazy loader for images, iframes, and more that detects visibility changes without configuration.
A Python SDK for agent AI observability, monitoring, and evaluation with features like tracing, debugging, and analytics.
An extensive math library for JavaScript and Node.js, providing a wide range of mathematical functions and capabilities.
A Python framework for evaluating and benchmarking large language models (LLMs) and their capabilities.
A Go-based framework for deep document understanding, semantic retrieval, and context-aware question answering using the RAG paradigm.
Open-source infrastructure for AI agents that can control full desktops (macOS, Linux, Windows).
A Python library for evaluating large language model applications.
Gorilla is a Python tool for training and evaluating large language models (LLMs) on API and function calling.
A framework for few-shot evaluation of language models.
An open-source LLM DevOps platform for building enterprise AI applications, with features including GenAI workflows, RAG, agents, and model management.
Open-source stack for industrial-grade LLM applications, including LLM gateway, observability, optimization, evaluation, and experimentation.
A semantic segmentation benchmark and evaluation framework.