A comprehensive benchmark for evaluating the capabilities of Chinese large language models (LLMs)
An open platform for training, serving, and evaluating large language models for tool learning.
A curated collection of recent diffusion models for video generation, editing, and various other applications.
Cozeloop is an open-source platform that provides full-lifecycle management for AI agent development, debugging, and monitoring.
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more for vibe coders.
AviatorScript is a high-performance scripting language hosted on the JVM for developers who want a powerful expression evaluator.
Open-source evaluation and testing library for LLM agents.
A Python library that provides the most popular metrics used to evaluate object detection algorithms.
A practical guide for LLM engineers, covering fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps best practices.
A high-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
Build, evaluate, and optimize AI systems.
Framework for routing LLM requests to optimize costs while maintaining response quality.
An open-source guide for building a 'Tiny-Universe' of large language models and AI tools.
Simple-evals is a Python library for running OpenAI's model evaluation scripts.
A language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
CLUE is a comprehensive Chinese language understanding evaluation benchmark with datasets, baselines, pre-trained models, and a leaderboard.
RAFT is a PyTorch library for training and evaluating visual transformers, a popular AI model for computer vision tasks.
An open-source toolkit for speech processing, supporting enhancement, separation, and target speaker extraction.
Govaluate is a Go library that allows arbitrary expression evaluation, useful for building dynamic, configurable applications.
Latitude is an open-source platform for building, evaluating, and refining prompts for large language models.