An open-source benchmarking tool for evaluating video generation models.
A Python library for creating multi-class confusion matrices, useful for evaluating machine learning models.
A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.
A PyTorch library for training and evaluating convolutional neural networks (CNNs) for image retrieval.
An open-source large language model (LLM) for AI-powered coding and developer discovery tools.
HiBench is a benchmark suite for evaluating the performance of different big data frameworks.
A Jupyter Notebook-based benchmark for evaluating the quality of large language models by having them play Street Fighter III.
A tool for comparing and evaluating databases for time series data.
An open-source customer service platform with built-in evaluations and monitoring for developers.
DataComp for Language Models (DCLM) is a testbed for dataset experiments, with tooling for training and evaluating large language models.
A Unix benchmark tool for developers to evaluate system performance.
An AI-powered framework to evaluate the safety and alignment of large language models.
A JavaScript utility for evaluating asynchronous tasks with configurable concurrency.
A TypeScript library for evaluating your LLM-powered apps, built for vibe coders creating AI tools.
TextWorld is a sandbox learning environment for training and evaluating reinforcement learning agents on text-based games.
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
A policy-as-code DSL to validate CloudFormation, Kubernetes, and Terraform configurations against custom rules.
A TypeScript-based Babel plugin that pre-evaluates code at build time.
A Python-based benchmarking framework for evaluating large-scale machine learning models and datasets.
A Go utility to generate malicious network traffic patterns for security testing and evaluation.