Explore Projects

Discover 222 open source projects

Active filters (1): Search: evaluations

Showing 141-160 of 222 projects

Vchitect/VBench

An open-source benchmarking tool for evaluating video generation models.

1.5K
Active
Python
React
#authentication #benchmarking #evaluation

sepandhaghighi/pycm

A Python library for creating multi-class confusion matrices, useful for evaluating machine learning models.

1.5K
Active
Python
ML Ops
Data Analysis
#accuracy #classification #confusion-matrix

mbzuai-oryx/Video-ChatGPT

A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.

1.5K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#chatbot #video-conversation #vision-language

filipradenovic/cnnimageretrieval-pytorch

A PyTorch library for training and evaluating convolutional neural networks (CNNs) for image retrieval.

1.5K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#cnn #image-retrieval #pytorch

SkyworkAI/Skywork

A series of open-source large language models (LLMs), pretrained on multilingual text and code.

1.5K
Experimental
Python
LLM Frameworks
AI Code Generation
Python
#llm #ai-coding #code-generation

Intel-bigdata/HiBench

HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.

1.5K
Stable
Java
Benchmark
#big-data #benchmark #hadoop

OpenGenerativeAI/llm-colosseum

A Jupyter Notebook-based benchmark that evaluates the quality of large language models by having them fight each other in Street Fighter 3.

1.5K
Experimental
Jupyter Notebook
LLM Frameworks
#benchmark #generative-ai #llm

timescale/tsbs

A tool for comparing and evaluating databases for time series data.

1.4K
Archived
Go
Databases
Benchmarking
#time-series #benchmarking #database-comparison

GitHamza0206/simba

An open-source customer service platform with built-in evaluations and monitoring for developers.

1.4K
Active
TypeScript
CMS & Content
Monitoring
TypeScript
#customer-service #evals #knowledge-base

mlfoundations/dclm

DataComp for Language Models (DCLM) is a framework for curating training datasets and for training and evaluating large language models.

1.4K
Stable
HTML
LLM Frameworks
API Frameworks
Next.js
#machine-learning #language-models #api-development

kdlucas/byte-unixbench

A Unix benchmark tool for developers to evaluate system performance.

1.4K
Experimental
C
CLI Tools
API Frameworks
#system-performance #benchmarking #command-line

centerforaisafety/hle

Humanity's Last Exam (HLE), a multi-modal benchmark of expert-level questions for evaluating the capabilities of large language models.

1.4K
Stable
Python
LLM Frameworks
AI SDKs & Wrappers
Python
#ai-safety #llm-evaluation #cli-tool

d3/d3-queue

A JavaScript library for evaluating asynchronous tasks with configurable concurrency.

1.4K
Archived
JavaScript
API Clients & Testing
CLI Tools
Node
#asynchronous #concurrency #task-management

mattpocock/evalite

A TypeScript library for evaluating LLM-powered apps, aimed at vibe coders building AI tools.

1.4K
Stable
TypeScript
LLM Frameworks
AI App Builders
TypeScript
#ai #llm #typescript

microsoft/TextWorld

TextWorld is a sandbox learning environment for training and evaluating reinforcement learning agents on text-based games.

1.4K
Active
Jupyter Notebook
Agents & Orchestration
Tutorials & Courses
Jupyter Notebook
#reinforcement-learning #text-based-adventure #text-based-game

Maluuba/nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

1.4K
Archived
Python
NLP
API Frameworks
Python
#nlg #evaluation #metrics

aws-cloudformation/cloudformation-guard

A policy-as-code DSL to validate CloudFormation, Kubernetes, and Terraform configurations against custom rules.

1.4K
Active
Rust
Infrastructure as Code
CLI Tools
#cloudformation #policy-as-code #compliance

kentcdodds/babel-plugin-preval

A TypeScript-based Babel plugin that pre-evaluates code at build time.

1.4K
Archived
TypeScript
Build Tools
Frontend Frameworks
React
#build-time #pre-evaluation #babel-plugin

BIT-DataLab/LakeBench

A Python-based benchmarking framework for evaluating large-scale machine learning models and datasets.

1.4K
Experimental
Python
ML Ops
Databases
Python
#machine-learning #benchmarking #data-science

alphasoc/flightsim

A Go utility to generate malicious network traffic patterns for security testing and evaluation.

1.4K
Archived
Go
Security Research
Testing
#intrusion-detection #network-traffic-generation #security-testing