Explore Projects

Discover 222 open-source projects

Active filters (1):

Search: evaluations

Showing 141-160 of 222 projects

Vchitect/VBench

An open-source benchmarking tool for evaluating video generation models.

1.5K

Active

Python

React

#benchmarking #evaluation

sepandhaghighi/pycm

A Python library for creating multi-class confusion matrices, useful for evaluating machine learning models.

1.5K

Active

Python

ML Ops

Data Analysis

#accuracy #classification #confusion-matrix

mbzuai-oryx/Video-ChatGPT

A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.

1.5K

Experimental

Python

LLM Frameworks

Computer Vision

PyTorch

#chatbot #video-conversation #vision-language

filipradenovic/cnnimageretrieval-pytorch

A PyTorch library for training and evaluating convolutional neural networks (CNNs) for image retrieval.

1.5K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#cnn #image-retrieval #pytorch

SkyworkAI/Skywork

Skywork is a series of open-source large language models (LLMs) released by SkyworkAI.

1.5K

Experimental

Python

LLM Frameworks

AI Code Generation

#llm #ai-coding #code-generation

Intel-bigdata/HiBench

HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.

1.5K

Stable

Java

Benchmark

#big-data #benchmark #hadoop

OpenGenerativeAI/llm-colosseum

A Jupyter Notebook-based benchmark for evaluating the quality of large language models by having them play Street Fighter 3.

1.5K

Experimental

Jupyter Notebook

LLM Frameworks

#benchmark #generative-ai #llm

timescale/tsbs

A tool for comparing and evaluating databases for time series data.

1.4K

Archived

Databases

Benchmarking

#time-series #benchmarking #database-comparison

GitHamza0206/simba

An open-source customer service platform with built-in evaluations and monitoring for developers.

1.4K

Active

TypeScript

CMS & Content

Monitoring

#customer-service #evals #knowledge-base

mlfoundations/dclm

DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments, with tooling for curating training data and for training and evaluating large language models.

1.4K

Stable

HTML

LLM Frameworks

API Frameworks

Next.js

#machine-learning #language-models #api-development

kdlucas/byte-unixbench

A Unix benchmark tool for developers to evaluate system performance.

1.4K

Experimental

CLI Tools

API Frameworks

#system-performance #benchmarking #command-line

centerforaisafety/hle

Humanity's Last Exam (HLE), a benchmark of difficult, expert-written questions for evaluating the capabilities of large language models.

1.4K

Stable

Python

LLM Frameworks

AI SDKs & Wrappers

#ai-safety #llm-evaluation #cli-tool

d3/d3-queue

Evaluate asynchronous tasks with configurable concurrency in JavaScript projects.

1.4K

Archived

JavaScript

API Clients & Testing

CLI Tools

Node

#asynchronous #concurrency #task-management

mattpocock/evalite

A TypeScript library for evaluating LLM-powered apps, aimed at developers building AI tools.

1.4K

Stable

TypeScript

LLM Frameworks

AI App Builders

#ai #llm #typescript

microsoft/TextWorld

TextWorld is a sandbox learning environment for training and evaluating reinforcement learning agents on text-based games.

1.4K

Active

Jupyter Notebook

Agents & Orchestration

Tutorials & Courses

#reinforcement-learning #text-based-adventure #text-based-game

Maluuba/nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

1.4K

Archived

Python

NLP

API Frameworks

#nlg #evaluation #metrics

aws-cloudformation/cloudformation-guard

A policy-as-code DSL to validate CloudFormation, Kubernetes, and Terraform configurations against custom rules.

1.4K

Active

Rust

Infrastructure as Code

CLI Tools

#cloudformation #policy-as-code #compliance

kentcdodds/babel-plugin-preval

A TypeScript-based Babel plugin for pre-evaluating code at build time.

1.4K

Archived

TypeScript

Build Tools

Frontend Frameworks

React

#build-time #pre-evaluation #babel-plugin

BIT-DataLab/LakeBench

A Python-based benchmarking framework for evaluating large-scale machine learning models and datasets.

1.4K

Experimental

Python

ML Ops

Databases

#machine-learning #benchmarking #data-science

alphasoc/flightsim

A Go utility to generate malicious network traffic patterns for security testing and evaluation.

1.4K

Archived

Security Research

Testing

#intrusion-detection #network-traffic-generation #security-testing