Explore Projects

Discover 222 open source projects

Active filters (1):

Search: evaluator

Showing 121-140 of 222 projects

evalplus/evalplus

A rigorous benchmark for evaluating the code quality and efficiency of large language models like GPT-4.

1.7K

Stable

Python

LLM Frameworks

Testing

#benchmark#chatgpt#efficiency

SmartFlowAI/EmoLLM

A large language model (LLM) focused on mental health, with tools for pre- and post-training, datasets, evaluation, and deployment.

1.7K

Stable

Python

LLM Frameworks

Fine-tuning

#llm#mental-health#dataset

michaelb/sniprun

A Neovim plugin for running code snippets independently, with support for multiple languages.

1.7K

Stable

Rust

IDE Extensions

API Frameworks

Neovim

#code-runner#interpreted-language#compiled-language

harbor-framework/terminal-bench

A benchmark for evaluating the performance of large language models (LLMs) on complex terminal-based tasks.

1.7K

Active

Python

LLM Frameworks

CLI Tools

#benchmark#llm#terminal

benhamner/Metrics

A Python library that provides a collection of commonly used machine learning evaluation metrics.

1.7K

Archived

Python

ML Ops

#machine-learning#evaluation-metrics#python

openai/neural-mmo

A massively multiagent game environment for training and evaluating intelligent agents.

1.6K

Archived

Python

Agents & Orchestration

Example Projects

#ai#machine-learning#multiagent-systems

zai-org/ImageReward

A repository for ImageReward: learning and evaluating human preferences for text-to-image generation.

1.6K

Stable

Python

React

#authentication#diffusion-models#generative-model

huggingface/aisheets

A no-code tool for building, enriching, and transforming datasets using AI models.

1.6K

Stable

TypeScript

LLM Frameworks

AI SDKs & Wrappers

#ai#llms#nocode

alecthomas/go_serialization_benchmarks

Benchmarks that compare the performance and efficiency of Go serialization methods.

1.6K

Experimental

Benchmarking

API Frameworks

#benchmarking#performance#serialization

justmarkham/DAT8

General Assembly's 2015 Data Science course covering topics like machine learning, data analysis, and data visualization.

1.6K

Archived

Jupyter Notebook

Tutorials & Courses

#data-analysis#data-science#machine-learning

deepsense-ai/ragbits

A Python library that provides building blocks for rapid development of generative AI applications.

1.6K

Active

Python

LLM Frameworks

Agents & Orchestration

#llm#agents#rag

ffoodd/a11y.css

A CSS file that helps developers detect accessibility issues in HTML code.

1.6K

Active

SCSS

Frontend Frameworks

Documentation

#accessibility#diagnostics#markup

MLGroupJLU/LLM-eval-survey

A survey paper on the evaluation of large language models (LLMs), useful for developers building AI-powered applications.

1.6K

Experimental

LLM Frameworks

Tutorials & Courses

#benchmark#evaluation#large-language-models

WuKongOpenSource/Wukong_HRM

A free, open-source human resource management (HRM) system, aimed at vibe coders, with features such as recruitment management and performance evaluation.

1.6K

Archived

Java

React

#HRM#open-source#java

tmikolov/word2vec

A popular library for training, using, and evaluating word embeddings, a fundamental building block for natural language processing.

1.6K

Archived

LLM Frameworks

API Frameworks

#nlp#word-embeddings#caching

opendatalab/OmniDocBench

A comprehensive benchmark for evaluating document parsing, presented at CVPR 2025.

1.5K

Stable

Python

Computer Vision

Datasets

#computer-vision#document-parsing#benchmark

nccgroup/PMapper

A Python tool for quickly evaluating IAM permissions in AWS.

1.5K

Archived

Python

API Frameworks

Monitoring

#aws#iam#botocore

Lifelong-Robot-Learning/LIBERO

A benchmark for evaluating knowledge transfer in lifelong robot learning using AI tools.

1.5K

Experimental

Jupyter Notebook

Agents & Orchestration

API Frameworks

#benchmark#imitation-learning#lifelong-learning

BytedanceSpeech/seed-tts-eval

A Python evaluation framework for text-to-speech (TTS) models, aimed at vibe coders developing with AI tools.

1.5K

Archived

Python

AI Voice & Speech

Testing

#text-to-speech#speech-synthesis#model-evaluation

frutik/awesome-search

A comprehensive collection of resources and tools for building great search experiences.

1.5K

Stable

HTML

Search-as-a-Service

Frontend Frameworks

React

#search#ecommerce#relevance-algorithms
