Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluator

Showing 121-140 of 222 projects

evalplus/evalplus

A rigorous benchmark for evaluating the code quality and efficiency of large language models like GPT-4.

1.7K
Stable
Python
LLM Frameworks
Testing
Python
#benchmark#chatgpt#efficiency

SmartFlowAI/EmoLLM

This is a large language model (LLM) focused on mental health, with pre/post-training, datasets, evaluation, and deployment tools.

1.7K
Stable
Python
LLM Frameworks
Fine-tuning
#llm#mental-health#dataset

michaelb/sniprun

A Neovim plugin that allows running code snippets independently, supporting multiple languages.

1.7K
Stable
Rust
IDE Extensions
API Frameworks
Neovim
#code-runner#interpreted-language#compiled-language

harbor-framework/terminal-bench

A benchmark for evaluating the performance of large language models (LLMs) on complex terminal-based tasks.

1.7K
Active
Python
LLM Frameworks
CLI Tools
Python
#benchmark#llm#terminal

benhamner/Metrics

A Python library that provides a collection of commonly used machine learning evaluation metrics.

1.7K
Archived
Python
ML Ops
#machine-learning#evaluation-metrics#python

openai/neural-mmo

A massively multiagent game environment for training and evaluating intelligent agents.

1.6K
Archived
Python
Agents & Orchestration
Example Projects
Python
#ai#machine-learning#multiagent-systems

zai-org/ImageReward

A repository for ImageReward: learning and evaluating human preferences for text-to-image generation.

1.6K
Stable
Python
React
#authentication#diffusion-models#generative-model

huggingface/aisheets

Build, enrich, and transform datasets using AI models with no code.

1.6K
Stable
TypeScript
LLM Frameworks
AI SDKs & Wrappers
TypeScript
#ai#llms#nocode

alecthomas/go_serialization_benchmarks

Benchmarks for evaluating Go serialization methods for performance and efficiency.

1.6K
Experimental
Go
Benchmarking
API Frameworks
#benchmarking#performance#serialization

justmarkham/DAT8

General Assembly's 2015 Data Science course covering topics like machine learning, data analysis, and data visualization.

1.6K
Archived
Jupyter Notebook
Tutorials & Courses
Jupyter Notebook
#data-analysis#data-science#machine-learning

deepsense-ai/ragbits

A Python library that provides building blocks for rapid development of generative AI applications.

1.6K
Active
Python
LLM Frameworks
Agents & Orchestration
#llm#agents#rag

ffoodd/a11y.css

A CSS file that helps developers detect accessibility issues in HTML code.

1.6K
Active
SCSS
Frontend Frameworks
Documentation
#accessibility#diagnostics#markup

MLGroupJLU/LLM-eval-survey

A survey paper on evaluating large language models (LLMs) for developers building AI-powered applications.

1.6K
Experimental
LLM Frameworks
Tutorials & Courses
#benchmark#evaluation#large-language-models

WuKongOpenSource/Wukong_HRM

A free, open-source HRM system with features such as recruitment management and performance evaluation.

1.6K
Archived
Java
React
#HRM#open-source#java

tmikolov/word2vec

A popular library for training, using, and evaluating word embeddings, a fundamental building block for natural language processing.

1.6K
Archived
C
LLM Frameworks
API Frameworks
#nlp#word-embeddings#caching

opendatalab/OmniDocBench

A comprehensive benchmark for evaluating document parsing, published at CVPR 2025.

1.5K
Stable
Python
Computer Vision
Datasets
#computer-vision#document-parsing#benchmark

nccgroup/PMapper

A Python tool for quickly evaluating IAM permissions in AWS.

1.5K
Archived
Python
API Frameworks
Monitoring
Python
#aws#iam#botocore

Lifelong-Robot-Learning/LIBERO

A benchmark for evaluating knowledge transfer in lifelong robot learning using AI tools.

1.5K
Experimental
Jupyter Notebook
Agents & Orchestration
API Frameworks
Jupyter Notebook
#benchmark#imitation-learning#lifelong-learning

BytedanceSpeech/seed-tts-eval

A Python evaluation framework for text-to-speech models.

1.5K
Archived
Python
AI Voice & Speech
Testing
Python
#text-to-speech#speech-synthesis#model-evaluation

frutik/awesome-search

Comprehensive collection of resources and tools for building awesome search experiences.

1.5K
Stable
HTML
Search-as-a-Service
Frontend Frameworks
React
#search#ecommerce#relevance-algorithms
