Explore Projects

Discover 41 open source projects

Active filters (1):

Search: evals

Showing 21-40 of 41 projects

datachain-ai/datachain

Comprehensive analytics, versioning, and ETL toolkit for multimodal data (video, audio, PDFs, images).

2.7K

Active

Python

Computer Vision

ETL & Pipelines

Python

#data-analytics #data-wrangling #embeddings

lmnr-ai/lmnr

Laminar is an open-source observability platform purpose-built for AI agents and workflows.

2.7K

Active

TypeScript

Agents & Orchestration

LLM Observability

TypeScript

#ai #observability #llm

uptrain-ai/uptrain

An open-source platform for evaluating and improving Generative AI applications with 20+ preconfigured checks and root cause analysis.

2.3K

Archived

Python

LLM Frameworks

Testing

Python

#llm-eval #prompt-engineering #root-cause-analysis

tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models: human-validated, high-quality, cheap, and fast.

2.0K

Stable

Jupyter Notebook

LLM Frameworks

Evaluation

Jupyter Notebook

#deep-learning #foundation-models #large-language-models

hkust-nlp/ceval

Official repository for C-Eval, a Chinese evaluation suite for foundation models.

1.8K

Experimental

Python

LLM Frameworks

#llm #evaluation #chinese

justjake/quickjs-emscripten

A TypeScript library that allows safely executing untrusted JavaScript and using async functions synchronously.

1.6K

Experimental

TypeScript

API Frameworks

CLI Tools

React

#javascript #quickjs #wasm

MLGroupJLU/LLM-eval-survey

A survey paper on evaluating large language models (LLMs) for developers building AI-powered applications.

1.6K

Experimental

LLM Frameworks

Tutorials & Courses

#benchmark #evaluation #large-language-models

BytedanceSpeech/seed-tts-eval

A Python repository providing an evaluation framework for text-to-speech models.

1.5K

Archived

Python

AI Voice & Speech

Testing

Python

#text-to-speech #speech-synthesis #model-evaluation

eqcss/eqcss

EQCSS is a CSS Reprocessor that introduces Element Queries, Scoped CSS, a Parent selector, and responsive JavaScript to all browsers IE8 and up.

1.5K

Archived

HTML

CSS Frameworks

React

#css #container-queries #element-queries

salesforce/CodeTF

A one-stop Transformer library for state-of-the-art code language models and AI-powered code understanding.

1.5K

Experimental

Python

LLM Frameworks

AI Code Generation

Python

#ai4code #transformers #code-generation

GitHamza0206/simba

Open-source customer service platform with built-in evaluations and monitoring for developers.

1.4K

Active

TypeScript

CMS & Content

Monitoring

TypeScript

#customer-service #evals #knowledge-base

mattpocock/evalite

A TypeScript library for evaluating LLM-powered apps, aimed at vibe coders building AI tools.

1.4K

Stable

TypeScript

LLM Frameworks

AI App Builders

TypeScript

#ai #llm #typescript

Maluuba/nlg-eval

Evaluation code for various unsupervised automated metrics for Natural Language Generation.

1.4K

Archived

Python

NLP

API Frameworks

Python

#nlg #evaluation #metrics

albertlatacz/java-repl

A Java REPL (Read Eval Print Loop) that allows developers to interactively run Java code.

1.3K

Archived

Java

CLI Tools

API Frameworks

#java #repl #interactive-coding

silentmatt/expr-eval

A JavaScript library for parsing and evaluating mathematical expressions.

1.3K

Archived

JavaScript

General Utilities

Frontend Frameworks

JavaScript

#math #expressions #parsing

refreshdotdev/web-eval-agent

An autonomous web application evaluation agent powered by MCP and Playwright for vibe coders.

1.2K

Active

Python

MCP Servers

AI Code Editors

React

#debugging #qa #vibe-coding

facebookarchive/phpsh

A read-eval-print-loop (REPL) for PHP, allowing developers to interactively test and experiment with PHP code.

1.1K

Archived

Emacs Lisp

CLI Tools

API Frameworks

#php #repl #interactive

superlinear-ai/raglite

A Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL, useful for AI coding tools.

1.1K

Active

Python

RAG & Vector

Databases

Python

#retrieval-augmented-generation #duckdb #postgresql

AI-QL/tuui

A desktop tool for orchestrating AI models across vendors using the Model Context Protocol (MCP).

1.1K

Active

TypeScript

MCP Frameworks

Agents & Orchestration

React

#ai-integration #mcp #llm-orchestration

prometheus-eval/prometheus-eval

A Python library for evaluating large language model responses using Prometheus evaluator models or GPT-4.

1.1K

Experimental

Python

LLM Frameworks

Testing

Python

#llm #gpt4 #evaluation
