Explore Projects

Discover 222 open source projects

Active filters (1):

Search: evaluation

Showing 61-80 of 222 projects

Agenta-AI/agenta

An open-source LLMOps platform for prompt playground, prompt management, LLM evaluation, and LLM observability.

3.9K

Active

TypeScript

LLM Frameworks

LLM Wrappers & SDKs

React

#llm-platform #prompt-engineering #prompt-management

PrimeIntellect-ai/verifiers

A Python library for reinforcement learning environments and evaluations targeted at AI-focused developers.

3.9K

Active

Python

Agents & Orchestration

Python

#reinforcement-learning #ai-tools #evaluation

open-compass/VLMEvalKit

Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.

3.9K

Active

Python

LLM Frameworks

LLM Wrappers & SDKs

PyTorch

#chatgpt #llm #multi-modal

mseitzer/pytorch-fid

A PyTorch library for computing Fréchet Inception Distance (FID), a metric used to evaluate generative adversarial networks.

3.8K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#deep-learning #fid #fid-score

EvolvingLMMs-Lab/lmms-eval

A multimodal evaluation toolkit for assessing AI models across text, image, video, and audio tasks.

3.8K

Active

Python

LLM Frameworks

Agents & Orchestration

Python

#evaluation #multimodal #large-language-models

portfolio-performance/portfolio

A Java library for tracking and evaluating the performance of investment portfolios across stocks, crypto, and other assets.

3.7K

Active

Java

API Frameworks

ORMs & Query Builders

#investment #portfolio #finance

nmslib/nmslib

An efficient similarity search library and toolkit for evaluating k-NN methods in non-metric spaces.

3.6K

Active

C++

Computer Vision

Vector Databases

#k-nn #similarity-search #non-metric

homenc/HElib

HElib is an open-source C++ library for homomorphic encryption, supporting BGV and CKKS schemes.

3.2K

Archived

C++

Privacy Tools

Encryption

#cryptography #encryption #privacy-enhancing-technologies

conanhujinming/comments-for-awesome-courses

A GitHub repository providing comments for awesome courses on public universities' course evaluations.

3.2K

Archived

Python

#comments #course-evaluation #public-university

THUDM/AgentBench

A comprehensive benchmark to evaluate large language models (LLMs) as agents for various tasks.

3.2K

Stable

Python

LLM Frameworks

Agents & Orchestration

Python

#chatgpt #gpt-4 #llm

openai/human-eval

A Python library for evaluating the capabilities of large language models trained on code.

3.2K

Archived

Python

LLM Frameworks

#language-model #code-generation #evaluation

embeddings-benchmark/mteb

MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.

3.2K

Active

Python

LLM Wrappers & SDKs

Python

#benchmark #text-embedding #multilingual-nlp

viebel/klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs, supporting various programming languages.

3.1K

Archived

HTML

Component Libraries (React)

Frontend Frameworks

React

#code-editor #interactive-snippets #code-evaluation

langwatch/langwatch

An open platform for managing, monitoring, and optimizing large language models (LLMs) and AI workflows.

3.0K

Active

TypeScript

LLM Frameworks

LLM Wrappers & SDKs

TypeScript

#llm-ops #observability #prompt-engineering

Cartucho/mAP

A Python library for evaluating the performance of neural networks for object detection.

3.0K

Archived

Python

Computer Vision

Testing

#object-detection #neural-network #evaluation

ianarawjo/ChainForge

An open-source visual programming environment for battle-testing prompts to large language models.

3.0K

Active

TypeScript

Prompt Engineering

LLM Frameworks

TypeScript

#ai #evaluation #large-language-models

FreedomIntelligence/LLMZoo

Provides data, models, and an evaluation benchmark for large language models.

3.0K

Archived

Python

React

#LLM #Large Language Models #Benchmarking

google-research/t5x

T5X is a flexible and extensible framework for training and evaluating T5 models, a popular family of language models.

3.0K

Active

Python

LLM Frameworks

API Frameworks

Python

#language-models #training #evaluation

google/cel-go

Fast, portable expression evaluator with gradual typing for safe, non-Turing-complete scripting in Go.

2.9K

Active

General Utilities

API Frameworks

#expression-language #expression-evaluator #cel

microsoft/table-transformer

Deep learning model for extracting & analyzing table structures from PDFs and images with datasets.

2.9K

Archived

Python

Computer Vision

ETL & Pipelines

PyTorch

#table-extraction #computer-vision #document-processing
