Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluation

Showing 61-80 of 222 projects

Agenta-AI/agenta

An open-source LLMOps platform for prompt playground, prompt management, LLM evaluation, and LLM observability.

3.9K
Active
TypeScript
LLM Frameworks
LLM Wrappers & SDKs
React
#llm-platform#prompt-engineering#prompt-management

PrimeIntellect-ai/verifiers

A Python library for reinforcement learning environments and evaluations targeted at AI-focused developers.

3.9K
Active
Python
Agents & Orchestration
Python
#reinforcement-learning#ai-tools#evaluation

open-compass/VLMEvalKit

Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.

3.9K
Active
Python
LLM Frameworks
LLM Wrappers & SDKs
PyTorch
#chatgpt#llm#multi-modal

mseitzer/pytorch-fid

A PyTorch library for computing Fréchet Inception Distance (FID), a metric used to evaluate generative adversarial networks.

3.8K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#deep-learning#fid#fid-score

EvolvingLMMs-Lab/lmms-eval

A multimodal evaluation toolkit for assessing AI models across text, image, video, and audio tasks.

3.8K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#evaluation#multimodal#large-language-models

portfolio-performance/portfolio

A Java library for tracking and evaluating the performance of investment portfolios across stocks, crypto, and other assets.

3.7K
Active
Java
API Frameworks
ORMs & Query Builders
#investment#portfolio#finance

nmslib/nmslib

An efficient similarity search library and toolkit for evaluating k-NN methods in non-metric spaces.

3.6K
Active
C++
Computer Vision
Vector Databases
#k-nn#similarity-search#non-metric

homenc/HElib

HElib is an open-source C++ library for homomorphic encryption, supporting BGV and CKKS schemes.

3.2K
Archived
C++
Privacy Tools
Encryption
#cryptography#encryption#privacy-enhancing-technologies

conanhujinming/comments-for-awesome-courses

A GitHub repository providing comments for awesome courses on public universities' course evaluations.

3.2K
Archived
Python
Python
#comments#course-evaluation#public-university

THUDM/AgentBench

A comprehensive benchmark to evaluate large language models (LLMs) as agents for various tasks.

3.2K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#chatgpt#gpt-4#llm

openai/human-eval

A Python library for evaluating the capabilities of large language models trained on code.

3.2K
Archived
Python
LLM Frameworks
#language-model#code-generation#evaluation

embeddings-benchmark/mteb

MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.

3.2K
Active
Python
LLM Wrappers & SDKs
Search
Python
#benchmark#text-embedding#multilingual-nlp

viebel/klipse

Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs, supporting various programming languages.

3.1K
Archived
HTML
Component Libraries (React)
Frontend Frameworks
React
#code-editor#interactive-snippets#code-evaluation

langwatch/langwatch

An open platform for managing, monitoring, and optimizing large language models (LLMs) and AI workflows.

3.0K
Active
TypeScript
LLM Frameworks
LLM Wrappers & SDKs
TypeScript
#llm-ops#observability#prompt-engineering

Cartucho/mAP

A Python library for evaluating the performance of neural networks for object detection.

3.0K
Archived
Python
Computer Vision
Testing
#object-detection#neural-network#evaluation

ianarawjo/ChainForge

An open-source visual programming environment for battle-testing prompts to large language models.

3.0K
Active
TypeScript
Prompt Engineering
LLM Frameworks
TypeScript
#ai#evaluation#large-language-models

FreedomIntelligence/LLMZoo

Provides data, models, and an evaluation benchmark for large language models.

3.0K
Archived
Python
React
#LLM#Large Language Models#Benchmarking

google-research/t5x

T5X is a flexible and extensible framework for training and evaluating T5 models, a popular family of language models.

3.0K
Active
Python
LLM Frameworks
API Frameworks
Python
#language-models#training#evaluation

google/cel-go

Fast, portable expression evaluator with gradual typing for safe, non-Turing-complete scripting in Go.

2.9K
Active
Go
General Utilities
API Frameworks
Go
#expression-language#expression-evaluator#cel

microsoft/table-transformer

Deep learning model for extracting and analyzing table structures from PDFs and images, with accompanying datasets.

2.9K
Archived
Python
Computer Vision
ETL & Pipelines
PyTorch
#table-extraction#computer-vision#document-processing