Explore Projects

Discover 222 open source projects

Active filters (1):

Search: evaluator


Showing 81-100 of 222 projects

microsoft/promptbench

A unified evaluation framework for large language models, focused on prompt engineering and model robustness.

2.8K

Active

Python

LLM Frameworks

Testing

Python

#large-language-models #prompt-engineering #evaluation

FranxYao/chain-of-thought-hub

This repository provides a benchmark for evaluating the complex reasoning ability of large language models using chain-of-thought prompting.

2.8K

Archived

Jupyter Notebook

LLM Frameworks

Tutorials & Courses

Jupyter Notebook

#llm #benchmarking #reasoning

lmnr-ai/lmnr

Laminar is an open-source observability platform purpose-built for AI agents and workflows.

2.7K

Active

TypeScript

Agents & Orchestration

LLM Observability

TypeScript

#ai #observability #llm

handsontable/hyperformula

HyperFormula is an open-source headless spreadsheet engine for building business web apps with features like formulas, CRUD, and more.

2.6K

Active

TypeScript

Spreadsheet

Frontend Frameworks

React

#headless-spreadsheet #calculation-engine #evaluator

young-geng/EasyLM

EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving large language models (LLMs) in JAX/Flax.

2.5K

Archived

Python

LLM Frameworks

LLM Wrappers & SDKs

JAX

#large-language-models #natural-language-processing #chatbot

Hexagon/croner

A TypeScript library for scheduling tasks and evaluating cron expressions with no dependencies.

2.5K

Active

TypeScript

Scheduling & Calendars

Background Jobs

#cron #scheduling #tasks

modelscope/evalscope

A streamlined framework for efficient evaluation and performance benchmarking of large models like LLMs and VLMs.

2.5K

Active

Python

LLM Frameworks

Testing

Python

#evaluation #benchmarking #llm

huggingface/evaluate

Evaluate is a library for easily evaluating machine learning models and datasets.

2.4K

Active

Python

React

#evaluation #machine-learning #model-evaluation

pydata/numexpr

A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.

2.4K

Stable

Python

Databases

CLI Tools

NumPy

#numerical-computing #arrays #optimization

uptrain-ai/uptrain

An open-source platform for evaluating and improving Generative AI applications with 20+ preconfigured checks and root cause analysis.

2.3K

Archived

Python

LLM Frameworks

Testing

Python

#llm-eval #prompt-engineering #root-cause-analysis

openlit/openlit

An open-source platform for AI engineering with LLM observability, GPU monitoring, and prompt management tools.

2.3K

Active

Python

LLM Frameworks

Monitoring

Python

#ai-observability #gpu-monitoring #llmops

dynamicexpresso/DynamicExpresso

C# expressions interpreter that allows evaluating dynamic expressions at runtime.

2.2K

Stable

API Frameworks

CLI Tools

#c-sharp #expression-evaluator #lambda-expressions

Cloud-CV/EvalAI

An open-source platform for evaluating the state of the art in AI and machine learning models and challenges.

2.0K

Active

Python

Agents & Orchestration

API Frameworks

Django

#ai #machine-learning #challenges

JDAI-CV/FaceX-Zoo

A PyTorch toolbox for advanced face recognition tasks like masked face recognition and fairness evaluation.

2.0K

Archived

Python

Computer Vision

PyTorch

#face-recognition #masked-face-recognition #fairness-evaluation

salesforce/awd-lstm-lm

A PyTorch toolkit for training and evaluating LSTM and QRNN language models.

2.0K

Archived

Python

LLM Frameworks

API Frameworks

PyTorch

#language-model #lstm #qrnn

clementchadebec/benchmark_VAE

A benchmark for evaluating different implementations of Variational Autoencoders (VAEs) in PyTorch.

2.0K

Archived

Python

Benchmarking

Variational Autoencoders

PyTorch

#benchmarking #vae #pytorch

tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models with human-validated, high-quality, cheap, and fast evaluation.

2.0K

Stable

Jupyter Notebook

LLM Frameworks

Evaluation

Jupyter Notebook

#deep-learning #foundation-models #large-language-models

pykeen/pykeen

A Python library for learning and evaluating knowledge graph embeddings.

2.0K

Stable

Python

Knowledge Graphs

Databases

Python

#knowledge-graph-embeddings #link-prediction #machine-learning

1517005260/graph-rag-agent

An integrated solution for building and evaluating knowledge graphs using AI tools like GraphRAG and LightRAG.

1.9K

Stable

Python

LLM Frameworks

Agents & Orchestration

Python

#agentic-rag #chain-of-exploration #deepresearch

tsale/EDR-Telemetry

This Python project aims to compare and evaluate the telemetry of various EDR (Endpoint Detection and Response) products.

1.9K

Active

Python

API Frameworks

Testing

#security #monitoring #telemetry

