Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluator

Showing 81-100 of 222 projects

microsoft/promptbench

A unified evaluation framework for large language models, focused on prompt engineering and model robustness.

2.8K
Active
Python
LLM Frameworks
Testing
Python
#large-language-models #prompt-engineering #evaluation

FranxYao/chain-of-thought-hub

This repository provides a benchmark for evaluating the complex reasoning ability of large language models using chain-of-thought prompting.

2.8K
Archived
Jupyter Notebook
LLM Frameworks
Tutorials & Courses
Jupyter Notebook
#llm #benchmarking #reasoning

lmnr-ai/lmnr

Laminar is an open-source observability platform purpose-built for AI agents and workflows.

2.7K
Active
TypeScript
Agents & Orchestration
LLM Observability
TypeScript
#ai #observability #llm

handsontable/hyperformula

HyperFormula is an open-source headless spreadsheet engine for building business web apps with features like formulas, CRUD, and more.

2.6K
Active
TypeScript
Spreadsheet
Frontend Frameworks
React
#headless-spreadsheet #calculation-engine #evaluator

young-geng/EasyLM

EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving large language models (LLMs) in JAX/Flax.

2.5K
Archived
Python
LLM Frameworks
LLM Wrappers & SDKs
JAX
#large-language-models #natural-language-processing #chatbot

Hexagon/croner

A TypeScript library for scheduling tasks and evaluating cron expressions with no dependencies.

2.5K
Active
TypeScript
Scheduling & Calendars
Background Jobs
#cron #scheduling #tasks

modelscope/evalscope

A streamlined framework for efficient evaluation and performance benchmarking of large models like LLMs and VLMs.

2.5K
Active
Python
LLM Frameworks
Testing
Python
#evaluation #benchmarking #llm

huggingface/evaluate

Evaluate is a library for easily evaluating machine learning models and datasets.

2.4K
Active
Python
React
#evaluation #machine-learning #model-evaluation

pydata/numexpr

A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.

2.4K
Stable
Python
Databases
CLI Tools
NumPy
#numerical-computing #arrays #optimization

uptrain-ai/uptrain

An open-source platform for evaluating and improving Generative AI applications with 20+ preconfigured checks and root cause analysis.

2.3K
Archived
Python
LLM Frameworks
Testing
Python
#llm-eval #prompt-engineering #root-cause-analysis

openlit/openlit

An open-source platform for AI engineering with LLM observability, GPU monitoring, and prompt management tools.

2.3K
Active
Python
LLM Frameworks
Monitoring
Python
#ai-observability #gpu-monitoring #llmops

dynamicexpresso/DynamicExpresso

C# expressions interpreter that allows evaluating dynamic expressions at runtime.

2.2K
Stable
C#
API Frameworks
CLI Tools
C#
#c-sharp #expression-evaluator #lambda-expressions

Cloud-CV/EvalAI

Open-source platform for evaluating and comparing state-of-the-art AI and machine learning models through challenges.

2.0K
Active
Python
Agents & Orchestration
API Frameworks
Django
#ai #machine-learning #challenges

JDAI-CV/FaceX-Zoo

A PyTorch toolbox for advanced face recognition tasks like masked face recognition and fairness evaluation.

2.0K
Archived
Python
Computer Vision
PyTorch
#face-recognition #masked-face-recognition #fairness-evaluation

salesforce/awd-lstm-lm

A PyTorch toolkit for training and evaluating LSTM and QRNN language models.

2.0K
Archived
Python
LLM Frameworks
API Frameworks
PyTorch
#language-model #lstm #qrnn

clementchadebec/benchmark_VAE

A benchmark for evaluating different implementations of Variational Autoencoders (VAEs) in PyTorch.

2.0K
Archived
Python
Benchmarking
Variational Autoencoders
PyTorch
#benchmarking #vae #pytorch

tatsu-lab/alpaca_eval

An automatic evaluator for instruction-following language models that is human-validated, high-quality, cheap, and fast.

2.0K
Stable
Jupyter Notebook
LLM Frameworks
Evaluation
Jupyter Notebook
#deep-learning #foundation-models #large-language-models

pykeen/pykeen

A Python library for learning and evaluating knowledge graph embeddings.

2.0K
Stable
Python
Knowledge Graphs
Databases
Python
#knowledge-graph-embeddings #link-prediction #machine-learning

1517005260/graph-rag-agent

An integrated solution for building and evaluating knowledge graphs using AI tools like GraphRAG and LightRAG.

1.9K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#agentic-rag #chain-of-exploration #deepresearch

tsale/EDR-Telemetry

This Python project aims to compare and evaluate the telemetry of various EDR (Endpoint Detection and Response) products.

1.9K
Active
Python
API Frameworks
Testing
#security #monitoring #telemetry