Explore Projects

Discover 222 open source projects

Active filters (1):
Search: evaluations

Showing 201-220 of 222 projects

kickstarter/kickstarter-autodesk-3d

A repository from Kickstarter providing Autodesk's 3D printer evaluation resources for developers.

1.1K
Archived
3D Printing
#3d-printing #autodesk #kickstarter

google-research-datasets/natural-questions

A dataset of real user questions and answers for training and evaluating question answering systems.

1.1K
Archived
Python
LLM Frameworks
Datasets
#dataset #question-answering #natural-language-processing

ezylang/EvalEx

EvalEx is a Java library for evaluating simple mathematical and boolean expressions.

1.1K
Active
Java
API Clients & Testing
API Frameworks
#expression-evaluator #expression-parser #math-expressions

rlancemartin/auto-evaluator

Evaluation tool for building and testing LLM-powered QA chains in Python.

1.1K
Archived
Python
LLM Frameworks
API Frameworks
Python
#llm #qa #testing

OpenGVLab/ScaleCUA

Open-source computer-use agents that operate across platforms, built for AI-focused developers.

1.1K
Active
Python
Agents & Orchestration
CLI Tools
Python
#computer-use-agents #cross-platform #data

EmbeddedLLM/JamAIBase

A collaborative spreadsheet-like platform for building and experimenting with AI applications.

1.1K
Active
Python
LLM Frameworks
Agents & Orchestration
Svelte
#ai #llm #orchestration

LiveBench/LiveBench

A challenging, contamination-free benchmark for evaluating the performance of large language models (LLMs).

1.1K
Active
Python
LLM Frameworks
#llm #benchmark #evaluation

hanjuku-kaso/awesome-offline-rl

An index of offline reinforcement learning (offline RL) algorithms, targeting AI and ML researchers.

1.1K
Archived
Reinforcement Learning
Tutorials & Courses
#offline-reinforcement-learning #reinforcement-learning #machine-learning

JuliaDocs/Franklin.jl

A simple, customizable, and fast static site generator in the Julia programming language.

1.1K
Stable
Julia
Static Site Generators
API Frameworks
Julia
#julia-language #markdown-parser #katex

ncalc/ncalc

A fast and lightweight .NET expression evaluator library for math and logical operations.

1.1K
Active
C#
API Frameworks
CLI Tools
dotnet
#math #expressions #parser

prometheus-eval/prometheus-eval

A Python library for evaluating the responses of large language models such as GPT-4 using Prometheus, an open-source evaluator LLM.

1.1K
Experimental
Python
LLM Frameworks
Testing
Python
#llm #gpt4 #evaluation

carlini/yet-another-applied-llm-benchmark

A benchmark for evaluating language models on a variety of practical tasks, aimed at developers building AI-powered apps.

1.0K
Experimental
Python
LLM Frameworks
Testing
Python
#language-models #evaluation #benchmark

yangxudong/deeplearning

A Python repository providing code for training, evaluating, and running inference with deep learning models.

1.0K
Archived
Python
LLM Frameworks
API Frameworks
None
#deep-learning #model-training #model-evaluation

mauriciopoppe/function-plot

A powerful 2D function plotting library for developers to visualize mathematical expressions and data.

1.0K
Active
TypeScript
Charts & Visualization
React
#function-plotting #graph-visualization #mathematical-visualization

jonrau1/ElectricEye

A Python CLI tool for multi-cloud and multi-SaaS asset management, security posture monitoring, and attack surface reduction.

1.0K
Active
Python
Security Engineering
CLI Tools
Python
#asset-management #security-auditing #cloud-security

princeton-vl/RAFT-Stereo

RAFT-Stereo is a PyTorch library for training and evaluating stereo matching models.

1.0K
Archived
Python
Computer Vision
Backend Frameworks
PyTorch
#stereo-matching #computer-vision #pytorch

bigcode-project/bigcode-evaluation-harness

A framework for evaluating autoregressive code generation language models for developers building AI-powered coding tools.

1.0K
Experimental
Python
LLM Frameworks
Testing
Python
#code-generation #model-evaluation #autoregressive

soulverteam/SoulverCore

A powerful Swift framework for evaluating natural language math expressions.

1.0K
Stable
Swift
CLI Tools
Date & Time
Swift
#calculator #currency #date-time

JudgmentLabs/judgeval

An open-source post-building layer for AI agents that provides environment data and evaluations to power agent post-training and monitoring.

1.0K
Active
Python
Agents & Orchestration
LLM Frameworks
Python
#agent #agentic-ai #llm-evaluation

pytorch/benchmark

An open-source benchmark suite for evaluating PyTorch performance across various use cases.

1.0K
Active
Python
Inference
Testing
PyTorch
#benchmark #pytorch #performance
