A unified evaluation framework for large language models, focused on prompt engineering and model robustness.
This repository provides a benchmark for evaluating the complex reasoning ability of large language models using chain-of-thought prompting.
Laminar is an open-source observability platform purpose-built for AI agents and workflows.
HyperFormula is an open-source headless spreadsheet engine for building business web apps, with features such as formulas and CRUD operations, among others.
EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving large language models (LLMs) in JAX/Flax.
A dependency-free TypeScript library for scheduling tasks and evaluating cron expressions.
A streamlined framework for efficient evaluation and performance benchmarking of large models like LLMs and VLMs.
Evaluate is a library for easily evaluating machine learning models and datasets.
A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.
An open-source platform for evaluating and improving Generative AI applications with 20+ preconfigured checks and root cause analysis.
An open-source platform for AI engineering with LLM observability, GPU monitoring, and prompt management tools.
A C# expression interpreter for evaluating dynamic expressions at runtime.
An open-source platform for evaluating state-of-the-art AI and machine learning models and challenges.
A PyTorch toolbox for advanced face recognition tasks like masked face recognition and fairness evaluation.
A PyTorch toolkit for training and evaluating LSTM and QRNN language models.
A benchmark for evaluating different implementations of Variational Autoencoders (VAEs) in PyTorch.
An automatic evaluator for instruction-following language models with human-validated, high-quality, cheap, and fast evaluation.
A Python library for learning and evaluating knowledge graph embeddings.
An integrated solution for building and evaluating knowledge graphs using AI tools like GraphRAG and LightRAG.
This Python project aims to compare and evaluate the telemetry of various EDR (Endpoint Detection and Response) products.