DVC is a data versioning and ML experiment tool that helps developers manage and track changes to data and models.
Refine high-quality datasets and visual AI models with this Python library for active learning and data curation.
Builds a Neo4j graph from unstructured data using LLMs.
A system for agentic LLM-powered data processing and ETL workflows for unstructured data analysis.
A fast and simple framework for building neural data processing pipelines using Python.
This GitHub repository provides a bootcamp on working with unstructured data, covering tasks such as reverse image search, audio search, and NLP.
Instill Core is an open-source AI infrastructure tool for orchestrating data, models, and pipelines to build AI-powered applications.
Nomic Developer API SDK is a Python library that provides tools for clustering, duplicate detection, embeddings, and topic modeling on unstructured data.
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit.
A Python library for extracting data and LLM outputs from various document types with ease.
A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.
Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.
A Python library that uses LLMs and embeddings to process datasets with up to 1000x speedups.
A Python library that helps developers extract structured data from tricky documents using vision-language models.
A curated list of resources for Document Understanding (DU) related to machine learning and natural language processing.
A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.
Interactively explore unstructured datasets like audio, images, and video using this TypeScript library.
An enterprise-grade, API-first LLM workspace for unstructured document processing, with features like data extraction, redaction, and prompt engineering.
An open-source Python library that helps curate better data for large language models (LLMs).