Converts complex documents into LLM-ready formats for agentic workflows.
Dolphin is a document image parsing library that uses heterogeneous anchor prompting for OCR and layout analysis.
A system for agentic, LLM-powered data processing and ETL workflows that analyze unstructured data.
A C# library for reading and extracting text and other content from PDF files, ported from the Java PDFBox library.
An on-premises, OCR-free toolkit for unstructured data extraction, Markdown conversion, and benchmarking.
An AI-powered document understanding and OCR platform from Alibaba Research.
A curated list of resources for Document Understanding (DU) related to machine learning and natural language processing.
Open-source platform for extracting structured data from documents using AI.
An open-source toolkit for general OCR research and applications, with integrated training, evaluation, and production-ready OCR systems.