Explore Projects

Discover 9 open source projects

Active filters (1):
Search: document-analysis

Showing 1-9 of 9 projects

opendatalab/MinerU

Converts complex documents into LLM-ready formats for agentic workflows

55.5K
Active
Python
Agents & Orchestration
Agent Coordination
Python
#document-analysis #pdf-extraction #llm-workflows

bytedance/Dolphin

Dolphin is a document image parsing library that uses heterogeneous anchor prompting for OCR and layout analysis.

8.9K
Stable
Python
Computer Vision
API Frameworks
Python
#document-analysis #layout-analysis #ocr

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL workflows for unstructured data analysis.

3.7K
Active
Python
Agents & Orchestration
ETL & Pipelines
Python
#agents #data-pipelines #document-processing

UglyToad/PdfPig

A C# library for reading and extracting text and other content from PDF files, ported from the Java PDFBox library.

2.4K
Stable
C#
API Frameworks
Databases
#pdf #pdf-extraction #document-analysis

NanoNets/docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit.

1.9K
Stable
Python
Computer Vision
API Frameworks
Python
#document-analysis #document-data-extraction #ocr-benchmark

AlibabaResearch/AdvancedLiterateMachinery

An innovative AI-powered document understanding and OCR platform from Alibaba Research.

1.8K
Experimental
C++
Computer Vision
Document Intelligence
#ocr #document-recognition #document-understanding

tstanislawek/awesome-document-understanding

A curated list of resources for Document Understanding (DU) related to machine learning and natural language processing.

1.5K
Archived
Computer Vision
Natural Language Processing
#document-understanding #pdf-processing #ocr

DocumindHQ/documind

Open-source platform for extracting structured data from documents using AI.

1.5K
Experimental
JavaScript
React
#document-extraction #pdf-extractor #ai

Topdu/OpenOCR

An open-source toolkit for general OCR research and applications, with integrated training, evaluation, and production-ready OCR systems.

1.3K
Active
Python
Computer Vision
Backend Frameworks
PyTorch
#ocr #document-processing #computer-vision
