Explore Projects

Discover 8 open source projects

Active filters (1):
Search: pdf-parser

Showing 1-8 of 8 projects

PaddlePaddle/PaddleOCR

PaddleOCR converts documents/images to structured data for AI apps

71.6K
Active
Python
Computer Vision
MCP Servers
PaddlePaddle
#ocr #document-parsing #ai4science

opendatalab/MinerU

Converts complex documents into LLM-ready formats for agentic workflows

55.5K
Active
Python
Agents & Orchestration
Agent Coordination
Python
#document-analysis #pdf-extraction #llm-workflows

py-pdf/pypdf

A pure-Python library for manipulating PDF documents, including splitting, merging, cropping, and transforming pages.

9.8K
Active
Python
API Frameworks
#pdf #pdf-manipulation #pdf-parser

bytedance/Dolphin

Dolphin is a document image parsing library that uses heterogeneous anchor prompting for OCR and layout analysis.

8.9K
Stable
Python
Computer Vision
API Frameworks
Python
#document-analysis #layout-analysis #ocr

opendataloader-project/opendataloader-pdf

Fast local PDF-to-Markdown/JSON converter for RAG pipelines. No GPU needed.

1.8K
Active
Java
RAG Frameworks
RAG & Vector
Java
#pdf-parser #rag-pipeline #markdown-conversion

yobix-ai/extractous

Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.

1.7K
Archived
Rust
ETL & Pipelines
Rust
#data-extraction #unstructured-data #etl

dromara/yft-design

A powerful, feature-rich online design tool built with Vue3, fabric.js, and Element Plus for creating posters, product images, and more.

1.5K
Stable
TypeScript
Component Libraries (Vue/Svelte)
Frontend Frameworks
Vue.js
#canvas-editor #online-design #online-editor

NanoNets/docstrange

An intelligent document parsing tool that extracts and converts data from various document formats to structured data like Markdown, JSON, CSV, and HTML.

1.4K
Stable
Python
LLM Wrappers & SDKs
API Frameworks
Python
#ocr #pdf-parser #document-parsing
