Explore Projects

Discover 352 open source projects

Active filters (1):
Search: pdf

Showing 1-20 of 352 projects

justjavac/free-programming-books-zh_CN

An index of free Chinese-language programming books, covering many programming languages and topics

116.4K
Archived
Books & Guides
#books #programming #free

microsoft/markitdown

Converts files and office documents to Markdown for LLMs

90.2K
Stable
Python
MCP Servers
CLI Tools
Python
#markdown-converter #document-conversion #llm-input-preparation

fighting41love/funNLP

Comprehensive Chinese NLP resource collection for developers

79.2K
Archived
Python
LLM Frameworks
RAG & Vector
Python
#nlp #chinese-nlp #ai-resources

Stirling-Tools/Stirling-PDF

Open-source PDF platform for editing, converting, and automating PDFs with desktop, browser, and self-hosted options.

75.0K
Active
TypeScript
CLI Tools
General Utilities
TypeScript
#pdf-editor #pdf-converter #pdf-tools

PaddlePaddle/PaddleOCR

PaddleOCR converts documents and images to structured data for AI apps

71.6K
Active
Python
Computer Vision
MCP Servers
PaddlePaddle
#ocr #document-parsing #ai4science

opendatalab/MinerU

Converts complex documents into LLM-ready formats for agentic workflows

55.5K
Active
Python
Agents & Orchestration
Agent Coordination
Python
#document-analysis #pdf-extraction #llm-workflows

docling-project/docling

Converts documents to AI-ready formats with advanced parsing

55.0K
Active
Python
Computer Vision
CLI Tools
#document-parsing #pdf-converter #ocr

mozilla/pdf.js

JavaScript PDF viewer for the web

52.9K
Active
JavaScript
Component Libraries (React)
Frontend Frameworks
JavaScript
#pdf-viewer #javascript #mozilla

hiroi-sora/Umi-OCR

Offline OCR software with batch processing, PDF support, and multi-language recognition.

42.4K
Stable
Python
CLI Tools
Computer Vision
Python
#ocr #python #paddleocr

siyuan-note/siyuan

Privacy-first, self-hosted knowledge management with markdown and AI integrations

41.7K
Active
TypeScript
Full-Stack Frameworks
RAG & Vector
Electron
#knowledge-base #markdown #local-first

paperless-ngx/paperless-ngx

Document management system for scanning, indexing, and archiving documents

37.1K
Active
Python
Collaboration & Real-time
Documentation
Django
#document-management #ocr #machine-learning

ocrmypdf/OCRmyPDF

Adds an OCR text layer to scanned PDFs to make them searchable

32.8K
Active
Python
CLI Tools
Computer Vision
#ocr #pdf-processing #command-line

datalab-to/marker

Converts PDFs to markdown and JSON with high accuracy

32.2K
Active
Python
Computer Vision
CLI Tools
#pdf-to-markdown #document-processing #ai-ocr

PDFMathTranslate/PDFMathTranslate

AI-powered PDF translation for scientific documents that preserves formatting and math

32.0K
Stable
Python
MCP Servers
LLM Wrappers & SDKs
#pdf-translation #ai-translation #scientific-papers

parallax/jsPDF

Client-side PDF generation library for JavaScript

31.2K
Active
JavaScript
HTTP Clients
#pdf-generation #javascript #client-side

hehonghui/awesome-english-ebooks

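A collection of English-language e-book downloads, including issues of magazines such as The Economist and The New Yorker in epub, mobi, and pdf formats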

29.4K
Active
CSS
Awesome Lists
#ebooks #english-magazines #pdf

posquit0/Awesome-CV

LaTeX template for creating professional CVs, resumes, and cover letters

26.9K
Active
TeX
Documentation
#latex #cv-template #resume

forthespada/CS-Books

A collection of over 1000 computer science books and resources for learning and interviews.

26.4K
Stable
Books & Guides
Awesome Lists
#cs-books #algorithms #c

koodo-reader/koodo-reader

A cross-platform ebook reader with sync and backup capabilities

26.2K
Active
JavaScript
Cross-Platform
JavaScript
#ebook-reader #cross-platform #sync

koreader/koreader

Ebook reader for e-ink and mobile devices

25.6K
Active
Lua
Cross-Platform
General Utilities
#ebook-reader #lua #eink
