deanmalmgren/textract

A Python library that provides a simple and unified interface for extracting text from any document format.
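
As a minimal sketch of what that unified interface looks like in practice, the snippet below wraps textract's documented `textract.process` call, which takes a file path and returns the extracted text as bytes. The wrapper name `extract_text` and its `process` parameter are hypothetical, introduced here only so the function can be exercised without textract or its system dependencies installed.

```python
def extract_text(path, process=None, encoding="utf-8"):
    """Return the text of the document at `path` as a str.

    `process` is a hypothetical injection point for illustration; when it
    is omitted, the real `textract.process` function is used.
    """
    if process is None:
        # Imported lazily so this sketch loads even without textract installed.
        import textract  # third-party: pip install textract
        process = textract.process
    raw = process(str(path))
    # textract.process returns bytes on Python 3; decode them to str.
    return raw.decode(encoding)
```

With textract installed, `extract_text("path/to/file.pdf")` would go through the same call whatever the file type, since textract selects a parser from the file extension. Some formats also need external system tools to be installed.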

HTML
Data & Databases
ETL & Pipelines
MIT

Stars: 4.5K
Forks: 663
Created: Jul 3, 2014
Last Updated: Feb 4, 2026

Project Analytics

Stars Growth (1 Month): +39 (+0.9% change)
Avg Daily Growth (1 Month): +1.4 stars per day
Fork/Star Ratio (All Time): 14.8% (Good engagement)
Lifetime Growth: 1.0 stars/day over 4.3K days

Charts (not reproduced here): Stars, Forks, Open Issues, Pull Requests, and Commits Over Time

AI-Generated Tags

text-extraction
pdf
docx
ocr
data-pipeline
pdf-reader
document-processing
