Explore Projects

Discover 6 open source projects

Active filters (1):
Search: extract-data


opendatalab/MinerU

Converts complex documents into LLM-ready formats for agentic workflows

55.5K
Active
Python
Agents & Orchestration
Agent Coordination
#document-analysis #pdf-extraction #llm-workflows

pymupdf/PyMuPDF

A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.

9.2K
Active
Python
Document Processing
#pdf #data-extraction #text-processing

bda-research/node-crawler

A Node.js-based web crawler/spider for extracting data from websites using cheerio and jQuery.

6.8K
Experimental
TypeScript
React
#crawler #data-extraction #javascript

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

2.4K
Active
Python
ETL & Pipelines
API Frameworks
#data-integration #data-pipelines #etl

DocumindHQ/documind

Open-source platform for extracting structured data from documents using AI.

1.5K
Experimental
JavaScript
React
#document-extraction #pdf-extractor #ai

elixir-crawly/crawly

Crawly is a high-level web crawling and scraping framework for Elixir, enabling developers to extract data from websites efficiently.

1.1K
Experimental
Elixir
Backend Frameworks
Caching
#crawler #crawling #scraper
