Explore Projects

Discover 30 open source projects

Active filters (1):
Search: extractor

Showing 1-20 of 30 projects

PaddlePaddle/PaddleOCR

PaddleOCR converts documents and images into structured data for AI applications.

71.6K
Active
Python
Computer Vision
MCP Servers
PaddlePaddle
#ocr#document-parsing#ai4science

opendatalab/MinerU

Converts complex documents into LLM-ready formats for agentic workflows

55.5K
Active
Python
Agents & Orchestration
Agent Coordination
#document-analysis#pdf-extraction#llm-workflows

YaoFANGUK/video-subtitle-extractor

A Python tool for extracting hard-coded subtitles from videos and generating SRT files using deep learning-based OCR.

8.5K
Stable
Python
Computer Vision
API Frameworks
#ocr#subtitles#srt

MetrolistGroup/Metrolist

A YouTube Music client for Android with a modern Material Design UI and features like NewPipe integration.

7.1K
Active
Kotlin
Material Design
Android
#youtube-music#material-design#android-app

peazip/PeaZip

A free and open-source file archiver and compression tool with support for various archive formats.

7.0K
Stable
Pascal
CLI Tools
General Utilities
#archiver#compression#encryption

microsoft/rushstack

A monorepo for a set of tools developed by the Rush Stack community for TypeScript-based projects.

6.4K
Active
TypeScript
Build Tools
#monorepo#toolchain#api

adbar/trafilatura

Gathers text and metadata from the web using crawling, scraping, and extraction techniques.

5.4K
Stable
Python
React
#web-scraping#text-extraction#metadata-gathering

GeneralNewsExtractor/GeneralNewsExtractor

A Python-based AI news extractor (beta).

3.8K
Experimental
Python
AI-powered coding assistants
Vibe Coders
#NewsExtractor#VibeCoders#AI-Powered

miso-belica/sumy

A Python module for automatic summarization of text documents and HTML pages.

3.7K
Stable
Python
NLP
Backend Frameworks
#html-extraction#text-summarization#nlp

luigifreda/pyslam

Python/C++ Visual SLAM pipeline for 3D reconstruction

3.1K
Active
Python
Machine Learning & AI Libraries
#pySLAM#Visual SLAM#3D Reconstruction

Purfview/whisper-standalone-win

Standalone Windows executables for Whisper speech-to-text & diarization without Python setup.

2.9K
Stable
Desktop Model Runners
AI Voice & Speech
Whisper
#speech-to-text#whisper#faster-whisper

drewnoakes/metadata-extractor

A Java library for extracting metadata from various media file formats, including images, videos, and audio.

2.8K
Experimental
Java
Libraries & Utilities
Backend Frameworks
#metadata#exif#iptc

nelenkov/android-backup-extractor

An Android backup extractor tool written in Java for developers working with Android devices.

2.5K
Active
Java
Android
CLI Tools
#android#backup#extractor

fhamborg/news-please

news-please is an integrated web crawler and information extractor for news that works out of the box.

2.4K
Stable
Python
API Frameworks
Web Crawlers
#news#web-crawler#data-extraction

UglyToad/PdfPig

A C# library for reading and extracting text and other content from PDF files, ported from the Java PDFBox library.

2.4K
Stable
C#
API Frameworks
Databases
#pdf#pdf-extraction#document-analysis

Achno/gowall

A Go-based tool for processing images, with features such as color palette extraction, OCR, and upscaling.

2.0K
Active
Go
Computer Vision
Backend Frameworks
#image-processing#color-palette#ocr

extractus/article-extractor

A Node.js library for extracting the main article content from a given URL using the Readability algorithm.

1.9K
Stable
JavaScript
API Frameworks
Backend Frameworks
Node
#article-extraction#web-scraping#readability

TeamNewPipe/NewPipeExtractor

A Java library for extracting data from various streaming platforms like YouTube, SoundCloud, and Bandcamp.

1.8K
Active
Java
API Frameworks
Backend Frameworks
#crawler#extractor#scraper

echohive42/AI-reads-books-page-by-page

An AI-powered tool that extracts knowledge and generates summaries from PDF books, page by page.

1.6K
Archived
Python
LLM Frameworks
API Frameworks
#pdf-extraction#knowledge-extraction#summarization

GravityLabs/goose

A Scala library for extracting HTML content and articles from web pages.

1.5K
Archived
Scala
Backend Frameworks
API Clients & Testing
#html-extraction#content-scraping#web-crawling