Explore Projects

Discover 385 open source projects

Active filters (1):
Search: extract

Showing 41-60 of 385 projects

rednote-hilab/dots.ocr

A multilingual document layout parsing model that can extract text, images, and structure from documents in a single vision-language model.

7.9K
Stable
Python
Computer Vision
Component Libraries (React)
React
#document-parsing #ocr #layout-extraction

tabulapdf/tabula

Tabula is a tool for extracting data from PDF files, allowing developers to easily parse tables and export their contents.

7.3K
Experimental
CSS
API Frameworks
ETL & Pipelines
#pdf #scraping #data-extraction

alirezamika/autoscraper

A powerful, lightweight web scraping library for Python that can automate data extraction from websites.

7.1K
Experimental
Python
Backend & APIs
CLI Tools
Python
#web-scraping #automation #data-extraction

deeppavlov/DeepPavlov

An open-source deep learning library for building end-to-end dialog systems and chatbots.

7.0K
Stable
Python
TensorFlow
#deep-learning #dialogue-agents #natural-language-processing

Momo707577045/m3u8-downloader

A JavaScript library for extracting and downloading m3u8 videos from online sources.

6.9K
Experimental
JavaScript
API Clients & Testing
Frontend Frameworks
JavaScript
#m3u8 #video-downloader #blob

bda-research/node-crawler

A Node.js-based web crawler/spider for extracting data from websites using cheerio and jQuery.

6.8K
Experimental
TypeScript
React
#crawler #data-extraction #javascript

isnowfy/snownlp

A Python library for processing Chinese text, including sentiment analysis, keyword extraction, and more.

6.6K
Archived
Python
NLP Frameworks
API Frameworks
#chinese-nlp #sentiment-analysis #keyword-extraction

kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core for extracting text, metadata, and structured information from various file formats.

6.6K
Active
HTML
API Clients & Testing
API Documentation
#document-intelligence #metadata-extraction #pdf-extraction

axa-group/nlp.js

A JavaScript NLP library for building bots with entity extraction, sentiment analysis, and more.

6.6K
Archived
JavaScript
React
#natural-language-processing #nlp #javascript

briangonzalez/jquery.adaptive-backgrounds.js

A jQuery plugin for extracting the dominant color from images and applying it to their parent.

6.5K
Archived
JavaScript
Animation & Motion
General Utilities
jQuery
#image-processing #color-extraction #css-styling

DerekYRC/mini-spring

A simplified version of the Spring framework that helps you learn Spring's core principles and source code.

6.3K
Stable
Java
API Frameworks
Learning & Education
Spring
#spring #springboot #framework

cloudquery/cloudquery

Data pipelines for cloud config and security data, enabling CSPM, FinOps, and vulnerability management solutions.

6.3K
Active
Go
API Frameworks
ETL & Pipelines
Go
#cloud #security #data-engineering

MontFerret/ferret

Declarative web scraping library written in Go, providing a powerful DSL for extracting data from websites.

5.9K
Stable
Go
Backend Frameworks
CLI Tools
#web-scraping #crawler #data-mining

matthewmueller/x-ray

A versatile and powerful web scraping library for JavaScript, designed to help developers extract data from the web with ease.

5.9K
Active
JavaScript
Frontend Frameworks
API Frameworks
Node.js
#web-scraping #data-extraction #crawling

Trusted-AI/adversarial-robustness-toolbox

A Python library for machine learning security, providing tools for adversarial attacks and defenses.

5.9K
Stable
Python
AI SDKs & Wrappers
Security Research
Python
#adversarial-attacks #adversarial-examples #machine-learning-security

postlight/parser

A parser library to extract meaningful content from web pages, built with a focus on performance and extensibility.

5.8K
Archived
JavaScript
Frontend Frameworks
API Frameworks
React
#parsing #web-scraping #html-extraction

vi3k6i5/flashtext

A powerful Python library for keyword extraction and text processing for natural language tasks.

5.7K
Experimental
Python
NLP
Data Extraction
#keyword-extraction #text-processing #nlp

firecrawl/firecrawl-mcp-server

Firecrawl MCP Server adds powerful web scraping and search capabilities to LLM clients such as Cursor and Claude.

5.7K
Active
JavaScript
MCP Servers
LLM Wrappers & SDKs
JavaScript
#web-scraping #search-api #llm-integration

adbar/trafilatura

Gathers text and metadata from the web using crawling, scraping, and extraction techniques.

5.4K
Stable
Python
React
#web-scraping #text-extraction #metadata-gathering

allure-framework/allure2

Allure Report is a flexible, lightweight multi-language test reporting tool that provides clear graphical reports.

5.3K
Active
Java
Testing
Documentation
#reporting #test-reporting #graphical-reports
