Explore Projects

Discover 386 open source projects

Active filters (1):
Search: extract

Showing 161-180 of 386 projects

meltano/meltano

Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.

2.4K
Active
Python
ETL & Pipelines
API Frameworks
Python
#data-integration #data-pipelines #etl

landing-ai/agentic-doc

A Python library for extracting structured data from documents using advanced AI techniques.

2.4K
Active
Python
LLM Frameworks
API Clients & Testing
#document-extraction #ocr #information-extraction

heucoder/dimensionality_reduction_alo_codes

Dimensionality reduction algorithms implemented in Python

2.4K
Archived
Python
React
#dimensionality-reduction #data-reduction #python-implementation

microsoft/PIKE-RAG

PIKE-RAG is a specialized knowledge extraction and reasoning framework for AI-powered applications.

2.4K
Stable
Python
RAG & Vector
RAG Frameworks
#knowledge-extraction #reasoning #industrial-ai

lucasjinreal/weibo_terminater

A powerful Python-based web scraper for extracting data from Weibo, a popular Chinese social media platform.

2.3K
Archived
Python
Backend & APIs
Scraping & ETL
Python
#web-scraper #social-media-data #chinese-corpus

chezou/tabula-py

A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.

2.3K
Archived
Python
Databases
Backend Frameworks
Python
#pdf #tabula #pandas

feature-engine/feature_engine

Open-source Python library for feature engineering and selection, compatible with scikit-learn.

2.2K
Active
Python
Feature Engineering
ORMs & Query Builders
scikit-learn
#feature-engineering #feature-extraction #feature-selection

microsoft/Microsoft365DSC

A PowerShell module that manages and configures Microsoft 365 tenant configurations.

2.2K
Active
PowerShell
API Frameworks
Infrastructure as Code
#microsoft365 #configuration-as-code #devops

login-securite/lsassy

A Python library for remotely extracting credentials from the Windows Local Security Authority Subsystem Service (LSASS).

2.2K
Stable
Python
Security Research
Authentication
#windows #credentials #lsass

ckreibich/scholar.py

A Python parser for Google Scholar, allowing developers to extract information from scholarly articles.

2.2K
Archived
Python
Authentication
React
#scholarly-articles #information-extraction #google-scholar

ageitgey/node-unfluff

A Node.js library for automatically extracting content from HTML documents.

2.2K
Archived
HTML
Backend Frameworks
CLI Tools
Node.js
#html-parsing #web-scraping #content-extraction

tobi/delayed_job

A Ruby library for handling asynchronous background jobs with database-backed priority queues.

2.2K
Archived
Ruby
Background Jobs
#background-jobs #queue #priority-queue

cthackers/adm-zip

A JavaScript library for creating and extracting ZIP files, usable in both memory and on disk.

2.2K
Experimental
JavaScript
General Utilities
API Frameworks
Node
#zip #compression #file-management

brightdata/brightdata-mcp

A powerful MCP server that provides an all-in-one solution for public web access and data extraction.

2.2K
Active
JavaScript
MCP Servers
Backend Frameworks
Node.js
#mcp #web-scraping #data-extraction

php-embed/Embed

A PHP library that allows developers to easily extract metadata from any web service or page.

2.1K
Stable
PHP
API Clients & Testing
Backend Frameworks
#oembed #opengraph #scraping

minimaxir/facebook-page-post-scraper

A Python scraper for extracting data from Facebook Page posts for statistical analysis.

2.1K
Archived
Python
API Clients & Testing
Backend Frameworks
Python
#facebook #scraper #data-analysis

Achno/gowall

A Go-based tool for processing images, with features such as color palette extraction, OCR, and upscaling.

2.0K
Active
Go
Computer Vision
Backend Frameworks
#image-processing #color-palette #ocr

baidu/Senta

Open-source sentiment analysis system for aspect-level sentiment classification and opinion target extraction.

2.0K
Archived
Python
NLP
API Frameworks
PaddlePaddle
#sentiment-analysis #aspect-level-sentiment #opinion-target-extraction

symfony/http-client-contracts

A set of HTTP client abstractions for building reusable HTTP clients in PHP.

2.0K
Active
PHP
API Frameworks
HTTP Clients
Symfony
#http-client #api #symfony

harry0703/AudioNotes

A Python library that quickly extracts structured Markdown notes from audio and video content.

2.0K
Archived
Python
AI Voice & Speech
CLI Tools
Python
#asr #transcription #note-taking
