Explore Projects

Discover 385 open source projects

Active filters (1):
Search: extraction

Showing 1-20 of 385 projects

firecrawl/firecrawl

Converts websites into LLM-ready data, with an API for scraping, crawling, and structured data extraction

88.5K
Active
TypeScript
Web Scraping AI
Agents & Orchestration
TypeScript
#ai-scraping #web-crawler #llm-data

opendatalab/MinerU

Converts complex documents into LLM-ready formats for agentic workflows

55.5K
Active
Python
Agents & Orchestration
Agent Coordination
Python
#document-analysis #pdf-extraction #llm-workflows

naptha/tesseract.js

JavaScript OCR library for image text extraction

37.9K
Active
JavaScript
Computer Vision
General Utilities
Node.js
#ocr #javascript #tesseract

SheetJS/sheetjs

SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.

36.2K
Archived
ETL & Pipelines
General Utilities
#spreadsheet #data-extraction #csv

google/langextract

Extracts structured info from text using LLMs with source grounding

34.3K
Stable
Python
LLM Wrappers & SDKs
Local Inference Engines
Python
#llm #information-extraction #gemini

asgeirtj/system_prompts_leaks

Collection of system prompts from popular chatbots for developers

33.8K
Active
JavaScript
LLM Wrappers & SDKs
#ai #chatbots #system-prompts

D4Vinci/Scrapling

Powerful, flexible Python library for effortless web scraping with AI-powered features.

23.6K
Active
Python
Web Scraping
Backend Frameworks
Python
#web-scraping #automation #data-extraction

ScrapeGraphAI/Scrapegraph-ai

AI-powered web scraping library for extracting data from websites and documents

22.9K
Active
Python
Web Scraping AI
RAG & Vector
Python
#ai-scraping #llm #rag

apify/crawlee

Web scraping and browser automation library for Node.js

22.0K
Active
TypeScript
Browser Automation SDKs
Testing
Node.js
#web-scraping #browser-automation #nodejs

gentilkiwi/mimikatz

Extracts credentials and performs security testing on Windows systems

21.3K
Experimental
C
Penetration Testing
#security #penetration-testing #windows-security

gildas-lormeau/SingleFile

Saves web pages as single HTML files

20.5K
Active
JavaScript
CLI Tools
Frontend Frameworks
#web-archiving #single-file-export #browser-extension

Evil0ctal/Douyin_TikTok_Download_API

A high-performance async web scraping tool for extracting data from Douyin, TikTok, Bilibili and more.

16.5K
Stable
Python
API Frameworks
FastAPI
#api #async #scraper

tidwall/gjson

A fast and efficient JSON parser for Go that allows developers to quickly extract values from JSON data.

15.5K
Archived
Go
API Clients & Testing
#json #golang #parser

getmaxun/maxun

Turn websites into clean data pipelines & structured APIs in minutes with a low-code web scraping tool.

15.2K
Active
TypeScript
API Clients & Testing
React
#web-scraping #automation #no-code

Perfare/AssetStudio

AssetStudio is a tool for exploring, extracting and exporting Unity assets and asset bundles.

15.1K
Archived
C#
CLI Tools
Unity
#unity #unity-assets #asset-bundles

codelucas/newspaper

A Python library for extracting news articles, full-text, and metadata from websites.

15.0K
Stable
HTML
Backend Frameworks
Python
#web-scraping #news-extraction #data-extraction

coderamp-labs/gitingest

A Python tool that generates a prompt-friendly text extract of a GitHub codebase. Replacing 'hub' with 'ingest' in any GitHub URL opens the extract for that repository.

14.1K
Active
Python
AI Code Ingestion
Python
#code-ingestion #codebase-extraction #prompt-engineering
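
The URL rewrite mentioned in the gitingest description can be sketched in a few lines. This is only an illustration of the 'hub' to 'ingest' substitution; the helper name to_gitingest_url is hypothetical and is not part of the gitingest package.

```python
from urllib.parse import urlsplit, urlunsplit


def to_gitingest_url(github_url: str) -> str:
    """Swap the github.com host for gitingest.com, keeping the path intact.

    Hypothetical helper for illustration; not an API of the gitingest project.
    """
    parts = urlsplit(github_url)
    # Only rewrite URLs that actually point at GitHub.
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url!r}")
    return urlunsplit(parts._replace(netloc="gitingest.com"))


if __name__ == "__main__":
    print(to_gitingest_url("https://github.com/coderamp-labs/gitingest"))
```

Rewriting only the host, rather than doing a plain string replace, avoids touching a repository path that happens to contain the text 'hub'.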

interagent/http-api-design

A guide for designing HTTP APIs, extracted from work on the Heroku Platform API.

13.7K
Archived
API Documentation
#http-api #api-design #heroku

ReFirmLabs/binwalk

An open-source firmware analysis tool written in Rust that can be used to extract, analyze, and identify embedded firmware components.

13.7K
Stable
Rust
CLI Tools
#firmware #analysis #reverse-engineering

moonD4rk/HackBrowserData

A cross-platform tool to extract and decrypt browser data, supporting multiple data types.

13.6K
Stable
Go
CLI Tools
#browser #security #cross-platform
