Explore Projects

Discover 162 open-source projects

Active filters (1):
Search: scraper

Showing 1-20 of 162 projects

firecrawl/firecrawl

Convert websites into LLM-ready data with API for scraping, crawling, and structured data extraction

88.5K
Active
TypeScript
Web Scraping AI
Agents & Orchestration
TypeScript
#ai-scraping #web-crawler #llm-data

unclecode/crawl4ai

LLM-friendly web crawler & scraper for RAG, agents, and data pipelines

61.4K
Active
Python
RAG & Vector
CLI Tools
Python
#web-crawler #llm-ready #markdown

huginn/huginn

Automated agents for monitoring and acting on your behalf online

48.8K
Active
Ruby
Agent Coordination
Agents & Orchestration
Ruby
#agent-automation #monitoring #notifications

NaiboWang/EasySpider

Visual code-free web crawler/spider with GUI for data collection and automation

44.0K
Active
JavaScript
Testing
No-Code AI Platforms
#web-crawler #data-collection #gui

iawia002/lux

Fast video downloader in Go for various platforms

30.9K
Stable
Go
CLI Tools
Full-Stack Frameworks
Go
#video-downloader #go #cli

cheeriojs/cheerio

Fast and flexible HTML parser for TypeScript

30.1K
Active
TypeScript
AI Code Generation
Component Libraries (React)
Cheerio
#cheerio #dom #htmlparser

feder-cr/Jobs_Applier_AI_Agent_AIHawk

AIHawk automates job applications using AI for tailored submissions.

29.4K
Stable
Python
Agents & Orchestration
Browser Agents
Python
#ai-agent #job-automation #chrome-automation

gocolly/colly

Go Colly - Elegant Scraper and Crawler Framework for Golang

25.1K
Active
Go
CLI Tools
Backend Frameworks
Go
#golang #scraper #crawler

D4Vinci/Scrapling

Powerful, flexible Python library for effortless web scraping with AI-powered features.

23.6K
Active
Python
Web Scraping
Backend Frameworks
Python
#web-scraping #automation #data-extraction

ScrapeGraphAI/Scrapegraph-ai

AI-powered web scraping library for extracting data from websites and documents

22.9K
Active
Python
Web Scraping AI
RAG & Vector
Python
#ai-scraping #llm #rag

apify/crawlee

Web scraping and browser automation library for Node.js

22.0K
Active
TypeScript
Browser Automation SDKs
Testing
Node.js
#web-scraping #browser-automation #nodejs

Evil0ctal/Douyin_TikTok_Download_API

A high-performance async web scraping tool for extracting data from Douyin, TikTok, Bilibili and more.

16.5K
Stable
Python
API Frameworks
FastAPI
#api #async #scraper

getmaxun/maxun

Turn websites into clean data pipelines & structured APIs in minutes with a low-code web scraping tool.

15.2K
Active
TypeScript
API Clients & Testing
React
#web-scraping #automation #no-code

codelucas/newspaper

A Python library for extracting news articles, full-text, and metadata from websites.

15.0K
Stable
HTML
Backend Frameworks
Python
#web-scraping #news-extraction #data-extraction

alex000kim/nsfw_data_scraper

A collection of scripts to aggregate image data for training NSFW image classifiers.

12.5K
Archived
Shell
Computer Vision
#content-moderation #deep-learning #machine-learning

seleniumbase/SeleniumBase

Python APIs for web automation, testing, and bypassing bot-detection with ease.

12.4K
Active
Python
Frontend Frameworks
#web-automation #test-automation #bot-detection

instaloader/instaloader

A Python library for downloading photos, videos, and metadata from Instagram.

11.7K
Active
Python
Backend & APIs
#instagram #instagram-downloader #instagram-scraper

code4craft/webmagic

A scalable web crawler framework for Java developers to build custom web scrapers and data extraction tools.

11.7K
Stable
Java
API Frameworks
#crawler #scraping #framework

pwxcoo/chinese-xinhua

A comprehensive Chinese dictionary dataset for developers working on Chinese NLP projects.

11.5K
Archived
Python
JSON Dataset
Python
#chinese #chinese-nlp #data

yusufkaraaslan/Skill_Seekers

Automatically convert documentation, GitHub repos, and PDFs into Claude AI skills with conflict detection.

10.2K
Active
Python
AI Code Generation
MCP Servers
Python
#ai-tools #automation #claude-ai