Explore Projects

Discover 96 open source projects

Search: crawling

Showing 1-20 of 96 projects

firecrawl/firecrawl

Convert websites into LLM-ready data with API for scraping, crawling, and structured data extraction

88.5K
Active
TypeScript
Web Scraping AI
Agents & Orchestration
TypeScript
#ai-scraping #web-crawler #llm-data

scrapy/scrapy

Scrapy is a fast, high-level web crawling and scraping framework for Python developers.

60.6K
Active
Python
Testing
Python
#web-scraping #crawling #python

gocolly/colly

Go Colly - Elegant Scraper and Crawler Framework for Golang

25.1K
Active
Go
CLI Tools
Backend Frameworks
Go
#golang #scraper #crawler

bytedance/deer-flow

DeerFlow is a super agent harness for orchestrating sub-agents, memory, and sandboxes in deep research workflows.

24.7K
Active
Python
Multi-Agent Workflows
Agents & Orchestration
LangChain
#agent-framework #ai-orchestration #deep-research

D4Vinci/Scrapling

Powerful, flexible Python library for effortless web scraping with AI-powered features.

23.6K
Active
Python
Web Scraping
Backend Frameworks
Python
#web-scraping #automation #data-extraction

BuilderIO/gpt-crawler

Crawl websites to create custom GPTs from URLs

22.2K
Experimental
TypeScript
RAG & Vector
Web Scraping AI
TypeScript
#gpt #crawler #ai

apify/crawlee

Web scraping and browser automation library for Node.js

22.0K
Active
TypeScript
Browser Automation SDKs
Testing
Node.js
#web-scraping #browser-automation #nodejs

projectdiscovery/katana

A highly customizable web crawler and spider framework for developers to build advanced crawling solutions.

15.7K
Active
Go
CLI Tools
Go
#crawler #spider-framework #web-spider

codelucas/newspaper

A Python library for extracting news articles, full-text, and metadata from websites.

15.0K
Stable
HTML
Backend Frameworks
Python
#web-scraping #news-extraction #data-extraction

waditu/tushare

A Python library for crawling historical data on Chinese stocks.

14.5K
Archived
Python
Databases
Python
#finance #fintech #stock-data

crawlab-team/crawlab

A distributed web crawler admin platform for managing spiders in any language or framework.

12.2K
Stable
Go
Backend Frameworks
Go
#crawling #spider-management #distributed

ssssssss-team/spider-flow

A no-code web crawler platform that lets developers define crawling workflows visually.

11.3K
Archived
Java
Web Development
Java
#crawler #jsoup #spider

dataabc/weiboSpider

Crawls and scrapes Weibo data using Python.

9.5K
Stable
Python
React
#weibo #python #scraping

kangvcar/InfoSpider

INFO-SPIDER is an open-source web scraping toolkit that helps users retrieve data from various sources like email, e-commerce, and social platforms.

8.2K
Active
Python
Backend Frameworks
ETL & Pipelines
Python
#web-scraping #data-extraction #open-source

apify/crawlee-python

Crawlee is a powerful web scraping and browser automation library for Python to build reliable crawlers.

8.2K
Active
Python
API Clients & Testing
Backend Frameworks
Playwright
#web-scraping #crawling #automation

lorien/awesome-web-scraping

A comprehensive list of libraries, tools, and APIs for web scraping and data processing.

7.8K
Active
Makefile
Backend Frameworks
ETL & Pipelines
#web-scraping #crawling #data-processing

andeya/pholcus

Pholcus is a high-concurrency web crawler written in Go for developers who need a powerful, distributed crawling solution.

7.6K
Archived
Go
API Frameworks
CLI Tools
#crawler #spider #distributed

luyishisi/Anti-Anti-Spider

Anti-Anti-Spider is a Python library that helps developers bypass anti-crawling measures on websites to collect data.

7.3K
Archived
Python
Backend & APIs
CLI Tools
Python
#web-scraping #anti-crawling #data-collection

go-rod/rod

A Go library for automating and scraping websites using the Chrome DevTools Protocol.

6.8K
Stable
Go
Backend Frameworks
Testing
#automation #web-scraping #chrome-devtools

wzdnzd/aggregator

A Python-based platform for aggregating and crawling proxy servers, useful for building proxy-reliant applications.

6.3K
Active
Python
API Frameworks
Realtime
#proxy #crawling #aggregation
