Explore Projects

Discover 96 open-source projects

Search filter: crawling

Showing 1-20 of 96 projects

firecrawl/firecrawl

Convert websites into LLM-ready data with an API for scraping, crawling, and structured data extraction

88.5K

Active

TypeScript

Web Scraping AI

Agents & Orchestration

TypeScript

#ai-scraping #web-crawler #llm-data

scrapy/scrapy

Scrapy is a fast, high-level web crawling and scraping framework for Python developers.

60.6K

Active

Python

Testing

Python

#web-scraping #crawling #python

gocolly/colly

Go Colly - Elegant Scraper and Crawler Framework for Golang

25.1K

Active

CLI Tools

Backend Frameworks

#golang #scraper #crawler

bytedance/deer-flow

DeerFlow is a super agent harness for orchestrating sub-agents, memory, and sandboxes in deep research workflows.

24.7K

Active

Python

Multi-Agent Workflows

Agents & Orchestration

LangChain

#agent-framework #ai-orchestration #deep-research

D4Vinci/Scrapling

Powerful, flexible Python library for effortless web scraping with AI-powered features.

23.6K

Active

Python

Web Scraping

Backend Frameworks

Python

#web-scraping #automation #data-extraction

BuilderIO/gpt-crawler

Crawl websites to create custom GPTs from URLs

22.2K

Experimental

TypeScript

RAG & Vector

Web Scraping AI

TypeScript

#gpt #crawler #ai

apify/crawlee

Web scraping and browser automation library for Node.js

22.0K

Active

TypeScript

Browser Automation SDKs

Testing

Node.js

#web-scraping #browser-automation #nodejs

projectdiscovery/katana

A highly customizable web crawler and spider framework for developers to build advanced crawling solutions.

15.7K

Active

CLI Tools

#crawler #spider-framework #web-spider

codelucas/newspaper

A Python library for extracting news articles, full-text, and metadata from websites.

15.0K

Stable

HTML

Backend Frameworks

Python

#web-scraping #news-extraction #data-extraction

waditu/tushare

A Python library for crawling historical data of China stocks.

14.5K

Archived

Python

Databases

Python

#finance #fintech #stock-data

crawlab-team/crawlab

A distributed web crawler admin platform for managing spiders in any language or framework.

12.2K

Stable

Backend Frameworks

#crawling #spider-management #distributed

ssssssss-team/spider-flow

A no-code web crawler platform that lets developers define crawling workflows visually.

11.3K

Archived

Java

Web Development

Java

#crawler #jsoup #spider

dataabc/weiboSpider

Crawls and scrapes Weibo data using Python.

9.5K

Stable

Python

React

#weibo #python #scraping

kangvcar/InfoSpider

INFO-SPIDER is an open-source web scraping toolkit that helps users retrieve data from sources such as email, e-commerce, and social platforms.

8.2K

Active

Python

Backend Frameworks

ETL & Pipelines

Python

#web-scraping #data-extraction #open-source

apify/crawlee-python

Crawlee is a powerful web scraping and browser automation library for Python to build reliable crawlers.

8.2K

Active

Python

API Clients & Testing

Backend Frameworks

Playwright

#web-scraping #crawling #automation

lorien/awesome-web-scraping

A comprehensive list of libraries, tools, and APIs for web scraping and data processing.

7.8K

Active

Makefile

Backend Frameworks

ETL & Pipelines

#web-scraping #crawling #data-processing

andeya/pholcus

Pholcus is high-concurrency web crawler software written in Go for developers who need a powerful, distributed crawling solution.

7.6K

Archived

API Frameworks

CLI Tools

#crawler #spider #distributed

luyishisi/Anti-Anti-Spider

Anti-Anti-Spider is a Python library that helps developers bypass anti-crawling measures on websites to collect data.

7.3K

Archived

Python

Backend & APIs

CLI Tools

Python

#web-scraping #anti-crawling #data-collection

go-rod/rod

A Go library for automating and scraping websites using the Chrome DevTools Protocol.

6.8K

Stable

Backend Frameworks

Testing

#automation #web-scraping #chrome-devtools

wzdnzd/aggregator

A Python-based platform for aggregating and crawling proxy servers, useful for building proxy-reliant applications.

6.3K

Active

Python

API Frameworks

Realtime

#proxy #crawling #aggregation
