Explore Projects

Discover 96 open source projects

Search: crawling

Showing 21-40 of 96 projects

MontFerret/ferret

Declarative web scraping library written in Go, providing a powerful DSL for extracting data from websites.

5.9K
Stable
Go
Backend Frameworks
CLI Tools
#web-scraping #crawler #data-mining

yujiosaka/headless-chrome-crawler

A powerful, distributed web crawler powered by Headless Chrome for scraping websites at scale.

5.7K
Archived
JavaScript
Backend Frameworks
CLI Tools
Node.js
#web-crawler #headless-chrome #scraper

rmax/scrapy-redis

A Redis-based distributed scraping library for the Scrapy web crawling framework.

5.6K
Archived
Python
API Frameworks
Caching
Scrapy
#crawler #distributed #redis

SpiderClub/haipproxy

A high-availability distributed IP proxy pool powered by Scrapy and Redis for web crawling applications.

5.6K
Archived
Python
API Frameworks
Containerization
Scrapy
#crawler #distributed #high-availability

adbar/trafilatura

Gathers text and metadata from the web using crawling, scraping, and extraction techniques.

5.4K
Stable
Python
#web-scraping #text-extraction #metadata-gathering

hect0x7/JMComic-Crawler-Python

Python API for crawling and downloading content from the JMComic website, a popular source for adult manga/comics.

5.4K
Active
Python
Backend & APIs
CLI Tools
#crawler #downloader #jmcomic

hakluke/hakrawler

A fast, simple web crawler designed for quick discovery of endpoints and assets within a web application.

5.0K
Archived
Go
Penetration Testing
CLI Tools
#crawling #hacking #osint

lc/gau

A Go-based tool to fetch known URLs from various threat intelligence sources for security analysis.

4.8K
Archived
Go
Security Research
CLI Tools
#security #threat-intelligence #url-discovery

201206030/novel-plus

A content management system for novels, with recommendation, search, and reading features.

4.5K
Stable
Java
API Frameworks
Backend Frameworks
Spring
#book #crawl #novel

hanc00l/wooyun_public

A web crawler and search engine for the now-defunct Wooyun security vulnerability database.

4.4K
Archived
PHP
Security Research
Backend Frameworks
#security #vulnerability #database

omkarcloud/botasaurus

A Python framework for building web scrapers designed to evade bot detection.

4.1K
Active
Python
Backend Frameworks
CLI Tools
#web-scraping #anti-detection #undetectable

exa-labs/exa-mcp-server

A TypeScript-based server for web search and web crawling, part of the Exa MCP ecosystem.

3.9K
Active
TypeScript
MCP Servers
Search-as-a-Service
#code-search #web-crawling #mcp

hardkoded/puppeteer-sharp

A .NET API for Headless Chrome, used for web automation, crawling, and end-to-end testing.

3.9K
Active
C#
Testing
Backend Frameworks
#automation #chrome #crawling

DedSecInside/TorBot

A dark web OSINT tool written in Python that crawls and extracts information from the Tor network.

3.8K
Active
Python
Security Research
API Frameworks
#osint #dark-web #tor

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block, useful for privacy-conscious developers.

3.7K
Stable
Python
#authentication #privacy #crawlers

jasonxtn/Argus

A toolkit for information gathering and reconnaissance, including OSINT and web crawling.

3.3K
Stable
Python
CLI Tools
Security Research
#osint #information-gathering #reconnaissance

apache/nutch

Apache Nutch is an extensible and scalable web crawler for building search engines and data mining applications.

3.1K
Active
Java
API Frameworks
Backend Frameworks
#apache #crawling #hadoop

CrawlScript/WebCollector

An open-source web crawler framework written in Java that makes it easy to build multi-threaded web crawlers.

3.1K
Stable
Java
API Frameworks
CLI Tools
#web-crawler #multi-threaded #open-source

crawl/crawl

The official repository for the classic roguelike game Dungeon Crawl: Stone Soup.

2.8K
Active
C++
Roguelike
Backend Frameworks
#roguelike #dungeon-crawl #stone-soup

geziyor/geziyor

Geziyor is a fast web crawling and scraping framework for Go that supports JavaScript rendering.

2.8K
Experimental
Go
API Frameworks
CLI Tools
#crawler #scraper #web-scraping
