Explore Projects

Discover 170 open source projects

Search: crawler

Showing 41-60 of 170 projects

rmax/scrapy-redis

A Redis-based distributed scraping library for the Scrapy web crawling framework.

5.6K
Archived
Python
API Frameworks
Caching
Scrapy
#crawler #distributed #redis

SpiderClub/haipproxy

A high-availability distributed IP proxy pool powered by Scrapy and Redis for web crawling applications.

5.6K
Archived
Python
API Frameworks
Containerization
Scrapy
#crawler #distributed #high-availability

adbar/trafilatura

Gathers text and metadata from the web using crawling, scraping, and extraction techniques.

5.4K
Stable
Python
React
#web-scraping #text-extraction #metadata-gathering

DropsDevopsOrg/ECommerceCrawlers

A collection of Python-based web crawlers for scraping data from various e-commerce and online platforms.

5.4K
Archived
Python
Backend Frameworks
ETL & Pipelines
Scrapy
#web-scraping #data-extraction #e-commerce

hect0x7/JMComic-Crawler-Python

Python API for crawling and downloading content from the JMComic website, a popular source for adult manga/comics.

5.4K
Active
Python
Backend & APIs
CLI Tools
#crawler #downloader #jmcomic

hakluke/hakrawler

A fast, simple web crawler designed for quick discovery of endpoints and assets within a web application.

5.0K
Archived
Go
Penetration Testing
CLI Tools
#crawling #hacking #osint

Alfred1984/interesting-python

A collection of interesting Python web scraping and data analysis projects.

5.0K
Archived
Jupyter Notebook
Backend Frameworks
ETL & Pipelines
#web-scraping #data-analysis #python

niespodd/browser-fingerprinting

Analysis of bot protection systems and techniques to bypass browser fingerprinting for web scraping.

5.0K
Archived
JavaScript
Security Research
Authentication
Node.js
#bot-detection #browser-fingerprinting #web-scraping

SpiderClub/weibospider

A distributed web crawler for Weibo, built using Celery and Requests.

4.8K
Archived
Python
Backend & APIs
Caching
#web-crawler #distributed-system #data-analysis

HiddenStrawberry/Crawler_Illegal_Cases_In_China

A repository that collects news, resources, and legal regulations related to web crawlers in China.

4.5K
Experimental
HTML
API Frameworks
Security Research
#china #crawler #law

myreader-io/myGPTReader

A community-driven platform to read and chat with AI bots powered by ChatGPT for developers.

4.4K
Archived
Python
LLM Frameworks
Chatbots
Python
#chatgpt #ai #crawler

hanc00l/wooyun_public

An archived web crawler and search engine for the now-defunct Wooyun security vulnerability database.

4.4K
Archived
PHP
Security Research
Backend Frameworks
PHP
#security #vulnerability #database

dataabc/weibo-crawler

A Python-based web crawler that can scrape data, images, and videos from Weibo, a popular social media platform in China.

4.4K
Active
Python
Backend Frameworks
Data Crawling & Scraping
Python
#weibo #crawler #scraper

Arachni/arachni

Arachni is a powerful open-source web application security scanner framework for penetration testing and vulnerability detection.

4.0K
Experimental
Ruby
Security Research
API Frameworks
Ruby
#security #penetration-testing #vulnerability-detection

kanasimi/work_crawler

A JavaScript crawler that downloads comics, novels, and webcomics from various online sources.

3.9K
Stable
JavaScript
Backend Frameworks
General Utilities
Node.js
#comic-downloader #novel-downloader #web-scraper

bitmagnet-io/bitmagnet

A self-hosted BitTorrent indexer, crawler, classifier, and search engine with a web UI and API.

3.9K
Active
Go
API Frameworks
Containerization
#bittorrent #torrent #indexer

hardkoded/puppeteer-sharp

A headless Chrome .NET API for web automation, crawling, and end-to-end testing.

3.9K
Active
C#
Testing
Backend Frameworks
#automation #chrome #crawling

DedSecInside/TorBot

A dark web OSINT tool written in Python that crawls and extracts information from the Tor network.

3.8K
Active
Python
Security Research
API Frameworks
#osint #dark-web #tor

NanmiCoder/CrawlerTutorial

A Python crawler tutorial for beginners, intermediate, and advanced users.

3.7K
Active
Python
React
#crawler #tutorial #python

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block, useful for privacy-conscious developers.

3.7K
Stable
Python
React
#authentication #privacy #crawlers
