Explore Projects

Discover 17 open-source projects

Search: web-crawlers

firecrawl/firecrawl

Convert websites into LLM-ready data with an API for scraping, crawling, and structured data extraction

88.5K

Active

TypeScript

Web Scraping AI

Agents & Orchestration

TypeScript

#ai-scraping #web-crawler #llm-data

ScrapeGraphAI/Scrapegraph-ai

AI-powered web scraping library for extracting data from websites and documents

22.9K

Active

Python

Web Scraping AI

RAG & Vector

Python

#ai-scraping #llm #rag

apify/crawlee

Web scraping and browser automation library for Node.js

22.0K

Active

TypeScript

Browser Automation SDKs

Testing

Node.js

#web-scraping #browser-automation #nodejs

crawlab-team/crawlab

A distributed web crawler admin platform for managing spiders in any language or framework.

12.2K

Stable

Backend Frameworks

#crawling #spider-management #distributed

ssssssss-team/spider-flow

A no-code web crawler platform that allows developers to define crawling workflows visually without writing code.

11.3K

Archived

Java

Web Development

Java

#crawler #jsoup #spider

apify/crawlee-python

Crawlee is a web scraping and browser automation library for building reliable crawlers in Python.

8.2K

Active

Python

API Clients & Testing

Backend Frameworks

Playwright

#web-scraping #crawling #automation

BruceDone/awesome-crawler

A comprehensive collection of web crawlers and scrapers in various programming languages.

7.1K

Archived

Backend Frameworks

CLI Tools

#web-crawler #web-scraper #scraper

adithya-s-k/omniparse

A Python library for ingesting, parsing, and optimizing any data format for enhanced compatibility with GenAI frameworks.

6.8K

Stable

Python

LLM Frameworks

File Storage

Python

#ingestion-api #ocr #parser-library

firecrawl/firecrawl-mcp-server

Firecrawl MCP Server adds web scraping and search capabilities to LLM clients such as Cursor and Claude.

5.7K

Active

JavaScript

MCP Servers

LLM Wrappers & SDKs

JavaScript

#web-scraping #search-api #llm-integration

jasonxtn/Argus

A comprehensive toolkit for information gathering and reconnaissance, including OSINT, web crawling, and more.

3.3K

Stable

Python

CLI Tools

Security Research

Python

#osint #information-gathering #reconnaissance

apache/nutch

Apache Nutch is an extensible and scalable web crawler for building search engines and data mining applications.

3.1K

Active

Java

API Frameworks

Backend Frameworks

#apache #crawling #hadoop

sjdirect/abot

A cross-platform, fast, and flexible C# web crawler framework for developers building crawlers and spiders.

2.3K

Archived

Backend Frameworks

#web-crawler #cross-platform #c-sharp

xianhu/PSpider

A simple and easy-to-use Python web scraping framework with support for multi-threading and proxies.

1.8K

Archived

Python

Backend & APIs

CLI Tools

Python

#crawler #web-scraper #multi-threading

MarginaliaSearch/MarginaliaSearch

An internet search engine focused on indexing the small, old, and weird parts of the web.

1.7K

Active

HTML

Backend Frameworks

Java

#search-engine #web-crawler #small-web

gildas-lormeau/single-file-cli

CLI tool for saving a complete web page as a single HTML file, useful for web archiving and scraping.

1.2K

Stable

JavaScript

CLI Tools

Node.js

#web-scraping #web-archiving #cli

JustinBeckwith/linkinator

A TypeScript-based tool for finding broken links in websites, documentation, and local files.

1.2K

Active

TypeScript

Backend & APIs

Testing

Node.js

#broken-links #link-checker #seo

omrilotan/isbot

A TypeScript library to detect bots, crawlers, and spiders based on their user agent string.

1.1K

Active

TypeScript

Backend & APIs

CLI Tools

Node.js

#user-agent #web-crawlers #bots
