Explore Projects

Discover 17 open source projects

Active filters (1):
Search: web-crawler

firecrawl/firecrawl

Convert websites into LLM-ready data with an API for scraping, crawling, and structured data extraction

88.5K
Active
TypeScript
Web Scraping AI
Agents & Orchestration
TypeScript
#ai-scraping #web-crawler #llm-data

ScrapeGraphAI/Scrapegraph-ai

AI-powered web scraping library for extracting data from websites and documents

22.9K
Active
Python
Web Scraping AI
RAG & Vector
Python
#ai-scraping #llm #rag

apify/crawlee

Web scraping and browser automation library for Node.js

22.0K
Active
TypeScript
Browser Automation SDKs
Testing
Node.js
#web-scraping #browser-automation #nodejs

crawlab-team/crawlab

A distributed web crawler admin platform for managing spiders in any language or framework.

12.2K
Stable
Go
Backend Frameworks
Go
#crawling #spider-management #distributed

ssssssss-team/spider-flow

A no-code web crawler platform that lets developers define crawling workflows visually.

11.3K
Archived
Java
Web Development
Java
#crawler #jsoup #spider

apify/crawlee-python

Crawlee is a powerful web scraping and browser automation library for Python to build reliable crawlers.

8.2K
Active
Python
API Clients & Testing
Backend Frameworks
Playwright
#web-scraping #crawling #automation

BruceDone/awesome-crawler

A comprehensive collection of web crawlers and scrapers in various programming languages.

7.1K
Archived
Backend Frameworks
CLI Tools
#web-crawler #web-scraper #scraper

adithya-s-k/omniparse

A Python library for ingesting, parsing, and optimizing any data format for enhanced compatibility with GenAI frameworks.

6.8K
Stable
Python
LLM Frameworks
File Storage
Python
#ingestion-api #ocr #parser-library

firecrawl/firecrawl-mcp-server

Firecrawl MCP Server adds web scraping and search capabilities to LLM clients such as Cursor and Claude.

5.7K
Active
JavaScript
MCP Servers
LLM Wrappers & SDKs
JavaScript
#web-scraping #search-api #llm-integration

jasonxtn/Argus

A comprehensive toolkit for information gathering and reconnaissance, including OSINT, web crawling, and more.

3.3K
Stable
Python
CLI Tools
Security Research
Python
#osint #information-gathering #reconnaissance

apache/nutch

Apache Nutch is an extensible and scalable web crawler for building search engines and data mining applications.

3.1K
Active
Java
API Frameworks
Backend Frameworks
#apache #crawling #hadoop

sjdirect/abot

A cross-platform, fast, and flexible C# web crawler framework for developers building crawlers and spiders.

2.3K
Archived
C#
Backend Frameworks
#web-crawler #cross-platform #c-sharp

xianhu/PSpider

A simple and easy-to-use Python web scraping framework with support for multi-threading and proxies.

1.8K
Archived
Python
Backend & APIs
CLI Tools
Python
#crawler #web-scraper #multi-threading

MarginaliaSearch/MarginaliaSearch

An internet search engine focused on indexing the small, old, and weird parts of the web.

1.7K
Active
HTML
Backend Frameworks
Search
Java
#search-engine #web-crawler #small-web

gildas-lormeau/single-file-cli

CLI tool for saving a complete web page as a single HTML file, useful for web archiving and scraping.

1.2K
Stable
JavaScript
CLI Tools
Node.js
#web-scraping #web-archiving #cli

JustinBeckwith/linkinator

A TypeScript-based tool for finding broken links in websites, documentation, and local files.

1.2K
Active
TypeScript
Backend & APIs
Testing
Node.js
#broken-links #link-checker #seo

omrilotan/isbot

A TypeScript library to detect bots, crawlers, and spiders based on their user agent string.

1.1K
Active
TypeScript
Backend & APIs
CLI Tools
Node.js
#user-agent #web-crawlers #bots
