Explore Projects

Discover 385 open source projects

Active filters (1):
Search: extract

Showing 21-40 of 385 projects

PaddlePaddle/PaddleNLP

An easy-to-use and powerful LLM and SLM library with an awesome model zoo for natural language processing.

12.9K
Stable
Python
LLM Frameworks
React
#llm #nlp #transformers

vicc/chameleon

A Swift & Objective-C color framework that supports gradients, hex codes, and extracting colors from images.

12.4K
Archived
Objective-C
Animation & Motion
Swift
#colors #gradients #hex-codes

getomni-ai/zerox

OCR and document extraction using vision models

12.2K
Experimental
TypeScript
AI Editors/Agents/Copilot
TensorFlow
#Machine Learning #Computer Vision #Natural Language Processing

code4craft/webmagic

A scalable web crawler framework for Java developers to build custom web scrapers and data extraction tools.

11.7K
Stable
Java
API Frameworks
#crawler #scraping #framework

mozilla/readability

A standalone version of the readability lib, a tool for extracting the primary readable content from web pages.

11.0K
Active
JavaScript
Frontend Frameworks
Node
#html-parsing #content-extraction #web-scraping

vanilla-extract-css/vanilla-extract

Zero-runtime Stylesheets-in-TypeScript for building modern, performant, and scalable CSS-in-JS solutions.

10.3K
Stable
TypeScript
Component Libraries (React)
React
#css-in-js #type-safe #performance

JoeanAmier/XHS-Downloader

A Python tool for extracting and downloading content from the Chinese social media platform Xiaohongshu (Little Red Book)

10.3K
Active
Python
API Frameworks
#xiaohongshu #rednote #web-scraping

addyosmani/critical

A library for extracting and inlining critical-path CSS in HTML pages to improve performance

10.2K
Active
JavaScript
Component Libraries (React)
React
#critical-css #critical-path-css #css-optimization

jsvine/pdfplumber

A Python library that provides a powerful API for extracting text and tables from PDF files.

9.8K
Active
Python
API Frameworks
Python
#pdf #pdf-parsing #table-extraction

vuejs-templates/webpack

A full-featured Webpack + vue-loader setup with hot reload, linting, testing & css extraction.

9.7K
Archived
JavaScript
Component Libraries (Vue/Svelte)
Vue
#webpack #vue #linting

sloria/TextBlob

TextBlob is a simple, Pythonic library for natural language processing tasks like sentiment analysis, part-of-speech tagging, and more.

9.5K
Active
Python
Natural Language Processing
#nlp #sentiment-analysis #part-of-speech-tagging

opendatalab/PDF-Extract-Kit

A comprehensive toolkit for high-quality PDF content extraction, focused on developer needs.

9.4K
Archived
Python
API Frameworks
Caching
Python
#pdf #extraction #parsing

pymupdf/PyMuPDF

A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.

9.2K
Active
Python
Document Processing
#pdf #data-extraction #text-processing

blue-yonder/tsfresh

Automatic feature extraction from time series data for data science and machine learning applications.

9.1K
Stable
Jupyter Notebook
Feature Extraction
ETL & Pipelines
Python
#time-series #feature-engineering #data-science

caorushizi/mediago

A cross-platform video extraction tool that supports streaming, video, m3u8, and Bilibili video downloads.

9.0K
Active
TypeScript
API Frameworks
Frontend Frameworks
React
#downloader #m3u8 #video

YaoFANGUK/video-subtitle-extractor

A Python tool for extracting hard-coded subtitles from videos and generating SRT files using deep learning-based OCR.

8.5K
Stable
Python
Computer Vision
API Frameworks
#ocr #subtitles #srt

ericchiang/pup

A command-line tool for parsing HTML, useful for web scraping and data extraction tasks.

8.4K
Archived
HTML
Backend Frameworks
CLI Tools
Node.js
#web-scraping #data-extraction #html-parsing

lukemelas/EfficientNet-PyTorch

A PyTorch implementation of the EfficientNet deep learning model for image classification and feature extraction.

8.2K
Archived
Python
Computer Vision
PyTorch
#efficientnet #feature-extraction #imagenet

apify/crawlee-python

Crawlee is a powerful web scraping and browser automation library for Python to build reliable crawlers.

8.2K
Active
Python
API Clients & Testing
Backend Frameworks
Playwright
#web-scraping #crawling #automation

TeamWiseFlow/wiseflow

A Python-based platform for developers that uses LLMs to track websites, RSS feeds, and social media and extract information from them.

8.1K
Active
Python
LLM Frameworks
Backend Frameworks
Python
#crawler #information-gathering #information-tracker
