An official compendium for the book 'Mining the Social Web' focused on web scraping and data analysis.
An open-source data catalog platform for building a high-performance, federated metadata lake.
Deep learning model, with accompanying datasets, for extracting and analyzing table structures from PDFs and images.
Self-hosted document management system that uses OCR to digitize and archive paper documents.
An open-source data logging library for machine learning models and data pipelines.
A collection of Jupyter Notebooks demonstrating NLP techniques with libraries such as Scikit-Learn, NLTK, spaCy, and Gensim.
A unified framework for large-scale data computation that scales popular Python data tools like NumPy, Pandas, and Scikit-Learn.
Analytics, versioning, and ETL toolkit for multimodal data (video, audio, PDFs, images).
Snakemake is a workflow management system for reproducible and scalable data analysis.
A progressive PHP crawler framework that allows developers to build elegant web scrapers and crawlers.
A Python-based web scraper for extracting Amazon product data like titles, ratings, prices, images, and descriptions.
A Python interface to various data APIs, including economic and news data sources.
A collection of data science projects in Python using Jupyter Notebook.
Rill is a tool for turning datasets into dashboards using SQL, enabling BI-as-code.
DuckLake is an integrated data lake and catalog format written in C++.
A repository for the 100 Knocks of Data Science Preprocessing, focused on structured data processing.
A Python library for creating data processing pipelines using functional programming principles.
A Python library for scraping tweets from Twitter, useful for data analysis and social media monitoring.
sq is a Go-based data wrangling tool that supports a variety of data formats and databases.
Python script to parse and export Twitter archive data in various formats.