This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.
A high-quality PDF to Markdown conversion tool powered by large language model visual recognition.
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
Automated machine learning for analytics & production use cases powered by popular ML libraries.
Data augmentation for NLP using CNN and RNN models, presented at EMNLP 2019.
UIforETW is a C++ tool that provides a user interface for recording and managing ETW traces.
A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.
Open-source reverse ETL tool for data activation and customer data platform integration.
Apache Kafka-compatible broker with support for S3, PostgreSQL, SQLite, Apache Iceberg, and Delta Lake.
A Python library for scraping real-time data from Douyin (the Chinese counterpart of TikTok) live streams, including comments and metadata.
A Ruby library that enables streaming replication from MongoDB to PostgreSQL databases.
Concurrent data pipelines in Python for building efficient and scalable data processing workflows.
A Python library for scraping soccer data from various sources for sports analytics and data science.
A collection of Python web scraping scripts for various websites and platforms, including music, video, and real estate data.
Dozer is a real-time data movement tool that leverages CDC to move data between various sources and sinks.
A data quality and observability tool for monitoring and fixing data issues before they become problems.
A Python library that uses LLMs and embeddings to process datasets with up to 1000x speedups.
A web crawler tool that outputs WARC files and provides a dashboard for managing crawls.
A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.
Agile data preparation workflows made easy with popular Python data science libraries.