Generates synthetic tabular data for machine learning and AI applications.
ingestr is a CLI tool that copies data from one database to another with a single command.
Koalas is a pandas-like API for Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
LakeSoul is a cloud-native, real-time Lakehouse framework for fast data ingestion and analytics on cloud storage.
Apache Paimon is a lake format for building a real-time lakehouse architecture with Flink and Spark.
Heritrix is an open-source, extensible web crawler for archiving websites at scale.
A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.
A Rust library with Python bindings for interacting with Delta Lake, a storage format for data lakes.
An interactive and reactive data science platform powered by Scala and Apache Spark.
Python scripts for extracting, transforming and loading Ethereum blockchain data into Google BigQuery.
An Awesome List of open-source data engineering projects for developers.
A high-performance I/O system for large deep learning problems with strong PyTorch support.
A fast, cost-effective tool for replicating data from Postgres to data warehouses, queues, and storage.
A Python library for comparing data across databases, supporting various database engines.
An open-source platform for building data-driven applications and AI-powered solutions, aimed at vibe coders.
A collection of Python tutorials covering a wide range of topics from computer vision to network security.
An open-source developer data platform that ingests, analyzes, and visualizes data from DevOps tools to provide engineering insights.
A scalable, efficient data transformation framework that is backward compatible with dbt.
A Python library that provides customizable pipeline blocks for data processing tasks.