LLM-based operators and pipelines for data preparation.
An open-source data logging library for machine learning models and data pipelines.
Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.
A lightweight stream processing library for Go developers that supports various streaming platforms.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.
Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.
Distributed high-performance data integration engine for batch, streaming, and incremental scenarios.
Open-source reverse ETL tool for data activation and customer data platform integration.
MLeap is a library for deploying machine learning pipelines to production using Scala, Python, and Spark.
Concurrent Python made simple, with support for asyncio, multiprocessing, and threading.
Superlinked is a Python framework for building high-performance search & recommendation apps with structured and unstructured data.
A data platform for building data pipelines with SQL and Python and for ingesting data from various sources.
A repository providing data science tools and examples for the Google Cloud Platform.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.
The first open-source data discovery and observability platform for data practitioners.
A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.
Fastest open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.