Explore Projects

Discover 8 open source projects

Search: data-lake

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

dlt-hub/dlt

An open-source Python library that simplifies the process of loading data into data lakes and warehouses.

5.0K
Active
Python
ETL & Pipelines
CLI Tools
#data-engineering #data-loading #data-pipelines

san089/Udacity-Data-Engineering-Projects

A collection of Udacity data engineering projects showcasing various tools and technologies.

1.8K
Archived
Python
Airflow
#data-engineering #cloud #infrastructure

bytedance/bitsail

Distributed high-performance data integration engine for batch, streaming, and incremental scenarios.

1.7K
Archived
Java
Flink
#authentication #streaming #real-time

san089/goodreads_etl_pipeline

An end-to-end data pipeline for building a data lake, data warehouse, and analytics platform from GoodReads data.

1.5K
Archived
Python
ETL & Pipelines
Background Jobs
Apache Airflow
#data-engineering #etl-pipeline #data-lake

lakekeeper/lakekeeper

Lakekeeper is an open-source, secure, and fast Apache Iceberg REST Catalog written in Rust for data lakehouse governance.

1.2K
Active
Rust
Databases
API Frameworks
#catalog #data-lake #iceberg

apache/amoro

Apache Amoro is an open-source Lakehouse management system built on open data lake formats like Iceberg and Hudi, working with compute engines such as Flink.

1.1K
Active
Java
Databases
ETL & Pipelines
Flink
#big-data #data-lake #lakehouse

Teradata/kylo

Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.

1.1K
Archived
Java
ETL & Pipelines
Realtime
#data-lake #hadoop #spark
