Explore Projects

Discover 17 open source projects

Active filters (1):
Search: lakehouse
Showing 1-17 of 17 projects

ClickHouse/ClickHouse

Real-time analytics database for generating data reports

46.2K
Active
C++
Databases
#analytics#big-data#clickhouse

airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

20.8K
Active
Python
ETL & Pipelines
#data-integration#elt#etl

prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

16.7K
Active
Java
Databases
#big-data#sql#query

apache/doris

Apache Doris is a high-performance, unified analytics database for real-time data processing.

15.1K
Active
Java
Databases
Spark
#database#olap#real-time

StarRocks/starrocks

A high-performance open-source query engine for sub-second analytics on the data lakehouse.

11.4K
Active
Java
Databases
#analytics#big-data#database

databendlabs/databend

Unified cloud-native data warehouse platform for analytics, search and AI, built on top of S3 storage.

9.2K
Active
Rust
Databases
Search
Rust
#cloud-native#data-warehouse#analytics

delta-io/delta

An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.

8.6K
Active
Scala
ETL & Pipelines
API Frameworks
Spark
#big-data#data-engineering#data-lakehouse

lance-format/lance

An open-source data format for building high-performance multimodal AI applications with fast random access, vector indexing, and data versioning.

6.1K
Active
Rust
LLM Frameworks
Databases
Rust
#data-format#data-versioning#vector-index

lakesoul-io/LakeSoul

LakeSoul is a cloud-native, real-time Lakehouse framework for fast data ingestion and analytics on cloud storage.

3.2K
Active
Java
API Frameworks
Databases
#big-data#lakehouse#streaming

apache/paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark.

3.2K
Active
Java
ETL & Pipelines
Realtime
#big-data#data-ingestion#flink

apache/gravitino

An open-source data catalog platform for building a high-performance, federated metadata lake.

2.9K
Active
Java
Databases
ETL & Pipelines
Java
#data-catalog#datalake#federated-query

Mooncake-Labs/pg_mooncake

A Rust-based library that provides real-time analytics on Postgres tables, with support for columnstore tables and the Delta Lake and Iceberg formats.

1.9K
Stable
Rust
API Frameworks
Databases
#analytics#columnstore#delta-lake

apache/fluss

Apache Fluss is a real-time streaming storage platform built for big data analytics.

1.8K
Active
Java
Databases
Realtime
#big-data#real-time-analytics#streaming

datazip-inc/olake

Fastest open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.

1.3K
Active
Go
ETL & Pipelines
Realtime
#cdc#data-pipeline#elt

lakekeeper/lakekeeper

Lakekeeper is an open-source, secure, and fast Apache Iceberg REST Catalog written in Rust for data lakehouse governance.

1.2K
Active
Rust
Databases
API Frameworks
#catalog#data-lake#iceberg

apache/incubator-xtable

Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

1.2K
Active
Java
ETL & Pipelines
#interoperability#lakehouse#data-processing

apache/amoro

Apache Amoro is an open-source Lakehouse management system built on open table formats like Hudi and Iceberg, working with compute engines such as Flink.

1.1K
Active
Java
Databases
ETL & Pipelines
Flink
#big-data#data-lake#lakehouse
