Explore Projects

Discover 14 open source projects

Active filters (1):
Search: datalakes

Showing 1-14 of 14 projects

sinaptik-ai/pandas-ai

Conversational data analysis with LLMs using natural language queries on databases, CSVs, and data lakes.

23.3K
Stable
Python
Agents & Orchestration
RAG Frameworks
#llm #rag #data-analysis

trinodb/trino

Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.

12.6K
Active
Java
Databases
#big-data #analytics #data-science

StarRocks/starrocks

A high-performance open source query engine for sub-second analytics on the data lakehouse.

11.4K
Active
Java
Databases
#analytics #big-data #database

activeloopai/deeplake

Versatile database for AI, supporting storage, querying, versioning, and visualization of any AI data.

9.0K
Active
C++
LLM Frameworks
Vector Databases
PyTorch
#ai #data-storage #vector-database

GreptimeTeam/greptimedb

Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

6.0K
Active
Rust
Databases
API Frameworks
#analytics #cloud-native #distributed

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

DataLinkDC/dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.

3.7K
Stable
Java
ETL & Pipelines
Databases
Apache Flink
#datalake #datawarehouse #flink

lakesoul-io/LakeSoul

LakeSoul is a cloud-native, real-time Lakehouse framework for fast data ingestion and analytics on cloud storage.

3.2K
Active
Java
API Frameworks
Databases
#big-data #lakehouse #streaming

apache/paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark.

3.2K
Active
Java
ETL & Pipelines
Realtime
#big-data #data-ingestion #flink

apache/gravitino

An open-source data catalog platform for building a high-performance, federated metadata lake.

2.9K
Active
Java
Databases
ETL & Pipelines
#data-catalog #datalake #federated-query

tansu-io/tansu

Apache Kafka-compatible broker with support for S3, PostgreSQL, SQLite, Apache Iceberg, and Delta Lake.

1.6K
Active
Rust
API Frameworks
Databases
#apache-kafka #s3 #postgresql

leo-project/leofs

LeoFS is a distributed, scalable, and fault-tolerant object storage system for developers working with large data volumes.

1.6K
Active
Erlang
Serverless
Databases
#distributed-storage #object-storage #fault-tolerant

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.

1.2K
Active
Java
ETL & Pipelines
ML Ops
#identity-resolution #entity-resolution #data-deduplication

lensesio/stream-reactor

A collection of open-source Kafka connectors for various data sources and destinations maintained by Lenses.io.

1.1K
Active
Scala
MCP Frameworks
ETL & Pipelines
#kafka #connectors #data-integration
