Explore Projects

Discover 85 open-source projects

Active filters (1):
Search: big-data

Showing 41-60 of 85 projects

TuiQiao/CBoard

An open-source BI reporting and dashboard platform with data visualization and business intelligence features.

3.1K
Stable
JavaScript
Charts & Visualization
API Frameworks
React
#business-intelligence #data-visualization #charts

apache/hugegraph

A highly scalable, high-performance graph database that supports over 100 billion data points.

3.0K
Active
Java
Databases
API Frameworks
Java
#big-data #graph #graph-database

FeatureBaseDB/featurebase

FeatureBase is a fast analytical database built on bitmaps, perfect for ML and data-intensive applications.

2.5K
Archived
Go
Databases
RAG & Vector
Go
#database #analytics #machine-learning

jostmey/NakedTensor

Bare-bones examples of machine learning in TensorFlow for developers working with AI tools.

2.4K
Archived
Python
TensorFlow Tutorials
TensorFlow
#machine-learning #linear-regression #tensorflow

quarylabs/quary

Open-source BI platform for engineers to explore and model large-scale data pipelines.

2.4K
Active
Rust
ORMs & Query Builders
ETL & Pipelines
Rust
#analytics #big-data #data-modeling

man-group/ArcticDB

ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.

2.2K
Active
C++
Databases
Caching
Python
#data-analysis #data-science #dataframe

kafbat/kafka-ui

Open-source web UI for managing clusters of Apache Kafka, a popular distributed streaming platform.

2.1K
Active
Java
API Frameworks
Databases
Java
#apache-kafka #big-data #cluster-management

Qihoo360/poseidon

A high-performance search engine, written in Go, capable of handling 100 trillion lines of log data.

2.0K
Archived
Go
API Frameworks
Search
#big-data #search-engine #golang

apache/bookkeeper

Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads.

2.0K
Active
Java
Databases
Realtime
#distributed-systems #big-data #wal

apache/datafusion-ballista

Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.

2.0K
Active
Rust
Databases
ETL & Pipelines
#big-data #dataframe #distributed

apache/kudu

Apache Kudu is a high-performance, open-source columnar storage engine for large datasets in the Apache Hadoop ecosystem.

1.9K
Active
C++
Databases
API Frameworks
#big-data #cplusplus #open-source

fluid-cloudnative/fluid

Fluid is a distributed data abstraction and acceleration framework for Big Data and AI applications on the cloud.

1.9K
Active
Go
Caching
Realtime
Kubernetes
#big-data #distributed-cache #kubernetes

apache/fluss

Apache Fluss is a real-time streaming storage platform built for big data analytics.

1.8K
Active
Java
Databases
Realtime
#big-data #real-time-analytics #streaming

gchq/Gaffer

A large-scale entity and relation database supporting aggregation of properties for big data applications.

1.8K
Experimental
Java
Databases
API Frameworks
#big-data #graph-database #hadoop

Netflix/genie

Genie is a distributed big data orchestration service that helps manage and execute complex data pipelines.

1.8K
Active
Java
API Frameworks
Caching
Spring Boot
#big-data #distributed-systems #microservices

apache/auron

The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.

1.7K
Active
Rust
Databases
API Frameworks
Spark
#big-data #distributed-computing #vectorized-execution

bytedance/bitsail

Distributed high-performance data integration engine for batch, streaming, and incremental scenarios.

1.7K
Archived
Java
Flink
#authentication #streaming #real-time

jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

1.7K
Archived
Jupyter Notebook
Databases
ETL & Pipelines
Jupyter Notebook
#big-data #data-analysis #data-science

kantord/just-dashboard

A framework-agnostic dashboard library that allows creating dashboards using YAML or JSON files.

1.6K
Archived
JavaScript
Charts & Visualization
CLI Tools
React
#dashboard #data-visualization #yaml

tonbo-io/tonbo

Tonbo is an embedded database for serverless and edge runtimes, optimized for offline-first and big data use cases.

1.5K
Active
Rust
Databases
RAG & Vector
#embedded-database #offline-first #big-data
