Explore Projects

Discover 85 open source projects

Active filter: search "big-data"

Showing 1-20 of 85 projects

binhnguyennus/awesome-scalability

A curated list of resources for designing scalable, reliable, and performant large-scale systems.

69.0K
Stable
Awesome Lists
#scalability#distributed-systems#system-design

ClickHouse/ClickHouse

Open-source, column-oriented database for real-time analytics and reporting

46.2K
Active
C++
Databases
#analytics#big-data#clickhouse

apache/spark

Unified analytics engine for large-scale data processing

42.9K
Active
Scala
ETL & Pipelines
Realtime
Apache
#big-data#spark#data-processing

donnemartin/data-science-ipython-notebooks

Data science Python notebooks covering deep learning, machine learning, big data, and more.

28.9K
Archived
Python
Computer Vision
ML Ops
TensorFlow
#data-science#deep-learning#machine-learning

apache/flink

Apache Flink is a stream processing framework for real-time and batch data processing.

25.8K
Active
Java
ETL & Pipelines
Backend Frameworks
Apache Hadoop
#stream-processing#batch-processing#data-streams

thingsboard/thingsboard

Open-source IoT platform for device management, data collection, and visualization

21.3K
Active
Java
Home Automation
Realtime
#iot-platform#device-management#data-visualization

amark/gun

An open-source protocol for syncing decentralized graph data, with a focus on security and privacy.

18.9K
Experimental
JavaScript
Realtime
#blockchain#decentralized#realtime

heibaiying/BigData-Notes

A comprehensive developer guide to big data technologies such as Hadoop, Spark, and Kafka.

16.9K
Archived
Java
Databases
#big-data#hadoop#spark

prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

16.7K
Active
Java
Databases
#big-data#sql#query

andkret/Cookbook

A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.

15.0K
Active
Python
ETL & Pipelines
#data-engineering#etl#pipeline

trinodb/trino

Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.

12.6K
Active
Java
Databases
#big-data#analytics#data-science

apache/predictionio

PredictionIO is a machine learning server for developers and ML engineers, enabling building and deploying production-ready ML services.

12.5K
Archived
Scala
ML Ops
#big-data#machine-learning#predictive-analytics

vesoft-inc/nebula

Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.

12.1K
Stable
C++
Databases
#database#graph-database#distributed

yahoo/CMAK

CMAK is a tool for managing clusters of Apache Kafka, a popular distributed streaming platform.

11.9K
Archived
Scala
API Frameworks
#kafka#cluster-management#big-data

provectus/kafka-ui

An open-source web UI for managing Apache Kafka clusters, supporting developers working with event streaming.

11.9K
Archived
Java
API Frameworks
#apache-kafka#event-streaming#cluster-management

StarRocks/starrocks

A high-performance, open-source query engine for sub-second analytics on data lakehouses.

11.4K
Active
Java
Databases
#analytics#big-data#database

quickwit-oss/quickwit

Cloud-native, open-source search engine for observability data such as logs and traces.

10.9K
Active
Rust
#search-engine#open-source#cloud-native

cython/cython

Cython is a Python-to-C compiler for building high-performance Python extensions.

10.6K
Active
Cython
API Frameworks
#performance#c-extensions#python-compiler

catboost/catboost

A high-performance gradient boosting library for machine learning tasks on CPUs and GPUs.

8.8K
Active
C++
ML Ops
API Frameworks
Python
#machine-learning#gradient-boosting#decision-trees

delta-io/delta

An open-source storage framework for building data lakehouses with leading big data compute engines such as Spark.

8.6K
Active
Scala
ETL & Pipelines
API Frameworks
Spark
#big-data#data-engineering#data-lakehouse
