Explore Projects

Discover 85 open source projects

Search: big-data

Showing 1-20 of 85 projects

binhnguyennus/awesome-scalability

A curated list of resources for designing scalable, reliable, and performant large-scale systems.

69.0K

Stable

Awesome Lists

#scalability #distributed-systems #system-design

ClickHouse/ClickHouse

Real-time analytics database for generating data reports

46.2K

Active

C++

Databases

#analytics #big-data #clickhouse

apache/spark

Unified analytics engine for large-scale data processing

42.9K

Active

Scala

ETL & Pipelines

Realtime

Apache

#big-data #spark #data-processing

donnemartin/data-science-ipython-notebooks

Data science Python notebooks covering deep learning, machine learning, big data, and more.

28.9K

Archived

Python

Computer Vision

ML Ops

TensorFlow

#data-science #deep-learning #machine-learning

apache/flink

Apache Flink is a stream processing framework for real-time and batch data processing.

25.8K

Active

Java

ETL & Pipelines

Backend Frameworks

Apache Hadoop

#stream-processing #batch-processing #data-streams

thingsboard/thingsboard

Open-source IoT platform for device management, data collection, and visualization

21.3K

Active

Java

Home Automation

Realtime

Java

#iot-platform #device-management #data-visualization

amark/gun

An open-source protocol for syncing decentralized graph data, with a focus on security and privacy.

18.9K

Experimental

JavaScript

Realtime

JavaScript

#blockchain #decentralized #realtime

heibaiying/BigData-Notes

A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.

16.9K

Archived

Java

Databases

#big-data #hadoop #spark

prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

16.7K

Active

Java

Databases

#big-data #sql #query

andkret/Cookbook

A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.

15.0K

Active

Python

ETL & Pipelines

Python

#data-engineering #etl #pipeline

trinodb/trino

Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.

12.6K

Active

Java

Databases

#big-data #analytics #data-science

apache/predictionio

PredictionIO is a machine learning server that lets developers and ML engineers build and deploy production-ready ML services.

12.5K

Archived

Scala

ML Ops

Scala

#big-data #machine-learning #predictive-analytics

vesoft-inc/nebula

Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.

12.1K

Stable

C++

Databases

C++

#database #graph-database #distributed

yahoo/CMAK

CMAK is a tool for managing clusters of Apache Kafka, a popular distributed streaming platform.

11.9K

Archived

Scala

API Frameworks

#kafka #cluster-management #big-data

provectus/kafka-ui

An open-source web UI for managing Apache Kafka clusters, aimed at developers working with event streaming.

11.9K

Archived

Java

API Frameworks

#apache-kafka #event-streaming #cluster-management

StarRocks/starrocks

A high-performance open-source query engine for sub-second analytics on a data lakehouse.

11.4K

Active

Java

Databases

#analytics #big-data #database

quickwit-oss/quickwit

Cloud-native search engine for observability, an open-source alternative to popular tools.

10.9K

Active

Rust

#search-engine #open-source #cloud-native

cython/cython

Cython is a Python-to-C compiler that enables high-performance Python applications.

10.6K

Active

Cython

API Frameworks

#performance #c-extensions #python-compiler

catboost/catboost

A high-performance gradient boosting library for machine learning tasks on CPUs and GPUs.

8.8K

Active

C++

ML Ops

API Frameworks

Python

#machine-learning #gradient-boosting #decision-trees

delta-io/delta

An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.

8.6K

Active

Scala

ETL & Pipelines

API Frameworks

Spark

#big-data #data-engineering #data-lakehouse
