A curated list of resources for designing scalable, reliable, and performant large-scale systems.
Real-time analytics database for generating data reports
Unified analytics engine for large-scale data processing
Data science Python notebooks covering deep learning, machine learning, big data, and more.
Apache Flink is a stream processing framework for real-time and batch data processing.
Open-source IoT platform for device management, data collection, and visualization
An open-source protocol for syncing decentralized graph data, with a focus on security and privacy.
A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.
Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.
A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.
Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.
PredictionIO is a machine learning server that lets developers and ML engineers build and deploy production-ready ML services.
Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.
CMAK is a tool for managing clusters of Apache Kafka, a popular distributed streaming platform.
An open-source web UI for managing Apache Kafka clusters, supporting developers working with event streaming.
A high-performance open-source query engine for sub-second analytics on a data lakehouse.
Cloud-native search engine for observability, an open-source alternative to popular tools.
The Cython project is a Python-to-C compiler that enables high-performance Python applications.
A high-performance gradient boosting library for machine learning tasks on CPUs and GPUs.
An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.