Distributed gradient boosting library for fast and accurate data science solutions
Apache Flink is a stream processing framework for real-time and batch data processing.
This is a comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.
A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.
An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.
Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.
Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.
This GitHub repository contains a collection of over 600 original articles and source code samples covering Java, Docker, Kubernetes, DevOps, and more.
A real-time product recommendation system built with Flink, Redis, HBase, and Kafka.
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Alink is a machine learning algorithm platform built on Apache Flink, developed by Alibaba's PAI team.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
LakeSoul is a cloud-native, real-time Lakehouse framework for fast data ingestion and analytics on cloud storage.
Glow is a distributed computation system written in Go, similar to Hadoop MapReduce, Spark, and Flink.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark.
A comprehensive guide to big data, covering various tools and technologies for learning and development.
Political activism documentation on Chinese government censorship, human rights, and censorship circumvention techniques.
This is a big data analysis system for the Shenzhen metro with support for various data processing tools.
Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.
A cloud-native DataOps and AIOps platform for building and operating data-intensive applications.