Byzer is a low-code, open-source programming language for data pipelines, analytics, and AI.
Genie is a distributed big data orchestration service that helps manage and execute complex data pipelines.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
A powerful Google and Naver web crawler built with Python, Selenium, and multiprocessing for efficient large-scale data collection.
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
Agile data preparation workflows made easy with popular Python data science libraries.
HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.
TensorBase is a new big data warehousing solution built with Rust, focused on high-performance analytics.
The first open-source data discovery and observability platform for data practitioners.
A Vue.js 3.0-based big data analysis system using various ECharts and Vue 3.0 APIs.
A Kubernetes batch scheduler for high-performance workloads like AI/ML, Big Data, and HPC.
A high-performance, plugin-oriented, and scenario-agnostic development framework for building complex applications.
A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.
Apache Celeborn is a high-performance service for shuffle and spilled data in big data applications.