Distributed storage system for blobs, files, and data lakes.
Data science Python notebooks covering deep learning, machine learning, big data, and more.
A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.
Ceph is a distributed object, block, and file storage platform that provides high-performance, highly available storage solutions.
JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.
A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.
Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.
Utilities for streaming large files (S3, HDFS, gzip, bz2) in Python.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.
A native Go client for interacting with the Hadoop Distributed File System (HDFS).
A fast and versatile ETL tool that transfers data between RDBMS and NoSQL databases.
A comprehensive collection of Nagios plugins for monitoring AWS, Hadoop, Cloud, Kafka, and other popular technologies.