Explore Projects

Discover 13 open source projects

Active filters (1):
Search: hdfs

Showing 1-13 of 13 projects

seaweedfs/seaweedfs

Distributed storage system for blobs, files, and data lakes

30.8K
Active
Go
Containerization
Databases
#distributed-storage #blob-storage #cloud-drive

donnemartin/data-science-ipython-notebooks

Data science Python notebooks covering deep learning, machine learning, big data, and more.

28.9K
Archived
Python
Computer Vision
ML Ops
TensorFlow
#data-science #deep-learning #machine-learning

heibaiying/BigData-Notes

A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.

16.9K
Archived
Java
Databases
#big-data #hadoop #spark

ceph/ceph

Ceph is a distributed object, block, and file storage platform that provides high-performance, highly available storage solutions.

16.3K
Active
C++
API Frameworks
#block-storage #cloud-storage #distributed-file-system

juicedata/juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.

13.3K
Active
Go
Databases
#object-storage #s3 #redis

wangzhiwubigdata/God-Of-BigData

A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.

10.4K
Archived
Databases
#big-data #hadoop #spark

apache/hbase

Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.

5.6K
Active
Java
Databases
#database #distributed #scalable

piskvorky/smart_open

Utilities for streaming large files (S3, HDFS, gzip, bz2) in Python.

3.4K
Active
Python
API Frameworks
Caching
#streaming #file-handling #s3

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K
Stable
Shell
Databases
ETL & Pipelines
#bigdata #hadoop #spark

OBenner/data-engineering-interview-questions

This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.

1.5K
Active
Python
Interview Prep
ETL & Pipelines
#data-engineering #interview-questions #interview-prep

colinmarc/hdfs

A native Go client for interacting with the Hadoop Distributed File System (HDFS).

1.4K
Archived
Go
API Frameworks
Databases
#hdfs #hadoop #distributed-file-system

wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly.

1.4K
Active
Java
ETL & Pipelines
API Frameworks
#etl #database #rdbms

HariSekhon/Nagios-Plugins

A comprehensive collection of Nagios plugins for monitoring AWS, Hadoop, Cloud, Kafka, and other popular technologies.

1.1K
Active
Python
Monitoring
CLI Tools
#monitoring #cloud #devops
