Explore Projects

Discover 56 open source projects

Active filters (1):
Search: hadoop

Showing 1-20 of 56 projects

seaweedfs/seaweedfs

Distributed storage system for blobs, files, and data lakes

30.8K
Active
Go
Containerization
Databases
#distributed-storage #blob-storage #cloud-drive

donnemartin/data-science-ipython-notebooks

Data science Python notebooks covering deep learning, machine learning, big data, and more.

28.9K
Archived
Python
Computer Vision
ML Ops
TensorFlow
#data-science #deep-learning #machine-learning

dmlc/xgboost

Distributed gradient boosting library for fast and accurate data science solutions

28.1K
Active
C++
ML Ops
Multi-Purpose
#xgboost #machine-learning #distributed-systems

spotify/luigi

Luigi is a Python module that helps developers build complex batch job pipelines with dependency management and workflow orchestration.

18.7K
Active
Python
API Frameworks
#pipeline #batch-processing #dependency-management

Tencent/APIJSON

APIJSON is a secure, coding-free ORM library that provides APIs and documentation without backend coding.

18.4K
Active
Java
BaaS Platforms
Java
#baas #api #orm

heibaiying/BigData-Notes

A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.

16.9K
Archived
Java
Databases
#big-data #hadoop #spark

prestodb/presto

Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.

16.7K
Active
Java
Databases
#big-data #sql #query

apache/hadoop

Apache Hadoop is a popular open-source distributed computing framework for processing and storing large datasets.

15.5K
Active
Java
API Frameworks
#distributed-computing #big-data #nosql

deeplearning4j/deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM

14.2K
Active
Java
ML Ops
Java
#artificial-intelligence #deeplearning #neural-nets

trinodb/trino

Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.

12.6K
Active
Java
Databases
#big-data #analytics #data-science

madd86/awesome-system-design

A curated list of awesome System Design resources for developers working on distributed systems.

11.8K
Archived
API Frameworks
Node
#distributed-systems #microservices #nosql

wangzhiwubigdata/God-Of-BigData

A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.

10.4K
Archived
Databases
#big-data #hadoop #spark

linkedin/school-of-sre

An open-source curriculum for onboarding entry-level talent into the SRE role at LinkedIn.

8.1K
Stable
HTML
Tutorials & Courses
Authentication
#sre #system-design #linux

HariSekhon/DevOps-Bash-tools

A collection of 1000+ DevOps Bash scripts for managing AWS, GCP, Kubernetes, Docker, CI/CD, APIs, databases, and more.

8.1K
Active
Shell
CLI Tools
Containerization
#devops #aws #gcp

h2oai/h2o-3

An open-source, distributed machine learning platform with support for various algorithms and AutoML.

7.5K
Active
Jupyter Notebook
ML Ops
Databases
#machine-learning #automl #distributed

Alluxio/alluxio

Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.

7.2K
Experimental
Java
Data Orchestration
ML Ops
Spark
#data-analysis #data-orchestration #memory-speed

apache/hive

Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.

6.0K
Active
Java
Databases
API Frameworks
#apache #big-data #database

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

WeBankFinTech/DataSphereStudio

DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.

3.3K
Stable
Java
ETL & Pipelines
API Frameworks
Spark
#data-management #data-analysis #data-visualization

chrislusf/glow

Glow is a distributed computation system written in Go, similar to Hadoop MapReduce, Spark, and Flink.

3.2K
Archived
Go
API Frameworks
Databases
#distributed-computing #big-data #data-processing
