Distributed storage system for blobs, files, and data lakes.
Data science Python notebooks covering deep learning, machine learning, big data, and more.
Distributed gradient boosting library for fast and accurate data science solutions.
Luigi is a Python module that helps developers build complex batch job pipelines with dependency management and workflow orchestration.
APIJSON is a secure ORM library that provides APIs and documentation with zero backend coding.
A comprehensive developer guide to big data technologies such as Hadoop, Spark, and Kafka.
Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.
Apache Hadoop is a popular open-source distributed computing framework for processing and storing large datasets.
A suite of tools for training and deploying deep learning models on the JVM.
Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.
A curated list of awesome System Design resources for developers working on distributed systems.
A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.
An open-source curriculum for onboarding entry-level engineers into the SRE role at LinkedIn.
A collection of 1000+ DevOps Bash scripts for managing AWS, GCP, Kubernetes, Docker, CI/CD, APIs, databases, and more.
An open-source, distributed machine learning platform with support for a range of algorithms and AutoML.
Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.
Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.
lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
Glow is a distributed computation system written in Go, similar to Hadoop MapReduce, Spark, and Flink.