A comprehensive tutorial covering a wide range of backend technologies like Java, Go, MySQL, Redis, and more.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.
Elassandra is a distributed search and analytics platform that combines Elasticsearch and Apache Cassandra for developers building mission-critical applications.
This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
A Scala kernel for Jupyter, allowing developers to use Scala in Jupyter Notebooks.
Distributed deep learning library for Keras and Spark, enabling scalable training of neural networks.
A base library for writing tests with Apache Spark in Scala.
This repository provides an in-depth look at the internals of the popular Apache Spark data processing framework.
Agile data preparation workflows made easy with popular Python data science libraries.
MLeap is a library for deploying machine learning pipelines to production using Scala, Python, and Spark.
This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.
Code to accompany the book Advanced Analytics with Spark, focused on Scala-based big data and machine learning.
Gluten is a middle layer that offloads the execution of JVM-based SQL engines to native engines for improved performance.
An end-to-end data pipeline for building a data lake, data warehouse, and analytics platform from GoodReads data.
A machine learning platform and recommendation engine built on Kubernetes for deployment on cloud platforms.
CarbonData is a high-performance data store solution for big data analytics on Hadoop and Spark.
Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.
Lightning-fast cluster computing in Java, Scala and Python.