Explore Projects

Discover 140 open source projects

Active filters (1):

Search: spark

Showing 81-100 of 140 projects

zhonghuasheng/Tutorial

A comprehensive tutorial covering a wide range of backend technologies like Java, Go, MySQL, Redis, and more.

1.7K

Experimental

Shell

API Frameworks

Databases

Spring

#java #go #backend

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K

Stable

Shell

Databases

ETL & Pipelines

#bigdata #hadoop #spark

apache/auron

The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.

1.7K

Active

Rust

Databases

API Frameworks

Spark

#big-data #distributed-computing #vectorized-execution

strapdata/elassandra

Elassandra is a distributed search and analytics platform that combines Elasticsearch and Apache Cassandra for developers building mission-critical applications.

1.7K

Experimental

Java

API Frameworks

Databases

#cassandra #elasticsearch #nosql

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.

1.7K

Archived

Jupyter Notebook

Databases

ETL & Pipelines

#sql #data-analysis #data-visualization

jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

1.7K

Archived

Jupyter Notebook

Databases

ETL & Pipelines

Jupyter Notebook

#big-data #data-analysis #data-science

almond-sh/almond

A Scala kernel for Jupyter, allowing developers to use Scala in Jupyter Notebooks.

1.6K

Active

Scala

API Frameworks

IDE Extensions

#jupyter #scala #repl

maxpumperla/elephas

Distributed deep learning library for Keras and Spark, enabling scalable training of neural networks.

1.6K

Archived

Python

LLM Frameworks

Databases

Keras

#deep-learning #distributed-computing #neural-networks

holdenk/spark-testing-base

A base library for writing tests with Apache Spark in Scala.

1.6K

Stable

Scala

Testing

Scala

#testing #spark #scala

japila-books/apache-spark-internals

This repository provides an in-depth look at the internals of the popular Apache Spark data processing framework.

1.5K

Experimental

API Frameworks

Databases

#apache-spark #data-processing #distributed-computing

hi-primus/optimus

Agile data preparation workflows made easy with popular Python data science libraries.

1.5K

Archived

Python

ETL & Pipelines

API Frameworks

#big-data-cleaning #data-analysis #data-cleaning

combust/mleap

MLeap is a library for deploying machine learning pipelines to production using Scala, Python, and Spark.

1.5K

Active

Scala

ML Ops

API Frameworks

Scala

#machine-learning #pipeline #production

OBenner/data-engineering-interview-questions

This GitHub repository contains over 2,000 data engineering interview questions to help developers prepare.

1.5K

Active

Python

Interview Prep

ETL & Pipelines

#data-engineering #interview-questions #interview-prep

Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.

1.4K

Active

Java

Databases

API Frameworks

#data-catalog #data-lakes #git-semantics

mesos/spark

Lightning-fast cluster computing in Java, Scala and Python.

1.4K

Archived

Scala

API Frameworks

ORMs & Query Builders

Scala

#cluster-computing #big-data #distributed-systems
