Explore Projects

Discover 140 open source projects

Active filters (1):
Search: spark

Showing 101-120 of 140 projects

databricks/LearningSparkV2

This is a book that teaches how to use Apache Spark for lightning-fast data analytics.

1.4K
Archived
Scala
ETL & Pipelines
Databases
Spark
#apache-spark#delta-lake#mlflow

HariSekhon/Dockerfiles

A collection of 50+ Docker images for DevOps tools, CI/CD, Hadoop, Kafka, Cassandra, and more.

1.4K
Active
Shell
Containerization
CLI Tools
#docker#kubernetes#devops

linkedin/dr-elephant

Dr. Elephant is a performance monitoring and tuning tool for Apache Hadoop and Apache Spark.

1.4K
Archived
Java
API Frameworks
#performance-monitoring#apache-hadoop#apache-spark

keyu-tian/SparK

A PyTorch implementation of a BERT-style pretraining method for convolutional networks, enabling more efficient self-supervised learning.

1.4K
Archived
Python
Computer Vision
Fine-tuning
PyTorch
#bert#cnn#convnet

jupyter-incubator/sparkmagic

Provides Jupyter magics and kernels for working with remote Spark clusters, enabling data scientists to easily interact with Spark from Jupyter Notebooks.

1.4K
Stable
Python
API Frameworks
Databases
Jupyter
#spark#jupyter-notebook#pyspark

spark-examples/pyspark-examples

A collection of PySpark examples covering RDD, DataFrame, and Dataset operations in Python.

1.3K
Stable
Python
Databases
API Frameworks
Python
#pyspark#spark#big-data

iflytek/datasophon

An open-source big data management platform that helps developers build scalable cloud-native data applications.

1.3K
Experimental
Java
API Frameworks
Databases
Kubernetes
#cloud-native#big-data#data-management

robinhood/spark

A simple Android sparkline chart view library for displaying data trends.

1.3K
Archived
Java
Charts & Visualization
Android
#android#chart#graph

DTStack/Taier

A big data development platform for submission, scheduling, operation and maintenance, and indicator information display.

1.3K
Archived
Java
API Frameworks
ETL & Pipelines
Flink
#big-data#data-pipeline#task-scheduling

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.

1.3K
Experimental
Jupyter Notebook
Databases
ETL & Pipelines
#big-data#data-algorithms#dataframes

yahoo/CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.

1.3K
Archived
Jupyter Notebook
ML Ops
API Frameworks
#deep-learning#distributed-computing#hadoop

lucko/spark

A high-performance profiler for Minecraft clients, servers, and proxies written in Java.

1.3K
Active
Java
MCP Frameworks
Performance
#minecraft#performance#profiler

t59688/arboris-novel

An AI-powered writing companion that helps spark creative inspiration for novel writing.

1.2K
Active
Python
LLM Frameworks
AI App Builders
Python
#ai-writing-assistant#novel-writing#creative-inspiration

sryza/spark-timeseries

A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.

1.2K
Archived
Scala
Databases
ETL & Pipelines
Spark
#time-series#large-scale#data-processing

QingWei-Li/vue-trend

A simple, elegant sparkline library for Vue.js developers.

1.2K
Archived
JavaScript
Component Libraries (Vue/Svelte)
Animation & Motion
Vue
#vue#components#animation

apachecn/spark-doc-zh

This repository provides a Chinese version of the official documentation for Apache Spark, a popular big data processing framework.

1.2K
Archived
JavaScript
Databases
API Frameworks
#big-data#spark#java

killrweather/killrweather

A reference application showcasing the integration of streaming and batch data processing with Apache Spark Streaming, Cassandra, Kafka, and Akka.

1.2K
Archived
Scala
API Frameworks
Databases
Akka
#streaming#batch-processing#time-series

lakehq/sail

LakeSail is a Rust-based computation framework that unifies batch processing, stream processing, and AI workloads.

1.2K
Active
Rust
ML Ops
ETL & Pipelines
#distributed-computing#data-engineering#big-data

emmabostian/design-inspiration

A curated collection of websites to inspire creativity and design for developers.

1.2K
Archived
Design Inspiration
#design#inspiration#creative

zinggAI/zingg

Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.

1.2K
Active
Java
ETL & Pipelines
ML Ops
#identity-resolution#entity-resolution#data-deduplication
