Explore Projects

Discover 18 open-source projects

Active filters (1):
Search: apache-spark

Showing 1-18 of 18 projects

mlflow/mlflow

MLflow is an open-source platform for building, tracking, and deploying AI/ML models with end-to-end observability and evaluation tools.

24.6K
Active
Python
ML Ops
Agent Coordination
LangChain
#mlflow #ai-models #experiment-tracking

microsoft/SynapseML

SynapseML is a simple and distributed machine learning library for building and deploying AI models at scale.

5.2K
Active
Scala
ML Ops
Big Data
Apache Spark
#machine-learning #distributed-computing #big-data

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

lw-lin/CoolplaySpark

Open-source analysis of the Apache Spark codebase, covering Spark Core and Spark Streaming, for Scala developers working with Apache Spark.

3.5K
Archived
Scala
API Frameworks
Databases
Scala
#apache-spark #spark-streaming #spark-core

spark-notebook/spark-notebook

An interactive and reactive data science platform powered by Scala and Apache Spark.

3.2K
Archived
JavaScript
Databases
ETL & Pipelines
Scala
#data-science #interactive #reactive

kubeflow/spark-operator

A Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

3.1K
Active
Go
API Frameworks
Containerization
Kubernetes
#apache-spark #kubernetes #kubernetes-operator

intel/BigDL

BigDL is a distributed deep learning library that lets developers run TensorFlow, Keras, and PyTorch models on Apache Spark, Apache Flink, and Ray.

2.7K
Stable
Jupyter Notebook
Distributed Deep Learning
API Frameworks
TensorFlow
#deep-learning #distributed-computing #spark

feathr-ai/feathr

Feathr is a scalable, unified data and AI engineering platform for enterprises, offering feature engineering, feature governance, and a feature marketplace.

1.9K
Archived
Scala
Feature Flags
ML Ops
Apache Spark
#data-engineering #feature-engineering #feature-governance

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources for developers.

1.9K
Archived
Shell

OryxProject/oryx

A distributed real-time machine learning platform built on Apache Spark and Kafka for large-scale workloads.

1.8K
Archived
Java
ML Ops
API Frameworks
Apache Spark
#real-time #machine-learning #big-data

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

A collection of SQL data analysis and visualization projects built with various tools and databases.

1.7K
Archived
Jupyter Notebook
Databases
ETL & Pipelines
#sql #data-analysis #data-visualization

japila-books/apache-spark-internals

This repository provides an in-depth look at the internals of the popular Apache Spark data processing framework.

1.5K
Experimental
API Frameworks
Databases
#apache-spark #data-processing #distributed-computing

san089/goodreads_etl_pipeline

An end-to-end data pipeline that builds a data lake, data warehouse, and analytics platform from Goodreads data.

1.5K
Archived
Python
ETL & Pipelines
Background Jobs
Apache Airflow
#data-engineering #etl-pipeline #data-lake

databricks/LearningSparkV2

The companion code repository for the book Learning Spark: Lightning-Fast Data Analytics (2nd Edition), which teaches how to use Apache Spark for data analytics.

1.4K
Archived
Scala
ETL & Pipelines
Databases
Apache Spark
#apache-spark #delta-lake #mlflow

lensacom/sparkit-learn

A Python library that integrates scikit-learn with the Apache Spark distributed computing framework.

1.2K
Archived
Python
ML Ops
ETL & Pipelines
#apache-spark #scikit-learn #distributed-computing

graphframes/graphframes

GraphFrames provides DataFrame-based graphs for Apache Spark, enabling scalable graph analysis and graph algorithms.

1.1K
Active
Scala
Databases
Caching
#apache-spark #big-data #graph-analysis

mahmoudparsian/data-algorithms-book

This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.

1.1K
Archived
Java
Databases
ETL & Pipelines
Apache Hadoop
#data-algorithms #mapreduce #spark

databricks/spark-sklearn

A deprecated scikit-learn integration package for Apache Spark, intended for machine learning on big data.

1.1K
Archived
Python
ML Ops
Databases
#machine-learning #scikit-learn #apache-spark