Explore Projects

Discover 140 open source projects

Search filter: spark

Showing 21-40 of 140 projects

wangzhiwubigdata/God-Of-BigData

A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.

10.4K
Archived
Databases
#big-data #hadoop #spark

swyxio/spark-joy

A curated collection of design inspiration and UI components for developers to add delight to their products.

9.7K
Active
Animation & Motion
React
#ui-design #design-inspiration #css

perwendel/spark

A simple, expressive Java web framework for building API servers and web applications.

9.7K
Archived
Java
API Frameworks
Java
#java #web-framework #api-development

tobymao/sqlglot

A Python library for parsing and transpiling SQL queries across various databases and engines.

9.0K
Active
Python
API Frameworks
ORMs & Query Builders
Python
#sql-parser #sql-transpiler #database-abstraction

mage-ai/mage-ai

mage-ai is a Python-based platform for building, running, and managing data pipelines that integrate and transform data.

8.7K
Active
Python
ETL & Pipelines
ML Ops
Python
#data-pipelines #data-transformation #data-integration

delta-io/delta

An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.

8.6K
Active
Scala
ETL & Pipelines
API Frameworks
Spark
#big-data #data-engineering #data-lakehouse

Laravel-Lang/lang

A comprehensive set of translations for Laravel and related frameworks, enabling localization of web applications.

7.8K
Active
PHP
Backend Frameworks
Localization
Laravel
#laravel #localization #translation

h2oai/h2o-3

An open-source, distributed machine learning platform with support for various algorithms and AutoML.

7.5K
Active
Jupyter Notebook
ML Ops
Databases
#machine-learning #automl #distributed

Alluxio/alluxio

Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.

7.2K
Experimental
Java
Data Orchestration
ML Ops
Spark
#data-analysis #data-orchestration #memory-speed

Angel-ML/angel

A flexible and powerful parameter server for large-scale machine learning models and distributed training.

6.8K
Stable
Java
ML Ops
API Frameworks
Scala
#machine-learning #distributed-training #parameter-server

apache/zeppelin

Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.

6.6K
Active
Java
Databases
API Frameworks
Java
#big-data #database #data-analytics

JerryLead/SparkInternals

This repository contains notes on the design and implementation of the Apache Spark distributed computing framework.

5.4K
Archived
Learning & Education
API Frameworks
#apache-spark #distributed-computing #data-processing

microsoft/SynapseML

SynapseML is a simple and distributed machine learning library for building and deploying AI models at scale.

5.2K
Active
Scala
ML Ops
Big Data
Apache Spark
#machine-learning #distributed-computing #big-data

treeverse/lakeFS

lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.

5.2K
Active
Go
Data Lake
CLI Tools
#data-versioning #data-quality #git-for-data

tencentmusic/cube-studio

An open-source cloud-native AI platform for ML/DL workflows, model serving, and distributed training.

4.9K
Stable
Python
ML Ops
BaaS Platforms
PyTorch
#ai-platform #mlops #model-serving

Cyb3rWard0g/HELK

An open-source threat hunting platform built on the ELK stack for security researchers and analysts.

3.9K
Archived
Jupyter Notebook
Search
Testing
#threat-hunting #security #elk-stack

databricks/learning-spark

Example code from the Learning Spark book, a resource for developers learning Spark.

3.9K
Experimental
Java
Books & Guides
API Frameworks
#spark #big-data #distributed-computing

yahoo/TensorFlowOnSpark

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters for distributed machine learning.

3.9K
Archived
Python
ML Ops
API Frameworks
Spark
#machine-learning #cluster #distributed-computing

RoaringBitmap/RoaringBitmap

A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.

3.8K
Active
Java
Databases
CLI Tools
Java
#bitset #druid #lucene

awslabs/deequ

Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.

3.6K
Active
Scala
ETL & Pipelines
Testing
Spark
#data-quality #unit-testing #apache-spark
