Explore Projects

Discover 34 open source projects

Active filters (1):
Search: bigdata

Showing 21-34 of 34 projects

byzer-org/byzer-lang

Byzer is a low-code, open-source programming language for data pipelines, analytics, and AI.

1.8K
Archived
Scala
ML Ops
ETL & Pipelines
Scala
#bigdata #machine-learning #sql-like-dsl

Netflix/genie

Genie is a distributed big data orchestration service that helps manage and execute complex data pipelines.

1.8K
Active
Java
API Frameworks
Caching
Spring Boot
#big-data #distributed-systems #microservices

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K
Stable
Shell
Databases
ETL & Pipelines
#bigdata #hadoop #spark

YoongiKim/AutoCrawler

A powerful Google and Naver web crawler built with Python, Selenium, and multiprocessing for efficient large-scale data collection.

1.7K
Archived
Python
Backend & APIs
Data Pipelines
Selenium
#web-crawler #multiprocessing #data-extraction

jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.

1.7K
Archived
Jupyter Notebook
Databases
ETL & Pipelines
Jupyter Notebook
#big-data #data-analysis #data-science

hi-primus/optimus

Agile data preparation workflows made easy with popular Python data science libraries.

1.5K
Archived
Python
ETL & Pipelines
API Frameworks
#big-data-cleaning #data-analysis #data-cleaning

Intel-bigdata/HiBench

HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.

1.5K
Stable
Java
Benchmark
#big-data #benchmark #hadoop

tensorbase/tensorbase

TensorBase is a new big data warehousing solution built with Rust, focused on high-performance analytics.

1.5K
Archived
Rust
Databases
API Frameworks
#analytics #bigdata #data-warehouse

opendatadiscovery/odd-platform

First open-source data discovery and observability platform for data practitioners.

1.4K
Active
Java
Data Discovery
Data Observability
#data-catalog #data-engineering #data-governance

biubiubiu01/vue3-bigData

A Vue.js 3.0-based big data analysis system that uses various ECharts and Vue 3.0 APIs.

1.2K
Archived
Vue
Charts & Visualization
Component Libraries (Vue/Svelte)
Vue
#data-visualization #charts #vue3

kubernetes-retired/kube-batch

A Kubernetes batch scheduler for high-performance workloads like AI/ML, Big Data, and HPC.

1.1K
Archived
Go
API Frameworks
Containerization
Kubernetes
#kubernetes #batch-processing #scheduling

ganweisoft/TOMs

A high-performance, plugin-oriented, and scenario-agnostic development framework for building complex applications.

1.1K
Stable
CSS
API Frameworks
CLI Tools
Go
#distributed #industrial-iot #iot

josonle/Coding-Now

A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.

1.0K
Archived
Python
Databases
CLI Tools
#big-data #machine-learning #data-analysis

apache/celeborn

Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.

1.0K
Active
Java
Caching
Realtime
#bigdata #shuffle #spark
