Explore Projects

Discover 34 open source projects

Active filters (1):

Search: bigdata

Showing 21-34 of 34 projects

byzer-org/byzer-lang

Byzer is a low-code, open-source programming language for data pipelines, analytics, and AI.

1.8K

Archived

Scala

ML Ops

ETL & Pipelines

Scala

#bigdata #machine-learning #sql-like-dsl

Netflix/genie

Genie is a distributed big data orchestration service that helps manage and execute complex data pipelines.

1.8K

Active

Java

API Frameworks

Caching

Spring Boot

#big-data #distributed-systems #microservices

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K

Stable

Shell

Databases

ETL & Pipelines

#bigdata #hadoop #spark

YoongiKim/AutoCrawler

A powerful Google and Naver web crawler built with Python, Selenium, and multiprocessing for efficient large-scale data collection.

1.7K

Archived

Python

Backend & APIs

Data Pipelines

Selenium

#web-crawler #multiprocessing #data-extraction

jadianes/spark-py-notebooks

Apache Spark and Python tutorials for big data analysis and machine learning, provided as Jupyter notebooks.

1.7K

Archived

Jupyter Notebook

Databases

ETL & Pipelines

Jupyter Notebook

#big-data #data-analysis #data-science

hi-primus/optimus

Agile data preparation workflows made easy with popular Python data science libraries.

1.5K

Archived

Python

ETL & Pipelines

API Frameworks

#big-data-cleaning #data-analysis #data-cleaning

Intel-bigdata/HiBench

HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.

1.5K

Stable

Java

Benchmark

#big-data #benchmark #hadoop

tensorbase/tensorbase

TensorBase is a new big data warehousing solution built with Rust, focused on high-performance analytics.

1.5K

Archived

Rust

Databases

API Frameworks

#analytics #bigdata #data-warehouse

opendatadiscovery/odd-platform

The first open-source data discovery and observability platform for data practitioners.

1.4K

Active

Java

Data Discovery

Data Observability

#data-catalog #data-engineering #data-governance

biubiubiu01/vue3-bigData

A Vue.js 3.0-based big data analysis system that uses various ECharts and Vue 3.0 APIs.

1.2K

Archived

Vue

Charts & Visualization

Component Libraries (Vue/Svelte)

Vue

#data-visualization #charts #vue3

kubernetes-retired/kube-batch

A Kubernetes batch scheduler for high-performance workloads like AI/ML, Big Data, and HPC.

1.1K

Archived

API Frameworks

Containerization

Kubernetes

#kubernetes #batch-processing #scheduling

ganweisoft/TOMs

A high-performance, plugin-oriented, and scenario-agnostic development framework for building complex applications.

1.1K

Stable

CSS

API Frameworks

CLI Tools

#distributed #industrial-iot #iot

josonle/Coding-Now

A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.

1.0K

Archived

Python

Databases

CLI Tools

#big-data #machine-learning #data-analysis

apache/celeborn

Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.

1.0K

Active

Java

Caching

Realtime

#bigdata #shuffle #spark
