ETL & Pipelines

Explore 310 open source projects in ETL & Pipelines

Showing 21-40 of 310 projects

apache/incubator-seata

Distributed transaction solution for microservices

26.0K
Active
Java
Realtime
ETL & Pipelines
Java
#distributed-transaction #microservices #transaction-coordinator

apache/flink

Apache Flink is a stream processing framework for real-time and batch data processing.

25.8K
Active
Java
ETL & Pipelines
Backend Frameworks
Apache Hadoop
#stream-processing #batch-processing #data-streams

plotly/dash

Build interactive data apps and dashboards with Python, no JavaScript required.

24.4K
Active
Python
Full-Stack Frameworks
ML Ops
Flask
#data-visualization #dash #python

dataease/dataease

Open-source BI tool for data visualization and analysis

23.5K
Active
Java
ETL & Pipelines
Admin Dashboards
Spring Boot
#business-intelligence #data-visualization #data-analysis

ranaroussi/yfinance

Python library for downloading financial data from Yahoo! Finance

21.9K
Active
Python
API Clients & Testing
ETL & Pipelines
#financial-data #market-data #yahoo-finance

PrefectHQ/prefect

Workflow orchestration for resilient data pipelines in Python

21.8K
Active
Python
ETL & Pipelines
CI/CD
Python
#data-pipelines #orchestration #workflow-automation

recommenders-team/recommenders

Recommenders is a project for prototyping and operationalizing recommendation systems with Jupyter notebooks and best practices.

21.5K
Active
Python
ML Ops
ETL & Pipelines
Python
#recommendation-systems #machine-learning #data-science

vectordotdev/vector

High-performance observability data pipeline for logs and metrics

21.4K
Active
Rust
Monitoring
ETL & Pipelines
#observability #data-pipeline #logs

thingsboard/thingsboard

Open-source IoT platform for device management, data collection, and visualization

21.3K
Active
Java
Home Automation
Realtime
Java
#iot-platform #device-management #data-visualization

huggingface/datasets

AI-powered dataset management and preprocessing library for ML projects

21.2K
Active
Python
ML Ops
ETL & Pipelines
HuggingFace
#datasets #ml-ops #data-preprocessing

airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

20.8K
Active
Python
ETL & Pipelines
#data-integration #elt #etl

facebook/prophet

Time series forecasting for data with multiple seasonality and linear or non-linear growth.

20.0K
Active
Python
Inference
ETL & Pipelines
Python
#forecasting #time-series #data-science

cube-js/cube

Open-source semantic layer for AI, BI, and embedded analytics

19.6K
Active
Rust
Agents & Orchestration
ETL & Pipelines
Rust
#semantic-layer #ai-analytics #bi-tool

argoproj/argo-workflows

Argo Workflows is a powerful open-source workflow engine for Kubernetes, enabling complex data processing and machine learning pipelines.

16.5K
Active
Go
ETL & Pipelines
Kubernetes
#kubernetes #pipelines #workflow

treeverse/dvc

DVC is a data versioning and ML experiment tracking tool that helps developers manage and track changes to data and models.

15.4K
Active
Python
ETL & Pipelines
Python
#data-versioning #machine-learning #reproducibility

dagster-io/dagster

An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.

15.1K
Active
Python
ETL & Pipelines
Python
#data-engineering #data-orchestration #workflow-automation

andkret/Cookbook

A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.

15.0K
Active
Python
ETL & Pipelines
Python
#data-engineering #etl #pipeline

debezium/debezium

An open-source framework for change data capture from various databases using Apache Kafka.

12.5K
Active
Java
ETL & Pipelines
Apache Kafka
#change-data-capture #event-streaming #database

dbt-labs/dbt-core

dbt enables data analysts and engineers to transform data using software engineering practices.

12.3K
Active
Python
ETL & Pipelines
Python
#analytics #business-intelligence #data-modeling

great-expectations/great_expectations

A Python library that helps ensure data quality and reliability through data profiling and testing.

11.2K
Active
Python
ETL & Pipelines
#data-quality #data-testing #data-profiling