Explore Projects

Discover 16 open source projects

Active filters (1):
Search: data-integration

Showing 1-16 of 16 projects

apache/airflow

Apache Airflow for workflow orchestration

44.5K
Active
Python
ETL & Pipelines
Background Jobs
Python
#airflow #data-pipelines #workflow-orchestration

airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

20.8K
Active
Python
ETL & Pipelines
#data-integration #elt #etl

Avaiga/taipy

Taipy is a Python library that helps developers turn data and AI algorithms into production-ready web apps quickly.

19.1K
Active
Python
Agents & Orchestration
Python
#data-engineering #data-ops #data-visualization

dagster-io/dagster

An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.

15.1K
Active
Python
ETL & Pipelines
Python
#data-engineering #data-orchestration #workflow-automation

apache/seatunnel

A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.

9.1K
Active
Java
ETL & Pipelines
Realtime
#data-integration #batch #streaming

mage-ai/mage-ai

mage-ai is a Python-based platform for building, running, and managing data pipelines and integrating/transforming data.

8.7K
Active
Python
ETL & Pipelines
ML Ops
Python
#data-pipelines #data-transformation #data-integration

apache/flink-cdc

Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.

6.4K
Active
Java
ETL & Pipelines
Realtime
#streaming #cdc #change-data-capture

cloudquery/cloudquery

Data pipelines for cloud config and security data, enabling CSPM, FinOps, and vulnerability management solutions.

6.3K
Active
Go
API Frameworks
ETL & Pipelines
Go
#cloud #security #data-engineering

fluvio-community/fluvio

Fluvio is an event stream processing engine for developers to build responsive data-intensive apps.

5.2K
Active
Rust
Data Pipelines
Realtime
Rust
#streaming #real-time #data-processing

jitsucom/jitsu

Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.

4.7K
Active
TypeScript
ETL & Pipelines
API Frameworks
TypeScript
#data-ingestion #etl #segment-alternative

rudderlabs/rudder-server

Rudder Server is a privacy-focused, Segment-alternative customer data platform written in Go and React.

4.4K
Active
Go
Customer Data Platform
ETL & Pipelines
React
#customer-data-platform #customer-data-pipeline #data-integration

seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

3.7K
Active
Databases
CLI Tools
#bioinformatics #single-cell #rna-seq

bruin-data/ingestr

ingestr is a CLI tool that seamlessly copies data between any databases with a single command.

3.4K
Active
Python
API Frameworks
ETL & Pipelines
Python
#data-ingestion #data-integration #data-pipeline

apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

2.9K
Active
Go
ETL & Pipelines
CLI Tools
Go
#devops #data-analysis #data-engineering

bytedance/bitsail

Distributed high-performance data integration engine for batch, streaming, and incremental scenarios.

1.7K
Archived
Java
Flink
#authentication #streaming #real-time

apache/hop

Hop is a flexible and extensible open-source data integration platform for building and orchestrating ETL and streaming pipelines.

1.3K
Active
Java
ETL & Pipelines
#data-integration #etl #orchestration
