Explore Projects

Discover 63 open source projects

Active filters (1):
Search: etl

Showing 1-20 of 63 projects

pathwaycom/pathway

Python ETL framework for real-time analytics and LLM pipelines

59.5K
Active
Python
LLM Frameworks
ETL & Pipelines
#etl #real-time #llm

apache/airflow

A platform to programmatically author, schedule, and monitor workflows

44.5K
Active
Python
ETL & Pipelines
Background Jobs
#airflow #data-pipelines #workflow-orchestration

airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

20.8K
Active
Python
ETL & Pipelines
#data-integration #elt #etl

dagster-io/dagster

An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.

15.1K
Active
Python
ETL & Pipelines
#data-engineering #data-orchestration #workflow-automation

elastic/logstash

Logstash is an open-source data processing pipeline that ingests data from a variety of sources, transforms it, and sends it to a range of destinations.

14.8K
Active
Java
API Frameworks
#etl #logging #real-time-processing

Unstructured-IO/unstructured

Unstructured is an open-source ETL solution for transforming complex documents into structured data for language models.

14.1K
Active
HTML
Document Processing
#document-processing #data-pipelines #natural-language-processing

risingwavelabs/risingwave

An open-source, Rust-based streaming database for real-time data processing and analytics.

8.8K
Active
Rust
API Frameworks
Databases
#event-streaming #real-time #data-processing

mage-ai/mage-ai

mage-ai is a Python-based platform for building, running, and managing data pipelines that integrate and transform data.

8.7K
Active
Python
ETL & Pipelines
ML Ops
#data-pipelines #data-transformation #data-integration

redpanda-data/connect

A highly configurable, production-ready stream processing platform for building real-time data pipelines.

8.6K
Active
Go
Realtime
ETL & Pipelines
#stream-processing #message-queue #data-engineering

pentaho/pentaho-kettle

Pentaho Data Integration (ETL) is a Java-based tool for building data integration and ETL pipelines.

8.3K
Active
Java
ETL & Pipelines
#etl #data-integration #pentaho

turbot/steampipe

Steampipe is a zero-ETL, SQL-powered platform for live querying cloud APIs and infrastructure.

7.7K
Active
Go
API Frameworks
ETL & Pipelines
#cloud #etl #sql

apache/flink-cdc

Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.

6.4K
Active
Java
ETL & Pipelines
Realtime
#streaming #cdc #change-data-capture

cloudquery/cloudquery

Data pipelines for cloud config and security data, enabling CSPM, FinOps, and vulnerability management solutions.

6.3K
Active
Go
API Frameworks
ETL & Pipelines
#cloud #security #data-engineering

cocoindex-io/cocoindex

Data transformation framework for AI with ultra-fast, incremental processing capabilities.

6.3K
Active
Rust
LLM Frameworks
ETL & Pipelines
#ai #data-engineering #data-transformation

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads, processing images, audio, video, and structured data at scale.

5.3K
Active
Rust
ML Ops
ETL & Pipelines
#ai-engineering #data-engineering #distributed

jitsucom/jitsu

Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses such as BigQuery, Snowflake, and Redshift.

4.7K
Active
TypeScript
ETL & Pipelines
API Frameworks
#data-ingestion #etl #segment-alternative

rudderlabs/rudder-server

Rudder Server is a privacy-focused customer data platform and alternative to Segment, written in Go and React.

4.4K
Active
Go
Customer Data Platform
ETL & Pipelines
React
#customer-data-platform #customer-data-pipeline #data-integration

apache/streampark

Easy-to-use streaming application development framework and operation platform for building ETL pipelines.

4.3K
Active
Java
API Frameworks
ETL & Pipelines
#streaming #etl-pipeline #operation-platform

quadratichq/quadratic

A spreadsheet tool with AI capabilities for data analysis, data engineering, and visualization.

4.0K
Active
Rust
LLM Frameworks
ETL & Pipelines
#ai #data-analysis #data-engineering

Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

3.7K
Active
Java
ETL & Pipelines
Background Jobs
#data-engineering #batch-processing #workflow-orchestration
