Python ETL framework for real-time analytics and LLM pipelines
Apache Airflow for workflow orchestration
Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes
Distributed SQL database middleware for sharding, scalability, and security
An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.
Apache DolphinScheduler is a modern data orchestration platform for creating high-performance, low-code workflows.
Unstructured is an open-source ETL solution for transforming complex documents into structured data for language models.
An open-source framework for change data capture from various databases using Apache Kafka.
mage-ai is a Python-based platform for building, running, and managing data pipelines that integrate and transform data.
A powerful customer data pipeline for collecting, processing, and analyzing user events and behavior.
Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.
A Python library for processing and analyzing data with foundation models and large language models.
Fluvio is an event stream processing engine for developers to build responsive data-intensive apps.
Rudder Server is a privacy-focused, Segment-alternative customer data platform written in Go and React.
Preswald is a WASM packager for Python-based interactive data apps that can be run completely in-browser.
A list of resources to learn Data Engineering from scratch
Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.
A system for agentic LLM-powered data processing and ETL workflows for unstructured data analysis.
Memphis.dev is a highly scalable and effortless data streaming platform
ingestr is a CLI tool that seamlessly copies data between databases with a single command.