Explore Projects

Discover 65 open source projects

Active filters (1):
Search: data-engineer

Showing 1-20 of 65 projects

apache/superset

A modern, enterprise-ready business intelligence web application for data visualization and exploration.

70.8K
Active
TypeScript
Search
Admin Dashboards
React
#data-visualization #business-intelligence #analytics

GokuMohandas/Made-With-ML

Learn to build production-grade ML applications with code and best practices

46.6K
Archived
Jupyter Notebook
ML Ops
Tutorials & Courses
Jupyter Notebook
#machine-learning #mlops #data-science

apache/airflow

Apache Airflow for workflow orchestration

44.5K
Active
Python
ETL & Pipelines
Background Jobs
Python
#airflow #data-pipelines #workflow-orchestration

DataTalksClub/data-engineering-zoomcamp

Free 9-week data engineering course with hands-on modules on pipelines, dbt, Kafka, and Spark

38.9K
Active
Jupyter Notebook
Tutorials & Courses
ETL & Pipelines
dbt
#data-engineering #course #dbt

eugeneyan/applied-ml

Curated resources for data science and machine learning in production

28.7K
Archived
ML Ops
Awesome Lists
#machine-learning #data-science #ml-ops

PrefectHQ/prefect

Workflow orchestration for resilient data pipelines in Python

21.8K
Active
Python
ETL & Pipelines
CI/CD
Python
#data-pipelines #orchestration #workflow-automation

airbytehq/airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes

20.8K
Active
Python
ETL & Pipelines
#data-integration #elt #etl

Avaiga/taipy

Taipy is a Python library that helps developers turn data and AI algorithms into production-ready web apps quickly.

19.1K
Active
Python
Agents & Orchestration
Python
#data-engineering #data-ops #data-visualization

argoproj/argo-workflows

Argo Workflows is a powerful open-source workflow engine for Kubernetes, enabling complex data processing and machine learning pipelines.

16.5K
Active
Go
ETL & Pipelines
Kubernetes
#kubernetes #pipelines #workflow

dagster-io/dagster

An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.

15.1K
Active
Python
ETL & Pipelines
Python
#data-engineering #data-orchestration #workflow-automation

andkret/Cookbook

A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.

15.0K
Active
Python
ETL & Pipelines
Python
#data-engineering #etl #pipeline

datastacktv/data-engineer-roadmap

A roadmap for becoming a data engineer.

12.7K
Archived
Data Engineering
#data-engineering #roadmap #cloud

great-expectations/great_expectations

A Python library that helps ensure data quality and reliability through data profiling and testing.

11.2K
Active
Python
ETL & Pipelines
#data-quality #data-testing #data-profiling

xonsh/xonsh

A powerful, Python-powered shell with cross-platform support and a rich feature set for developers.

9.2K
Active
Python
Shell Enhancements
Python
#shell #cross-platform #automation

risingwavelabs/risingwave

An open-source, Rust-based event streaming platform for real-time data processing and analytics.

8.8K
Active
Rust
API Frameworks
Databases
Rust
#event-streaming #real-time #data-processing

mage-ai/mage-ai

mage-ai is a Python-based platform for building, running, and managing data pipelines and integrating/transforming data.

8.7K
Active
Python
ETL & Pipelines
ML Ops
Python
#data-pipelines #data-transformation #data-integration

redpanda-data/connect

A highly configurable, production-ready stream processing platform for building real-time data pipelines.

8.6K
Active
Go
Realtime
ETL & Pipelines
Go
#stream-processing #message-queue #data-engineering

growthbook/growthbook

Open-source feature flagging and A/B testing platform for experimentation, data analysis, and remote config.

7.4K
Active
TypeScript
Feature Flags
Analytics & Tracking
React
#ab-testing #feature-flags #data-analysis

feast-dev/feast

An open-source feature store for AI/ML applications

6.8K
Active
Python
React
#feature-store #open-source #AI/ML

cloudquery/cloudquery

Data pipelines for cloud config and security data, enabling CSPM, FinOps, and vulnerability management solutions.

6.3K
Active
Go
API Frameworks
ETL & Pipelines
Go
#cloud #security #data-engineering
