ETL & Pipelines

Explore 310 open-source projects in ETL & Pipelines

Showing 1-20 of 310 projects

OpenBB-finance/OpenBB

Financial data platform for analysts, quants, and AI agents

62.6K
Active
Python
Agents & Orchestration
API Clients & Testing
Python
#financial-data #ai-agents #api-client

pathwaycom/pathway

Python ETL framework for real-time analytics and LLM pipelines

59.5K
Active
Python
LLM Frameworks
ETL & Pipelines
Python
#etl #real-time #llm

pandas-dev/pandas

Core data analysis library for Python with labeled data structures and statistical functions

48.1K
Active
Python
ETL & Pipelines
Testing
Python
#data-analysis #pandas #python

jakevdp/PythonDataScienceHandbook

Python Data Science Handbook in Jupyter Notebooks

47.0K
Archived
Jupyter Notebook
Books & Guides
ETL & Pipelines
Jupyter Notebook
#jupyter-notebook #matplotlib #numpy

apache/airflow

Apache Airflow for workflow orchestration

44.5K
Active
Python
ETL & Pipelines
Background Jobs
Python
#airflow #data-pipelines #workflow-orchestration

streamlit/streamlit

Streamlit is a Python library for building and sharing interactive data apps quickly.

43.7K
Active
Python
CLI Tools
ETL & Pipelines
Python
#data-apps #interactive-visualization #python

apache/spark

Unified analytics engine for large-scale data processing

42.9K
Active
Scala
ETL & Pipelines
Realtime
Apache
#big-data #spark #data-processing

DataExpert-io/data-engineer-handbook

Comprehensive data engineering resource hub with learning paths, books, communities, and tools

40.4K
Stable
Jupyter Notebook
Tutorials & Courses
Awesome Lists
Apache Airflow
#dataengineering #bigdata #apachespark

DataTalksClub/data-engineering-zoomcamp

Free 9-week data engineering course with hands-on modules on pipelines, dbt, Kafka, and Spark

38.9K
Active
Jupyter Notebook
Tutorials & Courses
ETL & Pipelines
dbt
#data-engineering #course #dbt

mindsdb/mindsdb

Federated query engine for AI with built-in MCP server

38.6K
Active
Python
MCP Servers
Agents & Orchestration
Python
#ai #mcp #agents

microsoft/qlib

AI-powered quantitative investment platform for finance and trading

38.2K
Active
Python
Inference
SaaS Boilerplates
Python
#quantitative-investment #algorithmic-trading #machine-learning

pola-rs/polars

Fast DataFrame query engine in Rust with Python/Rust/Node.js/R frontends

37.6K
Active
Rust
ETL & Pipelines
CLI Tools
Rust
#dataframe #rust #arrow

drawdb-io/drawdb

Database diagram editor and SQL generator

36.8K
Active
JavaScript
ETL & Pipelines
Charts & Visualization
JavaScript
#database-diagram #sql-generator #erd-editor

SheetJS/sheetjs

SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.

36.2K
Archived
ETL & Pipelines
General Utilities
#spreadsheet #data-extraction #csv

apache/kafka

Distributed event streaming platform for data pipelines and real-time apps

32.1K
Active
Java
ETL & Pipelines
Realtime
Java
#kafka #event-streaming #data-pipelines

numpy/numpy

Fundamental package for scientific computing with Python

31.6K
Active
Python
ETL & Pipelines
Python
#numpy #scientific-computing #python-library

alibaba/canal

MySQL binlog incremental subscription and consumption component

29.6K
Active
Java
ETL & Pipelines
#mysql #binlog #data-synchronization

CSSEGISandData/COVID-19

Real-time global and U.S. data tracking for developers and researchers.

29.0K
Archived
ETL & Pipelines
Admin Dashboards
#covid-19 #data-tracking #jhu-csse

donnemartin/data-science-ipython-notebooks

Data science Python notebooks covering deep learning, machine learning, big data, and more.

28.9K
Archived
Python
Computer Vision
ML Ops
TensorFlow
#data-science #deep-learning #machine-learning

academic/awesome-datascience

Comprehensive Data Science learning and resource guide

28.5K
Active
Tutorials & Courses
ML Ops
#data-science #machine-learning #deep-learning