ETL & Pipelines

Explore 310 open source projects in ETL & Pipelines

Showing 241-260 of 310 projects

Data-Learn/data-engineering

A comprehensive resource for developers to learn and get started with data engineering using Python.

1.3K
Experimental
Python
ETL & Pipelines
Tutorials & Courses
Python
#data-engineering#python#tutorials

petl-developers/petl

A Python library for extracting, transforming, and loading tabular data.

1.3K
Stable
Python
ETL & Pipelines
Python
#etl#tabular-data#data-pipelines

datazip-inc/olake

An open-source data pipeline tool for fast replication of databases to data lakes in Apache Iceberg format.

1.3K
Active
Go
ETL & Pipelines
Realtime
#cdc#data-pipeline#elt

ezrosent/frawk

An efficient awk-like language written in Rust for text processing and data manipulation tasks.

1.3K
Stable
Rust
API Frameworks
CLI Tools
Rust
#text-processing#data-manipulation#cli-tool

datavane/tis

A Java-based framework for building agile DataOps pipelines using tools like Flink, DataX, and Chunjun with a web UI.

1.3K
Active
Java
ETL & Pipelines
API Frameworks
#dataops#etl#flink

wesm/msgvault

Archive, search, and analyze your entire email/chat history offline with DuckDB-powered analytics and AI queries.

1.3K
Active
Go
ETL & Pipelines
RAG & Vector
DuckDB
#email-archival#message-search#duckdb

DTStack/Taier

A big data development platform for job submission, scheduling, operations and maintenance, and metrics display.

1.3K
Archived
Java
API Frameworks
ETL & Pipelines
Flink
#big-data#data-pipeline#task-scheduling

GoogleCloudPlatform/DataflowTemplates

Provides pre-built Google Cloud Dataflow templates to simplify data processing tasks on the Google Cloud Platform.

1.3K
Active
Java
API Frameworks
ETL & Pipelines
Apache Beam
#google-cloud#dataflow#apache-beam

mahmoudparsian/pyspark-tutorial

PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.

1.3K
Experimental
Jupyter Notebook
Databases
ETL & Pipelines
#big-data#data-algorithms#dataframes

rsvp/fecon235

Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.

1.3K
Archived
Jupyter Notebook
Databases
ETL & Pipelines
Jupyter Notebook
#finance#economics#federal-reserve

networktocode/ntc-templates

A collection of TextFSM templates for parsing network device show commands, useful for network automation.

1.2K
Active
Python
CLI Tools
API Frameworks
#network-automation#parsing#cli-tools

stanfordjournalism/search-script-scrape

A collection of 101 real-world web scraping exercises in Python 3 for data journalists.

1.2K
Archived
Python
Backend Frameworks
ETL & Pipelines
Python
#web-scraping#data-journalism#python-3

Open-Source-Legal/OpenContracts

An enterprise-grade, API-first LLM workspace for unstructured document processing, with features like data extraction, redaction, and prompt engineering.

1.2K
Active
Python
LLM Frameworks
ETL & Pipelines
Python
#llm#prompt-engineering#etl

uber-archive/AthenaX

A scalable, SQL-based streaming analytics platform from Uber, built on top of Apache Flink.

1.2K
Archived
Java
ETL & Pipelines
API Frameworks
Java
#analytics#streaming#sql

lit26/finvizfinance

A Python library for financial analysis and data scraping from the Finviz platform.

1.2K
Active
Jupyter Notebook
ETL & Pipelines
Backend Frameworks
Jupyter Notebook
#financial-analysis#web-scraping#data-pipeline

sentinel-hub/eo-learn

An open-source Python framework for processing Earth observation data using machine learning.

1.2K
Active
Python
ML Ops
Caching
Python
#earth-observation#remote-sensing#machine-learning

drasi-project/drasi-platform

The Data Change Processing platform, a C# library for building CDC (change data capture) and change detection systems.

1.2K
Active
C#
API Frameworks
ETL & Pipelines
#cdc#change-data-capture#change-detection

kevwan/go-stash

A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.

1.2K
Stable
Go
ETL & Pipelines
Realtime
#elasticsearch#elk#kafka

calogica/dbt-expectations

A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.

1.2K
Archived
Shell
ETL & Pipelines
Testing
dbt
#data-testing#data-validation#dbt

skyplane-project/skyplane

A multi-cloud data transfer tool for moving data quickly between cloud providers.

1.2K
Archived
Python
File Storage
ETL & Pipelines
#cloud#data-transfer#multi-cloud