ETL & Pipelines

Explore 310 open-source projects in ETL & Pipelines

Showing 81-100 of 310 projects

apache/streampark

Easy-to-use streaming application development framework and operation platform for building ETL pipelines.

4.3K
Active
Java
API Frameworks
ETL & Pipelines
#streaming#etl-pipeline#operation-platform

StructuredLabs/preswald

Preswald is a WASM packager for Python-based interactive data apps that can be run completely in-browser.

4.3K
Experimental
Python
LLM Frameworks
ETL & Pipelines
Python
#data-applications#data-visualization#data-pipelines

run-llama/llama_cloud_services

A set of TypeScript-based cloud services and utilities for processing and extracting structured data from various document formats.

4.2K
Active
TypeScript
File Storage
Caching
TypeScript
#document-parsing#pdf-processing#structured-data

zendesk/maxwell

Maxwell's daemon, a MySQL-to-JSON Kafka producer for building real-time data pipelines.

4.2K
Stable
Java
API Frameworks
ETL & Pipelines
Java
#real-time#streaming#data-pipeline

attardi/wikiextractor

A Python tool for extracting plain text from Wikipedia dumps, useful for natural language processing tasks.

4.0K
Archived
Python
API Frameworks
ETL & Pipelines
Python
#wikipedia#text-extraction#nlp

adilkhash/Data-Engineering-HowTo

A list of resources for learning data engineering from scratch.

4.0K
Archived
React
#data-engineering#data-pipeline#distributed-systems

quadratichq/quadratic

A spreadsheet tool with AI capabilities for data analysis, data engineering, and visualization.

4.0K
Active
Rust
LLM Frameworks
ETL & Pipelines
Rust
#ai#data-analysis#data-engineering

jghoman/awesome-apache-airflow

A curated list of resources about Apache Airflow, a popular workflow management platform.

3.9K
Active
Shell
CLI Tools
Background Jobs
#airflow#workflow-management#etl

multiprocessio/dsq

A command-line tool for running SQL queries against various data formats such as JSON, CSV, Excel, and Parquet.

3.9K
Archived
Go
CLI Tools
Databases
Go
#sql#json#csv

puckel/docker-airflow

A Docker image for running Apache Airflow, a platform for building and managing data pipelines and workflows.

3.8K
Archived
Shell
Background Jobs
ETL & Pipelines
Docker
#airflow#workflow#scheduler

Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

3.7K
Active
Java
ETL & Pipelines
Background Jobs
Java
#data-engineering#batch-processing#workflow-orchestration

atlanhq/camelot

Camelot is a Python library for extracting tables from PDF files, making it easier for developers to work with PDF data.

3.7K
Archived
Python
API Frameworks
CLI Tools
Python
#pdf#table-extraction#data-processing

DataLinkDC/dinky

Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment, and operations.

3.7K
Stable
Java
ETL & Pipelines
Databases
Apache Flink
#datalake#datawarehouse#flink

ucbepic/docetl

A system for building agentic, LLM-powered data processing and ETL workflows to analyze unstructured data.

3.7K
Active
Python
Agents & Orchestration
ETL & Pipelines
Python
#agents#data-pipelines#document-processing

mrdbourke/zero-to-mastery-ml

A comprehensive machine learning and data science course with Jupyter Notebook materials.

3.6K
Archived
Jupyter Notebook
Jupyter Notebook
#machine-learning#data-science#python

ploomber/ploomber

Ploomber is a tool for building and deploying data pipelines that works with a variety of AI and ML tools.

3.6K
Experimental
Python
ETL & Pipelines
ML Ops
Python
#data-engineering#data-science#pipelines

awslabs/deequ

Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.

3.6K
Active
Scala
ETL & Pipelines
Testing
Spark
#data-quality#unit-testing#apache-spark

noflo/noflo

Flow-based programming framework for building complex JavaScript applications and services.

3.5K
Archived
JavaScript
Backend Frameworks
CLI Tools
Node
#flow-based-programming#etl-framework#visual-programming

dathere/qsv

A blazing-fast data-wrangling toolkit for AI and data engineering workflows.

3.5K
Active
Rust
ETL & Pipelines
Databases
#data-engineering#data-wrangling#etl

xyflow/awesome-node-based-uis

A curated list of resources for creating node-based UI editors and visual programming tools.

3.5K
Experimental
Component Libraries (React)
CLI Tools
React
#node-based-ui#visual-programming#workflow-editor