ETL & Pipelines

Explore 310 open source projects in ETL & Pipelines

Showing 301-310 of 310 projects

ropensci/targets

A declarative workflow management system for R that enables reproducible research and high-performance computing.

1.1K
Active
R
Build Tools
ETL & Pipelines
R
#r #workflow #reproducibility

Mrkuhuo/data-warehouse-learning

Open-source data warehouse learning project with examples and code for building real-time and offline data pipelines.

1.1K
Stable
Java
ETL & Pipelines
API Frameworks
Flink
#data-engineering #etl #pipelines

lensesio/stream-reactor

A collection of open-source Kafka connectors for various data sources and destinations maintained by Lenses.io.

1.1K
Active
Scala
MCP Frameworks
ETL & Pipelines
Scala
#kafka #connectors #data-integration

bigdatagenomics/adam

ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.

1.0K
Experimental
Scala
ETL & Pipelines
API Frameworks
Spark
#bioinformatics #genomics #big-data

facebookresearch/cc_net

Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.

1.0K
Archived
Python
ETL & Pipelines
CLI Tools
Python
#data-processing #web-crawling #data-cleanup

Kotlin/dataframe

A Kotlin library for structured data processing, suitable for data analysis and data science tasks.

1.0K
Active
Kotlin
Databases
ETL & Pipelines
#data-analysis #data-science #dataframe

shaiwz/data-platform-open

A no-code, visual data integration platform for building big data pipelines and workflows.

1.0K
Active
Java
ETL & Pipelines
Realtime
Java
#big-data #dataflow #etl

opengeospatial/geoparquet

A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.

1.0K
Active
Python
Databases
API Clients & Testing
Python
#geoparquet #geospatial #gis

madnight/githut

A GitHub language statistics tool that provides insights into programming language usage across GitHub repositories.

1.0K
Archived
JavaScript
Charts & Visualization
ETL & Pipelines
React
#github-statistics #programming-languages #data-visualization

blaze/odo

A Python library for data migration and transformation in the Blaze project.

1.0K
Archived
Python
ETL & Pipelines
CLI Tools
#data-migration #etl #data-transformation