ETL & Pipelines

Explore 310 open source projects in ETL & Pipelines

Showing 221-240 of 310 projects

OpnTec/parliament-scraper

A Python scraper for public data from the EU and other parliament websites.

1.4K
Archived
Python
API Frameworks
ETL & Pipelines
#web-scraping #parliament-data #etl

AI4Finance-Foundation/FinNLP

Democratizing internet-scale financial data for developers through natural language processing.

1.4K
Archived
Jupyter Notebook
NLP Frameworks
ETL & Pipelines
#finance #natural-language-processing #data-processing

wgzhao/Addax

A fast and versatile ETL tool that transfers data between RDBMS and NoSQL databases.

1.4K
Active
Java
ETL & Pipelines
API Frameworks
#etl #database #rdbms

zstmfhy/zlibrary-to-notebooklm

A Python script that automatically downloads books from Z-Library and uploads them to Google NotebookLM.

1.4K
Active
Python
API Frameworks
ETL & Pipelines
#z-library #google-notebook #book-downloader

toluaina/pgsync

A Python library that syncs data from Postgres to Elasticsearch/OpenSearch, enabling real-time data pipelines.

1.4K
Active
Python
ETL & Pipelines
Realtime
#change-data-capture #elasticsearch-sync #postgresql

damklis/DataEngineeringProject

An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.

1.4K
Archived
Python
ETL & Pipelines
API Frameworks
Django
#data-engineering #data-pipeline #etl

databricks/LearningSparkV2

This is a book that teaches how to use Apache Spark for lightning-fast data analytics.

1.4K
Archived
Scala
ETL & Pipelines
Databases
Spark
#apache-spark #delta-lake #mlflow

lorey/mlscraper

Effortlessly scrape data from websites using machine learning and HTML examples with this Python library.

1.4K
Archived
Python
Backend Frameworks
ETL & Pipelines
#web-scraping #data-extraction #machine-learning

PKUJohnson/OpenData

An open-source financial data extraction tool that provides easy API access to data scraped from various websites.

1.4K
Archived
Python
ETL & Pipelines
CLI Tools
#web-scraping #data-extraction #financial-data

quiltdata/quilt

Quilt is a data mesh for connecting people with actionable data, built with TypeScript.

1.4K
Active
TypeScript
ETL & Pipelines
Data Versioning
#data-engineering #data-versioning #data-pipeline

gtoonstra/etl-with-airflow

This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.

1.4K
Archived
Shell
ETL & Pipelines
#etl #airflow #data-pipelines

xisuo67/XHS-Spider

A web scraping and data collection tool for the Chinese social media platform Xiaohongshu (Little Red Book).

1.4K
Stable
Backend Frameworks
API Frameworks
C#
#crawler #downloader #scraper

amphi-ai/amphi-etl

A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.

1.4K
Active
TypeScript
ETL & Pipelines
Data Analysis
#data-analysis #data-pipelines #data-transformation

apache/hop

Hop is a flexible and extensible open-source data integration platform for building and orchestrating ETL and streaming pipelines.

1.3K
Active
Java
ETL & Pipelines
#data-integration #etl #orchestration

PDAL/PDAL

PDAL is a C++ library for processing point cloud data, similar to GDAL for raster data.

1.3K
Active
C++
Databases
CLI Tools
#point-cloud #data-processing #gdal

ropensci/drake

An R-focused pipeline toolkit for reproducibility and high-performance computing.

1.3K
Archived
R
CLI Tools
ETL & Pipelines
#reproducibility #high-performance-computing #data-science

rwynn/monstache

A Go daemon that syncs MongoDB to Elasticsearch in real-time for search-powered applications.

1.3K
Stable
Go
Realtime
ETL & Pipelines
#mongodb #elasticsearch #opensearch

singer-io/getting-started

A getting started guide to Singer, a data integration framework for ETL and data analysis.

1.3K
Stable
Makefile
#authentication #streaming #real-time

alan-turing-institute/CleverCSV

A Python package for handling messy CSV files with improved dialect detection and a command-line interface.

1.3K
Active
Python
ETL & Pipelines
CLI Tools
#csv #data-analysis #data-mining

PatMartin/Dex

Dex is a powerful data visualization tool that enables data exploration and publishing of web visualizations.

1.3K
Archived
JavaScript
ETL & Pipelines
Charts & Visualization
#data-analysis #data-visualization #data-mining