ETL & Pipelines

Explore 310 open source projects in ETL & Pipelines

Showing 161-180 of 310 projects

Mooncake-Labs/pg_mooncake

A Rust-based Postgres extension that provides real-time analytics on Postgres tables, with support for columnstore tables, Delta Lake, and Iceberg.

1.9K
Stable
Rust
API Frameworks
Databases
#analytics #columnstore #delta-lake

feathr-ai/feathr

Feathr is a scalable, unified data and AI engineering platform for enterprises, offering feature engineering, feature governance, and a feature marketplace.

1.9K
Archived
Scala
Feature Flags
MLOps
Apache Spark
#data-engineering #feature-engineering #feature-governance

jimmc414/onefilellm

A tool for scraping and ingesting content from sources such as GitHub, arXiv, and YouTube for use with large language models.

1.9K
Stable
Python
LLM Frameworks
CLI Tools
Python
#llm #text-extraction #data-ingestion

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

1.9K
Active
CSS
ETL & Pipelines
Databases
#data-engineering #data-modeling #data-pipelines

scrapy/scrapely

A pure-Python HTML screen-scraping library for developers who need to extract data from websites.

1.9K
Archived
HTML
Backend Frameworks
API Frameworks
#web-scraping #data-extraction #html-parsing

425776024/nlpcda

A one-step Chinese text data augmentation package for NLP and BERT model training.

1.9K
Experimental
Python
React
#data-augmentation #chinese-data-augmentation #nlp

yougov/mongo-connector

MongoDB data stream pipeline tools for managing real-time data synchronization and replication.

1.9K
Archived
Python
ETL & Pipelines
CLI Tools
Python
#mongodb #data-streaming #replication

neil3d/excel2json

A C# library that converts Excel spreadsheets to JSON objects and saves them to a text file.

1.9K
Archived
C#
ETL & Pipelines
CLI Tools
#excel #json #data-transformation

byzer-org/byzer-lang

Byzer is a low-code, open-source programming language for data pipelines, analytics, and AI.

1.8K
Archived
Scala
MLOps
ETL & Pipelines
Scala
#bigdata #machine-learning #sql-like-dsl

visual-layer/fastdup

Accelerate data curation and augmentation with this scalable, free tool for image and video analysis.

1.8K
Stable
Python
Computer Vision
ETL & Pipelines
Python
#data-augmentation #data-curation #image-processing

feldera/feldera

The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.

1.8K
Active
Rust
Databases
ETL & Pipelines
#data-analytics #data-pipelines #incremental-computation

camelot-dev/excalibur

A Python library for extracting tabular data from PDF documents, with a web interface for human-in-the-loop extraction.

1.8K
Archived
Python
Backend Frameworks
ETL & Pipelines
Flask
#pdf #table-extraction #data-processing

embulk/embulk

Embulk is a pluggable bulk data loader that helps developers load data from various sources into databases.

1.8K
Stable
Java
API Frameworks
ETL & Pipelines
#bulk-data #etl #data-pipeline

thbar/kiba

A data processing and ETL (Extract, Transform, Load) framework for Ruby developers.

1.8K
Active
Ruby
ETL & Pipelines
API Frameworks
#data #etl #ruby

teamclairvoyant/airflow-maintenance-dags

A set of Airflow DAGs to help maintain and manage the operation of an Airflow deployment.

1.8K
Archived
Python
API Frameworks
ETL & Pipelines
Apache Airflow
#airflow #maintenance #cleanup

collabH/bigdata-growth

A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.

1.7K
Stable
Shell
Databases
ETL & Pipelines
#bigdata #hadoop #spark

hitsz-ids/airda

An AI-powered data agent that can understand data needs and generate SQL/Python code for data analysis tasks.

1.7K
Archived
Python
Agents & Orchestration
ETL & Pipelines
Python
#data-analysis #data-agent #sql-generation

dbt-labs/dbt-utils

Utility functions (macros) for projects built with dbt, a popular data transformation tool for data engineers.

1.7K
Active
Makefile
ETL & Pipelines
CLI Tools
#data-transformation #etl #dbt

yobix-ai/extractous

Powerful, fast, and efficient unstructured data extraction library written in Rust with language bindings.

1.7K
Archived
Rust
ETL & Pipelines
Rust
#data-extraction #unstructured-data #etl

dipanjanS/text-analytics-with-python

Code and datasets accompanying the book "Text Analytics with Python", covering text classification, clustering, summarization, and sentiment analysis.

1.7K
Archived
Jupyter Notebook
Natural Language Processing
ETL & Pipelines
#text-processing #natural-language-processing #clustering