Explore Projects

Discover 299 open source projects

Active filters (1):
Search: pipeline

Showing 101-120 of 299 projects

apache/streampark

Easy-to-use streaming application development framework and operation platform for building ETL pipelines.

4.3K
Active
Java
API Frameworks
ETL & Pipelines
#streaming #etl-pipeline #operation-platform

StructuredLabs/preswald

Preswald is a WASM packager for Python-based interactive data apps that can be run completely in-browser.

4.3K
Experimental
Python
LLM Frameworks
ETL & Pipelines
Python
#data-applications #data-visualization #data-pipelines

jenkinsci/pipeline-examples

A collection of examples, tips and tricks for the Jenkins Pipeline plugin, a powerful workflow automation tool.

4.3K
Archived
Groovy
CI/CD
API Frameworks
#workflow-automation #ci-cd #jenkins

zendesk/maxwell

Maxwell's daemon, a MySQL-to-JSON Kafka producer for building real-time data pipelines.

4.2K
Stable
Java
API Frameworks
ETL & Pipelines
Java
#real-time #streaming #data-pipeline

adilkhash/Data-Engineering-HowTo

A list of resources for learning data engineering from scratch.

4.0K
Archived
React
#data-engineering #data-pipeline #distributed-systems

hygieia/hygieia

Capital One's DevOps dashboard for continuous integration, delivery, and deployment.

3.9K
Archived
TypeScript
CI/CD
API Frameworks
TypeScript
#devops #continuous-integration #continuous-delivery

AnswerDotAI/RAGatouille

A modular and easy-to-use Python library for training and using state-of-the-art retrieval models like ColBERT in RAG pipelines.

3.9K
Experimental
Python
RAG & Vector
API Frameworks
Python
#retrieval #rag #colbert

chonkie-inc/chonkie

A lightweight ingestion library for fast, efficient, and robust RAG pipelines.

3.8K
Active
Python
React
#RAG #pipelines #ingestion

puckel/docker-airflow

A Docker-based Apache Airflow platform for building and managing data pipelines and workflows.

3.8K
Archived
Shell
Background Jobs
ETL & Pipelines
Docker
#airflow #workflow #scheduler

aws/copilot-cli

A CLI tool that helps developers build, release, and operate containerized apps on AWS App Runner and ECS Fargate.

3.7K
Stable
Go
CLI Tools
Containerization
#aws-apprunner #aws-ecs #aws-fargate

firecow/gitlab-ci-local

A local GitLab CI pipeline runner that allows developers to test their .gitlab-ci.yml without pushing to the repo.

3.7K
Active
TypeScript
CLI Tools
CI/CD
TypeScript
#ci #gitlab #pipeline

Netflix/maestro

Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.

3.7K
Active
Java
ETL & Pipelines
Background Jobs
Java
#data-engineering #batch-processing #workflow-orchestration

mapillary/OpenSfM

Open-source library for 3D reconstruction from images using Structure-from-Motion (SfM) algorithms.

3.7K
Active
Python
Computer Vision
API Frameworks
Python
#3d-reconstruction #computer-vision #structure-from-motion

denji/awesome-http-benchmark

An extensive collection of HTTP benchmarking tools for testing and debugging RESTful APIs.

3.7K
Active
API Clients & Testing
#http-benchmarking #restful-api #testing

polyaxon/polyaxon

Polyaxon is an MLOps platform for managing and orchestrating the machine learning lifecycle.

3.7K
Active
MLOps
API Frameworks
Kubernetes
#machine-learning #mlops #kubernetes

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL workflows for unstructured data analysis.

3.7K
Active
Python
Agents & Orchestration
ETL & Pipelines
Python
#agents #data-pipelines #document-processing

seandavi/awesome-single-cell

A curated list of software packages and data resources for single-cell analysis, including RNA-seq and ATAC-seq.

3.7K
Active
Databases
CLI Tools
#bioinformatics #single-cell #rna-seq

HQarroum/docker-android

A Docker image for running the Android emulator as a service, useful for CI/CD pipelines.

3.7K
Active
Shell
Containerization
CLI Tools
#android #android-emulator #ci-pipeline

google/deepvariant

DeepVariant is an AI-powered bioinformatics pipeline for calling genetic variants from DNA sequencing data.

3.6K
Stable
Python
Bioinformatics
API Frameworks
TensorFlow
#bioinformatics #genomics #deep-learning

ploomber/ploomber

Ploomber is a fast and versatile tool for building and deploying data pipelines that can be used with a variety of AI and ML tools.

3.6K
Experimental
Python
ETL & Pipelines
MLOps
Python
#data-engineering #data-science #pipelines
