Explore Projects

Discover 299 open source projects

Active filters (1):
Search: pipeline

Showing 161-180 of 299 projects

timeplus-io/proton

Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.

2.2K
Active
C++
ETL & Pipelines
API Frameworks
#sql #etl #stream-processing

dabochen/spreadsheet-is-all-you-need

A spreadsheet-based pipeline for running a nanoGPT model, aimed at developers working with AI tools.

2.1K
Archived
LLM Frameworks
AI Code Generation
React
#machine-learning #language-model #spreadsheet-integration

databricks/spark-deep-learning

Deep learning library for Apache Spark that provides high-level APIs and models for building machine learning pipelines.

2.0K
Archived
Python
ML Ops
ETL & Pipelines
Apache Spark
#machine-learning #deep-learning #spark

elyra-ai/elyra

Elyra extends JupyterLab with an AI-centric approach for developing and deploying ML/AI pipelines.

2.0K
Active
Python
ML Ops
MCP Frameworks
JupyterLab
#ai #machine-learning #jupyterlab

bytewax/bytewax

Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.

2.0K
Experimental
Python
ETL & Pipelines
API Frameworks
Python
#streaming #data-engineering #data-processing

allenai/scispacy

A spaCy pipeline and models for processing scientific/biomedical documents.

1.9K
Stable
Python
NLP
API Frameworks
spaCy
#bioinformatics #biomedical #nlp

data-engineering-community/data-engineering-wiki

A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.

1.9K
Active
CSS
ETL & Pipelines
Databases
#data-engineering #data-modeling #data-pipelines

JuliaAI/MLJ.jl

A flexible machine learning framework for the Julia programming language, used for classification, clustering, and more.

1.9K
Stable
Julia
ML Ops
Databases
Julia
#machine-learning #data-science #predictive-modeling

microsoft/azure-pipelines-agent

The Azure Pipelines Agent is a tool for running build and deployment tasks in a CI/CD pipeline.

1.9K
Active
C#
CI/CD
#ci-cd #azure #pipelines

yougov/mongo-connector

MongoDB data stream pipeline tools for managing real-time data synchronization and replication.

1.9K
Archived
Python
ETL & Pipelines
CLI Tools
Python
#mongodb #data-streaming #replication

zhp8341/flink-streaming-platform-web

A real-time streaming platform built on Apache Flink for building scalable and reliable data pipelines.

1.9K
Stable
Java
API Frameworks
Streaming
Java
#flink #sql #streaming

tdrussell/diffusion-pipe

A pipeline parallel training script for diffusion models, useful for AI and machine learning researchers.

1.9K
Active
Python
LLM Frameworks
ML Ops
Python
#diffusion-models #machine-learning #ai-research

bokmann/font-awesome-rails

The font-awesome font bundled as an asset for the Rails asset pipeline.

1.9K
Archived
HTML
Icons & Assets
Rails
#icons #fonts #rails

twitter/ios-twitter-image-pipeline

Robust and performant image loading and caching framework for iOS clients.

1.9K
Archived
C
Component Libraries (React)
iOS
React
#cache #image-pipeline #ios

alecthomas/voluptuous

Voluptuous is a Python data validation library for building flexible data validation pipelines.

1.8K
Active
Python
API Frameworks
Validation
#data-validation #schema-validation #python-library

byzer-org/byzer-lang

Byzer is a low-code, open-source programming language for data pipelines, analytics, and AI.

1.8K
Archived
Scala
ML Ops
ETL & Pipelines
Scala
#bigdata #machine-learning #sql-like-dsl

opendataloader-project/opendataloader-pdf

Fast local PDF-to-Markdown/JSON converter for RAG pipelines. No GPU needed.

1.8K
Active
Java
RAG Frameworks
RAG & Vector
Java
#pdf-parser #rag-pipeline #markdown-conversion

san089/Udacity-Data-Engineering-Projects

A collection of Udacity data engineering projects showcasing various tools and technologies.

1.8K
Archived
Python
Airflow
#data-engineering #cloud #infrastructure

hu17889/go_spider

A flexible and modular Go-based web crawler framework with a concurrent architecture.

1.8K
Archived
Go
API Frameworks
CLI Tools
#crawler #concurrent #pipeline

edyoda/data-science-complete-tutorial

This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.

1.8K
Archived
Jupyter Notebook
Databases
Machine Learning
#data-science #machine-learning #numpy