Explore Projects

Discover 24 open source projects

Active filters (1):
Search: ingestion

Showing 1-20 of 24 projects

elastic/logstash

Logstash is a powerful open-source data processing pipeline that can ingest, transform, and output data from a variety of sources.

14.8K
Active
Java
API Frameworks
Java
#etl#logging#real-time-processing

coderamp-labs/gitingest

A Python tool that generates a prompt-friendly extract of a GitHub codebase by replacing 'hub' with 'ingest' in any GitHub URL.

14.1K
Active
Python
AI Code Ingestion
Python
#code-ingestion#codebase-extraction#prompt-engineering

getlago/lago

Open-source metering and usage-based billing API for consumption tracking, subscription management, pricing, and revenue analytics.

9.4K
Active
Go
Payments & Billing
API Clients & Testing
React
#billing#payments#subscriptions

apache/seatunnel

A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.

9.1K
Active
Java
ETL & Pipelines
Realtime
#data-integration#batch#streaming

risingwavelabs/risingwave

An open-source, Rust-based event streaming platform for real-time data processing and analytics.

8.8K
Active
Rust
API Frameworks
Databases
Rust
#event-streaming#real-time#data-processing

QuivrHQ/MegaParse

An optimized file parser for lossless LLM ingestion, supporting PDF, DOCX, and PPTX files.

7.3K
Experimental
Python
React
#LLM#parser#PDF

adithya-s-k/omniparse

A Python library for ingesting, parsing, and optimizing any data format for enhanced compatibility with GenAI frameworks.

6.8K
Stable
Python
LLM Frameworks
File Storage
Python
#ingestion-api#ocr#parser-library

jitsucom/jitsu

Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.

4.7K
Active
TypeScript
ETL & Pipelines
API Frameworks
TypeScript
#data-ingestion#etl#segment-alternative

chonkie-inc/chonkie

A lightweight ingestion library for fast, efficient, and robust RAG pipelines.

3.8K
Active
Python
React
#RAG#pipelines#ingestion

morphik-org/morphik-core

A comprehensive document search and storage platform for building AI applications using Python.

3.5K
Active
Python
LLM Frameworks
API Frameworks
Python
#artificial-intelligence#document-search#document-storage

bruin-data/ingestr

ingestr is a CLI tool that seamlessly copies data between any databases with a single command.

3.4K
Active
Python
API Frameworks
ETL & Pipelines
Python
#data-ingestion#data-integration#data-pipeline

lakesoul-io/LakeSoul

LakeSoul is a cloud-native, real-time Lakehouse framework for fast data ingestion and analytics on cloud storage.

3.2K
Active
Java
API Frameworks
Databases
#big-data#lakehouse#streaming

apache/paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark.

3.2K
Active
Java
ETL & Pipelines
Realtime
#big-data#data-ingestion#flink

apache/incubator-devlake

An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.

2.9K
Active
Go
ETL & Pipelines
CLI Tools
Go
#devops#data-analysis#data-engineering

airbnb/streamalert

A serverless, real-time data analysis framework for ingesting, analyzing, and alerting on data from any environment.

2.9K
Archived
Python
Serverless
Realtime
AWS
#serverless#streaming#data-analysis

jimmc414/onefilellm

A tool that makes it easy to scrape and ingest content from various sources like GitHub, arXiv, and YouTube for use with large language models.

1.9K
Stable
Python
LLM Frameworks
CLI Tools
Python
#llm#text-extraction#data-ingestion

Micke-K/IntuneManagement

A PowerShell script and WPF UI tool to manage Intune and Azure policies and profiles.

1.9K
Active
PowerShell
API Frameworks
CLI Tools
#intune#microsoft-graph-api#powershell-scripting

bruin-data/bruin

A data platform that enables building data pipelines with SQL, Python, and ingesting from various sources.

1.4K
Active
Go
ETL & Pipelines
API Frameworks
Go
#data-pipelines#data-ingestion#data-transformation

superstreamerapp/superstreamer

An open-source, scalable online streaming toolkit for developers building video apps and services.

1.4K
Experimental
TypeScript
API Frameworks
Video Processing
TypeScript
#streaming#video-processing#ffmpeg

datazip-inc/olake

Fastest open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.

1.3K
Active
Go
ETL & Pipelines
Realtime
#cdc#data-pipeline#elt