A Rust-based library that provides real-time analytics on Postgres tables, with support for columnstore storage, Delta Lake, and Iceberg.
Feathr is a scalable, unified data and AI engineering platform for enterprises, with features like feature engineering, feature governance, and a feature marketplace.
A tool that makes it easy to scrape and ingest content from various sources like GitHub, arXiv, and YouTube for use with large language models.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
A pure-Python HTML screen-scraping library for developers who need to extract data from websites.
A one-click Chinese data augmentation package for NLP and BERT model training.
MongoDB data stream pipeline tools for managing real-time data synchronization and replication.
A C# library that converts Excel spreadsheets to JSON objects and saves them to a text file.
Byzer is a low-code, open-source programming language for data pipelines, analytics, and AI.
A scalable, free tool for image and video analysis that accelerates data curation and augmentation.
The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.
A Python library for extracting tabular data from PDF documents, with a web interface for human-in-the-loop extraction.
Embulk is a pluggable bulk data loader that helps developers load data from various sources into databases.
A data processing and ETL (Extract, Transform, Load) framework for Ruby developers.
A set of Airflow DAGs to help maintain and manage the operation of an Airflow deployment.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
An AI-powered data agent that can understand data needs and generate SQL/Python code for data analysis tasks.
Utility functions for projects built with dbt, a popular data transformation tool for data engineers.
A powerful, fast, and efficient library for unstructured data extraction, written in Rust with bindings for other languages.
A powerful Python library for advanced text analytics, including classification, clustering, summarization, and sentiment analysis.