A Python package to tackle the curse of imbalanced datasets in machine learning.
A powerful customer data pipeline for collecting, processing, and analyzing user events and behavior.
A flexible workflow orchestration platform that seamlessly integrates data, ML, and analytics stacks.
Papermill is a Python library that allows you to parameterize, execute, and analyze Jupyter notebooks.
Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.
Data pipelines for cloud config and security data, enabling CSPM, FinOps, and vulnerability management solutions.
Data transformation framework for AI with ultra-fast, incremental processing capabilities.
Pachyderm is a data-centric pipeline and data versioning platform for building and scaling data-intensive applications.
A Python library for processing and analyzing data with foundation models and large language models.
Apache NiFi is a powerful data flow management system that enables developers to build complex data pipelines.
DataX-Web is a visual data integration platform that supports RDBMS, Hive, HBase, ClickHouse, MongoDB and other data sources.
A collection of Python-based web crawlers for scraping data from various e-commerce and online platforms.
High-performance data engine for AI and multimodal workloads, processing images, audio, video, and structured data at scale.
This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.
An open-source Python library that simplifies the process of loading data into data lakes and warehouses.
This GitHub repository contains a collection of interesting Python web scraping and data analysis projects.
Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.
A data augmentation library for natural language processing (NLP) tasks, enabling developers to improve model performance.
A Python library that provides a simple and unified interface for extracting text from any document format.
Rudder Server is a privacy-focused, Segment-alternative customer data platform written in Go and React.