Showing 351-400 of 897 trending projects
A Rust library for serializing and deserializing data in the Rusty Object Notation (RON) format.
A parallel processing library for Pandas that improves performance on multi-core CPUs.
A Python database adapter for PostgreSQL, allowing developers to interact with their databases.
A collection of Jupyter Notebook files focused on data visualization and machine learning concepts.
The Data Transfer Project enables direct transfer of user data between online service providers.
A Python library for extracting tabular data from PDF files, useful for data processing and analysis.
A SQL database explorer supporting multiple database engines like SQLite, PostgreSQL, and MySQL.
Koalas is a pandas-like API for Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.
A cloud-native PostgreSQL database developed by Alibaba Cloud for high-performance, scalable data storage and management.
A high-performance datastore for time series and tick data built on top of MongoDB.
OpenMapTiles is an open-source vector tile schema implementation for creating custom map tiles.
A Python library for comparing data across databases, supporting various database engines.
Comprehensive dataset of China's administrative divisions (province, city, county, town) in JSON, CSV, and SQL formats.
An open-source project that captures the public GitHub timeline and makes it accessible for analysis.
A Go library for creating high-quality plots and visualizations of data.
Scalable and efficient data transformation framework with backwards compatibility for dbt.
A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.
A collaborative, offline-first SQLite wrapper for syncing app state across users and devices.
A powerful data visualization and plotting library for the Julia programming language.
An in-process OLAP SQL Engine powered by ClickHouse, enabling fast and efficient data analysis.
A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.
A comprehensive Python library for color science and color space conversions.
Rill is a tool for transforming data sets into powerful dashboards using SQL, enabling BI-as-code.
GridDB is a fast and scalable open-source database for time-series IoT and big data applications.
An ultra-lightweight database that supports key-value and time series data for embedded and IoT applications.
A curated list of community detection research papers with implementations for data science and network analysis.
A comprehensive dataset of ISO 3166-1 country codes and their corresponding UN Geoscheme regional codes, ready to use in various formats.
Malloy is an open-source language for describing data relationships and transformations.
Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.
Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.
A Python library to access historical market data from the Binance cryptocurrency exchange.
Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.
Open source time series library for Python, useful for statistical analysis and modeling.
A simple Python library for creating dataclasses from dictionaries.
A collection of stock analysis tools across various programming languages and platforms.
An efficient and compressed N-dimensional array library for Python, useful for data scientists and ML engineers.
Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.
An Internet-scale distributed database system written in C++, inspired by Google's Bigtable.
A Python library for financial applications.
A community-driven wiki for learning data engineering, covering topics like data modeling, pipelines, and databases.
Fluid is a distributed data abstraction and acceleration framework for Big Data and AI applications on the cloud.
A Python package for time series classification, useful for developers working with time-series data.
A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.
A curated list of resources for machine learning-based algorithmic trading and quantitative finance.
A Python library for processing and visualizing satellite imagery data.
A data repository for the Seaborn data visualization library in Python.
An educational OLAP database system built in Rust for learning and experimentation.
Poisson Surface Reconstruction is a C++ library for reconstructing surfaces from point cloud data.
Highly available PostgreSQL cluster using Docker, focused on data infrastructure for developers.