Showing 451-500 of 897 trending projects
A C++ library for reading and writing .npy and .npz files, commonly used in scientific computing.
An exabyte-scale, multi-region distributed file system for developers building AI-powered applications.
A Python package for analyzing heart rate data from PPG and ECG signals.
A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.
A curated list of Twitter datasets and resources for data scientists and social network analysts.
An open-source graph database written in Go, useful for building applications that require linked data and graph-based queries.
A roadmap for becoming a data engineer.
A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
A data access layer (DAL) and ORM-like library for working with SQL and NoSQL databases in Go.
A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.
A curated collection of resources for data science and machine learning enthusiasts.
This Scala library provides a high-performance implementation of the node2vec algorithm for embedding graphs.
Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.
A Python toolbox for gaining geometric insights into high-dimensional data.
An educational OLAP database system built in Rust for learning and experimentation.
Entity Framework Core provider for PostgreSQL, enabling .NET developers to easily interact with PostgreSQL databases.
A fast spatial index library for 2D points and rectangles in JavaScript, useful for geospatial applications.
cryo is a Rust library for extracting blockchain data to parquet, CSV, JSON, or Python dataframes.
A collection of SQL practice problems for developers to improve their SQL skills.
A Python library providing SQL views for Dune Analytics, a popular blockchain data analysis platform.
A parallel corpus of classical Chinese and modern Chinese texts for language processing and analysis.
An R package for Bayesian generalized multivariate non-linear multilevel models using Stan.
An end-to-end data engineering project example showcasing tools and technologies for building data pipelines.
An open-source data discovery and observability platform for data practitioners, billed as the first of its kind.
A book that teaches how to use Apache Spark for lightning-fast data analytics.
A collection of Unix, R, and Python tools for bioinformatics and data science projects.
An educational project to build a disk-based key-value store in Python for learning purposes.
An astronomy visualization project that maps the orbits of asteroids in the solar system.
A full-featured file system for online data storage, built with Python.
This repository provides code and data for a book on statistics for data scientists.
PoloDB is an embedded document database written in Rust for building cross-platform, local-first applications.
Apache Amoro is an open-source Lakehouse management system built on open data lake formats such as Iceberg and Hudi, working with compute engines such as Flink.
No description provided for this medical data repository.
A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.
SQL query builder for C# developers, supporting multiple databases and complex queries.
A Python tool to convert CAJ (China Academic Journals) files to PDF for developers who work with academic literature.
Comprehensive dataset of China's administrative divisions (province, city, county, town) in JSON, CSV, and SQL formats.
A Go library for creating high-quality plots and visualizations of data.
A collection of medical imaging datasets for researchers and developers in the healthcare industry.
A big data analysis system for the Shenzhen metro, with support for various data processing tools.
A curated list of community detection research papers with implementations for data science and network analysis.
Zui is a powerful desktop app for exploring and working with data, with support for CSV, JSON, and the Zed data format.
Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.
Fluid is a distributed data abstraction and acceleration framework for Big Data and AI applications on the cloud.
A Python statistical package based on Pandas, providing various statistical methods and tests.
Poisson Surface Reconstruction is a C++ library for reconstructing surfaces from point cloud data.
TuGraph-DB is a high-performance graph database built for fast and efficient graph data processing.
A Python script that generates a CSV file with data about players in the Fantasy Premier League game.
A Python library for reading and writing a wide range of image and video formats, including DICOM, animated GIFs, and webcam capture.
A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.