Showing 351-400 of 897 trending projects
A scalable and efficient data transformation framework with backward compatibility with dbt.
C++ DataFrame library for statistical, financial, and machine learning analysis.
A Python library that provides a set of customizable pipeline blocks for data processing tasks.
A Python script that fetches Garmin health data and stores it in an InfluxDB database for visualization in Grafana.
A Postgres extension for high-performance vector search, complementing pgvector for scale.
PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.
An open-source data catalog platform for building a high-performance, federated metadata lake.
A Python library of probabilistic data structures and algorithms, such as MinHash, LSH, HyperLogLog, and HNSW, for approximate similarity search and cardinality estimation.
A collaborative, offline-first SQLite wrapper for syncing app state across users and devices.
An extensible, high-performance columnar file format for data storage and processing.
Feather is a fast, interoperable binary file format for storing data frames in Python, R, and other languages, powered by Apache Arrow.
A basic document (NoSQL) database implementation in Go, suitable for small-scale projects.
A powerful data visualization and plotting library for the Julia programming language.
This Scala library provides a high-performance implementation of the node2vec algorithm for embedding graphs.
RBush is a high-performance JavaScript library for 2D spatial indexing of points and rectangles, based on R-trees.
An in-process OLAP SQL Engine powered by ClickHouse, enabling fast and efficient data analysis.
A MySQL-compatible relational database with a storage-agnostic query engine, implemented in Go.
A Python library for creating easy-to-use, visually appealing data tables and summaries.
A database migration and schema management tool for PHP developers, supporting multiple database engines.
A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.
A Python library with the most common stock market technical indicators, making it easy to implement quantitative finance and algorithmic trading strategies.
A Python data structures library focused on serialization, deserialization, and validation of complex data schemas.
Facebook's fork of Oracle's MySQL database, including the MyRocks storage engine.
A dataset for music analysis, designed to support deep learning and reproducible research.
A Python library for survival analysis, useful for developers working with time-to-event data.
A collection of data science projects in Python using Jupyter Notebook.
A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.
FeatureBase is a fast analytical database built on bitmaps, well suited to ML and data-intensive applications.
A C++ library for high-performance matrix and vector operations.
A comprehensive Python library for color science and color space conversions.
Rill is a tool for transforming data sets into powerful dashboards using SQL, enabling BI-as-code.
A collection of medical imaging datasets for researchers and developers in the healthcare industry.
DuckLake is an integrated data lake and catalog format written in C++.
A repository for the 100 Knocks of Data Science Preprocessing, focused on structured data processing.
A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster).
A Python library for creating data processing pipelines using functional programming principles.
This repository provides Python implementations of exercises from the book 'An Introduction to Statistical Learning'.
A distributed database with CRDT sync, offline support, and end-to-end encryption for vibe coders.
GridDB is a fast and scalable open-source database for time-series IoT and big data applications.
A JavaScript library that provides a NumPy-like interface for working with multi-dimensional arrays and matrices.
sq is a Go-based data wrangling tool that supports a variety of data formats and databases.
A sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, and DB2.
A big data analysis system for the Shenzhen Metro, with support for various data processing tools.
An ultra-lightweight database that supports key-value and time series data for embedded and IoT applications.
An AI-native database that unifies vector, text, and structured data for hybrid search and in-database AI workflows.
An intuitive Python library that adds plotting functionality to scikit-learn machine learning models.
A repository of study materials and resources for data analysis and related fields.
A curated list of community detection research papers with implementations for data science and network analysis.
A comprehensive dataset of ISO 3166-1 country codes and their corresponding UN Geoscheme regional codes, ready to use in various formats.