Showing 701-750 of 897 trending projects
Scalable, low-latency, hybrid-enabled vector search in Postgres, extending the existing database rather than replacing it.
A registry of publicly available datasets hosted on AWS for data-driven developers.
Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.
A Python library providing linear models not found in statsmodels, including instrumental-variable and panel-data models.
Olric is a distributed, in-memory key/value store and cache for Go applications and services.
A modern SQL database written in Go that runs embedded inside the host application, including mobile apps.
A cross-platform specification for expressing data transformations, relational algebra, standardized record expressions, and query plans.
Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.
Apache Amoro is an open-source lakehouse management system built on open table formats such as Iceberg and Hudi.
Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.
A Rust-based graph database for developers who need to store and query connected data.
A collection of stock analysis tools across various programming languages and platforms.
A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.
A highly scalable, distributed, document-oriented NoSQL database with full-text search, spatial, and time-series support.
A simple Python library for creating dataclasses from dictionaries.
gget is a Python library that enables efficient querying of genomic reference databases like NCBI, Ensembl, and UniProt.
A repository of datasets for building recommender systems, useful for developers working on AI-powered applications.
A comprehensive set of Python notes and resources for developers, covering a wide range of topics including data science, machine learning, and scientific computing.
AI-native database unifying vector, text, and structured data for hybrid search and in-database AI workflows.
An advanced ORM library for Java and Kotlin developers that provides powerful caching and data management features.
A C# library for reading and writing metadata in media files, useful for audio and video processing applications.
A curated list of Python packages for chemistry, including computational chemistry, molecular dynamics, and quantum chemistry.
OrientDB is a versatile, multi-model DBMS that supports graph, document, reactive, full-text, and geospatial models.
Fluid is a distributed data abstraction and acceleration framework for big data and AI applications in the cloud.
A Python-based SQL lineage analysis tool that supports data discovery and governance.
AWS Glue code samples for building data integration and ETL pipelines on AWS.
Open-source massively parallel processing (MPP) database, an alternative to Greenplum.
A C# NuGet package that provides technical indicators and trading insights for financial market data analysis.
CrateDB is a distributed, scalable SQL database for storing and analyzing massive amounts of data in near real-time.
Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.
A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.
A collection of simple tools for data cleaning and wrangling in R for data science tasks.
A comprehensive collection of notes and resources for understanding different database technologies and concepts.
PDAL is a C++ library for translating and processing point cloud data, similar in spirit to GDAL, which handles raster and vector data.
Trill is a single-node query processor for temporal or streaming data.
An ORM (Object-Relational Mapping) library for .NET that supports a wide range of database providers, including SQL Server, MySQL, PostgreSQL, and more.
A Python package for analyzing heart rate data from PPG and ECG signals.
Modin scales pandas workflows with a single-line code change by distributing the data processing.
A curated list of free/public domain text datasets for natural language processing (NLP) tasks.
Extremely fast, easy-to-use, and fully async NoSQL database for Flutter apps.
A dataset for music analysis, designed to support deep learning and reproducible research.
A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.
A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.
A collection of Unix, R, and Python tools for bioinformatics and data science projects.
Index your Gmail account into a SQLite database and run custom data analysis on your email.
A Python library with data related to Brazilian municipalities, including IBGE codes, latitude, longitude, and more.
A high-performance B-tree implementation for Go, useful for building database-like applications.
An in-memory key-value store for Python that uses the third-party orjson library for persistence, with SQLite support.
ggstatsplot is an R library that enhances ggplot2 visualizations with statistical analysis and hypothesis testing.
A Python script that generates a CSV file of player data from Fantasy Premier League.