Showing 351-400 of 897 trending projects
Olric is a distributed, in-memory key/value store and cache for Go applications and services.
A Python library for survival analysis, useful for developers working with time-to-event data.
An end-to-end data pipeline for building a data lake, data warehouse, and analytics platform from GoodReads data.
A Python library that provides common financial risk and performance metrics used in financial analysis.
A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.
Educational notebooks on quantitative finance, algorithmic trading, financial modeling, and investment strategy.
A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.
A curated list of Google Earth Engine resources for geospatial analysis and remote sensing applications.
A repository of public data sources for building and testing recommender systems.
Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.
A Python library for creating circular data visualizations like Circos plots, chord diagrams, and radar charts.
A dataset of Borg cluster traces from Google, useful for researchers and developers working on distributed systems and cloud infrastructure.
A definition and DDLs for the OMOP Common Data Model (CDM), a data model for healthcare data.
A large-scale open-access corpus of scientific papers and metadata for researchers and developers.
A lightweight local JSON database for JavaScript/TypeScript apps.
A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.
Docker images containing Jupyter applications for data science and machine learning workflows.
A curated list of awesome R packages, frameworks and software for data analysis and data science.
An in-depth tutorial covering mainstream database knowledge for backend developers.
Matplot++: A C++ graphics library for creating high-quality data visualizations and scientific plots.
An open-source index of Google Trends data, useful for developers building data-driven applications.
Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.
A list of the 10,000 most common English words, useful for NLP and language modeling tasks.
An embeddable, replicated, and fault-tolerant SQL engine for building robust and scalable applications.
A cloud-native PostgreSQL database developed by Alibaba Cloud for high-performance, scalable data storage and management.
A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.
A dataset for music analysis and research, with support for deep learning and reproducible research.
A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.
Malloy is an open-source language for describing data relationships and transformations.
A data repository for the Seaborn data visualization library in Python.
Highly available PostgreSQL cluster using Docker, focused on data infrastructure for developers.
A fast, embeddable column database written in Go, optimized for AI/ML workloads.
A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.
Percona Toolkit is a collection of advanced open source database tools for MySQL, MongoDB, and PostgreSQL.
A fast and elegant data exploration library for Elixir, providing series and dataframes for data science workflows.
An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.
Open-source data warehouse learning project with examples and code for building real-time and offline data pipelines.
An ORM for Node.js/TypeScript with support for multiple databases.
A data repository for the data journalism site FiveThirtyEight, containing data and code behind their articles and graphics.
A collection of readings and resources related to databases.
OrientDB is a versatile, multi-model DBMS that supports Graph, Document, Reactive, Full-Text, and Geospatial models.
Mimesis is a fast Python library for generating fake data in multiple languages for testing and development purposes.
A Rust-based graph database for developers who need to store and query connected data.
A simple Python library for creating dataclasses from dictionaries.
Official code repository for the Genome Analysis Toolkit (GATK), a bioinformatics library for working with next-generation DNA sequencing data.
A Rust library that provides persistent data structures for efficient and immutable data management.
A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.
An advanced ORM library for Java and Kotlin developers that provides powerful caching and data management features.
SQL Lineage Analysis Tool that provides data discovery and governance insights through Python.
Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.