Showing 851-897 of 897 trending projects
A repository of open-source data sets created for stories on The Pudding, a digital publication focused on data journalism.
Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.
A Python library for creating circular data visualizations like Circos plots, chord diagrams, and radar charts.
Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.
Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
A powerful Python library for record linkage and duplicate detection in data-driven applications.
A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.
A geospatial data library for Ruby that provides a set of tools for working with geographic data.
An open-source N-body simulation library for astrophysics and planetary science.
A curated list of resources for time series forecasting, including papers, code, and other materials.
A dataset of Borg cluster traces from Google, useful for researchers and developers working on distributed systems and cloud infrastructure.
A Python helper library for enhancing Jupyter Notebooks with data visualization and analysis capabilities.
Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.
Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.
Tools to download and clean up Common Crawl data, a large web crawl dataset, for further analysis and processing.
This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.
A Python library for implementing the Louvain community detection algorithm on graphs.
SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.
This Python library provides additional linear models for statistical modeling and analysis.
A simple SQLite file viewer that allows you to view and explore SQLite databases online.
A Kotlin library for structured data processing, suitable for data analysis and data science tasks.
A PostgreSQL sample database for testing and learning SQL queries.
A powerful GUI/CLI tool for biologists to work with NGS data; it is not a vibe-coding tool.
HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.
A compilation of R and Python code for data science and machine learning projects.
A fast and flexible R package for reading flat files (CSV, TSV, fixed-width) into R data frames.
Open source research data repository software built with Java.
A no-code, visual data integration platform for building big data pipelines and workflows.
An open-source C++ framework for fast and parallel map matching of GPS trajectories.
A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.
Data quality assessment and reporting tool for data frames and database tables in R.
A time series library for Apache Spark that provides a high-level API for working with time series data.
Definitions and DDL scripts for the OMOP Common Data Model (CDM), a data model for healthcare data.
A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.
A large-scale open-access corpus of scientific papers and metadata for researchers and developers.
An ordered map implementation in Go with amortized O(1) performance for common operations.
ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.
A library of functional, durable data structures written in Java for developers building robust applications.
MySQL Connector/J is a JDBC driver that enables Java applications to connect to MySQL databases.
A comprehensive English word database with translations, parts of speech, and definitions for developers.
A Python library for searching and downloading Copernicus Sentinel satellite images for geographic data analysis.
A multi-page Streamlit app for geospatial data visualization and analysis, useful for housing and real estate applications.
A space-efficient C++ implementation of the Cuckoo filter, a probabilistic data structure for set membership testing.
A Python library for data migration and transformation in the Blaze project.
SciRuby provides a collection of tools for scientific computation in Ruby, catering to developers working with data and scientific applications.
EasyDB is a lightweight desktop app that lets you query local CSV, Excel, and JSON files with SQL, without an external database.