Showing 651-700 of 897 trending projects
PoloDB is an embedded document database written in Rust for building cross-platform, local-first applications.
A curated list of Twitter datasets and resources for data scientists and social network analysts.
A library of functional, durable data structures written in Java for developers building robust applications.
A high-performance Python library for working with large tabular datasets, offering efficient data manipulation and visualization.
A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.
A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.
A fast, embeddable column database written in Go, optimized for AI/ML workloads.
A collection of Unix, R, and Python tools for bioinformatics and data science projects.
Python code for "Causal Inference: What If", the book by Miguel Hernán and James Robins.
Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
A Go library for creating high-quality plots and visualizations of data.
A fast and accurate short-read sequence aligner written in C for genomics applications.
A Python library that provides common financial risk and performance metrics used in financial analysis.
A collection of Python code examples and tutorials for data science, machine learning, and web development.
A high-performance Java library for data analysis, visualization, and machine learning.
ggstatsplot is an R library that enhances ggplot2 visualizations with statistical analysis and hypothesis testing.
Agile data preparation workflows made easy with popular Python data science libraries.
A high-performance B-tree implementation for Go, useful for building database-like applications.
A curated list of tools and datasets for anomaly detection on time-series data.
Comprehensive dataset of China's administrative divisions (province, city, county, town) in JSON, CSV, and SQL formats.
An R package for Bayesian generalized multivariate non-linear multilevel models using Stan.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.
A comprehensive dataset of ISO 3166-1 country codes and their corresponding UN Geoscheme regional codes, ready to use in various formats.
A C++ library for importing OpenStreetMap data into a PostgreSQL/PostGIS database.
Cartopy is a Python library for creating maps and visualizing spatial data with matplotlib support.
A high-performance compression library written in C for developers working with large data sets.
A fast and efficient C++ hash map and hash set implementation using Robin Hood hashing.
A Python library for performing multivariate exploratory data analysis, including techniques like PCA, CA, MCA, MFA, and FAMD.
Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.
A time series library for Apache Spark that provides a high-level API for working with time series data.
This repository provides a comprehensive guide on optimizing MySQL performance and solving common database problems.
A Python statistical package based on Pandas, providing various statistical methods and tests.
A fast spatial index library for 2D points and rectangles in JavaScript, useful for geospatial applications.
A collection of SQL practice problems for developers to improve their SQL skills.
A full-featured file system for online data storage, built with Python.
A repository for the 100 Knocks of Data Science Preprocessing, focused on structured data processing.
A .NET Standard library that provides strongly typed exceptions for Entity Framework Core across multiple database providers.
A Python library for reading and writing a wide range of image and video formats, including DICOM, animated GIFs, and webcam capture.
gget is a Python library that enables efficient querying of genomic reference databases like NCBI, Ensembl, and UniProt.
A Python library for arbitrary-precision floating-point arithmetic, providing advanced numerical capabilities.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
A collection of medical imaging datasets for researchers and developers in the healthcare industry.
A standard filetree template for data curation and organization, useful for developers interested in data management.
Tegola is an open-source Mapbox Vector Tile server written in Go, enabling efficient geospatial data visualization.
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly.
A Python library for analyzing movement trajectory data using GeoPandas.
DBngin is a free, open-source, cross-platform database management tool for developers.
A high-level geospatial data visualization library for Python developers working with spatial data.
Lightweight, fast, and reliable key-value database engine in Go for high-throughput applications.