A Python package for analyzing heart rate data from PPG and ECG signals.
gget is a Python library that enables efficient querying of genomic reference databases like NCBI, Ensembl, and UniProt.
A Go library with types and utilities for working with 2D geometry, geospatial data, and mapping.
A community-driven catalog of geospatial datasets for use with Google Earth Engine.
A Chinese translation of the book 'Python for Data Analysis' 2nd Edition, covering NumPy, Pandas, and other data analysis tools.
RRDtool is a time-series database system for efficiently storing and graphing data.
This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.
Programmable CUDA/C++ GPU Graph Analytics library for high-performance parallel graph processing.
A repository of open-source data sets created for stories on The Pudding, a digital publication focused on data journalism.
Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.
A geospatial data library for Ruby that provides a set of tools for working with geographic data.
A Python helper library for enhancing Jupyter Notebooks with data visualization and analysis capabilities.
A Python library for implementing the Louvain community detection algorithm on graphs.
A fast and flexible R package for reading flat files (CSV, TSV, fixed-width) into R data frames.
An open-source C++ framework for fast and parallel map matching of GPS trajectories.
A data quality assessment and reporting tool for data frames and database tables in R.
MySQL Connector/J is a JDBC driver that enables Java applications to connect to MySQL databases.
An ORM for Node.js/TypeScript with support for multiple databases.
A component for incremental subscription to and consumption of the MySQL binlog.
A lightweight local JSON database for JavaScript/TypeScript apps.
Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.
An open-source graph database written in Go, useful for building applications that require linked data and graph-based queries.
Realm is a mobile database that serves as a replacement for SQLite and ORMs.
A lightweight SQLite3 driver for Go that implements the database/sql interface.
Grid Studio is a web-based application for data science with full integration of open source data science frameworks and languages.
SSDB is a fast NoSQL database, an alternative to Redis, with support for LevelDB and RocksDB backends.
Pentaho Data Integration (ETL) is a Java-based tool for building data integration and ETL pipelines.
Efficient in-memory cache in Go for storing and retrieving large amounts of data.
The versioned, forkable, syncable database for developers who need a scalable, distributed data solution.
Records is a Python SQL library that makes interacting with databases more intuitive and human-friendly.
An educational distributed SQL database written in Rust.
Pandas Cookbook is a collection of recipes for using Python's powerful data analysis library, Pandas.
Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.
LevelDB key/value database in Go for building high-performance data-intensive applications.
Pachyderm is a data-centric pipeline and data versioning platform for building and scaling data-intensive applications.
A curated list of free/public domain text datasets for natural language processing (NLP) tasks.
Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.
A next-generation curated knowledge sharing platform for data scientists and other technical professionals.
Automatically visualize your pandas dataframes with a single print command, enabling quick EDA.
A C# library for reading and writing CSV files, with support for a wide range of CSV file formats.
Sequel is a Ruby library that provides a powerful and flexible object-relational mapping (ORM) for databases.
OrientDB is a versatile, multi-model DBMS that supports Graph, Document, Reactive, Full-Text, and Geospatial models.
An open-source, self-hosted database management tool with a spreadsheet-like interface for Postgres.
Mimesis is a fast Python library for generating fake data in multiple languages for testing and development purposes.
Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.
A collection of code examples and baselines for common data science and machine learning competitions.
A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.
MongoEngine is a Python Object-Document-Mapper (ODM) for working with MongoDB databases.