Showing 801-850 of 897 trending projects
A Python tool that automatically cleans and preprocesses data for analysis and machine learning.
A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.
This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.
Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.
A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.
Intake is a lightweight Python package for discovering, investigating, loading and distributing data.
A transmute-free Rust library for working with the Arrow data format.
A Redis module that adds a time series data structure for storing and querying time-stamped data.
An in-memory key-value store that uses the orjson library for persistence, with SQLite support.
A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.
A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.
Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.
SciRuby/daru is a Ruby library for data analysis and manipulation, useful for data scientists and developers working with data.
A Python library for data manipulation and analysis, part of the core data science toolkit.
A CSV data source for Apache Spark 1.x, written in Scala, for working with structured data.
A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.
Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.
Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.
ADAM is a genomics analysis platform with specialized file formats built using Apache Spark and Apache Parquet.
A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.
A powerful Python library for record linkage and duplicate detection in data-driven applications.
An open-source N-body simulation library for astrophysics and planetary science.
A Python helper library for enhancing Jupyter Notebooks with data visualization and analysis capabilities.
Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.
Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.
This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.
SnappyData is a memory-optimized analytics database based on Apache Spark and Apache Geode, enabling real-time stream processing, transactions, and predictive analytics.
This Python library provides additional linear models for statistical modeling and analysis.
A simple SQLite file viewer that allows you to view and explore SQLite databases online.
A powerful GUI/CLI tool for biologists to work with NGS data, not a vibe coder tool.
A Kotlin library for structured data processing, suitable for data analysis and data science tasks.
A compilation of R and Python code for data science and machine learning projects.
HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.
Open source research data repository software built with Java.
A no-code, visual data integration platform for building big data pipelines and workflows.
A time series library for Apache Spark that provides a high-level API for working with time series data.
A data quality assessment and reporting tool for data frames and database tables in R.
The definition and DDL scripts for the OMOP Common Data Model (CDM), a data model for healthcare data.
A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.
An ordered map implementation in Go with amortized O(1) performance for common operations.
ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.
A library of functional, durable data structures written in Java for developers building robust applications.
A multi-page Streamlit app for geospatial data visualization and analysis, useful for housing and real estate applications.
A space-efficient C++ implementation of the Cuckoo filter, a probabilistic data structure for set membership testing.
A Python library for data migration and transformation in the Blaze project.
SciRuby provides a collection of tools for scientific computation in Ruby, catering to developers working with data and scientific applications.
Real-time global and U.S. data tracking for developers and researchers.
Druid is a high-performance database connection pool for Java applications, designed for monitoring and management.
Prisma1 is a database toolkit with an ORM, migrations, and admin UI for Postgres, MySQL, and MongoDB.
A comprehensive collection of data science cheatsheets for developers and data scientists.