Showing 601-650 of 897 trending projects
Python demos for spatial data analytics, geostatistics, and machine learning to support courses.
Tegola is an open-source Mapbox Vector Tile server written in Go, enabling efficient geospatial data visualization.
A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL databases seamlessly.
An open-source financial data extraction tool that provides easy API access to data scraped from various websites.
A Python-based image processing framework with plugins for common image processing libraries.
A fast, in-memory B-tree implementation for sorted collections in Swift.
A Python package for handling messy CSV files with improved dialect detection and a command-line interface.
A PHP library that provides MySQL backup functionality, similar to the mysqldump CLI tool.
Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.
A high-performance, persistent, off-heap data structure written in Clojure for data-intensive applications.
DBngin is a free, open-source, cross-platform database management tool for developers.
MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.
Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.
This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.
Programmable CUDA/C++ GPU Graph Analytics library for high-performance parallel graph processing.
An in-memory key-value store for Python that uses the orjson library for persistence, with SQLite support.
Apache Celeborn is a high-performance shuffle and spilled data service for big data applications.
A powerful GUI/CLI tool for biologists to work with NGS data.
Data quality assessment and reporting tool for data frames and database tables in R.
An ordered map implementation in Go with amortized O(1) performance for common operations.
Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.
Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.
A curated list of free/public domain text datasets for natural language processing (NLP) tasks.
This is a MySQL library containing China's 5-level administrative regions.
A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.
BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support.
CrateDB is a distributed, scalable SQL database for storing and analyzing massive amounts of data in near real-time.
Ploomber is a fast and versatile tool for building and deploying data pipelines that can be used with a variety of AI and ML tools.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
Feather is a fast, interoperable binary data frame storage format for Python, R, and more, powered by Apache Arrow.
Python data structures library focused on serialization, deserialization, and validation of complex data schemas.
A lightweight key-value store built with C++ using a skiplist data structure.
A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.
This repository provides a comprehensive guide on optimizing MySQL performance and solving common database problems.
A Python library for processing and visualizing satellite imagery data.
A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.
A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.
A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.
AWS Glue code samples for building data integration and ETL pipelines on AWS.
GeoMesa is a suite of tools for working with big geospatial data in a distributed fashion.
Synth is a Rust library for generating realistic, randomized test data for applications and databases.
A Python library for calculating customer lifetime value metrics and cohort analysis.
A fast and efficient C++ hash map and hash set implementation using robin hood hashing.
A collection of simple tools for data cleaning and wrangling in R for data science tasks.
tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.
An R package that provides support for simple features, a standardized way to encode spatial vector data.
PumpkinDB is an immutable, ordered key-value database engine written in Rust.
A powerful Python package to manage and work with extremely large amounts of data.