Showing 801-850 of 897 trending projects
A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.
An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.
A curated list of resources for the Hadoop ecosystem.
Apache Amoro is an open-source Lakehouse management system built on open data lake formats such as Iceberg and Hudi, and it works with compute engines like Flink.
Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.
An open-source platform for building and sharing datasets, focused on trust, privacy, and decentralization.
A library for calling Python functions from the Ruby language, enabling data science and ML workflows.
Connect processes into powerful data pipelines with a simple, git-like filesystem interface.
Overture Maps Data is a Python library providing access to open geographic data.
A Python package for analyzing heart rate data from PPG and ECG signals.
gget is a Python library that enables efficient querying of genomic reference databases like NCBI, Ensembl, and UniProt.
A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.
A curated list of Twitter datasets and resources for data scientists and social network analysts.
Mycelite is a SQLite extension that enables replication between SQLite instances.
A Go library with types and utilities for working with 2D geometry, geospatial data, and mapping.
Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.
A community-driven catalog of geospatial datasets for use with Google Earth Engine.
A collection of monthly reports on the internals of Alibaba Cloud's database products.
A repository of high-quality datasets for building recommender systems.
A Chinese translation of the second edition of the book 'Python for Data Analysis', covering NumPy, pandas, and other data analysis tools.
A collection of data science, machine learning, and web development project code for Dataquest's YouTube channel.
TrailDB is an efficient database for storing and querying series of events.
An Eloquent-style ORM for Java 8, 11, 17, 21, and 23 and Spring Boot 2.x and 3.x.
This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.
Azure/AzurePublicDataset is a repository of public Microsoft Azure traces, distributed with Jupyter notebooks.
RRDtool is a time-series database system for efficiently storing and graphing data.
An intuitive library to extract features from time series data for data science and machine learning.
A repository of NLP datasets collected and organized by its maintainer.
A Python library for arbitrary-precision floating-point arithmetic, providing advanced numerical capabilities.
This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.
A simple embedded database library written in Rust and modeled after SQLite.
A Python tool that automatically cleans and preprocesses data for analysis and machine learning.
A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.
This repository contains data on Chinese administrative divisions, including names, pinyin, and codes.
Docker image for the popular MongoDB database, enabling easy deployment and integration with other services.
A tutorial on pandas, the popular Python data analysis library, presented at PyCon 2015.
This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.
Intake is a lightweight Python package for discovering, investigating, loading and distributing data.
A transmute-free Rust library for working with the Apache Arrow data format (it avoids `std::mem::transmute`).
A programmable CUDA/C++ GPU graph analytics library for high-performance parallel graph processing.
An in-memory key-value store using Python's orjson module for persistence, with SQLite support.
A Redis module that adds a time series data structure for ingesting and querying time series data.
A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.
A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.
Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.
SciRuby/daru is a Ruby library for data analysis and manipulation, useful for data scientists and developers working with data.
Open-source data warehouse learning project with examples and code for building real-time and offline data pipelines.
A Python library for data manipulation and analysis, part of the core data science toolkit.
A CSV data source for Apache Spark 1.x, written as a Scala library for working with structured data.
A personal data aggregator and analysis tool for self-tracking and quantified self enthusiasts.