Showing 601-650 of 897 trending projects
HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.
Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.
Apache Pinot is a realtime distributed OLAP datastore for fast querying of large datasets.
A JavaScript library for visualizing and understanding complex data structures.
A collection of simple tools for data cleaning and wrangling in R for data science tasks.
A comprehensive Julia library for probability distributions and related statistical functions.
A comprehensive roadmap for data engineering and AI development in Python.
RRDtool is a time-series database system for efficiently storing and graphing data.
A curated list of awesome resources for network analysis and visualization, with a focus on R tools.
High-performance, transactional key-value database engine for embedded systems and cryptocurrencies.
A Rust library that enables querying Excel spreadsheets using SQLite, making data extraction and analysis more efficient.
An ActiveRecord-like API for Core Data, providing object-relational mapping (ORM) for iOS development.
Eloquent ORM for Java 8, 11, 17, 21, and 23, with support for Spring Boot 2.x and 3.x.
A C# library for reading and writing CSV files, with support for a wide range of CSV file formats.
A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.
A repository of public data sources for building and testing recommender systems.
OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.
Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.
This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.
Graft is an open-source transactional storage engine optimized for lazy, partial, and strongly consistent replication, ideal for edge, offline-first, and distributed applications.
Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.
A command-line tool for version controlling database snapshots, allowing developers to save, restore, and archive database state.
The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.
Apache Hive is data warehouse software built on top of Apache Hadoop for querying and managing large datasets.
COVID-19 data repository for developers, providing daily updated case, death, and testing information.
SQLite JDBC Driver is a Java library for accessing SQLite databases.
A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.
An exabyte-scale, multi-region distributed file system for developers building AI-powered applications.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
A curated list of resources for the Hadoop ecosystem.
Immutable database and Datalog query engine for Clojure, ClojureScript, and JS.
An open-source distributed SQL database with high availability, scalability, and ACID transactions.
Apache Avro is a data serialization system for efficient storage and transmission of structured data.
An open-source project that captures the public GitHub timeline and makes it accessible for analysis.
A Python library that allows developers to easily draw datasets within their notebooks.
Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.
A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.
A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.
Connect processes into powerful data pipelines with a simple Git-like filesystem interface.
This GitHub repository contains notes and code for analyzing RNA-seq data using Python and Snakemake.
OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.
cryo is a Rust library for extracting blockchain data to Parquet, CSV, JSON, or Python dataframes.
A frequency word list generator and processed files for text analysis and natural language processing.
PumpkinDB is an immutable, ordered key-value database engine written in Rust.
A grammar of graphics library for creating highly customizable and publication-quality plots in Python.
A Python library for extracting tabular data from PDF files, useful for data processing and analysis.
A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.
This repository contains a collection of portfolio projects for a data analyst.
This Python library provides additional linear models for statistical modeling and analysis.
AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.