Showing 501-550 of 897 trending projects
A robust Python library for materials analysis and computational materials science.
Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.
High-performance, transactional key-value database engine for embedded systems and cryptocurrencies.
A Python library that allows developers to easily draw datasets within their notebooks.
A repository of public data sources for building and testing recommender systems.
SQLite JDBC Driver: a Java library for accessing SQLite databases.
A persistent, relational store inspired by Datomic and DataScript, written in Rust.
A roadmap for becoming a data engineer.
Docker images containing Jupyter applications for data science and machine learning workflows.
This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.
A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.
A community-driven catalog of geospatial datasets for use with Google Earth Engine.
ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.
An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.
A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.
A Rust library that provides persistent data structures for efficient and immutable data management.
Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.
Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.
A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.
Simple script for downloading YouTube comments without using the YouTube API.
This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.
SciRuby provides a collection of tools for scientific computation in Ruby, aimed at developers working with data and scientific applications.
A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.
A Go ORM and query builder for interacting with databases in Go applications.
A frequency word list generator and processed files for text analysis and natural language processing.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.
An open-source project that captures the public GitHub timeline and makes it accessible for analysis.
A Python library that provides common financial risk and performance metrics used in financial analysis.
A Python library for searching and downloading Copernicus Sentinel satellite images for geographic data analysis.
OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.
Python code for "Causal Inference: What If", a book by Miguel Hernán and James Robins.
Index your Gmail account to a SQLite DB and perform custom data analysis on your email.
Modin: Scalable Pandas workflows with a single line of code change, enabling distributed data processing.
An open-source PostgreSQL client application for macOS, providing an easy way to set up and manage a local PostgreSQL database.
Immutable database and Datalog query engine for Clojure, ClojureScript, and JS.
Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.
AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.
Extremely fast, easy-to-use, and fully async NoSQL database for Flutter apps.
A command-line tool for version controlling database snapshots, allowing developers to save, restore, and archive database state.
dplyr is an R package that provides a grammar of data manipulation.
A Chinese name corpus and generator for natural language processing and entity recognition.
A Java-based database subsetting and relational data browsing tool for popular databases.
A Python library that provides a set of customizable pipeline processing blocks for data processing tasks.
A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.
A Python library for cleaning and transforming data, inspired by the R package Janitor.