Category
A Python module for extracting and mapping Chinese province, city, and district data.
A collection of R packages for data science, including tools for data manipulation, visualization, and modeling.
A fast and accurate short-read sequence aligner written in C for genomics applications.
Utility functions for projects built with dbt, a popular data transformation tool for data engineers.
A .NET Standard library that provides strongly typed exceptions for Entity Framework Core across multiple database providers.
A Rust library that provides persistent data structures for efficient and immutable data management.
This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.
A registry of publicly available datasets hosted on AWS for data-driven developers.
A Python library that allows developers to easily draw datasets within their notebooks.
Dozer is a real-time data movement tool that leverages CDC to move data between various sources and sinks.
A fast spatial index library for 2D points and rectangles in JavaScript, useful for geospatial applications.
A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.
A fast, embeddable column database written in Go, optimized for AI/ML workloads.
Tonbo is an embedded database for serverless and edge runtimes, optimized for offline-first and big data use cases.
A curated collection of resources related to image registration, including books, papers, videos, and toolboxes.
An end-to-end data pipeline for building a data lake, data warehouse, and analytics platform from GoodReads data.
A collection of SQL practice problems for developers to improve their SQL skills.
A collection of efficient Python tricks and tools for data scientists to improve their productivity.
A cross-platform way to express data transformations, relational algebra, and standardized record expressions and plans.
A Python library providing SQL views for Dune Analytics, a popular blockchain data analysis platform.
Percona Toolkit is a collection of advanced open source database tools for MySQL, MongoDB, and PostgreSQL.
An open-source SQL query assistant service for databases and data warehouses.
Python interface for the igraph library, a powerful tool for network analysis and visualization.
Graft is an open-source transactional storage engine optimized for lazy, partial, and strongly consistent replication, ideal for edge, offline-first, and distributed applications.
A distributed, Redis-compatible NoSQL database that provides high performance and scalability.
A Python library that syncs data from Postgres to Elasticsearch/OpenSearch, enabling real-time data pipelines.
A collection of Unix, R, and Python tools for bioinformatics and data science projects.
An open-source financial data extraction tool that provides easy API access to data scraped from various websites.
This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.
A Python-based image processing framework with plugins for common image processing libraries.
A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.
A fast and elegant data exploration library for Elixir, providing series and dataframes for data science workflows.
Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.
A full-featured file system for online data storage, built with Python.
This repository provides code and data for a book on statistics for data scientists.
A Python library with data related to Brazilian municipalities, including IBGE codes, latitude, longitude, and more.
DBngin is a free, open-source, cross-platform database management tool for developers.
Lakekeeper is an open-source, secure, and fast Apache Iceberg REST Catalog written in Rust for data lakehouse governance.
A Python library for using dplyr-like syntax with pandas and SQL databases.
A curated list of Google Earth Engine resources for geospatial analysis and remote sensing applications.
A comprehensive enrichment analysis tool for interpreting omics data, with support for GO, KEGG, and more.
A Python package for processing earth-observing satellite data with support for common data formats and tools.
A versatile Python library for bioinformatics, providing data structures, algorithms, and educational resources.
A Python library for basketball analytics and data processing.
A beginner-friendly Python toolkit for financial data extraction, analysis, and automation.
A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
A DICOM to NIfTI converter for medical imaging research and neuroimaging applications.
An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.
Overture Maps Data is a Python library providing access to open-source geographic data.