Showing 501-550 of 897 trending projects
This is a data repository for the Seaborn data visualization library in Python.
A persistent, relational store inspired by Datomic and DataScript, written in Rust.
A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.
Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Open source SQL query assistant service for databases and data warehouses
A comprehensive set of Python notes and resources for developers, covering a wide range of topics including data science, machine learning, and scientific computing.
SQL Lineage Analysis Tool that provides data discovery and governance insights through Python.
A frequency word list generator and processed files for text analysis and natural language processing.
CrateDB is a distributed, scalable SQL database for storing and analyzing massive amounts of data in near real-time.
A community-driven catalog of geospatial datasets for use with Google Earth Engine.
A simple Python library for creating dataclasses from dictionaries.
RRDtool is a time-series database system for efficiently storing and graphing data.
This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.
A collection of medical imaging datasets for researchers and developers in the healthcare industry.
Eloquent ORM for Java 8, 11, 17, 21, 23 and Spring Boot 2.x, 3.x.
A high-performance Python library for working with large tabular datasets, offering efficient data manipulation and visualization.
Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.
A Java-based framework for building agile DataOps pipelines using tools like Flink, DataX, and Chunjun with a web UI.
A collection of Python code examples and tutorials for data science, machine learning, and web development.
A Python script that generates a CSV file of player data from Fantasy Premier League, the English Premier League's fantasy game.
A collection of SQL practice problems for developers to improve their SQL skills.
This Python library provides additional linear models for statistical modeling and analysis.
An open-source global repository of address, building, and parcel data for developers and geospatial applications.
Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
A Rust-based graph database for developers who need to store and query connected data.
A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.
An open-source graph database written in Go, useful for building applications that require linked data and graph-based queries.
A .NET Standard library that provides strongly typed exceptions for Entity Framework Core across multiple database providers.
An embedded time-series database written in Go for storing and querying metrics data.
This repository provides a comprehensive dataset of over 850,000 Chinese poems from ancient to modern times, making it a valuable resource for developers working with Chinese poetry.
Agile data preparation workflows made easy with popular Python data science libraries.
A curated list of Python packages for chemistry, including computational chemistry, molecular dynamics, and quantum chemistry.
A command-line tool for version controlling database snapshots, allowing developers to save, restore, and archive database state.
OrientDB is a versatile, multi-model DBMS that supports Graph, Document, Reactive, Full-Text, and Geospatial models.
A curated collection of resources for data science and machine learning enthusiasts.
A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.
Index your Gmail account to a SQLite DB and perform custom data analysis on your email.
Real-time global and U.S. data tracking for developers and researchers.
Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.
A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.
Highly available PostgreSQL cluster using Docker, focused on data infrastructure for developers.
PDAL is a C++ library for processing point cloud data, similar to GDAL for raster data.
PoloDB is an embedded document database written in Rust for building cross-platform, local-first applications.
A Java-based database subsetting and relational data browsing tool for popular databases.
A Python library that syncs data from Postgres to Elasticsearch/OpenSearch, enabling real-time data pipelines.
A curated list of tools and datasets for anomaly detection on time-series data.
Fluid is a distributed data abstraction and acceleration framework for Big Data and AI applications on the cloud.
A registry of publicly available datasets hosted on AWS for data-driven developers.
A curated collection of resources related to image registration, including books, papers, videos, and toolboxes.