Showing 751-800 of 897 trending projects
A book that teaches how to use Apache Spark for fast, large-scale data analytics.
A curated list of Twitter datasets and resources for data scientists and social network analysts.
An open-source graph database written in Go, useful for building applications that require linked data and graph-based queries.
A Python database adapter for PostgreSQL, allowing developers to interact with their databases.
A Python library for performing multivariate exploratory data analysis, including techniques like PCA, CA, MCA, MFA, and FAMD.
Eloquent ORM for Java 8, 11, 17, 21, and 23 and Spring Boot 2.x and 3.x.
Druid is a high-performance database connection pool for Java applications, designed for monitoring and management.
A Go library for creating high-quality plots and visualizations of data.
Entity Framework Core provider for PostgreSQL, enabling .NET developers to easily interact with PostgreSQL databases.
A high-level geospatial data visualization library for Python developers working with spatial data.
An intuitive library to extract features from time series data for data science and machine learning.
A comprehensive dataset of ISO 3166-1 country codes and their corresponding UN Geoscheme regional codes, ready to use in various formats.
Powerful plotting and data visualization library for the Julia programming language.
A simple Python interface for Graphviz, a popular open-source graph visualization tool.
A fast spatial index library for 2D points and rectangles in JavaScript, useful for geospatial applications.
An open-source hot backup tool for InnoDB and XtraDB databases.
A curated collection of resources related to image registration, including books, papers, videos, and toolboxes.
A parallel corpus of classical Chinese and modern Chinese texts for language processing and analysis.
A Python library that syncs data from Postgres to Elasticsearch/OpenSearch, enabling real-time data pipelines.
Open-source BI platform for engineers to explore and model large-scale data pipelines.
A full-featured file system for online data storage, built with Python.
An R package that provides customizable and presentation-ready data summary and analytic result tables.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.
A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.
A repository for the 100 Knocks of Data Science Preprocessing, focused on structured data processing.
Tegola is an open-source Mapbox Vector Tile server written in Go, enabling efficient geospatial data visualization.
An R package that provides support for simple features, a standardized way to encode spatial vector data.
A fast and versatile ETL tool that transfers data between RDBMS and NoSQL databases.
An educational project to build a disk-based key-value store in Python for learning purposes.
A collection of PySpark examples covering RDD, DataFrame, and Dataset operations in Python.
A fast, efficient C extension for NumPy that provides optimized array functions.
SQLBoiler is a Go ORM that generates code tailored to your database schema, making it easy to interact with databases.
A collection of medical imaging datasets for researchers and developers in the healthcare industry.
Documentation for the popular .NET ORM Entity Framework Core and Entity Framework 6.
A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.
A Python toolbox for seismology and seismological observatories, providing tools for data processing and analysis.
Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.
A Ruby library that makes it easy to group temporal data, useful for developers working with time-series data.
A big data analysis system for the Shenzhen metro, with support for various data processing tools.
A Python module for extracting and mapping Chinese province, city, and district data.
A Python library for reading and writing a wide range of image and video formats, including DICOM, animated GIFs, and webcam capture.
Synth is a Rust library for generating realistic, randomized test data for applications and databases.
A comprehensive resource for developers to learn and get started with data engineering using Python.
LuxCore is a high-performance path-tracing render engine for realistic 3D graphics and visualization.
A GitHub repository of tutorials on using the pandas library effectively for data analysis.
A fast C++ library for high-performance matrix and vector operations.
Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.
A JavaScript library for efficient querying and transformation of array-backed data tables.
Percona Server is an enhanced, open-source version of the MySQL database management system.