Showing 601-650 of 897 trending projects
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
A distributed knowledge graph store built in Go for managing large-scale semantic data.
A persistent, relational store inspired by Datomic and DataScript, written in Rust.
A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.
Performant probabilistic data structures for processing continuous, unbounded streams in Go.
A blazingly fast analytics database built with Rust, optimized for rapidly devouring large amounts of data.
A functional, type-safe, composable Scala data access library for Postgres databases.
A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.
SQLite with Branches - a lightweight, embedded database with version control capabilities.
A fast, lightweight SQLite-based persistence layer with CloudKit synchronization for Swift developers.
A C++ library for importing OpenStreetMap data into a PostgreSQL/PostGIS database.
An open-source COVID-19 dashboard powered by the fastpages framework, featuring data visualizations.
A standard filetree template for data curation and organization, useful for developers interested in data management.
A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.
A framework-agnostic, datastore-agnostic JavaScript ORM built for ease of use and peace of mind.
Concurrent data pipelines in Python for building efficient and scalable data processing workflows.
A curated list of awesome materials and resources for database development.
An open-source, TypeScript-based Entity-Relationship Diagram (ERD) editor for developers working with databases.
Cartopy is a Python library for creating maps and visualizing spatial data with matplotlib support.
A free database of geographic place names and corresponding geospatial data for developers to use.
A collection of articles and source code on using the pandas data analysis library.
A data quality and observability tool for monitoring and fixing data issues before they become problems.
A collection of SQL queries to analyze social media datasets.
Agile data preparation workflows made easy with popular Python data science libraries.
cryo is a Rust library for extracting blockchain data to parquet, CSV, JSON, or Python dataframes.
AWS Glue code samples for building data integration and ETL pipelines on AWS.
A Python library for looking up administrative division codes defined by China's GB/T 2260 standard.
A searchable compilation of past Kaggle solutions for data science and machine learning developers.
An open-source hot backup tool for InnoDB and XtraDB databases.
A JavaScript library for efficient querying and transformation of array-backed data tables.
A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.
A curated collection of resources related to image registration, including books, papers, videos, and toolboxes.
A Go ORM and query builder for interacting with databases.
A concise guide to the MongoDB NoSQL database for developers.
A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.
HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.
A popular Scala library for parsing and manipulating JSON data.
An offline IP database for developers to look up IP address geolocation information.
A Python library for cleaning and transforming data, inspired by the R package Janitor.
A data workflow tool for data engineers and analysts, described as 'Make for data'.
A lightweight Python OLAP framework for multi-dimensional data analysis and reporting.
GeoMesa is a suite of tools for working with big geospatial data in a distributed fashion.
AgensGraph is a transactional graph database based on PostgreSQL for enterprise-level applications.
Synth is a Rust library for generating realistic, randomized test data for applications and databases.
A Python library for calculating customer lifetime value metrics and cohort analysis.
Dremio is an open-source data analytics platform that simplifies and accelerates big data analysis.
A high-performance compression library written in C for developers working with large data sets.
Python demos for spatial data analytics, geostatistics, and machine learning to support courses.
EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.
Diagrams and documentation for InnoDB, the storage engine used by MySQL and MariaDB databases.