Showing 801-850 of 897 trending projects
GeoMesa is a suite of tools for working with big geospatial data in a distributed fashion.
DataLink is a real-time and offline data exchange platform that supports synchronization between heterogeneous data sources.
A searchable compilation of Kaggle past solutions for data science and machine learning developers.
An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.
A blazingly fast analytics database built with Rust, optimized for rapidly devouring large amounts of data.
Concurrent data pipelines in Python for building efficient and scalable data processing workflows.
TensorBase is a new big data warehousing solution built with Rust, focused on high-performance analytics.
EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.
Quilt is a data mesh for connecting people with actionable data, built with TypeScript.
Mycelite is a SQLite extension that enables replication between SQLite instances.
The versioned, forkable, syncable database for developers who need a scalable, distributed data solution.
Feather is a fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow.
A data warehouse for COVID-19 time series data, useful for data analysis and visualization.
Crafty statistical graphics library for the Julia programming language.
A Python library for calculating customer lifetime value metrics and cohort analysis.
A book that teaches the basics of using the Redis in-memory data structure store.
A JavaScript library for working with multidimensional arrays, useful for data visualization and scientific computing.
A high-performance, persistent, off-heap data structure written in Clojure for data-intensive applications.
A library for text mining and natural language processing using tidy data principles in R.
Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.
A distributed, scalable Prometheus-compatible time series database written in Scala.
A fast, in-memory B-tree implementation for sorted collections in Swift.
Python data structures library focused on serialization, deserialization, and validation of complex data schemas.
A columnar storage extension for Postgres built as a foreign data wrapper.
A data quality and observability tool for monitoring and fixing data issues before they become problems.
Mondrian is an OLAP server that enables real-time analysis of large data sets for business users.
Grid Studio is a web-based application for data science with full integration of open source data science frameworks and languages.
A versatile ORM for multiple databases including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB in Deno.
HiBench is a big data benchmark suite for evaluating the performance of different big data frameworks.
Modern database IDE for dev & data workflows, supporting MySQL, PostgreSQL & MongoDB.
Prisma1 is a database toolkit with an ORM, migrations, and admin UI for Postgres, MySQL, and MongoDB.
Titan is a distributed graph database that can be used for building large-scale data-intensive applications.
A MongoDB schema analysis tool that helps developers understand and optimize their NoSQL database.
Self-driving database management system from Carnegie Mellon University.
A Python library for retrieving administrative division codes for China's GB/T 2260 standard.
Python library for using dplyr-like syntax with pandas and SQL databases.
A Swift extension for RealmSwift that provides reactive programming support using RxSwift.
Open source time series library for Python, useful for statistical analysis and modeling.
A fast B+ tree indexing structure in C for efficient storage and retrieval of billions of key-value pairs.
Connect processes into powerful data pipelines with a simple git-like filesystem interface.
A Python tool that automatically cleans and preprocesses data for analysis and machine learning.
Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.
FeatureBase is a fast analytical database built on bitmaps, perfect for ML and data-intensive applications.
A tool to easily import CSV and JSON data into PostgreSQL databases.
Sample datasets for users of the Yelp Academic Dataset, useful for data analysis and machine learning.
A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.
This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.
A composable data framework for building ambitious web applications using TypeScript.
A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.
A space-efficient trie data structure in Go with fast lookup performance.