Showing 551-600 of 897 trending projects
A Python library for creating beautiful visualizations of language differences across document types.
A data warehouse for COVID-19 time series data, useful for data analysis and visualization.
ggstatsplot is an R library that enhances ggplot2 visualizations with statistical analysis and hypothesis testing.
Scalable, low-latency vector search in Postgres.
A distributed SQL database built from scratch.
A unified interface for distributed computing that runs the same code on Spark, Dask, and Ray without rewrites.
A collection of Python code, notebooks, and examples for practical business data analysis and visualization.
Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads.
Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.
Converts MySQL database dumps to a SQLite3-compatible format for easier migration and data portability.
A JavaScript library that converts CSV and tab-delimited data to web-friendly formats like JSON and XML.
Powerful plotting and data visualization library for the Julia programming language.
A space-efficient trie data structure in Go with fast lookup performance.
A versatile ORM for Deno that supports multiple databases, including MySQL, SQLite, MariaDB, PostgreSQL, and MongoDB.
A crafty statistical graphics library for the Julia programming language.
A Jupyter Notebook repository focused on time series analysis using Python.
A Python toolbox for gaining geometric insights into high-dimensional data.
A high-performance, memory-efficient Python data analysis library for handling large datasets.
A fast B+ tree indexing structure in C for efficient storage and retrieval of billions of key-value pairs.
A Python statistical package based on Pandas, providing various statistical methods and tests.
MongoDB data stream pipeline tools for managing real-time data synchronization and replication.
A C# library that converts Excel spreadsheets to JSON objects and saves them to a text file.
A Python library for technical analysis indicators, with Chinese translation and documentation.
An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.
A high-performance C++ linear algebra library focused on solvers, sparse matrices, and numerical computing.
This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.
MongoShake is a universal data replication platform based on MongoDB's oplog, supporting both redundant and active-active replication.
Entity Framework Core provider for PostgreSQL, enabling .NET developers to easily interact with PostgreSQL databases.
A JavaScript statistical library that provides a wide range of statistical functions for data analysis.
A C# in-memory document database that uses source generators to embed typed, read-only data.
This Python repository contains code examples and notes for data analysis and mining.
A large-scale entity and relation database supporting aggregation of properties for big data applications.
A columnar storage extension for Postgres built as a foreign data wrapper.
SchemaCrawler is a free database schema discovery and comprehension tool that supports various database management systems.
A collection of solutions to Chinese data science competitions, primarily in Python.
A MongoDB schema analysis tool that helps developers understand and optimize their NoSQL database.
Fast n-dimensional filtering and grouping of records, a powerful data manipulation library for JavaScript.
A self-driving database management system from Carnegie Mellon University.
A database solution that provides better analytics on top of MongoDB and makes it easier to migrate from MongoDB to SQL.
This R library provides historical investment returns analysis for the overall stock market.
A high-performance, highly available, and distributed time series database written in Rust.
A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.
A graph and network visualization library for R developers working with tabular data.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
Documentation for the popular .NET ORM Entity Framework Core and Entity Framework 6.
The Auron accelerator framework leverages vectorized execution to speed up distributed computing on big data platforms like Spark.
A tutorial for performing statistical data analysis using Python, covering topics like regression, hypothesis testing, and more.
PaxosStore is a high-performance, distributed database solution built for large-scale applications.
A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.
A Python client library for interacting with the InfluxDB time-series database.