Showing 751-800 of 897 trending projects
Converts MySQL database dumps to a SQLite3-compatible format for easier migration and data portability.
A high-performance, highly available, and distributed time series database written in Rust.
A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.
This repository provides code and data for a book on statistics for data scientists.
A collection of procedures for the Neo4j graph database, providing advanced graph algorithms and utilities.
AgensGraph is a transactional graph database based on PostgreSQL for enterprise-level applications.
An open-source financial data extraction tool that provides easy API access to data scraped from various websites.
A curated list of awesome database libraries, resources, and tools for developers.
A portfolio of data science projects covering machine learning, NLP, and more for personal and academic use.
A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.
A PostgreSQL extension that adds HyperLogLog data structures as a native data type.
A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.
A fast, efficient C extension for NumPy that provides optimized array functions.
A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
Real-time global and U.S. data tracking for developers and researchers.
A next-generation curated knowledge sharing platform for data scientists and other technical professionals.
A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.
The Go kernel for Jupyter notebooks and nteract, enabling data science and numerical computing in Go.
A unified interface for distributed computing on Spark, Dask and Ray without any rewrites.
A Python library for financial applications.
A tutorial for performing statistical data analysis using Python, covering topics like regression, hypothesis testing, and more.
Dozer is a real-time data movement tool that leverages CDC to move data between various sources and sinks.
Python library for clustering categorical data using k-modes and k-prototypes algorithms.
A color palette package in R inspired by works at the Metropolitan Museum of Art in New York.
This is Facebook's branch of the Oracle MySQL database, including the MyRocks storage engine.
A fast C++ library for high-performance matrix and vector operations.
Apache BookKeeper is a scalable, fault-tolerant, low-latency storage service optimized for append-only workloads.
GeoMesa is a suite of tools for working with big geo-spatial data in a distributed fashion.
tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.
A pure Go library for reading and writing Parquet files, a columnar data format.
A Python library for extracting, transforming, and loading tabular data.
PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
A Python library for creating data processing pipelines using functional programming principles.
A Python library that implements database internals from scratch, useful for learning database concepts.
A data science and machine learning library for Go, providing DataFrame functionality similar to Python's Pandas.
A high-performance, open-source data processing pipeline for ingesting Kafka data and sending it to Elasticsearch.
This is a Python library focused on basketball analytics and data processing.
Azure Data Studio is a data management and development tool with connectivity to popular cloud and on-premises databases.
An open-source data pipeline engine for real-time ETL, connecting data sources to warehouses such as BigQuery, Snowflake, and Redshift.
A versatile app for querying, scripting, and visualizing data from various databases, files, and APIs.
A high-performance, memory-efficient Python data analysis library for handling large datasets.
MongoShake is a universal data replication platform based on MongoDB's oplog, enabling redundant replication and active-active replication.
A searchable compilation of past Kaggle solutions for data science and machine learning developers.
A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.
Python data structures library focused on serialization, deserialization, and validation of complex data schemas.
A Jupyter Notebook repository focused on time series analysis using Python.
A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.
An R package for training and plotting classification and regression models.
Performant probabilistic data structures for processing continuous, unbounded streams in Go.