Dremio is an open-source data analytics platform that simplifies and accelerates big data analysis.
A comprehensive guide to technical references for data careers, including Python, machine learning, and data science.
DiceDB is an open-source, fast, reactive, in-memory database optimized for modern hardware.
SQL query builder for C# developers, supporting multiple databases and complex queries.
A LINQ-to-database provider for .NET, supporting various database engines.
A dataset of cluster data collected from Alibaba's production clusters for cluster management research.
tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.
This repository provides a comprehensive JSON dataset containing metadata on anime series, movies, and cross-references to various anime sites.
This is a data repository for the Seaborn data visualization library in Python.
A C++ library for reading and writing large multi-dimensional arrays, useful for scientific and data-intensive applications.
A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.
WCDB is a cross-platform database framework developed by WeChat for Android, iOS, Linux, macOS, and Windows.
Comprehensive collection of city and administrative region data for China, with features like CSV export, JS code generation, and web scraping.
An in-process OLAP SQL engine powered by ClickHouse, enabling fast and efficient data analysis.
LibRaw is a C++ library for reading RAW image files from digital cameras.
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
A curated list of resources for Polars, an open-source, high-performance data manipulation library for Python and Rust.
Cloud-based database manager UI for querying, managing, and visualizing databases across multiple platforms.
A Rust library for quantitative finance, including tools for machine learning, option pricing, and trading.
A Python library that provides support for pgvector, the vector similarity search extension for PostgreSQL, enabling efficient vector search and storage.
Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.
Open-source relational database engine powering web apps, APIs, and data-driven backends worldwide.
A type-safe, Swift-language layer over SQLite3 for building database-backed Swift applications.
Automatically generates beautiful and easy-to-read ER diagrams from your database.
Apache DataFusion Ballista is a distributed query engine for big data analysis, built with Rust and Arrow.
A powerful C library for analyzing complex networks and graph-based data structures.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
A tutorial for performing statistical data analysis using Python, covering topics like regression, hypothesis testing, and more.
Embedded Go Database, a fast open-source NoSQL database solution for Go projects.
A library for generating MaxMind GeoIP2 databases for Chinese IP addresses.
A Python tool that automatically cleans and preprocesses data for analysis and machine learning.
Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.
A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.
Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.
LiteDB is a lightweight, embedded NoSQL document database for .NET applications that stores its data in a single file.
Hazelcast is a high-performance, distributed in-memory data platform for real-time insights and stream processing.
A Python library for common data analysis and machine learning tasks.
A powerful data visualization and plotting library for the Julia programming language.
An open-source, community-driven platform for data-intensive scientific analysis and visualization.
A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.
A Python package for analyzing heart rate data from PPG and ECG signals.
A Python package for interactive geospatial analysis and visualization with Google Earth Engine.
C++ DataFrame library for statistical, financial, and machine learning analysis.
This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.
A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.
A C# library for reading and writing metadata in media files, useful for audio and video processing applications.
A curated list of awesome resources for the data transformation tool dbt, focused on analytics engineering.
Archive, search, and analyze your entire email/chat history offline with DuckDB-powered analytics and AI queries.
A curated list of resources for graph databases and graph computing tools, useful for developers working with graph-based data.
A community-driven catalog of geospatial datasets for use with Google Earth Engine.