Showing 351-400 of 897 trending projects
A library for calling Python functions from the Ruby language, enabling data science and ML workflows.
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
Overture Maps Data is a Python library providing access to open-source geographic data.
A modular quantitative trading framework for algorithmic trading, backtesting, and financial analysis.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.
A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.
Simple script for downloading YouTube comments without using the YouTube API.
This repository provides a Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.
A Python library for looking up the administrative division codes defined by China's GB/T 2260 standard.
A PostgreSQL extension that adds HyperLogLog data structures as a native data type.
A Redis module that provides a time series data structure for storing and querying time series data.
Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.
Converts MySQL database dumps to SQLite3 compatible formats for easier migration and data portability.
A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.
A free, interactive SQL learning platform with an online SQL editor, real-time query results, and syntax highlighting.
This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.
A high-performance, memory-efficient Python data analysis library for handling large datasets.
A lightweight Python OLAP framework for multi-dimensional data analysis and reporting.
Distributed, massively parallel SQL query engine for big data analytics and time series workloads.
This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.
A lightweight SQLite3 driver for Go that implements the database/sql interface.
GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.
A collection of articles and source code on using the pandas data analysis library.
A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.
A Python library for data migration and transformation in the Blaze project.
Apache Fluss is a real-time streaming storage platform built for big data analytics.
A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.
A collection of open data sets and tools for data science and machine learning tasks.
This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.
An educational relational database management system (RDBMS) implementation in C++.
This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.
A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.
A PHP library for dumping the contents of a database to a file, supporting multiple database engines.
TrailDB is an efficient database for storing and querying series of events.
A Chinese translation of a popular book on using Python for data analysis with libraries like pandas and numpy.
A collection of notebooks covering quantitative finance and numerical methods in Python.
The first open-source data discovery and observability platform for data practitioners.
Fiona is a Python library for reading and writing geographic data files, with support for CLI usage.
An R package that provides customizable and presentation-ready data summary and analytic result tables.
A Chinese translation of the book 'Python for Data Analysis' 2nd Edition, covering NumPy, Pandas, and other data analysis tools.
A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.
A highly scalable, high-performance graph database that supports over 100 billion data points.
A C++ library for processing data streams.
Distributed SQL database middleware for sharding, scalability, and security.
Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.
A collection of R packages for data science, including tools for data manipulation, visualization, and modeling.
QueryKit is a simple CoreData query language for Swift and Objective-C developers.
A transactional, relational-graph-vector database that uses Datalog for query, designed for AI and ML use cases.
Scalable and efficient data transformation framework that is backwards compatible with dbt.