Showing 201-250 of 897 trending projects
Distributed transactional key-value database, originally created to complement TiDB.
A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.
A curated list of awesome big data frameworks, resources and other awesomeness.
A distributed database with CRDT sync, offline support, and end-to-end encryption for vibe coders.
Fast, embeddable key-value database written in Go for building high-performance storage applications.
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
A Python library for financial analysis and data scraping from the Finviz platform.
An open-source N-body simulation library for astrophysics and planetary science.
A high-quality, cross-platform data plotting library for Rust developers, including WebAssembly support.
Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.
OrioleDB is a cloud-native storage engine extension for PostgreSQL that targets performance and scalability challenges.
Build vector tilesets from large collections of GeoJSON features.
Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.
An open-source, TypeScript-based Entity-Relationship Diagram (ERD) editor for developers working with databases.
An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.
Unified cloud-native data warehouse platform for analytics, search and AI, built on top of S3 storage.
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
A C++ implementation of the Python NumPy library for scientific computing and numerical analysis.
A Redis-compatible database implemented in Go, supporting SQL and multiple backends like PostgreSQL and SQLite.
A collection of data science projects in Python using Jupyter Notebook.
A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.
A simple Python wrapper for the Tabula Java library, which extracts tables from PDF files into Pandas DataFrames.
A data platform for building data pipelines in SQL and Python and for ingesting data from various sources.
This GitHub repository contains SQL data analysis and visualization projects using various tools and databases.
Kibana is an open-source data visualization and management tool for Elasticsearch.
A JavaScript library for visualizing and understanding complex data structures.
A distributed SQL database built from scratch.
A C++ library for multidimensional array operations with broadcasting and lazy computing.
A tutorial for using the popular Python data analysis library Pandas, presented at PyCon 2015.
A Python toolbox for gaining geometric insights into high-dimensional data, useful for vibe coders working with AI tools.
SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.
Synthea is an open-source synthetic patient population simulator for generating realistic healthcare data.
A MySQL-compatible relational database with a storage agnostic query engine, implemented in Go.
This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.
Fast, cost-effective data replication tool from Postgres to data warehouses, queues, and storage.
C++ DataFrame library for statistical, financial, and machine learning analysis.
Apache Fluss is a real-time streaming storage platform built for big data analytics.
Scalable and efficient data transformation framework with backwards compatibility for dbt.
lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.
A collection of study notes, ebooks, and resources on big data, machine learning, Linux, and more for developers.
MongoDB data stream pipeline tools for managing real-time data synchronization and replication.
Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.
A curated list of awesome materials and resources for database development.
Programmable CUDA/C++ GPU Graph Analytics library for high-performance parallel graph processing.
This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.
Intake is a lightweight Python package for discovering, investigating, loading and distributing data.
WCDB is a cross-platform database framework developed by WeChat for Android, iOS, Linux, macOS, and Windows.
PRQL is a modern, powerful, and pipelined SQL replacement for transforming data.
A lightweight SQLite3 driver for Go that implements the database/sql interface.
GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.