High-performance distributed graph database for real-time use cases
Web-based database diagramming editor with AI-powered export and schema import
Fast, lightweight search backend alternative to Elasticsearch
Kibana is an open-source data visualization and management tool for Elasticsearch
Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes
Distributed SQL database middleware for sharding, scalability, and security
Dolt is Git for Data, enabling version control for SQL databases with Git-like commands and features.
MyBatis SQL Mapper for Java simplifies database interactions with object mapping.
Turso is an in-process SQL database, compatible with SQLite, written in Rust for high performance.
A lightweight, fault-tolerant distributed database built on SQLite, designed for high availability.
A data repository for the data journalism site FiveThirtyEight, containing data and code behind their articles and graphics.
A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.
AKShare is a simple and elegant Python library for accessing financial data APIs.
QuestDB is a high-performance, open-source, time-series database for real-time analytics and financial applications.
NetworkX is a Python library for creating, manipulating, and studying the structure and dynamics of complex networks.
Presto is an open-source distributed SQL query engine for big data, allowing fast analysis of large datasets.
Apache Arrow is a fast columnar data format and toolset for in-memory analytics and data interchange.
Distributed transactional key-value database, originally created to complement TiDB
Argo Workflows is a powerful open-source workflow engine for Kubernetes, enabling complex data processing and machine learning pipelines.
libSQL is an open-source, open-contribution fork of SQLite, a widely used embedded database.
Prisma 1 is a database toolkit with an ORM, migrations, and admin UI for Postgres, MySQL, and MongoDB.
A comprehensive collection of data science cheatsheets for developers and data scientists.
FoundationDB is an open-source, distributed, transactional key-value store that provides ACID guarantees.
Fast, embeddable key-value database written in Go for building high-performance storage applications.
DVC is a data versioning and ML experiment tool that helps developers manage and track changes to data and models.
A high-performance NoSQL data store compatible with Apache Cassandra and Amazon DynamoDB.
Apache Doris is a high-performance, unified analytics database for real-time data processing.
An open-source data orchestration platform for developing, running, and observing data pipelines and workflows.
A comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.
An open-source graph database written in Go, useful for building applications that require linked data and graph-based queries.
A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.
A Python library for crawling historical data of China stocks.
SciPy is a Python library for scientific and technical computing, providing a wide range of algorithms and tools.
A curated list of awesome big data frameworks, resources and other awesomeness.
ArangoDB is a multi-model database supporting documents, graphs, and key-values for high-performance applications.
Dexie.js is a minimalistic IndexedDB wrapper that simplifies offline storage and database management in web applications.
Apache Druid is a high-performance real-time analytics database for data-intensive applications.
Dask is a Python library for parallel computing and distributed data processing that scales NumPy and pandas workflows.
A JavaScript library that allows you to run SQLite on the web, enabling local database functionality for web apps.
A Python library for fast, customizable, and interactive data profiling and exploratory data analysis.
JuiceFS is a distributed POSIX file system built on top of Redis and S3 for big data and cloud-native applications.
Google's Operations Research tools (OR-Tools) for combinatorial optimization and linear programming.
A roadmap for becoming a data engineer.
Trino is a distributed SQL query engine for big data, allowing fast, scalable, and cost-effective analytics.
An open-source framework for change data capture from various databases using Apache Kafka.
Citus is a distributed PostgreSQL database that enables scaling out your Postgres database across multiple nodes.