Showing 301-350 of 897 trending projects
WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.
A database modeling language (DBML) that helps define and document database structures.
Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.
An open-source, scalable, and fault-tolerant NoSQL database with a focus on reliability and offline-first design.
A Python tool for automatically scraping data on China's statutory holidays from government announcements.
A comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.
Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.
ArangoDB is a multi-model database supporting documents, graphs, and key-values for high-performance applications.
A Python library for financial data visualization using Matplotlib, focused on candlestick and OHLC charts.
Technical Analysis Library using Pandas and Numpy for financial data analysis and trading strategies.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
Efficient in-memory cache in Go for storing and retrieving large amounts of data.
A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.
A quantitative research and stock analysis platform for finance professionals.
A comprehensive collection of geospatial tools and resources for data analysis, machine learning, and spatial applications.
An open-access book on scientific visualization using Python and Matplotlib for data-driven developers.
Biopython is a set of Python modules that provide a wide range of functionality for bioinformatics, including DNA/RNA/protein sequence analysis, phylogenetics, and more.
Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.
A real-time Postgres data replication and streaming library built in Rust for building CDC pipelines.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
A data platform for building data pipelines with SQL and Python and ingesting data from various sources.
An acoustic spectrum analyzer library written in C++ for audio analysis and visualization.
Open-source repository for sharing code related to the MIMIC family of critical care databases.
Apache Pinot is a real-time distributed OLAP datastore for fast querying of large datasets.
A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.
Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.
A Python library for extracting tabular data from PDF files, useful for data processing and analysis.
A basic document (NoSQL) database implementation in Go, suitable for small-scale projects.
A data repository for the data journalism site FiveThirtyEight, containing data and code behind their articles and graphics.
Apache Druid is a high-performance real-time analytics database for data-intensive applications.
A transactional, relational-graph-vector database that uses Datalog for query, designed for AI and ML use cases.
MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.
A grammar of graphics library for creating highly customizable and publication-quality plots in Python.
A SQL database explorer supporting multiple database engines like SQLite, PostgreSQL, and MySQL.
A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster).
Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.
An open-source, TypeScript-based Entity-Relationship Diagram (ERD) editor for developers working with databases.
A Rust library for quantitative finance, including tools for machine learning, option pricing, and trading.
A Python library that allows developers to easily draw datasets within their notebooks.
Modin: Scalable Pandas workflows with a single line of code change, enabling distributed data processing.
An ultra-lightweight database that supports key-value and time series data for embedded and IoT applications.
A Go-based tool for database anonymization and synthetic data generation to help with security, QA, and data masking.
A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.
Matplot++: A C++ graphics library for creating high-quality data visualizations and scientific plots.
A book on data science, covering topics from basic math to machine learning using Python and Jupyter Notebooks.
A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.
A curated list of resources for the Hadoop ecosystem.
A Rust library for serializing and deserializing data in the Rusty Object Notation (RON) format.
Percona Toolkit is a collection of advanced open source database tools for MySQL, MongoDB, and PostgreSQL.
A code repository for a book on practical statistics for data scientists.