Showing 201-250 of 897 trending projects
A Python library for common data analysis and machine learning tasks.
COVID-19 data repository for developers, providing daily updated case, death, and testing information.
Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.
A next-generation curated knowledge sharing platform for data scientists and other technical professionals.
A curated list of data science interview questions and answers for developers.
A Python library for financial portfolio optimization, including classical efficient frontier and advanced techniques.
Automatically visualize your pandas dataframes with a single print command, enabling quick EDA.
A MySQL library containing China's five-level administrative regions.
An in-depth tutorial covering mainstream database knowledge for backend developers.
OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.
Titan is a distributed graph database that can be used for building large-scale data-intensive applications.
A C# library for reading and writing CSV files, with support for a wide range of CSV file formats.
A Python tool to parse Redis dump.rdb files, analyze memory usage, and export data to JSON.
lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.
Fluvio is an event stream processing engine for developers to build responsive data-intensive apps.
Sequel is a Ruby library that provides a powerful and flexible object-relational mapping (ORM) for databases.
This is a Python project for big data analysis, focusing on HQL, SQL, and data processing.
A curated list of awesome database tools and resources to make working with databases easier.
dplyr is an R library that provides a grammar of data manipulation.
An open-source Python library that simplifies the process of loading data into data lakes and warehouses.
OrientDB is a versatile, multi-model DBMS that supports Graph, Document, Reactive, Full-Text, and Geospatial models.
A lightweight data processing framework built on DuckDB and 3FS for vibe coders working with AI tools.
Biopython is a set of Python modules that provide a wide range of functionality for bioinformatics, including DNA/RNA/protein sequence analysis, phylogenetics, and more.
A technical analysis library built on pandas and NumPy for financial data analysis and trading strategies.
An educational relational database management system (RDBMS) implementation in C++.
Lightweight, fast, and reliable key-value database engine in Go for high-throughput applications.
An open-source, self-hosted database management tool with a spreadsheet-like interface for Postgres.
Easy-to-use data handling for SQL data stores with support for implicit table creation, bulk loading, and transactions.
BuntDB is an embeddable, in-memory key/value database for Go with custom indexing and geospatial support.
Matplot++: A C++ graphics library for creating high-quality data visualizations and scientific plots.
An open-source index of Google Trends data, useful for developers building data-driven applications.
Mimesis is a fast Python library for generating fake data in multiple languages for testing and development purposes.
A comprehensive collection of geospatial tools and resources for data analysis, machine learning, and spatial applications.
Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.
A collection of code examples and baselines for common data science and machine learning competitions.
Automatically generates beautiful and easy-to-read ER diagrams from your database.
Cloud-based database manager UI for querying, managing, and visualizing databases across multiple platforms.
An open-source distributed SQL database with high availability, scalability, and ACID transactions.
esProc SPL is a JVM-based programming language for structured data computation, serving as both a data analysis tool and an embedded computing engine.
Open-source data pipeline engine for real-time ETL, connecting data sources to warehouses like BigQuery, Snowflake, Redshift.
A collection of Jupyter Notebook files for data analysis using Python, including a Chinese translation of the popular 'Python for Data Analysis' book.
A Redis-compatible database implemented in Go, supporting SQL and multiple backends like PostgreSQL and SQLite.
A high-quality, cross-platform data plotting library for Rust developers, including WebAssembly support.
A quantitative research and stock analysis platform for finance professionals.
A Python package for accessing and analyzing Formula 1 racing data, including results, schedules, timing, and telemetry.
A grammar of graphics library for creating highly customizable and publication-quality plots in Python.
A Python library that provides a simple and unified interface for extracting text from any document format.
A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution.
A comprehensive search tool for finding Chinese NLP datasets, with support for common English NLP datasets as well.
CrateDB is a distributed, scalable SQL database for storing and analyzing massive amounts of data in near real-time.