Showing 651-700 of 897 trending projects
Percona Toolkit is a collection of advanced open source database tools for MySQL, MongoDB, and PostgreSQL.
Embedded Go Database, a fast open-source NoSQL database solution for Go projects.
A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.
Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.
A simple SQLite file viewer that allows you to view and explore SQLite databases online.
OctoSQL is a powerful SQL query tool that allows you to join, analyze, and transform data from multiple databases and file formats.
A high-performance compressed bitset library for Java used in Apache Spark, Netflix Atlas, and others.
An offline IP database for developers to look up IP address geolocation information.
A DICOM to NIfTI converter for medical imaging research and neuroimaging applications.
Connects processes into powerful data pipelines with a simple git-like filesystem interface.
A grammar of graphics library for creating highly customizable and publication-quality plots in Python.
A C++ library for multidimensional array operations with broadcasting and lazy computing.
A Python library for extracting tabular data from PDF files, useful for data processing and analysis.
A Python library for data sketching techniques like MinHash, LSH, HyperLogLog, and HNSW for approximate similarity search.
A C++ library for reading and writing .npy and .npz files, commonly used in scientific computing.
A frequency word list generator and processed files for text analysis and natural language processing.
A collection of portfolio projects for a data analyst.
A Python library that implements database internals from scratch, useful for learning database concepts.
AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.
An open-source global repository of address, building, and parcel data for developers and geospatial applications.
Open Babel is a chemical toolbox for working with chemical data and cheminformatics.
A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.
Docker images containing Jupyter applications for data science and machine learning workflows.
KurrentDB is an event-native database designed for modern software and event-driven architectures.
A Java-based framework for building agile DataOps pipelines using tools like Flink, DataX, and Chunjun with a web UI.
Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.
A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.
Python interface for the igraph library, a powerful tool for network analysis and visualization.
A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.
Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.
Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.
Fluent Migrator is a .NET migration framework for managing database schema changes across multiple database providers.
A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.
PySAL is a Python Spatial Analysis Library meta-package for geographical data analysis and modeling.
A curated list of Google Earth Engine resources for geospatial analysis and remote sensing applications.
A Python library that summarizes news articles by extracting the most important sentences.
An open-source C++ framework for fast and parallel map matching of GPS trajectories.
A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.
Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.
Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.
A curated list of awesome JSON datasets that don't require authentication.
A cloud-native PostgreSQL database developed by Alibaba Cloud for high-performance, scalable data storage and management.
A Rust library that provides persistent data structures for efficient and immutable data management.
A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.
Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
An embedded time-series database written in Go for storing and querying metrics data.
A comprehensive enrichment analysis tool for interpreting omics data, with support for GO, KEGG, and more.
A roadmap for becoming a data engineer.
A Chinese name corpus and generator for natural language processing and entity recognition.