MMseqs2 is an ultra-fast and sensitive bioinformatics tool for sequence search and clustering.
GlobalBuildingAtlas is an open, globally complete dataset of building polygons, heights, and LoD1 3D models.
TuGraph-DB is a high-performance graph database designed for efficient graph data processing.
A lightweight, extensible compatibility layer between popular dataframe libraries such as pandas, Dask, and PySpark.
A Python library for creating circular data visualizations like Circos plots, chord diagrams, and radar charts.
An ORM for TypeScript and JavaScript that supports multiple databases and platforms.
Mongoose is a MongoDB object modeling tool for Node.js and Deno, simplifying database interactions with schemas and models.
A fast, lightweight search backend that serves as an alternative to Elasticsearch.
Distributed SQL database middleware for sharding, scalability, and security.
A lightweight, fault-tolerant distributed database built on SQLite, designed for high availability.
A comprehensive developer guide to big data technologies such as Hadoop, Spark, and Kafka.
QuestDB is a high-performance, open-source time-series database for real-time analytics and financial applications.
Argo Workflows is a powerful open-source workflow engine for Kubernetes, enabling complex data processing and machine learning pipelines.
Apache Doris is a high-performance, unified analytics database for real-time data processing.
Dexie.js is a minimalistic IndexedDB wrapper that simplifies offline storage and database management in web applications.
A JavaScript library that allows you to run SQLite on the web, enabling local database functionality for web apps.
An open-source metadata platform for managing your data and AI stack across the enterprise.
Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.
A flexible and standardized cookiecutter template for doing and sharing data science work in Python.
A high-performance, concurrent, embedded key-value database written in Rust for vibe coders.
A free, open-source SQLite database manager for multiple platforms.
Flink CDC is a streaming data integration tool that enables real-time data pipelines and change data capture.
GDAL is an open-source library for working with various geospatial data formats, useful for remote sensing and GIS applications.
A powerful, multi-database ORM for .NET that supports a wide range of SQL databases and provides a seamless data access layer.
AliSQL is a MySQL branch that originated at Alibaba Group, focused on high performance and scalability.
A curated list of data science interview questions and answers for developers.
A Python library for financial portfolio optimization, covering classical efficient-frontier methods as well as more advanced techniques.
dplyr is an R package that provides a grammar of data manipulation.
Automatically generates beautiful and easy-to-read ER diagrams from your database.
OrioleDB is a cloud-native storage engine for PostgreSQL, delivered as an extension, that aims to address performance and scalability challenges.
A curated list of awesome resources for network analysis and visualization, with a focus on R tools.
A transactional, relational-graph-vector database that uses Datalog for queries, designed for AI and ML use cases.
Deequ is a Scala library for defining "unit tests for data" to measure data quality in large datasets.
A blazing-fast data wrangling toolkit for AI and data engineering workflows.
An open-source data modeling tool designed for PostgreSQL, allowing developers to generate DDL commands visually.
The official Rust implementation of the Apache Arrow data format for efficient data processing and storage.
A Python library for extracting data from a wide range of internet sources into a pandas DataFrame.
A fast, cost-effective tool for replicating data from Postgres to data warehouses, queues, and storage.
A Postgres extension for high-performance vector search that complements pgvector for large-scale workloads.
A distributed database with CRDT sync, offline support, and end-to-end encryption for vibe coders.
sq is a Go-based data wrangling tool that supports a variety of data formats and databases.
A sample database for SQL Server, Oracle, MySQL, PostgreSQL, SQLite, and DB2.
Hamilton is an open-source ETL framework that helps data scientists and engineers build modular, testable dataflows with lineage and metadata.
Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.
WebAssembly version of the DuckDB analytical database, enabling fast in-browser analytics and SQL queries.
A Python library for portfolio optimization using scikit-learn and convex optimization techniques.
A Python tool for automatically scraping data on China's statutory holidays from government announcements.
A robust Python library for materials analysis and computational materials science.
An optimized Roaring bitmap library in C and C++ that uses SIMD (AVX2, AVX-512, NEON) for high-performance data processing.
A Go-based tool for database anonymization and synthetic data generation to help with security, QA, and data masking.