Showing 301-350 of 897 trending projects
A JavaScript library that allows you to run SQLite on the web, enabling local database functionality for web apps.
A tutorial for writing a SQLite clone from scratch in C, a useful resource for developers building database-backed applications.
An LSM-tree storage engine implemented in Rust, for developers to build on and learn from.
A Python library for portfolio optimization using scikit-learn and convex optimization techniques.
An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.
ORM for Node.js/TypeScript with multiple database support.
A high-quality, cross-platform data plotting library for Rust developers, including WebAssembly support.
Lightweight and extensible compatibility layer between popular dataframe libraries like Pandas, Dask, and PySpark.
QueryKit is a simple CoreData query language for Swift and Objective-C developers.
Transporter is a powerful ETL tool that allows developers to sync data between various persistence engines.
A simple Objective-C library that provides a one-line CRUD interface for SQLite databases on iOS.
A fast, in-memory B-tree implementation for sorted collections in Swift.
This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.
A Python library for creating beautiful visualizations of language differences across document types.
A powerful Python package to manage and work with extremely large amounts of data.
A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.
Fast, single-binary C++ SQL ETL pipeline for stream processing, observability, analytics, and AI/ML.
Apache Parquet Format, a columnar data storage format used in the Apache Hadoop ecosystem.
A transactional, relational-graph-vector database that uses Datalog for query, designed for AI and ML use cases.
MetricFlow allows developers to define, build, and maintain metrics in code for business intelligence and analytics.
A comprehensive guide to big data technologies like Hadoop, Spark, Kafka, and more for developers.
A C++ library for reading and writing .npy and .npz files, commonly used in scientific computing.
A lightweight, document-oriented database optimized for happiness, used as a Python library or CLI.
A Python library for 3D plotting and mesh analysis using the Visualization Toolkit (VTK).
Scripts to download genomes from the NCBI FTP servers for bioinformatics and genomics research.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
Modern database IDE for dev & data workflows, supporting MySQL, PostgreSQL & MongoDB.
A scalable, SQL-based streaming analytics platform from Uber, built on top of Apache Flink.
A high-performance, persistent, off-heap data structure written in Clojure for data-intensive applications.
MyBatis SQL Mapper for Java simplifies database interactions with object mapping.
OpenRefine is a powerful data cleaning and transformation tool that helps developers work with messy data.
mage-ai is a Python-based platform for building, running, and managing data pipelines and integrating/transforming data.
A Python library for survival analysis, useful for developers working with time-to-event data.
Apache Beam is a unified programming model for batch and streaming data processing.
Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.
A public dataset of daily COVID-19 cases and deaths per country, useful for data analysis and visualization.
MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.
Pentaho Data Integration is a Java-based tool for building ETL and data integration pipelines.
A Rust library that provides multi-writer and CRDT support for SQLite databases.
An acoustic spectrum analyzer library written in C++ for audio analysis and visualization.
A collection of simple R tools for data cleaning and wrangling in data science work.
A scalable, distributed ETL framework for building data lake analytics pipelines.
Lightweight local JSON database for JavaScript/TypeScript apps.
A comprehensive collection of geospatial tools and resources for data analysis, machine learning, and spatial applications.
A SQL database explorer supporting multiple database engines like SQLite, PostgreSQL, and MySQL.
A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.
This repository provides a Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.
An implementation of the LevelDB key-value database in the Go programming language.
A Rust library for interacting with Delta Lake, a data lake storage format, with Python bindings.
A curated list of resources for the Hadoop ecosystem.