dbt enables data analysts and engineers to transform data using software engineering practices.
Open-source relational database engine powering web apps, APIs, and data-driven backends worldwide.
Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.
OpenRefine is a powerful data cleaning and transformation tool that helps developers work with messy data.
Python wrapper for the TA-Lib technical analysis library, useful for financial pattern recognition.
A curated list of awesome PostgreSQL software, libraries, tools and resources.
WCDB is a cross-platform database framework developed by WeChat for Android, iOS, Linux, macOS, and Windows.
An open-source metadata platform for managing your data and AI stack across the enterprise.
Realm is a mobile database that serves as a replacement for SQLite and ORMs.
A high-performance open-source query engine for sub-second analytics on data lakehouses.
Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.
A Python library that helps ensure data quality and reliability through data profiling and testing.
An open-access book on scientific visualization using Python and Matplotlib for data-driven developers.
This repository contains code samples for SQL Server, Azure SQL, and related data services from Microsoft.
An open-source multi-tool for exploring and publishing data, focused on simplifying data analysis and sharing.
Kedro is a Python toolkit for building production-ready data science and machine learning pipelines.
PRQL is a modern, powerful, and pipelined SQL replacement for transforming data.
A comprehensive list of learning materials to help developers understand database internals.
DiceDB is an open-source, fast, reactive, in-memory database optimized for modern hardware.
A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.
Modin scales pandas workflows with a single-line code change, enabling distributed data processing.
A tutorial for writing a SQLite clone from scratch in C, a useful resource for developers building database-backed applications.
A type-safe, Swift-language layer over SQLite3 for building database-backed Swift applications.
A fast, scalable, and distributed database for transactional, analytical, and AI workloads.
A repository of data science interview questions and answers for developers.
A flexible and standardized cookiecutter template for doing and sharing data science work in Python.
A PHP database abstraction layer that provides a simple, consistent API for interacting with different database systems.
Apache Cassandra is a distributed, wide-column store database system designed for high availability, scalability, and performance.
A high-performance GPU DataFrame library for data analysis and machine learning workloads.
LiteDB is a lightweight, embedded NoSQL document database for .NET applications that stores its data in a single file.
A comprehensive database of countries, states, and cities with data in multiple formats.
Unified cloud-native data warehouse platform for analytics, search and AI, built on top of S3 storage.
A high-performance Python library for data extraction, analysis, conversion and manipulation of PDF and other documents.
A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.
Official Git mirror of the SQLite source tree, a popular and widely-used embedded database engine.
A lightweight SQLite3 driver for Go that implements the database/sql interface.
A high-performance, concurrent, embedded key-value database written in Rust for vibe coders.
Grid Studio is a web-based application for data science with full integration of open source data science frameworks and languages.
A unified metadata platform for data discovery, data observability, and data governance.
A Chinese translation of a popular book on using Python for data analysis with libraries like pandas and NumPy.
OrbitDB is a peer-to-peer database for the decentralized web, enabling developers to build offline-first, distributed applications.
Lightning-fast in-process vector database for RAG and semantic search in C++.
mage-ai is a Python-based platform for building, running, and managing data pipelines and integrating/transforming data.
An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.
Apache Iceberg is an open-source table format for large analytic datasets, providing a versioned and scalable data lake architecture.
SSDB is a fast NoSQL database, an alternative to Redis, with support for LevelDB and RocksDB backends.
Apache Beam is a unified programming model for batch and streaming data processing.
A high-performance Python library for working with large tabular datasets, offering efficient data manipulation and visualization.
Apache DataFusion is a powerful SQL query engine written in Rust, designed for big data processing and analysis.
A Rust-based, Elasticsearch-quality search engine for PostgreSQL, enabling fast, real-time analytics and HTAP use cases.