Apache Amoro is an open-source Lakehouse management system built on open data lake table formats such as Iceberg and Hudi, and it works with compute engines such as Flink.
Apache Phoenix is a scalable, distributed SQL layer that runs on top of HBase for low-latency queries.
An ORM (Object-Relational Mapping) library for .NET that supports a wide range of database providers, including SQL Server, MySQL, PostgreSQL, and more.
PyWavelets is a Python library for wavelet transform algorithms and techniques, useful for image and signal processing.
TuGraph-DB is a high-performance graph database built for fast and efficient graph data processing.
A SQL lineage analysis tool written in Python that provides data discovery and governance insights.
An open-source hot backup tool for InnoDB and XtraDB databases.
An open-source SQL query assistant service for databases and data warehouses.
A Java-based framework with a web UI for building agile DataOps pipelines on tools like Flink, DataX, and ChunJun.
A comprehensive enrichment analysis tool for interpreting omics data, with support for GO, KEGG, and more.
A tool for scalable identity resolution, entity resolution, data mastering, and deduplication using ML.
PDAL is a C++ library for processing point cloud data, similar to GDAL for raster data.
Archive, search, and analyze your entire email/chat history offline with DuckDB-powered analytics and AI queries.
A Python library for portfolio optimization and back-testing in finance.
FluentMigrator is a .NET migration framework for managing database schema changes across multiple database providers.
Scalable, low-latency vector search inside Postgres.
A dataset of over 850,000 Chinese poems spanning ancient to modern times, useful for developers working with Chinese poetry.
A curated collection of resources related to image registration, including books, papers, videos, and toolboxes.
An in-memory key-value store that uses the third-party orjson library for persistence, with SQLite support.
Druid is a high-performance database connection pool for Java applications, with built-in monitoring and management.
An open-source graph database written in Go, useful for building applications that require linked data and graph-based queries.
Koalas is a pandas-like API for Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
A curated collection of resources for data science and machine learning enthusiasts.
RBush is a high-performance JavaScript R-tree-based 2D spatial index for points and rectangles.
Official code repository for the Genome Analysis Toolkit (GATK), a bioinformatics library for working with next-generation DNA sequencing data.
A tutorial for performing statistical data analysis using Python, covering topics like regression, hypothesis testing, and more.
PySAL (Python Spatial Analysis Library) is a meta-package for geographical data analysis and modeling.
Rust-based bindings for the NumPy C-API, enabling developers to leverage Rust for numerical computing.
A Rust library that enables querying Excel spreadsheets using SQLite, making data extraction and analysis more efficient.
Educational notebooks on quantitative finance, algorithmic trading, financial modeling, and investment strategy.
A Python package for analyzing heart rate data from PPG and ECG signals.
A simple Python library for creating dataclasses from dictionaries.
A C# NuGet package that provides technical indicators and trading insights for financial market data analysis.
Core database component for the Realm Mobile Database SDKs, a popular NoSQL database for mobile apps.
Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.
A PostgreSQL database adapter for Python.
CrateDB is a distributed, scalable SQL database for storing and analyzing massive amounts of data in near real-time.
An educational OLAP database system built in Rust for learning and experimentation.
A modern, embedded SQL database written in Go for embedded and mobile applications.
A PyTorch library for data loading and utility functions shared across PyTorch domain libraries.
A collection of SQL practice problems for developers to improve their SQL skills.
A Python library that syncs data from Postgres to Elasticsearch/OpenSearch, enabling real-time data pipelines.
DataSphereStudio is a one-stop data application development and management portal covering data exchange, analysis, and visualization.
A big data analysis system for the Shenzhen metro that supports various data processing tools.
A JavaScript library for efficient querying and transformation of array-backed data tables.
A parallel corpus of classical Chinese and modern Chinese texts for language processing and analysis.
A book on using Apache Spark for lightning-fast data analytics.
PoloDB is an embedded document database written in Rust for building cross-platform, local-first applications.
Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
An advanced geospatial data analysis platform for tasks like geomorphology, hydrology, and remote sensing.