Showing 251-300 of 897 trending projects
Meltano is a declarative, code-first data integration engine for building and scaling data and ML-powered products.
A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.
ArcticDB is a high-performance, serverless DataFrame database for the Python data science ecosystem.
A dataset of cluster data collected from Alibaba's production clusters for cluster management research.
A powerful C library for analyzing complex networks and graph-based data structures.
Official code repository for the Genome Analysis Toolkit (GATK), a bioinformatics library for working with next-generation DNA sequencing data.
The Feldera Incremental Computation Engine is a Rust-based library for building real-time data pipelines and materialized views.
Simple Python interface for Graphviz, a popular open-source graph visualization tool.
An open-source, community-driven platform for data-intensive scientific analysis and visualization.
A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.
A curated list of awesome MATLAB frameworks, libraries, and software for scientific computing and data analysis.
An advanced ORM library for Java and Kotlin developers that provides powerful caching and data management features.
SQL Lineage Analysis Tool that provides data discovery and governance insights through Python.
A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.
PySAL is a Python Spatial Analysis Library meta-package for geographical data analysis and modeling.
A Python library that provides common financial risk and performance metrics used in financial analysis.
A powerful suite of sparse matrix algorithms and libraries for scientific and numerical computing.
Build vector tilesets from large collections of GeoJSON features.
Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.
tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.
Pongo is a MongoDB-compatible database that runs on top of PostgreSQL, offering strong consistency benefits.
A powerful Python package to manage and work with extremely large amounts of data.
A comprehensive collection of notes and resources for understanding different database technologies and concepts.
Rust-based bindings for the NumPy C-API, enabling developers to leverage Rust for numerical computing.
A fast open-source data pipeline tool for replicating databases to data lakes in Apache Iceberg format.
The official C++ client API for PostgreSQL, providing a high-level interface for interacting with PostgreSQL databases.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
A web scraping tool for collecting data from Xiaohongshu, Bilibili, and other Chinese social platforms.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
A Python package for easy access to financial market data in China for quantitative finance and FinTech applications.
A Python library for financial analysis and data scraping from the Finviz platform.
A repository of public data sources for building and testing recommender systems.
A collection of code snippets and tutorials for data science and data analysis in Python.
Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.
A simple Windows desktop app for viewing and querying Apache Parquet files, a popular big data format.
A fast and scalable library for reading and writing spreadsheet files (CSV, XLSX, ODS) in PHP.
Notes and code for analyzing RNA-seq data using Python and Snakemake.
A curated list of resources for time series forecasting, including papers, code, and other materials.
A dataset of Borg cluster traces from Google for research on distributed systems and cloud infrastructure.
A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.
A large-scale open-access corpus of scientific papers and metadata for researchers and developers.
A comprehensive English word database with translations, parts of speech, and definitions for developers.
EasyDB is a lightweight desktop app that lets you query local CSV, Excel, and JSON files with SQL, without an external database.
Kibana is an open-source data visualization and management tool for Elasticsearch.
MyBatis SQL Mapper for Java simplifies database interactions with object mapping.
DVC is a data versioning and ML experiment tool that helps developers manage and track data and model changes.
A comprehensive learning resource for the Flink stream processing framework, covering concepts, principles, and real-world use cases.
WCDB is a cross-platform database framework developed by WeChat for Android, iOS, Linux, macOS, and Windows.
A Python library that helps ensure data quality and reliability through data profiling and testing.
Code samples for SQL Server, Azure SQL, and related data services from Microsoft.