Showing 401-450 of 897 trending projects
A comprehensive enrichment analysis tool for interpreting omics data, with support for GO, KEGG, and more.
A Go library with types and utilities for working with 2D geometry, geospatial data, and mapping.
A high-performance Java library for data analysis, visualization, and machine learning.
Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.
PyPika is a Python SQL query builder that provides a readable, Pythonic syntax for constructing complex SQL queries.
Fast in-memory cache library for Go with low GC overhead, optimized for a large number of entries.
Powerful plotting and data visualization library for the Julia programming language.
A comprehensive repository covering big data knowledge, including data warehouse modeling, real-time computing, Hadoop, Spark, and more.
A fast and accurate short-read sequence aligner written in C for genomics applications.
Open-source hot backup tool for InnoDB and XtraDB databases.
PySAL is a Python Spatial Analysis Library meta-package for geographical data analysis and modeling.
Rust-based bindings for the NumPy C-API, enabling developers to leverage Rust for numerical computing.
A Java-based framework for building agile DataOps pipelines using tools like Flink, DataX, and Chunjun with a web UI.
A C# NuGet package that provides technical indicators and trading insights for financial market data analysis.
An R package that provides customizable and presentation-ready data summary and analytic result tables.
Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
A comprehensive collection of resources and learning materials for big data technologies like Flink, Spark, Hadoop, and Hive.
Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.
Amundsen is an open-source data discovery platform for improving productivity of data analysts and engineers.
A highly scalable, distributed, document-oriented NoSQL database with full-text search, spatial, and time-series support.
A comprehensive dataset of ISO 3166-1 country codes and their corresponding UN Geoscheme regional codes, ready to use in various formats.
Scalable, low-latency vector search in Postgres.
A powerful C library for analyzing complex networks and graph-based data structures.
A flexible and powerful SQL string builder library plus a zero-config ORM for Go developers.
A standard filetree template for data curation and organization, useful for developers interested in data management.
A curated list of awesome materials and resources for database development.
A high-performance compression library written in C for developers working with large data sets.
MetPy is a Python library for reading, visualizing, and performing calculations with weather data.
A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.
PDAL is a C++ library for processing point cloud data, similar to GDAL for raster data.
Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.
A Python library with data related to Brazilian municipalities, including IBGE codes, latitude, longitude, and more.
A high-performance B-tree implementation for Go, useful for building database-like applications.
Overture Maps Data is a Python library providing access to open-source geographic data.
An intuitive library to extract features from time series data for data science and machine learning.
A repository of open-source data sets created for stories on The Pudding, a digital publication focused on data journalism.
A specification for storing geospatial vector data (point, line, polygon) in the Parquet file format, enabling efficient cloud-native geospatial data processing.
AlaSQL is a JavaScript SQL database for browser and Node.js that handles both relational tables and nested JSON data.
A comprehensive set of Python notes and resources for developers, covering a wide range of topics including data science, machine learning, and scientific computing.
An ORM (Object-Relational Mapping) library for .NET that supports a wide range of database providers, including SQL Server, MySQL, PostgreSQL, and more.
A C++ library for multidimensional array operations with broadcasting and lazy computing.
A Python database adapter for PostgreSQL, allowing developers to interact with their databases.
A simple, fast, and embeddable key-value store written in Go that supports transactions and data structures.
PyWavelets is a Python library for wavelet transform algorithms and techniques, useful for image and signal processing.
A Python library for accessing the HDF5 binary data format, a popular format for scientific and numerical data.
ggstatsplot is an R library that enhances ggplot2 visualizations with statistical analysis and hypothesis testing.
A curated list of resources for machine learning-based algorithmic trading and quantitative finance.
Simple Python interface for Graphviz, a popular open-source graph visualization tool.
A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.
Dremio is an open-source data analytics platform that simplifies and accelerates big data analysis.