Showing 751-800 of 897 trending projects
A high-performance B-tree implementation for Go, useful for building database-like applications.
NFStream is a flexible network data analysis framework for network monitoring, security, and traffic classification.
A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.
A Python data analysis library optimized for humans instead of machines.
A scalable, distributed ETL framework for building data lake analytics pipelines.
Open-source massively parallel processing (MPP) database, an alternative to Greenplum.
A comprehensive Julia library for probability distributions and related statistical functions.
A C# NuGet package that provides technical indicators and trading insights for financial market data analysis.
MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.
This repository provides the official documentation for Apache Spark, a popular big data processing framework, in Chinese.
Python library for using dplyr-like syntax with pandas and SQL databases.
Distributed, massively parallel SQL query engine for big data analytics and time series workloads.
PoloDB is an embedded document database written in Rust for building cross-platform, local-first applications.
An R package that provides customizable and presentation-ready data summary and analytic result tables.
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
A curated list of Google Earth Engine resources for geospatial analysis and remote sensing applications.
A comprehensive enrichment analysis tool for interpreting omics data, with support for GO, KEGG, and more.
A Python library for portfolio optimization and back-testing in finance.
A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.
A Python library that summarizes news articles by extracting the most important sentences.
A Python package for processing earth-observing satellite data with support for common data formats and tools.
A repository of public data sources for building and testing recommender systems.
A versatile Python library for bioinformatics, providing data structures, algorithms, and educational resources.
A fast, efficient C extension for NumPy that provides optimized array functions.
A Swift extension for RealmSwift that provides reactive programming support using RxSwift.
A PHP library for dumping the contents of a database to a file, supporting multiple database engines.
Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Mondrian is an OLAP server that enables real-time analysis of large data sets for business users.
A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.
A collection of code snippets and tutorials for data science and data analysis in Python.
Scalable identity resolution, entity resolution, data mastering, and deduplication using ML.
This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.
This is a Python library focused on basketball analytics and data processing.
A beginner-friendly Python toolkit for financial data extraction, analysis, and automation.
The LevelDB key-value database in the Go programming language.
A free and easy-to-use .NET library for reading and writing CSV and fixed-length data files.
An R project focused on providing high-performance statistical models, data analysis, and visualization tools.
A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.
A simple JSON data set of country information, useful for building apps that need country data.
A collection of open data sets and tools for data science and machine learning tasks.
A comprehensive knowledge hub for data engineering, machine learning, and MLOps tools and practices.
GraphFrames provides DataFrame-based Graphs for Apache Spark, enabling scalable graph analysis and algorithms.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
Comprehensive roadmap for data engineering and AI development in Python.
A DICOM to NIfTI converter for medical imaging research and neuroimaging applications.
DataLink is a real-time and offline data exchange platform that supports synchronization between heterogeneous data sources.
An ORM for RethinkDB that provides an elegant and intuitive API for interacting with the database.
A library for generating MaxMind GeoIP2 databases for IP addresses in China.
This GitHub repository provides tutorials on effectively using the Pandas library for data analysis.
A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.