Showing 301-350 of 897 trending projects
A Go-based tool for database anonymization and synthetic data generation to help with security, QA, and data masking.
A Python library for conveniently reading data from the Tongdaxin financial data platform.
This is a book that teaches how to use Apache Spark for lightning-fast data analytics.
A library for text mining and natural language processing using tidy data principles in R.
MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.
This R library provides historical investment returns analysis for the overall stock market.
MySQL Connector/J is a JDBC driver that enables Java applications to connect to MySQL databases.
Nebula is a fast, open-source, distributed graph database with horizontal scalability and high availability.
An open-source N-body simulation library for astrophysics and planetary science.
A Python library for quantitative trading and stock analysis.
A fast and efficient C++ hash map and hash set implementation using robin hood hashing.
A Python library providing multivariate imputation and matrix completion algorithms.
A Python library for using dplyr-like syntax with pandas and SQL databases.
A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.
A collection of code snippets and tutorials for data science and data analysis in Python.
A PostgreSQL sample database for testing and learning SQL queries.
Sequel is a Ruby library that provides a powerful and flexible object-relational mapper (ORM) for databases.
An extremely fast, easy-to-use, and fully async NoSQL database for Flutter apps.
Anatomy of Matplotlib tutorial for SciPy conference, focused on data visualization for scientific computing.
A fast, cost-effective tool for replicating data from Postgres to data warehouses, queues, and storage.
A Python tool for automatically scraping data on China's statutory holidays from government announcements.
A unified cloud-native data warehouse platform for analytics, search, and AI, built on top of S3 storage.
A Python library for financial portfolio optimization, including classical efficient frontier and advanced techniques.
A Python library that provides a tour of the wonderland of math with visualizations and algorithms.
A searchable compilation of Kaggle past solutions for data science and machine learning developers.
Statsmodels is a Python library for statistical modeling and econometrics, providing tools for data analysis and prediction.
A Python-based image processing framework with plugins for common image processing libraries.
lakeFS is a Git-like version control system for data lakes, enabling data engineers to manage data versioning and data quality.
A Redis-compatible database implemented in Go, supporting SQL and multiple backends like PostgreSQL and SQLite.
Fast, accurate, and scalable probabilistic data linkage with support for multiple SQL backends.
An automatic database ORM library for Objective-C that provides thread-safe and deadlock-free database operations.
A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.
A Python library for extracting schema, statistics, and entities from datasets, useful for data profiling and privacy analysis.
A tool to easily import CSV and JSON data into PostgreSQL databases.
A C++ repository for a 2014 Kaggle competition.
Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.
This repository contains efficient tools for LiDAR processing, focused on working with point cloud data.
SheetJS Spreadsheet Data Toolkit for data extraction and spreadsheet generation.
Kibana is an open-source data visualization and management tool for Elasticsearch.
An open-source data lakehouse framework that enables building data pipelines with leading big data compute engines.
EJDB2 is an embeddable JSON database engine with a simple XPath-like query language (JQL) for C/C++ applications.
A port of Great Expectations to dbt test macros for data testing and validation in data engineering workflows.
A collection of Unix, R, and Python tools for bioinformatics and data science projects.
An R project focused on providing high-performance statistical models, data analysis, and visualization tools.
Optimized Roaring bitmaps in C and C++ with SIMD (AVX2, AVX-512, NEON) for high-performance data processing.
A pure Go library for reading and writing Parquet files, a columnar data format.
A simple embedded database library in Rust modeled after SQLite, useful for Rust projects.
A comprehensive cookbook for data engineers, covering best practices, big data, and data engineering concepts.
A data platform for building data pipelines in SQL and Python and ingesting data from various sources.
An ORM for RethinkDB that provides an elegant and intuitive API for interacting with the database.