PySpark-Tutorial provides basic algorithms using PySpark for big data analytics and data processing.
Apache Impala is a high-performance, open-source, SQL query engine that runs on Apache Hadoop and Apache Kudu.
Percona Server is an enhanced, open-source version of the MySQL database management system.
A fast C-based implementation of Dynamic Time Warping, a popular algorithm for comparing time series data.
A library for text mining and natural language processing using tidy data principles in R.
A library for time series analysis on Apache Spark, enabling efficient large-scale time series processing.
A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.
A Swift extension for RealmSwift that provides reactive programming support using RxSwift.
The LevelDB key-value database in the Go programming language.
A free and easy-to-use .NET library for reading and writing CSV and fixed-length data files.
A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.
A simple JSON data set of country information, useful for building apps that need country data.
Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.
This is a Docker container for running Apache Hive, a data warehousing tool for big data analysis.
A Python tool that automatically cleans and preprocesses data for analysis and machine learning.
A Go database/sql driver for the DuckDB database engine, enabling fast and efficient data processing.
A Rust library to work with the Arrow data format without using `transmute`.
A Redis module that provides a time series data structure for storing and querying time series data.
A library that allows developers to use LINQ to retrieve data from spreadsheets and CSV files.
A Python library for data manipulation and analysis, part of the core data science toolkit.
CSV Data Source for Apache Spark 1.x, a Scala library for working with structured data.
Apache Phoenix is a scalable, distributed SQL engine that connects to HBase for low-latency queries.
A powerful Python library for record linkage and duplicate detection in data-driven applications.
A curated list of resources for time series forecasting, including papers, code, and other materials.
Provides Bayesian data analysis demos in Python for developers interested in probabilistic modeling.
HyperLogLog data structure library with space-efficient sparse and LogLog-Beta implementations.
ToplingDB is a cloud-native, distributed, and searchable key-value store built on RocksDB.
A Python library for data migration and transformation in the Blaze project.
Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents.
A collection of Jupyter Notebook files for data analysis using Python, including a Chinese translation of the popular 'Python for Data Analysis' book.
An extremely fast, easy-to-use, fully async NoSQL database for Flutter apps.
An interactive and reactive data science platform powered by Scala and Apache Spark.
A high-performance datastore for time series and tick data built on top of MongoDB.
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks.
A Python library for creating data processing pipelines using functional programming principles.
A repository of study materials and resources on data analysis and adjacent fields.
A composable data framework for building ambitious web applications using TypeScript.
A data warehouse for COVID-19 time series data, useful for data analysis and visualization.
A collection of Python code, notebooks, and examples for practical business data analysis and visualization.
This repository provides a comprehensive guide on optimizing MySQL performance and solving common database problems.
MongoDB data stream pipeline tools for managing real-time data synchronization and replication.
A C# library that converts Excel spreadsheets to JSON objects and saves them to a text file.
An interactive tutorial for the Dask distributed computing library, focused on data analysis and manipulation.
A columnar storage extension for Postgres built as a foreign data wrapper.
A collection of R packages for data science, including tools for data manipulation, visualization, and modeling.
A MongoDB schema analysis tool that helps developers understand and optimize their NoSQL database.
A JavaScript library for fast n-dimensional filtering and grouping of records.
A self-driving database management system from Carnegie Mellon University.
PaxosStore is a high-performance, distributed database solution built for large-scale applications.
A Python client library for interacting with the InfluxDB time-series database.