Showing 751-800 of 897 trending projects
Open-source massively parallel processing (MPP) database, an alternative to Greenplum.
A comprehensive Julia library for probability distributions and related statistical functions.
A C# NuGet package that provides technical indicators and trading insights for financial market data analysis.
A Chinese translation of the official documentation for Apache Spark, a popular big data processing framework.
MongoHub is a native macOS MongoDB client that provides a GUI for managing and interacting with MongoDB databases.
A Python library for using dplyr-like syntax with pandas and SQL databases.
A distributed, massively parallel SQL query engine for big data analytics and time-series workloads.
PoloDB is an embedded document database written in Rust for building cross-platform, local-first applications.
An R package that provides customizable and presentation-ready data summary and analytic result tables.
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
A Python library that provides a Predictive Power Score (PPS) to measure the predictive power between variables.
A Python library that summarizes news articles by extracting the most important sentences.
A fast, efficient C extension for NumPy that provides optimized array functions.
A Swift extension for RealmSwift that provides reactive programming support using RxSwift.
A PHP library for dumping the contents of a database to a file, supporting multiple database engines.
Apache XTable is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
A time series forecasting library for R, providing a wide range of models and tools for accurate predictions.
This GitHub repository provides time series data on COVID-19 cases, useful for data analysis and visualization.
The LevelDB key-value database in the Go programming language.
A free and easy-to-use .NET library for reading and writing CSV and fixed-length data files.
An R project focused on providing high-performance statistical models, data analysis, and visualization tools.
A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.
A simple JSON data set of country information, useful for building apps that need country data.
A collection of open data sets and tools for data science and machine learning tasks.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
A comprehensive roadmap for data engineering and AI development in Python.
DataLink is a real-time and offline data exchange platform that supports synchronization between heterogeneous data sources.
A library for generating MaxMind GeoIP2 databases for Chinese IP addresses.
This GitHub repository provides tutorials on effectively using the Pandas library for data analysis.
A Swiss army knife for big data, enabling seamless integration with popular data warehousing solutions.
Apache Amoro is an open-source lakehouse management system built on big data technologies like Flink, Hudi, and Iceberg.
A curated list of resources for the Hadoop ecosystem.
Kylo is an enterprise-grade data lake management platform built on big data technologies like Spark and Hadoop.
An open-source platform for building and sharing datasets, focused on trust, privacy, and decentralization.
A library for calling Python functions from the Ruby language, enabling data science and ML workflows.
Connect processes into powerful data pipelines with a simple git-like filesystem interface.
Overture Maps Data is a Python library providing access to open-source geographic data.
A curated list of Twitter datasets and resources for data scientists and social network analysts.
Mycelite is a SQLite extension that enables replication between SQLite instances.
A Go library with types and utilities for working with 2D geometry, geospatial data, and mapping.
Contextualise is a powerful tool for organizing diverse information resources in knowledge-intensive projects.
A collection of monthly reports on the internals of Alibaba Cloud's database products.
A dataset repository for building recommender systems and other AI-powered applications.
A collection of data science, machine learning, and web development project code for Dataquest's YouTube channel.
TrailDB is an efficient database for storing and querying series of events.
An intuitive library to extract features from time series data for data science and machine learning.
This repository provides a comprehensive guide and implementations for data algorithms using MapReduce, Spark, Java, and Scala.
A repository containing various NLP datasets collected and organized by the owner.
A Python library for arbitrary-precision floating-point arithmetic, providing advanced numerical capabilities.
A simple embedded database library in Rust, modeled after SQLite.