Showing 751-800 of 897 trending projects
Dozer is a real-time data movement tool that uses change data capture (CDC) to move data between various sources and sinks.
A curated list of resources for graph databases and graph computing tools, useful for developers working with graph-based data.
An astronomy visualization project that maps the orbits of asteroids in the solar system.
A repository collecting study materials and resources for data analysis and adjacent fields.
A repository of Jupyter notebooks on time series analysis in Python, likely not aimed at vibe coders.
A cloud-native, MySQL-compatible, AI-ready database with Git for Data, vector search, and full-text search capabilities.
Xodus is a transactional, schema-less embedded database used by JetBrains products like YouTrack and Hub.
Fiona is a Python library for reading and writing geographic data files, with a command-line interface.
A high-level geospatial data visualization library for Python developers working with spatial data.
Apache Accumulo is a scalable and robust key-value store that provides a sparse, sorted, distributed, and persistent multi-dimensional table.
A library of functional, persistent (immutable) data structures written in Java for developers building robust applications.
A Python library for extracting, transforming, and loading tabular data.
A Python library for creating beautiful visualizations of language differences across document types.
The Data Transfer Project enables direct transfer of user data between online service providers.
Koalas is a pandas-like API for Apache Spark, enabling data scientists to work with big data using familiar pandas syntax.
Python demos of spatial data analytics, geostatistics, and machine learning, intended as course support material.
This repository provides best practices and examples for building ETL (Extract, Transform, Load) pipelines using Apache Airflow.
A C++ library for high-performance matrix and vector operations.
A JavaScript library that provides a NumPy-like interface for working with multi-dimensional arrays and matrices.
A Python package for handling messy CSV files with improved dialect detection and a command-line interface.
An R package for training and plotting classification and regression models.
A scalable, distributed ETL framework for building data lake analytics pipelines.
A high-performance datastore for time series and tick data built on top of MongoDB.
PaxosStore is a high-performance, distributed database solution built for large-scale applications.
An R project providing high-performance statistical modeling, data analysis, and visualization tools.
A simple JSON data set of country information, useful for building apps that need country data.
A collection of monthly reports on the internals of Alibaba Cloud's database products.
A COVID-19 data repository for developers, providing daily-updated case, death, and testing figures.
A Go library of performant probabilistic data structures for processing continuous, unbounded streams.
Diagrams and documentation for InnoDB, the storage engine used by MySQL and MariaDB databases.
tidyr is an R package that provides a set of functions to tidy messy data into a format suitable for analysis.
A TypeScript toolkit for data transformation and analysis inspired by Pandas and LINQ.
Trill is a single-node query processor for temporal or streaming data.
A PHP library for dumping the contents of a database to a file, supporting multiple database engines.
A functional, type-safe, composable Scala data access library for Postgres databases.
A Java connector for integrating MongoDB with Hadoop ecosystems for big data processing.
A collection of articles and source code on using the pandas data analysis library.
A fast, hierarchical key-value storage engine written in C++ for applications that require high performance and scalability.
A pure Python library for reading and writing ESRI Shapefiles, a popular geospatial data format.
A high-performance C++ linear algebra library focused on solvers, sparse matrices, and numerical computing.
This repository provides comprehensive tutorials and resources for learning data science and machine learning using Python.
A Python package for managing and working with extremely large amounts of data.
A repository of NumPy exercises for developers looking to improve their Python and data manipulation skills.
A collection of simple R tools for cleaning and wrangling data in data science tasks.
A corpus of company names, abbreviations, and brands that can be used for Chinese text segmentation and entity recognition.
A PHP library that provides MySQL backup functionality, similar to the mysqldump CLI tool.
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks.
An AI-native database that unifies vector, text, and structured data for hybrid search and in-database AI workflows.
A Python-based image processing framework with plugins for common image processing libraries.
A JavaScript library that provides a wide range of statistical functions for data analysis.