Showing 401-450 of 897 trending projects
This repo contains a list of the 10,000 most common English words, useful for NLP and language modeling tasks.
A curated list of awesome resources for network analysis and visualization, with a focus on R tools.
A comprehensive Python library for modeling and forecasting financial time series data using ARCH models.
Maestro is Netflix's workflow orchestrator for building data pipelines and batch processing workflows.
A Python module for extracting and mapping Chinese province, city, and district data.
A dataset of cluster data collected from Alibaba's production clusters for cluster management research.
An efficient and compressed N-dimensional array library for Python, useful for data scientists and ML engineers.
SSDB is a fast NoSQL database, an alternative to Redis, with support for LevelDB and RocksDB backends.
A comprehensive Python library for color science and color space conversions.
Starter code for working with the YouTube-8M dataset, a large-scale video understanding dataset.
An open-source PostgreSQL client application for macOS, providing an easy way to set up and manage a local PostgreSQL database.
Bytewax is a Python library for building scalable, fault-tolerant, and low-latency data processing pipelines.
An open-source C++ framework for fast and parallel map matching of GPS trajectories.
A highly scalable, distributed, document-oriented NoSQL database with full-text search, spatial, and time-series support.
A fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more.
A fast, flexible, ocean-flavored fluid dynamics library for climate and ocean modeling on CPUs and GPUs.
Irmin is a distributed database that follows the same design principles as Git, allowing for distributed version control of data.
MetricFlow allows developers to define, build, and maintain metrics in code for business intelligence and analytics.
A visual data preparation tool powered by Python, designed for data analysis and ETL tasks.
A collection of data science take-home challenges and solutions implemented in Jupyter Notebooks.
AWS Glue code samples for building data integration and ETL pipelines on AWS.
A comprehensive Go library for working with Cassandra/Scylla databases, providing a query builder, ORM, and migration tool.
Mimesis is a fast Python library for generating fake data in multiple languages for testing and development purposes.
A comprehensive guide to feature engineering and feature selection techniques in Python, with examples.
A Go driver for the ClickHouse analytics database, enabling fast and efficient data processing.
A C++ library for multidimensional array operations with broadcasting and lazy computing.
A curated collection of open-source Chinese medical NLP resources including datasets, models, and more.
Nessie is a transactional data catalog for data lakes that provides Git-like semantics and functionality.
Utility functions for projects built with dbt, a popular data transformation tool for data engineers.
Scalable, low-latency vector search in Postgres.
A collection of readings and resources related to databases.
Apache HBase is a distributed, scalable, fault-tolerant database for large datasets built on top of HDFS.
Apache Spark and Python tutorials for big data analysis and machine learning as Jupyter notebooks.
A powerful data visualization and plotting library for the Julia programming language.
An extensible framework for linking databases and interactive views, focused on scalability and visualization.
A dbt adapter for the DuckDB database, enabling developers to build data pipelines and models with dbt.
A Rust library that provides persistent data structures for efficient and immutable data management.
A Python library for pulling current and historical baseball statistics, including Statcast, Baseball Reference, and FanGraphs data.
A curated list of resources for machine learning-based algorithmic trading and quantitative finance.
A robust Python library for materials analysis and computational materials science.
Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.
A powerful C library for analyzing complex networks and graph-based data structures.
Notebooks for financial economics, including analyses of Federal Reserve, GDP, inflation, and more.
An open-source dev data platform to ingest, analyze, and visualize data from DevOps tools for engineering insights.
A Python library providing SQL views for Dune Analytics, a popular blockchain data analysis platform.
Cloud-native genomic dataframes and batch computing for bioinformatics and genetics research.
Educational notebooks on quantitative finance, algorithmic trading, financial modeling, and investment strategy.
This repository provides a comprehensive JSON dataset containing metadata on anime series and movies, with cross-references to various anime sites.
Alluxio is an open-source data orchestration platform for analytics and machine learning workloads in the cloud.