A curated list of Site Reliability and Production Engineering resources.
A curated collection of resources on how organizations practice Site Reliability Engineering (SRE).
A comprehensive guide to prepare for Site Reliability Engineer (SRE) interviews.
A Chaos Engineering Platform for Kubernetes.
A curated list of Chaos Engineering resources for building resilient and fault-tolerant systems.
An easy-to-use chaos engineering experiment toolkit for fault injection and microservice testing.
Litmus helps SREs and developers practice chaos engineering in a cloud-native way to build resilient systems.
Chaos testing, network emulation, and stress testing tool for containers.
HolmesGPT is an AI agent that helps SREs and DevOps teams solve incidents faster with automatic correlations, investigations, and more.
A web UI for the Jaeger distributed tracing system, built with React and JavaScript.
A curated list of Site Reliability and Production Engineering tools for developers.
A collection of postmortem templates for incident reporting and site reliability engineering.