A curated list of Site Reliability and Production Engineering resources.
A curated collection of resources on how organizations practice Site Reliability Engineering (SRE).
Open-source status page with uptime monitoring and API monitoring as code.
Open-source platform for monitoring and observability, focused on incident management and on-call workflows.
HolmesGPT is an AI agent that helps SREs and DevOps teams solve incidents faster with automatic correlations, investigations, and more.
Oncall is a calendar tool for scheduling and managing on-call shifts for paging systems.
Incident response documentation and best practices from PagerDuty for managing on-call and security incidents.