Web Server Outage Response Protocol
This page specifies the protocol to follow if you become aware of a website outage.
Joss, Michael, Dave, Lonnie, Karen and Ben are all responsible for responding to web service outages when they notice the website is down. These people shall be placed on the recipient list of whatever website monitoring service Rare Form media selects, and on 24x7 if we continue to use it. Thus, one way you might learn that the website is down is by receiving an email from the monitoring service.
If you become aware of a website outage, follow this protocol:
- Check your email to see if one of us is already responding to it.
- If so, take no further action until asked.
- If no one has claimed it, send an email to this group saying that you are responding to it (so we don't step on each other's toes).
- If you have the know-how and ability, log in to the server, try to diagnose the problem, and collect forensic info. Restart services or reboot from the Linux command line, if possible. (I don't expect this from Karen or Ben.)
- When logging in to the server is not an option or doesn't work, go to https://console.cloud.google.com / Compute Engine / VM instances / Project = Main Web site, and from the ⋮ menu for lumina-com-wp-vm-1, select "reset".
- Wait one minute, then check that the website is functional.
- If the website continues to fail and you need to escalate, do your best to escalate to the appropriate person: Joss or Michael during UK hours, and Dave before Lonnie when Dave is available. When escalating to Lonnie, call; don't email.
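The "wait one minute, then check" step above can be done by hand in a browser, but a small script makes the check repeatable. What follows is only a minimal sketch in Python using the standard library. The URL is a placeholder, because this page does not state the site's address, and the function names `is_site_up` and `wait_then_check` are hypothetical. It treats any HTTP 2xx response as "up", which is an assumption: a page can return 200 and still be broken, so a quick look in a browser is still worthwhile.

```python
import time
import urllib.error
import urllib.request

# Hypothetical placeholder: this page does not state the website's URL.
SITE_URL = "https://example.com/"


def is_site_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections and timeouts.
        return False


def wait_then_check(url, wait_seconds=60, sleep=time.sleep, check=is_site_up):
    """Wait (one minute by default, as in the protocol), then check the site once."""
    sleep(wait_seconds)
    return check(url)


# Example use after a reset (commented out so importing this file has no side effects):
# if wait_then_check(SITE_URL):
#     print("Site is responding.")
# else:
#     print("Site is still failing; escalate as described above.")
```

The `opener`, `sleep` and `check` parameters exist only so the functions can be exercised without a real network call or a real one-minute wait.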