503 access errors
|dbrid||Was anyone else unable to access the site after 15:00 UTC on Sept 14th?|
|lightvector||The site went down, yes. An email announcement acknowledging this would have been helpful, and the lack of one definitely caused uncertainty and confusion at the time (there were threads about it on life in 19 and reddit)...
... except that I think the email system had recently broken and is still at least partly broken, for reasons still being investigated. (In the past week or two, comment notifications and announcements of new lectures have been absent or heavily backed up.)
The site team is very small - to my knowledge, there's literally only one person on the tech/infrastructure/web-coding front. So although unfortunate, it seems pretty understandable - as far as I know, it might not even have been possible to send an email announcement at the time.
Yes, the site was down for many hours. Guo Juan and I did everything we could think of at the time, but the timing was unfortunate. Our webmaster had just left for the mountains for the night, where there was no phone coverage.
There is, in fact, a backup webmaster who could have gotten us back into operation. I didn't call him until too late at night and couldn't reach him - my fault. In the morning, before he could even dig into it, the primary came back and got us back online.
We did think to send out an email, but our email lists are on the server and we couldn't access them either.
We have made a plan with the programmer to have *more* backup plus separate email lists so that we can get an email out if necessary in the future. It's not done yet but will be soon.
The site breakage was caused by some hacker running a robot doing rapid-fire page loads (whether intentional or not, it was basically a DoS attack). There wasn't even anything non-public (i.e., requiring a login) on the pages being loaded. We have a tool in place that should catch and block this, but it didn't. It needed a small reconfiguration to handle this particular attack. This is done now.
We've been grateful for the understanding and encouragement we've received on the heels of this. There's always something new to learn for us too.