Explanation of Recent Downtime
June 16th, 2007

Greetings,
I would like to take a moment to further explain our recent downtime over the past week and a half.
We have been making some major upgrades to our infrastructure over the past couple of weeks. We invested in a large managed switch to replace the large unmanaged one we had been using for the past 5 or so years. We invested in remote power cycling equipment so we can cycle power on servers remotely, and we also spent a bit of time getting WOL (Wake-on-LAN) functionality set up and tested on ChanServ's main server.
All of these investments, along with other work, were meant to improve our overall uptime and speed up our response to any downtime. The outages during this time amounted to a couple of 1-2.5 hour outages during installation and testing.
We have not finished all of this work and will resume it soon, but we put everything on hold after a major stint of downtime on Wed 6/13/07. This downtime was completely unexpected and outside our control.
Long story short, the morons at AT&T messed up one of their backbone routers, affecting many of their customers in our area and leaving all of us unable to connect to the majority of the Internet, including most high-traffic sites such as Yahoo. Our main office and ChanServ got caught in this mess, and both were likewise unable to connect to the rest of ETG. The outage was noticed immediately: pages went out, phone calls were made, and staff were dispatched to the office to see what was wrong. Sadly, upon getting there we discovered it was a bad Internet outage. We had plans in place for a power problem, but this was something we didn't have a good answer for.
We called AT&T around 12:30am and talked to them for the majority of the early morning hours. After struggling for a long time to get past Tier 1 techs who only knew how to read questions and answers off a flowchart, we finally got to Tier 2 support, which wasn't much more helpful. They didn't take our outage very seriously and treated it as something that needed a machine or router restart on our side, despite us saying there had been no changes and the problem lay with them.
We were in the office until around 5:30am, when we finally gave up on getting AT&T to listen and decided to try again in a couple of hours.
Around 6-7am AT&T started getting calls from a number of its other business customers complaining of an outage. ONLY THEN did they get off their butts and take it seriously enough to even LOOK TO SEE IF THERE WAS AN ISSUE and try to find it. This is just completely insane. I understand, yes, you are a big company, and normally if there is a problem on your side you will get a lot of people calling in. But if it is not during standard business hours, guess what, people won't be at their offices to see it and call in. So maybe, just MAYBE, if you had people who would take the initiative when a problem is reported at 12:30am instead of lazy ones, you wouldn't have had an outage until 5:30pm.
Anyway, after they found the problem, they then spent many hours trying to figure out how to fix what they had messed up in the first place. =/ Rule #1 in router changes: back up your config before you make changes. If things don't work, you can revert to the old one. There is NO GOOD REASON it should take almost 12 hours from when they realized there was a problem, or about 18 from the moment it was first reported, to repair it. Especially for a big company like AT&T, which definitely had infrastructure in place that they could have switched traffic over to while they repaired it.
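For anyone wondering what that rule looks like in practice, here is a minimal sketch. It assumes the config is a plain text file on disk; the names apply_with_backup and check are made up for illustration and are not taken from any real router tooling, ours or AT&T's.

```python
import shutil


def apply_with_backup(config_path, new_text, check):
    """Write new_text to config_path; restore the old file if check() fails.

    config_path, new_text and check are hypothetical: check is any callable
    that takes the path and returns True when the new config looks healthy.
    """
    backup_path = config_path + ".bak"
    shutil.copy2(config_path, backup_path)  # save the known-good config first
    try:
        with open(config_path, "w") as f:
            f.write(new_text)
        if not check(config_path):
            raise ValueError("new config failed its check")
    except Exception:
        shutil.copy2(backup_path, config_path)  # revert to the old config
        return False
    return True
```

The point is only the ordering: the copy happens before anything is written, so there is always a known-good file to go back to.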
Anyway, we will be contacting a customer rep this coming week to make sure our voice on this outage is heard.
We are very sorry for this downtime, and we hope AT&T will put policies in place to actually investigate outages more proactively as a result. Thank you for your understanding.
VD-WHiZ