Note to self: make no further references to my desire to switch email platforms, lest I continue to incur the wrath of the Exchange gods. Last week, one day after I openly lamented the current state of scheduling, particularly within Exchange, we experienced significant email downtime for the first time in months. Not having learned my lesson, I then took a look at the opportunities I saw for Zimbra and other Exchange alternatives. It took a few days, but those words seem to have caught up with me, as our Exchange server decided to take a bit of a vacation last night beginning around 11 PM MT.
While the Exchange-to-Outlook connectivity, per a conversation with James this morning, seems to be OK, the OWA connectivity (what I use to gain access to our calendars via the Evolution Exchange Plugin) was down until about 11 AM this morning, and the IMAP connectivity still hasn’t recovered.
Given that the only way I’ve gotten filters – spam, listserv, and otherwise – to work at anything other than a glacial pace is via IMAP, the lack of it is something of a problem. Now that Evolution has offline caching of IMAP content, I at least have client-side access to emails prior to last night, but the real problem now is the volume of spam and list emails (opensolaris-discuss, in particular) rapidly accumulating in my besieged Inbox. They’re creating a real logjam in there, and unfortunately, given the limitations of the Firefox OWA client, I can only manipulate one message at a time (one of the features regrettably missing from the non-IE version is “check all”).
When filing a service ticket this morning, however, I did discover the cause of last week’s outage – here’s ASP-One’s explanation:
First of all, we would like to apologize for the problems that you
have experienced causing poor services being delivered.
This worm issue started on Tuesday 8/16 when the World Wide Web was
hit by an outbreak of this computer Worm also referred to as Esbot or
Zotob. Within 6 to 10 hours this worm had mutated itself creating
about 15 variations of the worm. As soon as our Operations Team had
detected the problem, we were confronted with the following situation:
1. Some of the Win2K servers, including one of DNS server had already
2. We had to prevent infection on about 100 Win2k based servers
3. The worm had generated extreme congestion within our network
At that point, the priorities we had on our list were as follows
1. Revive and clean the infected servers in order to avoid possible
infection of the worm
2. Address any servers that were “none cleanable”; i.e. Rebuild and restore
3. Scan 100 Win2k servers
=> These above actions were completed late Thursday 8/18
4. Address the remaining networking issue since we had confined the worm
=> This was completed at 4:45PM on Friday 9/19
As of right now, the network has been stabilized and our Operations
team is monitoring very closely to make sure that the fixes we have
applied to the servers have eradicated any infection of the worm and
any possibility of additional infection. We are also re-enforcing the
number of DNS/DC servers in order to ensure consistent service for
authentication and name resolution.
During this outage period, our support team and also Senior Management
had worked together and have posted regular Updates and Outage notice
on our corporate website www.asp-one.com and also in the portal at
www.bizatlarge.net, per the SLA.
For your information, this computer worm issue was not isolated to
ASP-One as it has caused damages across the US, Europe and Asia.
Please see articles at the bottom of this email.
ASP-One does understand that the service provided to you is mission
critical and that is why our Operations Team has followed the proper
security protocols and has worked night and day in order to restore
service and safeguard the applications and data hosted at ASP-One
Again we do apologize and please let us know if you have additional questions.
While the reliance on Windows 2000 servers is in and of itself interesting, given the nearly deprecated nature of the OS within Microsoft’s security plans, I can’t pin this one entirely on Microsoft. A patch has been available for this vulnerability, I’m told, since August, and while patching Windows machines is not as easy as it needs to be, the organizations running Windows 2000 need to bear the brunt of the responsibility here (IMO). I’d much rather have them schedule a bit of downtime with me for patching than lose my email for hours on an unscheduled basis.
Anyway, I also wanted to thank the couple of folks who’ve written in and offered to host our email for us on an ad hoc basis. Much as I appreciate the offers, however, such arrangements would typically keep me in an admin role for the systems in question – a job that, like Christopher, I’m eager to lose. We have hardware in-house and coloed that could easily meet our email needs, but while I try to install and run as many of the messaging packages as I can for research purposes, I’m not eager to take on the production responsibility – particularly for backups. At this point in the RedMonk corporate lifecycle, I’m more than willing to trade money for time there, and make it someone else’s problem.
So for now we’re just going to struggle through the current flakiness and await that joyous day when I can migrate to a brand new platform, and deal with brand new problems 😉 If I’m a bit slow in getting back to your emails, though, now you know why.