If only because it particularly ticked me off that Bloglines went AWOL just a few days after I posted this glowing thank you for improved performance, I was keen to find out what was up.
So when Bob Sutor dug under the covers by pointing me here, I was very interested. Mark Fletcher (sorry to hear about the broken toe, old chap) does a great job of explaining some of the rationale behind the recent Bloglines data center migration, the whys and wherefores.
In case you’re wondering about the title of this post, Mark says:
For those interested, we think it’s an issue with the ACPI support in the Redhat Enterprise Linux 4 Update 2 kernel on the Dell 1850s that we use.
I would be really interested to hear more about what kind of support Bloglines got from both vendors in trying to troubleshoot the problem. In the meantime, the story is bound to get some attention from folks like Dana Myers down in Santa Clara.
Please, though: no more drama or plumbers or pirates for a while. I may not “pay” Bloglines for the service, but lord knows it gets enough free advertising and referrals from me.
Mark Fletcher says:
January 4, 2006 at 6:57 pm
Our core database machines are all Dell 1850s, running RHEL 4 Update 2. We had been running a couple of these at the old datacenter, but had never experienced the problem we’re seeing now (different kernel…). The syslogs have a bunch of stuff, with the key lines (we think) being:
Jan 2 02:39:40 bldb01bos kernel: irq 193: nobody cared! (screaming interrupt?)
Jan 2 02:39:40 bldb01bos kernel: irq 193: Please try booting with acpi=off and report a bug
Unfortunately, and for reasons that baffle us, multiple machines tend to fail within a 30-minute period with the same problem.
Anyways, more info than you wanted to know. I’ll post something to the blog when we track down the problem. That post will most likely also include much ranting about how bad Dell 1850s are in general.
Thanks for the support and thanks for the well wishes about the toe!
Mark
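For anyone who wants to check their own boxes for the same symptom, here is a rough Python sketch of how the "nobody cared" lines Mark quotes could be pulled out of syslog and grouped to see which machines fail close together. It is not anything Bloglines actually runs. The function names, the default year (syslog timestamps carry none), and the second host in the sample are all my own assumptions for illustration; only the first sample line comes from Mark's comment.

```python
import re
from datetime import datetime, timedelta

# Matches kernel lines of the form quoted in the comment:
#   "Jan 2 02:39:40 bldb01bos kernel: irq 193: nobody cared! ..."
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"irq (?P<irq>\d+): nobody cared!"
)


def parse_unhandled_irqs(lines, year=2006):
    """Return sorted (timestamp, host, irq) tuples for each 'nobody cared' line.

    Classic syslog timestamps omit the year, so the caller supplies one
    (2006 is an assumption based on the date of the comment).
    """
    events = []
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        events.append((when, m["host"], int(m["irq"])))
    return sorted(events)


def hosts_failing_together(events, window=timedelta(minutes=30)):
    """Group events into clusters that start within `window` of the cluster's
    first event, and return the host lists of clusters with more than one host."""
    clusters, current = [], []
    for event in events:
        if current and event[0] - current[0][0] > window:
            clusters.append(current)
            current = []
        current.append(event)
    if current:
        clusters.append(current)
    result = []
    for cluster in clusters:
        hosts = sorted({host for _, host, _ in cluster})
        if len(hosts) > 1:
            result.append(hosts)
    return result


if __name__ == "__main__":
    sample = [
        # First line is from the comment above.
        "Jan 2 02:39:40 bldb01bos kernel: irq 193: nobody cared! (screaming interrupt?)",
        # Hypothetical second machine, invented purely to show the grouping.
        "Jan 2 02:55:00 examplehost02 kernel: irq 193: nobody cared! (screaming interrupt?)",
    ]
    print(hosts_failing_together(parse_unhandled_irqs(sample)))
```

In practice you would feed it the concatenated syslogs from every database machine rather than a hand-written list. The 30-minute window is just the figure Mark mentions, so adjust it to taste.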