tecosystems

Who Turned Out the Lights? – Or, the Straw that Broke 1and1’s Back


As some of you have obviously realized, given the number of emails we’ve received on the topic, the RedMonk blogs went dark several days ago – Thursday evening, to be precise. The pages themselves remained available, because they’re static HTML and require nothing more than a functioning web server to operate, but your ability to comment and our ability to post were taken away. Why? Because we had a recurrence of the problem that knocked us offline on April 25th: our MySQL databases became unreachable.

For what they term security reasons, 1and1 chooses to host its MySQL environment not on the dedicated boxes themselves, but within a separate DMZed environment. In architectural terms, this is an entirely appropriate decision and is defensible on both security and scalability grounds. As a systems integrator, this is the approach that I would generally prescribe. For our needs, however, this is serious overkill. While we’re running at around 2 million page views a month now (thanks in large part to our audience of highly irritating spammers), our needs are minimal. And when the data, presentation and application tiers are separated, there’s always the possibility of connectivity issues.
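To put that 2 million figure in perspective, here is a rough back-of-the-envelope sketch of the average load it implies. The 30-day month is my assumption, and an average says nothing about peaks, so treat the result as a ballpark rather than a capacity plan.

```python
# Back-of-the-envelope estimate of average load from monthly page views.
# The 30-day month is an assumption; real traffic is bursty, not uniform.

PAGE_VIEWS_PER_MONTH = 2_000_000  # approximate figure quoted in the post
DAYS_PER_MONTH = 30               # assumed month length
SECONDS_PER_DAY = 24 * 60 * 60


def average_views_per_second(monthly_views, days=DAYS_PER_MONTH):
    """Return the mean page views per second over a month of `days` days."""
    return monthly_views / (days * SECONDS_PER_DAY)


rate = average_views_per_second(PAGE_VIEWS_PER_MONTH)
print(f"{rate:.2f} page views per second on average")
```

Under those assumptions the site averages less than one page view per second, which is why a separate database tier looks like more architecture than we need.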

Those are to be expected, and frankly we expect some downtime with our hosted services. All that we ask is that the downtime is kept to a minimum, and that we’re kept apprised of the situation so that we can plan accordingly. Unfortunately, 1and1 failed miserably on both of these counts. I frankly have never before experienced such appallingly poor customer service; 4+ days to fix a simple database outage? They could have built a machine from scratch and reimaged it in maybe 2. Even when we had a multi-day outage of email a few years back, ASP-One at least kept us informed as to progress. 1and1? No such luck. They seem reluctant to be the bearers of bad news, never having learned the crucial lesson that bad news is always better than no news.

Despite probably a dozen calls to 1and1’s customer support service in the past 4 days – at 30 minutes plus each, no less – I still have no real conception of what the problem is. 1and1 has clearly grown too quickly, and their customer support department is obviously totally overwhelmed. With each call, I was told by the tier 1 support that the situation was escalated, and in the hands of admin. When I requested that they call admin and get a latest update, I was kept on hold for 20 or 30 minutes and told that they couldn’t get through. Tried the supervisor route, same deal. Tried an email address that turned up in a Google of “1and1 sucks,” no joy.

There is no more helpless feeling professionally than having the tools of your livelihood taken out of your hands, with no real recourse open to you. Being at the mercy of an organization that doesn’t – and probably can’t, in 1and1’s case – care at all about your business is a terrible feeling.

All of which leads me to a question I’d like to ask you guys (presuming that the database is still working tomorrow and you can actually comment): what should we do next? That we’re leaving 1and1 is a given, at this point. Fortunately, porting the site shouldn’t be terribly difficult, based on my experiences uploading my blog in TextPattern and Typo.

Our options, as I see them, are as follows:

  1. Shared Hosting:
    Where we now have a dedicated box to ourselves, we could go back to a shared hosting plan. Given our experiences doing this before, I’m not keen to try this again. Whenever something goes wrong, admins are usually unable to determine whether it’s you or some of your fellow tenants.

    • Pros: Price
    • Cons: Control, flexibility, performance, space, etc.
  2. Dedicated Hosting:
    This is what we have now, and it’s been a much better experience than sharing a server with lots of other parties who may or may not be doing evil things to the box.

    • Pros: Flexibility, performance, space, etc.
    • Cons: Control, price
  3. Colocation:
    We currently have available to us a tremendously underutilized dual Opteron, 2 GB RAM Sun V20z. It’s a lot more hardware than we’d probably be willing to pay for from a host, but is basically sitting there crying for a greater workload.

    • Pros: Even more flexibility, performance, space, etc.
    • Cons: Workload (backup, firewall, etc)

There is also the Virtual Server option using Xen or equivalents, which in theory offers some of the advantages of both shared and dedicated hosting. Donnie, noting our distress via my del.icio.us links, was kind enough to send along a recommendation of Linode (thanks for that, sir), and he speaks very highly of them. My first reaction was to dismiss the notion, believing that we need the full power of our own machine, but I’ll actually have to think on that more.

When I thought of hosts, I initially thought of TextDrive, but their prices are an order of magnitude greater than we’re currently paying: we’re sub $100/month for our dedicated machine, TextDrive starts at a grand. Even presuming their service lives up to that pricing, that’s not in my IT budget given our rather pedestrian requirements. Alex recommended Chris over at Austin Web Development, and we’ll certainly look into that and appreciate the tip.

How about the rest of you? Do you have any recommendations you might pass along? Any hosts that you can speak very highly of? While we’re migrating, I might as well get us onto a platform I’m comfortable with: does anyone know hosts that are running Ubuntu? For those of you running your own production servers (I know all about running dev ones), how much of a pain is it to back up and protect your machine? I’m not talking about SELinux or Trusted Solaris level of protection, but I’m used to having basic firewall needs taken care of for me. It’d be nice if we could simply eliminate 1and1 and shift everything over to the V20z; are there remote backup services that work effectively and cost-efficiently over a WAN? Any thoughts or assistance in this matter greatly appreciated.

On the good news front, we’re nearing what we hope will be a final decision on our messaging/calendaring problem and anticipate being more reachable and more easily scheduled in the very near future. More on that hopefully shortly.

Lastly, I’d like to apologize on behalf of myself and my two colleagues for any inconvenience you may have experienced during our outage, and rest assured we’re taking measures to ensure it doesn’t happen again. In addition, I officially retract all the positive things I’ve said about 1and1 in the past, and hereby recommend that future customers avoid them. They’ve simply gotten too big to care about their customers, which is a shame because their uptime has generally been good and their administration front end is quite nice.