I had breakfast with Sun last week before the launch of the new T-1 UltraSPARC-based servers, although I missed the official launch. We had a cosy little breakfast for industry analysts…
and the 451 Group (just kidding guys, Will knows more about Unix than I will ever forget…), at a place called Nippon Tuk. I kid you not – was this a joke about the fact that Sun has been undergoing some pretty serious cosmetic surgery of late?
Jonathan Schwartz was there, and it was a pleasure as ever to talk to him, but nothing he said came close to being as compelling as what his customers told me. He’ll be glad to hear it.
Customers make the best evangelists.
People keep asking how Sun plans to make money. Selling hot boxes might be an idea.
The comments from both customers were glowing, practically gushing.
Fiducia runs IT services for more than 900 Cooperative Banks. It plans to do less with the mainframe and more in the middle tier. It built its own benchmark because existing industry benchmarks don’t include a call to mainframe IMS.
The contact was a rather nice chap called Matthias Schorer.
Here are excerpts from my notes:
“We ran our benchmarks – it was just amazing. 800 parallel requests into the single machine. 280 requests per second coming up, less than 500 millisecond response time.
The initial number of Mercury LoadRunner machines couldn’t handle it, so we had to increase the number of load generators.”
Mercury load generators kept crapping out before the server: that tells you something. Apparently Germans are worried about electricity cost as well, and are excited about replacing five servers with one.
“I am an Apple guy. They must have taken it to Apple to design, it’s so pretty. On unpacking the box, three guys went wowee, and we took a photo. It was like a Ferrari or a beautiful woman.”
So what about Strato, a web-hosting company, another T-1 alpha tester? It does so much hosting that it measures tin by the ton. I talked to Rene Weinholtz (CTO) and Carsten Zorger (PR).
The lead developer called and said have you read your email – you won’t believe it – this app runs on one chip. 20 UltraSPARC 2s and 14 UltraSPARC 3s and it was running at 50%…
Strato represents a third of all German email traffic, and it’s running on one chip.
If we completely restructured – we would only need 10% of floor space and power. Less power counts twice: less air conditioning…
Forgive my stereotyping but I tend to trust Germans on engineering issues. Quality is very much in the mindset, in my experience. Germans tend not to take claims on trust. They are thorough. Their trains run on time. Their cars are usually impressive. The manufacturing industry in Germany is world-class. You could argue some German companies focus too much on quality at the expense of cost, but the brands can sustain it. I recently bought a Bosch washing machine, and paid more for it than others I looked at. That machine is quiet… which is just what you need when you have a new baby.
All I mean is these guys seemed like they had given the servers a good going over and the experience, from out of the box to production, was positive. Fiducia, for example, looked at AMD servers from multiple vendors during its benchmarking. Buying behaviour will likely follow, but I am not going to speak for Fiducia or Strato on that score.
The T-1 is not right for all workloads. It is designed for mid tier rather than OLTP, from a cache perspective. Sun needs to emphasize the segmentation.
Jaime Cardoso has some good thoughts in that regard. Perhaps you could build a table, or online service, that recommended AMD, oldline SPARC or T-1 based on workload?
In talking to Fiducia something struck me. I was talking to service providers in both cases. They were happy campers with Tomcat and Apache, Postgres and so on, in Strato’s case with little or no interest in J2EE. The weirdest thing about Sun’s current position is that if the market does swing comprehensively away from Java and EJB in the way some commentators now say it is, then Sun will actually be hurt less than other companies from a revenue perspective. You could take away all JES revenues tomorrow and Sun could still build a business selling servers to support Apache, Postgres, MySQL, PHP and all that. Oh wait – maybe Sun already thought of that…
German IT organisations are evidently thinking about environmental characteristics. It is not just a US phenomenon. Rene called himself a “tree hugger”. Strato’s data center is on the hot banks of the Rhine (38 degrees in summer).
I can’t promise any kind of wide sample size in the research presented above, but these anecdotes felt instructive to me. So I posted them.
Anyway – that is my report. Perhaps it’s not just plastic surgery.
———————————————————————————————
Disclaimer: Not only is Sun a paying client, but Jonathan Schwartz, Sun’s President and COO, recently said publicly that I am a tier one analyst. Note to other “tier one analysts”: Solaris is open source now. If you read blogs you’d know. Just a suggestion.
James says:
December 17, 2005 at 11:31 pm
Jonathan is correct that you are a tier one analyst. Now if you could only get the folks at Sun to consider that customers may have different thinking on federations than what has previously been discussed, and that if they continue to ignore their voices they may be alone in the wilderness.
In fact, it would be really great if you could talk to all the employees of Sun who blog and get them to understand that the Internet is a two-way conversation and that they shouldn’t avoid getting customer feedback by turning off trackback…
james governor says:
December 20, 2005 at 1:40 pm
good point – i hate blogrolls, for example, that only cover your own colleagues….
it’s always a bad sign of potential groupthink.
you want i should blog on the trackback issue? i tend to see that as a spam killing issue, perhaps…
Richard Friedman says:
January 3, 2006 at 5:32 pm
Trackbacks are messy, and not the best way to get a 2-way conversation going. The Forums are a much better way to talk to Sun and the community in general.
http://developers.sun.com/forums/
and
http://www.sun.com/forums/
Chris Rijk says:
January 3, 2006 at 6:36 pm
Interesting examples. Thanks. Always nice to see what customers say about performance – often more important than published benchmarks. I suspect the UltraSPARC T1 will often do better on proper real-world benchmarks (customers testing their own workloads) than “industry standard” ones.
When you say that it “is designed for mid tier rather than OLTP, from a cache perspective”, do you have any similar customer examples? Since you say “from a cache perspective”, I suspect this is a prediction not a measurement. If so, do you think the cache is too small? My own prediction is that the T1 could stumble a bit on some OLTP setups but not because of the cache size – with the T1’s massive Thread Level Parallelism, cache size is less of an issue.
With regard to “could build a table, or online service, that recommended AMD, oldline SPARC or T-1 based on workload”, I think such a thing might give incorrect predictions too often to be useful. Certainly, though, T1 systems would not be an option on workloads with non-trivial amounts of floating-point operations, or where one T1 isn’t fast enough (and clustering isn’t an option).
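The two hard exclusions named above could be written down as a tiny screening check. This is only a sketch: the `Workload` fields and the function name are made up for illustration, and passing the check says nothing about how well a T1 would actually perform.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload; the field names are assumptions."""
    fp_heavy: bool         # non-trivial amount of floating-point operations
    fits_on_one_t1: bool   # a single T1 is fast enough for the whole job
    can_cluster: bool      # the work can be spread across several machines


def t1_ruled_out(w: Workload) -> bool:
    """Return True if one of the two exclusions from the comment applies."""
    if w.fp_heavy:
        return True
    # One T1 is not fast enough and clustering is not an option.
    if not w.fits_on_one_t1 and not w.can_cluster:
        return True
    return False
```

Anything that passes this check would still need to be measured on a trial system; the check only filters out the cases where a T1 is clearly the wrong tool.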
The easiest way I can think of to reasonably estimate how upgrading to a T1 system would go in advance with just one calculation would be as follows: measure the number of instructions per second executed on your current system (retired micro-ops/sec for x86), then figure on a top-speed T1 system doing about 7 GOPS (billion instructions per second) in comparison. So if your current system averages around 2 GOPS under maximum throughput then a 1.2GHz T1 would likely be about 3.5x faster. If actual results are significantly different then it’d probably be due to: too many fp ops, I/O limits, too much hot lock contention, inherent lack of multi-threaded scalability in the application, bad porting, or something else.
Putting it another way, the less ILP (instruction level parallelism) there is on the running system, the better the T1’s TLP approach should work in comparison.
PS I haven’t used any Niagara/T1 systems myself. My “7 GOPS” estimate for a 1.2GHz 8 core T1 is based on what seems to be typical IPCs (instructions per cycle). Most server workloads would probably be within 20% of that, though that’s based on figures mostly taken from blogs.sun.com entries. I’d sure like to see a table listing IPCs (or GOPS) for systems before they were migrated to T1, and after (with hardware specs).
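The back-of-envelope estimate above can be sketched in a few lines. The 7 GOPS figure, the 1.2GHz clock and the 8 cores come from the comment; the function names are made up for illustration, and the result is only as good as the measured GOPS figure fed into it.

```python
def estimate_t1_speedup(current_gops: float, t1_gops: float = 7.0) -> float:
    """Rough speedup estimate: assumed T1 throughput divided by measured throughput.

    current_gops is the measured instruction rate of the existing system
    (billions of instructions per second) under maximum throughput.
    t1_gops defaults to the commenter's estimate for a 1.2GHz 8-core T1.
    """
    if current_gops <= 0:
        raise ValueError("current_gops must be positive")
    return t1_gops / current_gops


def implied_ipc_per_core(gops: float = 7.0, clock_ghz: float = 1.2, cores: int = 8) -> float:
    """Instructions per cycle per core implied by a given GOPS figure."""
    return gops / (clock_ghz * cores)


if __name__ == "__main__":
    print(estimate_t1_speedup(2.0))           # 3.5, the example in the comment
    print(round(implied_ipc_per_core(), 2))   # 0.73
```

If a real migration lands far from the estimate, the list of likely causes above (fp ops, I/O limits, lock contention, poor multi-threaded scalability, bad porting) is the place to start looking.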
Darren Moffat says:
January 4, 2006 at 1:52 pm
J2EE vs Apache/Tomcat/Postgres: an interesting comment, that you think the worlds of J2EE and Apache/Tomcat/Postgres are somehow different. I often see the use of Tomcat where people don’t need all the features of J2EE, but that doesn’t mean they aren’t doing J2EE; it often means they just need what Tomcat provides and/or are happy with its admin model rather than moving to a “full blown” J2EE application server.
I’m sure I don’t need to tell you that part of the implementation of Sun’s J2EE application server is Tomcat.
james governor says:
January 5, 2006 at 11:41 am
And I am sure I don’t need to tell you, Darren, that part of the implementation of WebSphere is Apache HTTP services… how many organisations that implemented WebSphere between 1998 and the present really needed a full blown EJB container architecture, costing tens of thousands of dollars per CPU?
Sun has less to lose, in revenue terms, as the space commoditises further. If customers choose Apache itself, which many do, then Sun still has a strong play providing hardware.
My comments were not meant as a critique of Glassfish or other Sun JEE efforts, so much as a pointer to the fact Sun is reasonably well positioned to win Web 2.0 workloads.
Strato, for example, builds directly on open source componentry, rather than buying high dollar software from enterprise vendors. That is what many successful service providers do.
james governor says:
January 5, 2006 at 11:51 am
Actually i was talking about cache size, but your insights are very valuable. i think it is important, perhaps critical, for Sun to establish exactly what kinds of workloads will benefit from the architecture, or what environmental benefits are most relevant. Sun is now very much a multi-architecture company, and customers and prospects will need help navigating the thicket. why go T-1 rather than AMD, why go for an E10k?
james governor says:
January 5, 2006 at 12:02 pm
You are right about trackback messiness Richard, but sloppy loose coupling is not always a bad thing. Bear in mind that my blog is “outside the community” – that is, there is a good chance my readership is interested in server choices but would never visit your forums. It’s about providing threads for people to follow, some of which are sticking out of the core ball of yarn. Why then should Sun point to me? It’s about quid pro quo, interest and attention. If I post about Sun servers and it establishes a good conversation, like the thread here, then it’s more likely I will do more of the same in future, again, providing new entry points for prospects, customers or reporters.
To wit, my point about self-referentiality. It’s not enough for Sun’s community of evangelists to tell each other how great Sun is. What else would they say? Sun needs to reach out to new customer sets, in order to drive sales. The existing base of Sun bigots just isn’t big enough, as quarterly results over the last four years have repeatedly shown.
2-way conversation is valuable but I am talking about n-way dialogue…
I disagree that Sun forums is necessarily the best place to talk about Sun. Remember I am an industry analyst, not a Sun analyst or Sun customer, per se.
Chris Rijk says:
January 8, 2006 at 4:06 am
Ah, so comments do actually work – I got no response last time.
Going back to cache size: if you have 1-2 cores/chip or 1-2 threads/core then cache is mostly about reducing latency and partly about reducing main memory bandwidth requirements. If you have a chip like the UltraSPARC T1 (or what Rock seems to be like), then I’d say cache is mostly about reducing main memory bandwidth and partly about reducing latency.
How caches need to be optimised is not the only thing that I think will change with chips optimised for Thread Level Parallelism:
http://www.aceshardware.com/read.jsp?id=65000333
These presentation PDFs have some interesting bits of info on T1 and Rock:
http://www.cse.ucsd.edu/~rakumar/dasCMP/talk01.pdf
http://ru.sun.com/pdf/t-time/tremblay.pdf
Looking at the CPI figures for T1/Niagara on a “large scale commercial workload”, the L2 cache misses only reduce performance by about 18% even with 4 threads/core active. Or about the same as pipeline stalls (due to branches I assume) and L1 data cache misses, and less than L1 instruction cache misses.
Looking at the Rock simulated performance stuff, “hardware scout” can warm up the caches enough that a 1MB L2 cache can give the performance of an 8MB one without the technology. I wouldn’t be surprised if Rock has even less cache relative to its performance (compared to T1).
With regard to Sun helping customers choose, I’m not sure how much they can realistically do to predict performance in advance (relative to x86). Nobody can do that (without doing actual measurements). If the OSs are different too, then it’s even harder. Most customers seem to make their architectural choices in advance and aren’t willing to re-evaluate which is best for every new project. Customers who really do care would probably be happy to get trial systems and compare the performance directly – after all, measured results are much better than vendor guesses.
On the other hand, Sun’s pricing on the T1 systems is quite aggressive. Enough that unless it hits one of the gotchas mentioned previously, a T1 based solution would probably have better price/perf (and TCO) than Sun’s Opteron systems in most cases.
Jeff Wang says:
January 28, 2007 at 11:34 am
I am very glad to hear some people talking about the cache issues. CPU caches are the topic of my Ph.D. thesis. From my years of investigation into CPU caches, the conclusion is that CPU caches really matter, but you simply cannot change that. Small cache sizes do not make hit rates significantly lower, and large cache sizes do not make hit rates noticeably better. Sun, Intel and IBM people are all too aware of this fact. The issue is so complex that there is no consensus.
Did anybody notice that the SPARC T1 has four main memory ports?
Four of them, each costs more than 200 pins!
This really matters!