Wednesday, February 28, 2007

It's a Delightfully Parallel World


Traveling every week so far in 2007 has had a major impact on my blogging frequency--I haven't had time to read my feeds in weeks, let alone write anything. But when I started this blog in 2005, I had a goal of writing at least one post a week, and I'm determined to get back to that.

And last night, when I returned to my hotel room after a 12-hour day to start a post, how does Blogger greet me? By letting me know that Servlet NewFrontend is currently unavailable. Great. Can someone remind me why I use this service?

Now, back to our regularly scheduled program.

Spending a ton of time at customer sites this year has given me a vastly improved perspective on how much demand there is for distributed computing power. I've been at a customer site where people were asking for time on a brand-new development grid because they needed to get production analysis runs completed. I've seen people running from desktop machine to desktop machine starting analysis software. And, of course, I've seen the most prevalent "grid" out there: using remote desktop to access many servers, starting processes on each.

And why exactly are people using these slow, inefficient methods to get work done? Can distributed computing even help them? Of course it can. Because, after all, it's a delightfully parallel world.

Several years ago--long before Digipede had released a product, but after we had decided on a feature set--one of the luminaries of distributed computing told us that the problem with a system like ours was that it could only solve "embarrassingly parallel" problems.

For those of you unfamiliar with it, the term embarrassingly parallel refers to computing problems that are easy to segment for separate, parallel computation. Moreover, embarrassingly parallel problems require no communication between the various pieces of the problem.

There has long been a feeling in the academic computer science community that "embarrassingly parallel" problems aren't worth spending time on. Academics have been much more intent on solving those problems that can't be easily broken up, that require constant communication and direct memory access between processes. Fields like Finite Element Analysis and Computational Fluid Dynamics, for example, are enormously complex, require vast amounts of computing power, and have great computer science minds struggling to come up with new and innovative technologies.

While the academics have been solving these very difficult problems, they've been looking down their noses at embarrassingly parallel problems--the name itself is quite condescending.


But when you go out into corporate America, and you look at the problems that most developers are trying to solve, and you look at the compute loads that are strangling most overworked servers, you find a nasty little secret:

It's a delightfully parallel world.

That industry luminary told us that, in his estimation, perhaps 10% of computing problems might be considered embarrassingly parallel--everything else requires "real" distributed computing.

Having spent a bunch of time with customers, I think he is exactly wrong. Why? Because it's a delightfully parallel world.

Most customers out there who are adapting their software to run on a grid or cluster aren't tearing apart their algorithms, rewriting every line of code using a complex toolkit so it functions across processors. Carving up an algorithm like that is amazingly difficult and requires enormous expertise.

No, customers do something far more efficient and practical: instead of trying to carve up their algorithms, they break up their data.

Say you've written a routine that can analyze the risk for a customer's portfolio--it runs for 5 seconds. If you have 1000 customers, it's going to take an hour and a half to run. Imagine you have 20 servers--you'd really like to spread that work around to get it done quicker. Now you could try to rewrite that algorithm in such a way that it uses multiple processors simultaneously, but that would involve complex technology like MPI and completely rearchitecting your routine. Here's a much easier solution: leave your algorithm exactly the way it is now, and break up the data instead. Each server analyzes 50 customers, and your analysis is done in about 4 minutes. Why was that possible?

Because it's a delightfully parallel world. Your customers' portfolios aren't dependent on each other--each can be analyzed independently.
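To make that concrete, here's a minimal sketch of the data-splitting approach--in C with POSIX threads, since a runnable single-machine example is easier to show than a 20-server grid. The threads stand in for the servers, and analyze_portfolio is a hypothetical placeholder for the real 5-second risk routine. Notice that the algorithm itself is untouched; only the list of customers gets divided:

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_CUSTOMERS 1000
#define NUM_WORKERS   20            /* stand-ins for the 20 servers */

static double results[NUM_CUSTOMERS];

/* Hypothetical placeholder for the real 5-second analysis routine. */
static double analyze_portfolio(int customer_id)
{
    return customer_id * 0.01;
}

typedef struct { int start, end; } chunk_t;

/* Each worker runs the unmodified algorithm over its own slice of the data. */
static void *worker(void *arg)
{
    chunk_t *c = arg;
    for (int i = c->start; i < c->end; i++)
        results[i] = analyze_portfolio(i);
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_WORKERS];
    chunk_t   chunks[NUM_WORKERS];
    int per_worker = NUM_CUSTOMERS / NUM_WORKERS;   /* 50 customers each */

    /* Break up the data, not the algorithm. */
    for (int w = 0; w < NUM_WORKERS; w++) {
        chunks[w].start = w * per_worker;
        chunks[w].end   = (w == NUM_WORKERS - 1) ? NUM_CUSTOMERS
                                                 : (w + 1) * per_worker;
        pthread_create(&threads[w], NULL, worker, &chunks[w]);
    }
    for (int w = 0; w < NUM_WORKERS; w++)
        pthread_join(threads[w], NULL);

    printf("Analyzed %d portfolios in %d independent chunks\n",
           NUM_CUSTOMERS, NUM_WORKERS);
    return 0;
}
```

On a real grid, each chunk would be shipped to a separate machine rather than a thread, but the decomposition--and the untouched algorithm--are exactly the same.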

And when you venture into corporate America, and you look at the server loads, you see that most of the analysis they are doing falls into this category.

A special effects company needs to render 50,000 frames for a scene. An electric power company needs to generate 20,000 complex bills for their largest customers. A web application needs to generate PDFs for users on the website. A bioinformatician needs to check 300 different proteins to see how well they dock on a segment of DNA. A trader needs to try 50 different trading algorithms against the history of a stock's performance.

All of these are daunting problems in terms of computing capacity--and all can be solved in parallel by dividing up the data.

Now, before the MPI-jockeys take me to task, some disclaimers: I don't pretend that every problem in the world can be divided like this, and I understand that dividing data can be a complex task in its own right. Moreover, what you guys do is really, really hard. I get that, and I'm glad you're out there solving those problems.

But for the other 90% of developers out there: don't rewrite your algorithms. Break up your data.

Because, as John Powers says, it's a delightfully parallel world!

Photo credit: Scott Liddell

Sunday, February 11, 2007

MSVCRT Update for DST?

[Update 2007-03-14 1:24pm] There's now an update to MSVCRT.DLL! Read all about it here, or go straight to the source and grab it.

Believe it or not, I'm not the only software developer in my family--my wife, Cindy, has been writing software nearly as long as I have.

She's working at a company that's still supporting some products with pretty old code bases. Many of the products deal with a lot of time-series data, and they have to deal with data from different timezones.

At least some of the code uses the mktime, tzset, localtime, and gmtime functions in the MSVC runtime library to help translate times and time zones.

As many of you have probably heard, Congress changed the dates for daylight saving time this year. As a smaller number of you have heard, Microsoft is updating their operating systems to reflect these changes. They're also updating their newer development tools. However, it's not clear what's happening for their older customers. This page says:
For customers who rely on the TZ environment variable for the DST information, they will get outdated DST information for 2007 and beyond (i.e., they will get DST information according to the previous system). Microsoft is currently working on a fix for this issue and will post information about its availability on the Visual Studio Support page. In the interim, developers are advised to test their applications to determine the impact of the DST update on their applications.
Wow. OK, she's tested their applications--the old mktime functions are definitely not going to work this year. Now what? The DST change is less than a month away, Microsoft--is this release coming anytime soon?
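For the curious, here's a minimal, hypothetical test along the lines of what she ran (illustrative C, not her company's code). Under the new rules, DST begins March 11, 2007; under the old rules, it would have begun April 1. So March 15 is the tell: a runtime with outdated tables reports standard time for it.

```c
#include <stdio.h>
#include <time.h>

int main(void)
{
    /* Assumes the local time zone is a U.S. zone that observes DST
       (e.g. TZ=PST8PDT). March 15, 2007 is daylight saving time under
       the new rules, standard time under the old ones. */
    struct tm t = {0};
    t.tm_year  = 2007 - 1900;
    t.tm_mon   = 2;     /* months are 0-based: 2 == March */
    t.tm_mday  = 15;
    t.tm_hour  = 12;
    t.tm_isdst = -1;    /* ask the runtime to decide */

    if (mktime(&t) == (time_t)-1) {
        puts("mktime failed");
        return 1;
    }

    /* An un-patched runtime prints 0 here; a fixed one prints 1. */
    printf("tm_isdst for 2007-03-15 12:00: %d\n", t.tm_isdst);
    return 0;
}
```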

I know Microsoft can't support old products forever. On the other hand, they know that people make a commitment to their platform, and many vendors can't force their customers to upgrade software. It's also not like releasing a new MSVCRT.DLL would be a huge deal--they released a DST bug fix a few years ago.

So, all you Microsofties out there: any word on this?


Friday, February 02, 2007

Video: Excel Services and Grid Computing


Excel Services is such a new product that many people have trouble understanding exactly what it does, why they'll need it, and why a grid computing guy like me is always talking about it.

Simply put, Excel Services is Excel for SOA: it lets you take the power of a spreadsheet in Excel and make it available as a service (either through a browser or through web services) to everyone in your enterprise. As I've written extensively in the past, I think it's a game-changing product.

What does this have to do with grid? Scalability. When you make something available as a service, you've got to ensure that you're ready to have many people use it at once.

Since Digipede is a Gold Certified Microsoft ISV partner, I was asked to make a 5-minute video for the Office 2007 launch explaining how a grid can help ensure scalability for Excel Services and UDFs. If you're interested, check it out here on the Microsoft site: read all about it, or scroll down and click the video link. (Note: the video link worked for me in Firefox, but not in IE 7.)


Thursday, February 01, 2007

Turnaround: Scoble Gets It Right This Time


Scoble lost his senses for a couple of days in September, but a trip to DEMO07 seems to have straightened him out.

Responding to a post on WeBreakStuff, Scoble said at the time that he liked the fact that "money is a filter"--that only companies that could scrape together the $30,000 fee were allowed on stage.

I responded vociferously: Scoble Is Dead Wrong about DEMO. The filter isn't the $30K--it's Chris Shipley. As I said in my post back then:
To participate in DEMO you undergo a rigorous application process and a very critical evaluation. When my company demonstrated in 2005, there were over 700 applications--and around 70 were accepted. That's close to an Ivy League acceptance rate! Chris Shipley and the team have a great eye for technology, and you have to pass their sniff test (and outshine the competition) to get in. Their track record is fantastic (products like Palm, Java, and TiVo were all featured at DEMO in the past--prior to their great success).
Well, a trip to Phoenix seems to have changed Robert's mind. In his post yesterday, he grokked the fact that the money isn't the important hurdle:
Chris Shipley interviews each company, which means that everyone on stage has reached a certain bar and the $30,000 fee makes sure that entrepreneurs have some skin in the game.
I'll always look fondly on DEMO@15 (where we announced our product in 2005): not only did we get to demonstrate there, but it was the first time I met Robert--and the first time I started to see the value in blogging.

I wish I were there--it's an exciting event. And I'm glad to see that he's appreciating the true value in it!
