Thursday, June 29, 2006

Holy MacMandelbrot, Batman!

We've been using a Mandelbrot generator as a sample application for quite a while now; those of you who have seen us at a trade show may have seen something like this before:

Or have you? Look closer!

Our new intern Robert started today, and he brought his Intel-based MacBook. He booted into XP using Boot Camp, installed .NET, added himself to our network, and installed a Digipede Agent. Next thing you know, he was adding CPU power to our grid!

Notes:
  • The MacBook (with the Intel Core Duo) is screaming fast.
  • Robert only has 512 MB of RAM; he says that's plenty to run XP quickly, but OS X really lags. Now who's got the bloated OS?
  • My PowerBook at home died recently; maybe I'll have to think about the MacBook...


    Putting a grid behind Excel 2007

Over in the Excel 2007 blog, David Gainer discussed using a cluster behind Excel Services. The Excel Calculation Engine is now multithreaded and can take advantage of a beefy server, but David points out that, for even more powerful computation, you may want to use the power of many machines. I knew they had been working on this, and Stevan Vidich had an Excel Services demo running on the cluster at the SIA Technology Management Conference.

    As David points out in his recent post, you can
    ...deploy your UDF to the cluster, and then use an XLL to (essentially – I am simplifying a tiny bit – and we will make sample code available at some point) call the UDF on the cluster with the appropriate parameters.
I know there was a lot going on behind the scenes to make this happen--installing Excel Services on each cluster node, deploying an XLL around the cluster, etc. Being a West Coast Grid guy, I wanted to see this run natively in .NET across a cluster. So I did this:

First, I installed SharePoint Server 2007 Enterprise Edition (which includes Excel Services) on the head node of my cluster. I made sure it was working by writing a UDF (that's User-Defined Function) that does my old standby Monte Carlo simulation: calculating pi. It looked like this:

Random rand = new Random();
int numberInsideOfUnitCircle = 0;
double x, y;

// Draw random points in the unit square; count how many land inside the unit circle
for (int i = 0; i < mNumberOfDraws; i++) {
    x = rand.NextDouble();
    y = rand.NextDouble();
    if (Math.Sqrt(x * x + y * y) <= 1.0) {
        numberInsideOfUnitCircle++;
    }
}

// The hit ratio approximates pi/4
mPi = 4.0 * numberInsideOfUnitCircle / mNumberOfDraws;
    It's inefficient and silly, but it works.
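(If you haven't seen an Excel Services managed UDF before: it's just a public class and method decorated with the UdfClass and UdfMethod attributes from Microsoft.Office.Excel.Server.Udf. Here's the shape of the wrapper--a sketch; the class and method names here are illustrative, not my exact code:)

using System;
using Microsoft.Office.Excel.Server.Udf;

// Excel Services discovers managed UDFs through these attributes
[UdfClass]
public class PiCalculator
{
    [UdfMethod]
    public double CalculatePi(int numberOfDraws)
    {
        double pi = 0.0;
        // ... the Monte Carlo loop above goes here, computing pi ...
        return pi;
    }
}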

    In order to make this run on a grid, I pulled that code into its own class (which I called a DigipedePiWorker). Then, in my UDF, I added this code:

// Connect to the Digipede Server
DigipedeClient mDigipedeClient = new DigipedeClient();
mDigipedeClient.SetUrlFromHost("leg7");
mDigipedeClient.SetCredentials("dan", "dan");

// One task per worker object; each worker carries its own number of draws
// (tasks, numdraws, pi, and result are locals in my UDF)
JobTemplate jobTemplate = JobTemplate.NewWorkerJobTemplate(typeof(DigipedePiWorker));
Job job = new Job();
for (int i = 0; i < tasks; i++) {
    DigipedePiWorker thisWorker = new DigipedePiWorker(numdraws);
    Task task = job.Tasks.Add();
    task.Worker = thisWorker;
}

// As each task completes, accumulate its estimate of pi
job.TaskCompleted +=
    delegate(object sender, TaskStatusEventArgs e) {
        DigipedePiWorker dpw = (DigipedePiWorker)e.Worker;
        pi += dpw.Pi;
    };

// Submit the job, wait for it to finish, and average the estimates
SubmissionResult submissionResult = mDigipedeClient.SubmitJob(jobTemplate, job);
mDigipedeClient.WaitForJob(submissionResult.JobId);
result = pi / tasks;
    This was fantastic. It took about 20 lines of code in total. My .NET objects were automatically distributed around my grid (and my cluster, because my grid includes a small CCS cluster). I didn't install Excel Services on any of the nodes. And I didn't have to manually deploy my UDF anywhere--the grid handled all of that for me.
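In case you're curious about the worker class itself: it follows the standard Digipede Worker pattern--a serializable class deriving from the Worker base class, overriding DoWork, which is what executes on the remote agent. A rough sketch (I'm simplifying, and eliding the loop you've already seen; check the SDK docs for the exact base class and namespace):

using System;
using Digipede.Framework;

[Serializable]
public class DigipedePiWorker : Worker
{
    private int mNumberOfDraws;
    private double mPi;

    public DigipedePiWorker(int numberOfDraws)
    {
        mNumberOfDraws = numberOfDraws;
    }

    // Read back on the client when the task completes
    public double Pi
    {
        get { return mPi; }
    }

    // Runs on a remote agent somewhere on the grid
    public override void DoWork()
    {
        // ... the Monte Carlo loop from above, setting mPi ...
    }
}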

    This is going to be a very powerful use case for Excel 2007--the ability to take a UDF and run it on many machines simultaneously puts a huge powerhouse in your spreadsheets. And Excel Services puts your spreadsheet into a browser. What does all of that add up to?

Any user on my network can now view a spreadsheet in their browser, enter the inputs they want, and have the entire grid work on the results. That is cool stuff.

I also can't get over how smoothly this worked. I'm a newbie to SharePoint, so working with it took a little getting used to. But writing my UDF (in Visual Studio 2005) and plugging it into a spreadsheet, grid-enabling the UDF, then adding that spreadsheet to SharePoint and making it accessible via browser--all of that was done in less than an hour and a half this morning.

    Kudos to the Excel 2007 team. From the demos I had seen, I knew it looked great. I had no idea how easy and powerful it was going to be to work with!

    Tuesday, June 27, 2006

    Kicking it old school...

I spend a lot of time talking about developing grid-enabled software, and certainly our critically acclaimed SDK is aimed at making that process easy. We love our API (and our customers do, too).

I spend less time, however, talking about our ease of use with respect to traditional distributed computing: good old CLI. Certainly most distributed computing to date has involved moving a command-line application to many machines, running it (sometimes with different arguments or input files), then putting the results somewhere.

It's old school, but it's important. And it's also important that you can do it without having to learn how to program or write Perl scripts. Earlier today I read Joe Landman's post about Microsoft's entrée into HPC. He certainly has many valid points (Joe's a smart guy, and one of the smartest when it comes to clustering), but on one thing I definitely disagree: HPC has historically been too difficult for many users, and needless complexity has indeed been a hindrance to adoption. I've had customers tell me this directly. You have to remember: not everyone knows how to write scripts or use a compiler.

    To that end, I recently made a short video that shows how to create and submit a command line job to the Digipede Network using our Workbench tool. This isn't a glitzy video; it simply shows how easy it is to run a command line job on our system. I linked to this page earlier today, but (coincidentally) the video just went up a few minutes ago. To find the video, click here then scroll down to "Submitting a Command-Line Job with Digipede Workbench."


    Happenings at Digipede

Things keep happening! I like to keep the content on the blog as close to grid computing as possible and not let it come too close to being a company blog. But some weeks I spend all my time on business and marketing, so I don't necessarily have grid computing insights to share.

    Here's a Digipede post for you:

The SIA Technology Management Conference was absolutely worth the trip to New York. I have to thank our partners (especially Stevan Vidich at Microsoft, and Dan Cox and Doug de Werd at HP) for doing such a great job of helping us evangelize grid computing (and our product) to the attendees. In addition to the Digipede booth, the Microsoft booth, and the HP booth, there was an "HP/Microsoft break room": a place for private meetings with customers. They had built a screaming little 16-processor cluster (Microsoft CCS and the Digipede Network on ProLiant servers with dual Opterons), and they were showcasing Excel Services, our software, and the interoperability thereof. It was fantastic to be endorsed by our partners so publicly.

    Richard and Carl at .NET Rocks! put up my interview this morning! I haven't listened to it yet (and I'm not sure I will; it's always a little strange to hear myself speaking, and I find myself agonizing over every word choice). It was a good experience, though, and I was glad to find out that those guys were so excited about grid computing. Of course, they're obviously both SETI@Home fans, and the SETI@Home perspective on grid computing is a lot different than the enterprise perspective on grid computing (I'll probably write a post about that soon). But it was fun to do, and it was good to get the exposure. I don't podcast, so if you're interested in hearing about grid computing rather than reading about it, check it out. (Oh, and I don't know if I've ever linked to these before, but I now have 3 videos up on the Digipede site; check them out over here).

Lastly, and this is a pretty cool one: the US Army Corps of Engineers is now using the Digipede Network to aid in their weather simulations. They're not doing any development at all: with their existing software and existing hardware, they use the Digipede Workbench to design and submit jobs to their grid--no programming, no scripting. It's cool to see the Digipede Network being used to keep coastal communities safer.

    Ok, that's it for the commercial. Those of you reading the feed who don't see the update on the pushupometer, you'll be happy to know that today is day 178, and I'll top 16,000 pushups (for the year) tomorrow. Only 50,000 more to go!


    Monday, June 19, 2006

    SIA Tomorrow

I took the redeye from Oakland to JFK (still have yet to have a bad experience flying JetBlue--I'm definitely a fan) with my colleagues John and Nathan. Setting up for a show like this is always a blur of many details. The SIA Technology Management Conference is a bit of a bizarre bazaar. I've never seen so many booths packed so tightly into such a confined space (not even at Supercomputing 2005, which was pretty tight). I have no idea what this place will look like when 7,000 attendees show up.

Normally, I don't get too hyped up about shows that I'm attending as a vendor. However, I'm unusually excited about this show. As our press release said this morning, we've got some big announcements this week. First of all, we're very pleased to formalize our relationship with HP. Packaging our software with the leading server manufacturer's hardware makes it as easy as possible for customers to get the power of .NET grid computing on a cluster--and ease of use has always been our top priority.

We're also glad to further our relationship with Microsoft. We've always viewed them as our most important partner (in fact, we've been a Gold Certified ISV partner since before our product was released); having them present us in their booth here at SIA represents a new level of cooperation. The Financial Services team at Microsoft has been terrifically supportive--and very excited about what the Digipede Network gives them. They see it like we've always seen it: it's the best way for .NET developers to take advantage of Compute Cluster Server.

    So if you're at the SIA Tech Management Conference in the next couple of days, stop by our booth (#4506 upstairs), the Microsoft booth, or the HP booth. I'd love to see you.


    Friday, June 16, 2006

    Computers get slower every day

We have a saying around our office: "Computers get slower every day."

    "What?" you ask! "What about Moore's Law? What about 3 GHz Opterons? What about 3.4 GHz Xeons?"

    Well, ok, that's true. AMD and Intel are still innovating, and they're still churning out faster and faster chips. But the truth is, they're not keeping up.

    As I wrote last fall, increases in processor speed just can't keep up with increases in networking speed and disk density (I referenced this Scientific American article). When you take into account the amount of data that we can effectively move to a computer, the processor itself is relatively slower than it used to be.

    Now, combine this ever-increasing network speed with Jim Gray's observations about distributed computing: "Keep the computation near the data." But the corollary is, when you take into account increasing network speeds, "Your data is getting closer and closer every day." In other words, because networks are getting faster and faster, it makes more sense to move data to more processors in order to work on it.

I write all of this because I had first-hand experience with it yesterday. Over on the Digipede Community boards, delcom5 wrote to say that he had 15 GB of files to zip, and to ask whether the Digipede Network could be used to speed up the zipping process.

I was curious, so I set out to try it. It was extremely simple to set up a job to zip files (it took me maybe one minute using Digipede Workbench). However, when I ran it, even though I had 10 machines working on zipping, I got barely any speedup. Why? Well, I was dealing with 100 MB files--and those take a while to move around my 100 Mbit network! I was pretty frustrated.
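Do the math: 100 MB is 800 megabits, so even at its theoretical maximum a 100 Mbit network spends a good 8 seconds per file just moving the data (and real-world throughput is well below the theoretical max). Shipping a gigabyte of input around the grid costs more time than the distributed zipping saves. On Gigabit Ethernet, that same file moves in under a second, and the arithmetic suddenly favors distribution.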

Then I decided to submit to a subnet that's wired with Gigabit Ethernet. Wow! What a difference. Zipping a gigabyte of files went from almost a minute and a half to fifteen seconds.

A couple of years ago, this wouldn't have made any sense as a distributed computing problem--100 Mbit networks just weren't fast enough to make it work. If you wanted to zip a bunch of files, you were forced to do it on one machine. But with the order-of-magnitude performance increase that Gigabit Ethernet gives us, you get tremendous improvement by distributing this problem.

The lesson here isn't just about zipping files, of course. As networks speed up faster than chips do, more and more problems like this become "eligible" for distributed computing every day.

    Friday, June 09, 2006

    CCS Released!

At long last, Microsoft released Compute Cluster Server 2003 today! Kyril Faenov, Ryan Waite and crew have worked for a long time, and they're releasing a great OS (Compute Cluster Edition) and set of HPC tools (Compute Cluster Pack).

    I know they'll be making a splash at Tech Ed 2006. Unfortunately, scheduling problems won't permit me to be there. Fortunately, Scott Swigart over at Tech Blender pointed me to Virtual Tech Ed, where I'll be able to follow all of the action!

Anyway: Congratulations Kyril and Ryan! Have a blast in Boston.


    Unscheduled Uptime on Blogger: Post Immediately!

Tough week to be blogging on Blogger. There was downtime most of the day, nearly every single day. I'm surprised that I had any hits at all--I could almost never see my own site. In fact, I could almost never see any Blogspot site. Blogger, here's a suggestion: create a status/information page that's hosted on another site so we can get information when you're down!

I'm considering moving--perhaps to my own site on Bluehost, running WordPress. Any other ideas? I'd be interested in running .NET at a host that supports it (I know GoDaddy does), but are there any good .NET blogging packages out there?


We had an exciting week here at Digipede. We nabbed a new customer in the financial services space who will be scaling their SOA using the Digipede Network--I think we'll have a press release out soon. We also continue to prepare for the SIA TMC 2006 conference. The amazing thing there is that we're now running our VSTO 2005 project in Excel 2007--with no recompiling! We just loaded it up, and it ran. So that means we'll be showing Excel 2007 starting .NET jobs on a grid running on a Compute Cluster Edition cluster. Sweet.

I also recorded my .NET Rocks! interview this week. That was a blast; Carl and Richard are smart guys (and if you ever want to see some crazy photos of water-cooled super PCs, check out Richard's blog). They're shuffling their schedule around TechEd right now; current projections are that my interview will "air" on June 27th. I'll keep you posted on that.


    Monday, June 05, 2006

    Matrix Multiplication on the Digipede Network

Rich Ciapala did a great, extensive review of Windows Compute Cluster Server 2003 in the April 2006 issue of MSDN Magazine. He took a sample application (matrix multiplication) and ran it on CCS three different ways: in a single call, as a distributed application using several calls, and as a single, MPI-enabled application.

It's a great, in-depth article. Rich dove in headfirst, and he dove in deep! Thanks for such a practical, hands-on guide. Matrix multiplication takes a ton of calculation and is highly parallelizable, so it's a great sample.

    In fact, it was such a good sample, I decided to take the same application and implement it using the Digipede Network!

We don't have an MPI implementation (and if you need MPI, CCS is the right tool for you!), so I'm going to mimic the functionality of Rich's second example: taking his command-line matrix multiplication tool and running it using the Digipede Workbench.

First, I created a 5000-row by 5000-column matrix. For my test, I'm going to square it (multiply it by itself). Each of the 25,000,000 cells in the resulting matrix takes 5000 multiplications to calculate.
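(Do the math: 25,000,000 cells times 5,000 multiplications apiece comes to 125 billion multiplications in all--a nice, chunky workload for a grid.)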

    Next, in order to have a "control" for my experiment, I multiplied it by itself on one machine. I chose one of our faster servers for this: a 3GHz box. Multiplying my huge matrix by itself on that one machine took just over an hour:


    Ok, now we've got a baseline. Next, I used our GUI tool, Digipede Workbench, to create a job submission. The MatrixMultiplier takes the following arguments:

    MatrixMultiplier.exe [input matrix] [input matrix] [output matrix] [quadrant X] [quadrant Y] [quadrant dim]

    For my test, I broke the multiplication up into 16 tasks. For example, the command lines for my first three tasks need to look like this:
MatrixMultiplier.exe InputMatrix.mtx InputMatrix.mtx output0.mtx 0 0 1250
MatrixMultiplier.exe InputMatrix.mtx InputMatrix.mtx output1.mtx 1250 0 1250
MatrixMultiplier.exe InputMatrix.mtx InputMatrix.mtx output2.mtx 2500 0 1250
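If you were scripting this by hand, generating all 16 command lines would be a short loop--a quick sketch (the output-file numbering is just my convention):

// Emit the 16 task command lines: a 5000x5000 matrix in 1250x1250 quadrants
int matrixSize = 5000;
int quadrantSize = 1250;
int taskNumber = 0;
for (int y = 0; y < matrixSize; y += quadrantSize)
{
    for (int x = 0; x < matrixSize; x += quadrantSize)
    {
        Console.WriteLine(
            "MatrixMultiplier.exe InputMatrix.mtx InputMatrix.mtx output{0}.mtx {1} {2} {3}",
            taskNumber++, x, y, quadrantSize);
    }
}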


Fortunately, creating command lines like that is a breeze with the Workbench. I did this whole thing using variables, by the way, so I can reuse it later for more matrix multiplication jobs without going through the whole process again.


    First, I gave it the name of the executable:

    Then, I gave it the name of the Matrix I was going to multiply.

    I defined XQuadrant as a range variable from 0 to 4999 (the size of my matrix), step 1250.



    I defined YQuadrant the same way. Then I defined QuadrantSize to be 1250. If, in the future, I want to use this Job to multiply different size matrices (or using different size quadrants), I can change these values on submission.

    After telling it where to put the result file, I defined the command line:

    Notice all of the variables on the command-line: the file name of the input file (used twice), the output file name, the quadrants (x and y), and the quadrant size.

    Then I clicked "Submit!"

The whole job definition process took about 10 minutes. And the job took 13.5 minutes to run--nearly a 4.5x speedup!

Here's an aside: with about 10 machines on the network, I wondered why I was seeing less than a 5x speedup. I looked at the task times, and I realized that my five slowest machines were so much slower than the others that they were actually holding things up. I resubmitted my job; this time, I broke it into 64 pieces (this was really easy; I just went through the wizard again, changing my quadrant size and step sizes to 625. Piece of cake!)


    And the results were good: it knocked 3 minutes off my run time, bringing it down to 10.5 minutes. By giving finer-grained work to the Digipede Network, I allowed its CPU load balancing to work most efficiently. My grid did 60 minutes of work in 10 minutes; that's a 6x speedup. Considering that I have 4 fast machines and 5 slow machines (much slower than my "control" machine), the 6x speedup was just about optimal for this problem.

    [Update 2007-10-17]: We're doing some testing right now, and we've just added two eight-way boxes to our test grid; today it is 14 machines with a total of 35 cores. I just re-ran this test and it took two minutes and 19 seconds. Yum!


    Friday, June 02, 2006

    Not Another Trademark Dispute!


Thanks to Paul Shread and the gang over at Grid Computing Planet for mentioning our release of the Digipede Developer Edition and the launch of the Digipede Developer Network in yesterday's edition.

However, there seems to be a slight typo in the headline, and I don't want to get into an O'Reilly-esque trademark dispute.


    So let's be clear (Bill and Steve, are you listening?): We launched the Digipede Developer Network. I'm pretty sure someone already has something called the Microsoft Developer Network!


    Hey everybody, look at me!

My friends and family all know that I'm an attention hound. Well, here are some upcoming events where you can see me, hear me, or maybe even both!

  • 6/6/06 - Developer Webinar: This 30-minute webinar will give a technical overview of the Digipede Network and then dive into Visual Studio, showing how to grid-enable an application. Register here.

  • 6/13/06 - .NET Rocks!: I am very excited to be the guest on .NET Rocks!--the Internet Audio Talk Show for .NET Developers. Carl Franklin and Richard Campbell are wizards at taking a very technical discussion and making it a very entertaining show. Listen here.

  • 6/13/06 - Finance Webinar: Another webinar, but this one will be lighter on the code and heavier on the application. Thirty minutes, concentrating on uses of grid computing in finance. Register here.

  • 6/20-22 - SIA TMC 2006: The Securities Industry Association's Technology Management Conference is Wall Street's E3--the place to see the latest, greatest tech tools being used in capital markets. We'll be there, showing some really cool applications and talking about what our finance customers are doing with grid computing. We'll also have some very exciting partner announcements, but the details on those will have to wait. For those of you in New York, this is a chance to come check out the Digipede Network in person.
