Tuesday, October 23, 2007

7 REPSLM-C, Expanded

This post is a follow-up to 7 Reasons Every Programmer Should Love Multi-Core and a direct response to this comment.

Maybe I should have put 6 before 4, because 6 makes the point that most of today's programs aren't written to take advantage of multi-core. So what exactly do I mean by "take advantage"? You seem to think I mean simply running today's GUI, client/server, and P2P apps, as is, on multi-core machines and expecting magic to happen. But that is not what I'm talking about.

Aside:

With some existing apps, like Postfix, WebSphere, SJSDS, IntelliJ IDEA 7.0, PVF, and most Java BitTorrent trackers/clients [just to name a few], magic can happen. Others require tuning (e.g. Apache, PostgreSQL, and many more). Most applications, especially desktop GUI apps, will require a major rewrite to take full advantage of multi-core machines.

What I'm talking about is programmers finding opportunities to exploit parallelism at every turn, which is what items 1-4 are about. Let's take something as mundane as sorting (e.g. merge sort, quicksort) as an example. Merge sort and quicksort are excellent use cases for a divide-and-conquer strategy. They consist of a partitioning step, a sorting step, and a combining step. Once partitioned, the partitions can be distributed across multiple threads [and thus multiple processors/cores/hardware threads] and sorted in parallel. Some of you may say, "that's only 1 out of 3 steps, big deal." Others may take it even further and say, "1 out of 3. That means two-thirds of the algorithm is sequential. Amdahl's Law at work, buddy!" But what you would be overlooking is that, [in the serial version] for a large enough dataset, the sorting step dominates the runtime. So even though we have managed to parallelize only a single step, we can still realize substantial runtime performance gains. This behavior is expressed quite eloquently by John L. Gustafson in his [short] essay, Reevaluating Amdahl's Law.
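To make the sorting example concrete, here is a minimal sketch in Java of the idea: partition the array, sort the partitions on separate threads, then merge sequentially. The class name and the two-way split are illustrative only; a real implementation would split into as many partitions as there are processors.

```java
import java.util.Arrays;

public class ParallelMergeSort {
    public static void sort(final int[] a) throws InterruptedException {
        if (a.length < 2) return;

        // Partitioning step: split the array in half.
        final int mid = a.length / 2;
        final int[] left = Arrays.copyOfRange(a, 0, mid);
        final int[] right = Arrays.copyOfRange(a, mid, a.length);

        // Sorting step: each partition sorts on its own thread, in parallel.
        Thread t1 = new Thread(new Runnable() {
            public void run() { Arrays.sort(left); }
        });
        Thread t2 = new Thread(new Runnable() {
            public void run() { Arrays.sort(right); }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Combining step: the merge is still sequential.
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
        sort(data);
        System.out.println(Arrays.toString(data)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

Note that only the middle step is parallel, which is exactly the point above: for large inputs that step dominates the runtime, so the speedup is still real.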

So what does all of this have to do with your comment? Let's start with your dismissal of GUI applications taking advantage of multi-core, and I'll use boring old sorting to make my point.

It is sometime in the future and there is a guy named Bob. Bob's current computer just died (CPU burned out) and he goes out and buys a new one. Bob doesn't know or care about multi-core [or whatever the future marketing term is for [S]MP]. He just wants something affordable that will run all his applications. Nevertheless, his new machine is a 128-way box (it is the future, after all), with tons of RAM. Bob takes his new machine home and fires it up.

Bob keeps all his digital photographs and video on a 4 terabyte external storage array. He bought the original unit years ago, before 32 terabyte hard drives came standard with PCs. You see, Bob's daughter is pregnant, in her final trimester, and her birthday is just around the corner. Bob wants to make her a Blue-HDD-DVDDD-X2 disk containing stills and video footage of her life, starting before she was even born and running up to her current pregnancy. It begins with the ultrasound video of her in her mother's womb and ends with the ultrasound of his grandchild in his daughter's womb.

So Bob fires up his [hypothetical] image manager and tells it to create a workspace containing all the images and videos on the storage array, sorted by date. It's almost 30 years worth of data. And though the image manager software is old, some programmer, long ago, wrote a sorting algorithm that would scale with the number of processors available to it. So Bob clicks a button, and in less than 5 minutes 3.5 terabytes of data has been sorted and is ready to be manipulated.

So what's the point? The point is that it doesn't matter that "99%" of the CPU time was spent "waiting for some event", because when it mattered (when Bob clicked the button), all the available resources were employed to solve the user's problem efficiently, resulting in a great user experience. Now I know the example is contrived, but the premise upon which it is based is real.
If you look at most GUI applications of today, very few of them can handle multiple simultaneous events, or even rapid-fire sequential events, in large part because most of the work (the action to be performed) happens on the same thread that is supposed to be listening for new events. That is why the user interface freezes when the action to be performed requires disk or network access, or is CPU bound. The classic example is loading a huge file into RAM from disk. Most GUI apps provide a progress meter and a cancel button, but once the I/O starts, clicking cancel doesn't actually do anything, because the thread that's supposed to be processing mouse events is busy reading the file in from disk. So yes, GUI application programmers should Love Multi-Core!
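The fix is to move the long-running action onto a worker thread and have it check a cancellation flag as it goes, so the event thread stays free to handle the cancel click. Here is a toolkit-agnostic sketch in Java; the CancellableLoader name and the chunked loop standing in for file I/O are my own illustration, not any particular GUI framework's API.

```java
public class CancellableLoader implements Runnable {
    private volatile boolean cancelled = false;
    private volatile int chunksLoaded = 0;
    private final int totalChunks;

    public CancellableLoader(int totalChunks) { this.totalChunks = totalChunks; }

    // Called from the event thread when the user clicks Cancel.
    public void cancel() { cancelled = true; }

    // Called from the event thread to drive the progress meter.
    public int progress() { return chunksLoaded; }

    // Runs on a worker thread; checks the flag between chunks.
    public void run() {
        for (int i = 0; i < totalChunks && !cancelled; i++) {
            // Stand-in for reading one chunk of the file from disk.
            try { Thread.sleep(10); } catch (InterruptedException e) { return; }
            chunksLoaded++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableLoader loader = new CancellableLoader(1000);
        Thread worker = new Thread(loader);
        worker.start();       // the event thread is now free to pump events
        Thread.sleep(50);
        loader.cancel();      // Cancel takes effect at the next chunk boundary
        worker.join();
        System.out.println("loaded " + loader.progress() + " of 1000 chunks");
    }
}
```

Because the event thread never blocks on the I/O, the progress meter keeps repainting and the cancel button actually cancels.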

Client/Server and P2P are in the same boat in that they are both network applications. But they, like GUI and every other problem domain, can benefit from data-decomposition-driven parallelism (divide and conquer). I'm not going into great detail about how network applications benefit from multi-core because that subject has been beaten to death. I'll just say a couple of things. The consensus is that more processors equal more concurrent connections and/or reduced latency (users aren't waiting around as long for a free thread to become available to process their requests).

Finally, multi-core affects vertical and horizontal scaling. Let's say you work at a web company and the majority of your web traffic is to static content on your web server (minimal contention between requests). Let us also assume that you have unlimited bandwidth. The web server machine is a 2-socket, quad-core-capable box, but you only bought one single-core processor. A month passes, you get dugg, and the blogosphere is abuzz about what you are selling. Customers are browsing and signing up in droves. Latency is climbing and connections are timing out. You overnight 2 quad-core CPUs and additional RAM. Latency drops to a respectable level, and you just avoided buying, powering, and cooling a brand new machine that would have cost you 3x as much as you just spent on the CPUs and RAM. That is scaling vertically.

If you were building a cluster (horizontal scaling), multi-core means you need fewer physical machines for the same amount of processing power. In other words, multi-core reduces the cost of horizontal scaling, both in terms of dollars and latency. Access to RAM will always be faster than the network. So there is a lot less latency in performing the work locally --pushing it across the FSB, HyperTransport, etc., to multiple cores-- than in pushing it out over the network and [eventually] pulling the results back.
So yes, if you are coding or deploying network applications, P2P, client/server, or otherwise, you should Love Multi-Core!
