How to think big by thinking small
Computer programming logic is easy, really. It all boils down to endless variations of the following pseudo-English constructs:
- do <some thing>
- if <some condition is true> do <some thing>
- while <some condition is true> do <some thing>
That's it. Nothing else to it. Granted, there are a million and one ways to express all three of these. There are also a million and one poetically beautiful ways to create new language constructs that hide the do/if/while stuff from direct view. But when it is all boiled down, it is just do/if/while in endless permutations and combinations.
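To make the "hidden from direct view" point concrete, here is a minimal sketch in Python (my choice of language, and the function names are made up for illustration). Both functions compute the same thing; the second simply lets a built-in do the looping.

```python
def total_while(numbers):
    """Sum a list with the while loop in plain view."""
    total = 0
    i = 0
    while i < len(numbers):   # while <some condition is true>
        total += numbers[i]   # do <some thing>
        i += 1
    return total


def total_builtin(numbers):
    """The same computation; the loop is hidden inside the built-in sum()."""
    return sum(numbers)
```

The loop has not gone away in the second version. It has just moved somewhere you do not have to look at it.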
On the face of it, these are very primitive constructs that application designers can feel free to use in whatever way they choose. However, it is becoming increasingly apparent that the third one of these - the while construct - needs to be treated with great care if you want your application to scale easily to internet-sized proportions.
The easiest way to illustrate what I am talking about is with an example, so here goes. Imagine you have some calculation to perform on some type of file. You design your application according to classic decomposition theory. That is, you write a self-contained chunk of code that works on one file. You then stick a while loop over the top. Your program has this shape:
- do <pick the next file>
- while <we have a file to work on>
  - do <work on that file>
  - do <pick the next file>
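For the concrete-minded, the shape above might look something like this in Python. This is only a sketch: work_on_file and process_all are hypothetical names, and counting bytes stands in for whatever the real calculation is.

```python
def work_on_file(path):
    """Hypothetical per-file calculation: count the bytes in one file."""
    with open(path, "rb") as f:
        return len(f.read())


def process_all(paths):
    """The per-file work with the while loop baked in around it."""
    results = []
    remaining = iter(paths)
    current = next(remaining, None)              # do <pick the next file>
    while current is not None:                   # while <we have a file to work on>
        results.append(work_on_file(current))    # do <work on that file>
        current = next(remaining, None)          # do <pick the next file>
    return results
```

Calling process_all on a list of paths handles the files strictly one after another, in a single process.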
Everything works fine until one day your program is invoked with 1.5 million input files. Oops! The program is correct and will give a result eventually but, well, it might take quite a while to chew through 1.5 million input files.
What to do? Well, think of the application as consisting of two parts. The first part does the real work: working on a file-by-file basis. The second part is to feed files one by one into the first part. It is in this second part that the troublesome while loop exists.
Here is an alternative way of thinking about the problem. Imagine you had at your disposal a standard mechanism for running the core of your program over any number of files without you having to code the while loop explicitly. Imagine that this standard mechanism automatically handles spreading the work across a bunch of processing nodes for you. Now also imagine that you could get your hands on the results of each separate invocation and pull all that stuff together afterwards without having to worry about failures, retries or any of that stuff.
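Here is a rough sketch of that idea in Python, under some loud assumptions: a local thread pool from the standard library stands in for the "bunch of processing nodes", counting words stands in for the real per-item work, and every function name is invented for illustration. It does not handle failures or retries, which a real distributed framework would take care of.

```python
from concurrent.futures import ThreadPoolExecutor


def work_on_item(text):
    """Hypothetical per-item work: count the words in one piece of text."""
    return len(text.split())


def run_over(work, items, mapper=map):
    """Apply work to every item. The caller supplies the looping mechanism."""
    return list(mapper(work, items))


def combine(results):
    """Pull the separate results together afterwards."""
    return sum(results)


def run_in_pool(work, items, workers=4):
    """Same core code, but the iteration is handed to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return run_over(work, items, mapper=pool.map)
```

Notice that work_on_item contains no loop over the inputs and has no idea how many of them there are. Swapping the plain built-in map for pool.map changes how the work is spread out without touching the core code, and that is the property that lets a bigger mechanism take over later.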
With such a facility, handling 1.5 million files is no longer such a big deal. To be able to take advantage of this sort of facility, you need to think differently about your primary while loops. Where possible, don't bake them deep into your code; keep them external. If you do this carefully, then you will find that the MapReduce world, into which a significant chunk of data processing at internet scale is heading, will be yours to exploit - even if you do not need it right now.
On the other hand, if you do not watch the placement of your while loops, your application may require significant re-engineering in the event that it proves such a runaway success that it needs to be run over 1.5 million files.
ITworld.com