How to think big by thinking small


Computer programming logic is easy, really. It all boils down to endless variations of the following pseudo-English constructs:

  • do <some thing>
  • if <some condition is true> do <some thing>
  • while <some condition is true> do <some thing>
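To make the three constructs concrete, here is a tiny Python sketch that uses nothing but do, if and while. The function name `count_down` and what it does are purely illustrative.

```python
from typing import List


def count_down(n: int) -> List[str]:
    """Illustrative only: a program built from do / if / while and nothing else."""
    log: List[str] = []
    log.append(f"start at {n}")          # do <some thing>
    while n > 0:                         # while <some condition is true> ...
        if n % 2 == 0:                   # if <some condition is true> ...
            log.append(f"{n} is even")   # ... do <some thing>
        n -= 1                           # ... do <some thing>
    return log


print(count_down(4))  # ['start at 4', '4 is even', '2 is even']
```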

That's it. Nothing else to it. Granted, there are a million and one ways to express all three of these. There are also a million and one poetically beautiful ways of creating new language constructs that hide the do/if/while stuff from direct view. But, when it is all boiled down, it is just do/if/while in endless permutations and combinations.
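As one example of a construct that hides the while loop, Python's `for` statement is, roughly speaking, a while loop that keeps calling `next()` on an iterator until it raises `StopIteration`. The sketch below writes the same sum three ways; the function names are made up for illustration.

```python
from typing import Iterable


def total_with_for(items: Iterable[int]) -> int:
    total = 0
    for x in items:          # the friendly construct
        total += x
    return total


def total_with_while(items: Iterable[int]) -> int:
    total = 0
    it = iter(items)
    while True:              # roughly what "for" does underneath
        try:
            x = next(it)
        except StopIteration:
            break
        total += x
    return total


data = [1, 2, 3]
print(total_with_for(data), total_with_while(data), sum(data))  # 6 6 6
```

`sum()` goes one step further and hides the loop entirely, but a loop is still running inside it.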

On the face of it, these are very primitive constructs, and application designers feel free to use them in whatever way they choose. However, it is becoming increasingly apparent that the third of these, the while construct, needs to be treated with great care if you want your application to scale easily to internet-sized workloads.

The easiest way to illustrate what I am talking about is with an example, so here goes. Imagine you have some calculation to perform on some type of file. You design your application according to classic decomposition theory. That is, you write a self-contained chunk of code that works on one file. You then stick a while loop over the top. Your program has this shape:

  • do <pick the next file>
  • while <we have a file to work on>
      • do <work on that file>
      • do <pick the next file>
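Here is a minimal Python sketch of that shape. The per-file calculation `work_on` is hypothetical (it just counts bytes); the point is that the while loop wrapped around it is what drives the whole program, one file at a time.

```python
import tempfile
from pathlib import Path
from typing import List


def work_on(path: Path) -> int:
    """Hypothetical per-file calculation: here it just counts the bytes."""
    return len(path.read_bytes())


def process_all(paths: List[Path]) -> List[int]:
    """The classic shape: a while loop feeds files to work_on(), one at a time."""
    results: List[int] = []
    remaining = iter(paths)
    current = next(remaining, None)        # do <pick the next file>
    while current is not None:             # while <we have a file to work on>
        results.append(work_on(current))   #     do <work on that file>
        current = next(remaining, None)    #     do <pick the next file>
    return results


# Tiny demo with three throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    paths = [Path(tmp) / name for name in ("a.txt", "b.txt", "c.txt")]
    for path, text in zip(paths, ("x", "yy", "zzz")):
        path.write_text(text)
    sizes = process_all(paths)

print(sizes)  # [1, 2, 3]
```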

Everything works fine until one day your program is invoked with 1.5 million input files. Oops! The program is correct and will give a result eventually but, well, it might take quite a while to chew through 1.5 million input files.
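To put "quite a while" into numbers, here is a back-of-the-envelope calculation. Both the one-second cost per file and the 100 processing nodes are assumed figures chosen only for illustration, and the parallel estimate ignores scheduling and coordination overhead.

```python
SECONDS_PER_FILE = 1.0   # assumed cost of processing one file
FILES = 1_500_000        # the file count from the example
NODES = 100              # assumed number of processing nodes

serial_seconds = FILES * SECONDS_PER_FILE
print(f"one loop on one machine: {serial_seconds / 86_400:.1f} days")           # 17.4 days
print(f"spread over {NODES} nodes: {serial_seconds / NODES / 3_600:.1f} hours")  # 4.2 hours
```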

What to do? Well, think of the application as consisting of two parts. The first part does the real work, on a file-by-file basis. The second part feeds files, one by one, into the first part. It is in this second part that the troublesome while loop lives.
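A minimal sketch of that split, assuming a hypothetical `work_on` that handles exactly one item and a generic `feed` that owns the only while loop. To keep the sketch free of disk I/O, `work_on` just measures the length of a file name.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def work_on(name: str) -> int:
    """Part one (hypothetical): the real work, for exactly one file.
    It knows nothing about how many files exist or who calls it."""
    return len(name)


_DONE = object()  # sentinel marking "no more files"


def feed(items: Iterable[T], worker: Callable[[T], R]) -> List[R]:
    """Part two: the only place the while loop lives."""
    results: List[R] = []
    it = iter(items)
    item = next(it, _DONE)               # pick the next file
    while item is not _DONE:             # while we have a file to work on
        results.append(worker(item))     #     work on that file
        item = next(it, _DONE)           #     pick the next file
    return results


print(feed(["a.log", "bb.log"], work_on))  # [5, 6]
```

Because `work_on` never sees the loop, `feed` can later be swapped for something smarter without touching the real work.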

Here is an alternative way of thinking about the problem. Imagine that you had at your disposal a standard mechanism for running the core of your program over any number of files without having to code the while loop explicitly. Imagine that this standard mechanism automatically spreads the work across a bunch of processing nodes for you. Now also imagine that you could get your hands on the results of each separate invocation and pull them all together afterwards, without having to worry about failures, retries or any of that stuff.
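The article does not name a particular mechanism, so what follows is only a local stand-in: a sketch built on Python's standard `concurrent.futures.ThreadPoolExecutor`. Threads on one machine are not the separate processing nodes described above; a real deployment would use some distributed job framework instead. The names `run_everywhere` and `work_on`, and the simple retry policy, are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def work_on(name: str) -> int:
    """Hypothetical per-file calculation; contains no loop over files."""
    return len(name)


def run_everywhere(names: Iterable[str], worker: Callable[[str], int],
                   max_workers: int = 8, retries: int = 2) -> Dict[str, int]:
    """Stand-in for the 'standard mechanism': fan the work out to a pool,
    retry failed calls, and hand back every per-file result by name."""
    def attempt(name: str) -> int:
        last_error: Exception = RuntimeError("no attempts made")
        for _ in range(retries + 1):      # 1 try plus `retries` retries
            try:
                return worker(name)
            except Exception as exc:       # assumed: any failure is retryable
                last_error = exc
        raise last_error

    names = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(names, pool.map(attempt, names)))


per_file = run_everywhere(["a.log", "bb.log", "ccc.log"], work_on)
total = sum(per_file.values())   # pull the separate results together afterwards
print(per_file, total)  # {'a.log': 5, 'bb.log': 6, 'ccc.log': 7} 18
```

For CPU-heavy work, `ProcessPoolExecutor` from the same module offers the same `map` interface across processes; the calling code and `work_on` would stay unchanged, which is the point of keeping the loop out of the real work.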
