ITworld.com

How to think big by thinking small

ITworld 03/31/2008

Sean McGrath, ITworld.com


Computer programming logic is easy, really. It all boils down to endless variations of the following pseudo-English constructs:

- do <some thing>
- if <some condition is true> do <some thing>
- while <some condition is true> do <some thing>

That's it. Nothing else to it. Granted, there are a million and one ways to express all three of these. There are also a million and one poetically beautiful ways to create new language constructs that hide the do/if/while stuff from direct view. But, when it is all boiled down, it is just do/if/while in endless permutations and combinations.

On the face of it, these are very primitive constructs that application designers can feel free to use in whatever way they choose. However, it is becoming increasingly apparent that the third one of these - the while construct - needs to be treated with great care if you want your application to scale easily to internet-size proportions.

The easiest way to illustrate what I am talking about is with an example, so here goes. Imagine you have some calculation to perform on some type of file. You design your application according to classic decomposition theory. That is, you write a self-contained chunk of code that works on one file. You then stick a while loop over the top. Your program has this shape:

do <pick the next file>
while <we have a file to work on>
    do <work on that file>
    do <pick the next file>
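
The same shape can be written as a minimal Python sketch. The names `work_on_file` and `run_all` are hypothetical, the "calculation" is just a word count, and each file is represented by its text so the example stays self-contained:

```python
from typing import Iterator, List


def work_on_file(text: str) -> int:
    """Hypothetical per-file calculation: count the words in one file's text."""
    return len(text.split())


def run_all(files: List[str]) -> List[int]:
    """Driver with the while loop baked in, following the pseudocode's shape."""
    results = []
    pending: Iterator[str] = iter(files)
    current = next(pending, None)               # do <pick the next file>
    while current is not None:                  # while <we have a file to work on>
        results.append(work_on_file(current))   # do <work on that file>
        current = next(pending, None)           # do <pick the next file>
    return results
```

Note that `run_all` decides both what happens to a file and the order and place in which files are processed: one at a time, in one process.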

Everything works fine until one day your program is invoked with 1.5 million input files. Oops! The program is correct and will give a result eventually but, well, it might take quite a while to chew through 1.5 million input files.

What to do? Well, think of the application as consisting of two parts. The first part does the real work: working on a file-by-file basis. The second part is to feed files one by one into the first part. It is in this second part that the troublesome while loop exists.

Here is an alternative way of thinking about the problem. Imagine you had at your disposal a standard mechanism for running the core of your program over any number of files without having to code the while loop explicitly. Imagine that this standard mechanism automatically handles spreading the work across a bunch of processing nodes for you. Now also imagine that you could get your hands on the results of each separate invocation and pull them all together afterwards without having to worry about failures, retries or any of that stuff.
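
A rough sketch of what such a mechanism might look like on a single machine, using Python's standard `concurrent.futures` module. The helper `run_over` and its `retries` parameter are assumptions made for illustration, not part of any real framework, and items are assumed to be unique (file paths, say) because they are used as dictionary keys:

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def count_words(text: str) -> int:
    """Hypothetical per-item work: the core of the program, with no loop in it."""
    return len(text.split())


def run_over(work: Callable[[str], int], items: List[str], executor: Executor,
             retries: int = 2) -> Tuple[Dict[str, int], List[str]]:
    """Apply `work` to every item via `executor`.

    Returns (results keyed by item, items that still failed after all retries).
    """
    results: Dict[str, int] = {}
    pending = list(items)
    for _ in range(retries + 1):
        if not pending:
            break
        # Hand every pending item to the executor; it decides where each one runs.
        futures = {item: executor.submit(work, item) for item in pending}
        pending = []
        for item, future in futures.items():
            try:
                results[item] = future.result()
            except Exception:
                pending.append(item)  # retried on the next round, if any remain
    return results, pending


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        results, failed = run_over(count_words, ["a b c", "d e"], pool)
    print(results, failed)
```

The loop has not vanished; it has moved out of the application and into a reusable helper. Because `run_over` accepts any `Executor`, a `ProcessPoolExecutor` could be swapped in to use several cores. Spreading the work across many machines would need a real distributed framework, which this sketch does not attempt.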

With such a facility, handling 1.5 million files is no longer such a big deal. To take advantage of this sort of facility, you need to think differently about your primary while loops. Don't bake them deep into your code; keep them external where possible. If you do this carefully, you will find that the MapReduce world, into which a significant chunk of internet-scale data processing is heading, will be yours to exploit - even if you do not need it right now.
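
To make the MapReduce shape concrete, here is a small single-process sketch, assuming a word-count calculation. The function names `map_one`, `combine` and `word_counts` are hypothetical; a real MapReduce framework would supply its own driver in place of the built-in `map` and `functools.reduce` used here:

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_one(text: str) -> Counter:
    """Per-file step: it sees one file's text and knows nothing about the rest."""
    return Counter(text.split())


def combine(left: Counter, right: Counter) -> Counter:
    """Merge step: associative, so partial results can be grouped in any order."""
    return left + right


def word_counts(texts: Iterable[str]) -> Counter:
    """The only place that iterates over files; swap this driver out to scale."""
    return reduce(combine, map(map_one, texts), Counter())
```

Because neither `map_one` nor `combine` contains the loop over files, only the small `word_counts` driver would need replacing if the work had to be spread over many nodes.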

On the other hand, if you do not watch the placement of your while loops, your application may require significant re-engineering in the event that it proves such a runaway success that it needs to be run over 1.5 million files.


Sean McGrath is CTO of Propylon. He is an internationally acknowledged authority on XML and related standards. He served as an invited expert to the W3C's Expert Group that defined XML in 1998. He is the author of three books on markup languages published by Prentice Hall. Visit his site at: http://seanmcgrath.blogspot.com.





