ITworld.com
Java Tuning and Java Virtual Machines, Part 2
JAVA TUTOR --- 09/11/2002

Adrian Cockcroft

More Percollator: httpop/s x time
The most CPU-intensive operation is loading a new file. Each file contains about 280 rows and 20 to 30 columns of data, mostly numeric. Processing the file is tricky and inefficient, taking up to a second of CPU time. I tried to use the StreamTokenizer, but it doesn't behave the way I want it to when processing numbers. I store the result in a class called DataFrame that contains a Vector of Vectors, which can cope with the variation in type from one column to another. Some data starts with a numeral but is really a string, like the local timezone timestamp "08:35:20". The StreamTokenizer breaks this into two tokens if you let it interpret the number. Instead, we force it to pick off space-delimited strings and check that the whole token is a valid number before converting the data using Double(String). This conversion operation is inefficient, as a comment in the code for the Double class points out. The StreamTokenizer contains its own code for parsing numbers, which seems like wasteful duplication, but it is much faster. I'm hoping that the performance of Double(String) will be fixed for me one day.
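
The whole-token check described above can be sketched roughly as follows. This is a minimal illustration, not the GPercollator source: the class and method names are hypothetical, it uses modern Java generics, and it calls Double.valueOf(String) where the original used the Double(String) constructor. It assumes that resetting the tokenizer syntax and treating every printable character as a word character is an acceptable way to get space-delimited tokens.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not taken from GPercollator.
class TokenParseSketch {

    /** Splits the input on whitespace only, then converts each whole token. */
    static List<Object> parseRow(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();            // turn off the tokenizer's own number parsing
        st.wordChars(33, 255);       // every printable character belongs to a word
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        List<Object> row = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    row.add(convert(st.sval));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return row;
    }

    /** Returns a Double only if the entire token is a valid number, else the String. */
    static Object convert(String token) {
        try {
            return Double.valueOf(token);
        } catch (NumberFormatException e) {
            return token;  // e.g. "08:35:20" stays a string
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRow(new StringReader("08:35:20 12.5 abc 42")));
    }
}
```

Note that Double.valueOf accepts a few forms that a data file might not intend as numbers, such as "NaN" or a trailing "d" suffix, so a stricter check may be needed for real data.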

The other area of concern is the size of the program. It can load many days' worth of data and then show up to 16 differently colored plot lines on the graph. The process size on Solaris when run as an application can reach well over 10 MB. The code size is under 50 KB, but it does require the additional Java WorkShop GUI class library visualrt.zip, which is about 600 KB. To save downloading this dynamically each time you start an applet, you may want to grab a copy and put it on your local CLASSPATH. The main memory hog is the array of DataFrames, each containing a Vector of Vectors of Doubles or Strings. I'd like to make it a Vector of arrays of doubles where possible but haven't yet figured out how. I did trim the Vectors down to size once I had finished reading the file. I tried to find a tool that could pinpoint the memory usage of each object, but only found people who agreed with me that it would be a good idea to build one.
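
One possible shape for the "Vector of arrays of doubles where possible" idea is sketched below. It is only an assumption about how such a layout might look, not something GPercollator does: the class and method names are hypothetical, and the syntax is modern Java rather than Java 1.1. Each column is packed into a double[] when every token parses as a number and is otherwise kept as a String[]. A primitive array avoids allocating a separate Double object per value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Hypothetical sketch: one packed array per column instead of a Vector of objects.
class ColumnPackingSketch {

    /** Returns a double[] if every token is numeric, otherwise a String[]. */
    static Object packColumn(List<String> tokens) {
        double[] numbers = new double[tokens.size()];
        for (int i = 0; i < numbers.length; i++) {
            try {
                numbers[i] = Double.parseDouble(tokens.get(i));
            } catch (NumberFormatException e) {
                // One non-numeric token makes the whole column a string column.
                return tokens.toArray(new String[0]);
            }
        }
        return numbers;
    }

    /** Packs every column, then trims the outer Vector to its final size. */
    static Vector<Object> packFrame(List<List<String>> columns) {
        Vector<Object> frame = new Vector<>(columns.size());
        for (List<String> column : columns) {
            frame.add(packColumn(column));
        }
        frame.trimToSize();
        return frame;
    }

    public static void main(String[] args) {
        List<List<String>> columns = Arrays.asList(
            Arrays.asList("08:35:20", "08:35:25"),
            Arrays.asList("1.5", "2"));
        Vector<Object> frame = packFrame(columns);
        System.out.println(frame.get(0) instanceof String[]);  // timestamp column
        System.out.println(frame.get(1) instanceof double[]);  // numeric column
    }
}
```

The trade-off is that callers must check the type of each column before reading it, which is the price of dropping the per-value wrapper objects.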

GPercollator is Java 1.1 based, and it uses the new event model feature, so it doesn't work with older versions of Java. This means that you cannot run it using Netscape Navigator (at least up to version 4.02) or Microsoft Internet Explorer. It does work in HotJava 1.0 (the version provided with Solaris 2.6) and the JDK 1.1.3 appletviewer also provided with Solaris 2.6. It also runs as a program from the command line using JDK 1.1.3. By default Java WorkShop builds in the code wrappers that allow generated programs to be run as applets and as applications without any changes. As full Java 1.1 support rolls out over the next few months it will be easier to use it as an applet.

The Java WorkShop performance analyzer
Graham Hazel wrote the new graphical percollator browser (GPercollator) with a lot of feedback and a few code fixes from me. We built GPercollator using Java WorkShop 2.0 as the development tool and GUI builder. One feature of Java WorkShop is that it provides a simple menu option that starts the program or applet along with a performance profiler. After the program exits, the profile is loaded, and you can see which methods took longest to run. You can also see and traverse the call hierarchy. When we first tried this, our top routine was an iso8859 character conversion method. Initially we didn't see it because the profiler only shows your own code. When we looked at the system library code as well we could see the problem. When we tracked it down we realized that we were processing the input data without buffering it first. This is a common mistake, and when we wrapped a buffer around the input stream it went a lot faster, and that routine dropped way down the list. We also compiled the application with debug turned on to start with, and when we changed to invoke the optimizer, the individual class sizes dropped very significantly. As a result, we got a reasonable speedup.

Profiler Display Examples
I compiled the code with debug and used the old-style input stream methods. These are deprecated in Java 1.1, but the code is based on old code I wrote using Java 1.0. I started it up as an applet from Java WorkShop using the profiler button. The tool automatically loaded a data file, and I reloaded it another four times so that the load time would dominate the tool startup time. The initial profile shown does not include system routines.

When the system routines are shown, the top one is the idle time routine Object.wait. Next comes the stream tokenizer using about 15 seconds of CPU. The first routine of my own code, DataFrame.fetch, is about 1.5 seconds. Input goes via a BufferedInputStream.

The code is now brought up to date by adding an InputStreamReader between the input stream and the StreamTokenizer rather than a BufferedInputStream.

InputStreamReader is = new InputStreamReader(url.openStream());
StreamTokenizer st = new StreamTokenizer(is);

This is part of the improved internationalization of Java 1.1. Spot the deliberate mistake. There is now about 200 seconds of overhead with 104 seconds in ByteToChar8859_1.convert on its own. It needs a buffer!

This increases the size of the chunk of data being processed by each method invocation, thus reducing the overall overhead. The new code wraps a BufferedReader around the input.

BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()));
StreamTokenizer st = new StreamTokenizer(br);

This reduces the overhead to about the same level as the original code.
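
To illustrate why the buffer helps, the following sketch counts how many read calls reach the underlying source with and without a BufferedReader in front of the StreamTokenizer. It is an illustration under stated assumptions, not a reproduction of the profile above: CountingReader and readCalls are hypothetical helpers, the input is an in-memory string rather than a URL stream, and the count is taken at the Reader level rather than inside the byte-to-character converter.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch: counts read calls that reach the unbuffered source.
class BufferingSketch {

    /** Wraps a Reader and counts every read call made on it. */
    static class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            calls++;
            return super.read(buf, off, len);
        }
    }

    /** Tokenizes the text and returns how many read calls hit the source. */
    static int readCalls(String text, boolean buffered) {
        CountingReader source = new CountingReader(new StringReader(text));
        Reader in = buffered ? new BufferedReader(source) : source;
        StreamTokenizer st = new StreamTokenizer(in);
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Tokens are discarded; only the read pattern matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            data.append(i).append(' ');
        }
        System.out.println("unbuffered calls: " + readCalls(data.toString(), false));
        System.out.println("buffered calls:   " + readCalls(data.toString(), true));
    }
}
```

Without the buffer, the tokenizer pulls one character per call from the source; with it, the source is asked for large chunks and the per-character calls are served from memory.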

The next step is to turn off the debug compilation flag and turn on the optimizer. The total size of the compiled classes is 54 KB compiled with debug and 46 KB when compiled with -O.

Running without the profiler with debug code, the average time taken to do a load operation, as measured by metrognome, is 0.775 seconds. This is a lot faster than the profiled time of 5.47 seconds, so there is quite a large profiler overhead. When the code was optimized, overall performance did not increase much, but because most of the time is spent in system routines, this is not really surprising. If we look at the profile of the optimized code, excluding system time, the DataFrame.fetch routine is a lot faster, but that only amounts to a reduction from 1.5 seconds to 1.0 seconds as the total CPU for five fetch operations. To tune the code further, I need to work on making more efficient use of the system functions. Here is the optimized profile for my own code.

Wrap up
So far I have managed to build a program that is fast enough to be useful, but bigger than I would like. It's a useful test bed for me as I learn more about tuning Java code to make it smaller and faster. I was given a useful URL for more information on Java tuning tips: Java Optimization. The subject is not covered much in the many books on Java, but I'll include whatever I can figure out in future columns and the updated version of my own book. Any hints and tips (also GPercollator bugs and fixes) from more experienced Java programmers are welcome.

 

Adrian started out as a software engineer and UNIX system administrator and was one of the first customers of Sun in the UK circa 1984. In 1988 he joined Sun UK as a Systems Engineer and built a reputation as a specialist in SPARC and performance-related issues. In 1993 Adrian transferred to Sun's corporate headquarters in the USA to work for SMCC Technical Product Marketing. A major function of this group is to create the technical information required to support Sun's Systems Engineers on a worldwide basis. He now works in the Server Division of SMCC, and in April 1996 relocated back to the UK, teleworking to California. Adrian is the author of Sun Performance and Tuning: SPARC and Solaris, published by Sun Microsystems Press/PTR Prentice Hall.




Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.