The other area of concern is the size of the program. It can load many
days' worth of data and then show up to 16 differently colored plot lines
on the graph. The process size on Solaris when run as an application can
reach well over 10 MB. The code size is under 50 KB, but it does require
the additional Java WorkShop GUI class library visualrt.zip, which is
about 600 KB. To save downloading this dynamically each time you start
an applet, you may want to grab a copy and put it on your local
CLASSPATH. The main memory hog is the array of DataFrames each
containing a Vector of Vectors of Doubles or Strings. I'd like to make
it a Vector of arrays of doubles where possible but haven't yet figured
out how. I did trim the Vectors down to size once I had finished reading
the file. I tried to find a tool that could pinpoint the memory usage of
each object, but only found people who agreed with me that it would be
a good idea to build one.
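One way the Vector of arrays idea might be approached is sketched below. It is not GPercollator code: the class name CompactColumns and its methods are hypothetical, and it uses generics and autoboxing from later Java releases rather than Java 1.1. It assumes each column has already been read into a Vector holding Double or String cells, and it packs a column into a double[] only when every cell is a Double.

```java
import java.util.Vector;

// Hypothetical sketch, not taken from GPercollator.
class CompactColumns {
    /**
     * Returns a double[] when every cell in the column is a Double.
     * Otherwise returns the original Vector, trimmed to its current size.
     */
    static Object compact(Vector<Object> column) {
        double[] values = new double[column.size()];
        for (int i = 0; i < values.length; i++) {
            Object cell = column.get(i);
            if (!(cell instanceof Double)) {
                // Mixed or text column: keep the objects, drop spare capacity.
                column.trimToSize();
                return column;
            }
            values[i] = (Double) cell;
        }
        return values;
    }

    /** Compacts every column; the result holds double[] or Vector entries. */
    static Vector<Object> compactAll(Vector<Vector<Object>> columns) {
        Vector<Object> result = new Vector<>(columns.size());
        for (Vector<Object> column : columns) {
            result.add(compact(column));
        }
        return result;
    }

    public static void main(String[] args) {
        Vector<Object> numeric = new Vector<>();
        numeric.add(0.5);
        numeric.add(1.25);
        Vector<Object> text = new Vector<>();
        text.add("sd0");
        text.add("sd1");
        System.out.println(compact(numeric).getClass().getSimpleName());
        System.out.println(compact(text).getClass().getSimpleName());
    }
}
```

A double[] stores each value inline instead of as a separate Double object referenced from the Vector, so the saving should grow with the number of numeric rows. Code that reads the data back then has to check which of the two representations each column uses.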
GPercollator is Java 1.1 based, and it uses the new event model feature,
so it doesn't work with older versions of Java. This means that you
cannot run it using Netscape Navigator (at least up to version 4.02) or
Microsoft Internet Explorer. It does work in HotJava 1.0 (the version
provided with Solaris 2.6) and the JDK 1.1.3 appletviewer also provided
with Solaris 2.6. It also runs as a program from the command line using
JDK 1.1.3. By default Java WorkShop builds in the code wrappers that
allow generated programs to be run as applets and as applications
without any changes. As full Java 1.1 support rolls out over the next
few months, it will be easier to use it as an applet.
The Java WorkShop performance analyzer
Graham Hazel wrote the new graphical percollator browser (GPercollator)
with a lot of feedback and a few code fixes from me. We built
GPercollator using Java WorkShop 2.0 as the development tool and GUI
builder. One feature of Java WorkShop is that it provides a simple menu
option that starts the program or applet along with a performance
profiler. After the program exits, the profile is loaded, and you can
see which methods took longest to run. You can also see and traverse the
call hierarchy. When we first tried this, our top routine was an iso8859
character conversion method. Initially we didn't see it because the
profiler only shows your own code. When we looked at the system library
code as well we could see the problem. When we tracked it down we
realized that we were processing the input data without buffering it
first. This is a common mistake, and when we wrapped a buffer around the
input stream it went a lot faster, and that routine dropped way down the
list. We also compiled the application with debug turned on to start
with, and when we changed to invoke the optimizer, the individual class
sizes dropped very significantly. As a result, we got a reasonable
speedup.
Profiler Display Examples
I compiled the code with debug and used the old style input stream
methods. These methods are deprecated in Java 1.1, but the code is based
on old code I wrote using Java 1.0. I started it up as an applet from Java WorkShop
using the profiler button. The tool automatically loaded a data file,
and I reloaded it another four times so that the load time would
dominate the tool startup time. The initial profile shown does not
include system routines.
When the system routines are shown, the top one is the idle time routine
Object.wait. Next comes the stream tokenizer using about 15 seconds of
CPU. The first routine of my own code, DataFrame.fetch, is about 1.5
seconds. Input goes via a BufferedInputStream.
The code is now brought up to date by adding an InputStreamReader
between the input stream and the StreamTokenizer rather than a
BufferedInputStream.
InputStreamReader is = new InputStreamReader(url.openStream());
StreamTokenizer st = new StreamTokenizer(is);
This is part of the improved internationalization of Java 1.1. Spot the
deliberate mistake. There is now about 200 seconds of overhead with 104
seconds in ByteToChar8859_1.convert on its own. It needs a buffer!
Buffering increases the size of the chunk of data processed by each
method invocation, thus reducing the overall overhead. The new code
wraps a BufferedReader around the input.
BufferedReader br = new BufferedReader(
    new InputStreamReader(url.openStream()));
StreamTokenizer st = new StreamTokenizer(br);
This reduces the overhead to about the same level as the original code.
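The effect of the buffer can be illustrated by counting how many read calls reach the underlying reader. The sketch below is not GPercollator code: ReadCallCounter, CountingReader, and countCalls are hypothetical names, and a StringReader stands in for the reader on url.openStream() so that it runs without a network connection. It assumes that StreamTokenizer asks its reader for one character at a time, which is how the standard implementation behaves.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical sketch, not taken from GPercollator.
class ReadCallCounter {
    /** Counts every read call that reaches the wrapped reader. */
    static class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            calls++;
            return super.read(buf, off, len);
        }
    }

    /** Tokenizes the data and returns the number of calls on the source. */
    static int countCalls(String data, boolean buffered) {
        CountingReader counter = new CountingReader(new StringReader(data));
        Reader source = buffered ? new BufferedReader(counter) : counter;
        StreamTokenizer st = new StreamTokenizer(source);
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Discard tokens; only the read-call count matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("12 34.5 67.8 sd0\n");
        }
        String data = sb.toString();
        System.out.println("unbuffered calls: " + countCalls(data, false));
        System.out.println("buffered calls:   " + countCalls(data, true));
    }
}
```

Without the BufferedReader, the source is called roughly once per character. With it, the source is called only once per buffer refill. In the Java 1.1 profile above, each of those per-character calls ended up in the character converter, which is where the overhead came from.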
The next step is to turn off the debug compilation flag and turn on the
optimizer. The total size of the compiled classes is 54 KB compiled with
debug and 46 KB when compiled with -O.
Running without the profiler with debug code, the average time taken to
do a load operation, as measured by metrognome, is 0.775 seconds. This
is a lot faster than the profiled time of 5.47 seconds, so there is
quite a large profiler overhead. When the code was optimized, overall
performance did not increase much, but because most of the time is spent
in system routines, this is not really surprising. If we look at the
profile of the optimized code, excluding system time, the
DataFrame.fetch routine is a lot faster, but that only amounts to a
reduction from 1.5 seconds to 1.0 seconds of total CPU time across five
fetch operations. To tune the code further I need to work on making more
efficient use of the system functions. Here is the optimized profile for
my own code.
Wrap up
So far I have managed to build a program that is fast enough to be
useful, but bigger than I would like. It's a useful test bed for me as I
learn more about tuning Java code to make it smaller and faster. I was
given a useful URL for more information on Java tuning tips: Java
Optimization. The subject is not covered much in the many books on Java,
but I'll include whatever I can figure out in future columns and the
updated version of my own book. Any hints and tips (also GPercollator
bugs and fixes) from more experienced Java programmers are welcome.