From: www.itworld.com
May 1, 2001 —
In the first installment of this series, we outlined what it means to test graphical user interface (GUI) applications, and to what ends that testing is done. (See part 1.) Now we'll apply those abstractions to a model situation that illustrates a handful of the best testing techniques specifically applicable to GUIs.
A sample program
The source code for this article is all written in Tcl/Tk. This is the one GUI binding most likely to be installed and already available on your workstation. There's a good chance you can enter the source code below and run it immediately on your desktop.
# Program 0.
set last_push 0
pack [button .b -text "Push me" -command {
    set current_push [clock seconds]
    if {$last_push} {
        .l configure -text \
            "It has been [expr {$current_push - $last_push}]\
            seconds since the last button push."
    } else {
        .l configure -text "That was the first button push."
    }
    set last_push $current_push
}] [label .l]
That is a tiny application with one button, and one textual display updated by the button. It's written in a rather Visual Basic (VB) style of Tcl/Tk.
The fundamental problem
Even a program this small and simple raises several deep questions for GUI verification. At the most fundamental level lies the mismatch between the natural human expressions for useful tests, in terms of widgets and displays, and the computer's pixel orientation. The most important natural test of the application is one that validates the value of the display. However, as noted in the first installment in this series, there's no direct way to capture that information from an executing GUI application. Operating system access is in terms of pixels; translating a pixel display into a string -- like "11 seconds" -- is a surprisingly difficult problem. That's one reason we often choose clock- or calendar-related examples: they present interesting testing challenges in just a few lines of code. An inspection of commercially available products reveals how many of them mishandle even simple time-dependent displays and calculations.
One solution to the problem is to do "null comparisons": define a specific sequence of actions and check the precise pixel layout resulting from those actions. Judge the test successful if the layout corresponds exactly to a reference pixel display. Otherwise, fail the test.
That solution requires a separate test for each possible display. It offers no opportunity for parametrization, no recognition that the display with "13 seconds" might be in a sequence with a display for "14 seconds." Moreover, the strategy is sensitive to display details. A change in font, or screen resolution, or even smaller matters, causes tests to fail.
That strategy is, however, generally adopted by commercial record-and-replay products. Later we'll look at ways to make the strategy viable.
Our preference in most cases is to introduce a "testability layer" in the design of the GUI application. We reduce our testing problems by separating the GUI from the textual parts. We'll make the GUI elements very simple, so simple that they can be designed correctly and validated easily "by hand." The full power of "classical" command-line testing then verifies the textual parts.
Separating computation from the interface
Working code examples make this clear. Rather than Program 0, above, consider
# Program 1.
# Notice that this procedure has no GUI
# elements.
proc get_text_for_label {} {
    set current_push [clock seconds]
    if {[info exists ::last_push]} {
        set result "It has been [expr {$current_push - $::last_push}]\
            seconds since the last button push."
    } else {
        set result "That was the first button push."
    }
    set ::last_push $current_push
    return $result
}
# This procedure acts purely on the label.
# It is independent of the button.
proc set_label {} {
    .l configure -text [get_text_for_label]
}
label .l
button .b -text "Push me" -command set_label
pack .b .l
Program 1 takes a few more lines than Program 0. Extreme Programming teaches its adherents to write minimally and simply. For us, though, validation is always a core requirement, so we favor Program 1: it is smaller than the combination of Program 0 plus the external test harnesses needed to bring Program 0 to the same level of testability.
Notice that the design now has greatly simplified GUI procedures, a get_text_for_label with purely textual results, and a set_label procedure that binds the GUI and computational parts together. With this design, the definition of set_label is so simple we can inspect it with confidence.
We also test get_text_for_label in isolation, to ensure that it is algorithmically correct. In fact, it could even be factored into a separate source file and assigned to a distinct development and quality assurance (QA) team. Python, incidentally, is one of several languages whose programmer community has a strong tradition of writing simple "self-diagnostic" unit tests along with class definitions in the same source files. Notice that traditional QA automation techniques can test purely textual procedures like get_text_for_label comprehensively.
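As a minimal sketch of such an isolated test, the script below runs under a plain tclsh with no Tk loaded. It repeats get_text_for_label from Program 1 so that it is self-contained; the helper reported_elapsed and the particular elapsed values are our own illustrative assumptions, not part of the original program. Because the system clock can tick between two readings, each check accepts the expected count or one second more.

```tcl
# get_text_for_label as in Program 1; no Tk is needed to run it.
proc get_text_for_label {} {
    set current_push [clock seconds]
    if {[info exists ::last_push]} {
        set result "It has been [expr {$current_push - $::last_push}]\
            seconds since the last button push."
    } else {
        set result "That was the first button push."
    }
    set ::last_push $current_push
    return $result
}

# Hypothetical helper: pretend the previous push happened $ago seconds
# in the past, then return the elapsed count that the text reports.
proc reported_elapsed {ago} {
    set ::last_push [expr {[clock seconds] - $ago}]
    set text [get_text_for_label]
    if {![regexp {^It has been (\d+) seconds since the last button push\.$} \
            $text -> elapsed]} {
        error "unexpected label text: $text"
    }
    return $elapsed
}

# Case 1: no earlier push on record.
unset -nocomplain ::last_push
set first [get_text_for_label]
if {$first ne "That was the first button push."} {
    error "first push produced: $first"
}

# Case 2: one parametrized check covers a whole family of displays.
# The clock may tick between readings, so accept $ago or $ago + 1.
foreach ago {0 1 13 14 3600} {
    set elapsed [reported_elapsed $ago]
    if {$elapsed < $ago || $elapsed > $ago + 1} {
        error "expected about $ago seconds, text reported $elapsed"
    }
}
puts "all get_text_for_label checks passed"
```

Note how the loop expresses "13 seconds" and "14 seconds" as members of one sequence, which is exactly what a pixel-level comparison cannot do.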
What about user actions?
This takes us halfway toward decomposition of the Fundamental Problem of GUI testing. We have a model for the isolation of algorithmic results from their display.
We're still in a trap on the aspect of GUIs dual to display: user action. While designers and users think in terms of such action sequences as "First push button one, then push button two, then ...", the native OS can't converse in such terms. Operating systems typically do not have a notion of "button"; they can only implement a collection of screen pixels in a particular shape and then report as the pointing device moves and clicks over it. The problem is that an action transcript at this level is not robust against changes in screen layout or even window placement.
However, several GUI toolkits are "scriptable": they provide a language for programming higher-order events such as button pushes and menu selections. Look back at Program 1. To "fire" the button, we can push it by hand, or we can trust that set_label is tied to the button action and automate its invocation. Tk is scriptable, though, and provides a third alternative for user actions, one which makes "hands-off" testing practical:
.b invoke
A sufficiently powerful scripting interface to a GUI toolkit makes it practical to write tests directly. In Tk, for our little Program 0, we can define
proc push_button_and_retrieve_value {} {
    .b invoke
    set text_of_label [.l cget -text]
    return $text_of_label
}
A QA department might then incorporate this push_button_and_retrieve_value in a whole suite of tests to see whether the values are correct after quick button pushes, after slow ones, in the presence of other operations, and so on.
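A rough sketch of what the start of such a suite might look like follows. It needs Tk and a display, and it repeats Program 1 so that it is self-contained. The helper seconds_in, the 1.5-second delay, and the tolerances are our own assumptions, chosen only to illustrate a "quick" and a "slow" push.

```tcl
package require Tk

# Program 1, repeated so the sketch is self-contained.
proc get_text_for_label {} {
    set current_push [clock seconds]
    if {[info exists ::last_push]} {
        set result "It has been [expr {$current_push - $::last_push}]\
            seconds since the last button push."
    } else {
        set result "That was the first button push."
    }
    set ::last_push $current_push
    return $result
}
proc set_label {} {
    .l configure -text [get_text_for_label]
}
label .l
button .b -text "Push me" -command set_label
pack .b .l

# Push the button through Tk's scripting interface and read the label.
proc push_button_and_retrieve_value {} {
    .b invoke
    return [.l cget -text]
}

# Hypothetical helper: extract the number of seconds from the label text.
proc seconds_in {text} {
    if {![regexp {^It has been (\d+) seconds} $text -> n]} {
        error "unexpected label text: $text"
    }
    return $n
}

# First push: nothing to compare against yet.
set text [push_button_and_retrieve_value]
if {$text ne "That was the first button push."} {
    error "first push displayed: $text"
}

# Quick push, immediately after the first: expect 0, or 1 if the
# clock happened to tick in between.
set quick [seconds_in [push_button_and_retrieve_value]]
if {$quick > 1} {
    error "quick push displayed $quick seconds"
}

# Slow push: block for 1.5 seconds, so at least one full second
# boundary has passed.
after 1500
set slow [seconds_in [push_button_and_retrieve_value]]
if {$slow < 1} {
    error "slow push displayed $slow seconds"
}

puts "quick push: $quick s, slow push: $slow s"
```

For unattended runs, the script can end with an explicit exit so that the interpreter does not stay in the Tk event loop.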
That's it! We're done; we've reduced the problem of testing this GUI application to the two simpler problems of testing a purely textual, or "command line" part, and a very thin, purely automated GUI part. This decomposition gives us a way to express all the tests that QA needs to verify.
The decomposition also encourages a more general QA approach that's healthy. Simplistic testing schemes we've seen often neglect exception-handling. With GUI properly separated from "business logic," though, it becomes easier to validate exception-handling and other such aspects.
Tcl/Tk is a good vehicle for this decomposition. Tcl has a long association with testing; it's the base language for the standard quality assurance tests of such products as gcc (the GNU compiler), the Sybase relational database management system (RDBMS), the Jikes Java compiler, and much more. TkReplay was an early-generation GUI testing tool. Smarttest is an interesting load-testing application written mostly in Tk. In fact, Tcl is so rich in automation facilities that an experienced Tk programmer can now choose an object-oriented style for the rewrite above, or an out-of-process automation, or several other variations.
Coping with less-than-ideal situations
What if you're working with a less testable toolkit than Tcl/Tk? You still have several options. VB presents one sort of challenge with its encouragement that "business logic" be mixed in with GUI actions. Intractable as this habit is with experienced VBers, the good news is that recent versions of VB can be automated. At least in principle, user actions can be simulated and screen displays "scraped" for their content. COM-aware scripting languages like the Windows versions of Perl, Python, and Tcl are good choices to glue together such sequences into test suites.
GTK+ calls for a different application of the same principles. The GTK+ programming community recognizes the applicability of the facade pattern in GUI design, often in terms of the model-view-controller (MVC) elaboration. It's natural enough with GTK+ to separate algorithms from a thin GUI wrapper. However, GTK+ admits no direct way to automate user interaction.
Don't despair! Refactoring a GTK+ design into an explicit, thin GUI layer exposes an application's algorithms to testing. For the GUI proper, a new lightweight X11 record-and-playback product joins the traditional commercial offerings for those who choose this testing mode.
You need an Android
Android exploits X's XTEST extension. This means that, although Android itself happens to be implemented as a Tcl/Tk application, it can control and test any X11 application, written with any toolkit.
Independent developer Larry Smith maintains Android. He began its development while an employee of what was then Digital Equipment Corporation (DEC), where Android accelerated the testing cycles of several projects.
Android's use resembles that of the commercial products in GUI application testing. What makes it particularly exciting, though, is its easy language for the simulation of user actions. When used in record mode, Android might transcribe one particular button-push in terms of all the shaky small motions you use to locate the button with your pointer. You can maintain the scripts, though, at a higher level with Tcl:
# This simulates motion to the location of
# the "Accept" button, followed by a
# button-push there. Use it simply by
# invoking "Accept" as a command where
# necessary.
proc Accept {} {
    send_xevents @202,385 click 1
}
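One way to take the Accept idea a step further is to keep all screen coordinates in a single table, so that a layout change means editing one entry rather than every script. The sketch below is an assumption-laden illustration, not part of Android itself. It uses only the send_xevents form shown above; the table button_position, the procedure push, and the "Cancel" coordinates are hypothetical. So that it can be tried in a plain tclsh, it defines a stand-in send_xevents that merely records its arguments whenever the real command is absent.

```tcl
# Stand-in for Android's send_xevents, defined only when the real
# command is absent. It records its arguments instead of generating
# X events, so this sketch can run outside Android.
if {[llength [info commands send_xevents]] == 0} {
    set ::recorded_events {}
    proc send_xevents {args} {
        lappend ::recorded_events $args
    }
}

# Hypothetical layout table: button name -> screen coordinates.
# Only the "Accept" position comes from the example above; the
# "Cancel" position is invented for illustration.
array set ::button_position {
    Accept {202 385}
    Cancel {302 385}
}

# Simulate motion to the named button, followed by a click there.
proc push {name} {
    if {![info exists ::button_position($name)]} {
        error "unknown button \"$name\""
    }
    lassign $::button_position($name) x y
    send_xevents @$x,$y click 1
}

# A test script can now read like the designer's description:
push Accept
push Cancel
```

If the "Accept" button moves, only its entry in the table changes; every script that calls push Accept stays as it is.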
Most modern record-and-playback products have a roughly equivalent "macro" or scripting capability for defining these higher-level user actions. Android, which works with any X application, is the easiest testing recorder to install and start. Also, because Android is so convenient and flexible, it makes automations practical outside a testing context. Industrial applications, for example, are often designed to be used only through a single interface. If that happens to be a graphical interface, Android gives you a great way to automate configuration or maintenance chores.
Summary
If you're responsible for testing a GUI program, the best thing you can do to improve its quality is to improve the program's design. "Refactor" it to make a clean separation of the GUI and algorithmic aspects.
Only after you've done everything practical to simplify the GUI layer should you start to test and automate GUI user actions. If your graphical toolkit doesn't allow you to script meaningful tests directly, then, as a last resort, set up a test suite using a record-and-playback product. Among these, Android is a recent open source offering that we find particularly convenient.
Resources
"Testing GUI Applications, Part 1," Cameron Laird and Kathryn Soraiz: http://www.itworld.com/AppDev/1262/UIR010316testinggui1/
"Tk Sets the Standard," Cameron Laird and Kathryn Soraiz: http://www.itworld.com/AppDev/1243/UIR000804tk/
"Hypertools" (a discussion of the origin of Tk scriptability): http://mini.net/cgi-bin/wikit/939.html
interNetwork's Smarttest homepage: http://www.internetwork-ag.de/english/home_m.htm
"GTK+ Matures," Cameron Laird and Kathryn Soraiz:
http://www.itworld.com/App/827/UIR001110gtk/
Facade Design Pattern: http://www.cs.oberlin.edu/~sdugar/facade.html
Interview with Robert C. Martin on Extreme Programming, design, testing: http://forums.itworld.com/webx?230@@.ee6eed4
Cameron Laird's personal notes on GUI toolkits: http://starbase.neosoft.com/~claird/comp.windows.misc/toolkits.html
Extreme Programming homepage: http://www.extremeprogramming.org/
Software testing FAQ -- GUI test drivers: http://www.cigital.com/marick/faqs/t-gui.htm
ITworld.com's Scripting Languages & Techniques discussion: http://forums.itworld.com/webx?230@@.ee6b679
Unix Insider