After a perfunctory gambit of pleasantries from the interviewer,
the conversation jumped straight into a request for an explanation of what
is known as the Chinese Room argument[2]. The Chinese Room argument is a
thought experiment invented by Mr. Searle as a means of debating issues
in Artificial Intelligence (more on this in a moment). With as much
grace and patience as he could muster (I'm paraphrasing here), Mr.
Searle pointed out to the interviewer that the Chinese Room argument is
but one very small part of his oeuvre and that it was a long time ago,
and that there is so much else in his work . . . and then humored the
interviewer with a short explanation of the Chinese Room argument.
Note to Searle fans - I know that there is much more to Searle's thought
and work than the Chinese Room argument. But, alas and alack, like the
Mr. Spock dualism that plagues Leonard Nimoy, I suspect that Mr. Searle
will always be associated with the Chinese Room Argument in the minds of
the masses. (To my mind the most interesting aspect of Searle's work -
and the part that will be most relevant to enterprise IT - is Speech
Act Theory[3]. More on that subject in a future column.)
Now fear not, gentle reader! Fear not the whooshing noise of dense
philosophical argument. This column has essentially *nothing* to do with
Artificial Intelligence. Relax. There is no need to let your eyes glaze
over just yet.
Today, I want to talk about something much more mainstream and relevant
to the daily grind of commercial IT and I propose to use a variant on
Searle's Chinese Room Argument along the way.
First, a ruthlessly short and necessarily selective explication of the
Chinese Room Argument. Picture the scene. You are in a closed box shut
off from the outside world. The box has two holes, large enough for cards
with symbols written on them to pass in and out of the box. You are a
monolingual English speaker. You have a set of rules in your head that
tell you what card(s) to send out of the box based on the cards that are
sent into the box.
The symbols on the cards are in Chinese script. The "algorithm" you are
executing - matching symbols and applying rules - results in the person
outside the box - a Chinese speaker - concluding that the entity inside
the box can read Chinese script and understand Chinese. You can't. Does
it mean that the box 'understands' language in any meaningful sense?
Let us skip the avalanche of philosophy about AI that rumbles around
this analogy and ask a completely different question of this scenario.
What if the symbols going in and out of the box represent arbitrary
pieces of data in your organization? Imagine if all the data going in
and out of the box were represented semantically on the cards. Perhaps
in XML conforming to a clearly documented schema. That is to say, any
IT person from your organization can read and understand the meaning of
the data by simply looking at the cards.
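
To make that a little more concrete, here is a minimal sketch in Python
of what reading one such card might look like. The element names (card,
customer, amount) and the sample values are invented purely for
illustration; they are not taken from any real schema. The point is only
that nothing here needs to ask the box what the data means.

```python
import xml.etree.ElementTree as ET

# A hypothetical "card" passing through a hole in the box. The element
# and attribute names are assumptions made up for this sketch.
CARD = """
<card type="invoice">
  <customer>ACME Ltd</customer>
  <amount currency="EUR">125.50</amount>
</card>
""".strip()


def read_card(xml_text):
    """Return the contents of a card as a plain dictionary."""
    root = ET.fromstring(xml_text)
    amount = root.find("amount")
    return {
        "type": root.get("type"),
        "customer": root.findtext("customer"),
        "amount": float(amount.text),
        "currency": amount.get("currency"),
    }


if __name__ == "__main__":
    print(read_card(CARD))
```

A real card would of course be checked against its documented schema
before being trusted; that step is left out here for brevity.
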
In your capacity as someone charged with performing data processing for
your organization, your concern is what data goes in and what data comes
out of the box. Your primary focus is that data. Do you care what goes
on inside the box?
I contend that although you may care what is inside the box, your
primary concern is the data that goes in and out of the box. If you do
not control that aspect of the data processing function you have a real
problem. The problem is that you don't understand (and therefore don't
really 'own') your own data. You can only interpret it with recourse
to some black box that interprets it for you. Not good. Shouldn't that
issue take precedence in your mind over any desire to know what is
inside the box?
Which is more important to your business - that your data be 'open' or
that your application programs be 'open'? I would suggest that, in the
majority of businesses, it is the former. And yet, we live in
an age where open source - not open data - is the hot topic.
Yes, I know that there are numerous benefits to open source from
quality, reliability, longevity, risk control, and umpteen other
perspectives. Believe me, I'm a believer in open source. I use lots of
it and I contribute some as best I can, when I can.
However, with my business hat on, I am primarily focused on open data
and only secondarily focused on open source. My first question with any
piece of software, open source or closed source, free or expensive, is
this - "what data goes in and what data comes out? Do I understand
everything about the data on both sides without having to ask the box
again?" If I fully understand the data on both sides of the box, I'm
happy. After that, access to the source is sure nice but it is not a
must-have.
I have seen open source software where the data may as well have been
completely proprietary, binary goo, for all the meaning I could
attribute to it outside of the application that created it. Equally, I
have seen completely closed box, proprietary systems with very open data
interfaces that made it easy to integrate the proprietary system with
other open/closed systems. The power to do integration comes primarily
from full disclosure of the data - not from full disclosure of the
algorithms.
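
As a rough illustration of integrating through the data alone, the
Python sketch below treats a system as an opaque callable and checks
only the cards that come out of it against a documented field list. All
of the names here (OUTPUT_FIELDS, closed_box, the field names) are
hypothetical and exist only for this example.

```python
# Documented fields for cards coming out of the box. These field names
# are assumptions chosen only for this sketch.
OUTPUT_FIELDS = {"order_id": str, "total": float, "currency": str}


def unexplained(card, documented=OUTPUT_FIELDS):
    """List everything about a card that the documentation cannot explain."""
    problems = []
    for name, value in card.items():
        if name not in documented:
            problems.append(f"unknown field: {name}")
        elif not isinstance(value, documented[name]):
            problems.append(f"unexpected type for {name}")
    for name in documented:
        if name not in card:
            problems.append(f"missing field: {name}")
    return problems


def closed_box(order):
    """Stand-in for a proprietary system; we never look inside it."""
    return {
        "order_id": order["id"],
        "total": order["qty"] * order["price"],
        "currency": "EUR",
    }


def integrate(box, order, downstream):
    """Pass an order through the box and hand the result on, data permitting."""
    card = box(order)
    problems = unexplained(card)
    if problems:
        raise ValueError("; ".join(problems))
    downstream.append(card)
    return card
```

Note that integrate() works the same way whether the box is open source
or closed source. It only fails when the data coming out cannot be
explained from the documentation.
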
Open source is great and may well take over the world, but what we really
need is open data.
It is worth remembering that open data is not an automatic byproduct of
open source and that closed data is not an automatic byproduct of closed
source.
[1] http://ist-socrates.berkeley.edu/~jsearle/
[2] http://www.utm.edu/research/iep/c/chineser.htm
[3] http://www.openebxml.org/methodology/SAT/sat.html
Sean writes ITworld.com's E-Business in the Enterprise newsletter. To
see his other columns, go to:
http://www.itworld.com/nl/ebiz_ent/
Or reach him at: http://seanmcgrath.blogspot.com