A while back I wrote an article about my pet peeves related to XML. Some time soon I hope to write something similar related to my other main weapon of choice, namely, Python. Before that though, I thought it might be interesting to think about the peeve that relates the two. A peeve that joins them at the hip actually. Namely, the English language.
The English language is many things but a precision expression tool is not one of them. This is not a fault of English per se of course. It applies to all human languages. So much of their power and versatility and beauty comes from this lack of precision. The beautiful fuzziness bites, of course, when a human language such as English is used as a method of precise expression. Recently I wrote about how the fuzziness of language is such a bane in the area of regulation specification. I cannot think of a starker example of the problem of language than the "regulation" of the behaviour of lumps of silicon. An activity we refer to these days as information systems design. To get these pesky, trumped up bundles of sand and sparks to do our bidding we must find ways of expressing, in languages like English - with as little ambiguity as possible - what goes in, what comes out and what goes on in between...
It is tough to do that well. My favourite book on the topic is Data and Reality by William Kent. One of the examples that Kent uses in the book illustrates the problem nicely. Imagine widgets in the famous expository widget factory. Imagine that these widgets have color and weight. We are interested in managing these widgets with a computer system. There are widgets and each widget will have a color and a weight.
Fine. Now let's think about how we specify the representation and behavior of the system in silicon. Widget domain experts will say things like "that widget has a weight of 6 pounds" but they will not say "that widget has a color of red". That is bad English. They will most likely say "that widget is red". So, in data modelling terms, is color an example of a has_a relationship or an is_a relationship?
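To make the fork concrete, here is a minimal Python sketch of the two readings. It is my own illustration rather than anything from Kent's book, and the names (Widget, WidgetBase, RedWidget, weight_lb) are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


# Reading 1: "that widget has a weight of 6 pounds" and, by analogy,
# "has a color of red". Both facts become attributes (has_a).
@dataclass
class Widget:
    weight_lb: float
    color: str


# Reading 2: "that widget is red". Color becomes a subtype (is_a),
# while weight stays an attribute.
@dataclass
class WidgetBase:
    weight_lb: float


class RedWidget(WidgetBase):
    pass


has_a_widget = Widget(weight_lb=6.0, color="red")
is_a_widget = RedWidget(weight_lb=6.0)

# The same question, "is this widget red?", is asked in two different ways.
print(has_a_widget.color == "red")         # attribute lookup
print(isinstance(is_a_widget, RedWidget))  # type check
```

Neither version is wrong. In the first, repainting a widget is just an assignment to color. In the second, it means the object has to become an instance of a different class, so the turn of phrase has already shaped what the system finds easy to do.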
We started out with what sounded like a simple situation in which widgets had two attributes: weight and color. How come, five seconds later, the attributes have sailed off into two separate modelling methods? The issue, as Kent points out over and over again in the book, is that human languages such as English create a view of the world, and that view of the world finds its way into the models we make and the choices we make in designing computer systems. Would two designers approaching the same widget factory from two different linguistic backgrounds end up with the same model?
It is fairly easy to accept that, for example, a native French speaker would create a different Widget model from a native English speaker. Especially given the significant difference in the handling of possessive and genitive pronouns. You might also find it reasonable to accept that an Irish speaker would model a Bishop differently from an English speaker. But what about when designers speak the same language? Why is it that two designers from the same background can create such wildly different models when creating computer systems?
I blame human language - at least to a degree :-) I like the idea (from Saussure I believe, but Wittgenstein springs to mind too) that we all have our own personal, private language in our heads. If these overlap strongly - as evidenced by an ability to communicate with voice and pen - we think of them as being the "same" language. They are never exactly the same though, and maybe this helps explain why no two system designers - given the same human language expression of a problem - end up with the same model inside the computer?
Human language is a delicious pain in the neck. Isn't it?