Ubuntu's Census Taker Getting Bad Rap

Canonical's census app will only count, not track

I've got one thing to say about the ruckus over the canonical-census package recently found in the Ubuntu repositories:

Calm down.

Here's what's being discussed in the community: canonical-census is a software package designed to phone home and track how many computers are using Ubuntu at any given time. Upon hearing that (overly simplified and wrong) explanation, privacy advocates within the free and open source communities went ballistic, seeing canonical-census as an explicit violation of privacy.

It hasn't helped that Canonical is already bearing the brunt of some community animosity over recent reports indicating they might not be contributing enough code upstream to the GNOME Project, followed up by a stunningly obtuse rant (and, later, apology) against free software by a prominent member of the Ubuntu community.

When the news about canonical-census broke, I have to admit I slapped my forehead and thought, "what the heck is wrong with these guys? Do they like poking hornets' nests with sticks?"

After doing a little digging, though, I discovered that in this case, the hornets are seriously overreacting: canonical-census is not designed to track, it's designed to count. There's a big difference.

First off, the canonical-census package is not going to end up in every instance of Ubuntu 10.10 (Maverick Meerkat). Canonical put this application together for one of their (as yet unnamed) OEM customers. The OEM wanted this application to count user numbers, and it's the OEM which will be getting the data, not Canonical.

Okay, so it's not Canonical who's violating privacy, they're just aiding and abetting, right?

Not quite.

Here's what the canonical-census package is supposed to do: it pings the central server and reports the computer's product name (as recorded in the system's DMI data), which version of Ubuntu is running, the distributor channel, and how many times canonical-census has sent this information to the central server. According to Rick Spencer, Engineering Manager of the Ubuntu Desktop team, this method lets OEMs count machines but not track them, because there's no unique identifier associated with the machine being counted.
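To make the count-versus-track distinction concrete, here is a minimal Python sketch of a report carrying only those four fields. It is not the actual census script: the field names, the query-string encoding, and the sample values ("ExampleBook 100", "example-oem") are all assumptions made up for illustration. The only real-world detail is the sysfs path, which is where Linux normally exposes the DMI product name.

```python
from pathlib import Path
from urllib.parse import urlencode

# Standard Linux sysfs location for the DMI product name.
DMI_PRODUCT_NAME = Path("/sys/class/dmi/id/product_name")


def read_product_name(path=DMI_PRODUCT_NAME):
    """Return the DMI product name, or "unknown" if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return "unknown"


def build_report(product, release, channel, send_count):
    """Collect the four reported fields (hypothetical names).

    Note what is absent: no serial number, MAC address, or other
    per-machine identifier.
    """
    return {
        "product": product,
        "release": release,
        "channel": channel,
        "count": send_count,
    }


def encode_report(report):
    """Encode the report as a query string (assumed wire format)."""
    return urlencode(report)


# Two different machines of the same model, on the same release and
# channel, with the same send count, produce identical reports.
machine_a = build_report("ExampleBook 100", "10.10", "example-oem", 3)
machine_b = build_report("ExampleBook 100", "10.10", "example-oem", 3)

print(encode_report(machine_a))
print(machine_a == machine_b)
```

Because the two reports compare equal, a server receiving them can tally them but has nothing in the payload to tell one machine from the other. On a real system, read_product_name() would supply the first field.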

Sure, you say, that's what Canonical wants you to think, but we really know what's going on.

Actually, we really do. See, there's this thing about most code that runs on Ubuntu; perhaps you have heard of it. It's called open source. At any time, any one of us can download the source code for canonical-census and see what's what.

Fortunately, since I am not a developer, a developer has already done so and written a thorough analysis of the canonical-census application's source code. Stephan Peijnik's verdict? The code does exactly what Canonical says it will do:

"After getting the canonical-census Debian source package... the source package shows, besides the Debian packaging information, two scripts:

  • "census (written in Python) and
  • "send-census (a GNU bash script)...

"send-census is installed in /etc/cron.daily, which means it will be executed once a day by the system's cron daemon. It's a mere 48 lines long, and its code is quite simple. So everyone with at least some shell scripting experience can easily check what it's doing. Now guess what, it sends exactly the information as reported on slashdot to Canonical. Nothing more and nothing less."

In fact, Peijnik adds, canonical-census gathers a lot less information than Debian GNU/Linux's popcon package does. Presumably this is true of openSUSE's popcorn package, too.

Yet privacy advocates aren't jumping on these packages, at least as far as I've heard. Go figure.

The concerns of privacy advocates are valid: anything that tracks you or your activity as a user should be viewed with suspicion, particularly if it's doing so in a less-than-open manner. But, in this case, I think the knee-jerk reaction is unwarranted. There is nothing in canonical-census that indicates it's going to track users or their machines. It's just a count.

I, for one, welcome such a count tool, and I hope this methodology succeeds. If it does, it would be great to start getting some real data about the number of installs. (Canonical has said it's not sure yet if the OEM customer will share data.) Too often, we have to rely on download numbers, or on surveys that use web-site traffic data, which are better for showing trends than real numbers.

If canonical-census works as purported, perhaps it should be included in a general release of Ubuntu. If Canonical and the Ubuntu community put some messaging behind it, they could tout it as the first real Linux census. Since canonical-census is open, there's nothing to prevent "fedora-census" or "opensuse-census" applications from gathering the same info.

I realize that such an expansion would have its detractors, but why is the notion of a head count such anathema? For years, community members and vendors have lamented that there's no good way of determining how many Linux installs there are. If this is indeed a workable, anonymous way of getting such a count, then Linux advocates should be applauding tools like canonical-census, not trying to kill them off.

