31 March 2006

Open Source Rocks

There's nothing new about companies deciding to open source their products and make money in other ways. But it's still good to come across new examples of the breed to confirm that the logic remains as strong as ever.

A case in point is Symfony, which describes itself as "a web application framework for PHP5 projects". It is unusual in two respects: first, because it uses the liberal MIT licence, and secondly, because it is sponsored by a French company, Sensio. And according to them, open source rocks.

30 March 2006

Googling the Genome

I came across this story about Google winning an award as part of the "Captain Hook Awards for Biopiracy", taking place in the suitably piratical-sounding Curitiba, Brazil. The story links to the awards Web site - rather fetching in black, white and red - where there is a full list of the lucky 2006 winners.

I was particularly struck by one category: Most Shameful Act of Biopiracy. This must have been hard to award, given the large field to choose from, but the judges found a worthy winner in the shape of the US Government for the following reason:

For imposing plant intellectual property laws on war-torn Iraq in June 2004. When US occupying forces “transferred sovereignty” to Iraq, they imposed Order no. 84, which makes it illegal for Iraqi farmers to re-use seeds harvested from new varieties registered under the law. Iraq’s new patent law opens the door to the multinational seed trade, and threatens food sovereignty.

Google's citation for Biggest Threat to Genetic Privacy read as follows:

For teaming up with J. Craig Venter to create a searchable online database of all the genes on the planet so that individuals and pharmaceutical companies alike can ‘google’ our genes – one day bringing the tools of biopiracy online.

I think it unlikely that Google and Venter are up to anything dastardly here: from studying the background information - and from my earlier reading on Venter when I was writing Digital Code of Life - I think it is much more likely that they want to create the ultimate gene reference, but on a purely general, not personal basis.

Certainly, there will be privacy issues - you won't really want to be uploading your genome to Google's servers - but that can easily be addressed with technology. For example, Google's data could be downloaded to your PC in encrypted form, decrypted by Google's client application running on your computer, and compared with your genome; the results could then be output locally, but not passed back to Google.
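
To make that concrete, here is a minimal sketch of the kind of client-side flow I have in mind, written in Python and assuming a hypothetical encrypted reference file, a key shipped with the client application, and the third-party cryptography library; the only point it makes is that the comparison and its output never leave your machine.

# Hypothetical sketch: compare your genome against an encrypted reference,
# entirely on your own machine; nothing is sent back to any server.
from cryptography.fernet import Fernet  # symmetric encryption recipe

def load_reference(path: str, key: bytes) -> dict:
    """Decrypt a downloaded reference file of 'gene<TAB>sequence' lines."""
    with open(path, "rb") as f:
        plaintext = Fernet(key).decrypt(f.read()).decode()
    return dict(line.split("\t") for line in plaintext.splitlines() if line)

def compare_locally(reference: dict, my_genes: dict) -> list:
    """Report which of my genes differ from the reference - output stays local."""
    return [gene for gene, seq in my_genes.items()
            if gene in reference and reference[gene] != seq]

if __name__ == "__main__":
    key = open("client.key", "rb").read()             # hypothetical: supplied with the client app
    reference = load_reference("reference.enc", key)  # hypothetical downloaded file
    my_genes = {"BRCA1": "ATGGATTTATCTGCT"}           # parsed from your own data
    print(compare_locally(reference, my_genes))       # printed locally, never uploaded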

It is particularly painful for me to disagree with the Coalition Against Biopiracy, the organisation behind the awards, since their hearts are clearly in the right place - they even kindly cite my own 2004 Googling the Genome article in their background information to the Google award.

29 March 2006

Linus Torvalds' First Usenet Posting

It was 15 years ago today that Linus made his first Usenet posting, to the comp.os.minix newsgroup. This is how it began:

Hello everybody,
I've had minix for a week now, and have upgraded to 386-minix (nice), and duly downloaded gcc for minix. Yes, it works - but ... optimizing isn't working, giving an error message of "floating point stack exceeded" or something. Is this normal?

Minix was the Unix-like operating system devised by Andy Tanenbaum as a teaching aid, and gcc a key hacker program that formed part of Stallman's GNU project. Linus' question was pretty standard beginner's stuff, and yet barely two days later, he answered a fellow-newbie's question as if he were some Minix wizard:

RTFSC (Read the F**ing Source Code :-) - It is heavily commented and the solution should be obvious (take that with a grain of salt, it certainly stumped me for a while :-).

He may have been slightly premature in according himself this elevated status, but it wasn't long before he not only achieved it but went far beyond. For on Sunday, 25 August, 1991, he made another posting to the comp.os.minix newsgroup:

Hello everybody out there using minix -
I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready.

The hobby, of course, was Linux, and this was its official announcement to the world.

But imagine, now, that Linus had never made that first posting back in March 1991. It could have happened: as Linus told me in 1996 when I interviewed him for a feature in Wired, back in those days

I was really so shy I didn't want to speak in classes. Even just as a student I didn't want to raise my hand and say anything.

It's easy to imagine him deciding not to “raise his hand” in the comp.os.minix newsgroup for fear of looking stupid in front of all the Minix experts (including the ultimate professor of computing, Tanenbaum himself). And if he'd not plucked up courage to make that first posting, he probably wouldn't have made the others or learned how to hack a simple piece of code he had written for the PC into something that grew into the Linux kernel.

What would the world look like today, had Linux never been written? Would we be using the GNU Hurd – the kernel that Stallman intended to use originally for his GNU operating system, but which was delayed so much that people used Linux instead? Or would one of the BSD derivatives have taken off instead?

Or perhaps there would simply be no serious free alternative to Microsoft Windows, no open source movement, and we would be living in a world where computing was even more under the thumb of Bill Gates. In this alternative reality, there would be no Google either, since it depends on the availability of very low-cost GNU/Linux boxes for the huge server farms that power all its services.

It's amazing how a single post can change the world.

28 March 2006

Dancing Around Openness

The concept of "openness" has featured fairly heavily in these posts - not surprisingly, given the title of this blog. But this conveniently skates over the fact that there is no accepted definition of what "open" really means in the context of technology. This has fairly serious implications, not least because it means certain companies can try to muddy the waters.

Against this background I was delighted to come across this essay by David A. Wheeler on the very subject, entitled "Is OpenDocument an Open Standard? Yes!" As his home page makes clear, David is well-placed to discuss this at the deepest level; indeed, he is the author of perhaps the best and most thorough analysis of why people should consider using open source software.

So if you ever wonder what I'm wittering on about, try reading David's essay on openness to find out what I really meant.

27 March 2006

The Science of Open Source

The OpenScience Project is interesting. As its About page explains:

The OpenScience project is dedicated to writing and releasing free and Open Source scientific software. We are a group of scientists, mathematicians and engineers who want to encourage a collaborative environment in which science can be pursued by anyone who is inspired to discover something new about the natural world.

But beyond this canonical openness to all, there is another, very important reason why scientific software should be open source. With proprietary software, you simply have to take on trust that the output has been derived correctly from the inputs. But this black-box approach is really anathema to science, which is about examining and checking every assumption along the way from input to output. In some sense, proprietary scientific software is an oxymoron.

The project supports open source scientific software in two ways. It has a useful list of such programs, broken down by category (and it's striking how bioinformatics towers over them all); in addition, those behind the site also write applications themselves.

What caught my eye in particular was a posting asking an important question: "How can people make money from open source scientific software?" There have been two more postings so far, exploring ways in which free applications can be used as the basis of a commercial offering: Sell Hardware and Sell Services. I don't know what the final posting - which will look at dual licensing as a way to resolve the dilemma - will say, but the first two have not been able to offer much hope, and overall, I'm not optimistic.

The problem goes to the root of why open source works: it requires lots of users doing roughly the same thing, so that a single piece of free code can satisfy their needs and feed off their comments to get better (if you want the full half-hour argument, read Rebel Code).

That's why the most successful open source projects deliver core computing infrastructure: operating system, Web server, email server, DNS server, databases etc. The same is true on the client-side: the big winners have been Firefox, OpenOffice.org, The GIMP, Audacity etc. - each serving a very big end-user group. Niche projects do exist, but they don't have the vigour of the larger ones, and they certainly can't create an ecosystem big enough to allow companies to make money (as they do with GNU/Linux, Apache, Sendmail, MySQL etc.)

Against this background, I just can't see much hope for commercial scientific open source software. But I think there is an alternative. Because this open software is inherently better for science - thanks to its transparency - it could be argued that funding bodies should make it as much of a priority as more traditional areas.

The big benefit of this approach is that it is cumulative: once the software has been funded to a certain level by one body, there is no reason why another shouldn't pick up the baton and pay for further development. This would allow costs to be shared, along with the code.

Of course, this approach would take a major change of mindset in certain quarters; but since open source and the other opens are already doing that elsewhere, there's no reason why they shouldn't achieve it in this domain too.

Searching for an Answer

I have always been fascinated by search engines. Back in March 1995, I wrote a short feature about the new Internet search engines - variously known as spiders, worms and crawlers at the time - that were just starting to come through:

As an example of the scale of the World-Wide Web (and of the task facing Web crawlers), you might take a look at Lycos (named after a spider). It can be found at the URL http://lycos.cs.cmu.edu/. At the time of writing its database knew of a massive 1.75 million URLs.

(1.75 million URLs - imagine it.)

A few months later, I got really excited by a new, even more amazing search engine:

The latest pretender to the title of top Web searcher is called Alta Vista, and comes from the computer manufacturer Digital. It can be found at http://www.altavista.digital.com/, and as usual costs nothing to use. As with all the others, it claims to be the biggest and best and promises direct access to every one of 8 billion words found in over 16 million Web pages.

(16 million pages - will the madness never end?)

My first comment on Google, in November 1998, by contrast, was surprisingly muted:

Google (home page at http://google.stanford.edu/) ranks search result pages on the basis of which pages link to them.

(Google? - it'll never catch on.)
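
For the curious, the core of that one-line description - rank a page by the pages that link to it - fits in a few lines of Python. What follows is only an illustrative power-iteration sketch of the published PageRank idea, run on a made-up four-page web; it is not anything Google actually uses.

# Illustrative sketch of link-based ranking (the basic PageRank iteration)
# on a toy web of four pages. Not Google's actual implementation.
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks) if outlinks else 0
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]))  # C comes out on top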

I'd thought that my current interest in search engines was simply a continuation of this story, a historical relict, bolstered by the fact that Google's core services (not some of its mickey-mouse ones like Google Video - call that an interface? - or Google Finance - is this even finished?) really are of central importance to the way I and many people now work online.

But upon arriving at this page on the OA Librarian blog, all became clear. Indeed, the title alone explained why I am still writing about search engines in the context of the opens: "Open access is impossible without findability."

Ah. Of course.

Update: Peter Suber has pointed me to an interesting essay of his looking at the relationship between search engines and open access. Worth reading.

26 March 2006

DE-commerce, XXX-commerce

One of the nuggets that I gathered from reading the book Naked Conversations is that there are relatively few bloggers in Germany. So I was particularly pleased to find that one of these rare DE-bloggers had alighted, however transiently, on these very pages, and carried, magpie-like, a gewgaw back to its teutonic eyrie.

The site in question is called Exciting Commerce, with the slightly pleonastic subheading "The Exciting Future of E-commerce". It has a good, clean design (one that coincidentally seems to use the same link colour as the HorsePigCow site I mentioned yesterday).

The content is good, too, not least because it covers precisely the subject that I lament is so hard to observe: the marriage of Web 2.0 and e-commerce. The site begs to differ from me, though, suggesting that there is, in fact, plenty of this stuff around.

Whichever camp you fall into, it's a useful blog for keeping tabs on some of the latest e-commerce efforts from around the world (and not just in the US), even if you don't read German, since many of the quotations are in English, and you can always just click on the links to see where they take you.

My only problem is the site's preference for the umbrella term "social commerce" over e-commerce 2.0: for me, the former sounds perilously close to a Victorian euphemism.

25 March 2006

Not Your Average Animal Farm

And talking of the commons, I was pleased to find that the Pinko Marketing Manifesto has acquired the tag "commons-based unmarketing" (and it's a wiki).

This site is nothing if not gutsy. Not content with promoting something proudly flying the Pinko flag (in America?), it is happy to make an explicit connection with another, rather more famous manifesto (and no, we're not talking about the Cluetrain Manifesto, although that too is cited as a key influence).

And talking of Charlie, another post says:

I started researching elitism versus the voice of the commons and I happened upon something I haven't read since second year university, The Communist Manifesto.

(So, that's re-reading The Communist Manifesto: how many brownie points does this woman want?)

And to top it all, HorsePigCow - for so it is named - has possibly the nicest customisation of the standard Minima Blogger template I've seen, except that the posts are too wide: 65 characters max is the rule, trust me.

Do take a gander.

Update: Sadly, I spoke too soon: the inevitable mindless backlash has begun....

The Commonality of the Commons

Everywhere I go these days, I seem to come across the commons. The Creative Commons is the best known, but the term refers to anything held in common for the benefit of all. A site I've just come across, called On the Commons, puts it well, stressing the concomitant need to conserve the commons for the benefit of future generations:

The commons is a new way to express a very old idea — that some forms of wealth belong to all of us, and that these community resources must be actively protected and managed for the good of all. The commons are the things that we inherit and create jointly, and that will (hopefully) last for generations to come. The commons consists of gifts of nature such as air, water, the oceans, wildlife and wilderness, and shared “assets” like the Internet, the airwaves used for broadcasting, and public lands. The commons also includes our shared social creations: libraries, parks, public spaces as well as scientific research, creative works and public knowledge that have accumulated over centuries.

It's also put together a free report that spells out in more detail the various kinds of commons that exist: the atmosphere, the airwaves, water, culture, science and even quiet.

What's fascinating for me is how well this maps onto the intertwined themes of this blog and my interests in general, from open content, open access and open spectrum to broader environmental issues. The recognition that there is a commonality between different kinds of commons seems to be another idea that is beginning to spread.

Picture This

I wrote about Riya.com a month ago; now it's out in beta, so you can try out its face recognition technology. I did, and was intrigued to find that this photo was tagged as "Bill Gates". Maybe Riya uses more artificial intelligence than they're letting on.

It's certainly a clever idea - after all, the one thing people (misanthropes apart) are interested in, is people. But you do have to wonder about the underlying technology when it uses addresses like this:

http://www.riya.com/highRes?search=1fSPySWh
FrHn7AnWgnSyHaqJl6bzuGByoFKJuG1H%2Fv
otjYbqlIMI22Qj88Vlcvz2uSnkixrhzHJP%0Aej%
2B9VuGvjiodlKDrBNS8pgy%2FaVqvckjfyo%2
BjhlL1sjK5CgHriGhifn3s2C1q%2B%2FnL1Emr
0OUPvn%2FM%0AJ0Ire5Zl2QUQQLUMi2Naq
Ny1zboiX7JtL77OG96NmV5VT8Buz4bzlyPFmi
ppcvmBJagMcftZjHUG%0AFlnXYIfp1VOGWx
gYijpgpDcsU9M4&pageNumber=9&e=bIaIR30d
SGNoZcG8jWL8z2LhcH%2FEg1LzsBF%2F6pr
Fd2Jm7tpMKFCXTu%2FBsOKk%2FVdS

I know a picture is supposed to be worth a thousand words, but not in the URL, surely....

A Question of Standards

Good to see Andy Updegrove's blog getting Slashdotted. This is good news not just for him, but also for his argument, which is that open source ideas are expanding into new domains (no surprise there to readers of this blog), and that traditional intellectual property (IP) models are being re-evaluated as a result.

Actually, this piece is rather atypical, since most of the posts are to do with standards, rather than open source or IP (though these are inevitably bound up with standards). Andy's blog is simply the best place to go for up-to-the-minute information on this area; in particular, he is following the ODF saga more closely - and hence better - than anyone. In other words, he's not just reporting on standards, but setting them, too.

24 March 2006

A Little Note About Microformats

Further proof that things are starting to bubble: small but interesting ideas like microformats pop up out of nowhere (well, for me, at least). As the About page of the eponymous Web site says:

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.

The key thing is that they are built around XHTML, which is effectively HTML done properly. Examples of microformats built on things that may already be familiar include hCard, based on the very old vCard; hCalendar, based on the equally venerable iCalendar; the moderately old XHTML Friends Network (XFN), which you stumble across occasionally on the Web; and the inscrutable XOXO (which I've heard of, but never seen brandished in anger).
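
To show just how low the barrier is, here is a short Python sketch that pulls contact details out of an hCard using nothing but the standard library. The class names (vcard, fn, org, email) follow the published hCard convention; the sample markup and the little parser are my own invention.

# Minimal sketch: extracting hCard fields (class="fn", "org", "email")
# from ordinary XHTML using only the Python standard library.
from html.parser import HTMLParser

SAMPLE = '''<div class="vcard">
  <span class="fn">Ada Lovelace</span>,
  <span class="org">Analytical Engines Ltd</span>,
  <a class="email" href="mailto:ada@example.org">ada@example.org</a>
</div>'''

class HCardParser(HTMLParser):
    FIELDS = {"fn", "org", "email"}

    def __init__(self):
        super().__init__()
        self.current = None   # the hCard field currently being read
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = self.FIELDS.intersection(classes)
        if matches:
            self.current = matches.pop()

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()
            self.current = None

parser = HCardParser()
parser.feed(SAMPLE)
print(parser.card)  # {'fn': 'Ada Lovelace', 'org': 'Analytical Engines Ltd', 'email': 'ada@example.org'}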

That's the upside; on the downside, Bill Gates has started talking about it.

23 March 2006

Open Data in the Age of Exponential Science

There's a very interesting article in this week's Nature, as part of its 2020 Computing Special (which miraculously is freely available even to non-subscribers), written by Alexander Szalay and Jim Gray.

I had the pleasure of interviewing Gray a couple of years back. He's a Grand Old Man of the computing world, with a hugely impressive curriculum vitae; he's also a thoroughly charming interviewee with some extremely interesting ideas. For example:

I believe that Alan Turing was right and that eventually machines will be sentient. And I think that's probably going to happen in this century. There's much concern that that might work out badly; I actually am optimistic about it.

The Nature article is entitled "Science in an exponential world", and it considers some of the approaching problems that the vast scaling up of Net-based, collaborative scientific endeavour is likely to bring us in the years to come. Here's one key point:

A collaboration involving hundreds of Internet-connected scientists raises questions about standards for data sharing. Too much effort is wasted on converting from one proprietary data format to another. Standards are essential at several levels: in formatting, so that data written by one group can be easily read and understood by others; in semantics, so that a term used by one group can be translated (often automatically) by another without its meaning being distorted; and in workflows, so that analysis steps can be executed across the Internet and reproduced by others at a later date.

The same considerations apply to all open data in the age of exponential science: without common standards that allow data from different groups, gathered at different times and in varying circumstances, to be brought together meaningfully in all sorts of new ways, the openness is moot.
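
As a trivial illustration of the formatting point, here is a Python sketch that pools sequence records from two groups, assuming both publish plain FASTA files (the filenames are made up); the moment one group invents its own proprietary format, even this stops being a few lines of code.

# Sketch: pooling data from two groups is trivial when both use one agreed,
# plain-text format (FASTA here). The filenames are hypothetical.
def read_fasta(path: str) -> dict:
    """Return {record_name: sequence} from a FASTA file."""
    records, name = {}, None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = ""
            elif name:
                records[name] += line
    return records

pooled = {}
for path in ("group_a_sequences.fasta", "group_b_sequences.fasta"):
    pooled.update(read_fasta(path))
print(f"{len(pooled)} records pooled from both groups")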

Synchronicity

I'm currently reading Naked Conversations (sub-title: "how blogs are changing the way businesses talk with customers"). It's full of well-told anecdotes and some interesting ideas, although I'm not convinced yet that it will add up to more than a "corporate blogging is cool" kind of message.

That notwithstanding, I was slightly taken aback to find myself living out one of the ideas from the book. This came from the inimitable Dave Winer, who said, speaking of journalists:

They don't want the light shone on themselves, which is ironic because journalists are experts at shining the light on others.... This is why we have blogs. We have blogs because we can't trust these guys.

Speaking as a journalist, can I just say "Thanks, Dave," for that vote of confidence.

But the idea that bloggers can watch the journalists watching them is all too true, as I discovered when I went to Paul Jones' blog and found this posting, in which he not only tells the entire world what I'm up to (no great secret, to be sure), but also effectively says that he will be publishing his side of the story when my article comes out so that readers can check up on whether I've done a good job.

The only consolation is that at least I can leave a comment on his posting on my article about him....

22 March 2006

Digital Libraries - the Ebook

It seems appropriate that a book about digital libraries has migrated to an online version that is freely available. Digital Libraries - for such is the nicely literalist title - is a little long in the tooth in places as far as the technical information is concerned, but very clearly written (via Open Access News).

It also presents things from a librarian's viewpoint, which is quite different from that of your usual info-hacker. I found Chapter 6, on Economic and legal issues, particularly interesting, since it touches most directly on areas like open access.

Nonetheless, I was surprised not to see more (anything? - there's no index at the moment) about Project Gutenberg. Now, it may be that I'm unduly influenced by an extremely thought-provoking email conversation I'm currently engaged in with the irrepressible Michael Hart, the founder and leader of the project.

But irrespective of this possible bias, it seems to me that Project Gutenberg - a library of some 17,000 ebooks, with more being added each day - is really the first and ultimate digital library (or at least it will be, once it's digitised the other million or so books that are on its list), and deserves to be recognised as such.

21 March 2006

Why the GPL Doesn't Need a Test Case

There was an amusing story in Groklaw yesterday, detailing the sorry end of utterly pointless legal action taken against the Free Software Foundation (FSF) on the grounds that

FSF has conspired with International Business Machines Corporation, Red Hat Inc., Novell Inc. and other individuals to “pool and cross license their copyrighted intellectual property in a predatory price fixing scheme.”

It sounded serious, didn't it? Maybe a real threat to free software and hence Civilisation As We Know It? Luckily, as the Groklaw story explains, the judge threw it out in just about every way possible.

However, welcome as this news is, it is important to note that the decision does not provide the long-awaited legal test of the GPL in the US (a court has already ruled favourably on one in Germany). Some people seem to feel that such a test case is needed to establish the legal foundation of the GPL - and with it, most of the free software world. But one person who disagrees is Eben Moglen, General Counsel for the FSF, and somebody who should know.

As he explained to me a few weeks ago:

The stuff that people do with GPL code – like they modify it, they copy it, they give it to other people – is stuff that under the copyright law you can't do unless you have permission. So if they've got permission, or think they have permission, then the permission they have is the GPL. If they don't have that permission, they have no permission.

So the defendant in a GPL violation situation has always been in an awkward place. I go to him and I say basically, Mr So and So, you're using my client's copyrighted works, without permission, in ways that the copyright law says that you can't do. And if you don't stop, I'm going to go to a judge, and I'm going to say, judge, my copyrighted works, their infringing activity, give me an injunction, give me damages.

At this point, there are two things the defendant can do. He can stand up and say, your honour, he's right, I have no permission at all. But that's not going to lead to a good outcome. Or he can stand up and say, but your honour, I do have permission. My permission is the GPL. At which point, I'm going to say back, well, your honour, that's a nice story, but he's not following the instructions of the GPL, so he doesn't really have the shelter he claims to have.

But note that either way, the one thing he can't say is, your honour, I have this wonderful permission and it's worthless. I have this wonderful permission, and it's invalid, I have this wonderful permission and it's broken.

In other words, there is no situation in which the brokenness or otherwise of the GPL is ever an issue: whichever is true, violators are well and truly stuffed.

(If you're interested in how, against this background, the GPL is enforced in practice, Moglen has written his own lucid explanations.)

20 March 2006

What Open Source Can Learn from Microsoft

In case you hadn't noticed, there's been a bit of a kerfuffle over a posting that a Firefox 2.0 alpha had been released. However, this rumour has been definitively scotched by one of the top Firefox people on his blog, so you can all relax now (well, for a couple of days, at least, until the real alpha turns up).

And who cares whether the code out there is an alpha, or a pre-alpha or even a pre-pre-alpha? Well, never mind who cares, there's another point that everyone seems to be missing: that this flurry of discoveries, announcements, commentaries, denials and more commentaries is just what Firefox needs as it starts to become respectable and, well, you know, slightly dull.

In fact, the whole episode should remind people of a certain other faux-leak about a rather ho-hum product that took place fairly recently. I'm referring to the Origami incident a couple of weeks ago, which produced an even bigger spike in the blogosphere.

It's the same, but different, because the first happened by accident in a kind of embarrassed way, while the latter was surely concocted by sharp marketing people within Microsoft. So, how about if the open source world started to follow suit by "leaking" the odd bit of code to selected bloggers who can be relied upon to get terribly agitated and to spread the word widely?

At first sight, this seems to be anathema to a culture based on openness, but there is no real contradiction. It is not a matter of hiding anything, merely making the manner of its appearance more tantalising - titillating, even. The people still get their software, the developers still get their feedback. It's just that everyone has super fun getting excited about nothing - and free software's market share inches up another notch.

19 March 2006

How Do I Blog Thee?

Let me count the ways.

List blog

The original: lots and lots of links to things with no theme but their sum.

Diary blog

The other original - but don't try this at home unless you are really interesting.

Shard blog

Not quite a list blog, not quite a diary blog: instead, small fragments of a life refracted through the links encountered each day.

News blog

Lots of useful links on a well-defined subject area, plus quotes and the odd dash of intelligent comment.

Essay blog

Longer, more thoughtful postings, typically one per day: mental meat to chew on.

Photo blog

A picture is worth a thousand blog postings.

Video blog

Done well, this is the ultimate magic casement in the middle of your screen, a window on another world.

18 March 2006

Economistical with the Truth

The Economist is a strange beast. It has a unique writing style, born of the motto "simplify, then exaggerate"; and it has an unusual editorial structure, whereby senior editors read every word written by those reporting to them - which means the editor reads every word in the magazine (at least, that's the way it used to work). Partly for this reason, nearly all the articles are anonymous: the idea is that they are in some sense a group effort.

One consequence of this anonymity is that I can't actually prove I've written for the title (which I have, although it was a long time ago). But on the basis of a recent showing, I don't think I want to write for it anymore.

The article in question, which is entitled "Open, but not as usual", is about open source, and about some of the other "opens" that are radiating out from it. Superficially, it is well written - as a feature that has had multiple layers of editing should be. But on closer examination, it is full of rather tired criticisms of the open world.

One of these in particular gets my goat:

...open source might already have reached a self-limiting state, says Steven Weber, a political scientist at the University of California at Berkeley, and author of “The Success of Open Source” (Harvard University Press, 2004). “Linux is good at doing what other things already have done, but more cheaply—but can it do anything new? Wikipedia is an assembly of already-known knowledge,” he says.

Well, hardly. After all, the same GNU/Linux can run globe-spanning grids and supercomputers; it can power back office servers (a market where it bids fair to overtake Microsoft soon); it can run on desktops without a single file being installed on your system; and it is increasingly appearing in embedded devices - mp3 players, mobile phones etc. No other operating system has ever achieved this portability or scalability. And then there are the more technical aspects: GNU/Linux is simply the most stable, most versatile and most powerful operating system out there. If that isn't innovative, I don't know what is.

But let's leave GNU/Linux aside, and consider what open source has achieved elsewhere. Well, how about the Web for a start, whose protocols and underlying software have been developed in a classic open source fashion? Or what about programs like BIND (which runs the Internet's name system), or Sendmail, the most popular email server software, or maybe Apache, which is used by two-thirds of the Internet's public Web sites?

And then there's MediaWiki, the software that powers Wikipedia (and a few other wikis): even if Wikipedia were merely "an assembly of already-known knowledge", MediaWiki (built on the open source PHP language and MySQL database) supports an unprecedentedly large assembly, unmatched by any proprietary system. Enough innovation for you, Mr Weber?

But the saddest thing about this article is not so much these manifest inaccuracies as the reason why they are there. Groklaw's Pamela Jones (PJ) has a typically thorough commentary on the Economist piece. From corresponding with its author, she says "I noticed that he was laboring under some wrong ideas, and looking at the finished article, I notice that he never wavered from his theory, so I don't know why I even bothered to do the interview." In other words, the feature is not just wrong, but wilfully wrong, since others, like PJ, had carefully pointed out the truth. (There's an old saying among journalists that you should never let the facts get in the way of a good story, and it seems that The Economist has decided to adopt this as its latest motto.)

But there is a deeper irony in this sad tale, one carefully picked out by PJ:

There is a shocking lack of accuracy in the media. I'm not at all kidding. Wikipedia has its issues too, I've no doubt. But that is the point. It has no greater issues than mainstream articles, in my experience. And you don't have to write articles like this one either, to try to straighten out the facts. Just go to Wikipedia and input accurate information, with proof of its accuracy.

If you would like to learn about Open Source, here's Wikipedia's article. Read it and then compare it to the Economist article. I think then you'll have to agree that Wikipedia's is far more accurate. And it isn't pushing someone's quirky point of view, held despite overwhelming evidence to the contrary.

If Wikipedia gets something wrong, you can correct it by pointing to the facts; if The Economist gets it wrong - as in the piece under discussion - you are stuck with an article that is, at best, Economistical with the truth.

17 March 2006

Google's Grief, Open Source's Gain?

The news that a judge has ordered Google to turn over all emails from a Gmail account, including deleted messages, has predictably sent a shiver of fear down the collective spine of the wired community, all of whom by now have Gmail accounts. Everybody can imagine themselves in a similar situation, with all their most private online thoughts suddenly revealed in this way.

The really surprising thing about this development is not that it's happened, but that anyone considers it surprising. Lawyers were bound to be tempted by all the unguarded comments lying in emails, and judges were bound to be convinced that since they existed it was legitimate to look at them for evidence of wrong-doing. And Google, ultimately, is bound to comply: after all, it's in the business of making money, not of martyrdom.

So the question is not so much "What can we do to stop such court orders being made and executed?" as "What can we do to mitigate their effects?"

Moving to another email provider like Yahoo or Hotmail certainly won't help. And even setting up your own SMTP server to send email won't do much good, since your ISP probably has copies of bits of your data lying around on its own servers that sooner or later will be demanded by somebody with a court order.

The only real solution seems to be to use strong encryption to make each email message unreadable except by the intended recipient (and even that exception is an obvious weakness).

It would, presumably, be relatively simple for Google to add this to Gmail. But even if it won't, there is also a fine open source project called Enigmail - currently nearing version 1.0 - which is an extension to the Mozilla family of email readers, Thunderbird et al. The problem is that installation is fairly involved, since you must first set up GnuPG, which provides the cryptographic engine. If the free software world could make this process easier - a click, a passphrase and you're done - Google's present grief could easily be turned into open source's opportunity.
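
To give a flavour of what such a "click, passphrase, done" wrapper would have to hide, here is a Python sketch that simply shells out to the gpg command line to encrypt a message for someone whose public key is already on your keyring; the address is, of course, invented.

# Sketch: encrypting a message with GnuPG by calling the gpg command line.
# Assumes gpg is installed and the recipient's public key has already been
# imported; the address below is made up.
import subprocess

def encrypt_for(recipient: str, plaintext: str) -> bytes:
    """Return ASCII-armoured ciphertext readable only by the recipient."""
    result = subprocess.run(
        ["gpg", "--batch", "--armor", "--encrypt", "--recipient", recipient],
        input=plaintext.encode(),
        capture_output=True,
        check=True,
    )
    return result.stdout

ciphertext = encrypt_for("alice@example.org", "Meet me at the usual place.")
print(ciphertext.decode())  # begins "-----BEGIN PGP MESSAGE-----"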

16 March 2006

The Power of Open Genomics

The National Human Genome Research Institute (NHGRI), one of the National Institutes of Health (NIH), has announced the latest round of mega genome sequencing projects - effectively the follow-ons to the Human Genome Project. These are designed to provide a sense of genomic context, and to allow all the interesting hidden structures within the human genome to be teased out bioinformatically, by comparing it with the genomes of species whose lineages diverged from ours at various distant times.

Three more primates are getting the NHGRI treatment: the rhesus macaque, the marmoset and the orangutan. But alongside these fairly obvious choices, eight more mammals will be sequenced too. As the press release explains:

The eight new mammals to be sequenced will be chosen from the following 10 species: dolphin (Tursiops truncates), elephant shrew (Elephantulus species), flying lemur (Dermoptera species), mouse lemur (Microcebus murinus), horse (Equus caballus), llama (Llama species), mole (Cryptomys species), pika (Ochotona species), a cousin of the rabbit, kangaroo rat (Dipodomys species) and tarsier (Tarsier species), an early primate and evolutionary cousin to monkeys, apes, and humans.

If you are not quite sure whom to vote for, you might want to peruse a great page listing all the genomes currently being sequenced for the NHGRI, which provides links to a document (.doc, alas, but you can open it in OpenOffice.org) explaining why each is important (there are pix, too).

More seriously, it is worth noting that this growing list makes ever more plain the power of open genomics. Since all of the genomes will be available in public databases as soon as they are completed (and often before), this means that bioinformaticians can start crunching away with them, comparing species with species in various ways. Already, people have done the obvious things like comparing humans with chimpanzees, or mice with rats, but the possibilities are rapidly becoming extremely intriguing (tenrec and elephant, anyone?).

And beyond the simple pairing of genomes, which yields a standard square-law richness, there are even more inventive combinations involving the comparison of multiple genomes that may reveal particular aspects of the Great Digital Tree of Life, since everything may be compared with everything, without restriction. Now imagine trying to do this if genomes had been patented, and groups of them belonged to different companies, all squabbling over their "IP". The case for open genomics is proved, I think.
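
To put rough numbers on that square-law richness - a back-of-the-envelope Python sketch, nothing more:

# Back-of-the-envelope: how many comparisons do n freely available genomes allow?
from math import comb

for n in (11, 20, 50):                 # e.g. the 11 new genomes above, and beyond
    pairs = comb(n, 2)                 # pairwise comparisons: n(n-1)/2
    subsets = 2 ** n - n - 1           # multi-genome combinations of size >= 2
    print(f"{n:>3} genomes: {pairs:>6} pairs, {subsets:>20,} multi-way subsets")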

15 March 2006

Microsoft Goes (a Bit More) Open Source

Many people were amazed back in 2004 when Microsoft released its first open source software, Windows Installer XML (WiX). But this was only the first step in a long journey towards openness that Microsoft is making - and must make - for some time to come.

It must make it because the traditional way of writing software simply doesn't work for the ever-more complex, ever-more delayed projects that Microsoft is engaged upon: Brooks' Law, which states that "Adding manpower to a late software project makes it later," will see to this if nothing else does.
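
The usual reading of Brooks' Law is combinatorial: with n people on a project there are n(n-1)/2 possible communication paths, so coordination overhead grows roughly with the square of the team - a quick Python illustration:

# Why adding people late can slow a project down: communication paths grow
# quadratically with team size (the standard reading of Brooks' Law).
def channels(n: int) -> int:
    return n * (n - 1) // 2

for team in (5, 10, 20, 40):
    print(f"{team:>2} people -> {channels(team):>3} possible communication paths")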

Microsoft itself has finally recognised this. According to another fine story from Mary Jo Foley, who frequently seems to know more about what's happening in the company than Bill Gates does:

Beta testing has been the cornerstone of the software development process for Microsoft and most other commercial software makers for as long as they've been writing software. But if certain powers-that-be in Redmond have their way, betas may soon be a thing of the past for Microsoft, its partners and its customers.

The alternative is to adopt a more fluid approach that is a commonplace in the open source world:

Open source turned the traditional software development paradigm on its head. In the open source world, testers receive frequent builds of products under development. Their recommendations and suggestions typically find their way more quickly into developing products. And the developer community is considered as important to writing quality code as are the "experts" shepherding the process.

One approach to mitigating the effects of Brooks' Law is to change the fashion in which the program is tested. Instead of doing this in a formal way with a few official betas - which tend to slow down the development process - the open source method allows users to make comments earlier and more frequently on multiple builds as they are created, and without hindering the day-to-day working of developers, who are no longer held hostage by artificial beta deadlines that become ends in themselves rather than means.

E-commerce 2.0

It is striking how everybody is talking about Web 2.0, and yet nobody seems to mention e-commerce 2.0. In part, this is probably because few have managed to work out how to apply Web 2.0 technologies to e-commerce sites that are not directly based on selling those technologies (as most Web 2.0 start-ups are).

For a good example of what an e-commerce 2.0 site looks like, you could do worse than try Chinesepod.com (via Juliette White), a site that helps you learn Mandarin Chinese over the Net.

The Web 2.0-ness is evident in the name - though I do wish people would come up with a different word for what is, after all, just an mp3 file. It has a viral business model - make the audio files of the lessons freely available under a Creative Commons licence so that they can be passed on, and charge for extra features like transcripts and exercises. The site even has a wiki (which has some useful links).

But in many ways the most telling feature is the fact that as well as a standalone blog, the entire opening page is organised like one, with the lessons arranged in reverse chronological order, complete with some very healthy levels of comments. Moreover, the Chinesepod people (Chinese podpeople?) are very sensibly drawing on the suggestions of their users to improve and extend their service. Now that's what I call e-commerce 2.0.

14 March 2006

Will Data Hoarding Cost 150 Million Lives?

The only thing separating mankind from a pandemic that could kill 150 million people is a few changes in the RNA of the H5N1 avian 'flu virus. Those changes would make it easier for the virus to infect and pass between humans, rather than birds. Research into the causes of the high death-rate among those infected by the Spanish 'flu - which killed between 50 and 100 million people in 1918 and 1919, even though the world population was far lower then than now - shows that it was similar changes in a virus otherwise harmless to humans that made the Spanish 'flu so lethal.

The good news is that with modern sequencing technologies it is possible to track those changes as they happen, and to use this information to start preparing vaccines that are most likely to be effective against any eventual pandemic virus. As one recent paper on the subject put it:

monitoring of the sequences of viruses isolated in instances of bird-to-human transmission for genetic changes in key regions may enable us to track viruses years before they develop the capacity to replicate with high efficiency in humans.

The bad news is that most of those vital sequences are being kept hidden away by the various national laboratories that produce them. As a result, thousands of scientists outside those organisations do not have the full picture of how the H5N1 virus is evolving, medical communities cannot plan properly for a pandemic, and drug companies are hamstrung in their efforts to develop effective vaccines.

The apparent reason for the hoarding - that some scientists want to be able to publish their results in slow-moving printed journals first, so as to be sure that they are accorded full credit by their peers - beggars belief against a background of growing pandemic peril. Open access to data never looked more imperative.

Although the calls to release this vital data are gradually becoming more insistent, they still seem to be falling on deaf ears. One scientist who has been pointing out the folly of the current situation for longer than most is the respected researcher Henry Niman. He has had a distinguished career in the field of viral genomics, and is the founder of the company Recombinomics.

The news section of the Recombinomics site has long been the best place to find out about the latest developments in the field of avian 'flu. This is for three reasons: Niman's deep knowledge of the subject, his meticulous scouring of otherwise-neglected sources to find out the real story behind the news, and - perhaps just as important - his refusal meekly to toe the line that everything is under control. For example, he has emphasised that the increasing number of infection clusters indicates that human-to-human transmission is now happening routinely, in flat contradiction to the official analysis of the situation.

More recently, he has pointed out that the US decision to base its vaccine on a strain of avian 'flu found in Indonesia is likely to be a waste of time, since the most probable pandemic candidate has evolved away from this.

The US Government's choice is particularly worrying because human cases of avian 'flu in North America may be imminent. In another of Niman's characteristically forthright analyses, he suggests that there is strong evidence that H5N1 is already present in North America:

Recombinomics is issuing a warning based on the identification of American sequences in the Qinghai strain of H5N1 isolated in Astrakhan, Russia. The presence of the America sequences in recent isolates in Astrakhan indicates H5N1 has already migrated to North America. The levels of H5N1 in indigenous species will be supplemented by new sequences migrating into North America in the upcoming months.

Niman arrived at this conclusion by tracking the genomic changes in the virus as it travelled around the globe with migrating birds, using some of the few viral sequences that have been released.
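
For a sense of what that tracking involves at its simplest, here is a Python sketch that flags position-by-position differences between two aligned sequence fragments; the fragments themselves are invented, not real H5N1 data.

# Sketch: flag position-by-position differences between two aligned sequence
# fragments - the kind of change-tracking described above. Sequences are made up.
def differences(reference: str, isolate: str) -> list:
    """Return (position, reference_base, isolate_base) for every mismatch."""
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, isolate))
            if r != s]

reference_fragment = "ATGGAGAAAATAGTGCTTCTT"  # invented, not a real H5N1 sequence
isolate_fragment   = "ATGGAGAAGATAGTACTTCTT"
for pos, was, now in differences(reference_fragment, isolate_fragment):
    print(f"position {pos}: {was} -> {now}")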

Let's hope for the sake of everyone that WHO and the other relevant organisations see the light and start making all the genomic data available. This would allow Niman and his many able colleagues to monitor even the tiniest changes, so that the world can be alerted at the earliest possible moment to the start of a pandemic that may be closer than many think.

Update: In an editorial, Nature is now calling for open access to all this genomic data. Unfortunately, the editorial is not open access....

13 March 2006

OU on UK ID DBs

Talking of the Open University, here's an interesting research report from them on the UK Government's plans to introduce ID cards. The study looks at things from a slightly novel angle: people's attitudes to the scheme, and how they vary according to the details.

The most interesting result was that even those moderately in favour of the idea became markedly less enthusiastic when the card was compulsory and a centralised rather than distributed database was used to store the information. Since this is precisely what the government is planning to do, the research rather blows a hole in their story that the British population is simply begging them to introduce ID cards. John Lettice has provided more of his usual clear-headed analysis on the subject.

What is also fascinating is how the British public - or at least the sample interviewed - demonstrated an innate sense of how unwise such a centralised database would be. I think this argues a considerable understanding of what is on the face of it quite an abstract technical issue. There's hope yet - for the UK people, if not for the UK Government....

12 March 2006

Mozart the Blogger

To celebrate the 250th anniversary of Mozart's birth, I've been reading some of his letters, described by Einstein (Alfred, not his cousin Albert) as "the most lively, the most unvarnished, the most truthful ever written by a musician". It is extraordinary to think that these consist of the actual words that ran through Mozart's head, probably at the same time when he was composing some masterpiece or other as a background task. To read them is to eavesdrop on genius.

The other striking thing about them is their volume and detail. Mozart was an obsessive letter-writer, frequently knocking out more than one a day to his wide range of regular correspondents. And these are no quick "having a lovely time, wish you were here" scribbles on the back of a postcard: they often run to many pages, and consist of extended, complex sentences full of dazzling wordplay, describing equally rich ideas and complicated situations, or responding in thoughtful detail to points made in the letters he received.

Because they are so long, the letters have a strong sense of internal time: that is, you feel that the end of the letter is situated later than the beginning. As a result, his letters often function as a kind of diary entry, a log of the day's events and impressions - a kind of weblog without the reverse chronology (and without the Web).

Mozart was a blogger.

If this intense letter-writing activity can be considered a proto-blog, the corollary is that blogs are a modern version of an older epistolary art. This is an important point, because it addresses two contemporary concerns in one fell swoop: that the art of the letter is dead, and that there is a dearth of any real substance in blogs.

We are frequently told that modern communications like the telephone and email have made the carefully-weighed arrangement of words on the page, the seductive ebb and flow of argument and counter-argument, redundant in favour of the more immediate, pithier forms. One of the striking things about blogs is that some - not all, certainly - are extremely well written. And even those that are not so honed still represent considerable effort on the part of their authors - effort that 250 years ago was channelled into letters.

This means that far from being the digital equivalent of dandruff - stuff that scurfs off the soul on a daily basis - the growing body of blog posts represents a renaissance of the art of letter-writing. In fact, I would go further: no matter how badly written a blog might be, it has the inarguable virtue of being something that is written, and then - bravely - made public. As such, it is another laudable attempt to initiate or continue a written dialogue of a kind that Mozart would have understood and engaged with immediately. It is another brick - however humble - in the great edifice of literacy.

For this reason, the current fashion to decry blogs as mere navel-gazing, or vacuous chat, is misguided. Blogs are actually proof that more and more people - 30,000,000 of them if you believe Technorati - are rediscovering the joy of words in a way that is unparalleled in recent times. We may not all be Mozarts of the blog, but it's better than silence.

11 March 2006

Open University Meets Open Courseware

Great news (via Open Access News and the Guardian): the Open University is turning a selection of its learning materials into open courseware. To appreciate the importance of this announcement, a little background may be in order.

As its fascinating history shows, the Open University was born out of Britain's optimistic "swinging London" culture of the late 1960s. The idea was to create a university open to all - one on a totally new scale of hundreds of thousands of students (currently there are 210,000 enrolled). It was evident quite early on that this meant using technology as much as possible (indeed, as the history explains, many of the ideas behind the Open University grew out of an earlier "University of the Air" idea, based around radio transmissions.)

One example of this is a close working relationship with the BBC, which broadcasts hundreds of Open University programmes each week. Naturally, these are open to all, and designed to be recorded for later use - an early kind of multimedia open access. The rise of the Web as a mass medium offered further opportunities to make materials available. By contrast, the holdings of the Open University Library require a username and password (although there are some useful resources available to all if you are prepared to dig around).

Against this background of a slight ambivalence to open access, the announcement that the Open University is embracing open content for at least some of its courseware is an extremely important move, especially in terms of setting a precedent within the UK.

In the US, there is already the trail-blazing MIT OpenCourseWare project. Currently, there are materials from around 1250 MIT courses, expected to rise to 1800 by 2007. Another well-known example of open courseware is the Connexions project, which has some 2900 modules. This was instituted by Rice University, but now seems to be spreading ever wider. In this it is helped by an extremely liberal Creative Commons licence, that allows anyone to use Connexions material to create new courseware. MIT uses a Creative Commons licence that is similar, except it forbids commercial use.

At the moment, there's not much to see at the Open University's Open Content Initiative site. There is an interesting link to information from the project's main sponsor, the William and Flora Hewlett Foundation, about its pioneering support for open content. This has some useful links at the foot of the page to related projects and resources.

One thing the Open University announcement shows is that open courseware is starting to pick up steam - maybe a little behind the related area of open access, but coming through fast. As with all open endeavours, the more there are, the more evident the advantages of making materials freely available become, and the more others follow suit. This virtuous circle of openness begetting openness is perhaps one of the biggest advantages that it has over the closed, proprietary alternatives, which by their very nature take an adversarial rather than co-operative approach to those sharing their philosophy.

09 March 2006

RIAA Fights to the Death for DRM - Your Death

The ever-perceptive Ed Felten has an amazing story about the Recording Industry Association of America (RIAA) and its friends-in-copyright fighting to keep DRM on people's systems in all circumstances - even those that might be life-threatening. From his post:

In order to protect their ability to deploy this dangerous DRM, they want the Copyright Office to withhold from users permission to uninstall DRM software that actually does threaten critical infrastructure and endanger lives.

In fact, it's enough to gaze (not too long, mind) at the RIAA's home page: it is a cacophony of "lawsuits", "penalties", "pirates", "theft" and "parental advisories" - a truly sorry example of narrow-minded negativity. Whatever happened to music as one of the loftiest expressions of the human spirit?

Savonarola, St. Francis - or St. IGNUcius?

There's a well-written commentary on C|Net that makes what looks like a neat historical parallel between Savonarola and Richard Stallman; in particular, it wants us to consider the GPL 3 as some modern-day equivalent of a Bonfire of the Vanities, in which precious objects were consigned to the flames at the behest of the dangerous and deranged Savonarola.

It's a clever comparison, but it suffers from a problem common to all clever comparisons: they are just metaphors, not cast-iron mathematical isomorphisms.

For example, I could just as easily set up a parallel between Stallman and St. Francis of Assisi: both renounced worldly goods, both devoted themselves to the poor, both clashed with the authorities on numerous occasions, and both produced several iterations of their basic tenets. And St. Francis never destroyed, as Savonarola did: rather, he is remembered for restoring ruined churches - just as Stallman has restored the ruined churches of software.

In fact, Stallman is neither Savonarola nor St. Francis, but his own, very special kind of holy man: St. IGNUcius of the Church of Emacs.

The Dream of Open Data

Today's Guardian has a fine piece by Charles Arthur and Michael Cross about making data paid for by the UK public freely accessible to them. But it goes beyond merely detailing the problem, and represents the launch of a campaign called "Free Our Data". It's particularly good news that the unnecessary hoarding of data is being addressed by a high-profile title like the Guardian, since a few people in the UK Government might actually read it.

It is rather ironic that at a time when nobody outside Redmond disputes the power of open source, and when open access is almost at the tipping point, open data remains something of a distant dream. Indeed, it is striking how advanced the genomics community is in this respect. As I discovered when I wrote Digital Code of Life, most scientists in this field have been routinely making their data freely available since 1996, when the Bermuda Principles were drawn up. The first of these stated:

It was agreed that all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.

The same should really be true for all kinds of large-scale data that require governmental-scale gathering operations. Since they cannot feasibly be gathered by private companies, such data ends up as a government monopoly. But trying to exploit that monopoly by crudely over-charging for the data is counter-productive, as the Guardian article quantifies. Let's hope the campaign gathers some momentum - I'll certainly be doing my bit.

Update: There is now a Web site devoted to this campaign, including a blog.

Enter the Splogfighter

Talking of splogs, I came across (via SEO Data) the valiant Splogfighter's Blogger-based anti-splog blog. All power to whatever part of the virtual anatomy he/she/it uses in this laudable effort.

08 March 2006

Splog in a Box?

A long time ago, in a galaxy far away - well, in California, about 1994 - O'Reilly came out with something called "Internet in a Box". This wasn't quite the entire global interconnect of all networks in a handy cardboard container, but rather a kind of starter kit for Web newbies - and bear in mind that in those days, the only person who was not a newbie was Tim (not O'Reilly, the other one).

Two components of O'Reilly's Internet in a Box were particularly innovative. One was Spry Mosaic, a commercial version of the early graphical Web browser Mosaic that arguably began the process of turning the Web into a mass medium. Mosaic had two important offspring: Netscape Navigator, created by some of the original Mosaic team, and its nemesis, Internet Explorer. In fact, if you choose the "About Internet Explorer" option on the Help menu of any version of Microsoft's browser, you will see to this day the surprising words:

Based on NCSA Mosaic. NCSA Mosaic(TM); was developed at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.
Distributed under a licensing agreement with Spyglass, Inc.

So much for Bill Gates inventing the Internet....

The other novel component of "Internet in a Box" was the Global Network Navigator. This was practically the first commercial Web site, and certainly the first portal: it was actually launched before Mosaic 1.0, in August 1993. Unfortunately, this pioneering site was later sold to AOL, where it sank without trace (as most pioneers do when they are sold to AOL: anybody remember the amazing Internet search company WAIS? No, I thought not.)

Given this weight of history, it seems rather fitting that something called Boxxet should be announced at the O’Reilly Emerging Technology Conference, currently running in San Diego. New Scientist has the details:

A new tool offers to create websites on any subject, allowing web surfers to sit back, relax and watch a virtual space automatically fill up with relevant news stories, blog posts, maps and photos.

The website asks its users to come up with any subject they are interested in, such as a TV show, sports team or news topic, and to submit links to their five favourite news articles, blogs or photos on that subject. Working only from this data, the site then automatically creates a webpage on that topic, known as a Boxxet. The name derives from “box set”, which refers to a complete set of CDs or DVDs from the same band or TV show.

As this indicates, Boxxet is a kind of instant blog - just add favourite links and water. It seems the perfect solution for a world where people are so crushed by ennui that most bloggers can't even be bothered posting for more than a few weeks. Luckily, that's what we have technology for: to spare us all those tiresome activities like posting to blogs, walking to the shops or changing television channels by getting up and doing it manually.
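The basic trick - take a handful of seed links and spin them into a self-updating page - is simple enough to sketch, even if Boxxet's actual machinery is doubtless far cleverer. Here's a minimal, purely illustrative Python example (the feed URLs are invented, and it leans on the third-party feedparser library) that pulls a couple of RSS feeds and assembles their latest items into a crude topic page:

    # Toy "blog in a box"-style aggregation - NOT Boxxet's real code.
    # Requires the third-party feedparser library (pip install feedparser).
    import feedparser

    # Hypothetical seed feeds a user might submit for a topic
    SEED_FEEDS = [
        "http://example.com/news/rss",
        "http://example.org/blog/atom.xml",
    ]

    def build_topic_page(title, feeds, max_items=10):
        """Gather recent entries from the seed feeds and emit a crude HTML page."""
        items = []
        for url in feeds:
            parsed = feedparser.parse(url)
            for entry in parsed.entries:
                items.append((entry.get("title", "Untitled"), entry.get("link", "#")))
        html = ["<html><body>", "<h1>%s</h1>" % title, "<ul>"]
        for text, link in items[:max_items]:
            html.append('<li><a href="%s">%s</a></li>' % (link, text))
        html.append("</ul></body></html>")
        return "\n".join(html)

    if __name__ == "__main__":
        print(build_topic_page("My Instant Topic Page", SEED_FEEDS))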

It's certainly a clever idea. But I just can't see myself going for this Blog in a Box approach. Perhaps I over-rate the specialness of my merely human blogging powers; perhaps I just need to wait until the Singularity arrives in a few years' time, and computers are able to produce trans-humanly perfect blogs.

What I can see - alas - are several million spammers rubbing their hands with glee at the thought of a completely automatic way of generating spurious, self-updating blogs. Not so much Blog in a Box as Splog in a Box.

07 March 2006

The Other Grid God: Open Source

As I was browsing through Lxer.com, my eye caught this rather wonderful headline: "Grid god to head up Chicago computing institute". The story explains that Ian Foster, one of the pioneers in the area of grid computing (and the grid god in question), is moving to the Computation Institute (great name - horrible Web site).

Grid computing refers to the seamless linking of physically separate computers across the Internet to form a single huge, virtual computer. It's an idea that I've been following for some time, not least because it's yet another area where free software trounces proprietary solutions.

The most popular toolkit for building grids comes from the Globus Alliance, and this is by far the best place to turn to find out about the subject. For example, there's a particularly good introduction to grid computing's background and the latest developments.

The section dealing with grid architecture notes that there is currently a convergence between grid computing and the whole idea of Web services. This is only logical, since one of the benefits of having a grid is that you can access Web services across it in a completely transparent way to create powerful virtual applications running on massive virtual hardware.
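To give a rough flavour of what that kind of transparency means - and this is just a toy illustration using Python's standard XML-RPC modules, nothing to do with the Globus Toolkit's own protocols - here is a sketch in which one machine exposes a computation as a service and another calls it as if it were a local function (the host name is invented):

    # Toy illustration of calling a remote service as if it were local.
    # This is NOT the Globus Toolkit; it just uses Python's standard XML-RPC modules.

    # --- on the "grid node" (server) ---
    from xmlrpc.server import SimpleXMLRPCServer

    def sum_of_squares(numbers):
        """A stand-in for some expensive computation farmed out to a remote machine."""
        return sum(n * n for n in numbers)

    def serve():
        server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
        server.register_function(sum_of_squares)
        server.serve_forever()

    # --- on the client ---
    from xmlrpc.client import ServerProxy

    def run_remotely():
        node = ServerProxy("http://grid-node.example.org:8000/")  # hypothetical host
        # Looks like an ordinary function call, but executes on the remote machine.
        return node.sum_of_squares(list(range(10)))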

The Globus Alliance site is packed with other resources, including a FAQ, a huge list of research papers on grids and related topics, information about the Globus Toolkit, which lets you create grids, and the software itself.

Open source's leading position in the grid computing world complements a similar success in the related field of supercomputing. As this chart shows, over 50% of the top 500 supercomputers in the world run GNU/Linux; significantly, Microsoft Windows does not even appear on the chart.

This total domination of top-end computing - be it grids or supercomputers - by open source is one of the facts that Microsoft somehow omits to tell us in its "Get The Facts" campaign.

06 March 2006

Blogging Newspapers

One of the interesting questions raised by the ascent of blogs is: What will the newspapers do? Even though traditional printed titles are unlikely to disappear, they are bound to change. This post, from the mysteriously-named "Blue Plate Special" blog (via C|Net's Esoteric blog) may not answer that question, but it does provide some nutritious food for thought.

It offers its views on which of the major US dailies blog best, quantified through a voting system. Although interesting - and rich fodder for those in need of a new displacement activity - the results probably aren't so important as the criteria used for obtaining them. They were as follows:

Ease-of-use and clear navigation
Currency
Quality of writing, thinking and linking
Voice
Comments and reader participation
Range and originality
Explain what blogging is on your blogs page
Show commitment

The blog posting gives more details on each, but what's worth noting is that most of these could be applied to any blog - not just those in newspapers. Having recently put together my own preliminary thoughts on the Art of the Blog, I find that these form a fascinating alternative view, with several areas of commonality. I strongly recommend all bloggers to read the full article - whether or not you care about blogging newspapers.

05 March 2006

Google Googlied by Spaiku Adages

Today was a black day in the annals of my Gmail account: I received my first piece of spam. You might think I should be rejoicing that I've only ever received one piece of spam, but bear in mind that this is a relatively new account, and one that I've not used much. Moreover, Gmail comes with spam filtering as standard: you might hope that Google's vast computing engines would be able consistently to spot spam.

So far they have: the spam bucket of my account lists some 42 spam messages that Google caught. The question is: why did Google get googlied by this one? It's not particularly cunning: it has the usual obfuscated product names (it's one of those), with some random characters and the usual poetic signoff.

Actually, now that I come to check, this turns out to be slightly special:

Work first and then rest.
Actions speak louder than words.
Old head and young hand.

Maybe this is Gmail's Achilles' heel: it is defenceless in the face of spam haiku (spaiku?) adages.
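If Gmail's filter does something like per-word probability scoring - and I have no inside knowledge of how it actually works, so treat this as pure speculation - it's easy to see how a message padded with wholesome proverbs might sneak under the bar. A toy Python sketch, with an invented word list:

    # Toy word-scoring spam filter - pure speculation, nothing to do with Gmail's real system.
    SPAMMY = {"viagra": 0.99, "pills": 0.95, "cheap": 0.8}
    HAMMY_DEFAULT = 0.2  # unknown or innocent words drag the score down

    def spam_score(message):
        """Average the per-word spam probabilities; a real filter would combine them properly."""
        words = message.lower().split()
        scores = [SPAMMY.get(w, HAMMY_DEFAULT) for w in words]
        return sum(scores) / len(scores)

    obfuscated = "ch3ap p1lls here"  # obfuscation hides the spammy tokens...
    padding = "work first and then rest actions speak louder than words"
    print(spam_score(obfuscated + " " + padding))  # ...and the adages dilute what's left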

04 March 2006

The European Digital Library: Dream, but Don't Touch

With all the brouhaha over the Google Book Search Library Project, it is easy to overlook other efforts directed along similar lines. I'm certainly guilty of this sin of omission when it comes to The European Library, about which I knew nothing until very recently.

The European Library is currently most useful for carrying out integrated searches across many European national libraries (I was disappointed to discover that neither Serbia nor Latvia has any of my books in their central libraries). Its holdings seem to be mainly bibliographic, rather than links to the actual text of books (though there are some exceptions).

However, a recent press release from the European Commission seems to indicate that The European Library could well be transmogrified into something altogether grander: The European Digital Library. According to the release:

At least six million books, documents and other cultural works will be made available to anyone with a Web connection through the European Digital Library over the next five years. In order to boost European digitisation efforts, the Commission will co-fund the creation of a Europe-wide network of digitisation centres.

Great, but it adds:

The Commission will also address, in a series of policy documents, the issue of the appropriate framework for intellectual property rights protection in the context of digital libraries.

Even more ominously, the press release concludes:

A High Level Group on the European Digital Library will meet for the first time on 27 March 2006 and will be chaired by Commissioner Reding. It will bring together major stakeholders from industry and cultural institutions. The group will address issues such as public-private collaboration for digitisation and copyrights.

"Stakeholders from industry and cultural institutions": but, as usual, nobody representing the poor mugs who (a) will actually use this stuff and (b) foot the bill. So will our great European Digital Library be open access? I don't think so.

The Amazing Amazon Mechanical Turk

OK, so I may be well behind the times, but I still found this rather amazing when I came across it. Not so much for what it is - a version of Google Answers - but for the fact that Amazon is doing it.

Google I can understand: its Answers service is reaching the parts its other searches cannot - a complement to the main engine (albeit a tacit admission of defeat on Google's part: resorting to wetware, whatever next?). But Amazon? What has a people-generated answer service got to do with selling things? Come on Jeff, focus.

Cool name, though.

Digg This, It's Groovy

Digg.com is a quintessentially Web 2.0 phenomenon: a by-the-people, for-the-people version of Slashdot (itself a key Web 1.0 site). So Digg's evolution is of some interest as an example of part of the Net's future inventing itself.

A case in point is the latest iteration, which adds a souped-up comment system (interestingly, the announcement comes from the official Digg blog, which runs on Blogger rather than being self-hosted). Effectively, this lets you digg the comments.

An example is this story: New Digg Comment System Released!, which is the posting by Kevin Rose (Digg's founder) about the new features. Appropriately enough, this has a massive set of comments (nearly 700 at the time of writing).

The new system's not perfect - for example, there doesn't seem to be any quick way to roll up comments which are initially hidden (because they have been moderated away), but that can easily be fixed. What's most interesting is perhaps the Digg sociology - watching which comments get stomped on vigorously, versus those that get the thumbs up.

Tying the Kangaroo Down

If any proof were needed that some people still don't really get the Internet, this article is surely it. Apparently Australia's copyright collection agency wants schools to pay a "browsing fee" every time a teacher tells students to browse a Web site.

Right.

So, don't tell me: the idea is to ensure that students don't use the Web, and that they grow up less skilled in the key enabling technology of the early twenty-first century, that they learn less, etc. etc. etc.?

Of course, the fact that more and more content is freely available under Creative Commons licences, or is simply in the public domain, doesn't enter into the so-called "minds" of those at the copyright collection agency. Nor does the fact that by making this call they not only demonstrate their extraordinary obtuseness, but also handily underline why copyright collection agencies are actually rather irrelevant these days. Rather than waste schools' time and money on "browsing fees", might Australia not do better to close down said irrelevant, clueless agency, and save some money instead?

03 March 2006

Beyond Parallel Universes

One of the themes of this blog is the commonality between the various opens. In a piece I wrote for the excellent online magazine LWN.net, I've tried to make some of the parallels between open source and open access explicit - to the point where I set up something of a mapping between key individuals and key moments (Peter Suber at Open Access News even drew a little diagram to make this clearer).

My article tries to look at the big picture, largely because I was trying to show those in the open source world why they should care about open access. At the end I talk a little about specific open source software that can be used for open access. Another piece on the Outgoing blog (subtitle: "Library metadata techniques and trends") takes a closer look at a particular kind of such software, that for repositories (where you can stick your open access materials).

This called forth a typically spirited commentary from Stevan Harnad, which contains a link to yet more interesting words from Richard Poynder, a pioneering journalist in the open access field, with a blog - called "Open and Shut" (could there be a theme, here?) - that is always worth taking a look at. For example, he has a fascinating interview on the subject of the role of open access in the humanities.

Poynder rightly points out that there is something of a contradiction in much journalistic writing about open access, in that it is often not accessible itself (even my LWN.net piece was subscribers-only for a week). And so he's bravely decided to conduct a little experiment by providing the first section of a long essay, and then asking anyone who reads it - it is freely accessible - and finds it useful to make a modest donation. I wish him well, though I fear it may not bring him quite the income he is hoping for.

01 March 2006

There's No INSTEDD without Open Access

An interesting story in eWeek.com. Larry Brilliant, newly-appointed head of the Google.org philanthropic foundation, wants to set up a dedicated search engine that will spot incipient disease outbreaks.

The planned name is INSTEDD: International Networked System for Total Early Disease Detection - a reference to the fact that it offers an alternative to just waiting for cataclysmic infections - like pandemics - to happen. According to the article:

Brilliant wants to expand an existing web crawler run by the Canadian government. The Global Public Health Intelligence Network monitors about 20,000 Web sites in seven languages, searching for terms that could warn of an outbreak.

What's interesting about this - apart from the novel idea of spotting outbreaks in the physical world by scanning the information shadow they leave in the digital cyberworld - is that, to work, it depends critically on having free access to as much information, and to as many scientific and medical reports, as possible.
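At its crudest, the mechanics are easy to imagine. Here's a deliberately simplified Python sketch - the URL and watch-list are invented, and GPHIN's real system is obviously far more sophisticated and multilingual - that fetches a page and checks it for outbreak-related terms. Note that it only works at all if the page can be read freely:

    # Crude sketch of keyword-based outbreak monitoring - purely illustrative,
    # not how GPHIN or INSTEDD actually work.
    from urllib.request import urlopen

    # Hypothetical watch list; a real system would be multilingual and far richer
    OUTBREAK_TERMS = ["outbreak", "epidemic", "h5n1", "avian influenza", "haemorrhagic fever"]

    def scan_page(url):
        """Fetch a page and return which watch-list terms appear in it."""
        with urlopen(url) as response:
            text = response.read().decode("utf-8", errors="replace").lower()
        return [term for term in OUTBREAK_TERMS if term in text]

    if __name__ == "__main__":
        hits = scan_page("http://example.org/health-news")  # made-up URL
        if hits:
            print("Possible signal - terms found:", hits)

None of which is any use if the relevant reports sit behind subscription barriers.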

Indeed, this seems a clear case where it could be claimed that not providing open access in relevant areas - and the range of subjects that are relevant is vast - is actually endangering the lives of millions of people. Something for publishers and their lawyers to think about, perhaps.

Higgins: Social Web, Social Commerce

Identity is a slippery thing at the best of times. On the Internet it's even worse (as the New Yorker cartoon famously encapsulated). But identity still matters, and sorting it out is going to be crucial if the Internet is to continue moving into the heart of our lives.

Of course, defining local solutions is easy: that's why you have to remember 33 different passwords for 33 different user accounts (you do change the password for each account, don't you?) at Amazon.com and the rest. The hard part is creating a unitary system.

The obvious way to do this is for somebody to step forward - hello Microsoft Passport - and to offer to handle everything. There are problems with this approach - including the tasty target that the central identity stores represent for ne'er-do-wells (one reason why the UK Government's proposed ID card scheme is utterly idiotic), and the concentration of power it creates (and Microsoft really needs more power, right?).

Ideally, then, you would want a completely modular, decentralised approach, based on open source software. Why open source? Well, if it's closed source, you never really know what it's doing with your identity - in the same way that you never really know what closed software in general is doing with your system (spyware, anyone?).

Enter Higgins, which not only meets those requirements, but is even an Eclipse project to boot. As the goals page explains:

The Higgins Trust Framework intends to address four challenges: the lack of common interfaces to identity/networking systems, the need for interoperability, the need to manage multiple contexts, and the need to respond to regulatory, public or customer pressure to implement solutions based on trusted infrastructure that offers security and privacy.

Perhaps the most interesting of these is the "multiple contexts" one:

The existence of common identity/networking framework also makes possible new kinds of applications. Applications that manage identities, relationships, reputation and trust across multiple contexts. Of particular interest are applications that work on behalf of a user to manage their own profiles, relationships, and reputation across their various personal and professional groups, teams, and other organizational affiliations while preserving their privacy. These applications could provide users with the ability to: discover new groups through shared affinities; find new team members based on reputation and background; sort, filter and visualize their social networks. Applications could be used by organizations to build and manage their networks of networks.

The idea here seems to be a kind of super-identity - a swirling bundle of different cuts of your identity that can operate according to the context. Although this might lead to fragmentation, it would also enable a richer kind of identity to emerge.
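To make the "multiple contexts" idea slightly more concrete, here is a small, entirely hypothetical Python sketch of a context-aware identity - my own illustration of the concept, not anything drawn from the Higgins code itself:

    # Hypothetical sketch of a context-aware identity - an illustration, not Higgins code.

    class Identity:
        """One person, many facets: each context sees only the attributes meant for it."""

        def __init__(self, name):
            self.name = name
            self._facets = {}  # context name -> dict of attributes

        def add_facet(self, context, **attributes):
            self._facets[context] = attributes

        def profile_for(self, context):
            """Return only the slice of identity appropriate to the given context."""
            return dict(self._facets.get(context, {}))

    me = Identity("alice")
    me.add_facet("work", email="alice@example.com", role="journalist")
    me.add_facet("book-club", nickname="A", favourite_author="Cervantes")

    # A professional site sees one cut of the identity, a social one another;
    # neither gets the whole bundle.
    print(me.profile_for("work"))
    print(me.profile_for("book-club"))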

As well as cool ideas, Higgins also has going for it the backing of some big names: according to this press release, those involved include IBM, Novell, the startup Parity Communications (Dyson Alert: Esther's in on this one, too) and the Berkman Center for Internet & Society at Harvard Law School.

The latter is also involved in SocialPhysics.org, whose aim is

to help create a new commons, the "social web". The social web is a layer built on top of the Internet to provide a trusted way to link people, organizations, and concepts. It will provide people more control over their digital identities, the ability to more easily find other people and groups, and more control over how they are seen by others across diverse contexts.

There is also a blog, called Social Commerce, defined as "e-commerce + social networking + user-centric identity". There are lots of links here, as well as on the SocialPhysics site. Clearly there's much going on in this area, and I'm sure I'll be returning to it in the future.