Showing posts with label sui generis. Show all posts

06 October 2009

Postcodes: Royal Fail

Here's a perfect example of why intellectual commons should not be enclosed.

The UK Postcode data set is obviously crucial information for businesses and ordinary citizens - something that is clearly vital to the smooth running of everyday life. But more than that, it is geographic information that allows all kinds of innovative services to be provided by people with clever ideas and some skill.

That's exactly what happened when the Postcode database was leaked on to the Internet recently. People used that information to do all sorts of things that hadn't been done before, presumably because the company that claims to own this information, Royal Mail, was charging an exorbitant amount for access to it.

And then guess what happened? Yup, the nasties started arriving:

On Friday the 2nd October we received correspondence from the Royal Mail demanding that we close this site down (see below). One of the directors of Ernest Marples Postcodes Ltd has also been threatened personally.

We are not in a position to mount an effective legal challenge against the Royal Mail’s demands and therefore have closed the ErnestMarples.com API effective immediately.

We understand that this will cause harm and considerable inconvenience to the many people who are using or intend to use the API to power socially useful tools, such as HealthWhere, JobcentreProPlus.com and PlanningAlerts.com. For this, we apologise unreservedly.

Specifically, intellectual monopolies of a particularly stupid kind are involved:

Our client is the proprietor of extensive intellectual property rights in the Database, including copyright in both the Database and the software, and database rights.

Here's what Wikipedia has to say about these "database rights":

The Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases is a European Union directive in the field of copyright law, made under the internal market provisions of the Treaty of Rome. It harmonizes the treatment of databases under copyright law, and creates a new sui generis right for the creators of databases which do not qualify for copyright.

Before 1996, these sui generis "database rights" did not exist; they were created in the EU because lobbyists put forward the argument that they would offer an incentive to create more databases than the Americans, whose database publishers strangely didn't seem to need this new "right" to thrive, and so make the mighty EU even mightier - at least as far as those jolly exciting databases were concerned.

Rather wisely, the EU later decided to do some research in this area, comparing database creation before and after the new sui generis right was brought in, to see just how great that incentive proved to be - a unique opportunity to test the theory that underpins intellectual monopolies. Here are the results of that research:

Introduced to stimulate the production of databases in Europe, the “sui generis” protection has had no proven impact on the production of databases.

According to the Gale Directory of Databases, the number of EU-based database “entries” was 3095 in 2004 as compared to 3092 in 1998 when the first Member States had implemented the “sui generis” protection into national laws.

It is noteworthy that the number of database “entries” dropped just as most of the EU-15 had implemented the Directive into national laws in 2001. In 2001, there were 4085 EU-based “entries” while in 2004 there were only 3095.

While the evidence taken from the GDD relies on the number of database “entries” and not on the overall turnover achieved or the information supplied by means of databases, they remain the only empirical data available.

So, the official EU study finds that the sui generis protection has had no proven impact on the production of databases; in fact, the number of databases went *down* after it was introduced.
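To make the arithmetic behind those figures explicit, here is a minimal sketch that uses only the three entry counts quoted above. The variable and function names are my own, and the choice of which years to compare is an assumption; the report itself only gives the raw counts.

```python
# EU-based database "entries" as quoted above from the Gale Directory of Databases.
entries = {1998: 3092, 2001: 4085, 2004: 3095}


def pct_change(start: int, end: int) -> float:
    """Percentage change going from `start` to `end`."""
    return (end - start) / start * 100.0


# Net change over the whole period covered by the quoted figures.
net_1998_2004 = pct_change(entries[1998], entries[2004])

# Change after most of the EU-15 had implemented the Directive.
drop_2001_2004 = pct_change(entries[2001], entries[2004])

print(f"1998 -> 2004: {net_1998_2004:+.1f}%")
print(f"2001 -> 2004: {drop_2001_2004:+.1f}%")
```

On these numbers the count is essentially flat between 1998 and 2004 (three extra entries) and falls by roughly a quarter between 2001 and 2004, which is the drop the report calls noteworthy.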

Thus these "database rights" have been shown to do nothing to stimulate the production of databases - negating the whole claimed point of their introduction. Moreover, the Royal Mail's bullying of a couple of people who are trying to offer useful services that would not otherwise exist shows the danger of entrusting such a critical data commons to commercial entities who then enclose it by claiming "database rights" in it: they will always be tempted to maximise their own profit, rather than the value to society as a whole.

Giving the Royal Mail a monopoly on this critical dataset - one that for all practical purposes can never be created again - is like giving a genetics company a monopoly on the human genome. That was attempted (hello, Celera) but, fortunately for us, thwarted, thanks largely to free software. Today, the human genome is an intellectual commons (well, most of it), and the Postcode data should be, too.

Follow me @glynmoody on Twitter or identi.ca.

18 March 2008

ODC Public Domain Dedication and Licence

One of the themes of this blog is how the ideas behind open source are seeping into many other domains. One of the latest is that of databases. The question of how you make a database open is prickly, not least because in Europe there is a stupid law that grants a “sui generis” database right, whatever that means. This was intended to stimulate investment in databases; but guess what? It has done precisely the opposite, and actually led to *less* investment relative to the US, where there is no such right. The withering power of intellectual monopolies strikes again.

Anyway, in order to deal with databases, a new kind of licence is required that takes into account these kinds of problems, and the Open Data Commons has put one together that has just been released as version 1.0:

The Open Data Commons – Public Domain Dedication & Licence is a document intended to allow you to freely share, modify, and use this work for any purpose and without any restrictions. This licence is intended for use on databases or their contents (“data”), either together or individually.

Many databases are covered by copyright. Some jurisdictions, mainly in Europe, have specific special rights that cover databases called the “sui generis” database right. Both of these sets of rights, as well as other legal rights used to protect databases and data, can create uncertainty or practical difficulty for those wishing to share databases and their underlying data but retain a limited amount of rights under a “some rights reserved” approach to licensing as outlined in the Science Commons Protocol for Implementing Open Access Data. As a result, this waiver and licence tries to the fullest extent possible to eliminate or fully license any rights that cover this database and data. Any Community Norms or similar statements of use of the database or data do not form a part of this document, and do not act as a contract for access or other terms of use for the database or data.

Good stuff. (Via Andrew Katz.)

17 December 2007

Open Access Data - A Question of Protocol

Something calling itself a “Protocol for Implementing Open Access Data” sounds about as exciting as a list of ingredients for paint. But this memo from the Science Commons is one of the most important documents in this field to date. Its scope is explained in the opening paragraph:

This memo provides information for the Internet community interested in distributing data or databases under an “open access” structure. There are several definitions of “open” and “open access” on the Internet, including the Open Knowledge Definition and the Budapest Declaration on Open Access; the protocol laid out herein is intended to conform to the Open Knowledge Definition and extend the ideas of the Budapest Declaration to data and databases.

Again, that may not sound very exciting, but trying to come up with definitions of “open data” or “open access data” has proved extraordinarily hard, and in the course of the memo we learn why:
3. Principles of open access data
Legal tools for an open access data sharing protocol must be developed with three key principles in mind:
3.1 The protocol must promote legal predictability and certainty.
3.2 The protocol must be easy to use and understand.
3.3 The protocol must impose the lowest possible transaction costs on users.


These principles are motivated by Science Commons’ experience in distributing a database licensing Frequently Asked Questions (FAQ) file. Scientists are uncomfortable applying the FAQ because they find it hard to apply the distinction between what is copyrightable and what is not copyrightable, among other elements. A lack of simplicity restricts usage and as such restricts the open access flow of data. Thus any usage system must both be legally accurate while simultaneously very simple for scientists, reducing or eliminating the need to make the distinction between copyrightable and non-copyrightable elements.

The terms also need to satisfy the norms and expectations of the disciplines providing the database. This makes a single license approach difficult – archaeology data norms for citation will differ from those in physics, and yet again from those in biology, and yet again from those in the cultural or educational spaces. But those norms must be attached in a form that imposes the lowest possible costs on users (now and in the future).

The solution is at once obvious and radical:

4. Implementing the Science Commons Database Protocol for open access data
4.1 Converge on the public domain by waiving all rights based on intellectual property

The conflict between simplicity and legal certainty can be best resolved by a twofold measure: 1) a reconstruction of the public domain and 2) the use of scientific norms to express the wishes of the data provider.

Reconstructing the public domain can be achieved through the use of a legal tool (waiving the relevant rights on data and asserting that the provider makes no claims on the data).

Requesting behavior, such as citation, through norms and terms of use rather than as a legal requirement based on copyright or contracts, allows for different scientific disciplines to develop different norms for citation. This allows for legal certainty without constraining one community to the norms of another.

Thus, to facilitate data integration and open access data sharing, any implementation of this protocol MUST waive all rights necessary for data extraction and re-use (including copyright, sui generis database rights, claims of unfair competition, implied contracts, and other legal rights), and MUST NOT apply any obligations on the user of the data or database such as “copyleft” or “share alike”, or even the legal requirement to provide attribution. Any implementation SHOULD define a non-legally binding set of citation norms in clear, lay-readable language.
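The MUST / MUST NOT / SHOULD wording above reads almost like a checklist, so here is a rough sketch of how one might encode it. Everything in it is hypothetical: the `LicenceTerms` class, the field names and the string labels are my own shorthand, not anything defined by Science Commons, and the list of rights is only the ones named explicitly in the quoted text (which also says "and other legal rights").

```python
from dataclasses import dataclass, field

# Rights the quoted protocol text names as ones that MUST be waived.
# Not exhaustive: the text also covers "other legal rights".
REQUIRED_WAIVERS = {
    "copyright",
    "sui_generis_database_right",
    "unfair_competition",
    "implied_contract",
}

# Legal obligations the quoted text says MUST NOT be placed on users.
FORBIDDEN_OBLIGATIONS = {"copyleft", "share_alike", "attribution"}


@dataclass
class LicenceTerms:
    """Hypothetical summary of a data licence (illustrative fields only)."""
    waived_rights: set = field(default_factory=set)
    legal_obligations: set = field(default_factory=set)
    has_citation_norms: bool = False


def check_conformance(terms: LicenceTerms):
    """Return (conforms, problems, warnings) for the three rules quoted above."""
    problems = []
    for right in sorted(REQUIRED_WAIVERS - terms.waived_rights):
        problems.append(f"MUST waive: {right}")
    for obligation in sorted(FORBIDDEN_OBLIGATIONS & terms.legal_obligations):
        problems.append(f"MUST NOT impose: {obligation}")
    warnings = []
    if not terms.has_citation_norms:
        # SHOULD is a recommendation, so it produces a warning, not a failure.
        warnings.append("SHOULD define non-binding citation norms")
    return (not problems, problems, warnings)


# Two made-up licences: a full waiver with citation norms, and a share-alike style one.
waiver_style = LicenceTerms(waived_rights=set(REQUIRED_WAIVERS), has_citation_norms=True)
share_alike_style = LicenceTerms(
    waived_rights={"copyright"},
    legal_obligations={"share_alike", "attribution"},
)

for name, terms in [("waiver", waiver_style), ("share-alike", share_alike_style)]:
    conforms, problems, warnings = check_conformance(terms)
    print(name, conforms, problems, warnings)
```

The point of the sketch is simply that, under this reading, a licence fails as soon as it keeps any listed right or imposes any legal obligation, while missing citation norms only earns a warning.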

The solution is obvious because the public domain is the zero state of copyright (in fact, the new Creative Commons public domain licence is called simply CCZero). It is radical because previous attempts have tried to build on the evident success of the GNU GPL by taking a kind of copyleft approach: using copyright to limit copyright. But the new protocol explicitly negates the use of both the GPL's copyleft and the Creative Commons Sharealike licences because, minimal as they are, they are still too restrictive – even though they are both predicated on maximising sharing.

One knock-on consequence of this is that attribution requirements are out. This is not just a matter of belief or principle, but of practicality:

In a world of database integration and federation, attribution can easily cascade into a burden for scientists if a category error is made. Would a scientist need to attribute 40,000 data depositors in the event of a query across 40,000 data sets? How does this relate to the evolved norms of citation within a discipline, and does the attribution requirement indeed conflict with accepted norms in some disciplines? Indeed, failing to give attribution to all 40,000 sources could be the basis for a copyright infringement suit at worst, and at best, imposes a significant transaction cost on the scientist using the data.
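The 40,000-data-set scenario is easy to sketch. The following is a toy model under loud assumptions: every data set has exactly one depositor, all depositors are distinct (the worst case the quote describes), and the `Dataset` class and `legal_attributions` function are hypothetical names of mine rather than part of any real federation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record for one deposited data set."""
    name: str
    depositor: str
    requires_attribution: bool  # True if attribution is a legal licence condition


def legal_attributions(datasets):
    """Distinct depositors that a query across `datasets` is legally obliged to credit."""
    return {d.depositor for d in datasets if d.requires_attribution}


N = 40_000  # the figure used in the quoted passage

# Same data sets twice: once under attribution licences, once with rights waived.
attribution_world = [Dataset(f"set-{i}", f"depositor-{i}", True) for i in range(N)]
waiver_world = [Dataset(f"set-{i}", f"depositor-{i}", False) for i in range(N)]

owed_with_attribution = len(legal_attributions(attribution_world))
owed_with_waiver = len(legal_attributions(waiver_world))

print(owed_with_attribution, owed_with_waiver)  # prints: 40000 0
```

In the waiver case a scientist is still free to cite sources according to the norms of the discipline; the difference is that nothing in the model makes a missed credit a legal problem.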

It is this pragmatism, rooted in how science actually works, that makes the current protocol particularly important: it might actually be useful. It's also significant that it plugs in to previously existing work in related fields. For example, as the accompanying blog post explains:

We are also pleased to announce that the Open Knowledge Foundation has certified the Protocol as conforming to the Open Knowledge Definition. We think it’s important to avoid legal fragmentation at the early stages, and that one way to avoid that fragmentation is to work with the existing thought leaders like the OKF.

Moreover, the protocol has already been applied in drawing up another important text, the Open Data Commons Public Domain Dedication & Licence:

The Open Data Commons Public Domain Dedication & Licence is a document intended to allow you to freely share, modify, and use this work for any purpose and without any restrictions. This licence is intended for use on databases or their contents (“data”), either together or individually.

Many databases are covered by copyright. Some jurisdictions, mainly in Europe, have specific special rights that cover databases called the “sui generis” database right. Both of these sets of rights, as well as other legal rights used to protect databases and data, can create uncertainty or practical difficulty for those wishing to share databases and their underlying data but retain a limited amount of rights under a “some rights reserved” approach to licensing. As a result, this waiver and licence tries to the fullest extent possible to eliminate or fully license any rights that cover this database and data.

Again, however dry and legalistic this stuff may seem, it's not: we're talking about the rigorous foundations of new kinds of sharing - and we all know how important and powerful that can be.

Update: John Wilbanks has pointed me to his post about the winnowing process that led to this protocol - fascinating stuff.