Sunday, 22 May 2011

unique ideas about unique ids

I recently came across opencorporates, a madly ambitious project from the prolific Chris Taggart.

The website started in Dec 2010 as an open database of all companies in the UK, based on scraping the Companies House website. It has since gone global: by recruiting other scrapers, opencorporates now contains details of companies in over 20 jurisdictions.
It looks like it could soon become the world's central database of corporations.

The power of a database like this comes when it is connected to other databases. Take, for example, the list of all organisations funded by DfID. (Most will be registered as a corporation, even if they are also registered as a charity or NGO.)

If the databases were connected, we could receive an update every time information about any of the organisations funded by DfID changed: for example, every time a director left, an annual report was submitted or the company changed its registered address. In time, if other datasets were also connected, we could conceivably get updates every time any of these organisations was taken to court, or every time a press release was written about them.

This brave new world of linked data is coming fast. For the International Aid Transparency Initiative (IATI), it is crucial that donors use unique identifiers (based on the registered company number) when referring to other organisations, so that their data can be connected to other datasets.
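To make the point concrete, here is a minimal Python sketch of why a shared identifier matters. All organisation names, company numbers and addresses below are invented for illustration, and the two small lists stand in (hypothetically) for a donor's funding data and a company register such as opencorporates. Joining on the company number links every registered recipient, while naive name matching fails as soon as "Ltd" is spelled "LIMITED" or the capitalisation differs.

```python
# Hypothetical data: every name, company number and address here is invented.
funded = [
    {"name": "Example Aid Ltd", "company_number": "01234567"},
    {"name": "Sample Relief Trust", "company_number": "07654321"},
    {"name": "Unregistered Project Group", "company_number": None},
]

# Hypothetical company register, keyed by company number.
register = {
    "01234567": {"name": "EXAMPLE AID LIMITED", "registered_address": "1 High Street, London"},
    "07654321": {"name": "SAMPLE RELIEF TRUST", "registered_address": "2 Low Road, Leeds"},
}


def link_by_number(funded, register):
    """Join funded organisations to register entries using the company number as the key."""
    linked, unmatched = [], []
    for org in funded:
        number = org["company_number"]
        entry = register.get(number) if number else None
        if entry is None:
            # No identifier (or an unknown one): this organisation cannot be linked.
            unmatched.append(org["name"])
        else:
            linked.append((org["name"], entry["name"], entry["registered_address"]))
    return linked, unmatched


def link_by_name(funded, register):
    """Naive alternative: exact name matching, which misses simple spelling variants."""
    names_in_register = {entry["name"] for entry in register.values()}
    return [org["name"] for org in funded if org["name"] in names_in_register]


linked, unmatched = link_by_number(funded, register)
print(len(linked), unmatched)           # 2 ['Unregistered Project Group']
print(link_by_name(funded, register))   # []
```

The one organisation without a company number is exactly the case that cannot be connected to anything, which is why collecting the identifier in the first place matters.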

Although DfID should be applauded for being the first donor to publish data in the IATI format, not including unique identifiers for recipient organisations is a big omission.

If DfID know the company numbers for organisations they fund, then they should include them in their IATI data release; if they don't know them then they should start collecting them.