Thursday, June 12, 2014

Museums in the World of Open Data

Yesterday IMLS gave a webinar explaining the methodology that went into compiling the 34,144 records in their recently released Museum Universe Database File (MUDF). 

The big news isn't so much the count, though that is important, as the fact that for the first time there is a publicly available, open source data set of US museums. 

The MUDF data has already been imported into GitHub, a web-based hosting service for software development projects, and FactMiners.org has launched an open source project to bring MUDF into the Neo4j graph database. This is the first step towards tapping the potential of this kind of public museum data. So take a minute to read (and view) a bit about graph databases and what they can do.

A graph database contains nodes and relationships, as well as data about the properties of these elements. In other words, it captures how data points are connected to each other, including through geographical and social relationships. Here is a brief video from the founder of Neo4j talking about how you can use graph databases to detect patterns, embed information in maps, and overlay social data.



Imported into such an environment, MUDF data becomes a step towards creating, for example, a "museum recommendation engine" that can answer queries like "what museums near me might I like to visit?" (OK, Emil uses restaurants as an example, but you can sub in "museum.") These queries can be pretty "intelligent"--going beyond simply finding which museums are nearby and how highly they are ranked, to weighting the credibility of the people who gave that ranking. How are museums rated by your friends, or their friends? Is a given rating from someone who visits a lot of museums? Graph databases excel at supporting this kind of sophisticated query, because they know how things (and places and people) are related to each other--they are all about connections.
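For readers who like to see the idea in code, here is a minimal sketch in plain Python (not Neo4j itself) of how such a query might work. Everything in it is a made-up assumption for illustration: the people, the museums, the distances, the star ratings, and especially the weighting rule, which simply counts a rating more if it comes from a closer friend and from someone who has rated more museums. None of it comes from MUDF.

```python
from collections import defaultdict

# Hypothetical toy graph: nodes with properties (none of this is MUDF data).
nodes = {
    "me": {"label": "Person"},
    "ana": {"label": "Person"},
    "ben": {"label": "Person"},
    "cy": {"label": "Person"},
    "Art Museum": {"label": "Museum", "miles_away": 2.0},
    "Rail Museum": {"label": "Museum", "miles_away": 5.0},
    "Far Museum": {"label": "Museum", "miles_away": 80.0},
}

# Relationships as (start, type, end, properties).
relationships = [
    ("me", "FRIEND", "ana", {}),
    ("ana", "FRIEND", "ben", {}),
    ("ana", "RATED", "Art Museum", {"stars": 5}),
    ("ana", "RATED", "Rail Museum", {"stars": 4}),
    ("ben", "RATED", "Rail Museum", {"stars": 2}),
    ("ben", "RATED", "Far Museum", {"stars": 5}),
    ("cy", "RATED", "Rail Museum", {"stars": 5}),  # a stranger: ignored below
]


def neighbors(start_node, rel_type):
    """Yield (end_node, properties) for relationships of one type leaving start_node."""
    for start, kind, end, props in relationships:
        if kind == rel_type and start == start_node:
            yield end, props


def social_circle(person):
    """Map each friend (1 hop) and friend-of-friend (2 hops) to a hop count."""
    hops = {}
    for friend, _ in neighbors(person, "FRIEND"):
        hops[friend] = 1
    for friend in list(hops):
        for fof, _ in neighbors(friend, "FRIEND"):
            if fof != person and fof not in hops:
                hops[fof] = 2
    return hops


def recommend(person, max_miles=25.0):
    """Rank nearby museums by a weighted average of ratings from the social circle."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for rater, hop in social_circle(person).items():
        ratings = list(neighbors(rater, "RATED"))
        # Assumed weighting: closer friends and more active raters count more.
        weight = (1.0 / hop) * len(ratings)
        for museum, props in ratings:
            if nodes[museum]["miles_away"] <= max_miles:
                totals[museum] += weight * props["stars"]
                weights[museum] += weight
    scores = {museum: totals[museum] / weights[museum] for museum in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for museum, score in recommend("me"):
    print(f"{museum}: {score:.2f}")
```

With this toy data the nearby Art Museum comes out ahead of the Rail Museum, the distant museum is filtered out, and the stranger's five-star rating is ignored because that person is not connected to "me." A real graph database would express the same walk over friends and ratings as a query rather than hand-written loops.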

Of course, all of this presumes that the data in the graph database is actually about museums--a point which occupied a lot of the sidebar commentary during the IMLS webinar.

Which brings us back to the fundamental question: what are we trying to count? Even if we all agree the database shouldn't include examples like an athletic hall of fame in a high school or a friends group supporting museums in Russia (examples Max van Balgooy points to in his analysis of the webinar), we are left struggling with whether to include support groups of US museums, or parent organizations, or associations devoted to historic preservation or folklore or other types of culture. As a field we have been notoriously unable to agree on a definition of "museum" that includes everything we believe ought to be counted and excludes what should not. (Since this is nominally Throwback Thursday here on the Blog, I will refer you to this post from 2009 on the museum identity crisis.)

One way to dodge the language tangle is to focus on utility--what do we want to use the database for? Or perhaps I should say, who do we expect to use the database, and what do they want to use it for? I presume that for its own purposes, IMLS wants to know the size, shape and impact of the field it supports. Here at the Alliance we want to set appropriate goals for advocacy and service. But I bet a lot of the folks accessing MUDF's data want to use it to create applications that serve people who want to actually visit a museum. Even after we all pitch in to refine the data by weeding out duplicate listings, museums that have closed, and things that clearly are not, in any way, shape, or form, museums, we will still have to figure out what to add to MUDF (like "are there exhibits open to the public at this address?") in order to support things like a museum recommendation engine.
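As a rough illustration of the "weeding out duplicates" step, here is a small Python sketch that treats two records as the same listing when their name, street address, and five-digit ZIP code match after light cleanup. The field names (`name`, `street`, `zip`) and the sample records are hypothetical and are not taken from the actual MUDF layout.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def dedupe(records):
    """Keep the first record seen for each normalized (name, street, zip) key."""
    seen = set()
    unique = []
    for record in records:
        key = (
            normalize(record["name"]),
            normalize(record["street"]),
            record["zip"][:5],  # compare on the five-digit ZIP only
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample records, invented for this example.
records = [
    {"name": "Springfield Art Museum", "street": "12 Main St.", "zip": "12345"},
    {"name": "SPRINGFIELD ART MUSEUM", "street": "12 Main St", "zip": "12345-6789"},
    {"name": "Friends of the Springfield Art Museum", "street": "12 Main St", "zip": "12345"},
]

for record in dedupe(records):
    print(record["name"])
```

Note what the sketch does not do: the second record is dropped as a duplicate of the first, but the "Friends of" group at the same address survives. Deciding whether that entry belongs in the database is exactly the definitional question that simple matching cannot settle.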

But let's not lose sight of the big picture: now that the data is out there for anyone to play with, some pretty cool things are going to happen, with or without the direct involvement of the museum field. Hackers (in the constructive sense of the word) are going to start playing with data about us, and in the process they are going to amplify and add to the value of what we offer. 
