What is the Semantic Web?


The idea behind the semantic web is that although online data is available for searching, its meaning is not: computers are very good at returning keywords, but very bad at understanding the context in which keywords are used. A typical search on the term “turkey,” for instance, might return traditional recipes, information about the bird, and information about the country; the search engine can only pick out keywords, and cannot distinguish among different uses of the word. Similarly, although the information required to answer a question like “How many current world leaders are under the age of 60?” is readily available to a search engine, it is scattered among many different pages and sources. Semantic-aware applications infer the meaning, or semantics, of information on the Internet to make connections and provide answers that would otherwise entail a great deal of time and effort.

New applications use the context of information as well as the content to make determinations about relationships between bits of data; examples like TripIt, SemaPlorer, and Xobni organize information about travel plans, places, or email contacts and display it in convenient formats based on semantic connections. Semantic searching is also being applied to scientific inquiry, allowing researchers to find relevant information without having to deal with apparently similar, but irrelevant, information. For instance, Noesis, a new semantic web search engine developed at the University of Alabama in Huntsville, is designed to filter out search hits that are off-topic. The search engine uses a discipline-specific semantic ontology to match search terms with relevant results, ensuring that a search on “tropical cyclones” will not turn up information on sports teams or roller coasters.
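The “turkey” example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the three documents, the `example.org` concept URIs, and the function names are all hypothetical, and real semantic search engines infer or look up concepts rather than relying on hand-tagged records. It contrasts matching on the keyword with matching on an identifier that states which sense of the word is meant.

```python
# Hypothetical mini-corpus. Each document carries free text plus a concept URI
# (an invented example.org identifier) stating which sense of "turkey" it is about.
DOCS = [
    {
        "title": "Roast turkey with stuffing",
        "text": "A traditional turkey recipe.",
        "concept": "http://example.org/concept/turkey-dish",
    },
    {
        "title": "Wild turkey habitat",
        "text": "The turkey is a large bird native to North America.",
        "concept": "http://example.org/concept/turkey-bird",
    },
    {
        "title": "Travel guide to Istanbul",
        "text": "Istanbul is the largest city in Turkey.",
        "concept": "http://example.org/concept/turkey-country",
    },
]


def keyword_search(term):
    """Return titles of documents whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [doc["title"] for doc in DOCS if term in doc["text"].lower()]


def concept_search(concept_uri):
    """Return titles of documents tagged with exactly this concept URI."""
    return [doc["title"] for doc in DOCS if doc["concept"] == concept_uri]


if __name__ == "__main__":
    # The keyword match cannot tell the senses apart, so all three titles come back.
    print(keyword_search("turkey"))
    # The concept match returns only the document about the bird.
    print(concept_search("http://example.org/concept/turkey-bird"))
```

The keyword search returns every document because the string matches in all of them; the concept search narrows the result because the identifier, not the word, carries the meaning.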


(1) How might this technology be relevant to the museums you know best?

  • The Semantic Web (and the related concept of Linked Data) helps to break down the need for local storage of information about universal concepts. If, for instance, the entirety of the Art and Architecture Thesaurus were referenceable via PURLs instead of being downloaded into local collections management systems, the AAT could become a query-able hub for all sorts of museum data (e.g., "show me all objects from all museums' collections that feature lead glazes"). - Koven Apr 30, 2010
  • Europeana (http://www.europeana.eu) is an excellent case study of semantic web architecture for cultural institutions such as museums. This digital library not only brings together content from 27 countries, it does so in 23 languages. This means that the existing dictionaries, terminologies, thesauri, classifications, taxonomies, etc. used by museums have to be integrated into a cross-domain platform. Achieving cross-domain interoperability was Europeana’s first challenge, and it was met by adopting a low-barrier data model. This is currently ESE, the Europeana Semantic Elements specification v3.2.1 (http://group.europeana.eu/c/document_library/get_file?uuid=c56f82a4-8191-42fa-9379-4d5ff8c4ff75&groupId=10602), a data model based on Dublin Core with additional Europeana qualifiers that enable portal-specific functionality. Normalisation of the data is also a necessary step towards homogeneous access to the content. However, this model is not adequate for event-based who/when/what/why museum scenarios, and new approaches are currently under development. - susan.hazan May 1, 2010
  • Susan's example is excellent and it also points to a second important dimension of SW technology: its suitability for solving internal aspects of a problem, not only for traversing the internet as a whole. So in a lot of hand-waving discussion of SW, the focus is on using SPARQL or similar to hook up data from two or more sources, but Europeana's use of SW tech is as much about how it organises, analyses, aligns, translates, and enriches data that has already been ingested, and then both indexes it ready for conventional (Solr-powered) search and offers triple-store graph-grappling semantic search too (in the lab). I think that the internal use of the tech is the key to how it's growing and will gain an ever stronger foothold, because it's easier to do with datasets you control (at least by virtue of ingesting them first). As more people do things like what Koven's lot have done and what Europeana is doing, understanding of the issues grows and the base of suitable data in the wild grows, so that in due course it becomes rewarding to start the next step: queries across silos. I think we're getting closer to that critical mass, although I suspect that ingesting and indexing data using SW tech will remain a very important part of the equation.
    So I think there is an important distinction to make between use of the concepts and technology (the creation and spidering of a graph of linked concepts and entities), which can be an entirely self-contained thing using only data from within an organisation, and the alignment of those concepts with known vocabularies and ontologies ready for putting that dataset out into the world. The former is very useful for the organisation because it lets it do things that aren't possible with relational databases, and it's to be encouraged. However, it is but a step towards the latter, which has a still greater payoff in terms of the data being positioned in a global context.
    I agree with Koven, though, that the most immediately useful and simple way of using SW for museums (and anyone, really) is as Linked Data. This doesn't require any fancy triple stores or whatever; it simply requires us to publish our metadata in a different way so that it (a) indicates "home URLs" for the concepts we use to describe our stuff, and (b) gives our own stuff PURLs so it can be reliably referred to. There are other subtleties, but this stuff is the basics, has low cost and is immediately beneficial. One for the challenges page, perhaps, is getting some of those thesauri that are behind paywalls out into the world, because they're no use behind there! - jeremy.ottevanger May 1, 2010
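The two basics described above (shared "home URLs" for concepts, and persistent URLs for each museum's own objects) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any museum or the AAT actually publishes data: every URI below is a hypothetical `example.org` placeholder, the `FEATURES` predicate is invented for the example, and plain Python tuples stand in for published triples. It shows why a shared concept identifier makes Koven's "all objects from all museums' collections that feature lead glazes" query possible without either museum storing the thesaurus locally.

```python
# Hypothetical shared identifiers. In practice these would be "home URLs"
# published by the thesaurus owner; these example.org URIs are placeholders.
LEAD_GLAZE = "http://example.org/vocab/lead-glaze"
TIN_GLAZE = "http://example.org/vocab/tin-glaze"
FEATURES = "http://example.org/terms/features"  # invented predicate

# Each museum publishes (subject, predicate, object) statements about its own
# objects, using persistent URLs for the objects and shared URIs for concepts.
museum_a = [
    ("http://museum-a.example.org/object/101", FEATURES, LEAD_GLAZE),
    ("http://museum-a.example.org/object/102", FEATURES, TIN_GLAZE),
]
museum_b = [
    ("http://museum-b.example.org/object/7", FEATURES, LEAD_GLAZE),
]


def objects_featuring(concept_uri, *datasets):
    """Return the sorted object URLs, across all datasets, linked to the concept."""
    return sorted(
        subject
        for dataset in datasets
        for (subject, predicate, obj) in dataset
        if predicate == FEATURES and obj == concept_uri
    )


if __name__ == "__main__":
    # Objects from both museums are found because both point at the same concept URI.
    for url in objects_featuring(LEAD_GLAZE, museum_a, museum_b):
        print(url)
```

The query works only because both datasets refer to the concept by the same identifier; if each museum used its own local term for "lead glaze", the join would need a mapping step first.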

(2) What themes are missing from the above description that you think are important?

  • The most important theme missing here would be that of Linked Data. In the end, a linked data approach (in which all museum objects/concepts have permanent identifiers that don't change even when objects are deaccessioned/sold/etc.) will probably have the biggest impact on museum practice in the near term. - Koven Apr 30, 2010
  • I am not sure if this is the correct place for such a discussion, but the standards and schemas currently being discussed for Europeana are as follows:
    • CDWA Lite: an XML schema to describe core records for works of art and material culture, based on the Categories for the Description of Works of Art (CDWA) and Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO) (http://www.getty.edu/research/conducting_research/standards/cdwa/cdwalite.html).
    • museumdat: a harvesting format optimized for retrieval and publication, meant to deliver core data automatically to museum portals. It builds largely upon CDWA Lite, the data format developed in the US by the Getty, the Visual Resources Association and others with a specific focus on the arts. museumdat now applies to all kinds of object classes, e.g. cultural, technology or natural history, and is compatible with CIDOC-CRM (ISO 21127), the reference model of the international documentation committee. museumdat is an outcome of the work of the Fachgruppe Dokumentation des Deutschen Museumsbundes (DMB) (http://www.museumdat.org).
    • SPECTRUM (http://www.collectionstrust.org.uk/spectrum).
    • LIDO, Lightweight Information Describing Objects (http://museums.wikia.com/wiki/Berliner_Herbsttreffen_zur_Museumsdokumentation_2009): a new standard being developed within the ATHENA project, in close cooperation with a trans-Atlantic Work Group aiming at integrating CDWA Lite and museumdat into one standard, with the CIDOC Work Group “Data exchange and data harvesting” newly established in 2009, and with the UK's SPECTRUM. LIDO is now in use by ATHENA, to be launched in the spring of 2011.
    LIDO is poised to become one of the central solutions recommended by the ATHENA Work Groups. Across the hundreds of dispersed museums in Europe, of many different kinds, each institution has its own rules about which data is recorded about its objects and how. In order to share data with other institutions, the one thing we agree on is the need for a common standard. In order to express, deliver, exchange, and harvest information in machine-readable form, and to be able to upload data automatically to Europeana, the decision was taken to use the LIDO format. The LIDO standard is able to express a wide variety of information that identifies unique objects and explains why a particular object is held in a specific museum. - susan.hazan May 1, 2010

(3) What do you see as the potential impact of this technology on education and interpretation in museums?

  • Linked Data and the SW both have huge implications for institutional data-sharing and end-user mashups. - Koven Apr 30, 2010
  • I completely agree with my colleague here. - susan.hazan May 1, 2010
  • Great summary, Koven and Susan. - scott.sayre May 2, 2010

(4) Do you have or know of a project working in this area?

  • The Mellon-funded ResearchSpace and ConservationSpace projects are taking tentative steps in this direction, but nothing concrete has emerged just yet. At the Met, we are using Semantic MediaWiki for the storage, querying, and presentation of complex free-text data. - Koven Apr 30, 2010