Newsletter Issue 3
Knowledge Extraction and Semantic Interoperability
Martin Doerr provides us with an overview of a comprehensive report on Semantic Interoperability.
The DELOS WP5 cluster on Knowledge Extraction and Semantic Interoperability is finishing a comprehensive report on Semantic Interoperability in Digital Library Systems [1].
The Internet and more particularly the Web have been instrumental in making widely accessible a vast range of digital resources. However, the current state of affairs is such that the task of pulling together relevant information involves searching for individual bits and pieces of information gleaned from a range of sources and services and manually assembling them into a whole. This task becomes increasingly intractable with the rapid rate at which resources are becoming available online.
Interoperability is therefore a major issue that affects all types of digital information systems, but has gained prominence with the widespread adoption of the Web. As far as digital libraries are concerned, interoperability is becoming a paramount issue as the Internet unites digital library systems of differing types, run by separate organisations which are geographically distributed all over the world. Federated digital library systems, in the form of co-operating autonomous systems, are emerging in a bid to make distributed collections of heterogeneous resources appear as a single, virtually integrated collection.
The report defines interoperability very broadly as enabling any form of inter-system communication, or the ability of a system to make use of data from a previously unforeseen source. Interoperability in general is concerned with the capability of different information systems to communicate. This communication may take various forms such as the transfer, exchange, transformation, mediation, migration or integration of information.
Semantic interoperability ("SI") is characterised by the capability of different information systems to communicate information consistent with the intended meaning of the encoded information (as intended by the creators or maintainers of the information system). It involves:
The issue is addressed from the following perspectives:
SI issues are analyzed from a practical point of view for the following extended list of information life cycle elements that reveals the extraordinary relevance of SI in all aspects of Digital Libraries:
From a theoretical point of view, the report distinguishes SI at three levels of abstraction:
The report then shows how these levels relate to different problems, methods and systems for achieving SI. It argues that interoperability is always achieved by a reasonable combination of adhering to common standards and providing methods for dynamic interpretation of non-standardized contents. The levels of abstraction differ greatly in the scale of concepts or data to be integrated. Consequently, standards are more easily promoted for the upper level, whereas the lower levels have to be addressed more by automated, dynamic methods of integration. The report also tries to bridge some gaps between the differing terminologies that the library and computer science communities are developing for the same concepts.
The analysis of enabling factors and technologies to enhance SI begins with the role of foundational and core ontologies. They are perceived not only as a means to improve the content and consistency of terminological systems, but particularly as a means to assist mediation between different data structures, and in the transition zone between data structures and upper-level terminology.
Knowledge Organisation Systems (KOS) and their use in networks (NKOS) play a central role. They deserve a classification of their own because they vary so widely in size and in the sophistication of intellectual analysis. KOS represent the shared agreement on concepts (categorical data) and on important factual data, such as place names and notable people. Particular methods to enhance semantic interoperability include KOS transformation, correlation and mapping; questions of availability and rights are also addressed.
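As a rough illustration of KOS mapping, the following minimal sketch translates a term from one vocabulary into another through a table of mappings. The two toy vocabularies, the `Mapping` type, the relation names ("exact", "broader") and the `translate` helper are all assumptions made for this example; they are not taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str    # concept identifier in the source KOS
    target: str    # concept identifier in the target KOS
    relation: str  # illustrative relation type: "exact" or "broader"


# Hypothetical mappings between two toy vocabularies "a" and "b".
MAPPINGS = [
    Mapping("a:Painting", "b:Paintings", "exact"),
    Mapping("a:Fresco", "b:Paintings", "broader"),
    Mapping("a:Vessel", "b:Ceramics", "broader"),
]


def translate(term, mappings, allowed=("exact",)):
    """Return the target concepts mapped from `term`,
    keeping only mappings whose relation type is in `allowed`."""
    return [
        m.target
        for m in mappings
        if m.source == term and m.relation in allowed
    ]


if __name__ == "__main__":
    # Strict translation: only exact equivalences are accepted.
    print(translate("a:Fresco", MAPPINGS))
    # Relaxed translation: a broader target concept is also accepted.
    print(translate("a:Fresco", MAPPINGS, allowed=("exact", "broader")))
```

Restricting the allowed relation types is one simple way for a caller to decide how much imprecision to accept when moving between vocabularies.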
An analysis of the role of architecture and infrastructures examines how communication protocols and central services can support global communication about standards and shared concepts, from metadata registries at level one to gazetteer services at level three.
When implementation strategies for integrated services are discussed, standardization, mediation and data warehousing are frequently debated, with each one championed as the best solution. The report instead sees these techniques as alternatives with different application characteristics. A chapter is therefore devoted to the pros and cons of each approach, so that designers have better decision criteria at hand for their specific application.
Finally, some implications for a research agenda are discussed, and several important areas for further R&D are identified:
Methodologies and tools for schema matching, mapping and semantic data transformation. These include graphical visualization methods that help domain experts formulate equivalences according to their own conceptualization, as well as automated tools that propose schema matches to the expert.
Future issues for Thesaurus and KOS protocols include possible provision of more complex services, such as semantic expansion (beyond basic broader and narrower expansion), more advanced natural language functionality for identifying controlled terminology in free text (documents or query), cross-mapping provision (important for semantic interoperability) and possible data-dependent filters such as the number of postings associated with a concept.
The vision of employing imprecise semantic equivalences between multiple KOS (as "switching languages" etc.) requires a revision of query languages and engines so that the resulting information loss can be controlled dynamically.
Overall, methods and services are sought that lead to a convergence of global resources to higher states of semantic consistency, against the diverging forces of information isolation, update and local innovation.
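The research agenda items on semantic expansion and imprecise equivalences can be sketched in a few lines. The example below expands a query term to its narrower terms and then to cross-KOS equivalents, letting the caller set a confidence threshold as a crude control on information loss. The thesaurus data, the confidence values and the `expand` function are hypothetical; the report does not prescribe this approach.

```python
# Hypothetical thesaurus: concept -> directly narrower concepts.
NARROWER = {
    "vehicle": ["car", "bicycle"],
    "car": ["taxi"],
}

# Hypothetical imprecise equivalences to another KOS,
# each with an illustrative confidence in [0, 1].
EQUIVALENT = {
    "car": [("automobile", 1.0), ("motor vehicle", 0.8)],
    "bicycle": [("cycle", 0.6)],
}


def expand(term, min_confidence=0.0):
    """Expand `term` to all transitively narrower terms, then add
    equivalents whose confidence is at least `min_confidence`.
    Returns a dict mapping each resulting term to a score."""
    scores = {}
    stack = [term]
    while stack:
        current = stack.pop()
        if current in scores:
            continue  # already visited; also guards against cycles
        scores[current] = 1.0  # terms from the same KOS keep full score
        stack.extend(NARROWER.get(current, []))

    # Add cross-KOS equivalents that pass the threshold.
    for concept in list(scores):
        for other, confidence in EQUIVALENT.get(concept, []):
            if confidence >= min_confidence:
                scores[other] = max(scores.get(other, 0.0), confidence)
    return scores


if __name__ == "__main__":
    # Raising the threshold drops the weakest equivalence ("cycle").
    print(sorted(expand("vehicle", min_confidence=0.7).items()))
```

A query engine built along these lines could expose the threshold to the user, so that recall can be traded against the precision lost through inexact mappings.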
References
1. Available draft: Patel M., Koch T., Doerr M., Tsinaraki C., DELOS Deliverable D5.3.1: Semantic Interoperability in Digital Library Systems, February 2005
Author Details
Martin Doerr
Publication date: June 2005
File last modified: Monday, 22-May-2006
The Delos Newsletter is published by the Delos Network of Excellence and is edited by Richard Waller of UKOLN, University of Bath, UK.