
Newsletter Issue 3


Knowledge Extraction and Semantic Interoperability

Martin Doerr provides us with an overview of a comprehensive report on Semantic Interoperability.

  

The DELOS WP5 cluster on Knowledge Extraction and Semantic Interoperability is finishing a comprehensive report on Semantic Interoperability in Digital Library Systems [1].

  

The Internet and more particularly the Web have been instrumental in making widely accessible a vast range of digital resources. However, the current state of affairs is such that the task of pulling together relevant information involves searching for individual bits and pieces of information gleaned from a range of sources and services and manually assembling them into a whole. This task becomes increasingly intractable with the rapid rate at which resources are becoming available online.

  

Interoperability is therefore a major issue that affects all types of digital information systems, but has gained prominence with the widespread adoption of the Web. As far as digital libraries are concerned, interoperability is becoming a paramount issue as the Internet unites digital library systems of differing types, run by separate organisations which are geographically distributed all over the world. Federated digital library systems, in the form of co-operating autonomous systems, are emerging in a bid to make distributed collections of heterogeneous resources appear as a single, virtually integrated collection.

  

The report defines interoperability very broadly as enabling any form of inter-system communication, or the ability of a system to make use of data from a previously unforeseen source. Interoperability in general is concerned with the capability of different information systems to communicate. This communication may take various forms such as the transfer, exchange, transformation, mediation, migration or integration of information.

  

Semantic interoperability ("SI") is characterised by the capability of different information systems to communicate information consistent with the intended meaning of the encoded information (as intended by the creators or maintainers of the information system). It involves:

  • the processing of the shared information so that it is consistent with the intended meaning
  • the encoding of queries and presentation of information so that it conforms with the intended meaning regardless of the source of information

The issue is addressed from the following perspectives:

  • Definition and theoretical aspects of semantic interoperability in the light of the information life-cycle in digital libraries
  • Enabling factors and technologies to enhance semantic interoperability, as well as relevant actual methods and processes in use or under development
  • Implications for a research agenda

SI issues are analysed from a practical point of view for the following extended list of information life-cycle elements, which reveals the extraordinary relevance of SI to all aspects of digital libraries:

  1. Creation, modification
  2. Publication
  3. Acquisition, selection, storage, system and collection building
  4. Cataloguing (metadata, identification/naming, registration), indexing, knowledge organisation, knowledge representation, modelling
  5. Integration, brokering, linking, syntactic and semantic interoperability engineering
  6. Mediation (user interfaces, personalisation, reference, recommendation, transfer etc.)
  7. Access, search and discovery
  8. Use, shared application/collaboration, scholarly communication, annotation, evaluation, reuse, work environments
  9. Maintenance
  10. Archiving and preservation

From a theoretical point of view, the report distinguishes SI at three levels of abstraction:

  1. Data structures, whether for metadata, content data, collection management data or service description data.
  2. Categorical data, i.e. data that refer to universals, such as classification, typologies and general subjects.
  3. Factual data, i.e. data that refer to particulars, such as people, items, places.

It then shows how these levels relate to different problems, methods and systems for achieving SI. The report argues that interoperability is always achieved by a reasonable combination of adherence to common standards and methods for the dynamic interpretation of non-standardised content. The above levels of abstraction differ greatly in the scale of concepts or data to be integrated. Consequently, standards are more easily promoted for the upper level, whereas the lower levels have to be addressed more by automated, dynamic methods of integration. The report also tries to bridge some gaps between the differing terminologies that are emerging in the library and computer science communities for the same concepts.

   

The analysis of enabling factors and technologies to enhance SI begins with the role of foundational and core ontologies. They are perceived not only as a means to improve the content and consistency of terminological systems, but particularly as a means to assist mediation between different data structures, and in the transition zone between data structures and upper-level terminology.

  

A central role is played by Knowledge Organisation Systems (KOS) and their use in networks (NKOS). These deserve a classification of their own, given the large variety in their size and in the sophistication of the intellectual analysis behind them. KOS represent the shared agreement on concepts (categorical data) and on important factual data, such as place names or very important people. Particular methods for enhancing semantic interoperability include KOS transformation, correlation and mapping; questions of availability and rights are also addressed.

  

An analysis of the role of architecture and infrastructures examines how communication protocols and central services can support global communication on standards and shared concepts, ranging from metadata registries at level one to gazetteer services at level three.

  

When implementation strategies for integrated services are discussed, standardisation, mediation and data warehousing are frequently the subject of controversy, each being put forward as the best solution. The report instead sees these techniques as alternatives with different application characteristics. A chapter is therefore devoted to the pros and cons of these approaches, so that designers have better decision criteria at hand for their specific application.

  

Finally, some implications for a research agenda are discussed. The report identifies at least the following important areas for further R&D:

  

Methodologies and tools for schema matching, mapping and semantic data transformation, including graphical visualisation methods that help domain experts formulate equivalences according to their own conceptualisation, as well as automated tools that propose schema matches to the expert.
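
To make the idea of schema mapping and data transformation concrete, the following is a minimal sketch of a field-level crosswalk. It is not taken from the report: the source field names ("author", "name", "year", "shelf"), the sample record and the helper functions are invented for illustration, and the target field names merely follow the familiar title/creator/date pattern.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical crosswalk: source field -> (target field, value transformation).
Crosswalk = Dict[str, Tuple[str, Callable[[str], str]]]


def normalise_name(value: str) -> str:
    """Turn 'Given Family' into 'Family, Given'; leave other forms untouched."""
    parts = value.split()
    if "," in value or len(parts) < 2:
        return value
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


CROSSWALK: Crosswalk = {
    "author": ("creator", normalise_name),
    "name": ("title", str.strip),
    "year": ("date", str.strip),
}


def transform(record: Dict[str, str], crosswalk: Crosswalk) -> Tuple[Dict[str, str], List[str]]:
    """Map a source record to the target schema.

    Returns the transformed record together with the source fields that had
    no mapping, so that an expert can review what was left behind.
    """
    target: Dict[str, str] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        if field in crosswalk:
            target_field, convert = crosswalk[field]
            target[target_field] = convert(value)
        else:
            unmapped.append(field)
    return target, unmapped


# Invented sample record in the hypothetical source schema.
record = {"author": "Jane Example", "name": " A Sample Title ", "shelf": "B12"}
mapped, unmapped = transform(record, CROSSWALK)
```

Returning the unmapped fields alongside the result is a deliberate choice in this sketch: it keeps the domain expert in the loop, in the spirit of tools that propose matches rather than decide them.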

  

Future issues for Thesaurus and KOS protocols include possible provision of more complex services, such as semantic expansion (beyond basic broader and narrower expansion), more advanced natural language functionality for identifying controlled terminology in free text (documents or query), cross-mapping provision (important for semantic interoperability) and possible data-dependent filters such as the number of postings associated with a concept.
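
As an illustration of what a semantic expansion service might do, here is a small sketch that expands a query term along narrower-term links and, optionally, adds related terms. The toy thesaurus, the function name `expand` and its parameters are assumptions made for this example; they do not describe any existing thesaurus protocol.

```python
from collections import deque
from typing import Dict, List, Set

# Toy thesaurus (invented): narrower-term and related-term links per concept.
NARROWER: Dict[str, List[str]] = {
    "vessels": ["ships", "boats"],
    "ships": ["sailing ships"],
}
RELATED: Dict[str, List[str]] = {
    "ships": ["harbours"],
}


def expand(term: str, max_depth: int = 2, include_related: bool = False) -> Set[str]:
    """Expand a term breadth-first along narrower links, up to max_depth levels.

    If include_related is set, the related terms of every concept reached are
    added to the result, but they are not expanded any further.
    """
    seen: Set[str] = {term}
    queue = deque([(term, 0)])
    while queue:
        concept, depth = queue.popleft()
        if include_related:
            seen.update(RELATED.get(concept, []))
        if depth == max_depth:
            continue
        for child in NARROWER.get(concept, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen


basic = expand("vessels", max_depth=1)
extended = expand("vessels", include_related=True)
```

The depth limit and the related-term switch stand in for the kind of controls a client would need once expansion goes beyond plain broader and narrower steps.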

  

The vision of employing imprecise semantic equivalences between multiple KOS (as "switching languages" etc.) requires a revision of query languages and engines so that the resulting information loss can be controlled dynamically.
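
One way to picture such dynamic control is a query translation step that accepts a minimum confidence for each equivalence and reports which terms could not be carried over. The sketch below rests on assumptions: the mapping table, the confidence values and the function `translate_query` are all invented, and a real query engine would need a far richer model of loss.

```python
from typing import Dict, List, Tuple

# Invented mappings between two KOS:
# source concept -> list of (target concept, confidence of the equivalence).
MAPPINGS: Dict[str, List[Tuple[str, float]]] = {
    "watercraft": [("ships", 0.9), ("vehicles", 0.4)],
    "ports": [("harbours", 0.8)],
    "rigging": [("ship equipment", 0.5)],
}


def translate_query(
    terms: List[str], min_confidence: float
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Translate query terms into the target KOS.

    Only equivalences at or above min_confidence are used. Source terms with
    no acceptable equivalence are returned separately, making the information
    loss of the translation visible to the caller.
    """
    translated: List[Tuple[str, float]] = []
    dropped: List[str] = []
    for term in terms:
        candidates = [
            (target, confidence)
            for target, confidence in MAPPINGS.get(term, [])
            if confidence >= min_confidence
        ]
        if candidates:
            translated.extend(candidates)
        else:
            dropped.append(term)
    return translated, dropped


strict, strict_dropped = translate_query(["watercraft", "rigging"], 0.7)
loose, loose_dropped = translate_query(["watercraft", "rigging"], 0.4)
```

Raising the threshold trades recall for precision: the strict call keeps only the strong equivalence and reports "rigging" as lost, while the loose call translates every term at the cost of weaker matches.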

  

Overall, methods and services are sought that lead to a convergence of global resources to higher states of semantic consistency, against the diverging forces of information isolation, update and local innovation.

  

References

 

1. Available draft: Patel M., Koch T., Doerr M., Tsinaraki C., DELOS Deliverable D5.3.1: Semantic Interoperability in Digital Library Systems, February 2005

    

Author Details

  

Martin Doerr
Principal Researcher
Institute of Computer Science
The Foundation for Research and Technology - Hellas (FORTH)
Vassilika Vouton
P.O.Box 1385
GR 711 10 Heraklion, Crete
Greece
email: martinATics.forth.gr
Tel: +30 2810 391625
Fax: +30 2810 391638

  


Publication date: June 2005
File last modified: Monday, 22-May-2006

The Delos Newsletter is published by the Delos Network of Excellence
and is edited by Richard Waller of UKOLN, University of Bath, UK.

   
