
Newsletter Issue 3 - Feature Article (G. Ioannidis)

Newsletter Issue 3


Heterogeneity in Digital Libraries: Two Sides of the Same Coin

Georgia Koutrika provides us with an overview of the challenges facing digital library design caused by the diversity both of data resources and users and describes how IAP user surveys are addressing this difficulty.

Introduction

Heterogeneity may be regarded as a benefit to digital libraries, but it also presents developers with a problem. We come across it in a number of forms. Even at a very high level of abstraction, data sources and users are two of the most fundamental constituents of a digital library. Accordingly, two basic types of heterogeneity are evident: data source heterogeneity and user heterogeneity. One cannot design and implement a digital library without considering these issues extremely carefully. Their importance to digital library design has prompted the production of three separate surveys within the DELOS context.

Data Source Heterogeneity

A digital library can be a vast collection of objects stored and maintained by multiple information sources, including databases, image banks, file systems, e-mail systems, the World Wide Web, and others. Therefore, assembling information of relevance on a specific topic involves searching for correct information items emanating from a wide variety of sources.

Data source heterogeneity can pose significant problems when accessing multiple data sources. In effect, it is the degree of dissimilarity between the component data sources that determines how difficult it is to implement a data integration system. Data sources may differ in many ways. At a lower level, heterogeneity arises out of differing hardware platforms, operating systems, networking protocols and access interfaces. At a higher level, heterogeneity arises out of differences among programming and data models as well as different perceptions and modelling of the same real world. Moreover, sources evolve: a source that is part of the system at one point may later be removed, and new sources may be added.

Four types of data source heterogeneity have been identified:

System heterogeneity: arising from different hardware platforms and operating systems
Syntactic heterogeneity: caused by discrepancies across the different protocols, encodings and languages used by the information sources (e.g. query languages, browsing interfaces, data formats, communication protocols and so forth)
Structural heterogeneity: encountered among sources using different data models, data structures and schemas
Semantic heterogeneity: produced by semantic conflicts arising from the fact that the meaning of the data can be expressed in different ways, as every metadata scheme defines its own set of data elements or categories for data

Consequently, there is a need to provide users with the capacity to access digital library objects both seamlessly and transparently despite the heterogeneity and dynamism across the various information sources involved. Interoperable information sources and services allow users to focus on information use instead of their being obliged to acquire and combine the required content manually from the different sources.

Syntactic and structural interoperability supports the proper handling, exchange and combining of data, with due regard to formats, encodings, properties, values, data types and so forth. A data integration system is one that provides users with transparent access to a collection of related data sources as if these sources, as a whole, constituted a single data source. The main objective of a data integration system is to let users focus on specifying what data they want, rather than on describing how to obtain it. To achieve this, the system provides an integrated view of the data stored in the underlying data sources. In a data integration system, users are interested mainly in querying the integrated data rather than updating the data through the integrated view. Heterogeneous data sources therefore present designers of data integration systems with a raft of challenging difficulties.

The Data Integration Services Survey

Given such challenges, the aim of the Data Integration Services Survey is to describe and compare thoroughly the different approaches, schemes, frameworks and systems mentioned in the current literature on supporting information integration from structurally heterogeneous sources. The survey covers the following data source description approaches:

GAV (Global As View)
LAV (Local As View)
GLAV (Combining LAV & GAV)
as well as related query rewriting algorithms and systems
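
As an illustration of the first approach listed above, the following is a minimal sketch of a GAV-style integrated view, not an implementation taken from the survey. It assumes two hypothetical sources (a library catalogue and an image bank) with different schemas, and a hypothetical global schema with the fields title, creator and year. The global relation is defined as a view over the sources, so a query over the global schema is answered by unfolding that view.

```python
# Two hypothetical sources with different schemas (structural heterogeneity).
LIBRARY_CATALOGUE = [
    {"title": "Digital Libraries", "author": "Smith", "year": 2004},
]
IMAGE_BANK = [
    {"caption": "Map of Delos", "photographer": "Jones", "taken": "2003-07-14"},
]


def global_items():
    """GAV-style mapping: the hypothetical global relation
    item(title, creator, year, source) is defined as a view over the sources."""
    for rec in LIBRARY_CATALOGUE:
        yield {
            "title": rec["title"],
            "creator": rec["author"],
            "year": rec["year"],
            "source": "catalogue",
        }
    for rec in IMAGE_BANK:
        yield {
            "title": rec["caption"],
            "creator": rec["photographer"],
            # The image bank stores a full date string; keep only the year.
            "year": int(rec["taken"][:4]),
            "source": "image_bank",
        }


def query(**conditions):
    """Answer a query posed on the global schema by unfolding the view
    and keeping the items that match every field=value condition."""
    return [
        item
        for item in global_items()
        if all(item.get(field) == value for field, value in conditions.items())
    ]
```

A caller states only what is wanted, for example `query(year=2003)`, and never names the underlying sources. Under LAV the direction of the mapping is reversed (each source is described as a view over the global schema), which is why query rewriting is then needed.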

Semantic Interoperability

Semantic interoperability, on the other hand, allows users to negotiate and understand the meaning of the metadata items both in the same application domain and between application domains. Semantic interoperability refers to the extent to which different metadata schemes express the same semantics in their categorization. Successful interoperation requires clarity on how the categories of metadata relate to each other across different schemes. To this end, several questions must be answered:

When do elements have the same meaning?
When are elements derivatives, subsets, or variations of each other?
When are elements completely unrelated?

Furthermore, different application domains have established different metadata standards, making the interoperation of applications from different domains a tricky task. The problem becomes even more complicated when a vast body of standards already exists for the same application domain.
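
The three questions above can be made concrete with a small crosswalk between two metadata schemes. The sketch below is illustrative only: the source scheme (with elements `object_name`, `artist` and `shelf`) is made up, the target element names `title` and `creator` are borrowed from Dublin Core, and the relationships recorded in `CROSSWALK` are assumptions, not an agreed mapping.

```python
from enum import Enum


class Relation(Enum):
    SAME = "same meaning"
    NARROWER = "derivative, subset or variation"
    UNRELATED = "unrelated"


# Hypothetical crosswalk: (source element, target element) -> relationship.
CROSSWALK = {
    ("object_name", "title"): Relation.SAME,
    ("artist", "creator"): Relation.NARROWER,  # every artist is a creator
}


def relate(source_element, target_element):
    """Answer the three questions for one pair of elements."""
    return CROSSWALK.get((source_element, target_element), Relation.UNRELATED)


def translate(record):
    """Rename the elements that have a counterpart in the target scheme.
    Elements with no counterpart are returned separately, not dropped."""
    mapped, unmapped = {}, {}
    for element, value in record.items():
        targets = [t for (s, t) in CROSSWALK if s == element]
        if targets:
            mapped[targets[0]] = value
        else:
            unmapped[element] = value
    return mapped, unmapped
```

Keeping the relationship type alongside each mapping matters because a narrower element can be translated safely in one direction only: an `artist` value may be exposed as a `creator`, but not every `creator` is an artist.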

Semantic Interoperability Survey

The state of the art survey on Semantic Interoperability in Digital Libraries focuses on semantic interoperability issues, and in particular on:

domain-specific metadata models
models used for audiovisual content description
models for the description of items in the cultural heritage domain (including the holdings of archives, libraries and museums)
metadata schema interoperability
resolution of semantics using taxonomies, thesauri and ontologies

User Heterogeneity

On the other hand, Internet access has resulted in digital libraries being increasingly used by diverse communities for a variety of purposes; among these, sharing and collaboration have become important social elements. In addition, a user's information-seeking activities are no longer bound either geographically or temporally. Information can be accessed through a variety of devices from users' offices, homes, hotel rooms or even on the move, at any time of the day or night, seven days a week. As a result, information systems are seeing far greater use. More importantly still, the people using them now range well beyond librarians and scientists, as was once the case.

User heterogeneity is, hence, a significant problem for digital libraries. Users have ever more complex needs and different users have differing requirements. At the same time, users want to achieve their goals with a minimum of cognitive load and as much enjoyment as possible. We must also factor in information overload, which fuels the need for more sophisticated and user-centered services that can provide access to the content of digital libraries. Individuals as much as groups of users have to be better supported if they are to capture, structure and share knowledge successfully. In the same context, both formal and informal learning activity requires similar support.

Personalization

An integral step towards these ends lies in building effective profiles of digital library users. A user profile is an appropriate description of the user, created either manually by the user or automatically by the system. It is used by the system during its interaction with digital library users in order to anticipate their needs and satisfy them in the best possible way. This is achieved by adapting presentation, content, and services based on a person's task, background, history, device, information needs, location, and so forth, as dictated by the user profile. Digital libraries which fail to meet the personalization requirements posed by their users will ultimately find it difficult to retain their user base or indeed attract new users.

This has led to the development of personalization systems which adapt their behaviour to the goals, interests, and other characteristics of their users, either as individuals or as members of particular groups.

Central to all personalization systems is the issue of user profile representation. This provides the means to record the user's preferences and status and so filter the content retrieved, personalize the services offered, and track user access behaviour and needs. However, the construction of user profiles can require considerable effort, which remains largely invisible to the layman.

The aim of the User Modeling for Personalization in Digital Libraries Survey is to study user profiling in Information Retrieval and Information Filtering. It describes different user profile representations, such as history-based, vector space model, weighted n-grams, and classifier-based profiles, explicit and implicit methods for user profile acquisition, user context, existing standards and models, and user profile management in major commercial systems and research projects.
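
To illustrate one of the representations named above, the following is a minimal sketch of a vector space model profile acquired implicitly from the documents a user has viewed. It is an assumption-laden toy, not a method described in the survey: terms are obtained by splitting on whitespace, weights are raw term counts, and the function names are made up for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lowercased terms with raw counts."""
    return Counter(text.lower().split())


def build_profile(viewed_documents):
    """Implicit acquisition: sum the term vectors of the viewed documents."""
    profile = Counter()
    for doc in viewed_documents:
        profile.update(term_vector(doc))
    return profile


def cosine(a, b):
    """Cosine similarity between two term vectors (0.0 if either is empty)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank(profile, candidates):
    """Order candidate documents by similarity to the user profile."""
    return sorted(
        candidates,
        key=lambda doc: cosine(profile, term_vector(doc)),
        reverse=True,
    )
```

A profile built from the titles a user has opened can then be passed to `rank` together with a list of candidate titles to obtain a personalized ordering. A realistic system would at least add stop-word removal and a weighting scheme such as TF-IDF; both are omitted here to keep the sketch short.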

User profiles can be used in a variety of ways to individualize user experience, which means of course that approaches to personalization also differ. However, it has been commonly observed that the largest proportion of research derives from the Information Retrieval (IR) community, with that of the Database community next most in evidence, in many cases inspired by IR.

The Profile Usage for Personalization in Digital Libraries Survey covers personalization methods proposed in the IR and Database communities. It describes information filtering, continuous queries, recommender systems and personalized search engines.

Other Vital Work

Heterogeneity is by no means the only issue to consider in digital library design. During the first year of work, the IAP cluster has been drafting a set of comprehensive surveys and reports on other key relevant areas of interest to provide broad overviews of existing models and approaches as well as identify problems. These surveys formed the basis for establishing common approaches on information access, information integration and personalization; they were also instrumental in initiating joint research in a number of the aforementioned areas.

Apart from the surveys mentioned above, other surveys already in draft relate to the following topics:

Information Access Models and Modes
Metadata in the Context of DL
Peer-to-Peer Data Management Systems
Data Annotation and Provenance in Large Scale Information Integration Systems

Work carried out on the formulation of these surveys has served to identify major themes in research on information access, information integration and personalization, as follows:

Information access: data indexing for complex similarity measures
Information integration: query processing and routing in P2P architectures
Personalization: modelling of user preferences and more general contexts

The surveys are available from the Information Access and Personalization cluster website:
http://delos.di.uoa.gr/transactions.php?type=Reports

Author Details

Georgia Koutrika
University of Athens
Email:
Telephone: +30 210 727 5242
Fax: +30 210 727 5214

Publication date: June 2005
File last modified: Monday, 22-May-2006

The Delos Newsletter is published by the Delos Network of Excellence
and is edited by Richard Waller of UKOLN, University of Bath, UK.
