Newsletter Issue 1

Knowledge Extraction and Semantic Interoperability

Liz Lyon describes the range of research activities being planned by the partners working in this challenging area.

Introduction

Semantic interoperability is growing in importance in digital library (DL) research (taking the interpretation of "digital library" at its broadest). It concerns the different vocabularies and terminologies used to describe digital objects for both learning and research, collections of those objects, and collections of datasets and resources used in the wider cultural heritage sector and in e-research. Indeed, a cross-sectoral and cross-domain shared understanding of semantic descriptions is one of the goals of the Semantic Web as envisaged by Tim Berners-Lee in his "roadmap" published in 1998 (for further details, see http://www.w3.org/DesignIssues/Semantic.html). This vision has more recently (2001) been applied to "Grid computing" and e-science / e-research initiatives in the Semantic Grid approach (see http://www.semanticgrid.org/).

In addition, the application of algorithms for the mining and analysis of digital resources (text, data, complex objects) offers exciting opportunities for the extraction of new knowledge and the re-use of data and information in new ways.

Today, we are beginning to address some of the issues and challenges in this complex area, and the Delos Network of Excellence has the opportunity to carry out some important research to move the Semantic Web/Grid vision towards implementation.

Aims and objectives of the cluster

The Knowledge Extraction & Semantic Interoperability research cluster has two key strategic goals:

  • To co-ordinate a programme of activities which brings together research excellence from a range of inter-related knowledge engineering and information management areas, and which facilitates the sharing of experience and expertise amongst practitioners from both DL and Grid/computing science backgrounds.
  • To explore the potential of new models, algorithms, methodologies and processes in a variety of technical applications, institutional frameworks and cross-sectoral environments, which will lead to the creation of guidelines and recommendations of best practice for dissemination to the widest possible community of interest.

We can examine some of the themes underlying this area in more detail.

Open Access to Digital Repositories of Data and Information

The development of digital repositories for the support of research and learning is at a critical stage. There has been a concerted effort to promote open access to the research literature, with the success of the Open Archives Initiative, the development of the ePrints software from the University of Southampton, UK, the establishment of the European-focused Open Archives Forum and national initiatives such as DARE (Digital Academic Repositories). There has also been a drive to promote institutional repositories as the location for e-print deposit, for example the DSpace project at the University of Cambridge, UK. These developments have all been made possible through the implementation of the OAI Protocol for Metadata Harvesting (OAI-PMH) within the information architectures. Digital resources published in this way may also include primary research data, experimental data generated by Grid-enabled applications, gene and protein structure data, statistical data, satellite data, census data and environmental modelling data. The current increase in Grid-enabled applications is resulting in large volumes of data being collected in data libraries, and this trend is likely to continue. These large datasets need to be managed, curated and made accessible to the research community.
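
To make the harvesting step concrete, the following is a minimal sketch of how a harvester might form an OAI-PMH ListRecords request and read Dublin Core titles from the response. The endpoint URL, the sample record and the helper names list_records_url and extract_titles are assumptions made for illustration; only the verb, the metadataPrefix argument and the XML namespaces are taken from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for the given repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Invented sample response standing in for what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_titles(xml_text):
    """Return (identifier, title) pairs for each harvested record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to fetch the URL over HTTP and follow resumption tokens for large result sets; both are left out here to keep the sketch short.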

In parallel to the development of repositories of research data and derived information, many institutions are creating learning objects for manipulation and inclusion in learning programmes and curriculum-based activities. Learning Management Systems are being deployed as vehicles for the development and distribution of online courses as part of e-learning initiatives. Repositories of learning objects are being developed, both at national and institutional level, to enable the access to and deposit of discrete learning objects for wider use by the community.

Provenance

The integrity, authenticity and value of the mass of information and knowledge derived from original data depend on a number of critical factors. For example, in science, the provenance or origin of a particular set of data is essential to determining the likely accuracy, currency and validity of derived information and of any assumptions, hypotheses or further work based on that information. Significant research has been carried out on describing the provenance of scientific data in molecular genetics databases, and the topic has been explored in the Global Grid Forum (GGF6) in relation to Grid data. The Open Archives Initiative has carried out work to describe the provenance of harvested metadata records, and the concept is included as an element in the administrative metadata which is part of the METS metadata standard. The critical factors include the definition and acceptance of appropriate frameworks for metadata description, a shared understanding of the concept of provenance, the widespread use of unique identifiers, appropriate linking technology and the application of common ontologies for discrete domains. These concepts are relatively new but have the potential for significant impact on the way in which research and learning are conducted in the future and on the ability to integrate and re-use digital resources in a variety of ways.
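
As an illustration of how unique identifiers and linking can support provenance, the sketch below traces a derived resource back to its original data. The Resource record, the identifier scheme, the REGISTRY contents and the lineage function are all hypothetical; they are not drawn from METS, OAI or any Grid standard.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    identifier: str                      # unique identifier (invented scheme)
    description: str
    derived_from: Optional[str] = None   # identifier of the source, if any


# Invented example: a figure derived from cleaned data derived from raw data.
REGISTRY: Dict[str, Resource] = {
    r.identifier: r
    for r in [
        Resource("id:raw", "Raw instrument readings"),
        Resource("id:cleaned", "Cleaned readings", derived_from="id:raw"),
        Resource("id:figure", "Published figure", derived_from="id:cleaned"),
    ]
}


def lineage(identifier: str, registry: Dict[str, Resource]) -> List[str]:
    """Follow derived_from links from a resource back to its origin."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = identifier
    while current is not None and current not in seen:
        seen.add(current)                # guard against circular links
        resource = registry.get(current)
        if resource is None:
            break                        # unknown source: lineage is incomplete
        chain.append(resource.identifier)
        current = resource.derived_from
    return chain


if __name__ == "__main__":
    print(lineage("id:figure", REGISTRY))
```

The point of the sketch is that the chain can only be followed if every step carries a stable identifier and an explicit link to its source, which is why those two factors appear in the list above.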

Semantic Web, Ontologies and Metadata Schema Registries

In order to achieve semantic interoperability between descriptions of services, collections and items, there needs to be a shared understanding of the meanings of subject terms and descriptors. Frequently, discrete subject domains have their own shared vocabulary; however, specific terms may have different meanings within another subject domain. Additionally, one particular domain may have multiple vocabularies which are used by different communities of interest. The myriad of existing vocabularies, both at domain and high level, is a major challenge to implementers and users of digital libraries who are trying to locate resources and services.
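
The two problems described here, one term with different meanings across domains and different terms with one meaning, can be sketched as a simple crosswalk that maps each (vocabulary, term) pair to a shared concept. The vocabularies, terms and concept identifiers below are invented for illustration and do not come from any real thesaurus or terminology service.

```python
from typing import Optional, Tuple

# Hypothetical crosswalk: (vocabulary, term) -> shared concept identifier.
CROSSWALK = {
    ("biology", "cell"): "concept:biological-cell",
    ("telecoms", "cell"): "concept:network-cell",
    ("medicine", "neoplasm"): "concept:tumour",
    ("patient-info", "tumour"): "concept:tumour",
}


def resolve(vocabulary: str, term: str) -> Optional[str]:
    """Return the shared concept for a term, or None if it is unmapped."""
    return CROSSWALK.get((vocabulary, term.lower()))


def same_concept(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True only when both terms are mapped and map to the same concept."""
    concept_a, concept_b = resolve(*a), resolve(*b)
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    # Same word, different domains: not the same concept.
    print(same_concept(("biology", "cell"), ("telecoms", "cell")))
    # Different words, same meaning: the same concept.
    print(same_concept(("medicine", "neoplasm"), ("patient-info", "tumour")))
```

Matching on the term string alone would conflate the two senses of "cell" and miss the equivalence of "neoplasm" and "tumour"; keying on the vocabulary as well is the design choice that avoids both errors.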

There is now an increasing number of developments in the broad area of Semantic Web/Grid technologies, ranging from the development of Semantic Web-enabled Web Services to the scoping of terminology servers to provide services to distributed digital libraries. There is also a growing body of work on registries and their use in the publication and validation of metadata schemas.

Knowledge Extraction

Finally, the increasing richness of both the data and the descriptive metadata contained in digital libraries offers great potential for the application of a variety of tools to extract additional information to contribute to knowledge. The research community has a growing requirement for data manipulation tools to facilitate spatial change (federation, aggregation, dis-aggregation, replication, manipulation, linking, annotation, editing/versioning, transformation) and for knowledge extraction, which can include analysis (textual, musical, statistical, mathematical, visual, chemical, gene), mining (text, data, structures) and modelling (economic, mathematical, biological). For example, text mining techniques have been applied to resources in various domains and in particular to biomedical materials. Similarly, data mining techniques have been applied to domain datasets such as biomedical and physical data, and this form of analysis is becoming increasingly important in the understanding of outputs from Grid-enabled projects and associated data repositories.

Together these themes form a rich contextual background to the research programme of this cluster.

Cluster Partners

A number of organisations and institutions are currently involved in this Work Package:

  • Department of Electronics & Computer Science, University of Southampton, UK
  • ETH, Swiss Federal Institute of Technology, Zurich, Switzerland
  • FORTH, Crete, Greece
  • Netlab Knowledge Technologies Group, Lund University, Sweden
  • School of Informatics, University of Edinburgh, UK
  • Technical University of Crete, Greece
  • UKOLN, University of Bath, UK
  • UNIMI, University of Milan, Italy
  • University for Health Informatics & Technologies, Tyrol, Austria

Start-up activities

We have identified a number of activities to initiate the programme of work; the definitions and scope of the various themes are currently being explored.

A Forum is being created to provide a physical and virtual arena for the exchange of experience and research in all the areas/themes of this cluster. The first meeting of the Forum is planned to coincide with the European Conference on Digital Libraries (ECDL) to be held in September in Bath, UK. It will provide an opportunity to integrate other relevant groups into the cluster systematically and will take the format of a one-day state-of-the-art workshop. This development is being supported by a moderated virtual forum or discussion list for the expansion of discussion on selected topics. It is also intended to maximise opportunities to harmonise with other relevant initiatives such as CIDOC and FRBR. The activity will culminate in an evaluative report and a second Forum workshop to disseminate the findings of the report.

In the area of Knowledge Extraction, a study will initially be produced to determine the requirements for and usage of extracted knowledge for bibliometrics, domain analysis, issue tracking and community modelling.

Semantic Interoperability is being addressed initially by scoping the area with the aim of producing a state-of-the-art overview of DL semantic issues including the application of standards, thesauri, ontologies, Knowledge Organisation Systems and the implementation of metadata schema registries.

It is intended that the discussions and various reports produced will inform the future research programme for the cluster.

Further information will shortly be available on the cluster Web site.

Author Details

Dr Liz Lyon
Knowledge Extraction & Semantic Interoperability Cluster leader
Director UKOLN
University of Bath
UK
E-mail:   
Tel: +44 (0) 1225 386580
Fax: +44 (0) 1225 386838



Publication date: April 2004
File last modified: Monday, 22-May-2006

The Delos Newsletter is published by the Delos Network of Excellence
and is edited by Richard Waller of UKOLN, University of Bath, UK.
