(DRAFT)

CIS Working Group 3 - CIS Thesaurus and Glossary

Introduction

The first regional meeting of CIS National and Collaborating Centres was held at CIS Headquarters in Geneva, Switzerland, on 17-18 May 2005.

The meeting was held at the request of a number of representatives of CIS Centres in the region, in part because this year's regular CIS Centres meeting is scheduled to be held in Orlando, Florida, and it will be inconvenient for some European Centres representatives to attend it.

The meeting produced a number of outcomes; see the descriptions in the CIS Newsletters for June 2005 and July 2005 at www.sheilapantry.com/cis

The Future and Priorities

The discussions that followed gave a real opportunity to start determining the priorities for creating the new CIS Network.

Four Working Groups (WG) were formed as follows with CIS members who volunteered to start the work immediately:

WG 1 on Publicity, Promotion and Communications
Members: Sheila Pantry (UK), Irja Laamanen (Finland), Roman Litvyakov (Russian Federation) and Annick Virot (CIS HQ).
The Draft paper has already been discussed within the WG and with CIS members worldwide and is currently being finalised for presentation at the CIS Annual Meeting in September in Orlando.
See Draft report on www.sheilapantry.com/cis/other/wp01.html

WG 2 on the development of the CIS Portal and the use of the CIS Logo on Centres web pages
Members: Katalin Balogh (Hungary), Roman Litvyakov (Russian Federation), Andras Szucs and Begonia Casaneuva (both CIS HQ).
The Draft paper is expected to be discussed with CIS members present at the Geneva meeting and then finalised for the CIS Annual Meeting in September 2005 in Orlando.
First draft has been sent for comments to the initial Working Group.

WG 3 on CIS Thesaurus and Glossary
Members: Katalin Balogh (Hungary), Andras Szucs and Gabor Sandi (both CIS HQ) and Sheila Pantry (UK).
The Draft paper is attached herewith and, because of timescales, is now being sent both to the WG and to the CIS members present at the Geneva meeting. It will then be finalised for presentation at the CIS Annual Meeting in September 2005 in Orlando.

WG 4 on Training and E-Learning
Members: Sheila Pantry (UK), Irja Laamanen (Finland), Roman Litvyakov (Russian Federation), Catherine Blotiere (France), Maria Castriotta (Italy), Boryana Barbukova (Bulgaria) and Annick Virot (CIS HQ).
The Draft paper has already been discussed within the WG and with CIS members worldwide and is currently being finalised for presentation at the CIS Annual Meeting in September in Orlando.
See Draft report on www.sheilapantry.com/cis/other/wp04.html


Reason for the Working Group being set up

At the Geneva meeting it was suggested that the CIS thesaurus be updated and its scope enlarged, taking into account similar thesauri published by other sources (e.g. the European Union's European Agency for Safety and Health at Work, the National Occupational Health and Safety Commission (NOHSC) in Australia and any others).

Which thesauri are available and current state of each one

The following is a brief description of the current state of known Occupational Safety and Health (OSH) thesauri. If CIS members know of others, please send details to Gabor Sandi and Sheila Pantry (see below for contact details).


CIS thesaurus

The CIS Thesaurus is the trilingual (English, French, Spanish) source of terms used to index the CIS bulletin and database. It can also supply "meta" tags for indexing Web pages, and its system of facet codes has been used to organize occupational safety and health libraries and information centres.

The terms are organized hierarchically according to their specificity. Each term is associated with a facet code, which is a combination of letters or numbers that indicates the place of the term in the hierarchy.

The CIS Thesaurus represents a slight variation from the "classic" thesaurus paradigm: because terms and their facets are not generally added to the thesaurus unless they have been used to index a document, it may happen that there is no generic term covering one or more terms at a particular level in the thesaurus, although there may be an even broader term further up the hierarchy.
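The behaviour described above can be sketched in a few lines of Python. The facet codes and terms below are invented for illustration and are not taken from the CIS Thesaurus. The sketch also assumes that the code of a broader term is a prefix of the code of each narrower term, which is only one plausible reading of how a facet code indicates "the place of the term in the hierarchy".

```python
# Hypothetical facet codes and terms, invented for illustration only.
THESAURUS = {
    "S": "Safety",
    "Sab": "Machinery safety",
    "Sabcd": "Guards for circular saws",  # note: no term exists for "Sabc"
}


def nearest_broader_term(code, thesaurus):
    """Return (code, term) for the closest existing broader term, or None.

    Assumption: a broader term's facet code is a prefix of the narrower
    term's code. Levels that have no term are simply skipped, so the
    search continues further up the hierarchy.
    """
    for length in range(len(code) - 1, 0, -1):
        prefix = code[:length]
        if prefix in thesaurus:
            return prefix, thesaurus[prefix]
    return None


if __name__ == "__main__":
    print(nearest_broader_term("Sabcd", THESAURUS))
```

Here the immediate parent level "Sabc" has no generic term, so the lookup falls back to the broader term found at "Sab".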

The Thesaurus is maintained as a database at CIS.

At intervals, the data are published in two formats: on paper and on CD-ROM. The printed version (1999) presents the terms in three orders, thematically by facet code, alphabetically and, in the case of chemical substances, numerically by Chemical Abstracts Service Registry Number. The CD-ROM edition (2001) provides easy navigation within the hierarchical structure and is searchable in English, French or Spanish.

For a better idea of the format, you can take a look at a few typical sample pages and screen-shots of the printed and computer-based versions respectively.

The on-line version of the Thesaurus is also available for browsing/searching.

Selected terms can be used, for example, to search the CISDOC bibliographic database or to add DESCRIPTOR metatags to your HTML files.
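As a rough illustration of the metatag use, the Python sketch below builds a single DESCRIPTOR tag from a list of selected terms. The function name, the sample terms and the choice of one comma-separated tag are assumptions made for this example; the exact tag layout that CIS recommends is not stated here.

```python
from html import escape


def descriptor_meta_tag(terms):
    """Build one DESCRIPTOR meta tag from a list of thesaurus terms.

    Assumption: terms are joined with commas in a single tag. Special
    characters are escaped so the result is safe inside an HTML attribute.
    """
    content = ", ".join(terms)
    return '<meta name="DESCRIPTOR" content="{}">'.format(
        escape(content, quote=True)
    )


if __name__ == "__main__":
    # Hypothetical terms, chosen only to show the output format.
    print(descriptor_meta_tag(["noise", "hearing protection"]))
```

The resulting line would be placed in the head section of the HTML file.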


Australian National Occupational Health & Safety Commission
Australian Occupational Health and Safety Thesaurus (AOHST) Guidelines

Hierarchical Structure of Preferred Terms
Alphabetical List of Preferred Terms and Non-preferred Terms
3rd Edition, March 2003
National Occupational Health & Safety Commission, Canberra
Commonwealth of Australia 2003
(First edition, 1999.)
Available on the NOHSC web site, see: www.safeworkaustralia.gov.au/AboutSafeWorkAustralia/WhatWeDo/Publications/Documents/11/AustralianOHSThesaurus_2003_PDF.pdf

Introduction

The Australian Occupational Health and Safety Thesaurus (AOHST) is a list of subject or descriptor terms covering concepts relevant to occupational health and safety (OHS) in Australia. It was developed by the National Occupational Health & Safety Commission (NOHSC) for use by any Australian organisation or individual with an interest in OHS.

The AOHST is of particular importance in the online environment in providing subject access to documents, databases and web sites. The development of this thesaurus was an outcome of the decision to apply metadata standards to NOHSC information available on the Internet. The use of thesaurus terms, and the application of metadata to information in the online environment, will provide added value to searching capability and ensure that searchers are able to locate all relevant information.

Purpose and Scope

The purpose of the thesaurus is to provide a standard list of terms to be used in describing what a document or other type of information source is about. These selected terms are called preferred terms.

AOHST also includes synonyms that were not selected as preferred terms, but have been included as references to make it easier to find the preferred terms. These references are called non-preferred terms.

Example

Overuse injuries has been selected as the preferred term to describe injuries caused by repetition and overuse. AOHST also includes the non-preferred terms Occupational overuse syndrome, Repetitive strain injuries, OOS and RSI.

These non-preferred terms refer the searcher to the preferred term Overuse injuries.

Using (preferring) one term instead of five to represent this concept ensures that all relevant information is retrieved using the search term: Overuse injuries.
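The example above can be expressed as a simple lookup. This is a minimal Python sketch using only the terms quoted in the example; the function name and the case-insensitive matching are assumptions, not part of AOHST.

```python
PREFERRED = "Overuse injuries"

# Non-preferred terms from the example, each referring to the preferred term.
USE_REFERENCES = {
    "occupational overuse syndrome": PREFERRED,
    "repetitive strain injuries": PREFERRED,
    "oos": PREFERRED,
    "rsi": PREFERRED,
}


def to_preferred(term):
    """Map a search term to its preferred term, or None if unknown.

    Matching ignores case and surrounding spaces (an assumption made
    for this sketch).
    """
    key = term.strip().lower()
    if key == PREFERRED.lower():
        return PREFERRED
    return USE_REFERENCES.get(key)


if __name__ == "__main__":
    for query in ["RSI", "Occupational overuse syndrome", "Overuse injuries"]:
        print(query, "->", to_preferred(query))
```

Whichever of the five terms the searcher keys in, the search is run on the single preferred term.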

The thesaurus also includes a number of notes (Scope Notes) which help explain the meaning and usage of a particular term.

Example

Textile industry
Use for manufacturing of textiles and textile products

The scope of AOHST is to cover concepts relevant to OHS in Australia at national, State and local level. Some concepts are international, but the emphasis is given to the Australian context wherever necessary.

The aim is to limit the size of the thesaurus to no more than 3,000 terms. This is to ensure that it is relatively easy to review and keep up-to-date.

The terminology currently included in AOHST is OHS specific. This means that certain areas that are peripheral to the NOHSC's priorities, such as compensation and insurance, are less developed at this stage. However, the thesaurus is structured in such a way as to allow the inclusion or exclusion of other categories, as required. The following areas of OHS in Australia are included:

Where possible, the aim is towards single concept terms in the thesaurus.

For example:

The purpose of this is to keep the AOHST as simple and as short as possible, as well as to make indexing and searching more flexible. However, where appropriate, multiple concept terms are included. When selecting a subject term for a particular situation, it may be necessary to select a group of thesaurus terms to cover all aspects of the subject concept.

For example:

Training material on safety in the construction industry may have the following thesaurus terms selected:

This will ensure that searchers can find the material either by keying in any of the individual terms, or by combining some or all of the individual terms.
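The effect of assigning a group of terms can be sketched as follows. The records and index terms in this Python example are hypothetical and are not the terms listed in AOHST; the point is only that a record indexed with several single-concept terms can be found through any one of them or through a combination.

```python
# Hypothetical records, each indexed with a set of thesaurus terms.
RECORDS = {
    1: {"Construction industry", "Safety", "Training"},
    2: {"Construction industry", "Noise"},
    3: {"Safety", "Training"},
}


def search(records, *terms):
    """Return the ids of records indexed with ALL of the given terms."""
    wanted = set(terms)
    return sorted(
        record_id
        for record_id, assigned in records.items()
        if wanted <= assigned
    )


if __name__ == "__main__":
    print(search(RECORDS, "Safety"))
    print(search(RECORDS, "Construction industry", "Safety", "Training"))
```

A single term gives a broad result, while combining terms narrows the result to the records that cover every aspect of the subject.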

The following criteria were taken into account when choosing particular terms in preference to other similar terms (synonyms): Australian spelling and terminology are preferred.

NOHSC Metadata Guide

Also of interest is the NOHSC Metadata Guide, a set of metadata guidelines specific to the needs of Occupational Health and Safety organisations and related web development work.


Australian Emergency Management Institute Terms Thesaurus

http://library.ema.gov.au/emathesaurus

The Australian Emergency Management Terms Thesaurus provides a list of terms commonly used across the emergency management sector. The thesaurus includes terms likely to be used by the sector, but not those relating to specific areas of particular emergency services. The thesaurus includes preferred terms, non-preferred terms, related terms and scope notes where available. The Thesaurus is maintained by the EMA Library, in consultation with subject matter experts.


The European Agency for Safety and Health at Work Thesaurus

The Thesaurus has been developed by the Agency to help it meet the needs of the Member States (now 25 in number) for access to the Agency's web site through consistent OSH terminology.

It has drawn heavily on the Australian Thesaurus as a main starting point and has also drawn on the CIS Thesaurus.

The methodology was as follows: the compiler brought together the terms in English, altering any that were specifically Australian concepts, altering any Australian spelling and adding known European terms.

The results were then sent to the Agency's Focal Points in the Member States for them to translate into their own languages, verifying the results through consultation with subject specialists.

The European Thesaurus is now in 22 languages.

Contact point for the Agency is: Finn Sheye, European Agency for Safety and Health at Work, Gran Via 33, E-48009 Bilbao, Spain | Tel: + 34 944-794-360 | Fax: + 34 944-794-383 | Email: sheye@osha.eu.int | Web: http://osha.europa.eu


USA

During the research for this paper the following reference was also found, published in the USA:

Environment, Safety and Health Thesaurus/Dictionary, edited by D.C. Clayton
Published as Technical Report
OSTI ID: 5471108; DE91013428
Report Number DOE/EH-0186 1 July 1991 by the US DOE Office of Scientific and Technical Information, Oak Ridge, TN (United States). 510 pages

The Environment, Safety and Health Thesaurus/Dictionary was developed for the Office of Safety and Quality Assurance (EH-30) by the Department of Energy's (DOE) Office of Scientific and Technical Information (OSTI).

This thesaurus/dictionary is to provide a single departmental reference for:

  1. definitions or semantic structure of environment, safety, and health terms that will help assure consistent DOE-wide understanding of these terms, and
  2. synonyms and related terms that will improve the logic of a user's analytical strategy for word searches in computerized environment, safety, and health information systems.

In addition to special data fields found within the individual word blocks, the most noteworthy features of the document are the three appendices following the main body of the thesaurus.

These appendices include:

  1. a listing of all thesaurus acronyms and their reciprocal phrases;
  2. a listing of all thesaurus terms under broader subject categories; and
  3. a separate mini-thesaurus for the DOE FRASE (Factor Relationship and Sequence of Events) vocabulary used on the Safety Performance Measurement System (SPMS).

The report also states that an electronic version of the thesaurus/dictionary will eventually be available on the Safety Performance Measurement System (SPMS) to improve users' search and analytical capabilities.

Whether it is still available will need to be confirmed.
Perhaps the USA CIS Member, NIOSH, via Dr Vern Anderson, will be able to help here.


European Union

Kevin P. Gardiner, Principal Administrator, Unit A5 - Information and Communication, European Commission - Directorate General Justice, Freedom and Security, recently contacted Sheila Pantry to discuss a new project:

TRANS-JAI - 1st Workshop on Thesauri and 2nd Workshop on other areas including OSH.

The Commission is to hold a preliminary workshop on thesauri-related issues in November 2005, preceded by a first Workshop preparatory meeting in September 2005.

Kevin Gardiner states that, in order to ensure that the TRANS-JAI project can be "transposed" into other EU subject areas, he has decided to hold a 2nd TRANS-JAI Workshop on this in January / February 2006.

There are two areas that Kevin intends to look at, namely: Health and Safety and Work and Environment.

Contact: Kevin Gardiner, Principal Administrator, TRANS-JAI Project Manager, Unit A5 - Information and Communication, European Commission - Directorate General Justice, Freedom and Security | Tel: +32 2 29 57 219 | Fax: +32 2 29 98 054 | Email: kevin.gardiner@cec.eu.int | Internet: http://ec.europa.eu/justice/index_en.htm


The British Standards Institution

The British Standards Institution is currently revising its Standard for thesauri, including the latest thinking, such as adapting thesaural practices for taxonomies.

Contact: Sophie Phipps, Marketing Services Manager, BSI Business Information | Tel: +44 (0)20 8996 7940 | Fax: +44 (0)20 8996 7553 | Email: sophie.phipps@bsi-global.com


What is a Controlled Vocabulary, and how is it useful?

A controlled vocabulary takes the guesswork out of searching and makes a database easier to search. Since people have many different ways of describing a concept, drawing all of these terms together under a single word or phrase makes searching the database more efficient. However, arriving at this efficiency requires consistency on the part of the individual indexing the database and the use of pre-determined terms.

It is likely that searchers are already familiar with the concept of controlled vocabulary. Yellow Page listings are arranged using controlled vocabulary. For example, a search for "Car Dealers" leads you to a note to "see Automobile Dealers." At a basic level, this is how a controlled vocabulary system works.

Conducting a search in a database that uses controlled vocabulary or indexing terms is efficient and precise. The biggest advantage of controlled vocabulary is that once the correct term is found and used, most of the information needed is grouped together in one place, saving the time of having to search under all of the other synonyms for that term.

It is arguable whether controlled vocabulary or natural language systems give the better retrieval performance. Free text or natural language systems often provide more results in a shorter time span because the searcher is searching all the fields of a given database (the Google search engine is a form of free text search).

Such systems work well for very specific searches. However, when a topic is older or broader in scope, it is likely that the searcher will retrieve irrelevant hits. It is also possible to miss some records relevant to the search because the searcher did not use the proper search term.

As with a web search, searching a database requires striking a balance between precision and generating enough hits to make the search successful.
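This balance is commonly measured with two figures: precision (the share of retrieved records that are relevant) and recall (the share of relevant records that are retrieved). The Python sketch below uses these standard definitions on invented record numbers; it does not describe any particular database.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant records retrieved / all records retrieved
    recall    = relevant records retrieved / all relevant records
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {1, 2, 3, 4}  # hypothetical records the searcher needs
    narrow_search = {1, 2}  # few hits, all of them relevant
    broad_search = {1, 2, 3, 4, 5, 6, 7, 8}  # many hits, half irrelevant
    print(precision_recall(narrow_search, relevant))
    print(precision_recall(broad_search, relevant))
```

The narrow search is precise but misses half of the relevant records, while the broad search finds them all at the cost of irrelevant hits.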

Stop Words

In many online databases searchers must keep in mind that there are certain words that are ignored. These are called "Stop Words." Common stop words are words such as 'the', 'a', 'an', 'this', and 'that'.
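A minimal Python sketch of how a database might drop stop words from a query is given below. The stop list contains only the words named above; real databases use their own, usually longer, lists, and the function name is an assumption made for this example.

```python
# Stop words named in the text; actual lists vary from database to database.
STOP_WORDS = {"the", "a", "an", "this", "that"}


def strip_stop_words(query):
    """Return the words of a query that would actually be searched."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


if __name__ == "__main__":
    print(strip_stop_words("The safety of a ladder"))
```

Words that are not on the stop list, such as "of" in this example, are still searched.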

Many computerized databases have an index/inventory and it is possible to group the words indexed and search on these.

Given the vast amount of OSH information indexed in databases, the user will need to decide what is needed, how up to date it must be, whether it comes from validated and authoritative sources, and which country it should come from. For example, OSH UPDATE, an aggregation of ten databases of which CISDOC is one, holds over 532,000 records. A controlled thesaurus is not used across these databases.

Conclusions and options

It would seem that there is a good deal of current interest in OSH thesauri.

CIS Members need to consider a number of options to decide on the way forward. The following are offered but Members are invited to comment and offer other solutions:

Option 1. European Agency

  1. Consider whether the European Agency's thesaurus is appropriate for wider use in all CIS countries (about 137 countries). This option may help when searching across a number of major OSH web sites that are not EU Member States' own web sites but belong to partners of the European Agency.
  2. If so, negotiate with the Agency and perhaps offer to contribute the thesaurus in other languages not covered by the European Agency.
  3. If not, then move on to Option 2.

Option 2. CIS Thesaurus

Decide whether the CIS Thesaurus is a good basis for enlargement. It will be necessary to:

Option 3. Why use a thesaurus?

  1. Consider alternative approaches to information retrieval, given the rapid development of technologies to index and retrieve detailed information, e.g. Google, AltaVista and other search engines.
  2. Consider Controlled Vocabularies for indexing and searching

Please send your comments to both Sheila Pantry (sp@sheilapantry.com) and Gabor Sandi (sandi@ilo.org) as soon as possible, and by 20 August 2005 at the latest, so that this Working Group paper can be presented at the Orlando CIS Meeting on 18 September 2005.

Many thanks for your help.