Comparison of CEN, FGDC and ISO standards for metadata

Ing. Jan Ruzicka
Institute of Economics and Control Systems
VŠB – Technical university Ostrava
17. listopadu 15, 708 33 Ostrava – Poruba
E-mail: jan.ruzicka@vsb.cz

Abstract

This paper compares the three main standards used for describing metadata of geodata. It compares the basic metadata elements and also presents the author's point of view. The paper discusses the problems and deficiencies of these standards.

Introduction

First, I want to apologise: the paper was meant to compare only two standards, but it now compares three.

There are several standards for metadata of geodata nowadays. The CEN standard (prEN 12657 Geographic Information – Metadata) is mostly used in Europe. Another standard comes from the USA organisation FGDC (Federal Geographic Data Committee). The third standard comes from ISO (International Organization for Standardization).

All of those standards describe a data set. A data set is a logical part of an information system or database. All of those standards aim to describe a data set using the following metadata elements: data set identification (title, version, etc.), overview (abstract, spatial schema, language, purpose of production, etc.), quality elements (lineage, positional accuracy, etc.), related documents, related data sets, spatial reference system, extent (spatial, temporal), content (data structure, classification, etc.), administrative metadata (point of contact, distribution), and metadata about metadata (author, dates).

This paper compares the pre-standard "ENV 12657:1998 Geographic information – Data description – Metadata", which was created by the European Committee for Standardization (CEN). In the following text this standard is called simply "CEN". The second compared standard (only a working draft) is ISO/CD 19115, from the end of 1999. It was prepared by ISO/TC 211. In the following text this standard is called simply "ISO". The last compared standard comes from the Federal Geographic Data Committee (FGDC), USA, and is called Standard for Digital Geospatial Metadata. The compared version is from the end of 1998. In the following text this standard is called simply "FGDC".

Structure and means of expression of the standards

The ISO standard uses three basic means of expression: textual description organised in paragraphs (chapters), UML (Unified Modelling Language) diagrams, and textual description organised in tables. The UML diagrams give a well-arranged description of the metadata elements and can also be used in object-oriented development environments. However, the UML diagrams do not give a full description of the metadata elements. The tables are needed to get the full descriptions, but the organisation of the tables is not well arranged. A good feature of the tables is the short names of metadata elements, which can be used in XML (eXtensible Markup Language) or in another mark-up language (such as SGML).
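To illustrate how short names could serve as mark-up tags, here is a minimal sketch in Python. The short names (title, abstract, lang), the root tag and the sample values are assumptions made up for this example; the real short names are the ones listed in the ISO tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical short names and made-up values, for illustration only.
record = {
    "title": "Road network",
    "abstract": "Roads of a region",
    "lang": "cs",
}

def to_xml(record, root_tag="metadata"):
    """Serialise a flat metadata record, using short names as XML tags."""
    root = ET.Element(root_tag)
    for short_name, value in record.items():
        child = ET.SubElement(root, short_name)
        child.text = value
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml(record)
print(xml_text)
```

A real record would be nested rather than flat, so this only shows the naming idea, not the structure of the standard.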

The CEN standard uses four means of expression: textual description organised in paragraphs (chapters), EXPRESS schemas, EXPRESS-G schemas (diagrams), and textual description organised in tables. The EXPRESS-G schemas give a well-arranged overview similar to UML diagrams, and likewise do not give a full description of the metadata elements. The EXPRESS schemas serve that purpose: they are very well arranged and give a full description in textual form. The description organised in tables is also very well arranged.

Although it uses no diagrams or tables, the FGDC description of metadata elements is well arranged and gives the full description in one place. The FGDC standard uses a textual description organised in simple structures, which are easy to comprehend. An advantage of this description is that the reader does not need to know UML or EXPRESS.

The next part of the paper describes the metadata elements and how they are presented in the compared standards.

Data set identification (name, version, etc.)

CEN has a mandatory item, data set name, and several optional items (such as alternate name and version). FGDC uses the class document, which describes a document (data set) by name, version, and document type (for example data set, video, audio). ISO takes a similar approach to FGDC.

The advantage of FGDC and ISO is that the class document can be used to identify different types of documents: a digital geodata set, map, printed document, video, audio, etc.

Data set overview (abstract, language, purpose of production, spatial schema, etc.)

All of those standards allow specifying a description (abstract) of the data set, its language, character set, and purpose of production. Purpose of production is optional in ISO and CEN. There are big differences in the definition of the spatial schema.

CEN uses eight predefined basic spatial schemas (referenced from ENV 12160, for example "Faces completely covering the plane without any gaps or overlap", "The planar graph linear network - there are Edges not intersecting, except in its beginning and ending, with Terminating Nodes associated in both cases. Isolated Nodes are allowed but Intermediate Nodes not", "A raster image is composed by a collection of Raster bands without any topology associated") or allows specifying a user-defined spatial schema (built from basic elements in conformity with ENV 12160, for example point, pixel, band, frame). Spatial schema is a mandatory element in CEN metadata. For the predefined spatial schema raster image, the standard allows specifying the raster type and a raster description (as free text).

ISO provides more than 10 classes to specify the spatial schema inside the metadata, without using an external standard. However, spatial schema is an optional element of the standard. The user can specify the spatial schema type (raster, vector, TIN, image, matrix, etc.). There are also elements for raster, vector, and image description. For vector data, for example, the user can specify the count of elements, geometric types, and topology types. For raster data the standard allows specifying colour depth, size, resolution, etc. For images the user can define the number of bands, sensor type, etc.

FGDC gives similar possibilities to ISO for vector data. For raster and image data it allows only parameters similar to the ISO raster parameters.

A very important metadata element is the data set sample. All of those standards provide elements to define it. CEN gives only general topics for describing a sample, whereas FGDC and ISO describe precisely how to specify it. For example, they allow specifying the name, type, URL, and description of the sample.

Quality elements (lineage, positional accuracy, etc.)

CEN uses the external standard ENV 12656 to describe the quality of the data set. Lineage or another quality element has to be specified. The user can also define quality parameters such as completeness, homogeneity, consistency, and positional accuracy, as well as metaquality elements. In ISO the quality element is optional and relies on the external standard ISO 19113; the content is similar to CEN. FGDC is similar to CEN, but the elements are optional.

Related documents

CEN allows describing documentation related to the data set, but provides only free text for it. This can cause problems with the identification of documents. ISO and FGDC both use the class document (described in the section on data set identification).

Related data sets

CEN uses the data set name to specify a related data set. ISO and FGDC allow using the class document, and the user can choose from relation types (aggregation, association, and composition).

Spatial reference system

CEN, FGDC, and ISO give similar possibilities to describe the spatial reference system. ISO uses an external ISO standard. FGDC allows choosing from lists of map projections, reference ellipsoids, etc. (mostly for the U.S.A. area) and also allows specifying a user-defined reference system. CEN allows specifying reference system parameters (map projection and its parameters, etc.) without choosing from lists.

Extent (spatial, temporal)

CEN, FGDC, and ISO allow specifying the extent in a similar way. Planar extent is defined by a rectangle, a polygon, or a geographic area. In ISO and FGDC the user can use a gazetteer (thesaurus) to specify geographic area(s). Choosing area(s) from a gazetteer minimises mistakes in area identification, but it requires that gazetteers exist.

ISO and FGDC require geographical co-ordinates. CEN allows specifying the co-ordinate system in which the co-ordinates are given. The CEN authors thus leave wide room to specify the extent in a variety of co-ordinate systems, but this is quite a big problem for managing metainformation systems: their creators would need transformation equations for every such co-ordinate system, which is not feasible.
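When every extent is stored in geographical co-ordinates, as ISO and FGDC require, a spatial search needs no transformation equations at all. The following Python sketch shows this under stated assumptions: the class name BoundingBox, the catalogue entries, and all co-ordinate values are made up for illustration, and the test ignores rectangles that cross the 180° meridian.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundingBox:
    """Rectangle in geographical co-ordinates (decimal degrees)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two rectangles overlap unless one lies entirely
        # to one side of the other.
        return not (
            self.east < other.west or other.east < self.west
            or self.north < other.south or other.north < self.south
        )

# Made-up extents of two data sets.
catalogue = {
    "roads": BoundingBox(18.0, 49.7, 18.5, 49.9),
    "rivers": BoundingBox(14.2, 49.9, 14.7, 50.2),
}

query = BoundingBox(18.1, 49.75, 18.3, 49.85)
hits = [name for name, box in catalogue.items() if box.intersects(query)]
print(hits)
```

With extents in arbitrary co-ordinate systems, as CEN allows, each box would first have to be transformed into a common system before such a comparison.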

All of those standards define elements for describing temporal and vertical extent. ISO gives additional elements for describing spatio-temporal extent.

Content (data definition (description))

CEN requires describing the data content using free text, but has additional optional elements to define data classes (entities, object types), their attributes, and relations (superclass, subclass, association). Each data structure element can be classified by a thesaurus element. In ISO the data definition is optional, and ISO allows specifying a pointer to an external feature catalogue. FGDC requires entering either an overview of the data definition or a detailed data definition. The detailed data definition is similar to CEN, but without the connection to a thesaurus.

CEN gives good possibilities to describe data content and, unlike FGDC, allows classification of data elements. Data element classification may be useful for searching in metainformation systems.

Classification

CEN provides optional classification using a thesaurus and thesaurus elements. The user can specify the thesaurus name, version, creation date, administrator, and external documentation of the thesaurus. Thesaurus elements are specified by element name, and it is possible to define relationships between elements (related, synonym, etc.).

ISO has two parts of classification. First, the user has to classify the data set by thematic category; the thematic categories are given by a predefined list. Second, the user can use a thesaurus, thesaurus elements, and element types (thematic, geographical, temporal, etc.). FGDC offers similar means to ISO. FGDC has thesaurus types (thematic, geographical, etc.), and the user has to describe the data set using at least one element from a thematic thesaurus.

Unlike ISO and FGDC, CEN allows defining the thesaurus version and administrator, which is quite a good feature. But CEN does not require data set classification, and that is quite a big deficiency of the standard. Practical experience shows that searching by classification is the most common query in metainformation systems.
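The effect of optional classification on searching can be sketched in a few lines of Python. The record layout, the category names, and the keywords below are assumptions for illustration and are not taken from any of the standards.

```python
# Made-up catalogue; "category" stands for a thematic category and
# "keywords" for thesaurus elements attached to the data set.
records = [
    {"name": "Road network", "category": "transportation",
     "keywords": {"roads", "motorways"}},
    {"name": "River network", "category": "inlandWaters",
     "keywords": {"rivers"}},
    # A record without classification, which CEN permits.
    {"name": "Unclassified scan", "category": None, "keywords": set()},
]

def search_by_category(records, category):
    """Return names of data sets classified under the given category."""
    return [r["name"] for r in records if r["category"] == category]

def search_by_keyword(records, keyword):
    """Return names of data sets tagged with the given thesaurus element."""
    return [r["name"] for r in records if keyword in r["keywords"]]

def unclassified(records):
    """Return names of data sets no classification query can ever find."""
    return [r["name"] for r in records
            if r["category"] is None and not r["keywords"]]

print(search_by_category(records, "transportation"))
print(unclassified(records))
```

A record like the third one stays invisible to the most common kind of query, which is why mandatory classification matters for a metainformation system.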

Administrative metadata (organizations, persons, distribution)

CEN allows specifying a point of contact (organisation(s), person(s)) in relation to the data set, together with the type of relation (free text). ISO and FGDC give more precise elements to describe the point of contact, and the point of contact element is required.

Data set constraint elements are more concrete in ISO and FGDC than in CEN. CEN gives free text for all elements. ISO provides a class for legal constraints and a class for data security. The legal constraints class allows specifying access constraints and use constraints. The security class defines the security classification system and the classification itself. FGDC gives similar possibilities to ISO, but uses only free text for legal constraints.

Information about support services (mainly updates of the data set) is represented by free text in CEN. ISO allows defining what is updated (for example the whole data set, objects, attributes, or geometry) and how frequently (using a class for temporal parameters). But there is no possibility to specify other aspects of support services (for example the update price). FGDC gives a list of possible update frequencies, but does not allow specifying what is updated.

On-line access to the data set is optional free text in CEN. FGDC allows specifying an IP address (computer name) and, for a modem connection, some connection parameters such as parity and speed. Connection instructions can also be specified. ISO has similar elements, but without the technical parameters of a modem connection.

The specification of data transfer formats is optional free text in CEN. ISO uses a class that allows specifying the name, version, description, and compression technique of a format. At least one format has to be specified. FGDC is similar to ISO, but gives a predefined list of formats.

Data media description is optional free text in CEN. ISO allows defining the media name, compatibility, and storage type (tar, ISO 9660, etc.). FGDC is similar to ISO, but the element is mandatory.

The price element is mandatory in CEN and FGDC, but not in ISO. It is better to have the price element optional: metadata creators usually do not like to specify the price of data and prefer to give a contact to the price manager (distributor).

The units of distribution element is optional in all of those standards. ISO also allows specifying the size of the data.

Metadata about metadata (author, creation date)

CEN allows specifying three types of dates (creation, update, verification), and they are conditionally mandatory. Another mandatory element is metadata language.

ISO requires metadata language and character set. ISO requires a creation date (or update date; it does not distinguish between them). Standard name and version are also mandatory elements.

FGDC does not distinguish between creation date and update date, but allows specifying the date of metadata verification (revision). Metadata author is a mandatory element, as are standard name and version. FGDC also requires specifying metadata access, usage, and security.

FGDC is the strictest in the metadata about metadata elements, but all of the required elements are very useful for metainformation systems. The most useful are the metadata author, metadata access, and metadata security elements. However, distinguishing between creation date and update date is also very useful, and FGDC does not do it.

Extended metadata elements

ISO and FGDC make it possible to extend the metadata elements by specifying new elements that are necessary for data description and that a general standard cannot define. CEN does not give that possibility. In an extreme case such an extension can produce a new metadata description that is compatible with the standard only in its mandatory elements, but usually there is no such danger.

Mandatory metadata elements

(+ mandatory, - not mandatory)

Element                                   CEN  ISO  FGDC
metadata language                          +    +    -
metadata character set                     -    +    -
standard name                              -    +    +
standard version                           -    +    +
data set name                              +    +    +
abstract                                   +    +    +
data set language                          +    +    +
data set character set                     +    +    +
spatial schema                             +    -    -
metadata creation date                     +    +    +
metadata update date                       +    -    -
metadata revision date                     +    -    -
spatial extent                             +    -    +
temporal extent                            +    -    +
quality elements                           +    -    -
organisation                               +    +    +
point of contact                           -    +    +
category                                   -    +    +
purpose of production                      -    -    +
frequency of updates                       -    -    +
restriction of metadata access and usage   -    -    +
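The table above can be turned directly into a completeness check. The following Python sketch encodes the rows as data and reports which mandatory elements a record lacks for a chosen standard. The element names paraphrase the table rows, and the function names and the sample record are assumptions made up for this example.

```python
STANDARDS = ("CEN", "ISO", "FGDC")

# One row per element: name | flags in the column order of STANDARDS.
TABLE = """
metadata language|+ + -
metadata character set|- + -
standard name|- + +
standard version|- + +
data set name|+ + +
abstract|+ + +
data set language|+ + +
data set character set|+ + +
spatial schema|+ - -
metadata creation date|+ + +
metadata update date|+ - -
metadata revision date|+ - -
spatial extent|+ - +
temporal extent|+ - +
quality elements|+ - -
organisation|+ + +
point of contact|- + +
category|- + +
purpose of production|- - +
frequency of updates|- - +
restriction of metadata access and usage|- - +
"""

def load_mandatory(table):
    """Return {standard: set of mandatory element names}."""
    mandatory = {standard: set() for standard in STANDARDS}
    for line in table.strip().splitlines():
        element, flags = line.split("|")
        for standard, flag in zip(STANDARDS, flags.split()):
            if flag == "+":
                mandatory[standard].add(element)
    return mandatory

MANDATORY = load_mandatory(TABLE)

def missing_elements(record, standard):
    """List mandatory elements that are absent or empty in the record."""
    filled = {name for name, value in record.items() if value}
    return sorted(MANDATORY[standard] - filled)

# Made-up record, for illustration only.
record = {"data set name": "Road network", "abstract": "Roads of a region",
          "organisation": "Example agency"}
print(missing_elements(record, "ISO"))
```

Keeping the table as data rather than as code means the check changes with the table, which is convenient while a standard is still a draft.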

Conclusion

The comparison may have deficiencies, because it compares a working draft (ISO), a pre-standard (CEN), and only one full standard (FGDC). In spite of that, the comparison can be useful for getting a global overview of the compared standards. The reader can get basic information about the differences between the standards' elements.

The FGDC and ISO standards are more concrete in metadata description. In many cases the CEN standard gives only general topics for describing metadata elements, whereas the FGDC and ISO standards describe precisely how to describe them. On the other hand, the ISO standard makes many useful metadata elements optional. But in many cases these useful elements are difficult to obtain, so keeping them optional can make the standard easier to use for collecting metadata from various sources. Because of those facts, ISO may become a more usable standard than CEN in Europe. Another fact is that the development of CEN is finished. But the main reason why the ISO standard will be used more than the CEN and FGDC standards is that it comes from ISO: ISO standards are international and are generally used in many countries.

Metainformation systems based on the CEN standard should start considering migration to ISO very soon. The first step can be adding the possibility to export metadata in conformity with the ISO standard. This first step should have three main targets: mandatory classification, mandatory point of contact, and implementation of the useful lists given by the ISO standard. Metainformation systems based on the FGDC standard need only consider adding the possibility to export metadata in conformity with ISO, because FGDC gives similar possibilities to ISO and something more.
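The three targets of the first migration step can be expressed as a simple pre-export check. This is only a sketch under assumptions: the function name iso_export_issues, the record keys, and the placeholder category list are hypothetical, and the real list of thematic categories is the one given by the ISO standard.

```python
def iso_export_issues(cen_record, iso_categories):
    """Report what blocks exporting a CEN-based record in ISO form."""
    issues = []
    category = cen_record.get("category")
    if not category:
        # Target 1: classification is mandatory for the export.
        issues.append("classification missing")
    elif category not in iso_categories:
        # Target 3: values must come from the predefined list.
        issues.append(f"category {category!r} is not in the ISO list")
    if not cen_record.get("point of contact"):
        # Target 2: point of contact is mandatory for the export.
        issues.append("point of contact missing")
    return issues

# Placeholder list, not the actual ISO thematic categories.
ISO_CATEGORIES = {"transportation", "inlandWaters"}

# Made-up CEN-based record with a free-text category.
cen_record = {"data set name": "Road network", "category": "roads"}
issues = iso_export_issues(cen_record, ISO_CATEGORIES)
print(issues)
```

Running such a check over an existing catalogue would show how many records need manual completion before an ISO export is possible.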

References

  1. CEN/TC 287: ENV 12657:1998 Geographic information – Data description – Metadata, 1998
  2. FGDC: Standard for Digital Geospatial Metadata, 1998
  3. Gouveia, C., Henriques, P., Nicolau, R., Rocha, J., Santos, M.: Moving from CEN TC 257 to ISO/TC 211 - The approach of the Portuguese National Geographic Information Infrastructure, In: Proceedings of the 4th AGILE Conference on Geographic Information Science, Brno, Czech Republic, 2001
  4. ISO/TC 211: ISO/CD 19115, 1999
  5. Ruzicka, J.: Metainformation system of CAGI, In: Proceedings of the 6th EC-GIS Workshop, The Spatial Information Society - Shaping the Future, Lyon, 2000
  6. Ruzicka, J.: XML and metainformation systems, In: Proceedings of GIS Ostrava 2001, Ostrava, 2001, ISSN 1213-239X