International Standard
ISO 24613-1
Second edition
2024-01
Language resource management — Lexical markup framework (LMF) — Part 1: Core model
Gestion des ressources linguistiques — Cadre de balisage lexical (LMF) — Partie 1: Modèle de base
Reference number
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: [email protected]
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Key standards used by LMF
4.1 Unicode
4.2 Language coding
4.3 Script coding
4.4 Unified modelling language
5 The LMF model
5.1 General
5.2 Class inheritance and data category selection procedures
5.2.1 Class inheritance
5.2.2 LMF attributes
5.2.3 Data category selection (DCS)
5.2.4 User-defined data categories
5.3 LMF core package
5.3.1 General
5.3.2 LexicalResource class
5.3.3 GlobalInformation class
5.3.4 Lexicon class
5.3.5 LexiconInformation class
5.3.6 LexicalEntry class
5.3.7 Form class
5.3.8 OrthographicRepresentation class
5.3.9 GrammaticalInformation class
5.3.10 Sense class
5.3.11 Definition class
5.4 Cross reference (CrossREF) model
5.4.1 General
5.4.2 CrossREF class
5.4.3 CrossREFConstraint class
5.5 Methods for data category selection and subclass creation
5.5.1 General
5.5.2 Generalization
5.5.3 Object instantiation
5.5.4 Design choices
5.5.5 Data categories for orthographic representation
5.5.6 Principles for model simplification
5.6 LMF extension use
5.6.1 General
5.6.2 Lexicon comparison
Annex A (informative) Data category examples
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
This second edition cancels and replaces the first edition (ISO 24613-1:2019), which has been technically
revised.
The main changes are as follows:
— several changes have been made to Figure 1 “LMF core package”, as follows:
— the OrthographicRepresentation class associations with the Form and Definition classes previously
had a cardinality of 1 to 1, which did not correctly represent the intent of the UML model; the revision
of the cardinality to 1 to 0..* in each case now provides a correct model;
— the type: intern/extern attribute-value pair is no longer included in the CrossREF class since it
described linking processes relevant for implementations, not associations relevant for a metamodel;
— the full names of the relationship values in the CrossREF class, “synonym/composition”, replace the
abbreviations “syn/compo”;
— the class names in Figure 1 are now harmonized with the LMF style;
— relevant information has been moved from the tables in ISO 24613-2:2020 to Table A.1, meaning that the
latter now contains more complete examples of values and attributes allocated to classes first introduced
in this document.
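As an illustration only, the revised cardinality and the CrossREF changes listed above can be sketched in Python. The class names are those of Figure 1. Every attribute name used here (written_form, representations, relationship, target_id) and the Relationship enumeration are assumptions made for this sketch and are not defined by this document.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Relationship(Enum):
    # Full names replace the former abbreviations "syn" and "compo".
    SYNONYM = "synonym"
    COMPOSITION = "composition"


@dataclass
class OrthographicRepresentation:
    written_form: str  # hypothetical attribute name


@dataclass
class Form:
    # 1 to 0..*: a Form may carry zero or more representations.
    representations: List[OrthographicRepresentation] = field(default_factory=list)


@dataclass
class Definition:
    # Same 1 to 0..* association as for Form.
    representations: List[OrthographicRepresentation] = field(default_factory=list)


@dataclass
class CrossREF:
    relationship: Relationship  # no intern/extern "type" attribute any more
    target_id: str              # hypothetical way of naming the linked object


# A Form with no representation is valid under 0..*.
empty_form = Form()

# A Form with two representations is also valid.
form = Form([OrthographicRepresentation("colour"),
             OrthographicRepresentation("color")])

link = CrossREF(Relationship.SYNONYM, "entry-42")
```

Under the former 1 to 1 cardinality, neither empty_form nor form would have been a valid instance, since each Form would have required exactly one OrthographicRepresentation.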
A list of all parts in the ISO 24613 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Optimizing the production, maintenance and extension of electronic lexical resources is one of the crucial
aspects impacting human language technologies (HLTs) in general and natural language processing (NLP) in
particular, as well as human-oriented translation technologies. A second crucial aspect involves optimizing
the process leading to their integration in applications. Lexical markup framework (LMF) is an abstract
metamodel that provides a common, standardized framework for the construction of computational
lexicons. LMF ensures the encoding of linguistic information in a way that enables reusability in different
applications and for different tasks. LMF provides a common, shared representation of lexical instances,
including morphological, syntactic and semantic aspects.
The goals of LMF are:
— to provide
...