This document is for public comment, with comments due June 30, 2015 at 23:59 CET. Comments will be used to create future versions. Comments can be made via the MQM definition’s Github repository at https://github.com/multidimensionalquality/mqm-def/issues.
This version: | 0.9.3 (2015-06-16) (http://www.qt21.eu/mqm-definition/definition-2015-06-16.html) |
Latest version: | http://www.qt21.eu/mqm-definition/ |
Previous Version: | 0.9.2 (2015-06-12) (http://www.qt21.eu/mqm-definition/definition-2015-06-12.html) |
Diff from last major version (0.3.0): | http://www.qt21.eu/mqm-definition/diffs/mqm-0_3-0_9_3.html |
Copyright ©2014, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH / German Research Center for Artificial Intelligence (DFKI)
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
This document is a draft of the MQM specification. It is subject to frequent and substantial revision and should not be relied upon for implementation.
Feedback on this document should be submitted to info@qt21.eu.
This document defines the Multidimensional Quality Metrics (MQM) framework. It contains a description of the issue types, scoring mechanism, and markup, as well as informative mappings to various quality systems. MQM provides a flexible framework for defining custom metrics for the assessment of translation quality. These metrics may be considered to be within the same “family” as they draw on a common inventory of values for data categories and a common structure. MQM supports multiple levels of granularity and provides a way to describe translation-oriented quality assessment systems, exchange information between them, and embed that information in XML or HTML5 documents.
1. Introduction (non-normative)
1.2. Quality assessment, quality assurance, and quality control
2. Terms and definitions (normative)
5.4. Integration with other metrics
7. Relationship to ITS 2.0 (normative)
8.1. Default severity levels for error-count metrics
8.3. Default severity multipliers from versions earlier than 0.3.0 (deprecated)
9. Creating MQM metrics (non-normative)
9.1. Example of defining a metric
9.2. Definition of MQM parameters
10. TAUS DQF subset (non-normative)
11. Mappings of existing metrics to MQM (non-normative)
13. Previous versions (non-normative)
Multidimensional Quality Metrics (MQM) provides a framework for describing and defining quality metrics used to assess the quality of translated texts and to identify specific issues in those texts. It provides a systematic framework to describe quality metrics based on the identification of textual features. This framework consists of the following items:

Attributes in the mqm: namespace that can be used with XML or HTML5 (with appropriate adjustments to the HTML5 format) to embed MQM data in these file formats. These attributes are designed to work with Internationalization Tag Set 2.0 (ITS 2.0) localization quality metadata (7.2. MQM inline attributes).

Elements in the mqm: namespace that can be used to insert MQM data into XML files when existing elements do not meet requirements (7.3. MQM inline elements).

MQM does not define a single metric intended for use with all translations. Instead it adopts the “functionalist” approach that quality can be defined by how well a text meets its communicative purpose. In practical terms, this means that MQM is a framework for defining a family of related metrics.
MQM is intended to provide a set of criteria which can be used to assess the quality of translations. While these criteria are intended to promote objectivity in assessment, a certain degree of subjectivity is inherent in assessing translation quality and MQM may not be able to distinguish between high-end translations that meet all specifications (other than to assure that they do, in fact, meet those specifications).
This document applies primarily to quality assessment of translated content (and thus to the output of translation systems). It does not apply to assessment of translation processes or projects. Here “translated content” is to be understood broadly to include text, graphics, and any other content which may be translated or adapted for multiple locales (i.e., a combination of language and geographical region). MQM applies to the translation industry, interpreted broadly to include localization (of software and other technical content) and “transcreation” (creative adaptation of content for target audiences and purposes, including, but not limited to, adaptation of marketing materials and multi-media content), as well as to various types of purely textual translation.
MQM is useful for assessing verifiable qualities of translations. It is not intended to address purely subjective criteria (such as “artistry” or “elegance”) that may be of key importance in some circumstances. Rather, it provides a functional approach to quality that seeks to determine whether a translation meets specifications and to identify aspects that may fall short of expectations.
MQM is designed to apply to (monolingual) source texts as well as translated target texts. MQM’s Design, Fluency, Style, Terminology, and Verity branches apply equally to source texts and target texts (although some specific issues within them might apply more to one or the other). The Accuracy dimension is specific to translated texts (or, more properly, to the relationship between source and target text). While Locale convention is more likely to apply to target texts, it can apply to source texts. And, finally, the Internationalization dimension applies solely to source content (many of its issues correspond to specific faults in the target text that can be identified under Locale convention).
Within the translation industry, three terms are used somewhat interchangeably to refer to quality activities: quality assessment, quality assurance, and quality control. However, within broader literature on quality these terms have distinct meanings and should be distinguished:
The focus of MQM is on quality assessment, which is essential to quality assurance and quality control. This document does not, however, specify or recommend particular quality assurance or quality control processes. (Note that there is widespread confusion between “quality assessment” and “quality assurance” within the localization industry, partially due to the adoption of the LISA Quality Assurance Model, which actually provided a model for quality assessment.)
The following terms and definitions apply in this document.
issues
in most contexts.)

A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.

For monolingual source texts, the formulation may be modified as follows:
A quality text demonstrates required fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
As noted above, MQM applies to both source and target texts (with different dimensions applying to each). The default MQM scoring method accordingly allows users to assess source texts to obtain a quality score for them and, if both source and target are assessed, issues found in the source may be counted against the penalties for issues in the target text, resulting in higher scores. While not all implementations or usage scenarios will examine the source or count problems in the source in favor of translators, this principle is intended to help ensure that translators are recognized and credited when they have to translate inferior source texts rather than being blamed for all problems, even those beyond their control.
There are a number of ways to assess the quality of translations. Two primary methods are used in industry and academia:
MQM is ideally suited for implementation as an analytic metric. It is also easily adapted to serve as the basis for holistic assessments.
Rather than proposing a single metric for assessing all translations, MQM provides a flexible method for defining and declaring metrics that can be adapted to specific requirements. These requirements are generally stated in terms of a set of 12 “parameters” (see Section 9.2. MQM parameters), a subset of the translation parameters described in ASTM F2575:2014 that focuses primarily on aspects of the translation product (rather than the project or process). Using these parameters to define requirements and expectations before translation allows users to create appropriate metrics before translation begins and provides translators with a clear view of the criteria for assessing their work.
In addition, metrics must support both simple and sophisticated requirements. Rather than proposing yet another metric with more detail, MQM provides a flexible catalog of defined issue types that can support any level of sophistication, from a simple metric with two categories to a complex one with thirty or forty. It also supports both holistic assessment (for quick acceptance testing) and error markup/counts for cases where detailed analysis is required.
Conformance of a translation quality assessment metric with MQM is determined by the following criteria:

Metrics MAY use displayName to assign a non-default name to an issue type. (E.g., a metric could declare “Not translated” to be the display name for untranslated.) However, at least for English-language implementations, the use of the default MQM names is encouraged. In any case where a different display name is used, the MQM ID MUST be used to prevent confusion.

Note that the only required aspect is use of the MQM vocabulary, which MUST NOT be contradicted or overridden.
The full list of MQM issues is maintained in a separate document at issues-list-2015-05-27.html.
At the top level, MQM is divided into major dimensions:
More information on the dimensions and their content can be found in the full list of MQM issues at issues-list-2015-05-27.html.
In order to simplify the application of MQM, MQM defines a smaller “Core” consisting of 20 issue types that represent the most common issues arising in quality assessment of translated texts. The Core represents a relatively high level of granularity suitable for many tasks. Where possible, users of MQM are encouraged to use issues from the Core to promote greater interoperability between systems.
The MQM Core can be graphically represented as follows (branches in gray italics represent major branches not included in the MQM core) (available here in SVG format):
The 20 issues of the MQM Core are defined as follows:
Definitions for these issues can be found in the list of MQM issue types.
Even the 20 issues of the Core represent more issues than are likely to be checked in any given quality application, and users may define subsets of the Core for their needs. For translation quality assessment tasks, it is recommended that metrics contain at least the issue types Accuracy and Fluency if no other more granular types are included.
While users are strongly encouraged to limit issue types to pre-defined MQM issues, they may add additional issue types to MQM to meet additional requirements. User-defined issue types MUST include the following information: an ID prefixed with x- to indicate that the issue is a user-defined issue. E.g., x-respeaking-error would be a valid ID for a user-defined “respeaking error” but respeaking-error would be invalid.

User extensions do not provide interoperability between systems and impede the exchange of data. Nevertheless they may be needed to support requirements not anticipated in MQM. Users should tie extensions into the predefined hierarchy using the parent value as much as possible, since doing so provides consumers of MQM data with the best guidance in interpreting unknown categories and mapping them to other systems. As with other aspects of MQM, users should limit granularity to the least granular level that meets requirements.
Users who encounter frequent need for custom extensions are encouraged to communicate their requirements to the MQM project for possible inclusion of these types in future versions of MQM.
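The ID rule for user extensions is mechanical and easy to enforce in tooling. A minimal sketch in Python (the function name is illustrative, and the set of known MQM IDs shown here is a small subset standing in for the full issue inventory):

```python
# Illustrative subset of the predefined MQM issue IDs (not exhaustive).
KNOWN_MQM_IDS = {"accuracy", "omission", "addition", "fluency",
                 "spelling", "grammar", "terminology", "style"}

def is_valid_issue_id(issue_id: str) -> bool:
    """Accept predefined MQM IDs; anything else MUST carry the 'x-' prefix."""
    if issue_id in KNOWN_MQM_IDS:
        return True
    return issue_id.startswith("x-")

# "x-respeaking-error" is a valid user-defined ID; "respeaking-error" is not.
```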
In addition to MQM, it may be desirable to use other metrics that cannot be converted to a native MQM representation for various purposes. The key principle in integrating metrics is that they must be scoped to indicate to what MQM content they apply. For example, if a metric assesses only readability, it would be scoped to provide a score for MQM Fluency, while a metric that provides a score for “Adequacy” would provide a score for MQM Accuracy. A metric that provides an undifferentiated “quality” score would take all of an MQM metric as its scope and thus provide an overall score.
Non-MQM scores may be indicated in an MQM report by using the nodeScore and scoreType attributes, which may be appended to any node in the score report.
As the interpretation of any particular metric’s result/score is likely to depend on the specifics of the assessment, MQM can provide no guidance on how to utilize the result/score of non-MQM metrics. Results may be appended to MQM reports at the appropriate nodes in the MQM hierarchy, and users may wish to combine these results with the results of MQM-based evaluation (e.g., through averaging MQM and non-MQM scores normalized on a 1-100 scale). Such combinations are outside the scope of MQM.
As an example, the BLEU metric, an automatic metric for assessing machine translation (MT) quality with respect to human reference translation(s), is widely used in MT research. In the case of BLEU, the scope is global because BLEU provides a single, undifferentiated quality score. A BLEU score would thus be provided as parallel to the overall MQM score (see Section 8. Scoring for a recommended method for generating an MQM score). An implementer could utilize the BLEU score in various ways in conjunction with MQM: e.g., only assessing those translations that obtain a BLEU score over a specific threshold, averaging the BLEU and MQM scores, or using both scores for thresholds.
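The combination strategies just mentioned are implementation choices, not part of MQM. A sketch of two of them, assuming both scores have already been normalized to a common 0-100 scale (the threshold value is arbitrary):

```python
def accept_by_bleu_threshold(bleu: float, threshold: float = 30.0) -> bool:
    """Gate further (human) MQM assessment on a minimum BLEU score.
    The threshold is an implementation choice, not defined by MQM."""
    return bleu >= threshold

def combined_score(mqm: float, bleu: float) -> float:
    """Simple average of an MQM score and a BLEU score, both on 0-100."""
    return (mqm + bleu) / 2.0
```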
While the specific use of other scores cannot be mandated, their usage should not conflict with the MQM principles. For example, a metric’s results should not be stated to apply to the Fluency branch of MQM if they include the results of an evaluation of whether or not terms have been translated correctly.
This section describes the MQM declarative markup. Use of the metrics declaration markup is mandatory for declaring an interoperable MQM metric. When used with XML or HTML, it is strongly recommended that the ITS 2.0 Localization Quality Issue data category be used to declare MQM issues, in conjunction with the locQualityProfileRef attribute pointing to a valid MQM definition. Note that when MQM is implemented with ITS 2.0 quality markup, the ITS 2.0 implementation requirements are also mandatory.
MQM provides an XML mechanism for exchanging descriptions of MQM-compliant metrics. MQM metrics description files use the .mqm file name extension. An .mqm file contains a hierarchical list of MQM issue types. This listing MUST conform to the hierarchy of issue types.

The following is an example of a small metric description file with issue names in both English and German. It includes a user-defined extension (x-respeaking) used to identify errors caused when a vocal text being respoken without background noise based on a live audio feed is incorrectly repeated by the person doing the respeaking, leading to a mistranscription.
<?xml version="1.0" encoding="UTF-8"?>
<mqm version="0.9">
  <head>
    <name>Small metric</name>
    <descrip>A small metric intended for human consumption</descrip>
    <version>1.5</version>
    <src>http://www.example.com/example.mqm</src>
  </head>
  <issues>
    <issue type="accuracy" display="no">
      <issue type="omission" weight="0.7"/>
      <issue type="addition"/>
    </issue>
    <issue type="terminology" weight="1.5"/>
    <issue type="style" weight="0.5"/>
    <issue type="fluency" display="no">
      <issue type="spelling"/>
      <issue type="grammar"/>
      <issue type="unintelligible" weight="1.5"/>
    </issue>
    <issue type="x-respeaking" weight="1.5"/>
  </issues>
  <displayNames>
    <displayNameSet lang="en">
      <displayName typeRef="accuracy">Adequacy</displayName>
      <displayName typeRef="terminology">Terminology</displayName>
      <displayName typeRef="omission">Omission</displayName>
      <displayName typeRef="addition">Addition</displayName>
      <displayName typeRef="fluency">Fluency</displayName>
      <displayName typeRef="style">Style</displayName>
      <displayName typeRef="spelling">Spelling</displayName>
      <displayName typeRef="grammar">Grammar</displayName>
      <displayName typeRef="unintelligible">Unintelligible</displayName>
      <displayName typeRef="x-respeaking">Respeaking</displayName>
    </displayNameSet>
    <displayNameSet lang="de">
      <displayName typeRef="accuracy">Genauigkeit</displayName>
      <displayName typeRef="terminology">Terminologie</displayName>
      <displayName typeRef="omission">Auslassung</displayName>
      <displayName typeRef="addition">Ergänzung</displayName>
      <displayName typeRef="fluency">Sprachkompetenz</displayName>
      <displayName typeRef="style">Stil</displayName>
      <displayName typeRef="spelling">Rechtschreibung</displayName>
      <displayName typeRef="grammar">Grammatik</displayName>
      <displayName typeRef="unintelligible">Unverständlich</displayName>
      <displayName typeRef="x-respeaking">Sprecherfehler</displayName>
    </displayNameSet>
  </displayNames>
  <severities>
    <severity id="minor" multiplier="1"/>
    <severity id="major" multiplier="10"/>
    <severity id="critical" multiplier="100"/>
  </severities>
</mqm>
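A consumer of a metric description file might read the declared issue list as follows. This sketch uses Python's standard xml.etree module and assumes only the element and attribute names shown in the example above; treating a missing weight attribute as 1.0 is an assumption for illustration, not a rule stated in this section.

```python
import xml.etree.ElementTree as ET

# A trimmed fragment of the example metric description above.
MQM_SNIPPET = """<mqm version="0.9">
  <issues>
    <issue type="accuracy" display="no">
      <issue type="omission" weight="0.7"/>
      <issue type="addition"/>
    </issue>
    <issue type="terminology" weight="1.5"/>
  </issues>
</mqm>"""

def read_issue_weights(xml_text: str) -> dict:
    """Map each declared issue type to its weight (assumed default: 1.0)."""
    root = ET.fromstring(xml_text)
    return {issue.get("type"): float(issue.get("weight", "1.0"))
            for issue in root.iter("issue")}

weights = read_issue_weights(MQM_SNIPPET)
# weights["omission"] is 0.7; "addition", with no weight attribute, gets 1.0
```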
MQM implements the following attributes in the mqm: namespace:

mqm:issueType. Contains the MQM issue type, listed by ID. Note: MQM implementations MUST use the ID and MUST NOT use a localized name.

mqm:issueSeverity. Contains the MQM issue severity using the name defined in the metric. Note that the default severity level names are minor, major, and critical. While other values MAY be used, if they are used they MUST be defined in the metric definition for proper interpretation.
MQM is designed to be used in conjunction with the following ITS 2.0 attributes from the localization quality issue data category:

its:locQualityIssueType: Contains the issue type as defined by ITS 2.0. Mapping the native MQM value to the appropriate ITS issue type helps ensure compatibility with ITS 2.0-aware implementations, even if they do not implement MQM.

its:locQualityIssueComment: Contains a human-readable comment about the issue.

its:locQualityIssueSeverity: Contains a rating of severity from 0 to 100. Mapping from the name contained in the MQM issueSeverity attribute to this attribute enables ITS 2.0-aware tools to interpret the severity of the issue.

To ensure compatibility with ITS 2.0 markup, implementers SHOULD use ITS 2.0 markup where possible. All of the ITS 2.0 localization quality annotation may be used. MQM markup adds capability to the ITS 2.0 quality markup.
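Mapping a named MQM severity to the ITS 0-100 scale can be derived from the severity multipliers declared in the metric. A sketch of one plausible convention, linear scaling against the largest multiplier defined in the metric, matching the “5 out of 10 is represented as 50” worked example in the markup-creation process below:

```python
def its_severity(multiplier: float, max_multiplier: float) -> int:
    """Scale a metric-defined severity multiplier to ITS 2.0's 0-100 range.
    Assumed convention: linear scaling against the largest multiplier
    defined in the metric (e.g. 5 out of 10 becomes 50)."""
    return round(100 * multiplier / max_multiplier)
```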
<?xml version="1.0"?>
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0"
     xmlns:mqm="[XXXXXXXXXXX]" mqm:version="1.0">
  <para><span mqm:issueType="spelling"
              mqm:issueSeverity="major"
              its:locQualityIssueType="misspelling"
              its:locQualityIssueComment="Should be Roquefort"
              its:locQualityIssueSeverity="50">Roqfort</span> is an cheese</para>
</doc>
To create this markup the following process is followed:

The MQM issue type is mapped to the corresponding ITS 2.0 type and stored in its:locQualityIssueType.

The MQM issue type and severity are stored in attributes in the mqm: namespace.

The severity is converted to the ITS 0-100 scale and stored in the its:locQualityIssueSeverity attribute. In this case the multiplier value was 5 (out of 10), so it is represented as 50 in ITS markup.

A human-readable comment is stored in the its:locQualityIssueComment attribute.

The metric in use may be referenced via the its:locQualityProfile attribute.

In general, MQM XML implementations should use existing span-level elements in the native XML format that MQM is being added to where possible. This can be done using any of the ITS 2.0 methods with the addition of the MQM-specific attributes. However, such elements may not be available. In such cases, MQM defines two elements that can be used to add inline markup:

<mqm:startIssue/>. This element defines the starting position of an MQM span.

<mqm:endIssue/>. This element defines the end position of an MQM span.

Two empty elements are used so as to prevent any interference between MQM tags and existing XML structure, such as the problems that could be caused by improperly nested elements. To pair these tags the id attribute is used. ID values MUST be unique within the document to prevent confusion.
An example of an MQM annotation is seen in the following XML snippet:
<para>“Instead of strengthening <mqm:startIssue type="function-words"
    id="1f59a2" severity="minor" agent="f-deluz"
    comment="article unneeded here" active="yes"/>the<mqm:endIssue
    idref="1f59a2"/> civil society, the president cancels <mqm:startIssue
    type="agreement" severity="major" comment="should be “it”"
    agent="f-deluz" id="3c469d" active="yes"/>them<mqm:endIssue
    idref="3c469d"/> de facto”, deplores Saeda.</para>
The mqm:startIssue element MUST take the following mandatory attributes:

id. Used to match the corresponding mqm:startIssue and mqm:endIssue tags within the text.

type. Provides the MQM issue type.

The mqm:startIssue element CAN take the following optional attributes:

severity. Provides the severity of the issue. Permissible values are defined by the MQM metric in use. Default value is undefined.

agent. Text string identifying the agent that supplied the annotation. Default value is undefined.

comment. Text string containing a human-readable comment attached to an issue. Default value is undefined.

active. One of yes or no. Indicates whether the issue is considered active (yes) or inactive (no). Default value is yes. If an issue is marked as inactive, it has either been resolved or been determined not to be an actual error.

In addition, ITS 2.0 attributes MAY be added to these elements to promote greater interoperability.
The mqm:endIssue element MUST take the following mandatory attribute:

idref. A value corresponding to the id of the mqm:startIssue tag that begins the identified span.

Use of these inline elements also requires that the mqm namespace be declared in the document. The method for declaring this namespace needs to be determined.
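Because mqm:startIssue and mqm:endIssue are paired only through id and idref, a consuming tool should verify the pairing before interpreting the spans. A minimal sketch operating on already-extracted attribute values (the function name is illustrative; element parsing is omitted):

```python
def check_issue_pairing(start_ids, end_idrefs) -> bool:
    """Verify that startIssue ids are unique within the document and that
    the endIssue idrefs match the startIssue ids one-to-one."""
    starts = list(start_ids)
    ends = list(end_idrefs)
    if len(starts) != len(set(starts)):
        return False  # ids MUST be unique within the document
    if len(ends) != len(set(ends)):
        return False  # an id may be closed only once
    return set(starts) == set(ends)

# e.g. the two spans in the example above pair up:
# check_issue_pairing(["1f59a2", "3c469d"], ["1f59a2", "3c469d"]) is True
```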
The Internationalization Tag Set (ITS) 2.0 specification holds a privileged position with respect to MQM due to its use as a standard format for interchanging localization quality information through its localization quality issue data category.
This section describes the mapping process from MQM to ITS 2.0 and from ITS 2.0 to MQM. As MQM allows the declaration of arbitrary translation quality assessment metrics, it serves a different purpose from ITS, which provides high-level interoperability between different metrics. While ITS is much less granular than the full MQM hierarchy, individual MQM metrics may be either more or less granular than the set of ITS 2.0 localization quality issue types (or may be more granular in some areas and less in others). As a result it is likely that conversion between MQM-based metrics and ITS will be “lossy” to some extent. In general the mapping process from MQM to ITS 2.0 is straightforward, since ITS 2.0 does not allow subsetting of the possible values for localization quality issue type, but the conversion from ITS 2.0 to MQM may be more challenging, since an arbitrary MQM metric may or may not contain the default target mappings provided below and mappings may need to account for the MQM hierarchy.
MQM metrics that map to ITS MUST use the mappings described in this section, subject to the limitations described below.
MQM issue types are mapped to ITS issue types according to the following table. Note that this mapping is unambiguous and MUST be followed to ensure consistency between applications.
MQM issue type | ITS 2.0 issue type |
---|---|
accuracy | mistranslation |
addition | addition |
improper-exact-tm-match | mistranslation |
mistranslation | mistranslation |
date-time | |
entity | inconsistent-entities |
false-friend | mistranslation |
no-translate | |
number | numbers |
overly-literal | mistranslation |
unit-conversion | |
omission | omission |
omitted-variable | |
over-translation | mistranslation |
under-translation | |
untranslated | untranslated |
untranslated-graphic | |
compatibility | other (for all children) |
design | formatting |
graphics-tables | |
call-outs-captions | |
graphics-tables-missing | |
graphics-tables-position | |
hyphenation | |
length | length |
local-formatting | formatting |
font | |
bold-italic | |
single-double-width | |
wrong-font-size | |
kerning | |
leading | |
paragraph-indentation | |
text-alignment | |
markup | markup |
added-markup | |
inconsistent-markup | |
misplaced-markup | |
missing-markup | |
questionable-markup | |
truncation-text-expansion | length |
overall-design | formatting |
color | |
footnote-format | |
global-font-choice | |
headers-footers | |
margins | |
page-breaks | |
widows-orphans | |
fluency | other |
ambiguity | other |
character-encoding | characters |
coherence | other |
cohesion | other |
corpus-conformance | non-conformance |
duplication | duplication |
grammar | grammar |
function-words | |
word-form | |
agreement | |
part-of-speech | |
tense-mood-aspect | |
word-order | |
grammatical-register | register (ITS register covers both grammatical-register and register) |
inconsistency | inconsistency |
inconsistent-abbreviations | |
images-vs-text | |
inconsistent-link | |
external-inconsistency | |
index-toc | other |
index-toc-format | |
missing-incorrect-toc-item | |
page-references | |
broken-link | other |
document-external-link | |
document-internal-link | |
nonallowed-characters | characters |
offensive | other |
pattern-problem | pattern-problem |
sorting | other |
spelling | misspelling |
capitalization | |
diacritics | |
typography | typographical |
punctuation | |
unpaired-marks | |
whitespace | |
unintelligible | uncategorized |
internationalization | internationalization (for all subtypes) |
locale-convention | locale-violation (for all subtypes) |
style | style |
awkward | |
company-style | |
inconsistent-style | |
register | register (ITS register covers both grammatical-register and register) |
variants-slang | |
third-party-style | style |
unidiomatic | |
terminology | terminology (for all subtypes) |
verity | other |
completeness | |
incomplete-list | |
incomplete-procedure | |
end-user-suitability | |
legal-requirements | legal |
locale-specific-content | locale-specific-content |
Note that the entire Internationalization branch of MQM maps to the ITS internationalization type. It is anticipated that this mapping will apply to all children of the MQM Internationalization issue type that may be added in the future.
Mapping from ITS to MQM is less likely to be used and presents particular problems since MQM metrics typically contain only a small subset of the full MQM issue set. As a result MQM issues to which ITS localization quality issue type values are mapped may not exist in a particular MQM metric. In such cases processes MUST map the ITS value to the closest higher-level issue type in MQM if one exists in the target MQM metric. If no higher-level issue type exists in the target MQM metric, the process MUST skip the ITS 2.0 issue type (but MAY preserve the ITS 2.0 markup).
For example, if a process encounters the ITS 2.0 omission type and the target MQM metric does not contain omission but does contain accuracy, the ITS omission value would be mapped to MQM accuracy. However, if the MQM metric does not contain accuracy, the higher node in the MQM hierarchy, the ITS omission issue type would be ignored/omitted by the conversion process.
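The fallback rule above can be sketched as a walk up the MQM hierarchy until an issue present in the target metric is found. The parent table here is a tiny illustrative fragment, not the full MQM hierarchy:

```python
# Illustrative fragment of the MQM parent hierarchy (not exhaustive).
PARENT = {"omission": "accuracy", "addition": "accuracy",
          "untranslated": "accuracy", "accuracy": None}

def map_its_to_metric(mqm_target: str, metric_issues: set):
    """Resolve the MQM issue an ITS type points at against a specific metric,
    climbing to the closest ancestor present in the metric.
    Returns None when no ancestor exists (the ITS type is skipped)."""
    issue = mqm_target
    while issue is not None:
        if issue in metric_issues:
            return issue
        issue = PARENT.get(issue)
    return None

# map_its_to_metric("omission", {"accuracy"}) resolves to "accuracy";
# map_its_to_metric("omission", {"fluency"}) yields None (skip).
```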
Note that the above requirements mean that in some cases there may be a many-to-one mapping from ITS to MQM. For example, if a document contains ITS annotations for omission, untranslated, and addition, but the target MQM metric contains accuracy and no daughter categories, all of these categories would be mapped to MQM accuracy. In other words, there is no universal mapping from ITS to all MQM metrics since MQM metrics do not all contain the same issues.
Processes encountering issues such as those described in the previous paragraphs SHOULD alert the user about the information loss or remapping if user interaction is expected by the process.
In most cases the table shows that the ITS issue types map to MQM issue types with identical (except for casing) or similar names, highlighting the evolutionary relationship between ITS and MQM. Those items where names are different in a non-trivial manner are marked with an asterisk (*) to help draw attention to the fact that the names do not match.
ITS 2.0 Localization Quality Issue type | MQM issue type | Notes |
---|---|---|
terminology | terminology | |
mistranslation | mistranslation | |
omission | omission | |
untranslated | untranslated | |
addition | addition | |
duplication | duplication | |
inconsistency | inconsistency | |
grammar | grammar | |
legal | legal-requirements* | |
register | grammatical-register* | Register in ITS can also describe register (under style). If a mapping process is sophisticated enough to distinguish the two meanings, it may map to the appropriate issue. Otherwise, use grammatical-register as it is the more common issue |
locale-specific-content | locale-specific-content | |
locale-violation | locale-convention* | |
style | style | |
characters | character-encoding* | |
misspelling | spelling* | |
typographical | typography* | |
formatting | local-formatting* | |
inconsistent-entities | entity* | |
numbers | number* | |
markup | markup | |
pattern-problem | pattern-problem | |
whitespace | whitespace | |
internationalization | internationalization | |
length | length | |
non-conformance | corpus-conformance* | |
uncategorized | other* | |
other | other | |
Note that the ITS uncategorized category maps to MQM Other even though MQM Unintelligible maps to ITS uncategorized. In other words, the mapping is asymmetric because the semantics of uncategorized are broader than those of Unintelligible.
The MQM scoring model applies only to error-count implementations of MQM. At present this specification does not define a default scoring model for holistic systems, which are less detailed in nature than error-count metrics. Future versions, however, MAY define a default model for holistic systems.
Note that MQM-conformant tools are NOT required to implement any scoring module at all. For example, an automatic tool that identifies possible issues but which does not determine their severity might not provide a score.
This scoring model provides one method to calculate a single quality score as a percentage value. Such scores are frequently used for acceptance testing in translation quality assurance processes. In addition, it generates sub-scores for various aspects of both the target and, optionally, the source text. Additional scoring methods may apply to specific circumstances. It is RECOMMENDED, but not required, that implementers of MQM provide scores that conform to this section in addition to any other scores they may provide.
Version 0.3.0 made major changes with respect to severity multipliers. These changes render the default scoring for versions 0.3.0 and later incompatible with earlier versions. Version 0.9.1 introduced a new severity level, none, which always has a penalty of 0; that is, it does not count against the translation. It is used to mark items that should be changed but which are not considered errors for scoring purposes (see below).
For the purposes of calculating quality scores, the following default values apply:

Severity level | Penalty |
---|---|
none | 0 |
minor | 1 |
major | 10 |
critical | 100 |
MQM can generate target document quality scores according to the following formula:

TQ = 100 - TP + SP

where:

TQ | = Overall translation quality score |
TP | = Total of target-issue penalties, divided by the word count |
SP | = Total of source-issue penalties, divided by the word count |

All penalties are relative to the sample size (in words) and are calculated as follows (assuming default weights and severity levels):

P = (n_minor × 1) + (n_major × 10) + (n_critical × 100)

where:

n_minor | = Number of issues with a “minor” severity |
n_major | = Number of issues with a “major” severity |
n_critical | = Number of issues with a “critical” severity |
A score can thus be generated through the following (pseudo-code) algorithm:
foreach targetIssue {
    targetIssueTotal = targetIssueTotal + (targetIssue * weight[targetIssueType] * severityMultiplier);
}
foreach sourceIssue {
    sourceIssueTotal = sourceIssueTotal + (sourceIssue * weight[sourceIssueType] * severityMultiplier);
}
// Generate overall score
translationQualityScore = 100 - (targetIssueTotal / wordcount) + (sourceIssueTotal / wordcount);
In this algorithm, each issue type has a weight assigned by the metric that is retrieved and used to determine the individual penalties. Penalties are cumulative. Note that if the source is examined, penalties against the source are effectively added to the overall score for the translation, reflecting the fact that they indicate problems in the source the translator had to deal with. If the source is not assessed, the source penalties are by definition 0 and do not count for or against the translation’s quality score.
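The algorithm can be turned into a runnable sketch. The Python below is illustrative, not normative: it assumes the default severity multipliers (none = 0, minor = 1, major = 10, critical = 100), assumes a weight of 1 for any issue type the metric does not explicitly weight, and all function names are hypothetical:

```python
# Illustrative implementation of the default MQM scoring algorithm.
# Assumes the default severity multipliers; per-issue-type weights default to 1.
SEVERITY_MULTIPLIER = {"none": 0, "minor": 1, "major": 10, "critical": 100}

def penalty_total(issues, weights):
    """Sum weight * severity multiplier over (issue_type, severity) pairs."""
    return sum(weights.get(t, 1) * SEVERITY_MULTIPLIER[s] for t, s in issues)

def quality_score(target_issues, source_issues, wordcount, weights=None):
    """Overall score: 100 minus normalized target penalties, plus normalized
    source penalties (source penalties are 0 if the source is not assessed)."""
    weights = weights or {}
    target_penalty = penalty_total(target_issues, weights) / wordcount
    source_penalty = penalty_total(source_issues, weights) / wordcount
    return 100 - target_penalty + source_penalty
```

Under these assumptions, a 100-word sample with one minor and one major target issue and no assessed source issues would incur a total penalty of 11, normalized over the word count.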
(Scores can be generated for any dimension or branch in the MQM hierarchy by counting only those issues in that selection. Note that counting source issues is optional and that if a score for a source document is desired then the formula should ignore target issues and instead subtract the total of source issues divided by the wordcount from 100 to arrive at a source content score.)
This algorithm can serve as a model for other systems, such as metrics with two severity levels or those with four. However, using other models will impede comparability of scores generated by various metrics.
The following severity multipliers were recommended as defaults prior to version 0.3.0. These former default severity weights were taken from the LISA QA Model and represent common industry practice. Discussion with experts in psychometrics, however, revealed that the range of values was too close to provide sufficient discrimination between relatively insignificant errors and those considered serious enough to reject a project. Because these values were implemented in a number of tools, they are documented here:

Severity level | Penalty |
---|---|
minor | 1 |
major | 5 |
critical | 10 |
Scores using these multipliers can easily be updated to reflect the new values simply by changing the multipliers in the formula. Similarly, new scores can be compared with old scores by using these values in place of the new ones. However, as the old multipliers are deprecated, they SHOULD NOT be used as the default model for any new implementations.
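To illustrate the difference in discrimination, the sketch below compares penalty totals for the same issue counts under the deprecated multipliers and the current defaults (the values are as documented in this section; the `penalty` helper is hypothetical):

```python
# Deprecated LISA-derived multipliers versus the current defaults,
# applied to the same issue counts (unweighted).
OLD = {"minor": 1, "major": 5, "critical": 10}
NEW = {"minor": 1, "major": 10, "critical": 100}

def penalty(counts, multipliers):
    """Total penalty for a {severity: count} mapping under a multiplier set."""
    return sum(multipliers[s] * n for s, n in counts.items())

counts = {"minor": 4, "major": 2, "critical": 1}
# Under OLD: 4*1 + 2*5 + 1*10 = 24
# Under NEW: 4*1 + 2*10 + 1*100 = 124
```

The same error profile penalizes roughly five times more heavily under the new defaults, reflecting the greater weight now given to serious errors.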
This section describes the process for creating an MQM metric in cases where a suitable predefined metric is not available. The process may be graphically represented as shown below:
In this view, implementers first determine what sort of metric they wish to use (analytic, holistic, task-based testing, functional testing, etc.) based on the following criteria:
Based on the answers to the questions given above, users may select a method (the “how”) for assessing the translation. Some of the possible options include the following:
In addition to selecting an assessment method based on the answers to the questions on the left of the diagram, users also need to define the specifications (i.e., the values of the parameters) for the translation(s) to be assessed. (The MQM parameters are defined in section 8.2. Definition of MQM parameters below.) Based on the specifications, users decide which dimensions of the text will be assessed. Dimensions defined in MQM are the following:
Note that the dimensions correspond to top-level branches in the MQM hierarchy.
Depending upon which dimensions are selected and the degree of granularity required for the assessment task, MQM issues are then selected to ensure that the required dimensions are adequately assessed. In the case of Internationalization, different assessment methods will likely be needed, since internationalization generally cannot be assessed by examining texts (as opposed to conducting a code audit).
The following example will help clarify how the process works. The example is for a case in which a company that makes network diagnostic gear wishes to evaluate whether automatic (machine) translations into Japanese of user-generated forum content written in English are helping its Japanese users solve technical problems with their equipment.
Although simple, this example shows how it is possible to build customized metrics to meet specific requirements using MQM.
MQM makes use of a selection of 11 of the 21 parameters defined in ASTM F2575, plus one additional parameter, Output modality, which is subsumed under Text type in ASTM F2575 but is broken out in MQM because of its special impact on some translations. The parameters are defined as follows:
Parameter | Description |
---|---|
1. Language/locale |
|
2. Subject field/domain |
|
3. Terminology |
|
4. Text type |
|
5. Audience |
|
6. Purpose |
|
7. Register |
|
8. Style |
|
9. Content correspondence |
|
10. Output modality |
|
11. File format |
|
12. Production technology |
|
After the values for these parameters are fully specified, MQM implementers should verify that the selection of issue types will ensure that the requirements defined by the parameters are met. Note that parameters may override each other. For example, under Content correspondence the parameters might specify that a “gist” translation is acceptable, in which case Omission would not normally be assessed; however, if Audience specifies that the target audience consists of young readers with low literacy, Style might be assessed to ensure that the “simple” style needed for the target audience is achieved.
At this stage in MQM development, there are no normative guidelines for selecting issues. Instead implementers are encouraged to go through each parameter to identify project-relevant issues that will enable them to verify whether the translation meets the requirements set out in those parameters. Future versions of MQM may provide a more formal approach to issue selection.
Analytic metrics are created by making a selection of relevant issues from the listing of MQM issue types. The following procedure may be used to create a metric:
When considering which issues to check, creators of metrics should consider the following practical guidelines:
Holistic assessment methods are more flexible in some respects than error-count metrics. They are designed to provide an assessment of the translated text as a whole rather than a detailed accounting of all errors. As analytic assessment can be time consuming and is not needed in all cases (e.g., when the question is whether a text should be accepted or not), holistic methods may be more appropriate in some cases. Most of the MQM issue types can be easily used as either analytic types or holistic types that apply to the text as a whole. For example, the MQM Punctuation issue type can be used by asking assessors using a holistic tool whether the text is punctuated correctly. In this context some issues will be more useful than others. For example, the Pattern problem issue type is unlikely to be useful in most holistic assessments since it generally makes sense only with regard to very specific sections of a text. By contrast, categories like Grammar can more readily be applied to entire texts.
Note that there is no single method for building holistic scores. In a holistic approach specific issues are addressed through qualitative questions that may be assessed via ranking or on a binary- or scalar-value system. For example, a holistic assessment might address the Spelling issue via questions like the following:
Because the scoring for holistic systems is highly dependent on the type of assessment scale used, no specific scoring system is provided here. Users of MQM who wish to implement it in a holistic environment should tie holistic questions to specific MQM issue types and develop appropriate scoring systems. This version of MQM does not define a system for describing holistic scoring systems, although future versions may do so. However, by using the MQM issue types and associating specific holistic questions with them, implementers can make their metrics more transparent and tie them to project parameters in the same way that can be done with error-count metrics.
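As an illustration only (recall that MQM does not define holistic scoring), a holistic tool might record scalar answers keyed to MQM issue types and aggregate them. Everything in this sketch, including the question texts, the 1-5 scale, and the simple averaging, is a hypothetical design choice, not part of MQM:

```python
# Hypothetical holistic assessment: each question is tied to an MQM issue
# type and answered on a 1-5 scale; the overall result is a simple average.
QUESTIONS = {
    "spelling": "Is the text spelled correctly?",
    "grammar": "Is the text grammatically well formed?",
    "punctuation": "Is the text punctuated correctly?",
}

def holistic_score(responses):
    """Average the scalar responses, which are keyed by MQM issue type."""
    return sum(responses.values()) / len(responses)
```

Keying the questions to MQM issue types in this way is what makes the metric transparent and comparable to error-count metrics tied to the same parameters.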
The following guidelines may assist in designing appropriate holistic assessments and selecting issue types:
The TAUS DQF Error Typology is a recognized subset of MQM, developed and maintained by the Translation Automation User Society (TAUS) based on input from its members.
Previous versions of TAUS DQF and MQM were not compatible. As of revision 0.9, compatibility between the two has been achieved. The harmonization process required substantial modification to both MQM and DQF, but now DQF, with the exception of the “kudos” feature noted below, is a fully conformant subset of MQM.
The DQF tools check six issue types. If only these issue types are used, they correspond directly to MQM dimensions, as follows:
MQM also supports additional levels of issues, as shown in the following graphic:
The DQF “Additional Features” require special attention. Three of the issues can be marked as issues with the severity level “none” and the specific type noted in an MQM-compliant tool or markup:
Using these allows issues to be marked without counting negatively in the score of the translation.
One DQF feature is not currently implemented in MQM and can be conceived of as an additional implementation-specific feature:
Kudos currently need to be noted outside of MQM mechanisms. Whether and how they impact an MQM score is currently undefined and represents a point of ongoing discussion as of June 2015.
The full DQF subset of MQM is as follows:
This section contains informative mappings from existing metrics to MQM. Note that existing metrics are subject to update without notice. These mappings are provided as a courtesy and no guarantee is made of accuracy and completeness. Any implementations based on these mappings should carefully consider the metric to verify the accuracy of mappings.
The mapping from SAE J2450 is somewhat complex in that the distinction between severity levels is based, in part, on whether the issue changes the meaning between target and source, meaning that, at least in principle, a minor error in J2450 would correspond to the Fluency branch in MQM and a major error would correspond to the Accuracy branch. Nevertheless, for most purposes, the following mapping should suffice.
SAE J2450 issue type | MQM issue type | Note(s) |
---|---|---|
Wrong term | Terminology | |
Omission | Omission | |
Misspelling | Spelling | |
Punctuation error | Typography | |
Syntactic error | Grammar | |
Word structure or agreement error | Word form | |
Miscellaneous error | Other |
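Rendered as data, the mapping in the table above might look like the following. This is an illustrative sketch; the dictionary and its name are not defined by J2450 or MQM:

```python
# Illustrative SAE J2450 -> MQM issue-type mapping, following the table above.
J2450_TO_MQM = {
    "Wrong term": "terminology",
    "Omission": "omission",
    "Misspelling": "spelling",
    "Punctuation error": "typography",
    "Syntactic error": "grammar",
    "Word structure or agreement error": "word-form",
    "Miscellaneous error": "other",
}
```

As noted above, a more faithful mapping would also consult the J2450 severity (minor versus serious) to decide between Fluency- and Accuracy-branch issues; this flat lookup ignores that distinction.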
Portions of this document were developed as part of the Coordination and Support Action “Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad)”, funded by the 7th Framework Programme of the European Commission under contract 296347. Additional work was supported by the Coordination and Support Action “Quality Translation 21 (QT21)”, funded by the EU’s Horizon 2020 research and innovation programme under grant no. 645452.
- Added design to the core for compatibility with DQF.
- Renamed Style to Stylistics and split Style-guide out to resolve the problem that Style included both mechanical and content-related issues.
- Renamed locale-violation to Locale-convention.
- Renamed locale-applicability to Locale-specific-content.