Levels & Labels

Background: The candidate ISO 11669 FDIS does not use the term “grades”, and the recently published 2023 edition of ASTM F2575 does not use the expression “service levels”. However, a comparative study has revealed that the concepts behind grades and service levels can be found in both. Compatibility between 11669 and F2575, the only two modern international standards that apply to a wide variety of translation use cases, is important to all stakeholders. An essential step in demonstrating this compatibility is to describe an approach to service levels that works for both standards.

In addition, the concepts discussed herein can be adapted to interpreting. Anything enclosed in square brackets in this document is specific to interpreting. We need to study ASTM F2089, which focuses on spoken languages only.

Some Definitions

service level
performance targets for a service
(SOURCE: ISO/IEC 17826:2022, 3.49)

service level agreement (SLA)
documented agreement between the client and provider that identifies services and service targets, including prerequisites for service levels and measures for performance
(SOURCE: ISO 37500, 3.20)

Note 1: The term “service level” is not defined in ISO 37500, perhaps because it is considered to be part of the general language. A “service level” might or might not be the same as a “level of service”.

Note 2: Quality is not mentioned in either definition. What is the connection between service levels and quality? That important discussion will mostly need to wait until another day, after we have developed detailed examples of service levels and service level agreements for at least a few use cases in the area of multilingual language services. Service level is not the same as quality level. Two international standards that deal directly with evaluating translation quality are in the final stages of development, one from ISO and the other from ASTM. Please stay tuned.

Note 3: Service level and quality level differ in that a service level may not directly affect the final product, whereas a quality level does. For example, a service level may determine that all translations are to be delivered on letterhead, while a quality level may determine that only translators with subject-matter specialization will work on projects.

Sections A and B below are an attempt to make the above definitions of service level and service level agreement more tangible through the lens of ISO (11669) and ASTM (F2575). Section C introduces a term found in WK46396 (analytic TQE), and Section D describes consumer labels that are compatible with both standards.

A. Pre-Production

A service level must be agreed on before production begins. This phase is called pre-production in 11669 and includes project initiation. The following tasks are found at least implicitly in both 11669 and F2575.

[pre-encounter, pre-session, or pre-event, instead of pre-production]

Describe the Use Case. In F2575 (section 5, “Needs Analysis”), a use case consists of five things: subject field, type of text, topic, audience, and purpose. Implicitly, a use case also includes the source language and target language(s). There is a use case appendix at the end of this document, including hospital instructions to a patient and pre-trial triage.

Conduct a Risk Assessment (which consists of risk identification, analysis, and evaluation according to ISO 31000) (What risks are associated with the use case? E.g., consider operating instructions for a crane that is one hundred meters tall. An incorrect translation could result in severe damage to a building.)

Decide on Risk Tolerance Level for Errors in Correspondence and Fluency (which can result in risk mitigation activities to be conducted during production or associated with production, such as a bilingual comparison of source and target content)

Select a Method of Production (from fully automatic to fully human) compatible with the Risk Tolerance Level [method of delivery, from fully automatic to fully human and from remote to in person]

Agree on all other relevant service requirements for the use case (that is, requirements beyond risk tolerance level, risk mitigation activities, and method of production). Additional service requirements are linked to parameters such as compensation and delivery deadline. See the “translation parameters” section of 11669 or F2575 for the full list of parameters, which are basically the same in the two standards. For an abbreviated list of those parameters, see the Translation Parameters appendix at the end of this document. Parameters, in F2575, are standard questions; specifications are the answers in a particular use case.

Note 1: An SLA as conceived in these translation standards includes preliminaries that might be assumed rather than explicit in a typical SLA. Assumptions can lead to potential misunderstanding and conflict between stakeholders, whereas clear specifications can reduce both misunderstanding and conflict.

Note 2: The order of these five components of the pre-production phase is not random. Each builds logically on the previous one in a completed description of a multilingual language service, even though real-life conversations between a requester and a provider or advisor might start with any of them.
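For illustration only, the five pre-production components above can be sketched as a structured record. All class names, field names, and enumeration values below are my own invention; neither 11669 nor F2575 prescribes any data format.

```python
from dataclasses import dataclass, field
from enum import Enum

class RiskTolerance(Enum):
    """Hypothetical risk tolerance levels for errors in Correspondence and Fluency."""
    MINIMAL = "minimal"    # high stakes: errors are not tolerable
    MODERATE = "moderate"
    HIGH = "high"          # low stakes: raw MT output may be acceptable

class ProductionMethod(Enum):
    """The spectrum from fully automatic to fully human production."""
    FULLY_AUTOMATIC = "fully automatic"
    MACHINE_PLUS_HUMAN_EDITING = "machine translation plus human editing"
    FULLY_HUMAN = "fully human"

@dataclass
class UseCase:
    """The five elements of a use case per F2575 section 5, plus languages."""
    subject_field: str
    type_of_text: str
    topic: str
    audience: str
    purpose: str
    source_language: str
    target_languages: list[str]

@dataclass
class ServiceLevel:
    """A service level: the outcome of the five pre-production tasks."""
    use_case: UseCase                  # task 1
    identified_risks: list[str]        # task 2
    risk_tolerance: RiskTolerance      # task 3
    production_method: ProductionMethod  # task 4
    other_specifications: dict[str, str] = field(default_factory=dict)  # task 5
```

For instance, the crane example from task 2 would combine a use case with identified risks such as “building damage”, a MINIMAL risk tolerance, and a production method compatible with that tolerance, plus further specifications (deadline, compensation) in `other_specifications`.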

B. Service Level Agreements

Once the requester and provider have covered the five aspects of a project [encounter rather than project] in part A (pre-production), they have defined a service level. The feature that most clearly distinguishes one translation service level from another is the risk tolerance level for errors, but each service level includes much more than risk tolerance level.

The service level that has been agreed on should be documented whether we are dealing with translation or interpreting. One example of documentation is a service level agreement. Service level agreements can be broad or they can be specific to a particular client. 

[settings and encounters]

Here is an example, found on the STAR-TS website (https://www.star-ts.com/about/terms-and-conditions/service-level-agreement/), of an agreement for a particular service level that applies to many, but not all, use cases.

Along the path from Use Case to Service Level, the requester will often select an LSP (language service provider) to clarify the five components of the pre-production phase (see Section A). A novice requester should at least bring in an advisor to minimize risks.

A Language Services Advisor (LSA) is a new role that needs discussion, including the required skill set and how language professionals could become LSAs. Serge Gladkoff has proposed a skill set for an LSA focused on translation services. This skill set needs to be detailed but will certainly include an understanding of current AI technology, risk management, and quality management.

Note: A translation of a contract that must stand up in court needs the involvement of a professional human translator. However, there are use cases for which the risk level for raw machine translation is tolerable. We need to make such a list. Clearly, some situations, such as a 911 call that involves a human in distress, or a suicide hotline, will always require a professional human interpreter.  When is the risk level for fully automatic interpreting acceptable? (See news about the Interpreting SAFE AI task force.)

[Some duties of an LSA are currently being performed in healthcare interpreting by a Section 1557 officer, a compliance officer or a risk management professional. More generally, an interpreter coordinator in a court, medical, or educational system could be trained to be an LSA.]

Note: A simple set of names for categories of service levels, based on risk tolerance level, is needed, for example, in the context of procurement. Service levels are primarily a consideration for providers, rather than requesters, but they are based on the requirements of the requester.

C. Validation

This section suggests that an important aspect of the definition of a translation service level might be to indicate whether validation of the translation output has been conducted by a professional human translator.

Validation is a term in WK46396 (the draft standard on the approach to analytic translation quality evaluation often referred to as “MQM”). There it is applied to translation quality metrics. A valid metric measures what you are trying to measure. A metric can be part of a reliable system yet not be valid. For example, suppose a system intended to measure the weight of a person actually derives a supposed weight based solely on height. The system can be reliable, in that it reports the same “weight” each time a given person is tested, but the result typically does not match reality. The result can be consistently wrong.
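The weight-from-height example can be made concrete with a small sketch. The formula below is invented purely for illustration; the point is only that the “scale” is deterministic (reliable) while never looking at the quantity it claims to measure (invalid).

```python
def pseudo_scale(height_cm: float) -> float:
    """A 'scale' that reports a weight derived solely from height.

    The formula is arbitrary; what matters is that it never consults
    the person's actual weight.
    """
    return 0.5 * height_cm - 30  # invented formula, for illustration only

# Reliability: repeated measurements of the same person agree exactly.
first = pseudo_scale(180)
second = pseudo_scale(180)
assert first == second  # reliable: consistent output every time

# Validity: the reported weight need not match the person's true weight.
true_weight_kg = 95
reported = pseudo_scale(180)  # reports 60.0, whatever the true weight is
# reported != true_weight_kg: the system is consistently wrong for this person
```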

Application to Translation Service Levels

In Section A, it was proposed that a service level should be selected with the use case in mind; otherwise, the result will probably not meet the requirements of the requester. How could we apply the technical term “validation” to service levels? One way would be to ask whether it is important to meet all the requester’s requirements. That leads to another question: which are the most important requirements? In describing a use case, two essential components are purpose and intended audience. Why are we even translating the source text, and who is the intended audience? Does it matter whether a translation meets the needs of the intended reader? If it does, then validation is needed: validation that the translation meets requirements, such as the needs of the intended user. Validation goes beyond text.

With recent advances in artificial intelligence, some requesters of translation services are wondering whether all human translators will soon be replaced by machines. Others find it obvious that, for at least some use cases, professional human translators will still be needed in the long term. Regardless of your position on this fundamental question, the service component “validation by a professional translator” (short for “validation of translation output by a professional human translator”) is a convenient tool. It indicates whether a professional translator is checking, using human intelligence rather than artificial intelligence, that all the agreed-on requirements, expressed as translation specifications based on standardized translation parameters, have been met.

It boils down to “whom do you trust?” when the stakes are high, that is, when the consequences of a translation not meeting agreed-on specifications can be serious.

Looking at the ongoing debate about the strengths and limitations of AI, including LLMs (large language models) and systems based on technology behind LLMs (both generative AI and “neural” machine translation), it is clear that AI does not yet truly understand what it is doing. It only manipulates vast quantities of text. For now, validation is needed for some use cases.

Note that the production phase is greatly facilitated by having agreed in advance on a set of structured specifications, but the consumer is not aware of the production phase.

D. Consumer Labels

Consumer labels in general are a form of consumer protection, or at least warning, and, when applied to translation services, come into play in the post-production phase. They inform end users (who “consume” text) about potential risks associated with the product in their hands. If end users do not read the source language or do not have access to the source text, they cannot always detect even serious issues in the translation.

[The consumer needs to be informed of the “label” before service delivery.]

For translation, according to ASTM F2575, the output should be labeled as either UMT (Unedited Machine Translation) or BRT (Bilingually Reviewed/Revised Translation).

The first label (UMT) is used for either raw machine translation or machine translation output that has only been checked monolingually by a human. In UMT, there has been no bilingual editing by a human professional to correct Correspondence errors. Thus, UMT is a warning flag for the consumer so that they are aware there could be some “poison biscuits” somewhere in fluent text. These poison biscuits are Correspondence errors unpredictably produced by AI-based NLP (natural language processing) that would be detected by a language professional with relevant subject matter expertise.

The second label (BRT) contrasts with UMT.  It covers the spectrum from post-edited and revised machine translation to traditional human translation with no use of machine translation, so long as the entire translation has been compared bilingually by a human professional with the source, and Correspondence errors have been corrected in conformance with specifications. Of course, there could be Correspondence errors in a translation labeled BRT, but the end user is assured that a language professional has been involved in order to reduce the risk of such errors.

Thus, every translation can be labeled as either UMT or BRT. As explained above, the distinction is not simply human vs. machine. The UMT label is a “red flag”; the BRT label is a covenant. Ideally, the BRT label will link to the translation service provider that takes responsibility for the translation.

The concept of validation, as explained previously in this document, applies to labels. A translation labeled BRT has been validated by a competent bilingual human. A translation labeled UMT has not been validated.
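The UMT/BRT decision described above turns on a single question: has the entire translation been compared bilingually with the source by a human professional? A minimal sketch follows; the record fields and function name are my own, not terms from F2575.

```python
from dataclasses import dataclass

@dataclass
class TranslationRecord:
    """Hypothetical production record for a delivered translation."""
    machine_translation_used: bool
    monolingual_human_check: bool
    full_bilingual_review_by_professional: bool

def consumer_label(record: TranslationRecord) -> str:
    """Return 'BRT' only if a human professional has compared the entire
    target text bilingually with the source; otherwise 'UMT'.

    Note that the distinction is not human vs. machine: post-edited
    machine translation with a full bilingual review is labeled BRT.
    """
    return "BRT" if record.full_bilingual_review_by_professional else "UMT"

# Raw MT, and MT checked only monolingually, are both UMT.
raw_mt = TranslationRecord(True, False, False)
mono_checked = TranslationRecord(True, True, False)
# Post-edited MT with a full bilingual review qualifies as BRT.
post_edited = TranslationRecord(True, True, True)
```

A monolingual check alone does not change the label, which is exactly why UMT warns the consumer about possible hidden Correspondence errors in fluent text.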

[Labels for interpreting: Any interpreting spoken output should be labeled as MDI (machine delivered interpreting) or HDI (human delivered interpreting). MDI is a red flag because fluent speech could contain Correspondence errors made unpredictably by a machine, typically without any accountability.]

Example of an SLA with a legal services provider: A basic service level for legal representation might be available only between 12:00 noon and 3:00 PM on Monday, Wednesday, and Friday, and would not include meeting the client outside the attorney’s office. So clients who are hauled off to jail are on their own, unless they want to pay a substantial jail-visit fee. A high-end service level would include a 24-hour hotline and an agreement that a lawyer would meet the client anywhere in the city within 30 minutes of receiving a phone call. Regardless of availability, only a lawyer can offer legal advice.

A fully-automated legal service (attention: this is a Hoosier joke) could offer something called “legal speculation” at a very low cost, where the client interacts with some kind of AI-based system.

Obviously, few would use a legal speculation service as opposed to professional legal advice. Then, why are so many using raw machine translation, which is analogous to legal speculation? Probably because the risks associated with errors in Correspondence are viewed as tolerable. Or because the requester is not even aware of the risks.

Another example: going to a medical clinic vs. using WebMD exclusively.

Point of discussion regarding ethics: For a particular use case where the risks are tolerable, should a TSP be allowed to provide raw MT (even when Google Translate does not work and a proprietary MT engine, perhaps trained on custom data, is needed), or should the TSP be required to refer the customer to some organization other than a TSP?

Appendix on Use Cases

Note: The following descriptions of use cases need to be examined in detail, and the transcript of the 2021 AsLing panel discussion needs to be mined for use cases.

Use case: In Wisconsin, a hospital decides to provide translations of its post-discharge instructions for patients with limited English proficiency.

Requirements: The material needs to be translated into the third most-commonly spoken language in Wisconsin, Hmong, and made available soon to meet demand.

Risk: Patients may be harmed if post-discharge instructions are not accurate.

Service Level: Because of the risk posed to the patient by an inaccurate or hard-to-understand translation, the delivery-time, Correspondence, and Fluency requirements are demanding.

Use case:  The Directorate General for Translation (DGT) of the European Commission must provide a version of each piece of legislation in all of the official languages.

Requirements: There are 24 official languages in the EU and all members must have access to the same information as close to the same time as possible.

Risk: The documents can be used in court proceedings, so they must be clear and accurate.

Service Level: Maximum Correspondence and Fluency are required.

Use case: End users around the world, professionals and amateurs alike, use support articles published by a big technology company to solve their own software issues.

Requirements: New support articles need to be placed on the database as soon as possible, in all available languages.

Risk: The support article does not work (either it does not allow the end user to solve the problem at hand or, even worse, it results in equipment damage or loss of data).

Service Level: A less-than-optimal level of Correspondence and Fluency might be an acceptable starting point, so long as end-user options include speaking with a tech-support person. End-user usability complaints are sometimes due to a lack of Correspondence, and they often result in the replacement of the raw machine translation with a translation at a different service level.

Use case: An English-speaking attorney needs some documents in Spanish translated but does not know which.

Requirements: Legal team needs to identify which documents to translate through keyword search.

Risk: The main risk is that relevant documents are overlooked; however, a set of “quick and dirty” translations will still allow English-speaking members of the legal team to use their human intelligence and legal experience to sift through the documents, that is, conduct a triage, and reliably determine which, if any, are relevant to the case.

Service Level: Minimal levels of Correspondence and Fluency are allowed because this is for triage to assist the legal team to determine which documents require a translation with maximal correspondence and fluency. However, it still requires delivery within the deadline agreed to, especially since a second phase of translations is imminent.

Use case: English-speaking staff members in Canada are tasked with the maintenance of an expensive piece of equipment designed and built in Japan but cannot read the original service documentation.

Requirements: The translation does not need to be polished or highly fluent but must be understandable and contain no substantial correspondence errors. Delivery will be rolled out by language.

Risk: In this use case, a substantial Correspondence error is one that could cause injury to the maintenance person or damage to the machinery.

Service Level: A maximum level of Correspondence is required, but not the same level of Fluency, because the users’ know-how will compensate for any lack of Fluency. Delivery will be staggered, so deadlines will vary by language group.

Appendix on Translation Parameters

See https://www.tranquality.info/translation-parameters-guide/. We need a similar table for parameters that apply to interpreting services.

Appendix on the Technical Term "grade" in ISO Standards

The term “grade” is found in many ISO standards. It is applied to everything from materials to management. For example, in ISO 1213, it is applied to categories of coal.

3.1.15
low-grade coal
combustible material that has only limited uses owing to undesirable characteristics (e.g. ash percentage or size)

Note: Potential errors in Correspondence and Fluency are undesirable characteristics of translation output. There is no implication that translation output is bad for the environment.

In ISO 26927, “grade” is applied to service levels in telecommunications.

3.11
enterprise-grade service
performance level for security, availability and service perception that is comparable to PBX-based service

Note: The previous two definitions of grade are very different. One (coal) is about product and the other (telecommunications) is about service.

In ISO 22886, “grade” is applied to healthcare management.

3.4.5
grade
category or rank given to different requirements (3.6.1)

3.6.1
requirement
need or expectation that is stated, generally implied or obligatory

The above ISO 22886 definition of “grade” can be adapted to translation services as follows:

A low-grade translation service involves a high “Risk Tolerance Level for Errors in Correspondence and Fluency”. No one asks a TSP (translation service provider) for a translation that includes errors in Correspondence or Fluency, but they might agree to tolerate the risk of such errors in a particular use case, if that is the only way to obtain a language service quickly enough or at a sufficiently low cost. One option is to use a free MT service such as Google Translate and bypass the TSP entirely. But what if the material is confidential? (We need to find multiple use cases where low-grade translation is appropriate but the free platforms cannot be used.)
