Terms and concepts widely used in the translation and localization industry

From ProZ.com Wiki



Note: This article is a joint project of ProZ.com members and guests.


A

Accent mark. Small symbol placed above or near a letter, usually to distinguish its pronunciation from a similar word.

Accreditation. Formal process by which the knowledge, skills, and/or abilities of an individual – such as a translator or interpreter – or an organization – such as an LSP – are evaluated.

Active language. Language into which an interpreter renders interpretation.

Ad hoc interpreter. Person who provides interpretation services on an irregular basis, usually without the benefit of any formal training or professional preparation.

Adaptation. Process of converting information into an appropriate format for the target language and culture.

Advocacy. Practice by which an interpreter acts on behalf of either of the parties for which he or she renders interpretation. This practice is encouraged by some and discouraged by others.

Agglutination. In linguistics, combining short words or word elements into a single word in order to express compound ideas.

Agglutinating language. Language in which, through agglutination, a single word can constitute a complete sentence, sometimes resulting in units of measure other than price per word (such as price per line, per character or per page) for calculating translation costs.

Agile. A group of software development methodologies based on iterative incremental development, where requirements and solutions evolve through collaboration between self-organizing, cross-functional teams.

"A" language. Native language or a foreign language spoken with native proficiency from which an interpreter can render all modes of interpretation.

Alignment. Process of matching segments of text with their translated renditions, creating equivalents between a source text and target text.

Alignment tool. Application that automatically pairs versions of the same text in the source and target languages in a table. Also called bi-text tool.
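
A minimal Python sketch of what an alignment tool does, assuming both texts contain the same number of sentences in the same order. The function names `split_sentences` and `align` are hypothetical, and real tools also cope with sentences that were merged, split or reordered in translation.

```python
import re


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def align(source_text, target_text):
    """Pair each source sentence with the target sentence in the same position."""
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    if len(source) != len(target):
        # A naive one-to-one alignment cannot handle unequal sentence counts.
        raise ValueError("sentence counts differ; manual alignment needed")
    return list(zip(source, target))


if __name__ == "__main__":
    for pair in align("Hello. How are you?", "Hola. ¿Cómo estás?"):
        print(pair)
```

The resulting list of pairs is the table of equivalents that can later be loaded into a translation memory.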

Ambiguity. Situation in which the intended meaning of a phrase is unclear and must be verified – usually with the source text author – in order for translation to proceed.

American Sign Language (ASL). The dominant sign language of the deaf community in the United States, in the English-speaking parts of Canada and in parts of Mexico. Although the United Kingdom and the United States share English as a spoken and written language, British Sign Language is quite different from ASL and not mutually intelligible.

Anglophone. Someone who speaks the English language natively or by adoption. The term specifically refers to people whose cultural background is primarily associated with the English language, regardless of ethnic and geographical differences.

Apache. Open source web server supported by the Apache Software Foundation.

API. Acronym for application programming interface.

Apostille. Official attachment or stamp sometimes applied to translations of public and private documents as proof of authenticity for countries that have signed the Hague Convention on Documents.

Applet. A tiny program that is embedded in a webpage built in HTML (hyper-text markup language), and which launches when the webpage is loaded. Applets are written in Java and are frequently used in playing videos, animated images, audio, and other features that enhance a person's experience in the page. Unfortunately, applets are often written in ways that require individuals to download programs, such as the latest version of Java, that may not be allowed on their computer. Thus, web browsing experiences that rely on applets can be frustrating for individuals who need to use computers with firewalls and security restrictions.

Application programming interface. Set of specified procedures or functions that a service or operating system provides to handle computer programs’ requests for support. Commonly abbreviated as API.

AQ. Abbreviation for availability quotient.

Arabic Eastern numerals. See Eastern Arabic numerals.

Arabic numerals. Set of ten numerals (0,1,2,3,4,5,6,7,8,9) that comprise the most commonly used symbolic representation of numbers throughout the world.

Artificial intelligence. Branch of computer science devoted to creating intelligent machines; it produced the first efforts toward machine translation.

Artificial language. Language used by machines.

Authoring. Process of producing textual content.


Automated Machine Translation (AMT). AMT and Caterpillar Technical English are development project collaborations between Caterpillar Inc., and Carnegie Mellon University to further improve the creation and translation of technical documentation into three core languages: Spanish, French and German.

Automated publishing. Computerized production of content and presentation.


Automatic Content Enrichment (ACE). A bridge between single language websites and localization, ACE technology associates English words and phrases on web pages with pop-ups containing information in a user's native language.

Automatic translation. Synonym for machine translation.

Automatic recognition. Method that automates the translation of terms through the use of an electronic dictionary and computer-assisted translation software, proposing target language equivalents and facilitating consistency of terminology and style.

Availability quotient. Metric that objectively ranks what percentage of the total online population can access each level of experience on any given site. Commonly abbreviated AQ.

B

Back translation. The process of translating a document that has already been translated into another language back to the original language - preferably by an independent translator.

Bidirectional (writing system). A writing system in which text is generally flush right, and most characters are written from right to left, but some text is written left to right as well. Arabic and Hebrew are the most widely used bidirectional systems in current use.

Bidirectional text (bidi). A mixture of characters within a text where some are read from left to right and others from right to left. Bidirectional or bidi refers to an application that allows for this variance.
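
As a rough illustration, Python's standard `unicodedata` module reports the Unicode bidirectional class of each character, which is the information a bidi-aware application uses to decide reading direction. The helper name `bidi_classes` is hypothetical.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for each character.

    'L' marks left-to-right letters, 'R' right-to-left letters (e.g. Hebrew)
    and 'AL' right-to-left Arabic letters.
    """
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    # Latin a, Hebrew alef (U+05D0), Arabic alef (U+0627)
    for ch, cls in bidi_classes("a\u05d0\u0627"):
        print(f"U+{ord(ch):04X} {cls}")
```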

Big5. The name of the Chinese character set and encoding used extensively in Taiwan. Big5 is not a national standard, but is equivalent to the first two planes of CNS 11643-1992.

Bilingual Evaluation Understudy (BLEU). An algorithm for evaluating the quality of text that has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine’s output and that of a human. The closer that a machine translation is to a human translation, the better it is. BLEU was one of the first metrics to achieve a high correlation with human judgments of quality and remains one of the most popular. Scores are calculated for individual translated segments – generally sentences – by comparing them with a set of good quality reference translations. Those scores are then averaged over the whole corpus to reach an estimate of the translation’s overall quality. Intelligibility or grammatical correctness is not taken into account.
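
The comparison with reference translations can be sketched in Python as follows. This is a simplified, segment-level illustration that assumes whitespace-tokenized input and uses only unigrams and bigrams; the function names are hypothetical, and production implementations typically use up to 4-grams and aggregate counts over the whole corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, with counts clipped
    to the highest count seen in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, references, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty."""
    precisions = [clipped_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    penalty = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    refs = ["the cat is on the mat".split()]
    print(simple_bleu("the cat sat on the mat".split(), refs))
```

Because the score only counts overlapping word sequences, a fluent but differently worded translation can score low, which is consistent with the note that intelligibility and grammatical correctness are not measured.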

Bitext. A merged document comprised of both source language and target language versions of a given text. Bitexts are generated by a piece of software called an alignment tool, which automatically aligns the original and translated versions of the same text.


"B" language. Foreign language from which an interpreter can render interpretation.

Blog. Short for "web log." A blog is an updatable website that is chronologically arranged, and updated at the user's discretion. What makes a blog different than a regular website is the fact that it can be syndicated so that others can subscribe and have the content delivered to a certain place automatically. Weblogs started out as journals and chronologically arranged websites. However, it is common now for blogs to include audio, video, graphics, and text. It is common for blogs to be available as RSS or Atom feeds.

Bodyshopping. The practice of using offshore resources and personnel to do small disaggregated tasks within a business environment without any broader intention to offshore an entire business function.

C

Chuchotage. Also called whispering interpreting, the interpreter sits or stands next to the intended audience and interprets simultaneously in a whisper. This mode does not require any equipment. Whispered interpretation is often used in situations when the majority of a group speaks one language, and a limited number of people do not speak the source language.

CMMI. Capability Maturity Model Integration.

Computer Aided Translation. Computer technology applications that assist in the act of translating text from one language to another.

Content Management System (CMS). A system used to store and subsequently find and retrieve large amounts of data. CMSs were not originally designed to synchronize translation and localization of content, so most of them have been partnered with globalization management systems.

Controlled authoring. Writing for reuse and translation. Controlled authoring is a process that integrates writing with localization so that the text can be written for reuse and at the same time written for efficient translation.

Cookies. These are not programs, even though many people think that they have viruses or spyware in them. Instead, cookies consist of information that is sent by the browser to a web server and back. They are very useful because they store information about the website one has visited and make it easier and faster to load the website the next time one visits. Some learning management systems require cookies in order for the user to log in or have access to certain sites. Other applications, such as shopping carts used in e-commerce, also use cookies. Cookies are used to track web-browsing patterns and behaviors. They are also used to monitor a person's activities. For that reason, cookies have been held out as examples of how one's privacy can be violated on the Internet.

Crowdsourcing. The act of taking a task traditionally performed by an employee or contractor and outsourcing it to an undefined, generally large group of people, in the form of an open call. For example, the public may be invited to develop a new technology, carry out a design task, refine an algorithm, or help capture, systematize or analyze large amounts of data.

CSA. Common Sense Advisory.

E

8-bit Unicode transformation format. Variable-length character encoding form that can represent any character from almost all the languages in the world. Commonly abbreviated UTF-8.
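
The variable-length property can be seen with Python's built-in `str.encode`. In this small sketch the helper name `utf8_length` and the sample characters are chosen only for illustration.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    samples = [
        "A",           # Latin capital letter A
        "\u00e9",      # e with acute accent
        "\u4e2d",      # a CJK ideograph
        "\U0001F600",  # an emoji outside the Basic Multilingual Plane
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

ASCII characters take one byte, while other characters take two, three or four bytes.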

ELIA. European Language Industry Association.

Extended characters. Characters that exceed the ASCII character range of seven bits, such as characters with diacritical marks or non-Roman characters.
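
A short Python sketch of how extended characters might be detected in a string, assuming the seven-bit ASCII range means code points 0 to 127. The function name `extended_characters` is hypothetical.

```python
def extended_characters(text):
    """Return the characters whose code point lies outside 7-bit ASCII."""
    return [ch for ch in text if ord(ch) > 127]


if __name__ == "__main__":
    # The escapes are i with diaeresis (U+00EF) and e with acute (U+00E9).
    print(extended_characters("na\u00efve caf\u00e9"))
```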

Extensible Markup Language (XML). A markup language/specification pared down from SGML, an international standard for the publication and delivery of electronic information, designed especially for web documents.

Extreme globalization. Globalization projects that involve immense scalability, huge complexity, the use of XML-based and web services, extraordinary organizational efforts, and the need for massive amounts of translation and localization leverage.

F

FAQT. Abbreviation for fully automated quality translation.

False cognates. Words that are thought to share a common origin but actually do not. For example, the words “embarrassed” and “embarazada” (Spanish for “pregnant”).

F2F. Abbreviation for face-to-face interpretation.

Face-to-face interpretation. Spoken language conversion by a human interpreter in the same location as the two parties who wish to communicate. Commonly abbreviated F2F.

FIGS. An acronym for the languages French, Italian, German and Spanish.

Fine-grained TM. Translation memory based on word- or syntagm-level segments.

First person interpretation. Practice by which the speaker's utterances are rendered into the target language directly, without changing pronouns or prefacing renditions with the words, "he said," or "she said."  A first person interpretation of the words

Fixed booth. Area built into a meeting room from which interpreters can perform simultaneous interpretation.

Foreign language. Language that is not one’s native language.

Freelancer. Individual who is self-employed and works for multiple companies on a per-project or contractual basis.

Freelance translator. Also known as "freelancer", an independent translator who sells his or her services to a client on a job-to-job basis without a long-term commitment to any one employer.

Free text. Data that is entered into a field without any formal or pre-defined structure other than the normal use of grammar and punctuation.

Full match. See exact match.

Fully automated quality translation. Machine translation output with no human post-processing or editing, but suitable as a replacement for human translation. This type of MT is the ultimate goal of technologists. Commonly abbreviated FAQT.

Fully automated useful translation. Machine translation output with no human intervention, but typically with a lower bar for quality than fully automated quality translation. Commonly abbreviated FAUT.

Functional quality assurance. See functional testing.

Functional testing. Reviewing software applications and programs to ensure that the localization process does not change the software or impair its functions or on-screen content display.

Fuzzy logic. Process that creates near matches in text to translation memory terms when exact matches cannot be found.

Fuzzy match. Indication that words or sentences are partially – but not exactly – matched to previous translations.
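
A rough Python sketch of fuzzy matching against a translation memory, using the standard library's `difflib.SequenceMatcher` as the similarity measure. This is an assumption for illustration: commercial tools use their own scoring methods, and the function name `best_fuzzy_match`, the sample memory and the 0.75 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
SAMPLE_MEMORY = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Close the window.": "Fermez la fenêtre.",
}


def best_fuzzy_match(segment, memory, threshold=0.75):
    """Return (source, translation, score) for the most similar stored
    segment, or None when nothing reaches the threshold."""
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return best_source, memory[best_source], best_score
    return None


if __name__ == "__main__":
    print(best_fuzzy_match("Click the Save button now.", SAMPLE_MEMORY))
```

A score of 1.0 corresponds to an exact match; anything between the threshold and 1.0 would be offered to the translator as a fuzzy match to review and edit.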

G

Gist translation. A less-than-perfect translation performed by machine or automatic translation.

Globalization (g11n). In this context, the term refers to the process that addresses business issues associated with launching a product globally, such as integrating localization throughout a company after proper internationalization and product design.

Glocalization. A blending of the words globalization and localization, the term refers to the individual, group, division, unit, organization or community that is willing and able to think globally and act locally. Glocalization emphasizes that the globalization of a product is more likely to succeed when the product or service is adapted specifically to each locality or culture in which it is marketed.

H

Hangul. The native alphabet of the Korean language, invented in the fifteenth century, as opposed to the non-alphabetic hanja system borrowed from China. Each hangul syllabic block consists of two or more of the 24 letters (jamo): 14 consonants and 10 vowels.

Hashtags. A community-driven convention for adding additional context and metadata to tweets. Hashtags have the hash or pound symbol (#) preceding the tag, for example, #translation, #xl8, #Localization, #L10N. Hashtags can occur anywhere in a tweet.

Homograph. One of two or more words that have the same spelling but differ in origin, meaning and sometimes pronunciation. An example is wind (weather) and wind (activity).

Homophone. A word that has the same pronunciation as another but different meaning, derivation or spelling. Examples are there and their, foe and faux, and time and thyme.

I

Ideographic language. A written language in which each character represents an idea, concept or other component of meaning, rather than pronunciation alone. Japanese kanji, Chinese hanzi and Korean hanja are examples of ideographic writing systems.

Information retrieval. The science of searching for information in documents, searching for documents or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the internet or intranets, for text, sound, images or data.

Instant Messenger. Also shortened to "IM." Software that shows which contacts on a user's buddy list (friends, family, co-workers, classmates, etc.) are also online and enables users to exchange text-based messages. Some instant messenger programs also include voice chat, file transfer, and other applications. Popular instant messaging programs are available for free from ICQ, AOL, Yahoo!, and MSN. IM may be used in distance learning to facilitate communication between two students or between a learner and his or her instructor.

Internationalization (i18n). Especially in a computing context, the process of generalizing a product so that it can handle multiple languages and cultural conventions (currency, number separators, dates) without the need for redesign.

Internet Service Provider. Also shortened to "ISP." A company that provides Internet access to consumers and businesses, usually for a monthly fee. Services include e-mail, the World Wide Web, FTP, newsgroups, etc.

Interoperability. The ability of two separate and distinct systems to interact and operate across physical or logical boundaries. Any time there is an exchange across a boundary, interoperability comes into play. For the sake of clarity, the concept of interoperability should be further categorized as syntactic interoperability and semantic interoperability.

J

Java. A programming language originally developed by Sun Microsystems and released in 1995 as a core component of Sun's Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to byte code that can run on any Java virtual machine regardless of computer architecture.

Java Computer-Assisted Translation (JCAT). A Java-based translation tool that takes advantage of XML features. JCAT primarily benefits linguists.

JavaScript. An open-source scripting language for design of interactive websites. JavaScript can interact with HTML source code, enabling web developers to use dynamic content. For example, JavaScript makes it easy to respond to user-initiated events (such as form input) without having to use common gateway interface.

Java Server Pages (JSP). JSP have dynamic scripting capability that works in tandem with HTML code, separating the page logic from the static elements - the actual design and display of the page - to help make the HTML more functional.

JIS. The acronym for the Japanese Industrial Standard, which is the Japanese equivalent of ANSI.

K

Keyword. Any word on a web page. Keyword searching is the most common form of text search on the web. Most search engines do their text query and retrieval using keywords.

L

L10N. Localization.

Localization (l10n). In this context, the process of adapting a product or software to a specific international language or culture so that it seems natural to that particular region. True localization considers language, culture, customs and the characteristics of the target locale. It frequently involves changes to the software's writing system and may change keyboard use and fonts as well as date, time and monetary formats.

LSP. Language Service Provider.

M

Machine translation (MT). A technology that translates text from one human language to another, using terminology glossaries and advanced grammatical, syntactic and semantic analysis techniques.

O

Offshore outsourcing (offshoring). The practice of engaging a third-party provider in another country - often on another continent or "shore" - to perform tasks or services often performed in-house.

Open-source software. Any computer software distributed under a license that allows users to change and/or share the software freely. End users have the right to modify and redistribute the software, as well as the right to package and sell the software.

Open18N certification. A certification program that uses an independent authority to verify whether a Linux distribution is adhering to the industry-developed internationalization standard.

OpenType fonts. OpenType fonts are cross-platform, self-contained files and contain advanced typographic features such as glyph substitution and metrics overrides.

P

Parser. A computer program that takes a set of sentences as input and identifies the structure of the sentences according to a given grammar. The term parser is sometimes used generically in cases where the sentences are made up of information units of any kind.

Phonology. The part of linguistics that deals with systems of sounds especially in a particular language.

Q

Quality assurance (QA). The activity of providing the evidence needed to establish confidence among all concerned that quality-related activities are being performed effectively. QA comprises all the planned or systematic actions necessary to provide adequate confidence that a product or service will satisfy given requirements for quality. It covers all activities from design, development, production and installation to servicing and documentation.

Quality Manual. Quality assurance has nothing to do with quality. The correct name should be “consistency assurance”. A quality manual contains procedures that must be followed. The company is responsible for creating its own procedures. Deviating from the published procedure will result in the loss of QA accreditation.

QuarkXPress. A page layout program similar to Adobe PageMaker and Adobe InDesign.

Quotation marks. Punctuation used to enclose speech, a quotation or a particular phrase. The correct form of the quotation mark varies widely between languages.

R

Rule-based machine translation (RBMT). The application of sets of linguistic rules that are defined as correspondences between the structure of the source language and that of the target language. The first stage involves analyzing the input text for morphology and syntax - and sometimes semantics - to create an internal representation. The translation is then generated from this representation using extensive lexicons with morphological, syntactic and semantic information, and large sets of rules.

S

Search engine optimization (SEO). A set of methods aimed at improving the ranking of a website in search engine listings. SEO is primarily concerned with advancing the goals of a website by improving the number and position of its organic search results for a wide variety of relevant keywords.

Semantic interoperability. Semantic interoperability goes beyond syntactic interoperability with the additional condition that the two systems interpret the data unambiguously—a common meaning. Value X in System A means the same thing as Value X in System B. Semantic interoperability is only possible if there is a common reference used by both systems to disambiguate data values, thus the need for standards.

Simship. A term used to refer to the simultaneous shipment of products, usually software, in different languages or with other distinguishing differences in design.

16-bit Unicode transformation format. Variable-length character encoding form that can represent Unicode or ISO characters in a 16-bit series suitable for storage or transmission in data networks. Commonly abbreviated UTF-16.

Source language (SL). A language that is to be translated into another language.

Statistical machine translation (SMT). A machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. SMT is the translation of text from one human language to another by a computer that learned how to translate from vast amounts of translated text.

Syntactic interoperability. Syntactic interoperability means two systems can exchange data because they use a common file format or communication protocol—a common syntax.

T

TAA. Abbreviation for total available audience.

TAUS. Translation Automation User Society.

Tagging. Marking content in a document with information about its content.

T9N. Abbreviation for translation, in which the number 9 represents the number of letters between the first and last letters.
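
The same counting rule produces the other abbreviations used in this glossary (g11n, i18n, l10n). A minimal Python sketch, with the hypothetical function name `numeronym`:

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        return word  # nothing between the first and last letters to count
    return f"{word[0]}{len(word) - 2}{word[-1]}"


if __name__ == "__main__":
    for term in ("translation", "localization",
                 "internationalization", "globalization"):
        print(term, "->", numeronym(term))
```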

Target language (TL). The language that a source text is being translated into.

Target text. Text that has been translated.

TBX. Abbreviation for term base eXchange.

Technical interpretation. Interpretation for technical settings, such as meetings and conferences in the fields of engineering, telecommunications, and technology.

Technical translation. Translation of technical texts, such as user or maintenance manuals, catalogues and data sheets.

Telephone interpretation. Spoken language conversion that is provided by a remote human interpreter via telephone, be that through a traditional phone or via VoIP. Commonly abbreviated TI.

TEP. Abbreviation for translate-edit-proof.

Term. Word, phrase, symbol or formula that describes or designates a particular concept.

Term base. See terminology database.

Term base eXchange. XML standard for exchanging terminological data. Commonly abbreviated TBX.

Term extraction. Selecting terms in a text and placing them in a terminology database for analysis at a later time.

Term harvesting. See term extraction.

Term list. 1. List of terms, usually in more than one language. 2. Input/output text files.

Term mining. See term extraction.

Terminology. Collection of terms.

Terminology analysis. Process carried out prior to translation in order to analyze the vocabulary within a text and its meaning within the given context, often for the purpose of creating specialized dictionaries within specific fields.

Terminology database. Electronic repository of terms and associated data.

Terminology list. See term list.

Terminology management. Use of computer software to manage translation resources, create terminology databases for translation projects, and improve productivity and consistency.

Terminology management tool. Computer application that facilitates terminology management.

Terminology manager. Software application that facilitates the process of translation by interacting with a terminology database.

Terminology software. Data processing tool that allows one to create, edit and consult text or electronic dictionaries.

Text expansion. Process that often occurs during translation in which the total number of characters in the target text exceeds that of the source text.
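
Expansion is often expressed as a percentage of the source length. The following Python sketch shows the arithmetic; the function name `expansion_percent` and the sample strings are illustrative only.

```python
def expansion_percent(source, target):
    """Percentage by which the target text is longer than the source text.

    A negative result means the text contracted in translation.
    """
    if not source:
        raise ValueError("source text must not be empty")
    return (len(target) - len(source)) / len(source) * 100


if __name__ == "__main__":
    # "Save" has 4 characters and "Enregistrer" has 11.
    print(expansion_percent("Save", "Enregistrer"))
```

Short strings such as button labels tend to show the largest relative expansion, which is why user interface layouts are usually designed with spare room.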

Text extraction. Process in which the text from a source file is placed into a word processing file for use by a linguist.

Text style. Characteristics of terminology, style and sentence formation within a given text.

Text type. See text style.

Text volume. Amount of text based on character count or standard lines, often used to price translation projects.

Third person interpretation. Practice by which the speaker's utterances are rendered into the target language indirectly, changing pronouns and prefacing renditions with words such as, "he said," or "she said."

32-bit Unicode transformation format. Fixed-length, four-byte method for encoding Unicode characters. Commonly abbreviated UTF-32.
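
The difference between the fixed-length UTF-32 form and the variable-length UTF-8 and UTF-16 forms can be checked with Python's built-in codecs. In this sketch the helper name `encoded_sizes` is hypothetical, and the big-endian codec variants are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Bytes needed to encode one character in each Unicode encoding form."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    # Latin A, a CJK ideograph, and an emoji outside the Basic Multilingual Plane
    for ch in ("A", "\u4e2d", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 always uses four bytes, whereas UTF-16 uses two or four and UTF-8 between one and four.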

TI. Abbreviation for telephone interpretation.

TLM. Translation and Localization Management.

TMs. Translation Memories.

TMS. Abbreviation for translation management system.

TMS scorecard. Assessment of service offerings and features of a given translation management system.

TMX. Abbreviation for translation memory eXchange.

Traditional Chinese. Original Chinese ideographic character set used in Taiwan, Hong Kong, Macau and some Chinese communities who have not adopted the simplified characters used in the People's Republic of China.

Transcoding. Process by which character data is converted between different character sets.
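
In Python, transcoding amounts to decoding bytes with the source character set and encoding the result with the target one. A minimal sketch, assuming the source encoding is known in advance; the function name `transcode` is hypothetical.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert encoded bytes from one character set to another."""
    return data.decode(source_encoding).encode(target_encoding)


if __name__ == "__main__":
    latin1_bytes = "caf\u00e9".encode("latin-1")
    print(transcode(latin1_bytes, "latin-1", "utf-8"))
```

If the target character set cannot represent a character, `encode` raises `UnicodeEncodeError`, so conversions into a narrower character set need an explicit policy for such characters.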

Transcreation. Process by which new content is developed or adapted for a given target audience instead of merely translating existing material. It may include copywriting, image selection, font changes, and other transformations that tailor the message to the recipient.

Transcription. Process of converting oral utterances into written form.

Translatability. Degree to which a text can be rendered into another language.

Translate-edit-proof. Most common set of steps used for linguistic quality assurance in translation production processes. Commonly abbreviated TEP.

Translation. The process of converting all of the text or words from a source language to a target language. An understanding of the context or meaning of the source language must be established in order to convey the same message in the target language.

Translation agency. Organization that provides translation services.

Translation company. See translation agency.

Translation capacity. Average number of characters, words, lines, or pages that a professional translator can translate within a given time frame, such as a day, week, or month.

Translation kit. A set of files and instructions given to an LSP by a client. The purpose of a translation kit is to provide LSPs with expectations: the subject matter and target audience, files and format to be translated, delivery expectations, etc.

Translation management system. Program that manages translation and localization cycles, coordinates projects with source content management, and centralizes translation databases, glossaries, and additional information relevant to the translation process. Commonly abbreviated TMS.

Translation manager. Person in charge of managing one or more translation projects.

Translation memory (TM). A special database that stores previously translated sentences which can then be reused on a sentence-by-sentence basis. The database matches source and target language pairs.

Translation memory eXchange. Standard for converting translation memories from one format to another. Commonly abbreviated TMX.
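
TMX files are XML documents in which each translation unit (`tu`) holds one variant (`tuv`) per language. The Python sketch below reads a small hand-written sample with the standard `xml.etree.ElementTree` module; the sample content, its header values and the function name `read_units` are made up for illustration, and real files carry more metadata.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="example" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Hello world</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour le monde</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text):
    """Return one {language: segment text} dictionary per translation unit."""
    root = ET.fromstring(tmx_text)
    return [
        {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        for tu in root.iter("tu")
    ]


if __name__ == "__main__":
    print(read_units(SAMPLE_TMX))
```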

Translation memory plus machine translation. A workflow and technology process in which terms not found in translation memory are automatically sent to the machine translation software for translation, with the results fed back into the translation memory. Commonly abbreviated TMT.

Translation memory system. Computer-aided translation tool that offers translation suggestions from translation memory.

Translation portal. Web-based service that enables translation agencies, freelance translators and customers to contact one another and exchange services.

Translation unit (TU). A segment of text that the translator treats as a single cognitive unit for the purposes of establishing an equivalence. The translation unit may be a single word, a phrase, one or more sentences, or even a larger unit.

Token. Identifier stored as part of a user's security profile.

Total available audience. Metric that represents web users in all supported countries, speaking all supported languages, for a specific thought. Commonly abbreviated TAA.

Translation rate. Price of translation, usually provided on a per-word, per-line, or per-page basis.

Translation verification test. Test used to verify that all content from the source text has been translated accurately and completely.

Translator. Language service provider who performs translation.

Transliteration. Process of converting words from a source text or audio file into a written text that facilitates pronunciation of the words.

U

Uncial writing. A majuscule script commonly used from the third to the eighth centuries CE by Latin and Greek scribes.

Unicode. The Unicode Worldwide Character Standard (Unicode) is a character encoding standard used to represent text for computer processing. Originally designed to support 65,000 characters, it now has encoding forms to support more than one million characters.

Unicode transformation format (UTF-8). An encoding form of Unicode that supports ASCII for backward compatibility and covers the characters for most languages in the world.

UX. User Experience.

V

Video Conferencing. Real-time visual and audio communication using a computer, video camera or web camera, and a network, such as the Internet. Examples of video conferencing include an instructor delivering a live lecture from one central point to many different students, all geographically separated, or a meeting between two students collaborating on a group project.

Virtual. Simulated or conceptual, not physical in nature. In distance learning, the term "virtual classroom" refers to the online environment in which students and instructors interact.

Voice-over. Refers to the production technique where a disembodied voice is broadcast live or pre-recorded in radio, television, film, theater and/or presentation. The voice-over may be spoken by someone who also appears on-screen in other segments or it may be performed by a specialist voice actor.

VoiceXML. The Voice Extensible Markup Language standard enables voice input and audio output for voice response and multimodal applications.

W

Whispering interpreting. Also called chuchotage, the interpreter sits or stands next to the intended audience and interprets simultaneously in a whisper. This mode does not require any equipment. Whispered interpretation is often used in situations when the majority of a group speaks one language, and a limited number of people do not speak the source language.

X

XML Localization Interchange File Format (XLIFF). An XML-based format for exchanging localization data. Standardized by OASIS in April 2002 and aimed at the localization industry, XLIFF specifies elements and attributes to aid in localization. XLIFF could be used to exchange data between companies, such as a software publisher and a localization vendor, or between localization tools, such as translation memory systems and machine translation systems.

Y

Z

Sources

Common Sense Advisory's Glossary

eLearners Glossary

Originally published by Multilingual, industry magazine for website globalization, translation, international software development and language technology. January/February 2011

Multilingual, 2011 Resource Directory & Index 2010, pp. 57- 65.

zsoltsesztak