Transliteration is the process of converting text from one script into another. It works by preserving the sound of the original word, not its meaning.
Put simply: the letters change, but the pronunciation stays as close to the original as the target script allows. The language itself doesn't change.
"The act of writing words or letters in the characters of another alphabet."
— Cambridge Dictionary[1]
"The act or process of writing words using the alphabet of a different language."
— Oxford Learner's Dictionaries[2]
Transliteration and translation are often confused, but they describe fundamentally different operations:
| Operation | What changes? | Example |
|---|---|---|
| Transliteration | The script (writing system) | namaste → नमस्ते |
| Translation | The language (meaning) | Hello → नमस्ते |
When you transliterate "namaste" into Devanagari as "नमस्ते", the meaning is not involved at all. Only the spelling is mapped to the nearest phonetic equivalent in the target script.
Translation works differently. It converts the meaning of a word, regardless of how it sounds: English "Hello" and Hindi "नमस्ते" serve as the same greeting but sound nothing alike.
Transliteration isn't limited to English or Hindi. It works between any two writing systems. Here are a few examples across different language pairs:
| Original Language | Script | Word | Transliteration | Target Script |
|---|---|---|---|---|
| Hindi | Devanagari | नमस्ते | Namaste | Latin |
| Arabic | Arabic | محمد | Muhammad | Latin |
| Japanese | Hiragana | とうきょう | Tōkyō | Latin |
| Chinese | Hanzi | 北京 | Běijīng | Latin |
| Russian | Cyrillic | Москва | Moskva | Latin |
| Greek | Greek | Αθήνα | Athína | Latin |
| Arabic | Arabic | محمد | Мухаммад | Cyrillic (Russian) |
| Arabic | Arabic | القاهرة | Аль-Кахира | Cyrillic (Russian) |
When the target script is the Latin (Roman) alphabet, the process has its own name: romanization. So the Arabic name "محمد," when romanized, becomes "Muhammad."
Romanization is the most widely studied form of transliteration. That's largely because the Latin script is used internationally for science, travel documents, and the internet.
Several formal standards govern romanization for different source scripts, including Hepburn for Japanese, Pinyin for Mandarin Chinese, and ISO 9 for Cyrillic.
Hepburn romanization was created in 1867 by American missionary James Curtis Hepburn. He designed it with English phonology in mind, so that English speakers could naturally pronounce Japanese words without special training.
It is the most widely used Japanese romanization system in the world. You'll find it on road signs, train timetables, and passports, and it's taught to most learners of Japanese as a foreign language.
Pinyin (literally "spelled sounds") was developed in the 1950s by the People's Republic of China. It uses the Latin alphabet plus tone marks over vowels to represent both pronunciation and tone in Standard Mandarin.
Today it's the official romanization standard in mainland China and Singapore, recognised by the United Nations, and the main method for typing Chinese characters on digital devices.
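As a rough illustration of how tone marks attach to vowels, the sketch below converts numbered-tone syllables (for example `bei3`) into tone-marked Pinyin. The numbered input convention and the function name `mark_syllable` are assumptions made for this example. The placement rule used is the commonly taught one: `a` or `e` takes the mark, `o` takes it in `ou`, and otherwise the last vowel does.

```python
import unicodedata

# Combining diacritics for the four Mandarin tones; a neutral tone stays unmarked.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def mark_syllable(syllable: str) -> str:
    """Convert one numbered syllable such as 'bei3' to its tone-marked form."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number: leave the syllable unchanged
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in TONE_MARKS:
        return base  # neutral tone (0 or 5): drop the digit, add no mark
    lower = base.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        vowels = [i for i, ch in enumerate(lower) if ch in "aeiouü"]
        if not vowels:
            return base  # nothing to mark
        pos = vowels[-1]
    # Insert the combining mark after the chosen vowel, then compose it.
    marked = base[: pos + 1] + TONE_MARKS[tone] + base[pos + 1 :]
    return unicodedata.normalize("NFC", marked)


print("".join(mark_syllable(s) for s in ["bei3", "jing1"]))  # běijīng
```

Building the mark from a combining character and then normalising to NFC avoids hard-coding a table of every accented vowel.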
ISO 9:1995 is published by the International Organization for Standardization. It defines a single, universal table for converting Cyrillic characters into Latin characters, covering 118 characters across all Cyrillic-script languages.
These include Slavic languages such as Russian, Ukrainian, and Bulgarian, as well as non-Slavic languages of the former Soviet Union. Its defining property is full reversibility: the original Cyrillic text can be reconstructed unambiguously from the ISO 9 transliteration.
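Reversibility follows from the mapping being one-to-one: every Cyrillic letter gets its own single Latin letter, so the table can simply be inverted. The sketch below shows this with a small subset of the ISO 9 letter table for Russian. It is not the full standard, and the function names `to_latin` and `to_cyrillic` are hypothetical.

```python
import unicodedata

# Illustrative subset of the ISO 9:1995 table for Russian letters.
# The full standard covers many more characters; this is not a complete table.
_SUBSET = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "ш": "š",
}


def _build_tables():
    forward = {}
    for cyr, lat in _SUBSET.items():
        cyr = unicodedata.normalize("NFC", cyr)
        lat = unicodedata.normalize("NFC", lat)
        forward[cyr] = lat
        forward[cyr.upper()] = lat.upper()
    inverse = {lat: cyr for cyr, lat in forward.items()}
    # A reversible scheme must be one-to-one: no two source letters share a target.
    assert len(inverse) == len(forward)
    return str.maketrans(forward), str.maketrans(inverse)


_TO_LATIN, _TO_CYRILLIC = _build_tables()


def to_latin(text: str) -> str:
    """Map each Cyrillic letter in the subset to its Latin counterpart."""
    return unicodedata.normalize("NFC", text).translate(_TO_LATIN)


def to_cyrillic(text: str) -> str:
    """Invert to_latin by applying the inverted table."""
    return unicodedata.normalize("NFC", text).translate(_TO_CYRILLIC)


latin = to_latin("Москва")
print(latin)               # Moskva
print(to_cyrillic(latin))  # Москва
```

Characters outside the subset pass through unchanged, so a real implementation would need the complete table before the round trip could be relied on.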
Romanization moves text into the Latin script, but that's only one direction. Transliteration can work between any two scripts, and two other important directions have their own names:
Cyrillization is the counterpart of romanization: it takes a word from a non-Cyrillic script and renders it in the Cyrillic alphabet.
It's used for writing foreign names and words in Russian, Ukrainian, Serbian, Macedonian, Bulgarian, and other Cyrillic-script languages. There are two approaches: mapping the source spelling letter for letter, or approximating the source pronunciation.
Take "Shakespeare" as an example. In Russian it becomes Шекспир. That's not a letter-for-letter map; it's a phonetic approximation based on how English speakers pronounce the name.
Rendering foreign words in Chinese characters (Hanzi) by sound is called 音译 (yīnyì) in Chinese: phonetic transcription.
Because Chinese characters are largely monosyllabic logograms, foreign words must be broken into syllables. Each syllable is matched to a Chinese character with a similar sound. Where possible, characters are chosen that also carry a neutral or favourable meaning:
Official transcriptions in China are standardised by the Xinhua News Agency's Names of the World's Peoples dictionary.
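The syllable-by-syllable matching described above can be sketched as a greedy longest-match lookup. The three-entry table below is a hypothetical, heavily simplified stand-in for a real transcription table, chosen only so that "Obama" yields 奥巴马, the rendering commonly used in mainland China. Real transcription starts from pronunciation rather than spelling.

```python
# Hypothetical, tiny syllable-to-character table for illustration only.
SYLLABLE_TO_HANZI = {
    "o": "奥",   # ào
    "ba": "巴",  # bā
    "ma": "马",  # mǎ
}

MAX_SYLLABLE_LEN = max(len(s) for s in SYLLABLE_TO_HANZI)


def transcribe(word: str) -> str:
    """Split a word into known syllables (longest match first) and map each one."""
    text = word.lower()
    result = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(MAX_SYLLABLE_LEN, len(text) - i), 0, -1):
            chunk = text[i : i + length]
            if chunk in SYLLABLE_TO_HANZI:
                result.append(SYLLABLE_TO_HANZI[chunk])
                i += length
                break
        else:
            raise ValueError(f"no character for syllable starting at {text[i:]!r}")
    return "".join(result)


print(transcribe("Obama"))  # 奥巴马
```

Raising an error on an unmatched chunk, rather than guessing, keeps gaps in the table visible.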
Transliteration sounds straightforward, but it comes with some real-world complications worth knowing about:
Whether you're working on a research project, a database, or a travel guide, a few simple principles will save you a lot of headaches:
Software can perform transliteration automatically. In a landmark study published in the Journal of Artificial Intelligence Research (JAIR), Oh, Choi, and Isahara (2006) compared four distinct machine transliteration models within the same experimental framework[3]:
| Model | How it works | Best suited for |
|---|---|---|
| Grapheme-based | Maps source characters (graphemes) directly to target characters using learned spelling patterns. No phonetic information used. | Languages with consistent spelling-to-sound correspondence. |
| Phoneme-based | Converts source graphemes into phonemes first, then maps those phonemes to target characters. Pronunciation is the intermediary. | Languages with irregular spelling but consistent pronunciation. |
| Hybrid | Combines grapheme-based and phoneme-based probabilities using linear interpolation. Leverages the strengths of both. | General-purpose; consistently outperforms either alone. |
| Correspondence-based | Establishes explicit alignments between graphemes and phonemes, treating them as jointly paired units rather than independent sequences. | Complex scripts where grapheme and phoneme information must be modelled together. |
The study found that the hybrid and correspondence-based models performed best. Combining all four in an ensemble improved accuracy even further.
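The linear interpolation behind the hybrid model can be sketched in a few lines: each candidate's score is a weighted sum of its grapheme-based and phoneme-based probabilities. The weight `alpha`, the function name `interpolate`, and the probabilities below are made up for illustration and are not taken from the study.

```python
def interpolate(p_grapheme: dict, p_phoneme: dict, alpha: float = 0.5) -> dict:
    """Combine two candidate distributions: alpha * P_g + (1 - alpha) * P_p."""
    candidates = set(p_grapheme) | set(p_phoneme)
    return {
        c: alpha * p_grapheme.get(c, 0.0) + (1 - alpha) * p_phoneme.get(c, 0.0)
        for c in candidates
    }


# Made-up scores for two Korean renderings of the English word "data".
p_g = {"데이터": 0.55, "데이타": 0.45}  # grapheme-based model (hypothetical)
p_p = {"데이터": 0.70, "데이타": 0.30}  # phoneme-based model (hypothetical)

scores = interpolate(p_g, p_p, alpha=0.5)
best = max(scores, key=scores.get)
print(best)  # 데이터
```

With `alpha` at 1.0 the result collapses to the grapheme-based model, and at 0.0 to the phoneme-based one. In practice the weight would be tuned on held-out data.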
Google's transliteration, used in Google Input Tools, doesn't fit neatly into any single category from the 2006 JAIR taxonomy. Its approach has evolved through three distinct phases:
Google's transliteration started as a phoneme-based model, then became a statistical hybrid, and is now best described as a neural sequence model. That last category didn't exist when the 2006 JAIR taxonomy was written, but it builds directly on the hybrid foundation the study identified as most effective.