Greek Unicode Compliance Checker
Paste (or type) polytonic Greek text below to check its compliance with the four Unicode modes supported by GreekTranscoder 2.
Note: for accurate results, paste directly from Microsoft Word or another application which, like Word, preserves text formatting (Safari normalizes plain text to NFC, which may alter the original character encoding).
Reference
Composed Scholarly
- Oxia vowels: ά έ ή ί ό ύ ώ (U+1F71, 1F73, 1F75, 1F77, 1F79, 1F7B, 1F7D…)
- Ano teleia: · (U+0387)
- Greek question mark: ; (U+037E)
- Numeral sign: ʹ (U+0374)
Composing Scholarly
- Greek combining: U+0340 (grave), U+0341 (acute), U+0342 (perispomeni), U+0343 (psili), U+0314 (dasia)
- Greek punctuation (same as Composed Scholarly)
Composed NFC
- Tonos vowels: ά έ ή ί ό ύ ώ (U+03AC, 03AD, 03AE, 03AF, 03CC, 03CD, 03CE…)
- Middle dot: · (U+00B7)
- Semicolon: ; (U+003B)
- Modifier prime: ʹ (U+02B9)
Composing TLG
- Generic combining: U+0300 (grave), U+0301 (acute), U+0303 (tilde), U+0313 (psili), U+0314 (dasia)
- NFC punctuation (same as Composed NFC)
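As a rough illustration of how the reference lists above could be used, the following sketch tallies which of the listed marker code points occur in a string. It is not the checker's actual implementation: `MODE_MARKERS` and `find_markers` are hypothetical names, only the code points written out above are included (the elided parts of the vowel lists are left out), and punctuation is listed only under the two composed modes.

```python
# Marker code points copied from the reference lists above.
# Assumption: this table and the function below are illustrative only.
MODE_MARKERS = {
    "Composed Scholarly": "\u1f71\u1f73\u1f75\u1f77\u1f79\u1f7b\u1f7d\u0387\u037e\u0374",
    "Composing Scholarly": "\u0340\u0341\u0342\u0343\u0314",
    "Composed NFC": "\u03ac\u03ad\u03ae\u03af\u03cc\u03cd\u03ce\u00b7\u003b\u02b9",
    "Composing TLG": "\u0300\u0301\u0303\u0313\u0314",
}


def find_markers(text):
    """Return, per mode, the sorted marker code points found in text."""
    report = {}
    for mode, markers in MODE_MARKERS.items():
        found = sorted({f"U+{ord(ch):04X}" for ch in text if ch in markers})
        report[mode] = found
    return report


# "logos" written with an oxia omicron (U+1F79) followed by an ano teleia (U+0387)
sample = "\u03bb\u1f79\u03b3\u03bf\u03c2\u0387"
report = find_markers(sample)
for mode, found in report.items():
    print(mode, found)
```

Escapes are used instead of literal Greek characters so that the source file itself cannot be silently normalized by an editor or browser.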
Composing Diacritics: A Technical Note
- The Problem with Keyboard Input
- Automatic Canonical Ordering
- GreekTranscoder 2: An Elegant Solution
- Summary
The Problem with Keyboard Input
Modern operating systems, particularly macOS, implement automatic Unicode normalization in their text input systems. When you type Greek text using any keyboard layout (including those that nominally produce combining diacritics), the operating system silently converts the input into composed characters (NFC) before it reaches the application.
For example, when typing the sequence:
- α (U+03B1) + combining rough breathing (U+0314) + combining iota subscript (U+0345)
macOS automatically substitutes the composed equivalent:
- ᾁ (U+1F81: Greek small letter alpha with dasia and ypogegrammeni)
This normalization occurs transparently, regardless of which keyboard layout you use or how you configure your input method.
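The substitution in this example can be reproduced outside macOS with Python's standard `unicodedata` module. This is only a sketch of NFC composition itself; it does not model the operating system's text services, and the variable names are illustrative.

```python
import unicodedata

# alpha + combining rough breathing (dasia) + combining iota subscript
typed = "\u03b1\u0314\u0345"

# NFC composes the sequence into a single precomposed character.
composed = unicodedata.normalize("NFC", typed)

print(len(typed), "code points before,", len(composed), "after")
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```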
Automatic Canonical Ordering
Even more remarkably, macOS automatically rectifies incorrectly ordered combining marks. According to the Unicode standard, combining diacritics must appear in a specific canonical order based on their Combining Class values. If a user types diacritics in the wrong sequence (for instance, by entering the iota subscript before the breathing mark), macOS silently reorders them and produces the correct composed character anyway.
This behavior, helpful though it is, means that standard macOS applications cannot receive true combining diacritics through keyboard input. The operating system’s text services layer intercepts and normalizes all input before it reaches Microsoft Word or any other application that uses standard Cocoa text handling.
A few applications with custom text engines—such as BBEdit, Sublime Text, or terminal emulators—bypass this layer and can receive raw combining characters. However, text created in these applications must then be transferred into Word, where the normalization problem resurfaces.
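The reordering described in this section follows from the Canonical Combining Class of each mark, which can be inspected with Python's `unicodedata` module. The sketch below assumes nothing about macOS internals; it only shows what Unicode normalization does to an iota subscript typed before a breathing mark.

```python
import unicodedata

DASIA = "\u0314"          # combining rough breathing
YPOGEGRAMMENI = "\u0345"  # combining iota subscript

# Canonical Combining Class of each mark (lower classes sort first).
dasia_class = unicodedata.combining(DASIA)
ypogegrammeni_class = unicodedata.combining(YPOGEGRAMMENI)
print(dasia_class, ypogegrammeni_class)

# Iota subscript entered before the breathing: the "wrong" order.
wrong_order = "\u03b1" + YPOGEGRAMMENI + DASIA

# NFD moves the breathing in front of the iota subscript.
reordered = unicodedata.normalize("NFD", wrong_order)
print([f"U+{ord(c):04X}" for c in reordered])

# NFC reorders first and then composes, so the result is a single character.
composed = unicodedata.normalize("NFC", wrong_order)
print(f"U+{ord(composed):04X}")
```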
GreekTranscoder 2: An Elegant Solution
GreekTranscoder 2 bypasses the operating system’s input normalization by working directly on existing document content. When you select the “Use Composing Characters” option, GreekTranscoder 2:
- Reads the composed Greek characters in your document.
- Decomposes them into base letters plus combining diacritical marks.
- Writes the decomposed sequences back to the document.
Because GreekTranscoder 2 operates at the document level rather than through the keyboard input pathway, it can produce true combining diacritics that would otherwise require external tools or convoluted copy-paste workflows.
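The read, decompose, and write-back steps can be approximated in a few lines of Python. This is a minimal sketch that assumes plain NFD decomposition; `decompose_greek` is a hypothetical helper, not GreekTranscoder 2's own code, and it does not reproduce the specific combining code points the tool emits in its scholarly or TLG modes.

```python
import unicodedata


def decompose_greek(text: str) -> str:
    """Split composed characters into base letters plus combining marks.

    Assumption: standard NFD is used, which also decomposes any
    non-Greek composed characters present in the text.
    """
    return unicodedata.normalize("NFD", text)


# "anthropos" with a composed initial alpha (psili and oxia, U+1F04)
sample = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"
decomposed = decompose_greek(sample)

print(len(sample), "code points composed,", len(decomposed), "decomposed")
print([f"U+{ord(c):04X}" for c in decomposed[:3]])
```

Recomposing the result with `unicodedata.normalize("NFC", decomposed)` returns the original string, so the transformation is lossless for text that started out in NFC.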
Summary
| Input Method | Result | True Combining Diacritics? |
|---|---|---|
| macOS Greek Polytonic keyboard | Composed (NFC) | No |
| Any third-party keyboard layout | Composed (NFC) | No |
| Typing combining marks directly | Composed (NFC) | No |
| Typing in wrong canonical order | Composed (NFC) | No |
| Apps with custom text engines (BBEdit, etc.) | Decomposed | Yes, but outside Word |
| GreekTranscoder 2 with “Use Composing Characters” | Decomposed (scholarly or TLG) | Yes |
Note on NFD canonical ordering: canonical ordering sorts combining marks by their Canonical Combining Class, not by code point. The accents (U+0300, U+0301, U+0342) and the breathings (U+0313, U+0314) all belong to class 230, so normalization never reorders them relative to one another; only the iota subscript (U+0345, class 240) is moved after them. Decomposing a precomposed character to NFD therefore yields letter, breathing, accent, iota subscript, which matches the scholarly convention used in Greek studies (breathing before accent), as reflected in Beta Code, the TLG corpus, and the official Unicode character names themselves (e.g., “ALPHA WITH PSILI AND OXIA”). The order of breathing and accent is significant: normalizing decomposed text to NFC recovers the fully composed character only when the breathing precedes the accent, while the position of the iota subscript does not affect the result.
Support GreekTranscoder 2
If you find this tool useful, please consider making a donation. Developing and maintaining free software is an expensive hobby, and your support is welcome.
