Greek Unicode Compliance Checker

Paste (or type) polytonic Greek text below to check its compliance with the four Unicode modes supported by GreekTranscoder 2.

Note: for accurate results, paste directly from Microsoft Word or another application which, like Word, preserves text formatting (Safari normalizes plain text to NFC, which may alter the original character encoding).

Composed Scholarly
Composed NFC
Composing Scholarly
Composing TLG

Reference

Composed Scholarly

  • Oxia vowels: ά έ ή ί ό ύ ώ (U+1F71, 1F73, 1F75, 1F77, 1F79, 1F7B, 1F7D…)
  • Ano teleia: · (U+0387)
  • Greek question mark: ; (U+037E)
  • Numeral sign: ʹ (U+0374)

Composing Scholarly

  • Greek combining: U+0340 (grave), U+0341 (acute), U+0342 (perispomeni), U+0343 (psili), U+0314 (dasia)
  • Greek punctuation (same as Composed Scholarly)

Composed NFC

  • Tonos vowels: ά έ ή ί ό ύ ώ (U+03AC, 03AD, 03AE, 03AF, 03CC, 03CD, 03CE…)
  • Middle dot: · (U+00B7)
  • Semicolon: ; (U+003B)
  • Modifier prime: ʹ (U+02B9)

Composing TLG

  • Generic combining: U+0300 (grave), U+0301 (acute), U+0303 (tilde), U+0313 (psili), U+0314 (dasia)
  • NFC punctuation (same as Composed NFC)
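
As a rough illustration of how these reference lists could drive a check, the Python sketch below reports which mode-specific marker characters occur in a string. It is not GreekTranscoder 2's implementation: the `MARKERS` table and the `find_markers` function are hypothetical names, the sets contain only the code points spelled out above (the elided entries are left out), and characters shared between modes, such as U+0314, are omitted because they do not distinguish one mode from another.

```python
# Marker code points copied from the reference lists above. The lists are
# elided in the source ("…"), so these sets are deliberately incomplete.
MARKERS = {
    "Composed Scholarly": {0x1F71, 0x1F73, 0x1F75, 0x1F77, 0x1F79, 0x1F7B,
                           0x1F7D, 0x0387, 0x037E, 0x0374},
    "Composed NFC": {0x03AC, 0x03AD, 0x03AE, 0x03AF, 0x03CC, 0x03CD,
                     0x03CE, 0x00B7, 0x003B, 0x02B9},
    # For the composing modes only the mode-specific combining marks are
    # listed; their punctuation is shared with the composed modes.
    "Composing Scholarly": {0x0340, 0x0341, 0x0342, 0x0343},
    "Composing TLG": {0x0300, 0x0301, 0x0303, 0x0313},
}


def find_markers(text):
    """Return, per mode, the sorted U+XXXX labels of marker characters in text."""
    found = {}
    for mode, points in MARKERS.items():
        hits = sorted({ord(ch) for ch in text if ord(ch) in points})
        found[mode] = [f"U+{cp:04X}" for cp in hits]
    return found


if __name__ == "__main__":
    # Alpha with oxia (U+1F71) followed by ano teleia (U+0387).
    for mode, hits in find_markers("\u1f71\u0387").items():
        print(mode, hits)
```

A string that yields markers for more than one mode mixes encodings. Note that the plain semicolon (U+003B) appears in almost any text, so a real checker would need to weigh it differently.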

Composing Diacritics: A Technical Note

The Problem with Keyboard Input

Modern operating systems, particularly macOS, implement automatic Unicode normalization in their text input systems. When you type Greek text using any keyboard layout (including those that nominally produce combining diacritics), the operating system silently converts the input into composed characters (NFC) before it reaches the application.

For example, when you type a base letter followed by one or more combining diacritics, macOS automatically substitutes the single composed equivalent.

This normalization occurs transparently, regardless of which keyboard layout you use or how you configure your input method.
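
The same substitution can be reproduced outside the input system with any NFC normalizer. The sketch below uses Python's standard `unicodedata` module. The sample sequence (alpha, combining psili, combining acute) and the `code_points` helper are chosen here for illustration and are not taken from the page's original example. The sketch shows the Unicode rule, not macOS's own implementation.

```python
import unicodedata


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Illustrative input: alpha + combining psili (U+0313) + combining acute (U+0301).
typed = "\u03b1\u0313\u0301"

# NFC composes the three code points into one precomposed character.
composed = unicodedata.normalize("NFC", typed)

print(code_points(typed))
print(code_points(composed))
```

The second line printed should contain a single code point, U+1F04 (alpha with psili and oxia).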

Automatic Canonical Ordering

Even more remarkably, macOS automatically rectifies incorrectly ordered combining marks. According to the Unicode standard, combining diacritics must appear in a specific canonical order determined by their Canonical Combining Class values. If a user types diacritics in the wrong sequence (for instance, by entering the iota subscript before the breathing mark), macOS silently reorders them and produces the correct composed character anyway.

This behavior, helpful though it is, means that standard macOS applications cannot receive true combining diacritics through keyboard input. The operating system’s text services layer intercepts and normalizes all input before it reaches Microsoft Word or any other application that uses standard Cocoa text handling.

A few applications with custom text engines—such as BBEdit, Sublime Text, or terminal emulators—bypass this layer and can receive raw combining characters. However, text created in these applications must then be transferred into Word, where the normalization problem resurfaces.
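
The canonical reordering described in this section can be observed with Python's standard `unicodedata` module. The sketch below takes the case mentioned above, an iota subscript entered before the breathing mark. It illustrates the Unicode rule only and makes no claim about how macOS implements it.

```python
import unicodedata

# Alpha + iota subscript (U+0345) + psili (U+0313): the marks are in the
# "wrong" order, because U+0345 has a higher combining class than U+0313.
typed = "\u03b1\u0345\u0313"

# Canonical Combining Class of each code point (0 for the base letter).
classes = [unicodedata.combining(ch) for ch in typed]

# NFD sorts the marks by combining class; NFC then composes the result.
reordered = unicodedata.normalize("NFD", typed)
composed = unicodedata.normalize("NFC", typed)

print(classes)
print([f"U+{ord(ch):04X}" for ch in reordered])
print([f"U+{ord(ch):04X}" for ch in composed])
```

The breathing moves in front of the iota subscript in the NFD form, and the NFC form collapses to the single character U+1F80 (alpha with psili and ypogegrammeni).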

GreekTranscoder 2: An Elegant Solution

GreekTranscoder 2 bypasses the operating system’s input normalization by working directly on existing document content. When you select the “Use Composing Characters” option, GreekTranscoder 2:

  1. Reads the composed Greek characters in your document.
  2. Decomposes them into base letters plus combining diacritical marks.
  3. Writes the decomposed sequences back to the document.

Because GreekTranscoder 2 operates at the document level rather than through the keyboard input pathway, it can produce true combining diacritics that would otherwise require external tools or convoluted copy-paste workflows.
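
The three steps above can be sketched in Python as follows. This is a guess at the general technique, not GreekTranscoder 2's actual code. The function name `to_composing` and the two remapping tables are hypothetical, inferred only from the combining marks listed in the Reference section: the scholarly mode is assumed to swap the generic grave, acute and psili for U+0340, U+0341 and U+0343, and the TLG mode is assumed to swap the Greek perispomeni for the tilde, U+0303.

```python
import unicodedata

# Hypothetical remapping tables inferred from the reference lists; they are
# NOT taken from GreekTranscoder 2's source.
SCHOLARLY = {0x0300: 0x0340, 0x0301: 0x0341, 0x0313: 0x0343}
TLG = {0x0342: 0x0303}

# Only cased letters are decomposed, so that punctuation such as the
# ano teleia (U+0387) is not rewritten by normalization.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt"}


def to_composing(text, mode="scholarly"):
    """Decompose precomposed Greek letters into base letter + combining marks."""
    table = SCHOLARLY if mode == "scholarly" else TLG
    out = []
    for ch in text:                                   # 1. read each character
        if unicodedata.category(ch) in LETTER_CATEGORIES:
            ch = unicodedata.normalize("NFD", ch)     # 2. decompose it
            ch = ch.translate(table)                  #    and remap the marks
        out.append(ch)
    return "".join(out)                               # 3. return the new text


if __name__ == "__main__":
    sample = "\u1f04"  # alpha with psili and oxia
    for mode in ("scholarly", "tlg"):
        result = to_composing(sample, mode)
        print(mode, [f"U+{ord(c):04X}" for c in result])
```

The sketch handles precomposed letters only. Combining marks already present in the input pass through unchanged, and writing the result back into a Word document is outside its scope.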

Summary

| Input method | Result | True combining diacritics? |
| --- | --- | --- |
| macOS Greek Polytonic keyboard | Composed (NFC) | No |
| Any third-party keyboard layout | Composed (NFC) | No |
| Typing combining marks directly | Composed (NFC) | No |
| Typing in wrong canonical order | Composed (NFC) | No |
| Apps with custom text engines (BBEdit, etc.) | Decomposed | Yes, but outside Word |
| GreekTranscoder 2 with “Use Composing Characters” | Decomposed (scholarly or TLG) | Yes |

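Note on NFD canonical ordering: canonical ordering sorts combining marks by Canonical Combining Class, not by code point value. The accents and breathings (U+0300, U+0301, U+0342, U+0313, U+0314) all belong to class 230, so normalization preserves their relative order, while the iota subscript (U+0345, class 240) always sorts after them. Decomposing a precomposed character to NFD therefore yields letter, breathing, accent, iota subscript (for example, U+1F04 becomes U+03B1, U+0313, U+0301). This agrees with the scholarly convention used in Greek studies (breathing before accent), which is reflected in Beta Code, the TLG corpus, and even in the official Unicode character names themselves (e.g., “ALPHA WITH PSILI AND OXIA”). The order of breathing and accent is significant: a sequence with the accent before the breathing is not canonically equivalent, and normalizing it to NFC does not produce the same composed character. Only the position of the iota subscript relative to the other marks is immaterial, because normalization reorders it.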

Donate

If you find this tool useful, please consider making a donation. Developing and maintaining free software is an expensive hobby, and your support is welcome.