Q&A: How are line breaks handled in bidirectional messages containing both English and Hebrew?

This is a very good question, as it relates to one of several unresolved problems in using traditional (‘Square’) Hebrew text in computer environments:

Mixing RTL (right-to-left) text such as Hebrew or Arabic with LTR text such as English usually wreaks havoc on the display order of the text. One reason is the conflict between two competing standards of encoding in Hebrew—Logical, and Visual:

The two are the bidirectional (logical) method and the visual method. In the logical method, characters are stored in the electronic document in the order in which a person would type them; in the visual method, they are stored in the order in which they should appear on screen, on the assumption that the display device will lay them out left to right. In HTML, only the logical method is a real standard.[1]
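
The difference between the two storage orders can be sketched in a few lines of Python. This is a minimal illustration that assumes a run of pure Hebrew with no Latin letters, digits, or punctuation; the function name `logical_to_visual` is made up for this example, and mixed-direction text needs the full Unicode Bidirectional Algorithm rather than a simple reversal.

```python
# "Shalom" in logical order: the letters in the order a person types them
# (shin, lamed, vav, final mem), written as escapes so that an editor's
# own bidi handling cannot confuse the example.
SHALOM_LOGICAL = "\u05E9\u05DC\u05D5\u05DD"


def logical_to_visual(hebrew_run: str) -> str:
    """Return the left-to-right storage order for a pure-Hebrew run.

    Hypothetical helper: valid only when every character is right-to-left.
    """
    return hebrew_run[::-1]


SHALOM_VISUAL = logical_to_visual(SHALOM_LOGICAL)

# Logical storage starts with the first letter typed (shin);
# visual storage starts with the leftmost glyph on screen (final mem).
print([f"U+{ord(ch):04X}" for ch in SHALOM_LOGICAL])
print([f"U+{ord(ch):04X}" for ch in SHALOM_VISUAL])
```

Applying the same reversal to the visual string restores the logical order, which is why text copied between the two kinds of application appears back to front unless it is converted.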

Q: Why is it that Arabic can be written seamlessly with left-to-right text (e.g. Latin) in between words but Hebrew is a catastrophic mess?

Great question. Here’s an excerpt from an explanation that I gave in a white paper in 2002:

The problems are profound, and they operate at the most fundamental level. Even something as basic as saving Square Hebrew text in a computer file is subject to two (theoretically three) competing and mutually exclusive methods, Visual and Logical (or Implicit), each with its own advantages and drawbacks. Neither can serve as the single agreed standard for all contexts: Visual was originally specified by The Standards Institution of Israel (SII) as the preferred method for Hebrew email and websites [Footnote 1], whereas Logical is the method Microsoft chose for its Windows operating system and applications, following an in-depth analysis of the “BiDi” (bidirectionality) problem in Middle Eastern scripts in general [Footnote 2]. (The third scheme, Explicit, is technically an extension of Logical.)

Although the SII has since revoked its original recommendation and now recommends the Logical method [Footnote 3], Visual has become the de facto standard in online Hebrew. The standoff is unlikely to be resolved any time soon, and the Hebrew user has no choice but to live with the consequences. If using computers in English is akin to having an all-terrain vehicle, computing in Hebrew is like being a tourist in a foreign country with nothing but a restricted bus pass.

These consequences include:

  • The order of characters is reversed whenever Hebrew text is cut from a Visual application (e.g., most Hebrew websites) and pasted into a Logical one (e.g., Hebrew Microsoft Word).

Correcting this can be done only manually, or through the use of special third-party utilities (such as Flipper, or Alon’s HTML Utility), which most users are either unaware of or prefer not to purchase.

  • The unpredictable placement of parentheses, hyphens, colons and other punctuation in Hebrew text as you type, which often jumbles the intended order of words.

Punctuation marks, such as parentheses and hyphens, regularly cause havoc in Square Hebrew documents. Want to have hours of fun? Try typing ISO-8859-8, the International Organization for Standardization’s standard name for Hebrew, in a Hebrew document: it simply can’t be done.
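
One way to see why punctuation behaves this way is to look at the direction class that Unicode assigns to each character. The sketch below uses Python's standard `unicodedata.bidirectional`; the sample string and the helper name `bidi_classes` are only illustrative. Letters carry a strong direction, while hyphens, digits, colons, and parentheses have weak or neutral classes, so a renderer that follows the Unicode Bidirectional Algorithm resolves their direction from their neighbours.

```python
import unicodedata

ALEF = "\u05D0"  # a Hebrew letter, for comparison


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Class codes that appear below:
#   L  = strong left-to-right     R  = strong right-to-left
#   EN = European number          ES = European separator
#   CS = common separator         ON = other neutral
for ch, cls in bidi_classes("ISO-8859-8"):
    print(f"{ch!r}: {cls}")

print(unicodedata.bidirectional(ALEF))  # R
print(unicodedata.bidirectional("("))   # ON
print(unicodedata.bidirectional(":"))   # CS
```

Only the three Latin letters in the name have a strong direction of their own; the hyphens and digits depend on context, which is consistent with the name being reordered when it sits inside right-to-left text.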

  • Word-wrapping isn’t automatic in Visual-mode documents (e.g., Web pages): line breaks must be inserted manually, or some other technique used to ensure that text appears as intended and justified to the right.

  • The apparent order of words in multi-term names or phrases in any Roman (LTR) script embedded in Hebrew text (Visual or Logical) changes unpredictably when the paragraph layout is changed, often rendering them meaningless or misleading.

Fun times.
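
The word-wrapping point in the list above can be illustrated with a short sketch. It assumes pure-Hebrew text, and the helper name `wrap_for_visual` is invented for this example: line breaks are chosen while the text is still in logical order, and only then is each line reversed for visual storage. Wrapping text that is already stored visually puts the end of the sentence on the first line.

```python
import textwrap

SHALOM = "\u05E9\u05DC\u05D5\u05DD"  # first word of the sentence
OLAM = "\u05E2\u05D5\u05DC\u05DD"    # second word of the sentence
LOGICAL_TEXT = f"{SHALOM} {OLAM}"


def wrap_for_visual(logical_text: str, width: int) -> list[str]:
    """Break lines in logical order, then reverse each line.

    Hypothetical helper; correct only for pure right-to-left text.
    """
    lines = textwrap.wrap(logical_text, width=width)
    return [line[::-1] for line in lines]


# Breaking first keeps the first word of the sentence on the first line.
good = wrap_for_visual(LOGICAL_TEXT, width=5)

# Wrapping the already-reversed (visual) string instead puts the last
# word of the sentence on the first line.
naive = textwrap.wrap(LOGICAL_TEXT[::-1], width=5)

print(good[0] == SHALOM[::-1])
print(naive[0] == OLAM[::-1])
```

This is the reason a browser cannot safely reflow visually stored Hebrew on its own: the break positions have to be decided before the characters are reordered.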

If in Arabic these problems don’t exist, presumably it’s because only one standard was applied from the get-go.

References:

[1] Application of Hebrew in mail messages transfer in TCP/IP networks (Dig.Classif. 62.39:68.3) August 1995, The Standards Institution of Israel http://www.itpolicy.gov.il/vadat… [lapsed link—JOS]

[2] Middle Eastern Language Issues, Global Software Development page, Microsoft. http://www.microsoft.com/globald… [lapsed link]

[3] Matitiahu Allouche, BIDI Architect, IBM Israel, Globalization Center Of Competency—Bidirectional Scripts, and member of the SII technical committee #1109 on Hebrew in Computerized Systems and the Internet.

*

When writing in Hebrew in a word processor, how do I format it so that Latin text can be written after the Hebrew text?

(My answer to this question at Quora.com)

As Amir Aharoni points out, word processing in Hebrew, especially when it involves mixing Hebrew with Roman text, is still a minefield, with certain problems “baked in” to the standards for Hebrew display in computerised contexts. Disruption of order (of words, of letters, of Hebrew and Roman text) is one of the most frequent problems. Even if you manage to get the text to display properly in one file, once you do anything to it (convert it to another format, cut and paste it into another file, or even just insert a punctuation mark such as a comma) it will likely be disrupted.
