Comparing Strings and the 4 Turkish I’s

Turkish I’s come in 4 forms: ı, I, i, İ

The Turkish I problem comes into play when comparing “standardised” strings, that is, strings that have been converted to lower-case or upper-case. This is commonly done in order to compare against a string literal, for example:

// PHP
$status = 'published';
if (strtoupper($status) == "PUBLISHED") {
    showPost();
} else {
    // ...
}

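Under a Turkish locale with the ISO-8859-9 charset, the above would fail (in PHP versions before 8.2, where strtoupper() follows the current locale), because the result is “PUBLİSHED” rather than “PUBLISHED”. The reason is that the Latin dotted “i” becomes a distinct dotted capital “İ” in the Turkish alphabet. Likewise, the distinct undotted “ı” maps to the Latin undotted capital “I”.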
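To make the mapping concrete without depending on which locales are installed, here is a minimal sketch that imitates Turkish upper-casing by hand. The helper name turkish_upper() is hypothetical and not part of PHP; the sketch assumes UTF-8 strings and the mbstring extension.

```php
<?php
// Hypothetical helper (not a PHP built-in): upper-cases a UTF-8 string using
// the Turkish rules for the letter I, to imitate what a Turkish locale does.
function turkish_upper(string $s): string
{
    // Map the two lower-case I's first: i -> İ (U+0130), ı (U+0131) -> I.
    $s = strtr($s, ['i' => "\u{0130}", "\u{0131}" => 'I']);

    // Upper-case everything else. Assumes the mbstring extension is loaded.
    return mb_strtoupper($s, 'UTF-8');
}

$status = 'published';

// Prints PUBLİSHED, with a dotted capital İ.
echo turkish_upper($status), "\n";

// The comparison from the example no longer matches.
var_dump(turkish_upper($status) === 'PUBLISHED'); // bool(false)
```

The two strings differ only in the fourth letter, which is why the mismatch is easy to miss when reading logs or output.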

Displaying localised information raises further considerations, which are outside the scope of this article.

Why should I care?

If your code is going to be distributed, or your snippets published on the web, you should care. Your application, website, or simple snippet may be written in English, left untranslated, and not even intended for users who cannot read English.

Even so, that does not account for English-speaking users who are from Turkey, or who happen to be there, and whose systems use a different locale or whose compilers use a different culture. Users may be running different locales, cultures or character sets, and your application may end up on their systems.

Solution

Internationalisation is demanding work. To handle this perfectly, a system would have to determine which language each piece of content in an interface is in. For a long time on Android phones, Turkish users would see yet-to-be-translated phrases come up as “EDITOR’S PICKS”, for example. Even after handling these cases, one still has to come up with a solution for names. Here is how Google Play handles capitalisation for LinkedIn; note the dotted capital “İ”:

How the Android App, Google Play, handles Turkish I’s in names

There is no easy one-line solution. How you handle the problem depends on the case. For basic string comparison, it may simply be a matter of checking or changing the character sets beforehand. When handling localised content, it may require changing how data is entered, and capturing and storing locale information along with the input.
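For the basic string comparison case, one possible approach is to upper-case only the ASCII letters, byte for byte, so that the current locale is never consulted. This is a sketch under that assumption, not a general answer; ascii_upper() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): upper-cases only the ASCII
// letters a-z. The three-argument form of strtr() translates byte by byte
// and does not depend on the locale set with setlocale().
function ascii_upper(string $s): string
{
    return strtr($s, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ');
}

$status = 'published';

if (ascii_upper($status) === 'PUBLISHED') {
    echo "match\n"; // reached regardless of the current locale
}
```

This only suits identifiers and keywords that are known to be ASCII, such as a status flag. It is not appropriate for user-visible text or names, where the Turkish mapping may be exactly what the user expects.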
