Given the constant competitive pressure on executives to shorten
product time-to-market, many developers face tight deadlines to deliver
functional software. That software is often slated for localization
once the source-language version is ready for release.
With these pressures in mind, developers can follow a few basic
principles during development that make localization easier and help
meet time-to-market requirements for every required language, not just
the source language.
Here are 12 do's and don'ts that all developers should read and apply
in their work.
1. Do externalize messages in message catalogs, resource files, and
configuration files. Messages are the textual objects that need
translation. These catalogs or files, such as Java resource bundle
(.properties) files or Microsoft resource files, are installed in a
locale-specific location or named with a locale-specific suffix (for
example, Messages_de.properties).
This practice simplifies localization, because localizers can work on
the resource bundles without modifying source code. It also allows a
single code base to serve all languages, with only the resource bundles
differing from one language to the next.
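As a minimal sketch of the lookup side, the following Python assumes a hypothetical in-memory `CATALOGS` dictionary standing in for per-locale files, plus a hypothetical `lookup()` helper. The helper falls back from a specific locale (de_CH) to its parent (de) and then to the source language, similar in spirit to the fallback chain of Java resource bundles. The code refers only to message keys. All display text comes from the catalogs.

```python
# Hypothetical message catalogs. In a real project each locale would live in
# its own file (e.g. Messages_de.properties), so translators never touch code.
CATALOGS = {
    "en": {"greeting": "Welcome", "file.open": "Open"},
    "de": {"greeting": "Willkommen", "file.open": "Öffnen"},
    "de_CH": {"greeting": "Grüezi"},
}

SOURCE_LOCALE = "en"


def lookup(key: str, locale: str) -> str:
    """Return the message for key, falling back e.g. de_CH -> de -> en."""
    parts = locale.split("_")
    # Most specific locale first, then progressively shorter parents.
    candidates = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    candidates.append(SOURCE_LOCALE)
    for candidate in candidates:
        messages = CATALOGS.get(candidate, {})
        if key in messages:
            return messages[key]
    raise KeyError(f"no message for {key!r} in any fallback locale")


# The code only ever refers to the key; the text comes from the catalog.
print(lookup("greeting", "de_CH"))   # Grüezi
print(lookup("file.open", "de_CH"))  # Öffnen (inherited from "de")
print(lookup("file.open", "fr"))     # Open (falls back to the source)
```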
2. Don't internationalize fixed textual objects. These are objects that
must not be translated, such as comments, commands, and configuration
settings. Externalize only the strings that need translation. If fixed
textual objects do appear in resource or configuration files, mark them
"NOT_FOR_TRANSLATION".
Here are some examples of fixed textual
objects:
- User names, group names, and passwords
- System or host names
- Names of terminals (/dev/tty*), printers, and special devices
- Shell variables and environment variable names
- Message queue, semaphore, and shared memory labels
- UNIX commands and command-line options (e.g., ls -l is still ls -l in
all locales)
- Commands such as /usr/bin/dos2unix and /usr/ccs/bin/gprof
- XPG4-compliant commands (such as /usr/xpg4/bin/vi) and their non-XPG4
equivalents, which may not be fully internationalized. For example,
/usr/bin/vi does not process non-EUC codesets, whereas /usr/xpg4/bin/vi
is fully internationalized and can process characters in any locale.
- Some GUI textual components, such as keyboard mnemonics and keyboard
accelerators
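Here is a sketch of how an extraction step might honor the "NOT_FOR_TRANSLATION" marking. It assumes a simple key=value file in which a comment containing the marker applies to the next entry. This comment convention and the `extract_translatable()` helper are hypothetical. Localization tools each define their own way to lock strings.

```python
MARKER = "NOT_FOR_TRANSLATION"


def extract_translatable(text: str) -> dict[str, str]:
    """Collect key=value entries, skipping any entry whose preceding
    comment line contains the (hypothetical) NOT_FOR_TRANSLATION marker."""
    entries: dict[str, str] = {}
    skip_next = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A marker comment applies to the next key=value entry only.
            if MARKER in line:
                skip_next = True
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        if skip_next:
            skip_next = False
            continue
        entries[key.strip()] = value.strip()
    return entries


SAMPLE = """
# Shown in the main window title bar
window.title=Device Manager
# NOT_FOR_TRANSLATION: environment variable name read at startup
config.env=APP_HOME
# NOT_FOR_TRANSLATION: command passed to the shell
tool.cmd=/usr/bin/dos2unix
button.ok=OK
"""

print(extract_translatable(SAMPLE))
# {'window.title': 'Device Manager', 'button.ok': 'OK'}
```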
3. Do allow for
text expansion in messages (especially for GUI items).
Here are some Microsoft translations
into German:
- bullet → Aufzählungszeichen
- bundle → Einzelvorgangsbündel
- Link → Verknüpfung
- Login → Anmeldung
- Update → Aktualisierung
- Undo → Rückgängig (machen)
- Geschäftsaktivitätsüberwachung replaces the acronym BAM (Business
Activity Monitoring)!
Apply the following expansion rules when possible. Reserve this much
additional space, relative to the length of the source text, when the
source text is:
- 0–10 characters: 101–200%
- 11–20 characters: 81–100%
- 21–30 characters: 61–80%
- 31–50 characters: 41–60%
- 51–70 characters: 31–40%
- Over 70 characters: 30%
Also keep source strings well below your string-length limit (often 254
characters), so that there is room for the extra characters the
translations will need.
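The expansion rules above can be turned into a quick budget check. This Python sketch makes two assumptions. First, the percentages mean additional space relative to the source length. Second, to be conservative, it uses the upper bound of each range. `reserved_length()` and `fits_limit()` are hypothetical helpers, and the 254-character default mirrors the typical limit mentioned above.

```python
# Expansion rules from the list above, as (max source length, extra %).
# Assumption: use the upper bound of each range to stay on the safe side.
EXPANSION_RULES = [(10, 200), (20, 100), (30, 80), (50, 60), (70, 40)]
OVER_70_PERCENT = 30


def expansion_percent(length: int) -> int:
    for max_length, percent in EXPANSION_RULES:
        if length <= max_length:
            return percent
    return OVER_70_PERCENT


def reserved_length(source: str) -> int:
    """Characters to reserve for a translation of source, rounded up."""
    n = len(source)
    total_percent = 100 + expansion_percent(n)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-n * total_percent // 100)


def fits_limit(source: str, limit: int = 254) -> bool:
    """True if the expanded translation should still fit in limit characters."""
    return reserved_length(source) <= limit


print(reserved_length("Undo"))                          # 12
print(reserved_length("Business Activity Monitoring"))  # 51
print(reserved_length("BAM"))                           # 9
```

The `BAM` case shows why acronyms that get spelled out in translation are risky: 3 source characters reserve only 9, far short of the 30 characters of "Geschäftsaktivitätsüberwachung".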
Where possible, place labels above controls rather than beside them. An
expanded label beside a control can widen the form beyond the expected
screen resolution, which forces horizontal scroll bars or causes
truncation. Labels above controls also simplify localization into
bidirectional languages such as Arabic and Hebrew, which are read right
to left (RTL) but can contain left-to-right (LTR) runs such as numbers
and Latin-script text.
4. Don't use variables when you can avoid them. A variable leaves the
translator guessing about the gender of the term that will be
substituted, which makes it difficult to translate the sentence that
contains it correctly. If you must use variables, provide a list of the
possible replacements, and allow for gender and plural variations in the
translation of the sentences that contain them.
For example:
<%
If err = 400 Then
    errtext = "server"
Else
    errtext = "connection"
End If
%>
<P>The <%= errtext %> is currently unavailable.</P>
While this displays grammatically correct sentences in English, the
French translation will be problematic. In French, "serveur" (server) is
masculine while "connexion" (connection) is feminine, so "the" must be
translated as "le" or "la" depending on which noun is substituted. The
translator, who sees only one sentence with a variable in it, cannot
choose the correct article.
Instead, the code should be:
<% If err = 400 Then %>
<P>The server is currently unavailable.</P>
<% Else %>
<P>The connection is currently unavailable.</P>
<% End If %>
For similar reasons, don't use composite strings. A composite string is
an error message or other text that is generated dynamically from
partial sentence segments and presented to the user as a full sentence.
Use complete sentences instead, even at the expense of repeating
segments. This ensures an accurate translation regardless of gender,
plurality, conjugation, or sentence structure.
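To illustrate complete sentences versus fragments, this Python sketch uses `ngettext()` from the standard `gettext` module, which selects a whole sentence per plural form. `NullTranslations` stands in for a real catalog; with no catalog loaded, it returns the English text. The messages and function names are hypothetical.

```python
import gettext

# NullTranslations stands in for a catalog loaded with gettext.translation();
# with no catalog it returns the English text, using the singular only if n == 1.
translations = gettext.NullTranslations()


def deleted_message_composite(n: int) -> str:
    # Avoid: the sentence is assembled from fragments ("file"/"files",
    # "was"/"were"), which a translator cannot render correctly in isolation.
    noun = "file" if n == 1 else "files"
    verb = "was" if n == 1 else "were"
    return f"{n} {noun} {verb} deleted."


def deleted_message(n: int) -> str:
    # Prefer: each plural form is a complete sentence that is translated as
    # a whole; a real catalog picks the correct form for the target language.
    template = translations.ngettext(
        "{0} file was deleted.", "{0} files were deleted.", n
    )
    return template.format(n)


print(deleted_message(1))  # 1 file was deleted.
print(deleted_message(3))  # 3 files were deleted.
```

With a real catalog, the Plural-Forms rule of the translation decides which form to use. This matters for languages with more than two plural forms.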
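Also, avoid using identical, unnumbered placeholders for multiple
variables in the same string, because sentence structure changes from
language to language. Unnumbered placeholders are filled strictly in
order, so if a translator reorders them, the values end up in the wrong
places. For example, "Total %s, %s of %s" (as in "Total 5, 1 of 5")
might read "5 of 1, Total 5" in the translated text. Instead, use
numbered placeholders (e.g., "Total %1, %2 of %3"), which translators
can reorder freely.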
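In Python, numbered placeholders are written `{0}`, `{1}`, `{2}` (zero-based) rather than `%1`, `%2`, `%3`. The following sketch uses a hypothetical reordered template, written in English for readability. It shows that the values still land in the right places and contrasts this with unnumbered `%s` placeholders.

```python
# Source template: {0} = total, {1} = current item, {2} = item count.
SOURCE = "Total {0}, {1} of {2}"

# Hypothetical target-language template: the translator moved the "total"
# part to the end, but each number still binds to the right value.
REORDERED = "{1} of {2} (total {0})"


def render(template: str, total: int, current: int, count: int) -> str:
    return template.format(total, current, count)


print(render(SOURCE, 5, 1, 5))     # Total 5, 1 of 5
print(render(REORDERED, 5, 1, 5))  # 1 of 5 (total 5)

# With unnumbered %s placeholders, values fill slots strictly left to right,
# so a reordered translation silently swaps them:
print("%s of %s, Total %s" % (5, 1, 5))  # 5 of 1, Total 5
```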
5. Do perform pseudo-translation.
Pseudo-translation is the process of replacing or adding characters to
your software strings to detect character encoding issues and hard-coded
text remaining in the source files.
Here's an example of a few strings from
a C resource file, with their respective pseudo-translations in
Japanese:
IDS_TITLE_OPEN_SKIN "Select Device"
IDS_TITLE_OPEN_SKIN "日本Sイlイct Dイvウcイ本日"
IDS_MY_OPEN "&Open"
IDS_MY_OPEN "日本&Opイn日"
In these strings, Japanese characters replace the lowercase vowels of
the English words, and extra Japanese characters are added at the start
and end of each string. After compilation, testers can easily detect
corrupt characters (junk characters in place of the Japanese characters)
and strings that remain fully in English (source strings still embedded
in the code).
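Here is a minimal pseudo-translation function along the lines of the example above. It is only a sketch. The vowel mapping is an assumption: only "e" → "イ" and "i" → "ウ" follow the sample, and the rest are arbitrary. The function also skips placeholders such as `{0}` or `%s`, so the pseudo-translated build still formats messages correctly. With the default prefix and suffix, it reproduces the first sample string above.

```python
import re

# Hypothetical vowel-to-katakana mapping; "e" and "i" follow the sample above,
# the others are arbitrary. Any non-ASCII characters would serve the purpose.
VOWELS = str.maketrans({"a": "ア", "e": "イ", "i": "ウ", "o": "エ", "u": "オ"})

# Placeholders such as {0}, {name}, %s or %1$s must survive untouched,
# otherwise the pseudo-translated build would misformat or crash.
PLACEHOLDER = re.compile(r"(\{\w*\}|%(?:\d+\$)?[sd])")


def pseudo_translate(text: str, prefix: str = "日本", suffix: str = "本日") -> str:
    # re.split with a capturing group returns text and placeholders
    # alternately: even indices are text, odd indices are placeholders.
    pieces = PLACEHOLDER.split(text)
    converted = [
        piece if i % 2 else piece.translate(VOWELS)
        for i, piece in enumerate(pieces)
    ]
    # The non-ASCII prefix and suffix make truncated strings (missing suffix)
    # and unconverted strings (no prefix at all) easy to spot.
    return prefix + "".join(converted) + suffix


print(pseudo_translate("Select Device"))  # 日本Sイlイct Dイvウcイ本日
print(pseudo_translate("Delete {0}?"))    # 日本Dイlイtイ {0}?本日
```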
6. Don't use IF conditions on string values, or rely on their sort
order, in your code. For example, avoid IF Gender = "Male" THEN, because
the value "Male" changes once it is translated. Always rely on
enumerations or unique IDs instead.
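The following Python sketch shows the enumeration approach. A hypothetical `Gender` enum holds the stable IDs that the code compares, and a hypothetical `LABELS` catalog holds what the user sees in each language. A drop-down built from (ID, label) pairs returns the ID, so no code ever compares against "Male".

```python
from enum import Enum


class Gender(Enum):
    # Stable IDs: stored, compared, and persisted by the program.
    MALE = 1
    FEMALE = 2


# Hypothetical display labels per locale, kept in the message catalog.
LABELS = {
    "en": {Gender.MALE: "Male", Gender.FEMALE: "Female"},
    "de": {Gender.MALE: "Männlich", Gender.FEMALE: "Weiblich"},
}


def menu_items(locale: str) -> list[tuple[Gender, str]]:
    """(ID, label) pairs for a drop-down, in enum order rather than in the
    alphabetical order of the translated labels."""
    return [(gender, LABELS[locale][gender]) for gender in Gender]


def is_male(selected: Gender) -> bool:
    # Avoid: if label == "Male"  (fails once the label is "Männlich")
    # Prefer: compare the ID, which is identical in every locale.
    return selected is Gender.MALE


selected_id, shown_label = menu_items("de")[0]
print(shown_label, is_male(selected_id))  # Männlich True
```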
7. Do use Unicode functions and methods to support all scripts.
Applications that store and retrieve text data need to accept and
display characters from any language. Using a Unicode encoding (such as
UTF-8 or UTF-16) solves the problem of unsupported character sets and
the display of junk characters.
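In Python 3, `str` values are already Unicode. The remaining pitfalls therefore sit at byte boundaries (files, databases, fixed-size fields) and in comparisons. This sketch shows two habits using only the standard library: truncating UTF-8 data without splitting a character, and comparing strings after NFC normalization. The helper names are hypothetical.

```python
import unicodedata


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Trim text to fit a byte budget without cutting a character in half."""
    encoded = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops an incomplete multi-byte sequence left at the end.
    return encoded.decode("utf-8", errors="ignore")


def same_text(a: str, b: str) -> bool:
    """Compare strings after NFC normalization, so that a precomposed "é"
    and "e" followed by a combining acute accent are treated as equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


word = "日本語"
print(len(word), len(word.encode("utf-8")))  # 3 9 (characters are not bytes)
print(truncate_utf8(word, 4))                # 日
print(same_text("caf\u00e9", "cafe\u0301"))  # True
```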
8. Don't insert hard carriage returns in the middle of sentences.
Translation memory tools treat a hard return as the end of a segment and
assume that the sentence has ended. A hard return in the middle of a
sentence therefore produces incomplete sentences in the translation
database and corrupts the sentence structure in the target-language
files. Instead, replace hard returns with soft returns (or, better yet,
use a break tag such as <BR>).
Also be aware that sentence structure and the length of sentence parts
differ between languages, so target languages may need breaks in
different places, or additional ones.
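Here is a small Python sketch of the alternative. Each sentence is stored as one message with no embedded hard returns, and the display layer breaks the lines. The standard `textwrap` module plays the role of the UI's word wrapping. The messages and the `render()` helper are hypothetical.

```python
import textwrap

# Avoid: the hard return splits one sentence into two translation segments.
SPLIT_MESSAGE = "The connection to the server was lost while\nsaving your changes."

# Prefer: the whole sentence is one message with no embedded line break.
MESSAGE = "The connection to the server was lost while saving your changes."


def render(message: str, width: int) -> list[str]:
    # Line breaks are decided at display time for the available width,
    # so a longer or reordered translation wraps correctly too.
    return textwrap.wrap(message, width=width)


for line in render(MESSAGE, 30):
    print(line)
```

`textwrap` counts characters rather than display columns and does not implement CJK or Thai line-breaking rules. A real UI should rely on its toolkit's own text layout.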
9. Do choose your third-party software provider carefully. Insist that
they support Unicode and comply with the practices above. Problems often
arise in third-party software, and because you have no control over its
code, fixing them during localization is particularly difficult.
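10. Don't use text in icons and bitmaps. The translated text may be too
long to fit, and text embedded in an image cannot be translated along
with the other externalized strings, so the image must be recreated for
each language. Also, avoid symbols with cultural connotations and
locale-specific idioms.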
11. Do use long dates or month abbreviations instead of numbers when
identifying dates. The order of month and day varies around the world
(e.g., mm/dd/yy in the US and dd/mm/yy in much of Europe), so a date
such as 03/04/05 is ambiguous.
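Here is a Python sketch of locale-aware long dates. The month names and patterns form a hypothetical two-locale table, included only to keep the example self-contained. In production you would take them from CLDR data (for example, through ICU or the Babel library) rather than maintaining your own.

```python
from datetime import date

# Hypothetical locale data for two locales; real projects should use CLDR.
MONTHS = {
    "en_US": ["January", "February", "March", "April", "May", "June", "July",
              "August", "September", "October", "November", "December"],
    "de_DE": ["Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
              "August", "September", "Oktober", "November", "Dezember"],
}
# Long-date patterns: the order of day, month, and year is locale data too.
PATTERNS = {
    "en_US": "{month} {day}, {year}",
    "de_DE": "{day}. {month} {year}",
}


def long_date(d: date, locale: str) -> str:
    month = MONTHS[locale][d.month - 1]
    return PATTERNS[locale].format(day=d.day, month=month, year=d.year)


d = date(2024, 3, 4)
print(d.strftime("%m/%d/%y"), d.strftime("%d/%m/%y"))  # 03/04/24 04/03/24
print(long_date(d, "en_US"))  # March 4, 2024
print(long_date(d, "de_DE"))  # 4. März 2024
```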
12. Don't sort strings alphabetically in string tables and resource
bundles. Keeping strings in the order and grouping in which they appear
in the application preserves their context, while alphabetical sorting
destroys it. Offer as much context as you can with the externalized
strings, because it helps the translator adapt the translation to that
context. If context is missing, run-time QA will take much longer to
correct the translations.
For example, "Update" could be the action (to update) or the software
update itself. In financial software, "Check" could be the action (a
noun or a verb) or a bank check. "Email" could be a verb or a noun.
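One way to attach context is to key each translation by context plus source text, as gettext does with `msgctxt`. The catalog below is a hypothetical German example. The standard library's `pgettext()` (Python 3.8+) implements the same idea for real gettext catalogs.

```python
import gettext

# Hypothetical German catalog keyed by (context, source text), mirroring
# gettext's msgctxt: the same English word gets separate entries.
CATALOG_DE = {
    ("menu action", "Update"): "Aktualisieren",
    ("software release", "Update"): "Aktualisierung",
    ("verb: verify", "Check"): "Prüfen",
    ("payment instrument", "Check"): "Scheck",
}


def translate(context: str, text: str) -> str:
    # Fall back to the source text when no translation exists yet.
    return CATALOG_DE.get((context, text), text)


print(translate("menu action", "Update"))        # Aktualisieren
print(translate("payment instrument", "Check"))  # Scheck

# The standard library offers the same idea as pgettext(context, message);
# NullTranslations (no catalog loaded) simply returns the message.
print(gettext.NullTranslations().pgettext("payment instrument", "Check"))  # Check
```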
Following these simple principles will expedite localization and reduce
testing, rework, and quality assurance costs, ultimately allowing you to
meet the strict time-to-market requirements expected of companies that
sell products worldwide.
For proactive assistance with these and other software localization
issues, contact our localization experts.