Most content publishers, in software and other industries, update their products, manuals, websites, and marketing literature yearly or more often. They update their source-language files and engage their localization vendor or in-house staff to update all supported target languages.
Changes to the source files take three forms: additions (new text), deletions (removal of obsolete text), and edits (modifications to existing text).
New text requires translation into each target language. Deleted text is disregarded. Edited text must be updated in all target languages.
When a top-down localization process is applied and a translation database (translation memory, or TM) is in use, the search engine compares each segment of the new source (a complete phrase or sentence) against the segments already stored. The results are as follows:
- No match or new text: Finds little or no match in the database and requires full translation.
- Repeat or unchanged text: Generates a 100% match from the database and requires no changes.
- Edited or modified text: Results in a "fuzzy" match, meaning a database segment that is anywhere from 50% to 99% similar to the new source segment. Anything under 50% is considered no match.
- Deleted text: Has no impact on the translation update effort, since the text no longer exists.
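The outcomes above can be expressed as a small classification function. This is a minimal sketch assuming the search engine has already produced a 0-100 similarity score for each new segment; the name `classify_match` is hypothetical, and only the 100% and 50% thresholds come from the text.

```python
def classify_match(score: float) -> str:
    """Map a TM match score (0-100) to the categories described above.

    Thresholds from the text: 100 = exact (repeat or unchanged text),
    50-99 = fuzzy (edited text), below 50 = no match (new text).
    Deleted text never reaches this function: it has no new segment to score.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score == 100:
        return "exact"
    if score >= 50:
        return "fuzzy"
    return "no match"


if __name__ == "__main__":
    for s in (100, 90, 50, 49, 0):
        print(f"{s:>3}% -> {classify_match(s)}")
```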
Fig. 1: Example of fuzzy matching analysis
Translation databases store segments, usually sentences, as source-target language pairs. A search engine runs over the newly released source text, analyzing it one segment at a time and comparing each segment against what is already in the database.
· If an identical segment is found, it is considered an exact (100%) match.
· If the search engine finds a similar but not identical segment, it assigns a fuzzy match percentage anywhere from 50% to 99%.
For instance, a ten-word sentence that differs by just one word from a sentence stored in the database results in a 90% fuzzy match. If it shares only five words with a stored sentence, the fuzzy match is 50%.
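The word-counting arithmetic in this example can be reproduced with a word-level edit distance. This is a sketch under a stated assumption: commercial TM search engines use their own (often character-based) similarity formulas, so `word_edit_distance`, `fuzzy_match`, and `best_match` are hypothetical helpers that only mimic the one-word-in-ten and five-in-ten cases described above. `best_match` also illustrates the segment-by-segment database lookup, with the TM reduced to a plain list of stored source sentences.

```python
from typing import List, Optional, Tuple


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance where the unit is a whole word, not a character."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(new: str, stored: str) -> float:
    """Similarity in percent: share of words that need no edit."""
    a, b = new.split(), stored.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - word_edit_distance(a, b)) / longest


def best_match(segment: str, tm_sources: List[str]) -> Tuple[float, Optional[str]]:
    """Compare one new segment against every stored source segment."""
    best_score, best_source = 0.0, None
    for stored in tm_sources:
        score = fuzzy_match(segment, stored)
        if score > best_score:
            best_score, best_source = score, stored
    return best_score, best_source


if __name__ == "__main__":
    stored = "Click the Save button to store all of your changes"
    print(fuzzy_match("Click the Apply button to store all of your changes", stored))
    print(fuzzy_match("Press a Reset key then store all of your changes", stored))
```

Run directly, this prints 90.0 (one word substituted out of ten) and 50.0 (five words substituted out of ten). Dividing by the longer sentence keeps the score at or below 100% when the new and stored segments differ in length.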
By calculating the fuzzy match of each sentence, one can estimate the translation effort needed to perform the full update in any target language.
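An analysis like the one in Fig. 1 typically groups segments by match band and totals the words in each. The sketch below assumes each new segment already has a word count and a best match score; the band boundaries in `BANDS` and the helper names are hypothetical, since the article does not specify them.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical match bands (label, lowest score included), highest first.
# TM tools usually let the user configure these; the article does not list them.
BANDS: List[Tuple[str, float]] = [
    ("100%", 100),
    ("95-99%", 95),
    ("85-94%", 85),
    ("75-84%", 75),
    ("50-74%", 50),
    ("No match", 0),
]


def band_for(score: float) -> str:
    """Return the label of the first band whose lower bound the score reaches."""
    for label, lowest in BANDS:
        if score >= lowest:
            return label
    raise ValueError(f"negative score: {score}")


def analyze(segments: Iterable[Tuple[int, float]]) -> Dict[str, int]:
    """Total the source words in each band; segments are (word_count, best_score)."""
    totals = {label: 0 for label, _ in BANDS}
    for word_count, score in segments:
        totals[band_for(score)] += word_count
    return totals


if __name__ == "__main__":
    report = analyze([(10, 100), (12, 97), (10, 90), (10, 60), (8, 30)])
    for label, words in report.items():
        print(f"{label:>9}: {words} words")
```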
At GlobalVision, we apply weights to strings to calculate the "equivalent" new word count to translate. For instance, a ten-word sentence with just one word changed since the last release counts as two new words to translate (20%). A ten-word sentence with four or more words changed counts as ten new words (100%). Intermediate weights apply in between.
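The weighting rule can be sketched as a function from match percentage to the fraction of words counted as new. Only two points come from the text (a 90% match counts as 20%, a match of 60% or less counts as 100%). The 0% weight for exact matches follows from the statement that they need no changes, while the flat 20% for 91-99% and the straight-line interpolation between 60% and 90% are assumptions standing in for the unpublished in-between values. `weight` and `equivalent_new_words` are hypothetical names.

```python
from typing import Iterable, Tuple


def weight(match_pct: float) -> float:
    """Fraction of a segment's words counted as new, given its match score.

    From the article (ten-word sentences): one word changed (90%) -> 0.2,
    four or more words changed (60% or less) -> 1.0. Exact matches need no
    changes, so they get 0.0. The remaining values are assumptions.
    """
    if match_pct >= 100:
        return 0.0
    if match_pct >= 90:
        return 0.2  # assumption: 91-99% weighted like the 90% case
    if match_pct <= 60:
        return 1.0  # full translation; also covers "no match" (< 50%)
    # Assumption: straight line from (60%, 1.0) down to (90%, 0.2).
    return 1.0 - (match_pct - 60) * (1.0 - 0.2) / (90 - 60)


def equivalent_new_words(segments: Iterable[Tuple[int, float]]) -> float:
    """Sum word_count * weight over (word_count, match_pct) pairs."""
    return sum(words * weight(pct) for words, pct in segments)


if __name__ == "__main__":
    # One changed word, four changed words, an exact repeat, and new text.
    print(equivalent_new_words([(10, 90), (10, 60), (12, 100), (8, 30)]))
```

For the four sample segments this gives 2 + 10 + 0 + 8, or 20 equivalent new words out of 40 source words.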
Changes to a sentence's internal tags (bold, italic, links, font or color changes, etc.) also force a fuzzy match. A weight is applied to these changes as well, since they also require translator intervention.
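A tag-only change can be detected by comparing a segment's text and its inline tags separately. This sketch assumes tags appear as HTML-style markup such as `<b>` (TM tools often represent them as placeholders instead), and `TAG_ONLY_WEIGHT` is a hypothetical value, since the article does not state the weight it applies. All function names are illustrative.

```python
import re
from typing import List, Optional, Tuple

# Assumption: inline formatting is marked up as HTML-style tags like <b> or <a href="...">.
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical weight for a segment whose words are unchanged but whose tags differ.
TAG_ONLY_WEIGHT = 0.1


def split_tags(segment: str) -> Tuple[str, List[str]]:
    """Separate a segment into its whitespace-normalized text and its tags, in order."""
    tags = TAG_RE.findall(segment)
    text = " ".join(TAG_RE.sub("", segment).split())
    return text, tags


def compare_segments(new: str, stored: str) -> str:
    """Return 'exact', 'tags_changed', or 'text_changed'."""
    new_text, new_tags = split_tags(new)
    old_text, old_tags = split_tags(stored)
    if new_text != old_text:
        return "text_changed"
    if new_tags != old_tags:
        return "tags_changed"  # same words, different formatting: still a fuzzy match
    return "exact"


def tag_change_words(new: str, stored: str) -> Optional[float]:
    """Weighted word count for tag-only changes; None means score the text change instead."""
    status = compare_segments(new, stored)
    if status == "exact":
        return 0.0
    if status == "tags_changed":
        return len(split_tags(new)[0].split()) * TAG_ONLY_WEIGHT
    return None


if __name__ == "__main__":
    print(compare_segments("Click <b>Save</b> now.", "Click Save now."))
```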
The translation database/search engine software performs the analysis and calculations using built-in algorithms that objectively approximate the number of new words to translate. The results are not 100% exact, but in the past ten years of using these algorithms, we have satisfied all our clients.
Applying an appropriate weight to each fuzzy match lets us estimate not only cost but also staffing and scheduling. This is why we complete 98% of our projects on schedule and on budget for our clients!
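Once the equivalent new word count is known, cost and staffing follow from simple arithmetic. All inputs here (per-word rate, daily translator throughput, deadline) are hypothetical parameters chosen for illustration, not GlobalVision figures, and `estimate_project` is likewise a hypothetical helper.

```python
import math
from typing import Dict


def estimate_project(equivalent_words: float, languages: int, rate_per_word: float,
                     words_per_translator_day: float, deadline_days: float) -> Dict[str, float]:
    """Rough cost and staffing from an equivalent new word count (per target language)."""
    cost = equivalent_words * languages * rate_per_word
    # Days one translator would need to finish one language alone.
    translator_days = equivalent_words / words_per_translator_day
    # Translators per language so that each language fits within the deadline.
    per_language = math.ceil(translator_days / deadline_days)
    return {
        "cost": cost,
        "translator_days_per_language": translator_days,
        "translators_per_language": per_language,
        "total_translators": per_language * languages,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 6,000 equivalent words, 5 languages, $0.20/word,
    # 2,000 words per translator-day, 2 working days available.
    print(estimate_project(6000, 5, 0.20, 2000, 2))
```

The sketch covers translation time only; review and project-management time would be added on top, and a production estimate would use a decimal type for currency rather than floats.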