
InfoMail, December-2007
GlobalVision International, Inc.


Translation database fuzzy matches and word count demystified

Most content publishers, software or other, perform updates on their product, manuals, website, and marketing literature yearly or on a more regular basis. They update their source language files and engage their localization vendor or staff to update all the supported target languages.

Changes to the source files take the shape of additions (new text), removal of obsolete text (deletions), or edits (modifications to existing text).

The new text requires new translations for each target language. The deleted text is disregarded. The edited or modified text will require updating in all target languages.

When a top-down localization process is applied and a translation database (translation memory or TM) is in use, the search engine compares each segment of the updated source (a complete phrase or sentence) against the database. Each segment falls into one of four categories:

  1. No match or new text: Generates little or no match in the database and requires full translation.
  2. Repeat or unchanged text: Generates a 100% match from the database, not requiring any changes.
  3. Edited or modified text: Results in a "fuzzy" match. This is a segment in the database that matches anywhere from 50% to 99% of the new source segment. Anything under 50% is considered no match.
  4. Deleted text: Produces no impact on the translation update effort, since the text no longer exists.
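
The four categories above can be expressed as a small lookup on the match percentage. The following is only a minimal sketch: the function name `classify_match` and the category labels are made up for illustration, and the 50% cutoff is the one quoted in this article.

```python
def classify_match(match_percent: float) -> str:
    """Map a TM match percentage to one of the categories in the list above.

    Deleted text never reaches this function: it is no longer in the
    source, so there is nothing to analyze.
    """
    if not 0 <= match_percent <= 100:
        raise ValueError("match_percent must be between 0 and 100")
    if match_percent == 100:
        return "repeat"      # unchanged text, no translation work
    if match_percent >= 50:
        return "fuzzy"       # edited text, translation needs updating
    return "no match"        # treated as new text, full translation


for pct in (100, 90, 50, 49, 0):
    print(pct, classify_match(pct))
```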
 

Fig. 1: Example of fuzzy matching analysis
 

Translation databases store segments or sentences as language pairs. A search engine runs on the newly released source text, analyzing it one segment at a time and comparing each segment against what is already in the database.

·   If a 100% match is found, then it is considered an exact match.
·   If the search engine finds a similar but not an exact match, it allocates a fuzzy match percentage to it, anywhere from 50% to 99%.

For instance, a sentence with ten words having just one word difference from a sentence stored in the database will result in a 90% fuzzy match. If it has only five words in common with another sentence, then the fuzzy match is 50%.
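
The word-based arithmetic in this example can be sketched in a few lines. This is a simplified illustration, not the algorithm built into any particular TM software: it assumes whitespace-separated words, ignores case and tags, and uses Python's standard `difflib` to count the words two sentences share in order. The function name and sample sentences are invented for the example.

```python
from difflib import SequenceMatcher


def fuzzy_match_percent(new_segment: str, stored_segment: str) -> float:
    """Percentage of words shared, in order, between two segments."""
    new_words = new_segment.lower().split()
    stored_words = stored_segment.lower().split()
    longest = max(len(new_words), len(stored_words))
    if longest == 0:
        return 0.0
    matcher = SequenceMatcher(None, new_words, stored_words, autojunk=False)
    # Sum the sizes of all matching word runs (the final dummy block has size 0).
    shared = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * shared / longest


stored = "Click the Save button to store all of your changes"

# One word out of ten differs.
print(fuzzy_match_percent(
    "Click the Save button to store all of your settings", stored))  # 90.0

# Only five of ten words are shared.
print(fuzzy_match_percent(
    "Click the Save button to export every report as PDF", stored))  # 50.0
```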

By calculating the fuzzy match of each sentence, one can approximate the effort of translation needed to perform the full update in any target language.

At GlobalVision, we apply weights to strings to calculate the "equivalent" new word count to translate. For instance, the sentence with ten words having just one word changed since the last release is calculated as two new words to translate (20%). A sentence of ten words with four or more words changed is calculated as ten new words (100%). Other percentages are applied in between.
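
As a rough sketch of how such weighting might be computed: only the 0%, 20%, and 100% weights below come from this article. The intermediate bands (0.5 and 0.8) and the choice to apply 20% to every match from 90% to 99% are placeholder assumptions for illustration, not GlobalVision's actual figures. The names `WEIGHT_BANDS`, `weight_for_match`, and `equivalent_new_words` are hypothetical.

```python
# (minimum match %, weight applied to the segment's word count)
WEIGHT_BANDS = [
    (100.0, 0.0),  # from the article: exact matches need no changes
    (90.0, 0.2),   # from the article: one word in ten changed counts as 20%
    (80.0, 0.5),   # ASSUMED placeholder for "percentages in between"
    (70.0, 0.8),   # ASSUMED placeholder for "percentages in between"
]


def weight_for_match(match_percent: float) -> float:
    """Return the fraction of a segment's words counted as new."""
    for threshold, weight in WEIGHT_BANDS:
        if match_percent >= threshold:
            return weight
    # From the article: four or more words changed in ten counts as 100%.
    return 1.0


def equivalent_new_words(segments) -> float:
    """Sum weighted word counts over (word_count, match_percent) pairs."""
    return sum(words * weight_for_match(pct) for words, pct in segments)


# Three ten-word sentences: one word changed, four words changed, unchanged.
print(equivalent_new_words([(10, 90.0), (10, 60.0), (10, 100.0)]))  # 12.0
```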

Changes to a sentence's internal tags (bold, italic, links, font or color changes, etc.) will also force a fuzzy match. A weight is applied to these changes as well, as they also require translator intervention.

The analysis and calculations are done by the translation database/search engine software, using built-in algorithms that objectively approximate the number of new words to translate. The results are not 100% exact, but in the past ten years of using these algorithms, we have satisfied all our clients.

Applying an appropriate weight to each fuzzy match lets us estimate not only the cost, but also the staffing and schedule. This is why we can accomplish 98% of our projects on schedule and on budget for our clients!


Copyright © 2008 GlobalVision International, Inc. - All Rights Reserved