Fuzzy Match or Fuzzy Math?

Recently, a blog post was published about Translation Memory matching. In it, the authors explained what Translation Memory (TM) is and how fuzzy matches are derived. But a couple of things in their post did not add up, prompting me to write about them.

The following is the example that they gave:

Assuming a source segment:
The lazy brown fox jumped over the quick brown dog.

And a previously translated segment in the TM:
The lazy brown dog jumped over the quick brown fox.

Comparing the two, there are 10 words in the new source segment, of which 2 differ from the segment in the TM: the word dog became fox and the word fox became dog.

While explaining how Translation Memory fuzzy match engines work, the authors indicated that the source segment contains 50 characters and that the fuzzy match is calculated at 92%.

From the given information, it seems that their fuzzy match engine uses the Levenshtein distance method to calculate the penalty. There are 4 character substitutions out of 50 total (f, x, d and g in fox and dog), so 46 of 50 characters remain the same, which equals a 92% match. Sounds good?
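To check that arithmetic, here is a minimal Python sketch of a character-level Levenshtein comparison. It is only my reconstruction of what such an engine might do, not the authors' actual tool. The function name `levenshtein` is my own, and I am assuming the final period is left out of the count, since that is how the segment comes to 50 characters.

```python
def levenshtein(a, b):
    """Edit distance between two sequences; insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Assumption: the trailing period is not counted, which gives 50 characters.
new_segment = "The lazy brown fox jumped over the quick brown dog"
tm_segment = "The lazy brown dog jumped over the quick brown fox"

char_distance = levenshtein(new_segment, tm_segment)
char_match = (len(new_segment) - char_distance) / len(new_segment)

print(char_distance, len(new_segment), f"{char_match:.0%}")  # 4 50 92%
```

So the 92% figure is reproducible, as long as you accept that the unit being counted is the character.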

But wait a second, since when do translators translate characters, one at a time? Sounds like they are comparing apples to oranges, no?

Although the Levenshtein distance is a good method to identify possible fuzzy matches from a large pool of already translated segments in the TM, it is by no means an accurate method to calculate the actual fuzzy match percentage. That percentage is meant to measure the work ahead for the translator, who has to change the fuzzy match segment so that it accurately represents the meaning of the new source segment. Two changed words out of 10 sounds a lot more like an 80% fuzzy match to me. But then again, translators do not translate one word at a time either. Still, 80% is a lot more accurate than 92%.
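The word-level figure can be sketched the same way, by running the edit distance over words instead of characters. Again, this is an illustration under my own assumptions and not any vendor's formula: the helper names `edit_distance` and `words` are hypothetical, and the tokenization (drop the final period, split on spaces) is the simplest one I could pick.

```python
def edit_distance(a, b):
    """Levenshtein distance over any two sequences (here: lists of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def words(segment):
    """Naive tokenizer (an assumption): strip the final period, split on whitespace."""
    return segment.rstrip(".").split()


new_words = words("The lazy brown fox jumped over the quick brown dog.")
tm_words = words("The lazy brown dog jumped over the quick brown fox.")

word_distance = edit_distance(new_words, tm_words)
word_match = (len(new_words) - word_distance) / len(new_words)

print(word_distance, len(new_words), f"{word_match:.0%}")  # 2 10 80%
```

Same two segments, same algorithm, and the match drops from 92% to 80% just by changing the unit of comparison.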

Why is that important, you may ask? Well, for many reasons, including proper planning, scheduling and budgeting (read word count demystified). So, if you are a translator working for a vendor, make sure you question their fuzzy math calculations when they issue you the translation purchase order.

Another important point was the explanation given for why vendors charge their clients to accept 100% matches. The explanation makes sense. But since TM tools enable the acceptance of 100% matches with “minimal efforts”, if you are the client, what do you think is a reasonable overhead cost for you to pay in this case? Look up what you are being billed for 100% matches, and if you think it is more than a “minimal cost”, ask them why that is so.

Chances are you won’t like their numbers or their answers, so don’t hesitate to join our webinar “Translation Localization process”.



One Comment

  1. Website translator
    Posted December 16, 2011 at 2:52 pm

    Interesting post. But I would like to highlight that fuzzy matches may not even correspond to the same words, if the characters are the same or similar. I have found cases where I had an 80% match and none of the words matched!

    Worst of all, even 100% matches are misleading, because they match the original text in the source language, but this may have different translations in the target language. A typical example is gender – a sentence like “Place it on the table” will have a different translation in Spanish depending on whether “it” has a male or female gender in Spanish.
