I sometimes wonder whether SDL Trados programmers even understand the concept of fuzzy matching, or, if they do, whether they care or have pride in their job - only incompetent programmers would create or use a fuzzy matching algorithm that leads to ludicrous results such as these:
That's right: according to Trados, the segment "Ownership of the Services and Marks." is a 65% match for "Description of the Service and Definitions."
After all, "of the" and "and" are exactly the same in both sentences.
Mmm... Interesting.
It's really fascinating to get an insight into how this fuzzy algorithm works. Thanks!
I wonder how statistically significant this is.
I mean, how many hundreds (or thousands) of valid fuzzy matches do you get before you come across a funny one like this?
Daniel
I never use a matching rate lower than 75%. Anything below that is useless, and the results are as shocking and disappointing as the ones you show here.
Hi
You're right, I've noticed the same thing before and reported it to SDL Trados on their ideas page. This issue gets even worse in languages that have compound words like Arbeitsmaßnahmengesetz, such as German and, to a lesser degree, Dutch. In longer, more complicated documents you end up typing those long words again and again, because Trados is unable to 'weight' word value, so to speak.
I would like to know whether other CAT programs work differently, though. That would be a compelling reason to switch to another brand.
Regards, Marinus
This is a 57% fuzzy match that appeared once in my Trados 6.5:
It is a continuous process.
Es un desastre completo.
Cheers!
The 65% match score is easy to explain: 3 of the 6 words are identical and 1 word is very similar, so 4/6 ≈ 0.66, or 66%, minus 1% for the additional character in "Services".
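The word-counting reasoning in the comment above can be sketched in a few lines of Python. This is only an illustration of that arithmetic, not SDL's actual algorithm, which nobody in this thread has seen. The function name `word_match_score`, the position-by-position comparison, and the 0.8 "very similar" threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher
from itertools import zip_longest


def word_match_score(source: str, candidate: str, near: float = 0.8) -> float:
    """Hypothetical word-level score: full credit for identical words,
    partial credit for near-identical words, nothing for the rest."""
    a = re.findall(r"\w+", source.lower())
    b = re.findall(r"\w+", candidate.lower())
    if not a and not b:
        return 1.0
    credit = 0.0
    # Assumption: words are compared position by position, with no realignment.
    for x, y in zip_longest(a, b, fillvalue=""):
        if x == y:
            credit += 1.0
        else:
            ratio = SequenceMatcher(None, x, y).ratio()
            if ratio >= near:  # e.g. "services" vs "service"
                credit += ratio
    return credit / max(len(a), len(b))


score = word_match_score(
    "Ownership of the Services and Marks.",
    "Description of the Service and Definitions.",
)
print(f"{score:.1%}")  # prints 65.6%
```

Under these assumptions the two segments land in the mid-60s, which is at least consistent with the explanation above: three function words plus one near-identical noun carry almost the whole score, regardless of how little the segments have in common otherwise.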
Trados programmers really don't care about their fuzzy matching algorithms (anymore) because SDL does not support research on this feature. Neither Trados nor its competitors care about linguistically reasonable fuzzy matching, because such matching cannot be language-independent, and Trados and the others must be language-independent for commercial reasons.
Trados even performs fuzzy matching inside words. So you can discover more amazing fuzzy matches in Trados if you create a segment of, say, 5 words and try to match it with another 5-word segment in which 4 words are identical and the 5th differs by only one letter. The more identical letters the differing word contains, the higher the match score.
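To see why matching inside words behaves this way, here is a minimal sketch using plain character-level edit distance. It is an assumption that Trados does anything like this internally; the helpers `levenshtein` and `char_similarity` and the sample segments are made up for the illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Hypothetical score: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Four identical words; the fifth differs by exactly one letter.
short = char_similarity("see the note on house", "see the note on mouse")
long_ = char_similarity(
    "see the note on internationalisation",
    "see the note on internationalization",
)
print(f"{short:.3f} {long_:.3f}")  # prints 0.952 0.972
```

In both pairs exactly one letter differs, but the pair with the longer word scores higher simply because that one letter is a smaller share of the segment.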
I guess you guys don't rely on fuzzy matches. I wonder how it works in Asian languages...
Shouldn't "marchi" be "brands" up there, bro?
Not necessarily. In fact, depending on the context, probably not (e.g., if "marks" meant "trademarks").
You can see this is an old thread by the fact that Trados (now Studio) implemented a way around this problem years ago with the AutoSuggest option.
"Fuzzy Matches" still suck - in fact, they've become worse. Any "fuzzy match" below, say, 75% is now entirely useless. But at least it has become a lot easier to avoid retyping long words now.
The reply from SDL and other CAT builders to the suggestion of giving weight to word length was that not every language has long words (e.g. Chinese), but that seems like BS. Someone probably owns the patent.
Well, you can also see that it is an old thread by the fact that the original post is dated August 2008. And, yes, fuzzy matches suck more than ever in the (in other respects, much improved) SDL Trados Studio.