As with software, if the data fed into a computer program (read: the source text) is bad, the program's results (read: the translation) will be bad too. I thought of this when reading Ultan's recent post "Information Quality, MT and UX" on Multilingual Computing's Blogos blog. Ultan notes that quality information not only makes machine translation easier; it is simply better information, more easily understood by humans and machines alike.
So what is quality information? To my mind, quality information is consistent, concise, well-written text with an audience-appropriate level of technical terminology. In this context, well-written means grammatically correct, clearly structured, and free of spelling and punctuation errors. The amount and complexity of subject-specific terminology a text can bear clearly depends on its end users: installation instructions for consumers need to be practically jargon-free (and explain any unavoidable terms), while specifications for computer programmers can contain quite a few acronyms and still be easily understood.
While this last statement is generally true, I have had to deal with source text that was replete with abbreviations specific to a particular company, without having access to an internal list of those acronyms (if such a list even existed). Since the assignment was the usual rush job via a translation agency in another time zone, there was no way to request and receive such a list in a timely manner. I did my best to guess the meaning of many of the abbreviations from context and annotated the rest with translator's notes.
I was initially surprised at how frequently source text -- even fairly lengthy white papers and similar documents -- appears not to have been proofread, let alone copy-edited. Having recently read a couple of books on technical and business matters, I am surprised no longer. Even books printed and sold in bookstores don't seem to undergo much of a quality-assurance process any more. A case in point is Tamar Weinberg's "The New Community Rules: Marketing on the Social Web", which I am reviewing for an upcoming issue of the Society for Technical Communication's magazine Intercom. It contains quite a few instances where sentences seem to have been hurriedly revised, with fragments of the sentence's previous incarnation left behind or too much taken out. So if books aren't proofread any more, what can we expect from internal industry papers or instructions?
Worse, such poorly written source text not only hampers the flow of reading; it often adds ambiguity as well. After all, if two conjunctions appear where only one should be present, which of the two did the author intend? Never having seen the machine whose instructions I am translating, how would I know whether the correct conjunction was "and" or "or"? And if I pick the wrong one, the translation could be completely misleading.
Yes, we do need quality assurance for translations. But we also need quality assurance for the source text -- not only for the translator's sake, but also for the reader's sake. As programmers are fond of saying: Garbage In, Garbage Out -- GIGO.