
 

  

Dissertation Contents.

 

1. Why the Assistant Programme project was conceived.

 

1.2 Personal interest in this area of study.

 

2. Description of the Assistant Programmes.

2.1 Programming languages and external software used.

 

2.2 The Assistant Programme layout.

 

2.3 Selecting phrases for the email.

 

2.4 Tackling questions of style in textual expression.

 

2.5 Search for phrases not found in the General Email Subjects section.

 

2.6 Dialogue emails.

 

2.7 Assistant Trainer: for more advanced students.

 

2.8 Building the Corpora for the Assistant Programmes.

 

3. State of the Art

3.1 Machine Translation systems and how they work.

 

3.2 Practical applications of the MT processes.

 

3.3 Description of a translation software package.

 

4. Assistant Programme effectiveness

4.1 Translation effectiveness.

 

4.2 Teaching potential.

 

4.3 The methodological approach of the Assistant Programmes as a self-study tool.

 

5. Demand and practical implementation.

5.1 Comments on the ELAN report: “Effects on the European Economy of Shortages of Foreign Language Skills in Enterprise (2006)”.

 

Bibliography

 

***********

 

 

 

 

The philosophy behind the creation and development of the Assistant Programme project.

Dissertation.

 

3. State of the Art

 

3.1 Machine Translation systems and how they work.

 

Having explained the workings of the Assistant Programmes, and before embarking on a hands-on test and comparison of the Assistant Programmes and machine translators, it seems appropriate to discuss the nature of present-day machine translation.

 

We can identify two very different procedures in machine translation (MT). The first is rule-based machine translation (RBMT) which, as the name implies, provides a text translation based on the grammatical, morphological and syntactical rules of the languages in question. This approach was in vogue during the 1980s but gave way in the 1990s to a more holistic approach to MT which was data-driven or corpus-based. Two branches of this latter approach are termed statistical machine translation (SMT) and example-based machine translation (EBMT). The engines behind these two systems rely on large corpora of bilingual texts, that is, source texts together with translations previously carried out by human translators.

 

SMT arrived on the MT scene in 1988, developed by the IBM group after the realisation that translation systems needed to refer to real texts rather than just the rules of language, which in all living languages are plagued with exceptions and nuances. SMT employs a process which aligns sentences, phrases or “clumps” and individual words between the source language (SL) and target language (TL). From this process, a frequency of word sequences is derived: SL → TL. The resulting translation is obtained by selecting the most likely TL word or words for the SL input. Inevitably, the larger the corpus of bilingual text, the more effective the resulting translation. The differences between SMT and EBMT are not wholly clear. According to Hutchins, EBMT “uses segments (word sequences (strings) and not individual words) of source language (SL) texts extracted from a text corpus (its example database) to build texts in a target language (TL) with the same meaning.” But Hutchins later adds: “EBMT model is less clearly defined than the SMT model [...] there is a multiplicity of techniques, many derived from other approaches, including methods used in RBMT systems, methods.” (Hutchins, Machine Translation - Two Main Types, 2007).
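To make the SMT process described above more concrete, the following is a minimal sketch in Python. It is not taken from any of the systems discussed in this chapter. It assumes a toy corpus of three Spanish-English sentence pairs invented purely for illustration, and it uses a simplified estimation loop in the style of the IBM group's word-alignment models (commonly known as IBM Model 1). The function names train_word_model and most_likely are hypothetical.

```python
from collections import defaultdict


def train_word_model(pairs, iterations=10):
    """Estimate P(tl_word | sl_word) from sentence-aligned pairs.

    Simplified sketch in the style of IBM Model 1: word alignments are
    not given, so they are inferred by repeatedly re-estimating
    fractional co-occurrence counts.
    """
    # Start with equal weight for every SL/TL word pair that co-occurs
    # in at least one aligned sentence pair.
    prob = {}
    for sl_sentence, tl_sentence in pairs:
        for sl_word in sl_sentence:
            for tl_word in tl_sentence:
                prob[(sl_word, tl_word)] = 1.0

    for _ in range(iterations):
        count = defaultdict(float)  # fractional count per (SL, TL) pair
        total = defaultdict(float)  # fractional count per SL word
        for sl_sentence, tl_sentence in pairs:
            for tl_word in tl_sentence:
                # Share each TL word among the SL words of its sentence,
                # in proportion to the current probability estimates.
                norm = sum(prob[(sl_word, tl_word)] for sl_word in sl_sentence)
                for sl_word in sl_sentence:
                    share = prob[(sl_word, tl_word)] / norm
                    count[(sl_word, tl_word)] += share
                    total[sl_word] += share
        # Re-estimate the probabilities from the fractional counts.
        prob = {pair: c / total[pair[0]] for pair, c in count.items()}
    return prob


def most_likely(prob, sl_word):
    """Return the TL word with the highest estimated probability."""
    candidates = {tl: p for (sl, tl), p in prob.items() if sl == sl_word}
    return max(candidates, key=candidates.get)


# Toy sentence-aligned corpus (invented for illustration only).
corpus = [
    (["la", "casa"], ["the", "house"]),
    (["la", "mesa"], ["the", "table"]),
    (["una", "mesa"], ["a", "table"]),
]

model = train_word_model(corpus)
for word in ["la", "casa", "mesa", "una"]:
    print(word, "->", most_likely(model, word))
```

Even with three sentence pairs, the estimates settle on the expected pairings, because “la” and “mesa” each appear in two different contexts. Real systems apply the same idea to very large corpora and to phrases as well as single words, which is why corpus size matters so much to the result.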

 

On this point, we can bring in an article published in the New Scientist. The article, “Software learns to translate by reading up”, discusses the SMT process and how this approach allows a machine not only to translate more accurately but also to learn more about language structure by comparing existing human translations. Interestingly, the article also points out the need to rely on rules, as in RBMT. SMT thus borrows from RBMT, the difference being that it is the machine itself which creates these rules from the texts it analyses.

 

“The key to their [Kevin Knight and Daniel Marcu's] ‘statistical machine translation software’ are the translation dictionaries, patterns and rules - translation parameters - that the program develops. It does this by creating many different possible translation parameters based on previously translated documents and then ranking them probabilistically.” (Knight, 2005).

 

MT systems which veer away from the definitions described above have not yet appeared. According to Hutchins, “the dust has settled” on machine translation techniques. This does not mean, however, that there is no room for improvement in MT output. Translation quality is limited chiefly by the size of the corpora under analysis, but with the Internet amassing huge amounts of translated text, vast corpora are increasingly available. The main obstacle, then, is the power of the machine running the translation process. As Kevin Knight puts it: "The secret to machine translation is computer power, [...] It takes really big and fast computers."


 

3.2 Practical applications of the MT processes.

 

In this section I shall discuss the practical applications of machine translation as set out by the MT developer and analyst John Hutchins, and then give my own assessment of that appraisal, bringing in lay users' opinions of MT obtained from Internet forums. Hutchins has defined three basic types of MT application: assimilation, interchange and dissemination (Hutchins, Commercial Systems, 2005).

 

Assimilation:

The vast majority of non-professional MT use probably lies in this area. Hutchins suggests there is a great need to translate text, especially text received or viewed electronically, into the user's own language purely for comprehension purposes. Hutchins terms this “gisting”. The fact that the end result is not accurate is no disadvantage, says Hutchins: the user only requires the overall message of an email, report or website description and, if necessary, takes action based on that message. Large search engines such as Google now offer translation from an ample selection of languages so that users can understand the gist of websites written in foreign languages. Online services (often free) offer text boxes where whole paragraphs or even pages of foreign text can be pasted and translated into the required language at the click of a button. One such service is Babelfish from Alta Vista. The author of the Spanish.about.com website lists and describes a few others available free on the Internet:

 

Intertran

Translation to and from about two dozen languages. Distinguishes between U.S. and British English, and between Spanish and Latin American Spanish.

 

Dictionary.com

Translate text or Web pages to or from English and other major European languages.

 

PROMT Online Translator

Online translation of Web sites, e-mail, mobile phone content and straight text.

 

PROMT - Reverso Online

Paste in the text and get an instant translation. Languages include English, Spanish, German, Russian.

 

Interchange:

The next application category of MT described by Hutchins is the exchange of information between online users. Hutchins believes there is an increasing demand for this aspect of translation: “The need is for immediate translation in order to convey the basic content of messages, however poor the input. MT systems are finding a natural role here, since they can operate virtually or in fact in real-time and on-line and there is little objection to the inevitable poor quality.” (Hutchins, 2005). He goes on to say that there is no place here for the services of the professional translator, as time is the predominant factor. I understand Hutchins to be envisaging the following situation: an English-speaking business user with little or no knowledge of Spanish receives an email in Spanish from a prospective customer, submits the text to a machine translator and understands the gist of the message. The same user then writes a reply in English and submits that text to the MT tool. The resulting Spanish text is then emailed back to the sender.

 

Dissemination:

This, according to Hutchins, is where texts are translated by MT but, since MT can never offer an accurate rendering where precision is required, the output needs revision by human translators. Over the last few years many translators' tools, called Translation Memories (TMs), have been developed to work with MT, so that an MT-translated text can be adapted and corrected to produce an accurate version. Nowadays, large-scale programmes called Translator's Workstations combine TMs with other helpful tools such as word processors, optical character recognition (OCR) scanning and terminology management software, which can be used in conjunction with the TM or as stand-alone tools. Hutchins describes this application and its importance in large companies thus: “A pre-requisite for successful MT installation in large companies is that the user expects a large volume of translation within a definable domain (subjects, products, etc.), and that the user has available (or has the resources required to acquire or to create) a terminological database for the particular application.” (Hutchins, 2005).


 

3.3 Description of a translation software package.

 

The following is a review from a commercial site (Top Ten Reviews), i.e. one that sells translation software and receives a commission per sale. However, the review is still of interest, as it graded ten software packages and gave “Power Translator Pro” from the company LEC the highest mark for translation performance of the ten it sells. The review is presumably intended to promote the virtues of the package, yet I believe that it in fact achieves the opposite.

 

The most interesting section of the review is the report on translation effectiveness. Reading it leaves one thinking there is a contradiction in terms in awarding a perfect 10 to software packages which “are never perfect”:


“Effectiveness:

Power Translator Professional received a perfect score in translation effectiveness. Though machine translations are never perfect, LEC's translation engine is the best we've seen. We tested the software in multiple ways using all of the modules. We used the LEC File Trans tool to get the following results...”

 

The flaws in this software (“the best we've seen”) are highlighted by the test translation offered as evidence. I could not find the Spanish original from which this was apparently translated. Nevertheless, the resulting text leaves any native English speaker perplexed as to how this software could score a perfect 10!

 

“Translation Test Results:

“You think that you can go some whole vacations without listening the made famous question in the entire world for the children: Is it “us there still?” It is possible with the modern cruises and itineraries boy - friendly. These centers vacacionales roving make traveling to the destination half the amusement.

 

Disney threw away the first cruise - of family friendly in the half-filled ones of - 1990s; this effort motivated a maremoto of the cruises that the whole family could enjoy. Today, family - the friendly cruises are experiencing the biggest growth in the cruise industry. Not there is a cruise that is developed that it lacks lodgings for children of all ages. So find their marine foot, jump I approach and take you their family for a life in some vacations that you/they will remember.”

 

Admittedly, the gist is recognizable in most of the text. However, there are obvious problems with syntax, e.g. “Not there is a cruise...”; with lexical and register choice, e.g. “lodgings for children” and “roving”; with the translation of idiomatic expressions: “find their marine foot” (sea legs?) and “half the amusement” (half the fun?); with mistranslation: “Disney threw away the first cruise”; and with possessive adjective recognition: “their marine foot” and “their family” should presumably be “your marine foot” and “your family”, given the leading imperative “find”. Interestingly, at the end of the sentence, the programme recognizes the possible ambiguity and offers both pronouns, “you/they”, for the reader to choose from.

 

The reviewers' conclusions are optimistic, to say the least:

“If you are a professional, business owner or webmaster and frequently need to translate various files, Power Translator 11 Professional is the best solution.”

Nevertheless, the programme does have a use in helping the reader to understand texts, although even this might prove difficult at times. Clearly, the software could not carry out complex translation tasks requiring accuracy, such as composing precise emails for practical business dealings between companies.

 

 (This article is copyright Michael Bilbrough 2008. All rights reserved.)

 

