Translation Technologies
Written by MICHAEL BURNETT
TALKING THE TALK TO FIGHT THE FIGHT.
Special forces have difficult and dangerous enough jobs as it is, but add language barriers in Iraq and Afghanistan as an obstacle to the mission and even innocent misunderstandings could become deadly. And so USSOCOM has been working with VoxTec International, Annapolis, Md., to refine a one-way translation device that enables special operators to communicate in six languages of the region: modern standard Arabic, Iraqi, Pashto, Urdu, Dari and Kurmanji. VoxTec deployed the third generation of this device, its Phraselator P2, to warfighters in 2004. Software upgrades last year gave users of the P2 the ability to create favorites by dragging and dropping phrases from different preset categories into custom categories. Users also could add new translations directly onto the device through a trusted translator, and the upgrade improved verbal searches of the phrases as well.
But now the company has a new version that has incorporated feedback from the field. John Hall, a former SEAL and president of VoxTec, has been listening to SOF and has worked with Ace Sarich, also a former SEAL and inventor of the Phraselator, to make desired improvements a reality.
“That feedback was to have a smaller version than the P2, to have the voice recognition and the usability but to have it in a hands-free, eyes-free mode, and to make it a wearable device,” Hall told Special Operations Technology.
VoxTec demonstrated the new product, known as the Squad Integrated Device or SQUID, to USSOCOM in January. It’s a small device, measuring 6 inches by 3 inches by 1.5 inches. It weighs only 13 ounces and is completely covered in rubber for durability. It fits in a sheath on a standard-issue vest and comes with a headset for hands-free operation.
The SQUID device sleeps until needed. A warfighter calls out the word “translator,” the SQUID beeps to acknowledge that it has woken up, and then it begins to recognize and translate phrases from whatever is spoken next.
Warfighters can use the Phraselator devices in three different modes. A soldier can speak into the device, which recognizes the speech. Or they can use it like a PDA with a stylus and scroll through the phrases. Finally, they can scroll through predefined lists or use a toggle button with one hand.
“From a functional standpoint, it has been successful because it is one of those new technologies that a lot of people use differently,” Hall commented. “Some people like the voice recognition; some people like using it as a PDA; and some people like the headset. So they have all of those options right now.”
The P2 carries up to 1,000 phrases in each module, which is generally fitted to a mission set. One device can hold dozens of modules.
“We are not trying to replace translators or trained linguists,” Hall emphasized. “We are trying to be a supplement to the linguist out in the field. We are trying, for MARSOC [Marine Special Operations Command] and others, to work with their linguists to customize their Phraselators and SQUIDs to how their translators want them to be used. It becomes a force multiplier.”
GALE YEAR TWO
USSOCOM is not alone in investing in translation technologies. Indeed, the Defense Advanced Research Projects Agency (DARPA) is redefining what is possible in automated translation technology with several programs in its portfolio right now. Three companies—BBN Technologies, IBM and SRI International—have been working on DARPA’s Global Autonomous Language Exploitation (GALE) program.
Last September, the three contractors met their first-year requirements and proceeded into the second year of GALE. When they completed their first-year objectives, the companies found that their second-year objectives had become more challenging than originally anticipated, Prem Natarajan, head of the speech solutions group at BBN Technologies, told SOTECH.
“The targets for year two have been elevated beyond what was originally conceived, so year two is an even bigger challenge than we thought it would be when the program started,” Natarajan explained.
The contractors received a new objective called distillation, which is the ability to sort through foreign language content and identify information predefined as valuable or of interest to the user. GALE software picks up and analyzes speech and text in multiple languages. It then provides a summary or highlights of that information to its users automatically.
“You have tons of information and you are not interested in a simple translation,” Natarajan noted. “You want something to find things that are of importance, things that are relevant and things that have value. You want to sort through mounds of data and find things that have some intrinsic value or some information content in them.”
The first year of GALE alone presented some challenges to the contracting teams because of the very nature of the work. They were confronted by the question of how to measure success, Natarajan revealed.
“Measuring machine translation performance has itself been an area of research,” he elaborated. “How do you actually say this machine translation system is X and put a number to it and say the next one is Y and put a number to it? In the case of speech recognition, that took many years of research. But you can see how many words were substituted for other words, how many words were deleted, and how many words that weren’t spoken but inserted? The sum of these three [is] called the word error rate.”
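The speech recognition measure Natarajan describes can be sketched in a few lines of code. The example below is an illustration only, not code from BBN or the GALE program: it aligns a recognizer's output against a reference transcript, counts substitutions, deletions and insertions, and then divides their sum by the number of words in the reference, which is how the word error rate is conventionally reported. The function names and sample phrases are assumptions made up for this sketch.

```python
def word_error_counts(reference, hypothesis):
    """Return (total, substitutions, deletions, insertions) for the
    cheapest word-level alignment of hypothesis against reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # cost[i][j] holds the counts for aligning ref[:i] with hyp[:j].
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)      # every reference word deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)      # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)          # words match
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = cost[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = cost[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Tuples compare on total errors first, so min() keeps
            # an alignment with the fewest errors.
            cost[i][j] = min(diagonal, deletion, insertion)
    return cost[len(ref)][len(hyp)]


def word_error_rate(reference, hypothesis):
    """Word error rate: (S + D + I) divided by the reference length."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must contain at least one word")
    total, _, _, _ = word_error_counts(reference, hypothesis)
    return total / ref_len


if __name__ == "__main__":
    # Hypothetical transcript and recognizer output.
    print(word_error_counts("open the gate now", "open a gate right now"))
    print(word_error_rate("open the gate now", "open a gate right now"))
```

In the sample, "the" is recognized as "a" (one substitution) and "right" is added (one insertion), so two errors against a four-word reference give a rate of 0.5. Because insertions count as errors, the rate can exceed 1.0 when the recognizer outputs many extra words.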
But machine translation is not quite as easy to measure when going from one language to another, he added. Objective listeners could hear information in an original language and then hear a translation. They could then agree or disagree on whether the translation is accurate or whether some paraphrasing of the original language was acceptable.
Being able to determine that a software algorithm resulted in an acceptable translation becomes critically important before you move onto a new cycle of experimentation in machine translation, Natarajan declared.
“So it’s very important to define a metric,” he said. “The metric for machine translation, the program manager for GALE at DARPA came up with the notion of a translation error rate. BBN worked with him to define and implement that metric. It was a new metric, so there was some getting used to it.”
Determining a translation error rate through an automatic measure was considered, but then the DARPA contractors adopted a human translation error rate, which involves human beings rating the translation rather than computers. This has resulted in a more accurate, reliable performance measure for the GALE software, Natarajan concluded.
GALE MILESTONES
IBM has an in-house project called the TIDES Automatic Language Exploitation System (TALES), which it develops under GALE funding. With its participation in DARPA-funded speech research initiatives, IBM is attempting to do more than meet the needs of the U.S. military, David Nahamoo, chief technology officer of speech technology for IBM Research, told SOTECH.
“We also have a business-oriented objective, which is to take these technologies and deploy them not only for military applications but also for broader uses by working with device manufacturers and partners interested in the creation of, for example, a tourism travel offering,” Nahamoo explained.
GALE focuses on broad-domain translation of both text and speech content from broadcast or conversational news into useful information, Nahamoo said. As such, GALE is a bit passive, as it monitors foreign-language content being broadcast over television and provides an English-language translation. DARPA has set a new series of evaluations for July, where the agency hopes to achieve two general improvements.
“One, improve the quality of the speech translation by a much larger amount than we had at the end of the first evaluation,” Nahamoo reported. “Two, finish the distillation part, which is the ability to ask a question about the content and extract that information. We must also improve that one by a good percentage. Both of them have pretty aggressive goals that have been set for the community to accomplish.”
GALE has proven to be quite challenging because the language coming through the GALE software could cover any conceivable topic being discussed on news programs, Nahamoo observed.
“The challenge in the real-time translation is the condition of the use and the urgency of dealing with the situation that could affect the quality of the speech and the type of the speech,” he said.
DISTILLATION IN GALE
SRI International was the last of the three GALE contractors to enter the program. However, the company made up for lost time and covered a lot of ground very quickly, producing GALE software whose performance is statistically indistinguishable from that of its competitors, Jordan Cohen, senior scientist at the SRI Speech Technology and Research Lab, told SOTECH.
“We started behind the curve. The other two contractors had operating systems when they started out, and we did not,” he revealed. “We spent the first year getting everything to work. So we were pretty happy with our results. Now people are working very hard to make it better for the second year.”
SRI has seen impressive results with its efforts at distillation, one of the major objectives in the second year of GALE. The company’s solution is unique and effective, Cohen asserted.
“That involves questions and answers from an annotated database about what’s going on. It turns out that when you have that annotated database, you have a lot of information that is cross-modal. It is from different genres or different languages,” he said.
“But since this is news and talk shows, it is often information that appears in some form or another in other languages, as things tend to be current. You can use that information to go back and fix up the earlier processes. If anything about the SRI system is going to show huge advantages in the future, that’s going to be the thing,” he continued.
The SRI distillation solution specializes in annotation, he explained. The software examines the semantics and intent of each string of words, expending a great deal more effort on that than the other GALE companies, Cohen surmised.
“The input for the systems is in both speech and text,” he noted. “But when things are put in the database to be queried, both speech and text things are mingled. But they have very different characteristics. For instance, in speech, you are liable to get words wrong. But you know about semantics and syntax because of the characteristics of the speech.
“We look at the speech very hard to get any information we can about it other than the words,” he said. “Then we add that information in an annotation form. We spend some time making the information from a speech story equivalent to information from a text story, although they have different forms. There is a transformation involved in that. It’s an interesting process, and we are having quite a bit of success doing it.”
TRANSTAC
Both IBM and SRI International also are contractors under another DARPA project, a speech-to-speech translation effort named TransTac, formerly known as Babylon.
IBM began its efforts in 2001 as a research program called the Multilingual Automatic Speech-to-Speech Translator (MASTOR). Its goal was to provide bi-directional English and Mandarin Chinese translation from speech input and output. The TransTac program has focused on domain-specific applications such as medical and checkpoint operations, IBM’s Nahamoo explained.
“When you are domain-specific, you are task-oriented, therefore you can go based on translation for meaning rather than lower level types of translation that focuses on the word or phrase level,” he said. “We learned right away that you capture the meaning. Now that you know the meaning, it doesn’t change from the origin language to the target language. You can reconstruct the sentence in the target language and form it as speech.”
The Department of Defense has deployed about 35 ruggedized MASTOR systems for evaluation in the United States and for action in Iraq. U.S. Joint Forces Command has been using the systems in real-life environments in order to evaluate their effectiveness and to provide feedback on the next steps required in the program. The deployed generation of the technology is basically three years old, Nahamoo noted, but a new generation under development at IBM shows a great deal of promise in being useful to soldiers.
“We are in the next phase of TransTac, which started in the July timeframe. Our project moved from focusing on Chinese to English or bi-directional translation to modern standard Arabic translation and finally to translation focused on colloquial Iraqi Arabic and English,” Nahamoo described. “We are currently bringing the technology to small devices, which is an important element in the usage of the technology. You cannot necessarily carry a laptop with you when you are on the move.
“The technology today relies on a lot of visual feedback,” he added. “When you talk, the sentences show up and the communication takes place by showing the sentences. You can have other alternative choices in front of the recipient. We are now focused on making the system fully speech to speech when no visual modalities are available.”
SRI’s Cohen estimated that DoD has deployed about 50 systems from his company under TransTac. The SRI solution, called IraqCom, also focuses on force protection and medical operations.
“It works under the hypothesis that you can define the situation being discussed. It translates between free spoken English and Iraqi Arabic in either direction,” he said.
The SRI solution is very similar to the IBM solution in functionality, Cohen said. He believes the free-flowing speech recognition and language translation functions of IraqCom work pretty well for the current stage of technology development.
But like his counterpart at IBM, Cohen agreed that feedback on the systems has been slow to reach his company.
“The people using them are in theater,” he stated. “Not all of the information on those systems is sharable with us. Some of that information tends to be critical in military ways.”
SRI will focus on developing a hands-free, eyes-free interface for its TransTac solution this year, Cohen said. However, neither DARPA nor SRI has envisioned a final end state for the technology.
“I don’t know of a stake in the ground that says this has to be finished. No one actually understands what that means,” Cohen said. “There are all kinds of ways in which these systems can be better, broader or different. Currently, at DARPA, this is a technology evolution project, trying to figure out what this space looks like and where we have capabilities. That’s something DARPA does very well.” ♦






