From: Asma Chandani <asc2106@columbia.edu>
To: <CPC@emoglen.law.columbia.edu>
Date: Sun, 3 Apr 2005 16:52:58 -0400
Robot Translators Decipher Mountains of Messages
Does the US government currently use MT to intercept and translate all
electronic messages traveling between the US and countries the State
Department designates according to political currents? Will it? Are US
citizens' rights infringed when the government is allowed to do so? ...
Assume a private company performed only this task, and the government
obtained a warrant for pertinent information gleaned in the process.
Would that amount to state action? Are we content to accept Tribe's
argument (and, I believe, the Supreme Court's) that users have no
expectation of privacy in electronic media traveling miles beyond their
locale?
~Asma
See the link for the article (http://www.technewsworld.com/story/41513.html)
or read the text below:
Robot Translators Decipher Mountains of Messages
Knight Ridder/Tribune
04/02/05 5:00 AM PT
Somewhere in a vast jumble of documents in a Baghdad warehouse or in the
constant buzz of electronic signals in the sky, a few ominous words or
phrases may be hidden: "Explosives." "Nerve gas." "Convoy." "Airport
arrival." "The president."
The words, however, are in Arabic, Farsi, Pashto or some other language
that few Americans understand. The messages urgently need to be
translated, but there aren't enough expert linguists to handle the
flood.
The time for robot translators has arrived, according to a panel of
language specialists at a meeting of the American Association for the
Advancement of Science in Washington last month.
Not Enough Humans
"The Defense Department doesn't have enough human translators," said
Melissa Holland, an expert at the Army Research Laboratory in Arlington,
Va.
"The backlog of untranslated documents is a hindrance to the war on
international terrorism," said Mohammad Shihadah, the founder of
Applications Technology, a small firm in suburban McLean, Va., that
sells Arabic-to-English translation software to the government.
Since Sept. 11, 2001, the Defense Department, the CIA and other
intelligence agencies have been pouring money and effort into what's
known as "machine translation," or MT for short.
MT uses computers to translate messages from one language to another --
such as turning "Good morning" into "Buenos días" or "Auf Wiedersehen"
into "Au revoir" -- with little or no human intervention.
Computer scientists have labored to perfect machine translation since
the 1950s with only modest success. But the terrorist attacks and the
wars in Afghanistan and Iraq have given the technology a boost.
Today's robot-linguists are far from perfect, but they can give soldiers
in the field the gist of a document, a poster or a possible threat
scrawled on a wall.
"Soldiers can get a sense of what a document is about -- not a perfect
translation," Holland said. Accuracy is still less than 50 percent,
Clare Voss, another Army researcher, acknowledged.
Translation Triage
Equipped with a handheld PDA, a digital camera and a laptop computer
in the back of a Hummer, a GI can quickly decide if a message needs
human attention.
"Expectations for speed and accuracy are not always met -- it's not the
Queen's English," admitted William McClellan, a machine translation
systems manager at Booz-Allen Hamilton, a technology consulting firm in
McLean. "But it's a way to find the needle in the haystack without
translating every straw."
The elimination process is called "triage."
"Knowing what to translate first out of thousands of documents is a
problem faced daily by our military and intelligence officers,"
McClellan said. "Thousands of documents can be automatically screened,
and those meeting certain criteria can be ... automatically routed to
linguists and domain specialists."
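The screening-and-routing step McClellan describes can be pictured as a simple filter over rough machine translations. The sketch below is a minimal illustration under stated assumptions: documents are assumed to be machine-translated into English already, "certain criteria" is assumed to mean matching a watch list of phrases, and the watch list, threshold, and the names `Document`, `score`, and `triage` are all hypothetical, not any agency's or vendor's actual system.

```python
from dataclasses import dataclass

# Hypothetical watch list, borrowed from the phrases quoted at the top of the article.
WATCH_LIST = ["explosives", "nerve gas", "convoy", "airport arrival", "the president"]


@dataclass
class Document:
    doc_id: str
    mt_text: str  # rough machine translation, assumed to exist already


def score(doc: Document, watch_list: list[str]) -> int:
    """Count how many watch-list phrases appear in the machine translation."""
    text = doc.mt_text.lower()
    return sum(1 for phrase in watch_list if phrase in text)


def triage(docs: list[Document], watch_list: list[str], threshold: int = 1):
    """Split documents into (routed to linguists, set aside), highest scores first."""
    scored = [(score(d, watch_list), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    routed = [d for s, d in scored if s >= threshold]
    set_aside = [d for s, d in scored if s < threshold]
    return routed, set_aside


docs = [
    Document("A1", "Shipment of rice arrives Tuesday"),
    Document("B2", "Convoy leaves at dawn; explosives in second truck"),
    Document("C3", "Meeting about the president schedule"),
]
routed, set_aside = triage(docs, WATCH_LIST)
print([d.doc_id for d in routed])     # ['B2', 'C3']
print([d.doc_id for d in set_aside])  # ['A1']
```

A keyword filter is only one possible "criterion"; it inherits every MT error upstream, which is why the article stresses that routed documents still go to human linguists.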
The volumes of material to be translated are "enormous," said Mark
Turner, an MT expert at CACI, an information technology organization in
Lanham, Md.
In Baghdad, "we found warehouses with billions of documents in bags,
boxes, binders and books," he said. "There are tons of paper and
terabytes [trillions of bytes or letters] of electronic media."
People who use machine translation often find it frustrating, quirky and
unreliable. "MT is a useful tool for triage, but it doesn't replace
human linguists," Turner said.
Long Effort
For decades, machine translation systems labored to make computers
understand traditional rules of grammar -- subjects, verbs, objects and
so on. Progress was slow, thanks to the tremendous ambiguity and
complexity of human language.
The word "get," for example, has 24 possible meanings listed in
Webster's New College Dictionary. One of them is "kill" -- as in "I'll
get you for this."
In the 1990s, however, a new technique came along, applying statistical
analysis to huge databases of previously translated texts. By comparing
a new, unknown message to millions of stored sentences, phrases and
words, researchers could quickly find the most likely translation.
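The core statistical idea, picking whichever stored rendering of a phrase was most frequent in past translations, can be sketched in a few lines. This is a minimal illustration, assuming a tiny hypothetical list of aligned Spanish-English phrase pairs (echoing the article's "Buenos días" example in reverse); real systems learn alignments from millions of sentence pairs and combine several models, which this sketch omits. The names `build_phrase_table` and `most_likely` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical aligned phrase pairs, as if mined from previously translated texts.
ALIGNED_PAIRS = [
    ("buenos días", "good morning"),
    ("buenos días", "good morning"),
    ("buenos días", "good day"),
    ("el perro", "the dog"),
    ("el perro", "the dog"),
    ("el perro", "dog"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) as the relative frequency of each aligned pair."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(tgts.values()) for tgt, n in tgts.items()}
        for src, tgts in counts.items()
    }


def most_likely(table, src):
    """Return (translation, probability) for a stored source phrase, or None if unseen."""
    options = table.get(src)
    if not options:
        return None
    return max(options.items(), key=lambda item: item[1])


table = build_phrase_table(ALIGNED_PAIRS)
phrase, prob = most_likely(table, "buenos días")
print(phrase, round(prob, 2))  # good morning 0.67
```

Unseen phrases return `None`, which mirrors a real weakness of purely data-driven systems: they can only reuse what their stored translations contain.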
This method, also known as "data-driven machine translation," works like
this: The computer scans a sentence, lists each possible meaning of each
word and arranges them in every possible order, most of them
nonsensical, until it finds one that most nearly matches a good
translation.
For example: "bites man dog," "dog man bites," "man bites dog," and,
finally, "dog bites man." A long sentence can produce millions of
variations.
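The reordering search in the "dog bites man" example can be sketched by scoring every ordering with a language model trained on fluent English and keeping the best. This is a minimal illustration, assuming a tiny hypothetical corpus and an add-one-smoothed bigram model; it follows the article's simplified "every possible order" description, whereas practical statistical systems prune the search (for example with beam search) and also weigh a translation model. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical stand-in for a large corpus of fluent English.
CORPUS = [
    "dog bites man",
    "the dog bites the mailman",
    "a man walks the dog",
]


def train_bigrams(sentences):
    """Count adjacent word pairs, using <s> and </s> as sentence boundaries."""
    bigrams, unigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])  # every token that starts a pair
        bigrams.update(zip(words[:-1], words[1:]))
    return bigrams, unigrams


def log_score(words, bigrams, unigrams, vocab_size):
    """Add-one-smoothed bigram log-probability of one candidate word order."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(seq[:-1], seq[1:])
    )


def best_order(words, bigrams, unigrams):
    """Score every ordering and keep the best; n words give n! candidates."""
    vocab_size = len(unigrams) + 1  # rough smoothing constant: seen tokens plus </s>
    return max(
        permutations(words),
        key=lambda order: log_score(order, bigrams, unigrams, vocab_size),
    )


bigrams, unigrams = train_bigrams(CORPUS)
print(" ".join(best_order(["bites", "man", "dog"], bigrams, unigrams)))  # dog bites man
print(math.factorial(10))  # 3628800 orderings for a 10-word sentence
```

The last line shows why exhaustive enumeration breaks down: a 10-word sentence already has about 3.6 million orderings, matching the article's "millions of variations."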
Statistical machine translation "was a huge leap in the state of the art
-- very high accuracy, very fast," said Daniel Marcu, a co-founder of
Language Weaver Inc., a commercial MT company in Marina del Rey, Calif.
Marcu claimed that his company's system can translate 5,000 words per
minute, 24 hours a day, seven days a week. Five years ago, he said, the
best that could be done was one 1,000-word document a day.
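Taking Marcu's figures at face value, and assuming the system really runs continuously at the claimed rate around the clock, the daily volume and the improvement over five years earlier work out as follows. The constant names are illustrative only.

```python
WORDS_PER_MINUTE = 5_000       # Marcu's claimed rate
MINUTES_PER_DAY = 24 * 60      # "24 hours a day"
WORDS_PER_DAY_BEFORE = 1_000   # "one 1,000-word document a day" five years earlier

words_per_day_now = WORDS_PER_MINUTE * MINUTES_PER_DAY
speedup = words_per_day_now / WORDS_PER_DAY_BEFORE

print(f"{words_per_day_now:,} words per day")            # 7,200,000 words per day
print(f"{speedup:,.0f} times the earlier daily output")  # 7,200 times the earlier daily output
```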
According to Marcu, the system can record a broadcast from al Jazeera,
the Arabic-language network that carries Osama bin Laden's taped
messages, and translate it automatically.
"With a one-minute delay, you can see what al Jazeera reported," he
said.
Commercial Applications
Machine translation is also gaining ground in international commerce,
according to Stephen Richardson, a former IBM researcher who now heads
the Machine Translation Project at Microsoft in Redmond, Wash.
"Companies are facing increasingly difficult and costly challenges of
localizing their products and services in the global marketplace,"
Richardson said.
Human translation is very expensive -- 20 to 50 cents per word, he said.
Older, rule-based machine translation systems cost as much as a million
dollars to create and maintain.
Microsoft has used the new, data-driven method to translate its customer
support database into four foreign languages "at a substantial cost
savings," Richardson said. The machine translations still need a final
polishing by a human editor, but the total cost is 35 percent less than
it used to be.
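To put Richardson's figures side by side, here is a rough worked example. It assumes, which the article implies but does not state, that "what it used to be" means fully human translation at the quoted 20 to 50 cents per word, and it uses a hypothetical volume of one million words; amounts are kept in whole cents to avoid rounding drift. The function names are hypothetical.

```python
def human_cost_cents(words: int, cents_per_word: int) -> int:
    """Cost of fully human translation at a flat per-word rate, in cents."""
    return words * cents_per_word


def post_edit_cost_cents(baseline_cents: int, savings_percent: int = 35) -> int:
    """Cost of an MT-plus-human-editing workflow that is savings_percent cheaper."""
    return baseline_cents * (100 - savings_percent) // 100


WORDS = 1_000_000  # hypothetical volume, not from the article
for rate in (20, 50):  # cents per word, the range Richardson quoted
    before = human_cost_cents(WORDS, rate)
    after = post_edit_cost_cents(before)
    print(f"{rate} cents/word: ${before / 100:,.0f} -> ${after / 100:,.0f}")
# 20 cents/word: $200,000 -> $130,000
# 50 cents/word: $500,000 -> $325,000
```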
Nevertheless, the prime mover for machine translation is the war on
terror, and the urgent need to understand what potential enemies are
saying.
"You can't expect the president to speak Pashto," said Benson Margulies,
the chief technical officer at Basis Technology, a language processing
provider in Cambridge, Mass. Pashto is an Afghan language.
© 2005 Knight Ridder/Tribune News Service. All rights reserved.
© 2005 ECT News Network. All rights reserved.
-----------------------------------------------------------------
Computers, Privacy, and the Constitution mailing list