  From: <egk2101@columbia.edu>
  To  : <cpc@emoglen.law.columbia.edu>
  Date: Fri, 14 Apr 2006 10:52:02 -0400

Paper 1: Game theory and data-mining

Game theory and data-mining: a flaw in the system?
By: Angelo Kramvis

It’s called “Matching Pennies,” a conceptual game from mathematical
game theory. The rules of the “game” are simple. You and I each get
a penny, and we each secretly lay it down heads or tails. If they
match, you keep both; if they don’t, I keep both. The question,
then, is: what is the best strategy to win the most pennies?

It turns out that the best strategy is purely random. Why? As long
as there is a detectable pattern, it can be exploited. According to
Dr. Philip Stark, a Professor of Statistics at UC Berkeley, this is
a well-established concept in the mathematical world. Stark’s
comments were made in review of a recent MIT paper that concluded
that profiling in airport security was actually less effective than
random searches in the long run.
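
The exploitability of a detectable pattern can be sketched in a few
lines of Python (a toy simulation of my own, not from the paper): an
opponent who merely tracks how often the matcher has shown heads
bankrupts a biased matcher, while a purely random matcher breaks even.

```python
import random

def simulate(matcher_play, rounds=100_000, seed=1):
    """Return the matcher's net pennies. The matcher wins when the coins
    match; the opponent wins on a mismatch and exploits any bias it sees."""
    rng = random.Random(seed)
    heads = 0   # how often the matcher has shown heads so far
    net = 0
    for n in range(rounds):
        # Opponent guesses the matcher's likelier face, then plays the
        # opposite, hoping for a mismatch.
        opponent = 'T' if heads > n - heads else 'H'
        coin = matcher_play(rng)
        heads += (coin == 'H')
        net += 1 if coin == opponent else -1
    return net

biased  = lambda rng: 'H' if rng.random() < 0.7 else 'T'  # detectable pattern
uniform = lambda rng: 'H' if rng.random() < 0.5 else 'T'  # purely random

print(simulate(biased))   # strongly negative: the bias is exploited
print(simulate(uniform))  # near zero: there is nothing to exploit
```

The biased matcher loses roughly 0.4 pennies per round once the
opponent locks onto the pattern; against the random matcher, the
opponent's bookkeeping buys it nothing.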

While the debate on racial profiling rages on, all major U.S.
airports are currently using a profiling system called CAPPS
(Computer Assisted Passenger Pre-Screening System) to catch
terrorists and other potentially risky passengers before they
board. Anyone with a CAPPS score over some threshold is pulled aside
for extra security searches and questioning. While the parameters
used to compute CAPPS scores are classified, and the threshold
number is unknown, it’s obvious when the system has flagged you.
The paper’s authors, Chakrabarti and Strauss, show that terrorists
can beat such a system through simple trial and error. Send a large
enough sample of terrorist agents through security, and find the
ones that consistently pass through unchecked. Those with the
“winning” profile are the ideal candidates for terrorist missions.
Possibly, with enough of those profiles, terrorists may even begin
to understand the algorithm being used against them. The authors go
on to show how random searches, like the random strategy in pennies,
would be superior in finding weapons and potential terrorists. There
is no way to practice against randomness.
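
The authors’ trial-and-error attack can be illustrated with a toy
sketch (my own, with a made-up weighted score standing in for the
classified CAPPS parameters): because the score is deterministic, a
profile that passes once passes forever, whereas random search offers
no such guarantee.

```python
import random

rng = random.Random(42)

# Hypothetical stand-in for the classified scoring rule: a fixed weighted
# sum over eight observable traits, flagged past a hidden threshold.
WEIGHTS = [rng.uniform(-1, 1) for _ in range(8)]
THRESHOLD = 0.0

def flagged(traits):
    return sum(w * t for w, t in zip(WEIGHTS, traits)) > THRESHOLD

# Trial and error: send many probe agents through and keep those who pass.
probes = [[rng.random() for _ in range(8)] for _ in range(1000)]
safe = [p for p in probes if not flagged(p)]

# Because the score is deterministic, a discovered "safe" profile also
# passes on every future trip.
repeat_passes = all(not flagged(safe[0]) for _ in range(20))

# Under purely random search (say 10% of passengers), no profile is ever
# reliably safe: the chance of 20 unsearched trips is 0.9**20, about 12%.
print(len(safe), "of 1000 probes found a safe profile;",
      "reusable:", repeat_passes)
```

The contrast is the whole argument in miniature: the probes never need
to recover the weights themselves, only to observe who gets through.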

Although I agree with the paper’s general conclusions, I note that
the Israeli airline El Al has a policy of singling out all young
Arabs for extensive search procedures, and in spite of ongoing Arab
hostility it has not had a hijacking in over thirty years.
Exhaustively searching an entire group, then, marks a potential
limit to the randomness argument. I do not think, however, that
this limit is reachable in the case of data-mining.

The ideas of the MIT paper have some relevance to the current debate
over data-mining and terrorism. Data-mining uses similar modeling
techniques to the CAPPS program to look for the terrorists in our
midst. Political correctness aside, it is highly probable that
future terrorists will be Muslim (or at least very sympathetic to
Islamic fundamentalism).
However, the class of Muslims residing within the U.S. is
undoubtedly huge. The El Al solution is too costly, both in
resources and civil liberties, to apply here. Thus, more factors
than simply ethnicity are needed to model what the profile of a
suspected terrorist looks like. But when predictive factors – a
“terrorist profile” – determine who gets monitored more closely, it
becomes possible for terrorists to discover those patterns and
circumvent them. As with airport security, the Muslim profiles
identified as “safe” would then alter the behavior and recruitment
of the actual terrorists.

Would it be possible for terrorists to do this? It depends. First,
there needs to be something that alerts them to the fact that
they’ve been flagged by the system. Potential sources of this
information could include being detained, questioned, or subjected
to enhanced security measures when close to sensitive targets. As
Philip B. Heymann argues, three important ways to stop terrorists
from effectuating their plans are to (1) monitor and frustrate
those efforts, (2) deny to some “access to likely targets or to the
resources needed to attack those targets,” and (3) detain those
likely to engage in terrorism. Thus, there are several ways in
which terrorists can gather information about the suspect profile.

Although such information is not perfect, terrorists do not need to
figure out the model in detail. All they need is enough numbers to
determine who isn’t being tracked. Of course, the government could
monitor without ever letting its efforts become known. But what
would be the purpose of monitoring if no action were ever taken as
a result?

One possible solution to this problem could be to alter the model
over time to prevent terrorists from using its predictability to
their advantage. This option is appealing, but flawed. Information
that “defines” a terrorist profile is already hard to determine. As
Bruce Schneier states, data-mining works best when you’re searching
for a reasonably well-defined profile. Reasonableness here is meant
in terms of the number of errors – both false positives and false
negatives – that your model produces. The costs associated with the
model’s inaccurate predictions include wasted law enforcement
resources and infringements on civil liberties on the false
positive side, and a potentially missed chance to stop a terror
plot on the false negative side. If your model creates more costs
than benefits, then perhaps the resources are better spent
elsewhere.
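
Schneier’s base-rate point can be made concrete with back-of-the-
envelope arithmetic (the figures below are illustrative assumptions
of mine, not numbers from any source): even a remarkably accurate
model drowns in false alarms when the target is this rare.

```python
population   = 300_000_000  # people scanned by the system (assumed)
terrorists   = 1_000        # actual plotters among them (assumed)
tpr          = 0.99         # flags 99% of real plotters (generous)
fpr          = 0.0001       # wrongly flags 1 in 10,000 innocents (generous)

true_hits    = terrorists * tpr                  # 990
false_alarms = (population - terrorists) * fpr   # ~30,000
missed       = terrorists - true_hits            # 10 plots slip through

# Of everyone flagged, the fraction who are actually terrorists:
precision = true_hits / (true_hits + false_alarms)

print(f"false alarms: {false_alarms:,.0f}")
print(f"missed plotters: {missed:.0f}")
print(f"precision: {precision:.1%}")   # roughly 3%
```

Under even these generous assumptions, some thirty thousand innocent
people are flagged for every thousand plotters, and ten plots still
slip through – which is the cost asymmetry the paragraph above
describes.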

Thus, even if the information needed to build a cost-effective
profile of a terrorist were available – a possibility that Schneier
strongly doubts – such a model would have to be well-defined. And
because a well-defined profile is required, the ability to alter
the model without slipping back into inefficiency is limited. If
terrorists are unable to “figure out” the current system, it may
well be because it is not well-defined enough to catch them. This
leads to an interesting problem: as the model profile of a
terrorist becomes more finely tuned and stable, the risk of
terrorists detecting and circumventing that pattern grows.

While the risk does not completely destroy the possibility of useful
data-mining in counterterrorism, it at least raises doubts as to its
effectiveness. Along with its other problems, the tool looks even
more unappealing.


Word Count:  990


--------------------------------------------------------------------

http://www.acfnewsource.org/science/random security.html
“Random Security”

http://www.9-11commission.gov/

25 Harv. J.L. & Pub. Pol'y 441, 442

http://www.scu.edu/ethics/publications/ethicalperspectives/profiling.html
“Racial Profiling in an Age of Terrorism”
Peter Siggins

http://www.wired.com/news/columns/0,70357-0.html
“Why Data Mining Won’t Stop Terror”
Bruce Schneier



-----------------------------------------------------------------
Computers, Privacy, and the Constitution mailing list


