A Universal Tool to Rescue Old Files From Obsolescence
The New York Times, Technology, August 29, 2002

WHAT'S NEXT

By ANNE EISENBERG

DIGITAL documents can quickly become unreadable, as anyone who has tried to open an old WordStar file or postponed transferring data from a 5 1/4-inch floppy knows.

Disks decay, or the required software changes, or the necessary hardware and operating systems no longer exist.

During the last decade, a growing number of librarians, archivists and researchers have turned to the challenge of long-term preservation of digital documents, debating ways to conserve the information embedded in them so that it can be understood in the future just as it is understood today.

At present, the basic, imperfect approach is to update documents constantly, converting them from their original versions into newer ones while it is still possible to run the old software.

But this is a labor-intensive process and in many cases gradually leads to corrupted documents, because each time the files are updated they may lose some of the stored information.

The alternative, also widely practiced, is to keep old files and hope that some software in the future will be able to decipher them. Chances are, though, that in 2040 there won't be a way to understand, for example, those old PDF documents. Acrobat Reader, the present means of reading them, will probably no longer be in use, and even if you save a 2002 version, it will be unlikely to run on computers of 2040.

What is needed, some archivists argue, is a kind of computer Esperanto — a common preservation system that can read and present today's formats and the thousands that will follow in a simple, standard way that can be emulated or mimicked on whatever computers lie ahead.

Now, Dr. Raymond Lorie, a researcher at the I.B.M. Almaden Research Center in San Jose, Calif., has proposed a system that he hopes will become that lingua franca. He has developed a prototype for a "universal virtual computer" — a system with architecture and language designed to be so logical and accessible that computer developers of the future will be able to write instructions to emulate it on their machines.

Dr. Lorie defined and described his universal virtual computer in a series of technical papers in the last few years and demonstrated the system for the National Library of the Netherlands.

For the universal computer to work, it would first have to be adopted as a standard throughout the computer industry. Developers of new software with new file formats would need to write additional software that could read and display the files in the language of the universal computer. At the same time, descriptions of the universal virtual computer would need to be widely available for future computer developers.

Then, assuming that the universal computer is simple and logical enough, people 100 years from now using different computer architectures would face only one relatively basic task to read old formats on new machines — write a set of instructions so the universal virtual computer could be emulated on whatever machines exist then.

Emulation is a common computer technique in which one computer acts like another — for instance, code is written for a Mac that mimics in every detail the operations of a PC so that programs written for a PC will run on a Mac.
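To make the idea concrete, here is a minimal sketch of an emulator, written in Python for illustration. It is not Dr. Lorie's universal virtual computer, whose instruction set this article does not describe; the opcodes (LOAD, ADD, JNZ, OUT, HALT) and the function name run are invented for this example. The point is only that a machine with a small, simple instruction set can be imitated by a short program on whatever computer is at hand.

```python
# A toy virtual machine, for illustration only. The opcodes and the
# register layout are hypothetical; they are NOT the instruction set
# of the universal virtual computer.

def run(program, register_count=8):
    """Interpret a list of (opcode, *operands) tuples; return the values output."""
    registers = [0] * register_count
    output = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, *args = program[pc]
        if op == "LOAD":        # LOAD r, constant
            registers[args[0]] = args[1]
        elif op == "ADD":       # ADD dst, a, b  ->  dst = a + b
            registers[args[0]] = registers[args[1]] + registers[args[2]]
        elif op == "JNZ":       # JNZ r, target  ->  jump if r is non-zero
            if registers[args[0]] != 0:
                pc = args[1]
                continue
        elif op == "OUT":       # OUT r  ->  emit the value in r
            output.append(registers[args[0]])
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return output


# A small program for the toy machine: count down from 3.
COUNTDOWN = [
    ("LOAD", 0, 3),     # r0 = 3
    ("LOAD", 1, -1),    # r1 = -1
    ("OUT", 0),         # emit r0 (instruction index 2)
    ("ADD", 0, 0, 1),   # r0 = r0 + r1
    ("JNZ", 0, 2),      # loop back while r0 is non-zero
    ("HALT",),
]

print(run(COUNTDOWN))
```

Any machine that can host an interpreter like this one can run COUNTDOWN unchanged, which is the property a preservation system would rely on.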

In his approach, Dr. Lorie said, a program written for the universal virtual computer extracts all the data stored in a file, for instance, the data in a PDF file. This program does not try to reproduce the full range of services offered by Acrobat Reader.

"I don't need to recreate Acrobat Reader with all its buttons and colors," he said. "That would be overkill." Users of the future, he said, will want to see the document and have access to the data. "They will take the data and store it, probably in a completely different way."

Dr. Lorie's program reads and displays the contents of the PDF file using tags, extra semantic information designed to reduce the confusion of people in 2040 who may at first be unsure of what they are viewing. These semantic tags might say, for instance, "There is text in this document and it is organized like this," he explained.
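A rough sketch of what tagged extraction might look like follows, again in Python. The "legacy" record layout, the tag names and the function extract_tagged are all assumptions made for this example; the article does not specify the tags Dr. Lorie's program emits, and a real PDF decoder would be far more involved. The sketch only shows the general shape: a decoder returns the data together with labels that say what each piece is.

```python
# Hypothetical example: the record format (title, NUL byte, paragraphs
# separated by \x01) and the tag names are invented for illustration.

def extract_tagged(raw):
    """Decode a made-up legacy record into a list of (tag, value) pairs."""
    title, body = raw.split("\x00", 1)
    elements = [
        # A leading tag describes how the rest of the data is organized.
        ("document", "text organized as a title followed by paragraphs"),
        ("title", title),
    ]
    for paragraph in body.split("\x01"):
        if paragraph:  # skip empty fragments
            elements.append(("paragraph", paragraph))
    return elements


LEGACY_RECORD = "Annual Report\x00First paragraph.\x01Second paragraph."

for tag, value in extract_tagged(LEGACY_RECORD):
    print(f"{tag}: {value}")
```

A future reader who has never seen the original format can still tell from the tags which piece is the title and which pieces are body text, and can store them however is convenient at that time.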

Dr. Lorie has successfully tested the key parts of his universal computer, proving that it will work in the future, said Dr. Robin Williams, associate director of research at Almaden. To do this, Dr. Lorie first wrote a program in the universal computer language that could read and display the contents of a PDF file. Then he wrote programs to show how his universal computer system could work on computers with different architectures, Dr. Williams said.

Johan Steenbakkers, director of information technology for the Dutch national library, which hired I.B.M. to investigate a way to preserve electronic publications, said Dr. Lorie's virtual computer had been successfully demonstrated there. "We have seen a proof of concept," he said. "If the universal virtual computer became a standard for digital archiving, it would be a major step forward," offering a controlled, one-time migration to a specific preservation format.

Meanwhile, Jeff Rothenberg, a senior computer scientist at the RAND Corporation in Santa Monica, Calif., who raised the problem of long-term preservation of digital documents in an influential Scientific American article in 1995, takes a different approach to preservation.

Mr. Rothenberg wants archivists to preserve the original software — meaning, for instance, all of the functions of Adobe Acrobat — rather than adopting the data extraction program that Dr. Lorie proposes.

"I would prefer to store documents in their original forms and formats — with all of the software that created them and is typically required to view them," he said. The original software would be run under emulation on future computers. "This is the only reliable way to recreate a digital document's original function, look and feel," he said.

Data extraction, in contrast, is too limited, he said. "It will give you the contents — or rather, what someone thought were the meaningful core contents — in some future form," he said. "But it won't preserve the original."




Copyright 2002 The New York Times Company