Internet Archive expands OCA book digitizing effort

IDG News Service

The Internet Archive has received a grant from the Alfred P. Sloan Foundation to expand its book-digitizing efforts, which so far have resulted in the scanning of about 100,000 books now available on the group's Web site.

The grant will also benefit the Open Content Alliance, an initiative launched in October 2005 and backed by the Internet Archive, Yahoo Inc. and others to digitize books and multimedia material and make them available online, the Internet Archive announced Wednesday.

The scanned works hosted by the Internet Archive are also available for indexing by any search engine that adheres to the OCA's open-access terms for the content. These principles include providing "the greatest possible degree of access to and reuse of collections in the archive, while respecting the rights of content owners and contributors," according to the OCA Web site.

The Sloan Foundation awarded the grant to support the digitization of historical collections from five major libraries by the Internet Archive, a nonprofit organization building an online library of texts, audio, video, software and Web pages.

The US$1 million grant will be used in part to scan the complete personal library of founding father and U.S. President John Adams, housed at the Boston Public Library. Meanwhile, the Getty Research Institute in Los Angeles is making available art, architecture and performing arts books.

The archive of publications issued by New York City's Metropolitan Museum of Art will also be digitized, as will California Gold Rush primary texts from the University of California at Berkeley's Bancroft Library. Finally, the Internet Archive will scan the James Birney Collection of Anti-Slavery materials from Johns Hopkins University libraries in Baltimore.

Scanning books to make them available online has become a controversial practice, primarily because of Google Inc.'s approach. The search engine giant is digitizing library collections that include copyrighted books without always asking for permission from the copyright owners. It indexes the full text of these works and makes them searchable through its Book Search service.

Google faces lawsuits alleging that this practice violates copyright law. The company claims it is protected by the fair use principle because it displays only snippets of text from copyrighted works.

The Internet Archive has refrained from digitizing copyrighted books, although it is interested in seeing copyright issues worked out because its ultimate goal is to provide access to as many works as possible for the benefit of people worldwide, said Brewster Kahle, the Internet Archive's founder.

For example, Kahle is interested in sorting out the issue of books whose copyright owners can't be found, often called "orphan works," as well as the issue of copyrighted works that are out of print. In these two cases, Kahle believes that libraries should take a leading role in finding "the right path through it." In the case of in-print copyrighted books, a collaboration between libraries and publishers could generate significant progress, he said.

While others are criticizing Google for its wholesale scanning of copyrighted works, Kahle finds fault with the agreements the company is hammering out with its partner libraries. In his opinion, the contracts put too many restrictions on how libraries and people may use and share digital copies of public-domain works. "Google has bound the libraries pretty tightly," he said. "Public domain works should stay in the public domain."

Google didn't immediately respond to a request for comment.

In addition to Yahoo and the aforementioned libraries, participants in the OCA include Microsoft Corp., Adobe Systems Inc., Columbia University, Hewlett-Packard Co., the University of Toronto, Xerox Corp. and the University of North Carolina at Chapel Hill.
