Wednesday, November 14, 2007

Search Engine Placement - On-line Libraries

There are several major efforts underway to digitize the world’s books that are not currently under copyright. Google and Microsoft are leading the way in bringing these books online. (They do charge organizations for this service.) In addition, there are not-for-profit efforts by groups such as the Open Content Alliance, which can be found at The Internet Archive. You can visit their preliminary web site by clicking on the previous link.

Vast repositories of the entire text of books will eventually be available to the major search engines. It is already possible to do a limited book search on Google. Books can be searched just like many other categories on Google, such as Web, Images, Video, News, Maps, Gmail, etc.

This search can be done from the “more” tab at the top of Google’s homepage. Clicking on it will reveal numerous goodies, books included. Does this mean that Shakespeare will soon be appearing in Search Engine Placement results? Not likely, but it does offer some exciting possibilities for research, and the potential for tremendous time savings for students and scholars whose material can be indexed on-line.

Why should this be of interest to Internet users, as well as Christian and principled writers? Well, this will open up writings, initially Western literature, to several billion readers. Only about one billion people in the world have access to libraries in their native countries, mostly in the developed nations. Not only historical literature, but Old Testament, New Testament, and Completed Testament versions of God’s Word will soon be accessible to a much larger world audience.

Major libraries such as the New York Public Library and the libraries at the University of Michigan, Harvard, Stanford, and Oxford have signed on with Google. Others wanted more open, unrestricted access to their archives, free from any possibility of commercial interest. These include the Boston Library Consortium (with 19 members), which counts the University of Massachusetts and the University of Connecticut among its members. The Smithsonian and the University of California are other institutions that decided to opt out of offers from Microsoft and Google.

To put the scope of this effort in perspective, there are an estimated 32 million books in the world that could be scanned. This number is somewhat in dispute, since many books are out of print, yet not technically out of copyright. Google is not making this distinction, and has decided to digitize anything that is either out of print or out of copyright. This has produced praise from some quarters and wrath from others, mainly publishers and some writers. Google is in the midst of numerous lawsuits, but its efforts continue unabated, and it will probably take years for the courts to work out the details. Ironically, even a settlement may work in Google's favor. (See the sources below.)

Technically, the number of books in the world is much smaller if you only count books out of copyright. (See the following table.)

Book Edition Count (from OCLC)

Period                   Editions
2000 B.C. - 1 B.C.            779
1 A.D. - 1449               2,291
1450 - 1499                11,234
1500 - 1599               100,731
1600 - 1699               240,171
1700 - 1799               537,139
1800 - 1899             2,573,101
1900 - 1919             1,651,313
1920 - 1960             5,335,059
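
To make the comparison with the 32 million estimate concrete, here is a minimal Python sketch that adds up the table row by row. The figures are copied from the table; reading "2.291" as 2,291 is my assumption, and the names EDITIONS and cumulative_totals are made up for this illustration. Which row marks the copyright cutoff is left to the reader.

```python
# Edition counts by period, copied from the OCLC table above.
# Assumption: the "2.291" entry is read as 2,291.
EDITIONS = [
    ("2000 B.C. - 1 B.C.", 779),
    ("1 A.D. - 1449", 2_291),
    ("1450 - 1499", 11_234),
    ("1500 - 1599", 100_731),
    ("1600 - 1699", 240_171),
    ("1700 - 1799", 537_139),
    ("1800 - 1899", 2_573_101),
    ("1900 - 1919", 1_651_313),
    ("1920 - 1960", 5_335_059),
]


def cumulative_totals(rows):
    """Return (period, running total of editions up to and including that period)."""
    running = 0
    totals = []
    for period, count in rows:
        running += count
        totals.append((period, running))
    return totals


if __name__ == "__main__":
    for period, total in cumulative_totals(EDITIONS):
        print(f"{period:<20} {total:>12,}")
```

Under that reading, the running total through 1919 is 5,116,759 editions and the total through 1960 is 10,451,818, both well under the 32 million estimate.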

At the University of Michigan, using special proprietary equipment and software, Google is able to digitize one million books per year. It will take Google about six years to copy the Library's entire collection of 7 million volumes.

The amount of storage space required to store a single digitized book is approximately 1 MB. Google currently has over 10,000 servers on-line to index the entire Web, which consists of well over 2 billion web pages. The 32 million MB (roughly 32 TB) of data storage required to hold the world's collection of books is information of a much greater magnitude than the entire Web today.
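
The storage figure is simple multiplication, and a short Python sketch makes the units explicit. It uses only the two numbers quoted in this post (32 million books, about 1 MB per book); the function names are my own, and the conversion assumes decimal units (1 TB = 1,000,000 MB).

```python
# Figures quoted in this post; both are rough estimates.
BOOKS_IN_WORLD = 32_000_000   # estimated number of books that could be scanned
MB_PER_BOOK = 1               # approximate size of one digitized book


def total_storage_mb(books, mb_per_book):
    """Total storage in megabytes for the given number of books."""
    return books * mb_per_book


def mb_to_tb(megabytes):
    """Convert megabytes to terabytes, assuming decimal units (1 TB = 1,000,000 MB)."""
    return megabytes / 1_000_000


if __name__ == "__main__":
    mb = total_storage_mb(BOOKS_IN_WORLD, MB_PER_BOOK)
    print(f"{mb:,} MB is about {mb_to_tb(mb):.0f} TB")
```

Changing MB_PER_BOOK shows how sensitive the total is to the per-book estimate, which is the least certain input here.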

It has been revealed that it costs $30.00 for the Open Content Alliance to digitize a single book (although Google's costs may be significantly lower). It may take more than 10 years and between 500 million and 1 billion dollars to digitize the world's books. (This may be a gross overestimate, since I don't have cost figures from Google, and Google is making the most aggressive effort in this area.)
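
For anyone who wants to check the cost range, here is a rough Python sketch using only numbers from this post: 32 million books and the Open Content Alliance's $30.00 per book. The function names are hypothetical, and Google's real per-book cost is unknown to me, so the lower figure is only what the 500 million dollar end of the range would imply.

```python
# Figures quoted in this post.
BOOKS_IN_WORLD = 32_000_000      # estimated books that could be scanned
OCA_COST_PER_BOOK = 30.00        # Open Content Alliance cost per book, in dollars


def digitization_cost(books, cost_per_book):
    """Total cost in dollars to digitize the given number of books."""
    return books * cost_per_book


def implied_cost_per_book(total_budget, books):
    """Per-book cost that a given total budget would imply."""
    return total_budget / books


if __name__ == "__main__":
    upper = digitization_cost(BOOKS_IN_WORLD, OCA_COST_PER_BOOK)
    print(f"At ${OCA_COST_PER_BOOK:.2f} per book: ${upper:,.0f}")

    # Low end of the range: what would each book have to cost?
    low = implied_cost_per_book(500_000_000, BOOKS_IN_WORLD)
    print(f"A $500,000,000 total implies about ${low:.2f} per book")
```

At the OCA rate the total comes to 960 million dollars, which is where the "1 billion" end of the range comes from. The 500 million figure would require a per-book cost of roughly half that.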

With these kinds of costs involved, it is understandable why it is important that Google is a part of this effort. Without taking sides one way or the other on the legal ramifications of what they are doing, clearly digitizing books is an idea whose time has come. When Stanford University made only its card catalog available electronically, within a fairly short period of time students were visiting the library twice as often and checking out twice as many books. From these kinds of observations, it is clear that the availability of on-line books will not soon make libraries obsolete, but will only offer a new way to access the information contained in books of every description.

There is a very interesting article in a recent issue of The New Yorker that puts libraries and great collections of writings in historical perspective.
Read this New Yorker article!

http://www.newyorker.com/reporting/2007/11/05/071105fa_fact_grafton

Additional information from recent sources below.
Open source (not-for-profit) efforts to digitize books:
An open-source rival to Google's book project

http://www.news.com/An-open-source-rival-to-Googles-book-project/2100-1025_3-5915690.html

New York Review of Books article

at: http://www.nybooks.com/articles/19436

New Yorker article on book digitization and copyright issues

http://www.newyorker.com/reporting/2007/02/05/070205fa_fact_toobin

I have included the full URLs so that you can cut and paste them into your browser address bar if a link does not work.

All for now. More to follow later.

John Lombaerde
Search Engine Placement by the CAD CAM Guy
