July 28, 2008
A Website Library Model: The Cancer Supportive Care Experience
Ernest H. Rosenbaum, MD and Alexandra Andrews
In "SuSE, MySQL bring navigation revolution to cancer support site," CancerSupportiveCare.com proposed a novel way of organizing websites: assigning Library of Congress call numbers to pages to address navigation problems, creating a website library.
This has been our experience over the past three and a half years. We have been asked why we did not use the Dewey Decimal Classification System. Dewey Decimal is copyrighted, and using it costs money. The Library of Congress Classification (LCC) is in the public domain. We have found that LCC can easily be used for cataloging website pages.
We are using SuSE Linux, Apache, MySQL, and PHP (LAMP) running on a FreeBSD server. First of all, a functioning database with easy queries is imperative. When writing the PHP (or other language) code, your query must include truncation, that is, matching on the root of a word. For example, the word diet occurs in diet, diets, dietitian, dieting, dieter, dietetic, DIETS, Dieter, etc. The computer treats each of these variants of diet as a separate and distinct word. If your code looks only for the lower-case word dieting, then when a web user types another form of diet into the search box, such as Dieters, Dieting, or Dietitians, the appropriate pages will not be found.
We made this mistake, thinking we could depend on the keywords box in our Advanced Search web page. It was too cumbersome for our users. The PHP code was rewritten to include word truncation, which made searching easier for users. Trim the search string (word) to its smallest unit, in this example diet, and use functions such as strtoupper() and strtolower() so that upper and lower case characters look the same to the computer. We also added an entry search page with a link to the more elaborate advanced search page. Originally, the database keywords and description fields were set to varchar (255 characters). To stop the contents of these fields from being cut off at 255 characters, we changed them to the text data type.
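The case folding and truncation described above can be sketched as follows. The site's own code is PHP with MySQL and is not reproduced here; this sketch uses Python with an in-memory SQLite database only so that it is self-contained. The suffix list, the pages table, and the function names to_stem, search, and build_demo_db are assumptions made for illustration, and the suffix stripping is deliberately crude.

```python
import sqlite3

# Hypothetical suffix list, for illustration only; longer suffixes come first.
SUFFIXES = ("itians", "itian", "etic", "ing", "ers", "er", "s")


def to_stem(term):
    """Lowercase a search term and strip one known suffix (crude truncation)."""
    word = term.strip().lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are not emptied.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(conn, term):
    """Return titles whose keywords contain the stem, ignoring case."""
    stem = to_stem(term)
    if not stem:
        return []  # an empty search box matches nothing
    rows = conn.execute(
        "SELECT title FROM pages "
        "WHERE instr(lower(keywords), ?) > 0 ORDER BY title",
        (stem,),
    ).fetchall()
    return [title for (title,) in rows]


def build_demo_db():
    """Create a tiny in-memory catalog with made-up rows.

    The keywords column is TEXT rather than a fixed-length VARCHAR,
    so a long keyword list is not cut off.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, keywords TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("Eating Well", "Diets, nutrition, Dietitian"),
            ("Staying Active", "exercise, walking"),
        ],
    )
    return conn
```

With this demo data, search(build_demo_db(), "Dieting") and search(build_demo_db(), "DIETERS") both reduce the term to diet and find the same page. A production site would probably want a fuller stemming routine than a fixed suffix list.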
Because CancerSupportiveCare.com had visitors from 173 countries in 2007, we write plain HTML (hypertext markup language) code to allow for legacy browsers, elderly machines, and dial-up connections. We also use the lynx browser plus two code validators to check each page. In February 2008 our visitors used 22 browsers, 17 operating systems, 136 screen resolutions, 56 languages, and 7 connection speeds. Remember! You have no control over what browser, operating system, screen colors, screen resolution, or machine is being used to view an external website.
When writing the front-end PHP pages that let the user access the database, have the webpage call itself. For example, a webpage named mysearch.php will have this code embedded: <form action="mysearch.php" method="POST">. This is a security precaution. Because the PHP runs on the server, someone who tries to save the mysearch.php page in order to read the PHP code and break into the database will get only an HTML webpage - no PHP code. In these days of Internet spammers, hackers, and outlaws, precaution is necessary.
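The idea of a page that calls itself can be sketched as below. This is an illustration in Python, not the site's PHP; the function name render_search_page, the form field q, and the page layout are assumptions. One function both displays the form and handles the submitted term, and the visitor only ever receives the generated HTML.

```python
from html import escape


def render_search_page(script_name, method, form):
    """Build the HTML for a search page whose form posts back to itself."""
    parts = ["<html><body>"]
    if method == "POST":
        # Escape user input before echoing it into the page.
        term = escape(form.get("q", "").strip())
        parts.append(f"<p>Results for: {term}</p>")
    # The form's action is the page's own name, so one script
    # both shows the form and processes the submission.
    parts.append(
        f'<form action="{escape(script_name)}" method="POST">'
        '<input type="text" name="q">'
        '<input type="submit" value="Search">'
        "</form>"
    )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Calling render_search_page("mysearch.php", "GET", {}) returns the empty form, while a call with "POST" and {"q": "diet"} returns the same form preceded by the result line. Escaping the echoed term is an extra precaution added in this sketch; the original text does not discuss it.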
One way to debug PHP is to use HTML. To prevent your PHP code from being displayed on an HTML page, you need to either create an .htaccess file or add PHP to your Apache httpd configuration file.
Have the table containing the search fields center itself, for instance <table border="1" width="90%" align="center">. This keeps all of the rows aligned instead of letting different sections wander off the page and create strange shapes. Or use a CSS (cascading style sheet) centered layout, but remember this code may not work with older browsers. Have the answers to the search question appear at the top of the page. No one wants to scroll down, down, down to find the information. Users want everything at eye level, or what is sometimes called above the fold.
Your site map should contain links to what we call our card catalog pages. Our site map is a flat HTML page organized using Library of Congress Classification (LCC), with descriptions similar to the card catalog model. These catalog pages contain separate subjects. With these card catalog pages our users can find a particular topic, article, or subject using the database search, the card catalogs, or both. Originally we had only a few card catalog pages, but they became too large and unwieldy. Look at the size of your page. Think of your user. Once the page becomes larger than 40,000-50,000 bytes, it is time to think of splitting the web page into two pages.
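A page-size check along these lines is easy to automate. The sketch below is an assumption about how one might do it, not a description of our tooling: the function name oversized_pages and the choice of the lower bound (40,000 bytes) as the default threshold are illustrative.

```python
from pathlib import Path

# Lower end of the 40,000-50,000 byte range mentioned in the text.
SPLIT_THRESHOLD_BYTES = 40_000


def oversized_pages(root, threshold=SPLIT_THRESHOLD_BYTES):
    """Return (relative path, size in bytes) for each .html file
    under root that is larger than the threshold, largest first."""
    root = Path(root)
    found = []
    for path in root.rglob("*.html"):
        size = path.stat().st_size
        if size > threshold:
            found.append((path.relative_to(root).as_posix(), size))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Running oversized_pages on the directory that holds the card catalog pages lists the candidates for splitting. The check looks only at the HTML file itself, not at images or other files the page loads.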
Organizing Web Pages
Some web designers depend on a directory (sometimes called channels or folders) structure. They use so-called jump-points to move from section to section. This may be acceptable for a small site with no unifying database. Because of subject overlap, this may not be the best way to organize larger websites. For instance, if you have a channel for nutrition and a folder for exercise, where do you place an article discussing both diet and exercise? Using a database-backed library schema solves this problem. As we said in Crisis on the World Wide Web: A Library Website Model, "The majority of our site visitors know how to use a library. By adhering to Library of Congress Classification standards our website is organized on a familiar system."
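One way a database can resolve the overlap is to record subjects separately from file location, so that a single page can be listed under several subjects. The sketch below assumes a simple two-table layout; the table names, paths, and rows are hypothetical and do not reflect the actual CancerSupportiveCare.com schema.

```python
import sqlite3


def build_catalog():
    """Create an in-memory catalog in which subjects are stored
    separately from the directory a page lives in."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pages (id INTEGER PRIMARY KEY, path TEXT, title TEXT);
        CREATE TABLE page_subjects (
            page_id INTEGER REFERENCES pages(id),
            subject TEXT
        );
        """
    )
    # Hypothetical pages: the first one covers two subjects.
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            (1, "nutrition/diet-and-exercise.html", "Diet and Exercise"),
            (2, "exercise/walking.html", "Walking"),
        ],
    )
    conn.executemany(
        "INSERT INTO page_subjects VALUES (?, ?)",
        [(1, "nutrition"), (1, "exercise"), (2, "exercise")],
    )
    return conn


def pages_for_subject(conn, subject):
    """Return the titles listed under a subject, whatever directory they are in."""
    rows = conn.execute(
        "SELECT p.title FROM pages p "
        "JOIN page_subjects s ON s.page_id = p.id "
        "WHERE s.subject = ? ORDER BY p.title",
        (subject,),
    ).fetchall()
    return [title for (title,) in rows]
```

Here the diet-and-exercise article sits in one directory but is returned for both the nutrition and the exercise subjects, which a pure folder structure cannot do without duplicating the file.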
We are adding several books and pamphlets to CancerSupportiveCare.com. We have come to a consensus that these should be in unique directories. Each directory includes an index page, the entire booklet in PDF (portable document format), and the booklet broken up into HTML pages. By providing HTML and/or text versions of material, we meet Web Accessibility Standards. Then, to achieve a middle ground, we list each page/section on the appropriate card catalog page using LCC.
Most of the time, adding and classifying new web material is straightforward. Some pages fall into multiple categories, such as the article Skin Cancers and Sun Exposure in the Life After - A Roadmap for Cancer Survivors directory. Does this article belong in RC262.C29 - Cancer Survivorship, or in RC280.S5 - Skin Cancer? The article includes information about preventing and diagnosing skin cancers, and about secondary skin cancers. Our solution: we list it in the card catalog page containing RC280-282 - Cancers, Neoplasms, Tumors, Oncology, while the article itself is located in the Life After - A Roadmap for Cancer Survivors index. This example demonstrates the need for website organization using card catalog pages and an accessible search, with the whole site unified by a database.
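The separation between where a page is stored and where it is cataloged can be sketched as follows. This is a minimal illustration that looks only at the class letters and whole number at the start of a call number. The file paths, the record layout, and the assignment of RC280.S5 to the skin cancer article are assumptions made for the example.

```python
import re

# Class letters and whole number at the start of a call number, e.g. "RC280.S5".
CALL_NUMBER = re.compile(r"^([A-Z]{1,3})(\d+)")


def parse_call_number(call_number):
    """Split 'RC280.S5' into ('RC', 280); raise ValueError if it does not match."""
    match = CALL_NUMBER.match(call_number.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized call number: {call_number!r}")
    return match.group(1), int(match.group(2))


def in_catalog_range(call_number, letters, low, high):
    """True if the call number falls in a range such as RC280-282."""
    cls, number = parse_call_number(call_number)
    return cls == letters and low <= number <= high


def catalog_listing(pages, letters, low, high):
    """Return sorted titles of the pages that belong on one card catalog page."""
    return sorted(
        page["title"]
        for page in pages
        if in_catalog_range(page["call_number"], letters, low, high)
    )


# Hypothetical records: paths and call number assignments are made up.
PAGES = [
    {"title": "Skin Cancers and Sun Exposure",
     "path": "lifeafter/skin-cancers.html", "call_number": "RC280.S5"},
    {"title": "Life After Treatment",
     "path": "lifeafter/index.html", "call_number": "RC262.C29"},
]
```

With these records, catalog_listing(PAGES, "RC", 280, 282) lists the skin cancer article on the RC280-282 card catalog page even though its file lives in the survivorship directory. Full LCC call numbers also carry decimals and cutter numbers, which this sketch ignores.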
Changing over to the library schema was a tedious, difficult task. It was daunting to get started and finish. But now as our website keeps expanding and we constantly add new web material, this library model has saved us time, money, and effort; and most rewarding, our users find the Cancer Supportive Care information they need.
Reprinted by permission of CancerSupportiveCare.com