Web Databases
Part 1: Understanding the basics
6/22/98
by Edward Piou
In the beginning, there were static Web pages. Some were text-only, some had nice images, but each page generally stayed the same, changing neither content nor appearance in response to users' actions. It was new. It was good. It just wasn't great.
Then came the Common Gateway Interface (CGI). CGI programs allowed the display of dynamic pages -- pages with appearance and content that changed based on information that users input. The programs could do fancy things like send email messages or save information to a file on a server. CGIs were around near the Web's beginning, but it took some time for them to become common. Still, it was getting better.
And now there are databases. Databases have been around since before the Web was born, but they weren't used much on the Web in its early days. As more tools and scripting languages are being developed to help connect databases to web sites, things are becoming much more interesting.
Databases allow you to deal with large amounts of information efficiently and easily. They can do things that static pages can't (such as serving a different advertisement each time someone views a web page). And they simplify tasks that would be difficult with CGIs alone (such as searching for and displaying items in a large online catalog).
In this guide to web databases, I'll show you how a web-connected database can make managing large amounts of information easier, allowing you to improve on the services you provide. Later this summer in a followup article, I'll help you decide which database is best for you and what tools to use to connect it to your site.
But first, let's find out exactly what a database is.
What is a database, anyway?
According to my non-technical desk dictionary, a database is
"a collection of data organized especially for rapid search and retrieval (as by a
computer)." PC Webopaedia gives a similar definition, describing a database as a "collection of
information organized in such a way that a computer program can quickly select
desired pieces of data." In fact, what we informally call a database actually consists of three parts: a database (which is the data itself), a database server, and a Database Management System (DBMS).
The database itself is the information that you've stored, or the data. Take our Anchor Web Index as an example. The Index is an annotated collection of sites about web development topics. All the information that makes up the Index -- including the annotations, category associations, and URLs -- is stored in a database. By itself, this collection of data is pretty useless. We need a database server and a DBMS to make use of it.
A database server is software that controls access to the information in the database. When you "pass" a command to the database server, the server interprets the command. It selects or modifies information in the database, depending on what you asked it to do. The database server we use for Anchor is MySQL.
A Database Management System (DBMS) is the entire set of programs that let you deal with the data in your database. The DBMS includes the database server. It may also include other programs -- CGIs, server-side scripts (ASP, PHP, Javascript, and so on), or other ways in which you pass commands to the server.
For the Anchor Web Index, our DBMS consists of the following:
Visitor use a web browser to communicate with a web server (in our case, an Apache server) to get information from a database. The web server communicates with the database server (in our case, a MySQL server) and the database server manages information for one or more databases. Several web servers can connect to the same database server, and a single web server can connect to multiple database servers.
How is a database structured?
Most DBMSs in use today, both on the Internet and off, are relational
databases that use the SQL language for queries. Structured Query Language (SQL) is a relatively simple language
which you can use to access most relational databases. Various vendors have
implemented their own extensions to SQL, but most of the language is standard regardless of what
DBMS you use. IBM released the first commercial version of SQL
in the early 1980s, and it has been adopted by almost every relational
database vendor since then.
A relational database stores information in tables. Each table stores information about a set of similar things, such as the web sites in our Index. A table contains a set of fields with information about their attributes. For example, in our Index database there is a field for "URL." This field has an attribute describing what sort of characters should be entered (numbers, letters, and so on).
The tables in a relational database are generally related in some way (which probably has something to do with the name). Tables may be related conceptually; for example, you may put them in the same database
because they deal with the same general system. In one database on this site, we have a table called
website (for use in the Index) and another called message (for use in our discussion forums). They are in the same database so that, in future, we can tie these areas together, perhaps by including users' comments about sites we include.
Here's what the website table in our Index looks like:
|
FIELD
unique_id |
TYPE
int(11) |
In this table, unique_id, url, and the other items in the left column are field names. The right column contains the attributes for each field, or what should be entered (numbers, letters, and so on). For example, for the field unique_id, the attribute int(11) specifies that numbers (integers) should be entered. Each web site we add to our Index is given a unique identifying number.
The url and several of the other fields are text strings. MySQL has an attribute
called varchar which allows for up to 255 characters to be entered (which would be expressed as varchar(255). You can set the maximum number of characters allowed; for example, if you wanted 150 to be the maximum, you could express it as varchar(150). This way, the database acts as your watchdog, allowing only the number of characters you set.
So how does it work on the Web?
Let's look at what happens when a web user requests information from a database. Again, take the example of our Web Index. If someone wants to see what sites have been added recently, they can do that by clicking on "New Additions." The page they see is formatted just like a typical static HTML page. However, it's not a typical HTML page. It's the product of a web-connected database, some PHP programs, and an HTML shell.
So what happens behind the scenes when a visitor requests "New Additions?" Here's the rundown (don't worry too much if you don't understand the code listed here; just try to understand where the data is flowing):
1. The user sends "data" to the web server.
/index/newsites.html when this happens.
2. The web server converts the data into a set of SQL queries.
When a browser requests "/index/newsites.html", the web server runs the PHP code from that page:
$query1 = "SELECT title, url, description FROM website WHERE
TO_DAYS(NOW()) - TO_DAYS(website.create_date_stamp) <= 7 ORDER BY
sort_title";
This creates a query called $query1 that will be sent to the database server. It will find sites posted in the last seven days (WHERE
TO_DAYS(NOW()) - TO_DAYS(website.create_date_stamp) <= 7) and sort them by title (ORDER BY
sort_title).
3. The web server sends the queries to the database server.
$query1) to that server:
$linker = mysql_connect ("db_server", "username", "password");
mysql_select_db ("db_name");
$result1 = mysql_query ($query1);
The first line sets up a connection between the web server and the database
server. The web server connects to the machine db_server using the username
and password entered in this line (hey, we can't tell you everything).
The second line selects the database that the query will apply to.
The third line tells the database server to execute the query and return the
MySQL result identifier (the results) as $result1. The web server
later uses the result identifier to retrieve the relevant information from the
database. If there is an error, $result1 will be set to 0.
5. The database server returns the results of the query.
6. The web server converts the query results into HTML.
7. The HTML is returned to the user.
You can view a diagram showing what this looks like: http://www.ahref.com/images/daily/199806/databases.gif
When wouldn't you use a database?
Databases can make information management a lot easier, but just because you
can use a database to solve a problem doesn't mean you should.
Here's an example of an application which might seem worth using a database for, but really doesn't need one. Each time you load a web page on Anchor, you get a random message at the top of the page. We could put these messages into a database, and set up a system for administrators to add, delete, and modify the messages through a web interface. But the messages are few enough (12) and change infrequently enough (unchanged in the past month) that it is easier to just use a server-side script to assign them to an array, and randomly choose one element of the array to display each time someone loads one of our pages.
Another example where a completely dynamic, database-driven solution would be inappropriate would be on Yahoo's index of web sites. It makes sense for them to use a DBMS to manage their index of web sites, and to allow users to search the index. But rather than use fully dynamic, database-connected pages as a browsing interface for users, they export the information from their database to static pages (an approach that we'll use for our Index as it grows). If they performed queries to their site database each time users browsed their pages, the load on their machines would increase immensely. The resources required to make multiple database queries far exceed the resources used in serving a mostly static web page.
If you want to make and distribute a web application that might be used by people who don't or can't run DBMSs for some reason, storing your information in text files might be a good idea. NetForum (http://www.medsch.wisc.edu/netforum/) is a web-based discussion forum that stores messages and configuration information in plain old text files. Because of this, the people who use the program need not take the trouble of installing, running, and administering a DBMS. This makes NetForum easier to install and use, though it also makes it harder for someone not familiar with the nuts and bolts of the program to alter it.
Finally, the more complicated you make a software system, the greater the chance that it will fail at some point. Using a DBMS as part of a web application gives you one extra component to worry about, one more thing that just might go wrong. If you're dealing with large sets of information and need a standardized way of dealing with the data, a database can definitely help. But if a system is small enough and the information is manageable through some other means, you may want to explore other options.
Edward Piou is an Anchor producer and runs ep Productions (http://www.eppi.com), a development company based in the Washington, D.C. area.
© 1998-1999 Anchor Productions, Inc. All rights reserved.