Web Databases
Part 2: Setting Up Your System
8/24/98
by Edward Piou
Part 1 of our three-part series on web databases covered database basicswhy you would use one, how data flows from database server to web browser, etc. In this article, we'll cover getting your database set up, including selecting the right tools for your needs. And in case you're wondering what we use here at Anchor, there's also some information on what our system looks like.
Keep an eye out for the third installment in this series. In Part 3 (appearing in the early fall), we'll explore one of the database systems on Anchor in more detail and show you some of the actual code behind it.
But for now, let's get started with a raging debate: whether to use Unix or Windows NT to serve your site. This choice will affect most of the other choices you make about your database system.
Unix or Windows NT?
A lot has been written about the advantages of Unix over Windows NT as a web server platform, as well as the advantages of NT over Unix. There are numerous articles which analyze the differences between the two operating systems (see the Web Index). Most of them tend to agree on a few things:
Analysts generally predict that Windows NT will catch up with Unix in these three areas as the operating system matures (much as they once forecast that Windows would eventually approach the ease of use of the Macintosh OS). For today, though, the experts agree that most flavors of Unixincluding Solaris and Linuxlead Windows NT in these areas.
So why would anyone use Windows NT instead of Unix? Despite these serious drawbacks, Windows NT might be the better choice in some cases. Security, scalability, and stability are important, but there are several other factors that should affect your decision on whether to run your database on Unix or Windows NT:
Understanding What's Right for You
Ease of use, training of personnel, your existing setup, networking requirements, cost: all of these should be part of your decisionmaking process when choosing between Unix and NT. Combined with information about the three S's (security, scalability, stability), these factors should give you a clear picture of what you need.
Ease of use
In terms of ease of use, the main difference between Unix and Windows NT is in the interface. NT has a
fairly advanced and aesthetically pleasing graphical user interface (GUI), compared to the command-line interface that is the default for Unix. Both operating systems offer alternatives, making it possible for a user to change the interface. With Windows NT, you can open a DOS command shell and issue commands from the command line when you like. And there are GUIs available for Unix that can let you perform many of the functions which used to be relegated to the command line only (KDE, FVWM, and others). At its core, though, Windows NT remains a GUI-driven operating system. Unix remains a
command-line system.
The advantage of a GUI is that most of the commands that you'll need to maintain your machine are available through menus. By pointing and clicking, you can do most of the configuration of your machine. You won't have to memorize many commands. If you forget exactly how to accomplish a particular task, you can generally meander through menus until you get to the program or configuration file which you need. You don't have to know exactly where everything is, assuming you're willing to spend some time exploring. The problem with this approach, though, is that someone else has determined what menus you have to burrow through to get your job done. You're stuck with the logic of the people who designed the system.
The advantage of a command-line interface is flexibility and speed. Once you know where everything is on your Unix machine, you can generally access it without having to navigate through a complex series of menus. You can access any possible action from the command line, regardless of what directory you are in. You don't have to spend time looking for things in menus, and you don't have to follow the logic of the people who built your OS. You can impose your own logic on your machine.
One interesting development in the Unix world (specifically in the Linux world) is the development of several independent GUIs aimed at making the system more user-friendly. The leading such GUI, KDE, offers a less complicated, more easily configured interface than you'll usually find on Unix. KDE is less likely to scare away "point-and-click" consumers, who are more likely to feel at home using Macintosh or Windows. Ease of use is one area in which the Unix world is trying to catch up with the Windows world.
Personnel training
Another thing to keep in mind is the knowledge of the people currently maintaining the
machine, as well as those who will be maintaining it in the future. If your tech
support group consists of people with years of experience in Windows NT but there is not one
experienced Unix system administrator to be found, it may make sense to
use Windows NT. If Windows NT systems administrators in your area get paid twice as
much as Unix sysadmins or are harder to find, it may make sense to go with
Unix. Of course, it's generally a good idea to have both Unix and Windows NT
administrators in your organization if possible. Otherwise, you may find
yourself unable to take advantage of useful tools that are available on only one of these two platforms.
Keep in mind that Windows NT is not "just like Windows 95." In addition to being more powerful, Windows NT has a lot of administrative issues that a Windows 95 user never has to deal with. You shouldn't buy Windows NT just because your technical people already use another version of Windows. They are very different animals, despite the surface similarities.
Your existing setup
Obviously, if you've already got a number of machines of one flavor sitting
around, it can make sense to use that platform, because the software has
already been bought and/or installed.
If you have some older computers (2 or more years old) lying around that run painfully slowly under Windows
NT, you may want to try installing Linux on them. Because it doesn't have a
GUI (and because its developers have concentrated
on making its code lean and mean), Linux needs far less powerful hardware to
get the job done. Even a 75MHz PC with 32 megabytes of memory should be sufficient to run a fairly busy web site on Linux, while it would run noticeably slower on Windows NT.
Networking requirements
It is fairly easy to set up either a Windows NT machine or a Unix box to work
on a LAN, and to serve as a file or print server for other computers. But
administering a web server and database remotely works differently, depending
on the choice of platform you make. With Unix, you can telnet in to your
server from any machine with a telnet client (which is available, free, for
all computers) and perform all the same administrative functions that you
could from the desktop.
With Windows NT, you cannot perform all those same administrative functionsfor example, you can't run those command-line options which every NT administrator eventually discovers are easier to use, and sometimes necessary to do their job. Remote administration tends to be easier for Unix machines than for Windows NT.
Cost
Price is always worth considering when making hardware and software purchases.
For example, Linux, one version of Unix that is gaining in popularity, is free. With a free
operating system and the variety of free software tools available for Linux,
your cost of software ownership could be as low as, well, nothing. Zero dollars.
Windows NT, on the other hand, costs upwards of $500. When you add to that the price of the various software tools that tend to run well on NTMicrosoft SQL Server or Oracle, Visual Basic, etc.your costs will approach several thousand dollars. In addition, the hardware needed to run Windows NT well is more expensive than the software necessary to run Linux, since you'll need faster equipment and more memory.
Commercial versions of Unix (such as Solaris) tend to cost even more than Windows NT. In addition, the software that runs on Solaris and other commercial Unix versions is generally more expensive than that for Windows NT.
What about the Macintosh?
Of course, you can also run your database on the Macintosh OS. Their
operating system is certainly much more stable with the release of MacOS 8.0 (and higher versions)
than it has been so far.
A number of good web servers are available for the Macintosh: WebStar, WebTen (Tenon's port of Apache), and others. There is also a web server bundled with FileMaker Pro, a database available on the Macintosh and other platforms. Publishing a FileMaker Pro database on the Web is very easy to do.
The main reason to run your database or web server on a Macintosh is ease of use. A Macintosh is in general easier to use, easier to configure, and easier to administer than any of the alternatives. If you or your team members only have Macintosh experience, this could be the choice for you (assuming you recognize that your Macintosh server will most likely fail more often than one on Unix or Windows NT).
For the purposes of this article, we won't be looking in-depth at any Macintosh-based databases. Until Apple releases an even more stable operating system, like Rhapsody or perhaps Mac OS X, the ease of use of a Macintosh won't outweigh the stability of Unix.
What does Anchor use?
Anchor runs on two Linux machines, the faster of which has a 120 MHz
processor, which provides all the speed we need for running Linux. The web
site is on one machine. The database is on the other.
We run Unix for the basic reasons detailed above. It offers a lot more control to the administrator, runs faster than Windows NT, and is much easier to connect to remotely. In addition, as the systems administrator for Anchor, I'm comfortable with Unix. I've used it for over 7 years, and it's my server platform of choice.
Choosing a Database
After you have decided on a platform, you'll need to choose a database to run. Important factors to consider are database cost and availability of staff who are familiar with the database, as well as speed and features.
|
|
|||
| Database | Cost on Unix | Cost on Windows NT | Comments |
| MySQL | Free | $200 | Fast; stable; lacks some advanced features |
| miniSQL | $250 | $250 | Free for non-profits and educational institutions; easier to maintain, but less stable, than MySQL |
| SQL Server | Not available | $1200 | Suitable for most Windows NT needs |
| Oracle | $1200 | $1200 | Very scalable; suitable for extremely large applications |
Free/open-source databases
There are a number of free, open-source databases available. MySQL is a
popular free database built by T.c.X, a company based in Sweden that offers
paid support for the product. According to benchmark figures available at the
MySQL website, MySQL is the fastest freeware database available for Unix
machines. This speed, combined with its great stability, low cost (zero dollars), and
variety of compatible tools, have made it very popular
among web programmers. The fact that the combination of PHP and MySQL won the
1998 Web Tools Award in the
Databases category is sure to increase the number of developers who use and
can support MySQL.
MySQL does lack two important things you'd find in most commercial databases, however. One is stored procedures. The other is transaction rollback.
Stored procedures are small programs stored in the database. These programs are made up of a string of database commands that you expect you'll need to execute more than once. For example, in the case of an ad tracker program, this might be:
ad_views_count by one
ad_views_count exceeds ad_views_bought set ad_expired to 1
ad_url and
ad_expired.
Transaction rollback also allows you to group a series of commands. In this case, however, you group them together as a series of actions which all need to succeed (or fail) together. For example, suppose you are programming an online stock exchange simulation. If Trader A decides to buy 10,000 shares of WidgetCo stock at $15 per share from Trader B, you need to make several updates to your database. These updates include removing $150,000 from Trader A's account and removing 10,000 shares from Trader B's account. If you can't remove 10,000 shares from Trader B's account (perhaps another buyer bought all of B's stock a millisecond earlier), you don't want the $150,000 to disappear from Trader A's account. Rollbacks let you tell the database not to commit the transfer of money unless the transfer of stock worked, or vice-versa.
Another potentially free database is miniSQL. This is a SQL database that is free for schools and non-profits (it costs $250 per year for other institutions). miniSQL is slower and less stable than MySQL, but is easier to install and maintain. MySQL is actually based on miniSQL. The folks at T.c.X built MySQL when they realized miniSQL wasn't providing all the functionality they wanted.
PostgreSQL is another free, open-source database system. It contains some functionality not present in MySQLfor example, transaction rollbackbut is slower.
If you're using Windows NT and you're interested in going with MySQL or miniSQL for your database project, here's the good news: both of them have been ported to NT. Now here's the bad news: you'll need to pay if you want to use MySQL for NT. It costs $200, though miniSQL is still free to non-profits and educational organizations.
Commercial databases
A number of commercial databases exist for both Windows NT and Unix. The
leaders include Oracle (for both Windows NT and various flavors of Unix), and
Microsoft SQL Server (for Windows NT). Other robust commercial databases you might want to consider using are Sybase and Informix. There are also less robust commercial
databases availablefor example, Microsoft Accesswhich can be appropriate for
low-traffic internal sites. These won't perform very well when your hits start to ramp
up, though.
These commercial databases tend to have more features than the freeware databases listed above and are usually more stable. If you are dealing with database tables that will hold millions of records, you'll most likely want a high-end commercial solution such as Oracle. Oracle databases generally have a reputation for being the most robust and scalable. They're probably the way to go if both the dataset and your company's budget is large enough. Staffing may be an additional cost if you take this route, though. There are plenty of people who specialize in Oracle, but because their services usually go to huge institutions deploying huge systems, their services tend to be more expensive.
If you're running your database on Windows NT, Microsoft SQL Server will probably work well for you. It won't handle the millions of records which Oracle can deal with, but has much of the same functionality as Oracle does, as well as speed and stability. The number of developers familiar with it is high enough to guarantee you can find someone to maintain it.
What does Anchor use?
Anchor uses MySQL. We use it because it is free, fast, stable, and (like much
open-source software) has a community of users willing to provide technical
support for each other.
Connecting to the Database
The final thing to decide is what tools you will use to connect to your database. There are a number of applications which claim to let you connect your database to the Web without programming. However, they generally don't let you customize your applications to the degree other solutions door if they do, they make you jump through some serious hoops.
In most cases, you should use the same CGI programming language or server-side scripting (SSS) system to connect to your database as you do to provide other interactive services on your web site. (An SSS system lets you place programming commands inside your HTML files, while CGI programs generally require all information to reside in separate programs, rather than inside HTML pages.) Almost all of the popular web programming languages and SSS systems can communicate with the databases we've covered so far.
On Windows NT, Active Server Pages (ASP) and Cold Fusion (an advanced, proprietary scripting system) are two good server-side scripting solutions (see our guide to ASP Basics for more information). PHP (an open-source scripting system) and EmbPerl (a scripting system that lets you embed Perl commands and programs in HTML files) work on both Unix machines and Windows NT systems. You can use Perl, C, C++, and Java on either platform to write CGIs.
On Anchor, we use PHP because it runs quickly and is easy to implement. If I were redoing the technical aspects of Anchor now, I would consider using EmbPerl instead. Because it was designed to be easy to learn and implement, PHP does not have all the capabilities of a more mature, more powerful language like Perl. We don't need anything that PHP can't do yet, but when we do, we may have to turn to Perl scripts for added functionality.
What is an API? What is ODBC? DBI?
There are a lot of buzzwords and acronyms in the world of web databases. Many we've already covered earlier in this series. A few extras you might want to know about are API, ODBC, and DBI.
API stands for Application Programming Interface. Database creators (and web server creators) write APIs to provide standardized ways for other programmers to interface with the database systems. Essentially, the API defines what commands (with what input) a program can send to a server. It also defines what sort of output the program should expect to receive back.
ODBC stands for Open Database Connectivity. It is an API that was initially developed by Microsoft to standardize how programs talk to databases, and is mostly used on Windows NT-based databases. It's useful because it allows you to write programs that don't have to be database-specific. Rather than writing your program to interface with Microsoft SQL Server's API, then rewriting the program if you move the data to an Oracle database, you can just write the program to connect to an ODBC source. Then if you move the database from one system to another, you just need to change that ODBC source (a relatively simple task).
DBI is similar to ODBC but is specific to Perl. DBI modules are available for connecting to Oracle, MySQL, MiniSQL, Sybase, ODBC, and other database servers. (If you used DBI to connect to an ODBC source, you'd have two layers of APIs between your program and your database.) By writing a Perl program using the DBI module, you can increase the portability of your code and cut down on future work. Changing from one data source to another using DBI generally consists of changing one or two lines of code per program, rather than changing each call to the database.
Recap & Resources
Hopefully, this guide will help you make decisions about the type of software you'll need to set up a web-connected database. You have several decisions ahead of you: whether to go with Unix or Windows NT, which database to choose, and how to bridge the gap between database server and web server.
These can be tough decisions, and you may still have questions once you've finished reading this guide. Feel free to post questions in our discussion forums, or to post a comment about your experience in setting up a web database.
Edward Piou is an Anchor producer and runs ep Productions (http://www.eppi.com), a development company based in the Washington, D.C. area.
© 1998-1999 Anchor Productions, Inc. All rights reserved.