Apache Server 2.0: A Beginner's Guide

Paperback

$49.00 

Overview

This work offers clear and straightforward instructions for tasks ranging from basic to technical, as well as the background information required for an administrator to understand the history of the server and how best to run it in today's Web environment. It contains information on running Apache with different platforms, including UNIX-based systems like Linux, FreeBSD, and Solaris; Windows products, including NT and Windows 2000; and Mac OS X. There are eight pages of blueprints which demonstrate the differences between Apache Server and IIS 5 and diagrams showing Apache running on different operating systems.

Product Details

ISBN-13: 9780072191837
Publisher: McGraw-Hill/Osborne Media
Publication date: 09/05/2001
Series: Network Professional's Library
Pages: 560
Product dimensions: 7.03(w) x 9.74(h) x 1.23(d)

About the Author

Kate Wrightson has been running Apache on a small network for several years, and has experience working with Linux and UNIX. She is also the co-author of WordPerfect Suite 2000 for Linux: The Official Guide and Corel Linux Starter Kit: The Official Guide.

Read an Excerpt

Excerpt from Chapter 1:

History and Background of Apache

Despite its dominance and importance today, the World Wide Web (WWW) is a relative newcomer to networked computing, having been developed only in the early 1990s. Despite its late start, the Web has become the service synonymous with "Internet" to millions of users worldwide. Whether you've been around the Internet since the early days (and remember Gopher and other pre-Web services) or you arrived on the scene after the Web had become the most popular service for Internet users, running neck and neck with electronic mail, you know people want fast and reliable access to the millions of Web pages out there.

While you can't guarantee reliable service on the user's end, you can make sure your own pages are served rapidly and your Web presence is stable, whether you're running a small Web server out of your dining room or you're part of an administrative team operating a server that offers thousands of pages for millions of daily hits. The secret to a stable Web presence is choosing the right Web server for your site: the Apache Server. Over 60 percent of sites on the Web use Apache or one of its derivatives to power their pages. In this chapter, you learn why Web administrators choose Apache, as well as what makes it so powerful and unique.

WHAT IS APACHE?

At its most basic, the Apache Server is a standards-compliant Web server. This means the Apache Server supports the requirements of the HTTP 1.1 standard, a document that defines the method by which files encoded in Hypertext Markup Language (HTML) are moved across computer networks.

TIP: HTTP is an acronym for Hypertext Transfer Protocol.

The term server means Apache responds to requests from other programs, but doesn't provide documents of its own volition. That is, when you open a Web browser, such as Netscape, and type http://www.apache.org into the text box and then press ENTER, your browser contacts the server at apache.org and requests the default page for that site. The server responds to the request with the file you want to see, which the browser then formats and displays. Figure 2-1 shows the basic process.
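The request-and-response exchange described above can be sketched in a few lines of Python. This is only an illustrative sketch, not anything from the book or from Apache itself: the function names build_get_request and parse_response are hypothetical, and the response shown is a canned example rather than real output from apache.org.

```python
from typing import Dict, Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {host}",          # HTTP/1.1 requires a Host header
        "Connection: close",      # ask the server to close the connection afterward
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# What a browser would send when asked for the default page of a site.
request = build_get_request("www.apache.org")

# A canned response, standing in for what a server might send back.
canned = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
status, headers, body = parse_response(canned)
print(status, headers["content-type"], len(body))
```

The browser's job is the first function plus the formatting and display of the body; the server's job is to produce something shaped like the canned response. No network connection is opened here, so the sketch runs anywhere.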

NOTE: These standards are maintained by the World Wide Web Consortium (W3C), a nonprofit group that works to develop standards for both HTTP and HTML. In Chapter 13, "Serving Compliant HTML," you learn more about working with standards and why they're critical to administrators and their sites.

Apache is more than a simple Web server, though. The true power behind the Apache Server lies in its modularity. The core of the server is actually quite small, serving as the central component of the program, but not providing a lot of extra functions. Those functions are added as modules, individual pieces of code that permit the server to handle a particular type of request or file in the appropriate way. Chapter 5, "Apache Modules," covers the range of available modules, while Chapter 8, "Dealing with Innovation: mod_perl, A Case Study," explains one popular module in great detail. If you plan to run Apache in any serious way, you'll find its modularity means you only need to install the functions you plan to use, without wasting machine cycles on functions you don't need.
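The idea of a small core that gains functions only through the modules you load can be illustrated with a toy Python sketch. It is an analogy under stated assumptions, not Apache's actual module interface (Apache modules are written against the server's own C API); the class TinyCore and its methods load_module and handle are hypothetical names invented for this example.

```python
from typing import Callable, Dict

# A "module" here is just a function that turns a requested path into a reply.
Handler = Callable[[str], str]


class TinyCore:
    """A deliberately small core: it only dispatches to whatever is loaded."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def load_module(self, extension: str, handler: Handler) -> None:
        """Register a handler for one file extension (hypothetical API)."""
        self._handlers[extension] = handler

    def handle(self, path: str) -> str:
        """Pass the request to the matching handler, if one was loaded."""
        _, dot, extension = path.rpartition(".")
        handler = self._handlers.get(extension if dot else "")
        if handler is None:
            return f"no module loaded for {path}"
        return handler(path)


core = TinyCore()
# Only the function that is actually needed gets installed.
core.load_module("html", lambda path: f"static file {path}")

print(core.handle("index.html"))
print(core.handle("report.cgi"))
```

Because nothing was loaded for .cgi files, the core spends no effort on them; adding that capability later means registering one more handler rather than changing the core.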

DEVELOPMENT AND HISTORY OF THE APACHE PROJECT

The Apache Server is the creation of a large group of programmers and developers who work together to build and strengthen Apache and its modules, as well as to incorporate new technologies into the server. The Apache project started in 1995 as an attempt to upgrade the original HTTP daemon (httpd) developed at the National Center for Supercomputing Applications (NCSA) by Rob McCool. McCool had left for a new job in 1994 and nobody at NCSA had taken over the project, so httpd was languishing at a time when Web programming was starting to take off.

Web administrators were working on httpd on their own, and they began to share their patches and hacks with each other in an attempt to strengthen httpd without McCool's input. Soon, eight programmers announced the formation of the Apache Group, which would serve as a central node for httpd development. They took all the patches they could find and incorporated them into httpd code, releasing the first Apache server distribution in April 1995 as version 0.6.2. Testing and writing new code occupied the Apache group (including NCSA programmers) for the remainder of 1995, and after two more beta releases, Apache 1.0 was released in December 1995. Within a year, Apache was the most popular server being used on the Web. This popularity hasn't slowed, with Apache itself now serving 60 percent of Web sites and its derivatives adding another 3 or 4 percent to that total. Apache is currently in beta for version 2.0, with the most recent stable release being 1.3.

NOTE: This book is written using both Apache 2.0 and Apache 1.3. Since the 2.0 release is still under construction and is released only as beta software, those running Web sites that require reliability may need to stay with the current stable release (Apache 1.3) until the 2.0 version is released as stable. Significant differences from 1.3 are noted in this book, but some processes given here for 2.0 may not work on 1.3 installations.

At the end of 1999, the Apache developers took a somewhat unusual step. The server had become so popular that a more bureaucratic structure was needed to manage the project and its work. So, the Apache Software Foundation was established under United States law as a fully nonprofit organization. The foundation can receive donations, distribute funds to developers or other recipients, and manage the growth of Apache in an organized manner. Perhaps even more important, the foundation is considered a separate legal entity, apart from any people involved in the project. The foundation can enter into contracts, participate in legal action, and even sue or be sued, though one hopes that will never be necessary!

OPEN SOURCE SOFTWARE

Working with Apache without learning something about the Open Source or Free Software community is nearly impossible. Apache is often touted as one of the biggest successes to come out of this community, and the project has stayed faithful to its roots as the server has become more widely used. But what is Free Software, and why is it important?

At their most general, the terms Free Software and Open Source refer to software developed by volunteers and distributed with a license that's simultaneously restrictive and open. Free Software licenses usually require the user to contribute any changes made to the program back to the development community. They also require the full code base be distributed openly, holding nothing back as a "trade secret." Many programs released under such licenses, like Apache, are also distributed free of charge.

NOTE: Free Software doesn't always mean "no cost" software. The "free" refers to the way in which the code base, and improvements to the code base, must circulate among users and developers. People in the community use the phrase "free speech, not free beer" to indicate a difference exists between sharing without restraint and sharing without payment.

The Free Software movement is the brainchild of Richard Stallman, an MIT computer scientist who spent much of the 1970s decrying the rise of commercial software that hid its code from users and administrators. Without access to the code, Stallman knew administrators would have to rely on the software companies to fix bugs and produce upgrades. These upgrades would be generic and not always useful for a particular administrator's needs. So, Stallman began working on projects that would be released freely to the computing community and has continued to do so for the last quarter-century. He also created a foundation, called the Free Software Foundation, which helps people write Free Software and get it distributed.

Many of Stallman's programs are now considered integral parts of a Unix system, which is ironic because his project name, GNU, stands for GNU's Not Unix. Stallman wasn't the only person working on such programs, though. A robust international community of programmers, hackers, and students was building an amazing array of programs. The rise of the Internet and its growing availability to people outside the military and academic networks helped with this explosion of code. However, the catalyst for truly amazing growth came when a Finnish college student, Linus Torvalds, released the first version of a new operating system called Linux.

NOTE: You'll see Unix spelled both with the capital U and in all capital letters, as in UNIX. The latter is a registered trademark, while the former has become the general way to describe UNIX-based operating systems, which may or may not contain part of the code in the AT&T copyrighted UNIX. In this book, the Unix spelling is used.

Linux was a version of an older Unix-based operating system called Minix, but it was developed and released under a GNU-derived license. One major innovation was that Linux could run on a variety of hardware, a far cry from the days when individual computers arrived with their own unique operating systems. The wide distribution of Linux meant a large user base was available to work with new programs and to generate data that would work as independent of the hardware platform as possible. With a Free and flexible operating system now available, the community exploded . . . and business began to take note.

Unfortunately (or fortunately, depending on the side you take), Stallman's insistence on the term "Free Software" wasn't the best marketing tool. Businesses weren't comfortable with the concept of "free," thinking free code might be worth exactly what was paid for it. The programs were good and competitive, but the perception was a problem. Enter Eric Raymond, a programmer active in the Free Software community who identified this problem. In his landmark essay "The Cathedral and the Bazaar," Raymond suggested the term "open source" as a replacement. Open Source would carry the same connotations of open development and the distribution of source code, but would remove any financial or moral implications from the software's description. What term you use is up to you, but you should be aware of the shadings behind each description.

NOTE: If you're interested in learning more about this community, you can find out a lot by searching the Web and by reading the writings of both Stallman and Raymond. Raymond's book, The Cathedral and the Bazaar (O'Reilly & Associates, 2000), is a collection of his most important essays, which are also available on his Web site: http://www.tuxedo.org/~esr/writings/. You can learn more about Stallman's views by reading through the GNU site at http://www.gnu.org....

Table of Contents

Acknowledgments
Introduction

Part I, Installing Apache

1 History and Background of Apache
    What Is Apache?
    Development and History of the Apache Project
    Open Source Software
    How Apache Works
    Features of Apache 2.0
    Summary
2 Preparing for Apache
    Locating and Downloading Apache
    Preparing the Web Server Machine
    Identifying and Removing Prior Servers
    Using Apache with Unix
    Upgrading from Earlier Versions of Apache
    Identifying Previous Apache Installations
    Should You Upgrade?
    Summary
3 Installing Apache
    Installing Apache from Binaries
    Installing Apache from Source Code
    Summary
4 Running a Heterogeneous Network
    Samba for Windows Users
    netatalk for Macintosh Users
    When You Run Multiple Flavors of Unix
    Summary
5 Apache Modules
    How Apache Modules Work
    The Default Modules
    Locating Modules Not Included with Basic Packages
    Installing Modules
    Summary

Part II, Configuring and Running Apache

6 Configuring and Testing Apache
    The Apache Configuration Files
    Configuring Apache for Unix
    Configuring Apache for Windows
    The apachectl Utility
    Summary
7 Managing the Apache Server
    Controlling Apache with Direct Commands
    Using apachectl
    Starting Apache Automatically At System Boot
    Defining the File System
    Summary
8 Dealing with Innovation (mod_perl: A Case Study)
    When to Use a New Idea
    Finding New Modules and Shortcuts
    The mod_perl Module
    Security Versus Innovation
    Summary

Part III, Apache Administration

9 Logs
    Apache Logs
    Finding the Logs
    How to Read Logs
    Configuring Logs
    The mod_log_config Module
    Useful Log Tricks
    Summary
10 Disk Management
    File System Management
    Disk Partitions
    Moving Content
    Disk Quotas
    File and Directory Permissions
    Summary
11 Performance Tuning
    Why Tune?
    Streamlining Your Apache Installation
    Unnecessary Modules
    Load Balancing
    Tracking Site Use
    Summary
12 Dealing with Users
    The Human Side of Administration
    Setting Quotas
    Setting Policies
    Unix User Management
    Summary
13 Serving Compliant HTML
    What Is the World Wide Web Consortium?
    HTML Standards
    Setting Appropriate Server Policies
    Summary

Part IV, Beyond the Basics: Advanced Apache Topics

14 MIME and Other Encoding
    What Is MIME?
    MIME Types and Apache Configuration
    Character Sets
    Summary
15 CGI: The Common Gateway Interface
    The Common Gateway Interface
    CGI and Apache
    Obtaining CGI Scripts
    Uses for CGI on Your Site
    CGI and Security
    Writing Your Own CGI Scripts
    Summary
16 Image Maps
    Web Navigation
    Constructing Image Maps
    Enabling Image Maps
    Serving Image Maps: mod_imap
    Maintaining Accessibility
    Summary
17 Using Apache to Save Time: SSI and CSS
    Server Side Includes
    Configuring SSI
    Working with SSI Variables
    SSI Commands
    Cascading Style Sheets
    Making Web Pages Accessible
    Summary
18 Virtual Domain Hosting
    Virtual Domains
    Should You Host Virtual Domains?
    Working with the Domain Name Server
    Configuring Virtual Domains
    Virtual Domain Services: E-Mail
    Summary
19 E-Commerce
    What Is E-Commerce, Anyway?
    Security and E-Commerce
    Adding E-Commerce Elements to Your Site
    Choosing an E-Commerce Provider
    Summary

Part V, Security and Apache

20 Basic Security Concerns
    Security Self-Evaluation
    Access
    Availability
    Resources
    Software and Practices for Secure Operation
    Summary
21 What to Do If You Get Cracked
    Noticing the Crack
    Finding and Fixing Vulnerabilities
    Preventive Measures
    Security Breach Checklists
    Summary
22 SSL: The Secure Socket Layer
    What Is SSL?
    How SSL Works with Apache
    Using SSL as a Module
    Summary
23 Firewalls and Proxies
    What Is a Firewall?
    Choosing a Firewall
    Firewall Structures
    Administering a Firewall
    What Is a Proxy?
    Choosing and Compiling a Proxy Package
    Configuring a SOCKS Proxy
    The mod_proxy Module
    Summary

Part VI, Appendices

A Internet Resources
    Web Sites
    Newsgroups
    Mailing Lists
    Getting Involved with the Apache Community
    Related Resources
B Using a Unix Text Editor
    GNU Emacs
    pico
    Summary
C Glossary
    A, B, C, D, E, F, G, H, I, L, M, N, O, P, Q, R, S, T, U, V, W, X
D Common Unix Commands
E Apache Configuration Files
    httpd-std.conf
    httpd-win.conf
    highperformance-std.conf

Index

Introduction

Everybody loves the Web. Many people think the Web is the Internet because it's the most widely advertised Internet service and the subject of much business experimentation over the past few years. Even though the Web is only one of several critical Internet services (along with e-mail, file transfer, and other useful technologies), it has certainly become a critical part of many people's daily lives and work. This is an amazing fact, but it's even more astonishing when you realize the Web is a new technology, developed and popularized within the last ten years!

While most people use the Web frequently and familiarly, far fewer are aware of the software that gets Web pages on to their monitors. Sure, everyone knows about Web browsers, but the servers that talk to the browsers and hand over the requested files are much more anonymous. However, without Web servers, no Web exists. A number of Web servers are available to the would-be Web administrator, from the complex and highly configured commercial servers sold as part of an e-commerce package to the most bare-bones and terse servers designed for test needs. Chapter 2, "Preparing for Apache," introduces some of these Web servers.

The two most popular Web servers, though, are Microsoft's Internet Information Server (IIS) and the Apache Web server. In fact, Apache is the most popular Web server in the world. It runs more than half the world's Web sites, and it performs well on rigorous benchmarking and performance tests. While IIS has the edge in some all-Microsoft networks, even the most hardcore Microsoft administrators often run Apache for their Web sites. To add to the popularity of the Apache server, you can download the software free. The source code is also openly available, meaning a constant and enthusiastic development community is building new features and functions for the server, and Apache is thoroughly tested in real-world situations and installations.

Obviously, this is a book about Apache, so you might expect me to be partial. Yes, I tend to think freely developed software has the edge on a lot of commercial software, but that's not the point with Apache. Apache is simply a better Web server than anything else out there. It's robust, streamlined, modular, responsive, and stable. That's the recipe for a darned good piece of software, which is precisely what Apache is. In this book, you find little preaching about Free Software or Open Source (though Chapter 1, "History and Background of Apache," contains an introduction to the topic, so you understand the community that created Apache).

Instead, we explore one of the two most popular and successful freely developed programs and how it can work for you. My hope is that learning more about Apache will dispel some of the myths you might believe about noncommercial software, and that you'll consider other such software for your system as well.

TIP: The other freely developed success is the Linux operating system. Both Apache and Linux come from dedicated and committed communities, which work on the projects as hobby and passion.

No matter the reason why you've chosen the Apache Web server (or the reason you chose this book), you can find something in it to challenge your skills and meet your needs. Apache is a great piece of software and I hope you share my enthusiasm for it after you finish this book. Please be aware, there are worlds beyond what's covered here. In particular, this book doesn't cover dynamic content served from databases, and it gives little room to module programming and advanced scripting. Other valuable books cover such topics. This book is an introduction and a guide to basic Apache administration, and I hope you continue to explore other topics once the basics are under your belt.

WHAT'S IN THE BOOK

This book is divided into six parts. The first three sections deal with the basic tasks involved in running Apache, while the last three introduce more extended topics and provide helpful information you can use as a resource. If you're completely new to Apache, start at the beginning and read the first two parts before you install the server, using the remaining parts to bolster your knowledge as you gain experience. If you're more experienced with Unix servers in general, you may choose to skip the first two sections (or use them as a reference) and move to the fourth and fifth sections to expand your knowledge about Web-related topics. All readers can use the appendices in Part VI as support for the rest of the book.

TIP: Two Tables of Contents appear at the beginning of the book. One is a chapter listing, while the second is expanded and contains the various subheadings of each chapter. Skim the expanded Table of Contents to learn more about the topics covered in each chapter and each part of the book.

Part I, Installing Apache, starts from the beginning, with an introduction to open software and to the Apache server itself. This part also includes practical information on preparing your machine for the Apache server, locating a recent copy of the software, and installing the server. Other chapters in this part introduce software that can help you run a network that includes more than one operating system, as well as introduce Apache's modular construction and the various modules that perform different functions for the server.

Part II, Configuring and Running Apache, is the next step after successfully installing the server. Part II contains extensive information about configuring the server to meet your particular needs, as well as help in testing your configuration and fixing any problems that might occur. Once the server is configured properly, you're ready to manage and operate the server to provide Web pages to your visitors. This part of the book concludes with an introduction to the ongoing world of Apache development, including numerous modules that provide extended features to the server.

Part III, Apache Administration, focuses on the basic tasks involved in being a Web administrator. Chapters on Apache logs, basic Unix disk management, and performance tuning can help you understand your server and site traffic, as well as keep your installation running as smoothly as possible. This part also contains a chapter on dealing with your user base and setting up appropriate user policies, plus a chapter on the HTML standard and why you should attempt to serve HTML code that's as standard-compliant as possible.

Part IV, Beyond the Basics: Advanced Apache Topics, moves to topics that are of interest to a Web administrator but aren't required to run the Apache server. This part begins with an explanation of the MIME standard and text types, including character sets, which you can serve on your site. In this part, you also find an introduction to CGI scripts, image maps, server-side includes, and cascading style sheets. These are all page design techniques, but ones that require some attention from you as the site administrator. Here, you also learn how to host virtual domains from your regular site. This section of the book concludes with an introduction to e-commerce and its complications.

Part V, Security and Apache, concludes the main part of the book. Security is an integral consideration for any Web administrator. In this section, you learn about some basic security concerns and precautions, and what to do if your site is cracked. This section also contains an introduction to Secure Sockets Layer (SSL) technology, and explains how to set up firewalls and proxies to further secure your Web server.

The final part of this book, Part VI, contains five appendices for further information. Appendix A is a list of some helpful Internet resources for the Web administrator. Appendix B offers instruction for several popular Unix text editors, which you need when you configure Apache. Appendix C is a glossary, while Appendix D contains a list of commonly used Unix commands. Finally, Appendix E contains the text of Apache's configuration files.

WHO SHOULD READ THIS BOOK

No one "ideal reader" exists for this book. Yes, the material here is targeted at the beginning to the intermediate user of Apache, but enough information is contained in the book that almost anyone should be able to find it useful. The absolute beginner with both Unix and Apache can find help in working with a new operating system, as well as with the server software, while the more experienced administrator might find Part IV or V the most useful. Regardless of your level of experience, this book should be of help to you. That said, I did make some assumptions about you, the reader, as I wrote:

  • You have more than an academic interest in running a Web server, whether you want to serve a personal noncommercial site from a home network or you're involved in administering an extensive and high-profile Web site at your workplace.
  • You have access to an always-on, high-speed Internet connection and a Pentium-level computer with sufficient RAM (or the equivalent Macintosh set-up).
  • You have, or are willing to install, a Unix variant as your operating system.
  • You know, or are willing to learn, the rudiments of working on a Unix machine.

I also assume a majority of readers are working in a professional and technological job where Web administration is either part of their job description or something they might do in the near future. While there's certainly no reason why an individual cannot run a Web site (and, in fact, this is increasingly common), most sites on the Web are served by commercial entities and not by individuals.

NOTE: Although a vast number of personal pages and sites are on the Web, most of those pages aren't hosted by the individual who owns them. Instead, the pages are served by the individual's ISP, a third-party Web hosting service, or one of the free Web hosts, like Yahoo! GeoCities or Angelfire. In that sense, the pages are hosted by a commercial entity and not by an individual.

Thus, I made an additional set of assumptions about readers who have a professional interest in running Apache:

  • You manage computer resources for a nontechnical user base.
  • You have access to multiple machines on an internal network.
  • You serve (or plan to serve) a Web site that's critical to your company's work.
  • You might not have the ultimate authority over the files served from the site.
  • You have root access to the machine that hosts the Web server.

Regardless of who you are or what you do, you can run a Web server. Special considerations will probably exist for your situation, no matter who you are, but the basic practices are the same in most scenarios. Every Web administrator should be concerned with security and with basic or advanced administrative topics. The difference between a personal site and a commercial site is simply a matter of degree and of ancillary software.

HOW TO USE THE BOOK

Books are easy to use. Just pick one up and start reading! In the case of technical books, though, some additional information can help before you get too deeply into the subject matter. As with most technical books, this one uses a certain set of conventions to indicate particular kinds of information:

A word in italics is a new or important word, which is usually defined within the next sentence. So, you might see a sentence like this: "Installing Apache on a Unix machine requires that you have root access. The root account is the administrative account under Unix, and has special privileges that normal user accounts don't have, such as installing and running server programs." Many of the italicized terms in this book are also found in Appendix C, the glossary.

URLs are shown in boldface. Be aware, though, many of the URLs in this book are fanciful and don't refer to real sites. They're used as examples. However, a number of URLs throughout the book point you to useful sites or extra information that can help you with running Apache.

Words or phrases in the Courier font are direct Unix commands, file or directory names, or full directory paths. Some Courier text is set off from the text paragraphs surrounding it, as in

Code is often shown in this format.

Lengthy text files or bits of code are usually set off like this.

Some text is printed inside a box with a shaded edge, which is called a sidebar. These sidebars contain information that adds to the chapter, but that didn't flow neatly with the main chapter text. A sidebar might explain a deeper technology topic or provide some background on a particular Apache function. You can also see special paragraphs in the text labeled Tip, Note, or Caution:

    TIP: A Tip is an extra bit of information that might interest you. Tips might contain links to more specialized Web pages, a piece of Unix or Apache history, or some other item that's interesting, though not critical to running Apache.

    NOTE: A Note is something you should know before you begin working with the subject under discussion. Notes might be configuration details, additional commands, or other information to enhance your understanding of the topic.

    CAUTION: Relatively few Cautions are in this book, but pay attention to the ones you find. A caution is a warning, whether about Apache itself, the Web, or some function on your Unix machine. Read the cautions carefully, so you can avoid the pitfalls they describe.
