
Webalizer

Contents

1. Introduction

2. Useful resources

3. Getting the package

4. Install

5. Configuration

6. DNS-lookup

7. Cron-job and logrotate

8. Squid

9. Misc





1. Introduction

This document is not intended as a complete guide to administering the Webalizer package, nor is it guaranteed to be free of errors. These are simply my experiences from setting up Webalizer.

First of all, what is Webalizer?

I quote Freshmeat.net:
The Webalizer is a Web server log analysis program. It is designed to scan web server log files in various formats and produce usage statistics in HTML format for viewing through a browser. It supports wu-ftpd xferlog-formatted logs.
I was looking for a good tool to analyze my Apache log files. I had been using some 'basic' counters like NedStat or XtremeTracker, but they were a bit unwieldy because you always had to browse to a web page. I was also searching for a tool that could produce statistics from my Squid logs. And there it was...nice and shiny...and free.

2. Useful resources

A lot of this material is collected from various other resources.
For a detailed explanation of how Webalizer works, I strongly suggest you visit these pages and read through them. Pick out the things you need and you'll see that putting it all together isn't as hard as it may seem at first sight.

Homepage of Webalizer - http://www.mrunix.net/webalizer/
Dandy's Personal Site - http://www.geocities.com/danilody/
CyberMall - http://cybrmall.net/Index.mv?parm_func=webalizer

3. Getting the package

Of course, before you can set up Webalizer you need to download the package. Point your browser to http://www.mrunix.net/webalizer/download.html and get the latest release. Before installing it, make sure you have the GD Graphics Library (1.7.3 or later) installed. If not, you can download it from http://www.boutell.com/gd/. Installing this library is straightforward: untar the archive, change into the new directory and run make.
tar zxvf gd-package.tar.gz
cd gd-package
make
make install

4. Install

First step is to untar the package.
tar zxvf webalizer-package.tar.gz
If no errors occurred, there should now be a directory with the source files. Change into this directory.
cd webalizer-package
I installed the webalizer package with DNS lookup enabled. This feature lets you look up which country (or domain) your visitors are coming from. To use DNS lookup you need to enable it when running the configure script.
./configure --enable-dns
If no major errors occur, you can now build and install the program.
make
make install
After this, both webalizer and webazolver are installed in /usr/local/bin.
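
A quick way to confirm that the install step worked is to check that both binaries exist and are executable. The following is only a sketch: it assumes the default /usr/local/bin location mentioned above, and the function name check_webalizer_install is made up for this example.

```shell
#!/bin/sh
# check_webalizer_install DIR
# Reports whether webalizer and webazolver are present and executable in DIR.
# Returns 0 when both are found, 1 otherwise.
# (Hypothetical helper, not part of the Webalizer package.)
check_webalizer_install() {
    dir=$1
    rc=0
    for name in webalizer webazolver; do
        if [ -x "$dir/$name" ]; then
            echo "found: $dir/$name"
        else
            echo "missing: $dir/$name"
            rc=1
        fi
    done
    return $rc
}

# /usr/local/bin is the install location described above.
check_webalizer_install /usr/local/bin || echo "install looks incomplete"
```

If webazolver is reported as missing, it is worth checking whether configure was really run with --enable-dns.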

A note: if you have trouble compiling webalizer (for example, you receive the error message dns_resolv.c:149: too few arguments to function), you can try the configure option --with-db=/usr/include/db1.

5. Configuration

By default, webalizer stores its configuration file in /etc/webalizer.conf. I'm not very fond of this. I agree that configuration files belong in the /etc directory, but webalizer behaves a bit oddly here: when you run it without special settings, it looks for a config file in the current directory AND in the /etc directory, which can mess things up. I prefer to keep the configuration files together with those of Apache (in my humble opinion, that's where they belong). So I moved the file to /etc/httpd/conf/ by running
mv /etc/webalizer.conf /etc/httpd/conf/
When you're running only one site, things are easy. But when you have several virtual domains on your server, it is best to create a config file for each site. You could create them now, but it's better to wait a little while: if you first adjust the global settings in one file, you can simply copy that file later for the other virtual domains (and save some valuable time).

To get the most detail out of webalizer, you need to adjust some settings in httpd.conf (the configuration file for Apache). As a global setting, or as a setting for each virtual domain, you should add this line
CustomLog /var/log/httpd/mydomain_access_log combined
Virtual domains let you host several sites on one server. This is handled by the VirtualHost settings, a 'clause' in which you can give each host its own settings, for example a separate document root. With the CustomLog directive shown above you can define a separate access log for every host. For the error logs I usually use only one log (this makes it easy to check for global errors, although I can think of several reasons to use a separate log file for these as well).
For these changes to take effect you must restart the HTTP daemon! Try accessing some of your sites and check whether the logs are being filled.
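
Checking whether the logs are being filled can be scripted when there are many virtual domains. The sketch below is one possible way to do it; the function name log_status and the second log file name are invented for this example, and the paths simply follow the CustomLog line above.

```shell
#!/bin/sh
# log_status FILE...
# Prints one line per file: "ok" when it exists and is non-empty,
# "empty" when it exists with no content, "missing" otherwise.
# (Hypothetical helper for illustration.)
log_status() {
    for log in "$@"; do
        if [ -s "$log" ]; then
            echo "ok      $log"
        elif [ -e "$log" ]; then
            echo "empty   $log"
        else
            echo "missing $log"
        fi
    done
}

# Assumed log names in the style of the CustomLog example.
log_status /var/log/httpd/mydomain_access_log /var/log/httpd/otherdomain_access_log
```

A log reported as "missing" after the restart usually means the CustomLog line did not end up in the VirtualHost you expected.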

Open the webalizer configuration file with your preferred editor. The first thing you need to change is the log-file webalizer should check. This can be done with the LogFile directive.
LogFile /var/log/httpd/mysite_access_log
Next there's the log format with LogType. I'll be covering this later on. For regular website-logs, you could use clf (which is the default).

Further on, you should specify an output directory. If you want to consult the reports through the web, you could place them in a secured area. Make sure this directory exists (webalizer will not create any directories).
OutputDir /home/http/html/secured/webalizer/mysite
When you're hosting several domains, it can be useful to have the hostname (domain name) printed at the top of the report. This is handled by
HostName mydomain.com
By default, webalizer gives a lot of output. This is useful when run from the command-line but could also cause a lot of bogus-mail when it's run from a cron-job. By using the directive
Quiet yes
there will be no output except when errors occur.

When you're really sure that everything works fine all the time (yeah...right), you could use
ReallyQuiet Yes
and have no output at all (even error messages are suppressed).

Most visits and referrals will usually come from your own site, so it is best to hide these. Hiding them doesn't mean they are not counted; they are merely not displayed in the top rankings.
HideSite *mysite.com
HideReferrer mysite.com
You can have sites that have visited you grouped by using
GroupSite *.mygroup.com
One feature I like is to have grouped domains displayed in bold and shadowed in the top-rankings.
GroupDomains 1
GroupShading yes
GroupHighlight yes
This gives you a nice visualisation of the top sites that are visiting your host.

Most of the other settings speak for themselves. The comments already included in the configuration file should be enough to get you started.
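
Once the global settings are tuned in one file, the per-site copies suggested earlier differ only in LogFile, OutputDir and HostName. The sketch below shows one way to generate such a copy; make_site_conf is a made-up helper name, and the paths in the commented usage are placeholders in the style of the examples above.

```shell
#!/bin/sh
# make_site_conf TEMPLATE SITE LOGFILE OUTPUTDIR
# Writes a per-site config to stdout: every line of TEMPLATE except active
# LogFile, OutputDir and HostName lines, followed by fresh values for them.
# Commented-out lines in the template are left untouched.
# (Hypothetical helper, not part of Webalizer.)
make_site_conf() {
    template=$1 site=$2 logfile=$3 outdir=$4
    sed -E '/^[[:space:]]*(LogFile|OutputDir|HostName)[[:space:]]/d' "$template"
    echo "LogFile $logfile"
    echo "OutputDir $outdir"
    echo "HostName $site"
}

# Example call (not executed here):
# make_site_conf /etc/httpd/conf/webalizer.conf mydomain.com \
#     /var/log/httpd/mydomain_access_log \
#     /home/http/html/secured/webalizer/mydomain \
#     > /etc/httpd/conf/webalizer/mydomain.conf
```

Remember that the output directory still has to be created by hand, since webalizer will not do that for you.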

6. DNS-lookup

As I mentioned before, it's possible to let webalizer do a reverse DNS lookup (remember that you must have enabled DNS lookup at compile time).

For this to work, webalizer needs to create a DNS database. This can be done while processing the log files or as a stand-alone process. I've configured it so that it runs while webalizer is processing the log files.

When you want to run it as a separate process, you should use webazolver. This is in fact a symlink to webalizer with some extra settings. When you run it as a separate process, you need to provide a DNS database and a log file to process.
webazolver -D dns_cache.db /var/log/httpd/access.log
The above command starts webazolver and lets it store all the dns-lookups from /var/log/httpd/access.log in the file dns_cache.db.

Another way to have IP addresses resolved is to let it happen while a log file is being processed (with webalizer). For this to work, you have to modify two settings in the configuration file. The first is the database file webalizer needs to use.
DNSCache /var/log/httpd/dns_cache.db
Secondly, you need to specify how many child processes are allowed to do the lookups. This is handled by
DNSChildren 5
When you set DNSChildren to 0, no lookups take place (even if you have provided a DNSCache file). The number of child processes may vary between 1 and 100. Keep in mind that setting it to 100 could put a huge load on system resources, so beware. The default setting is 5.
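
When deciding how many children to allow, it can help to know roughly how many distinct addresses a log contains, since each of them may need a lookup. This is my own rule of thumb rather than something from the Webalizer documentation. The sketch below assumes a log in common or combined format, where the client address is the first field; count_clients is a made-up name.

```shell
#!/bin/sh
# count_clients LOGFILE
# Prints the number of distinct values in the first field of LOGFILE,
# which is the client address in common and combined log format.
# Blank lines are ignored. (Hypothetical helper for illustration.)
count_clients() {
    awk 'NF { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

# Only run against the example log when it is actually readable.
if [ -r /var/log/httpd/access.log ]; then
    count_clients /var/log/httpd/access.log
fi
```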

7. Cron-job and logrotate

I've set up webalizer so that it runs every hour as a cron job. Because I'm managing several sites, I've created a small script in which the different log files are processed by webalizer.
Example of cron.webalizer:

#!/bin/sh
# Write a timestamp, then append each site's webalizer output to the run log.
echo "$(date '+%d-%m-%Y %H:%M') :Webalizer run" >> /var/log/httpd/webalizer.log
/usr/local/bin/webalizer -c /etc/httpd/conf/webalizer/site1.conf >> /var/log/httpd/webalizer.log
/usr/local/bin/webalizer -c /etc/httpd/conf/webalizer/site2.conf >> /var/log/httpd/webalizer.log
/usr/local/bin/webalizer -c /etc/httpd/conf/webalizer/site3.conf >> /var/log/httpd/webalizer.log
This script is called from /etc/crontab with this setting
01 * * * * web-user /etc/cron.custom/cron.webalizer
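
With many sites, listing every config file by hand gets tedious. A possible variant of the script loops over all .conf files in one directory instead. This is only a sketch: the environment overrides WEBALIZER, CONF_DIR and RUN_LOG and the function name run_all_sites are my own additions, while the default paths are the ones used above.

```shell
#!/bin/sh
# Variant of cron.webalizer that picks up every *.conf in one directory.
# WEBALIZER, CONF_DIR and RUN_LOG can be overridden from the environment
# (these variable names are assumptions made for this sketch).
WEBALIZER=${WEBALIZER:-/usr/local/bin/webalizer}
CONF_DIR=${CONF_DIR:-/etc/httpd/conf/webalizer}
RUN_LOG=${RUN_LOG:-/var/log/httpd/webalizer.log}

# Prints a timestamp line, then runs webalizer once per config file.
run_all_sites() {
    echo "$(date '+%d-%m-%Y %H:%M') :Webalizer run"
    for conf in "$CONF_DIR"/*.conf; do
        # Skip the unexpanded pattern when the directory has no .conf files.
        [ -f "$conf" ] || continue
        "$WEBALIZER" -c "$conf"
    done
}

# Only run when the config directory exists, so the sketch is harmless elsewhere.
if [ -d "$CONF_DIR" ]; then
    run_all_sites >> "$RUN_LOG"
fi
```

Adding a new virtual domain then only requires dropping its config file into the directory; the crontab entry stays the same.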
When you have created separate log files for your access logs, it's important to get them rotated (just like the 'regular' access log). The rotation of log files is handled by logrotate. Normally, in /etc/logrotate.d/ there should be a file named apache with the settings for rotating the logs. Just copy the existing access-log entry and change it to match your log file naming convention. Don't forget to add an entry for the webalizer log file as well (/var/log/httpd/webalizer.log). To check whether you've used the right syntax, you can fire up logrotate by entering
/usr/sbin/logrotate -f /etc/logrotate.conf
There's one other thing that's important when you're rotating the logs. By default, webalizer only analyzes the active log (specified in the config file). When you rotate your logs, this means that every time the analysis runs, you only get statistics for the latest log. That's not very nice, is it? To fix this, you can have webalizer do incremental processing. But beware: you need to set this option before running webalizer for the first time. Changing it for an existing report will result in data that's not fully reliable. There are two settings involved.
Incremental yes
IncrementalName /var/log/httpd/mydomain.current
The first one turns on incremental processing. The second one names the file where the temporary results will be stored.

8. Squid

You can also have webalizer analyze your Squid logs. This is a great way to get a global view of which sites have been visited most. The only thing that needs to be changed is the log file type.
LogType squid
That's all there is!

9. Misc

I had some trouble starting webazolver on a freshly installed Red Hat 7.2. After a while I found out that the problem lay in the Berkeley DB3 database system (which webalizer relies on for storing the resolved names). When you're having trouble compiling webalizer with DNS support, first open the file /etc/ld.so.conf and add a line like /usr/include/db3 (or wherever the file db_185.h is located; but beware, when you have installed db1 and db2 a copy of this file may also exist in the db1 and db2 directories, and the line NEEDS to point to the db3 directory). Next run
ldconfig -v
Now configure webalizer again with the new options (first make sure you have deleted the file config.cache).
./configure --enable-dns --with-db --with-dblib
make
make install
Copyleft 2002-2007 - cudeso.be - webmaster@cudeso.be