Sitemapper Version 1.008
========================

Description
-----------

sitemapper.pl is a simple Perl script that generates an HTML site map from a
given URL. It does this by traversing the site: getting the home page,
extracting links from it, getting all the pages linked, and so on.

The default sitemap generated is an HTML bulleted list. The first-level list
item is the home page; the next level contains all the pages linked from the
home page; the level after that contains all the pages linked from each of
these pages, and so on. If a page is linked from more than one page, it is
shown in the "highest" place in the tree it is linked from.
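The placement rule above can be sketched as a breadth-first walk. The Python
below is illustrative only: it is not taken from sitemapper.pl (which is
written in Perl), it works on a hypothetical in-memory link table instead of
fetching pages, and the names build_tree, render and links are invented for
the example. It assumes that "highest" means the shallowest depth at which a
page is first reached.

```python
from collections import deque


def build_tree(links, home):
    """Attach each page to the first (shallowest) page found linking to it."""
    children = {home: []}          # page -> pages shown directly beneath it
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in children:   # not yet placed in the tree
                children[target] = []
                children[page].append(target)
                queue.append(target)
    return children


def render(children, page, depth=0):
    """Return the tree as indented bullet lines, one per page."""
    lines = ["  " * depth + "* " + page]
    for child in children[page]:
        lines.extend(render(children, child, depth + 1))
    return lines


# Hypothetical link table: page -> pages it links to.
links = {
    "/": ["/about", "/news"],
    "/about": ["/contact", "/news"],
    "/news": ["/contact", "/"],
}

tree = build_tree(links, "/")
print("\n".join(render(tree, "/")))
```

Here /news is linked from both the home page and /about, and appears only
under the home page. /contact is linked from two pages at the same depth; the
sketch simply keeps the first one it meets, which is a choice of the example
and not a documented behaviour of the script.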

Alternative sitemap formats are:

    * a dynamic HTML version (see below) which generates a collapsible folding
      tree

    * a text version, which generates a simple formatted text file

    * an XML graph version, which prints out all the URLs and links in the site
      in an XML format

sitemapper.pl should correctly deal with framesets, client-side image maps, and
<BASE> tags. It ignores all "off site" links, i.e. all absolute URLs that do
not start with the original "base" URL of the home page.
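As a rough illustration of the "off site" rule, the Python sketch below
resolves each link against the page it was found on and then applies a plain
prefix test against the base URL. The function name is_on_site and the example
URLs are assumptions made for the sketch; the script's own Perl code may
handle details (such as <BASE> tags) differently.

```python
from urllib.parse import urljoin


def is_on_site(link, page_url, base_url):
    """True if the link, made absolute, starts with the site's base URL."""
    absolute = urljoin(page_url, link)   # resolves relative links
    return absolute.startswith(base_url)


base = "http://www.mysite.com/"
page = "http://www.mysite.com/index.html"

print(is_on_site("about.html", page, base))
print(is_on_site("http://www.other.com/", page, base))
```

The first call resolves to a URL under the base and is kept; the second is an
absolute URL on another host and would be ignored.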

Modules
-------

The sitemapper.pl distribution includes two modules that the script requires:

WWW::Sitemap
LWP::AuthenAgent

WWW::Sitemap is the module that is used to generate the sitemap structure from
which the various output formats are generated. The interface provides access
to the list of URLs for a site, and the links from each of these URLs. It also
supports a traverse method, which allows the caller to specify a callback, so
that other formats of sitemap can be generated, or other sitemap-related
functionality implemented. See the documentation of this module for more
details.
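The callback idea can be shown with a small Python sketch. It does not
reproduce the actual WWW::Sitemap interface: the traverse function, its
(url, depth) callback arguments, and the site table are all hypothetical, and
the module's own documentation remains the reference for the real method.

```python
def traverse(children, url, callback, depth=0):
    """Call callback(url, depth) for a page, then for everything beneath it."""
    callback(url, depth)
    for child in children.get(url, []):
        traverse(children, child, callback, depth + 1)


# Hypothetical sitemap structure: page -> pages shown beneath it.
site = {"/": ["/about", "/news"], "/about": ["/contact"]}

# A new output format is just a different callback; this one collects
# indented plain-text lines.
lines = []
traverse(site, "/", lambda url, depth: lines.append("    " * depth + url))
print("\n".join(lines))
```

Because the walk and the output are separated, a different callback could
count pages or emit another format without changing the traversal.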

LWP::AuthenAgent is a simple subclass of the LWP::UserAgent module, which
allows requests to be made for URLs that require authentication, by prompting
the user to type the username / password information for the relevant realm.
This information is stored in the LWP::AuthenAgent object, so that repeated
requests to the same realm can be made without re-typing the authentication
details (a bit like a web browser, in fact). Terminal echo is switched off
while the password is typed.
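The per-realm caching described above might look roughly like the following
Python sketch. It is not the LWP::AuthenAgent code: the CredentialCache class
and its method names are invented, and keying the cache on a (host, realm)
pair is an assumption. getpass is used because it reads a password without
echoing it; the demonstration swaps in canned answers so that it runs without
a terminal.

```python
import getpass


class CredentialCache:
    """Remembers one (username, password) pair per (host, realm)."""

    def __init__(self, ask_user=input, ask_password=getpass.getpass):
        self._ask_user = ask_user
        self._ask_password = ask_password   # getpass does not echo input
        self._store = {}

    def get(self, host, realm):
        key = (host, realm)
        if key not in self._store:          # prompt only on first use
            user = self._ask_user(f"Username for {realm} at {host}: ")
            password = self._ask_password("Password: ")
            self._store[key] = (user, password)
        return self._store[key]


# Demonstration with canned answers instead of real terminal prompts.
prompts = []


def fake_user(prompt):
    prompts.append(prompt)
    return "alice"


def fake_password(prompt):
    prompts.append(prompt)
    return "secret"


cache = CredentialCache(ask_user=fake_user, ask_password=fake_password)
first = cache.get("www.mysite.com", "Staff Area")
second = cache.get("www.mysite.com", "Staff Area")   # served from the cache
print(first == second, len(prompts))
```

The second lookup for the same realm returns the stored pair without asking
again, so only the two prompts from the first lookup are recorded.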

Installation
------------

Installation follows the usual Makefile.PL procedure:

> perl Makefile.PL
> make
> make test
> make install

Usage
-----

To use sitemapper.pl, just type:

./sitemapper.pl -url http://www.mysite.com/

to get output to stdout, or

./sitemapper.pl -url http://www.mysite.com/ -output mysitemap.html

to output to a file. Type

./sitemapper.pl -help 

to get full usage instructions, or

./sitemapper.pl -doc

to output the POD documentation.

Examples
--------

example.html contains an example of sitemapper.pl output for the Canon Research
Europe Ltd Perl Pages (http://www.cre.canon.co.uk/perl/), generated by running:

./sitemapper.pl -o example.html -url http://www.cre.canon.co.uk/

example.js.html contains an example of a dynamic HTML version of the site map
for the CRE site. It relies on Jef Pearlman's (jef@mit.edu) JavaScript Tree
class:

http://developer.netscape.com/docs/examples/dynhtml/tree.html

Many thanks to Jef for allowing this to be distributed with sitemapper.pl!
example.js.html is generated by running:

./sitemapper.pl -o example.js.html -url http://www.cre.canon.co.uk/ -format js

example.xml contains the output from:

./sitemapper.pl -o example.xml -url http://www.cre.canon.co.uk/ -format xml

The XML format for this file is pretty ad hoc - probably not of interest to
anyone apart from me!

Finally, a plain text version can be generated using the -format text
option; for example:

./sitemapper.pl -o example.txt -url http://www.cre.canon.co.uk/ -format text

CPAN Modules
------------

sitemapper.pl uses the following CPAN modules, which must be installed before
it will work:

WWW::Robot
HTML::Summary
Digest::MD5
Date::Format
Getopt::Long
HTML::Entities
IO::File
LWP::UserAgent
URI::URL
Term::ReadKey

See http://www.perl.com/CPAN/ for details of how to download / install these
modules.

Bugs
----

Please send any bugs / comments / suggestions to Ave.Wrigley@itn.co.uk