Offline Browsing - Browsing your hard disk

Making webpages available offline - How to save websites to your hard disk for later viewing

Why browse your hard disk

There is a lot of good stuff on the Internet. The trouble is I don’t want to spend all my online time browsing and reading. Have you ever found yourself asking “Why can’t I just download what I want and look at it in my own good time?” Well, you can. Here are some suggestions for offline browsing.

Use your browser

Modern browsers provide good support for offline browsing. Internet Explorer offers two options - webpage saving and synchronize.

To save a webpage select File-> Save As on the menu, and IE offers to save “Web page, complete” as the default file type. This will save the current webpage and all of its requisite parts to your hard disk.

Easy, but what if the current page is page 1 of a sequence? If it’s more than a few pages, it will get a bit tedious to save each page individually. This is where synchronize comes in.

The easiest way to use Synchronize is to tag existing Favorites for offline browsing. On the menu choose Favorites-> Organize Favorites. Click each favorite you want to view offline and click the “Make available offline” check box. Click the Properties button to set how much content to download. Now, before you log off, you can click Tools-> Synchronize and the selected favorites will be updated for offline viewing.

Netscape also provides a facility called Netcaster for offline browsing.

Use a batch download utility

The main problem with the offline browsing features of the browsers is that you have little control over where the downloaded material is stored, so it’s almost impossible to share your downloaded webpages with other computers. I want to be able to download pages for viewing and printing on my desktop PC and also sit in the beanbag and browse them on my notebook.

Enter the download utilities!

There is a stack of freeware and shareware batch downloaders available. These tools allow you to replicate a selected part of a website on your hard drive, which you can then browse offline.

I tried a few and found wget from GNU to be the most reliable. wget is freely available from http://ftp.gnu.org/pub/gnu/wget/ or, if Windows is your flavour, try Heiko Herold’s Windows wget spot. Note there is no longer a Mac OS X version on the Apple download site, but I have version 1.8.1 (157k) here, or try Fink for a more recent version.

Unzip this package and it’s ready to run.

wget is a command line utility with an enormous range of command line options. Make no mistake - wget is a poweruser’s tool. But that doesn’t mean ordinary plonkers like me can’t make good use of it!

Although it’s possible to set up a configuration file to save your commonly used options, (being a plonker) I find it just as easy to make a script file that calls wget with the options I want. I call mine webget, and it looks like this:

#!/bin/sh

wget -x -r -nc -k -np "$1"

(Windows users could just as easily skip the first line, write %1 in place of "$1", and call it webget.bat.)
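For comparison, here is a rough sketch of what the configuration file route might look like. The setting names are the ones I understand the wget manual to list for its wgetrc file, so check them against the documentation for your version. The function name write_sample_wgetrc and the file name sample.wgetrc are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: write a sample wgetrc to the path given as $1.
# The settings are meant to mirror the -x -r -nc -k -np options.
write_sample_wgetrc() {
    cat > "$1" <<'EOF'
# Same effect as: wget -x -r -nc -k -np
dirstruct = on
recursive = on
noclobber = on
convert_links = on
no_parent = on
EOF
}
```

To try it without touching your own settings, write the file with `write_sample_wgetrc sample.wgetrc` and point the WGETRC environment variable at it for a single run, for example `WGETRC=./sample.wgetrc wget` followed by the URL you want.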

What these options mean

-x reproduces the complete path (including the URL) of the file on your drive

-r “recursive download” - follows links on the current page (and subsequently downloaded pages) to a default depth of 5 links

-nc “no clobber” - don’t download a file if you already have it

-k converts absolute links to relative links, making offline browsing reliable

-np “no parent” - do not download any links from above the current directory

With these settings, a webpage and other pages in the same directory are downloaded into a replica of the website on your computer.
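If the short options are hard to remember, the same call can be spelled out with wget’s long option names. This is only a sketch: webget_long is a hypothetical function name, and it is worth confirming the long names with `wget --help` on your version.

```shell
#!/bin/sh

# Hypothetical long-option version of the webget script.
#   --force-directories  same as -x
#   --recursive          same as -r
#   --no-clobber         same as -nc
#   --convert-links      same as -k
#   --no-parent          same as -np
webget_long() {
    wget --force-directories --recursive --no-clobber --convert-links --no-parent "$1"
}
```

The function takes the starting URL as its only argument, just like the short script.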

An Example

Put the wget binary file and your webget script in your path, and you can retrieve a back issue of PC Update by typing the following in a terminal (or a DOS window on Windows):

webget http://www.melbpc.org.au/pcupdate/2112/index.htm

This will download index.htm and all of the other articles available in the 2112 directory to your computer, storing them in the directory www.melbpc.org.au/pcupdate/2112/.

But other files from www.melbpc.org.au/pcupdate/ or even www.melbpc.org.au will not be downloaded.
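To see where a download will land before starting it, the directory can be worked out from the URL by dropping the scheme and the file name. The sketch below assumes the simple layout described above (host name followed by the path). The function name local_dir_for is hypothetical, and it does not call wget at all.

```shell
#!/bin/sh

# Hypothetical helper: print the local directory that a URL maps to
# when the host name and path are reproduced on disk.
local_dir_for() {
    path=${1#*://}                 # drop the scheme, e.g. "http://"
    printf '%s/\n' "${path%/*}"    # drop the file name, keep the directory
}
```

For example, `local_dir_for http://www.melbpc.org.au/pcupdate/2112/index.htm` prints www.melbpc.org.au/pcupdate/2112/.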

The good thing about this approach is that I control where the webpages are downloaded to, so I can move them onto another machine if I want to. Also, the URL and path of the original file are recorded in the file structure, so I can easily return to the live website for an update or to view other related material.

Look before you leap (to downloading)

Many friendly webmasters provide a zipped copy of their website to download for local viewing. This will usually be a faster download than an automated website download.

Cautions

Take care when using any of these automated download techniques, as they can quickly get out of hand - downloading megabytes of unwanted material to your computer.

If you use wget (which I hope you do) read the documentation before changing any of the options shown here.

Otherwise, if you batch download using a browser, check the download settings.

Always make your download settings conservative. You can always go back again later if you didn’t get everything you want.
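As one illustration of conservative settings, the sketch below adds three more wget options to the webget call: -l to limit the recursion depth, -Q to set a download quota, and -w to pause between requests. The particular values (2 levels, 20 megabytes, 1 second) are arbitrary assumptions of mine, and webget_careful is a hypothetical name. Read the documentation before relying on them.

```shell
#!/bin/sh

# Hypothetical cautious variant of the webget script.
#   -l 2    follow links only 2 levels deep (instead of the default 5)
#   -Q 20m  stop fetching new files once about 20 megabytes are downloaded
#   -w 1    wait 1 second between retrievals
webget_careful() {
    wget -x -r -nc -k -np -l 2 -Q 20m -w 1 "$1"
}
```

If the result is missing pages, raise the depth or the quota and run it again. With -nc the files you already have are not downloaded a second time.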

First published: PC Update June 2002 (online version updated)