Audience Dialogue

Finding dead links

We try to check all the links on our pages every month, but web page addresses change at an alarming rate. Of the 400-odd external links on this site, about 50 change in the average month. If you clicked on one of the links above and got a "404" error message, saying that the page couldn't be found, there are four possible reasons. Here's how to detect the problem, and (usually) fix it...

The server for that page is temporarily down, or overloaded

Solution: try again later. If it's the middle of the night at the server location, the site may be down for maintenance. Try again a few hours later - then again - then again...
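
If you would rather let a script do the trying again, the idea can be sketched as below. This is only a rough illustration, not a tool we use: the function names, the three attempts and the one-hour wait are all our own assumptions. A missing response or a 5xx status is treated as "server down or overloaded", while any other answer (including a 404) is treated as final.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server can't be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # No answer at all: server down, name not found, or timed out.
        return None


def try_again_later(url, attempts=3, wait_seconds=3600,
                    fetch=fetch_status, sleep=time.sleep):
    """Retry while the server looks down (no answer, or a 5xx status).

    fetch and sleep are parameters (hypothetical, for illustration) so the
    function can be tried out without touching the network.
    """
    status = None
    for attempt in range(attempts):
        status = fetch(url)
        if status is not None and status < 500:
            return status  # a definite answer, e.g. 200 or 404
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return status
```

If the final result is still a 404 rather than a server error, waiting longer is unlikely to help, and one of the other explanations below probably applies.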

The page has disappeared permanently from the Web

That page - maybe even the whole site - has disappeared permanently from the Web. However, it's rare for a page to disappear forever, except when:
(1) a whole site disappears, and its pages usually disappear with it, or
(2) a page is overwritten (e.g. "this month's news from XYZ") - when you wanted last month's news. In this case, the link works, but the wording is totally different from what you wanted.
When a site or a page totally disappears, you can try viewing it in the Google cache, if it's been gone less than a month or so. If it's been longer, you'll need to visit the Wayback Machine from the Internet Archive.
Most of the time, though, sites stay put, and pages stay too, but change their address.
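
For a page that really has gone, the Wayback Machine lookup mentioned above can be turned into an address you can paste into your browser. The sketch below assumes the address pattern web.archive.org used when we last looked (a timestamp, or * for "list all saved copies", followed by the original address); the function name is our own invention.

```python
def wayback_url(page_url, timestamp="*"):
    """Build a Wayback Machine address for page_url.

    timestamp is a string of digits such as "2005" or "20050131"; the
    default "*" asks for the list of every saved copy. The address pattern
    is an assumption based on how web.archive.org links looked at the
    time of writing.
    """
    return "https://web.archive.org/web/{}/{}".format(timestamp, page_url)


print(wayback_url("http://www.abc.com/de/fg/hij.html"))
```

Here www.abc.com/de/fg/hij.html is just the made-up example address used elsewhere on this page.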

The URL of the page has changed

Maybe the URL (web address) of the page has changed. A well-behaved site would keep the old URL, and redirect you to the new one (as we try to do). But many sites are not so user-friendly (try to pat own back - strain a muscle).
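
When a site does redirect, the server normally answers with a status code in the 300s and a Location header giving the new address. The sketch below shows how the new address follows from those two pieces of information. The function name and the example addresses are hypothetical, and how you obtain the status and header (from your browser's tools, for instance) is left open.

```python
from urllib.parse import urljoin

# Status codes that mean "the page now lives somewhere else".
REDIRECT_CODES = {301, 302, 303, 307, 308}


def new_address(old_url, status, location):
    """Return the address a redirect points to, or None if there is no redirect."""
    if status in REDIRECT_CODES and location:
        # The Location value may be relative, so resolve it against the old address.
        return urljoin(old_url, location)
    return None


print(new_address("http://www.abc.com/de/fg/hij.html", 301, "/de/fg/xyz/hij.html"))
```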

The URL in the link was wrong

Web sites have a lot of errors, and it's surprisingly easy to copy a URL wrongly. Some software makes it very easy to omit the first or last letter when making a selection.

Solutions

These solutions are not necessarily in the order of the problems listed above...

  1. Check the URL closely
    It might seem at first that URLs are a jumble of letters and numbers, but look closely and you begin to see a structure. If the URL was wrongly given on the linking page, this might be because the http:// was missing at the beginning of the URL, or was typed imperfectly - every character must be exactly right. Also, a URL rarely contains spaces. A space character is shown in a URL as %20. If a URL doesn't work and you see a %20 in the middle of it, try removing that %20. Other common mistakes include putting / where a full stop should be (and vice versa), and omitting the last character or two (most HTML pages end in .html or .htm, and the ending doesn't vary within one website). Perhaps the commonest problem of all is spelling - web developers are not famous for their sepling abiliyt (atcually, we blame our typign, not our spellign) so if the URL contains a word that looks misspelt, try correcting the spelling. It often works.
  2. Check the site that the page belongs to
    Does the website itself still exist? To check that, try to visit its home page (the 404 message usually offers an option for this) and look for the missing page. If the error message doesn't show the URL of the home page, you can work it out easily enough by copying everything after the http:// to the first single slash into the address bar. So if the missing page is www.abc.com/de/fg/hij.html, go to www.abc.com and look for a sitemap or search page there.
    Another trick is to go up one directory level, because a common reason for pages moving is a reshuffle of directories. If, for example, the fg directory at abc.com got too big, the webmaster might create a subdirectory, and put hij.html into that. So the new URL would be something like www.abc.com/de/fg/xyz/hij.html. You probably won't be able to guess the new directory name, so see if you can go to www.abc.com/de/fg/ and with luck you'll see a sub-index. Without luck, you won't see anything - but it's worth a try. If that fails, go up another level and check out www.abc.com/de/ and you might see an index there. But the more levels you have to go up, the harder it gets to find the file.
    If all that fails, you could email the site, and ask where the page has gone (though we've found that many sites can't answer such requests - e.g. "only the webmaster would know, and he's gone mountain-climbing in Afghanistan").
  3. Use a search engine
    Or go to a large search engine. Google is currently the biggest for sites in English, and Alltheweb for other European languages. Type in what seem to be the rarest words in the link to that page (ignoring "the", "and", and so on) and you'll usually find the missing page, with a new URL, or moved to another directory. (If you don't know the rarity of words, check our word frequency list - anything not on it is either rare or new.)
    However, because the search engines only crawl the web about once a month, the new URL may not yet have registered with the search engine. That's where Google's cache is useful - for most pages you can click on the word "Cached" to see the version that Google had stored last time it checked the page. Otherwise, you'll have to wait a month or so - either for us to update the link, or the search engine to cache it.
  4. Ask an expert searcher
    If both the above options fail, there's still a lot that can be done, but it depends on the type of page you're looking for. Some large libraries now offer internet searching services, and there are some commercial searching services available. A highly skilled searcher can often find in five minutes what will take an average searcher (you?) an hour - if you find it at all. Some of these can be found in the Open Directory category of Computers > Internet > Consultants > Research.
  5. Save the page
    If you sense that a web page is really important for you, and may vanish, why not save it? The problem is, if you save it with common browser software, you either get a proprietary file format for the whole page, or the text in one file and all the images in another. And then you don't know what the original URL was. My solution: save the web page as a PDF file. It looks the same as if it were printed, with the URL (if not too long) and the date visible in the header or footer. Saving it solves half the problem, but the other half is finding it again. I found it helps to save each page with the original page title - as it appears in the title bar at the top - for this page, it would be How to find dead links on the Web, which you can probably see right now at the top of this window. If you save all the web pages you want most in a separate folder, you can use software such as Copernic or Sherlock (Mac - built in) to find the page, based on any text in it.
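
Solutions 1 and 2 above are mechanical enough to sketch in code. The function below (its name is our own, and it is only an illustration) takes an address that gives a 404 and lists other addresses worth trying: the same address with a missing http:// restored, the same address with any %20 removed, and then each directory level above the page, ending at the home page. It does not attempt spelling corrections, which still need a human eye.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(broken_url):
    """Return alternative addresses worth trying for a link that gives a 404."""
    url = broken_url.strip()
    candidates = []

    # Solution 1: restore a missing http:// at the beginning.
    if "://" not in url:
        url = "http://" + url
        candidates.append(url)

    # Solution 1: a stray %20 (an encoded space) is a common copying error.
    if "%20" in url:
        candidates.append(url.replace("%20", ""))

    # Solution 2: go up one directory level at a time, ending at the home page.
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    while segments:
        segments.pop()
        path = "/" + "/".join(segments) + ("/" if segments else "")
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))

    return candidates


for address in candidate_urls("www.abc.com/de/fg/hij.html"):
    print(address)
```

For the example address used earlier, this lists http://www.abc.com/de/fg/hij.html, then http://www.abc.com/de/fg/, http://www.abc.com/de/ and finally http://www.abc.com/.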
We looked for some detailed pages on how to find lost links. There were hundreds of pages aimed at web developers, but very few for ordinary users. The best we found was First Aid for Broken Links by J R Mooneyham, but it's not much more detailed than this.

If you're desperately trying to find a lost link on this site, contact us, and we'll try to repair it. Please let us know the page title or URL that contains the dead link, and the text (underlined in blue) for the dead link.