Web archives: Home

How to find webpages and whole websites that have disappeared, and how to save a copy of a webpage or whole website

Can't find what you're looking for?

Links break all the time.  People reorganize websites or simply remove something they've put online.

Below are some suggestions for how to get around this persistent problem.

Finding webpages and whole websites that have disappeared, or finding old versions of them

If you follow a link and don't find what you expected, first search that particular website to see whether the content has moved somewhere else. Many websites have a search function (look for a text box or magnifying glass icon), so try searching that way. Alternatively, use the site: operator on Google or other search engines to restrict your search to a particular website. For example:

library hours site:unt.edu

will search all webpages whose domain name ends in "unt.edu" and that contain both the words "library" and "hours" (that is, it searches for those words anywhere on the UNT website).
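If you need to run the same restricted search for several websites, the query above can be built programmatically. The following is only a minimal sketch: the helper name `site_search_url` is hypothetical, and the Google address pattern it uses (`https://www.google.com/search?q=...`) is an assumption; other search engines use different addresses.

```python
from urllib.parse import urlencode


def site_search_url(terms, site):
    """Build a search URL restricted to one website (hypothetical helper)."""
    # Combine the search words with the site: operator, as in the example above.
    query = f"{terms} site:{site}"
    # urlencode escapes the spaces and the colon so the query is safe in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("library hours", "unt.edu"))
```

Pasting the printed address into a browser should run the same search as typing the query into the search box.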

If that doesn't work, you might need to find a copy in a web archive. A number of organizations crawl the web and save copies of websites. You can use Time Travel to search many of these web archives at once, including the Internet Archive's Wayback Machine (the best known), looking for "mementos" (prior versions of webpages).
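A lookup like this can also be scripted. The sketch below does not use Time Travel; it queries only the Internet Archive's Wayback Machine through its availability API at `https://archive.org/wayback/available`, so it checks a single archive rather than many. The function names are hypothetical, and the response layout (`archived_snapshots`, then `closest`) is assumed to match what that API currently returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the lookup address for one webpage (hypothetical helper)."""
    params = {"url": page_url}
    if timestamp:
        # Digits such as YYYYMMDD; the archive returns the capture nearest this date.
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the address of the closest archived copy, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_memento(page_url, timestamp=None):
    """Ask the Wayback Machine for a prior version (requires network access)."""
    with urlopen(availability_query(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

For example, `find_memento("unt.edu", "20100101")` would ask for the capture closest to 1 January 2010 and return `None` if the Wayback Machine holds no copy.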

There are some additional web archives for particular types of websites:

  • If you are interested in content from the website of a US federal government agency (many of which have addresses ending in .gov or .mil), use the End of Term Web Archive to search and browse just these websites. It contains copies of US government websites captured at the end of each recent presidential term.
  • To access archives created by Texas state agencies, and to search across them, use TRAIL.
  • If you are interested in old versions of UNT websites:

Note that web archiving works by following links that appear on pages; therefore, websites with search forms that you need to use in order to reach documents on the site are generally poorly captured.
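To illustrate why this happens, here is a simplified sketch of the link-following step, written with Python's standard `html.parser` module. It is not the code of any particular archive's crawler; the class name `LinkExtractor` and the sample page are made up for the example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the links a crawler could follow; count forms it cannot fill in."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "form":
            # A crawler does not know what to type here, so anything
            # reachable only through this form is not discovered.
            self.forms += 1


# Made-up page: two ordinary links and one search form.
SAMPLE_PAGE = """
<a href="/about.html">About</a>
<a href="/reports/2019.pdf">2019 report</a>
<form action="/search"><input name="q"><input type="submit"></form>
"""

extractor = LinkExtractor()
extractor.feed(SAMPLE_PAGE)
print(extractor.links)   # the two linked pages
print(extractor.forms)   # one form whose results stay hidden
```

The two linked pages would be queued for capture, while documents that appear only in the search form's results would never be requested.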

Saving a copy of a webpage or whole website

  • WebCite is a membership-supported organization that allows an author or editor to take a snapshot of a Web resource cited within an article and cite that snapshot.
  • Perma.cc is similar but dedicated just to legal literature.
  • Webrecorder captures your interactions with web pages and lets you save the captures privately or publicly.
  • You can have the Internet Archive's Wayback Machine take a snapshot on demand (see instructions).
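For the Wayback Machine option, a snapshot request can be sketched in a few lines. This assumes the "Save Page Now" address pattern `https://web.archive.org/save/` followed by the page's address, and that anonymous requests are accepted; the service may limit or refuse them, so treat this as an illustration rather than a dependable tool. The function names are hypothetical.

```python
from urllib.request import urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Build the address that asks the Wayback Machine to capture a page."""
    # Assumed pattern: the page's full address is appended to the endpoint.
    return SAVE_ENDPOINT + page_url


def request_snapshot(page_url):
    """Send the capture request and return the HTTP status (requires network access)."""
    with urlopen(save_page_now_url(page_url), timeout=120) as reply:
        return reply.status
```

Calling `request_snapshot("https://www.unt.edu/")` would send the request; a status of 200 suggests the request was accepted, but check the Wayback Machine afterwards to confirm that the copy exists.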

If you are not able to capture a webpage using one of these tools, or if you need to make a private, portable copy, consider using a tool such as FireShot to capture a webpage as a PDF or image file.

Archiving webpages or social media

The UNT Libraries can crawl the Web to collect news stories, social media posts, or other webpages related to certain topics, gathering the data for study by researchers. (For example, see a “Yes All Women” Twitter Dataset.) To request creation of a Web dataset, or if you have any questions about web archiving activities at UNT Libraries, contact Mark Phillips.

Perhaps, though, you don't need to build your own dataset. Browse or search openly shared web-archive datasets on the Archive-It website.

Alternatively, you can scrape the web on your own using a tool such as Web Scraper.

Analyzing your data

Sharing your data

While copyright, licensing restrictions, or IRB restrictions will likely prevent you from sharing the collection of documents that you study, you can still share your list of sources, codebook, scripts, and other data that would allow another researcher to replicate your findings. The UNT Libraries can help you: see our information on research data management.

Librarian

Kevin Hawkins
Contact:
085 Willis Library
+1 940 565 2015
