
Web archives

How to find webpages and whole websites that have disappeared, and how to save a copy of a webpage or whole website

Can't find what you're looking for?

Links break all the time.  People reorganize websites or simply remove something they've put online.

Below are some suggestions for how to get around this persistent problem.

Finding webpages and whole websites that have disappeared, or finding old versions of them

If you follow a link and don't find what you expected, first search that particular website to see whether the content has moved somewhere else. Many websites have a search function (look for a text box or magnifying glass icon). Alternatively, use the site: operator on Google or other search engines to restrict your search to a particular website. For example:

library hours site:unt.edu

will search all webpages on the "unt.edu" domain, including its subdomains, that contain both the words "library" and "hours" (that is, searching for those words anywhere on the UNT website).
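If you run many such searches, the query can be assembled programmatically. The following is a minimal Python sketch, assuming the search engine accepts the query in a `q` URL parameter (as Google's search page does); the function name `site_search_url` is hypothetical and used only for illustration.

```python
from urllib.parse import urlencode


def site_search_url(terms, domain, base="https://www.google.com/search"):
    """Build a search URL that restricts results to one website.

    `base` is an assumption: any search engine that reads the query
    from a `q` parameter and supports the site: operator would work.
    """
    query = f"{terms} site:{domain}"
    # urlencode escapes the spaces and the colon so the URL is valid.
    return f"{base}?{urlencode({'q': query})}"


# Same search as the example above: "library hours" on unt.edu.
print(site_search_url("library hours", "unt.edu"))
```

Opening the printed address in a browser should run the same search as typing the query by hand.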

If that doesn't work, you might need to find a copy in a web archive. A number of organizations crawl the web and save copies of websites. You can use Time Travel to search many of these web archives at once, including the Internet Archive's Wayback Machine (the best known), looking for "mementos" (prior versions of webpages).
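A lookup like this can also be scripted. The sketch below is one possible approach in Python, and it queries only the Wayback Machine (not Time Travel) through the Internet Archive's availability endpoint at archive.org/wayback/available. The helper names are hypothetical, and the response layout assumed here (an `archived_snapshots` object holding a `closest` entry) should be checked against the Internet Archive's current documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is YYYYMMDD (optionally with hhmmss)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a decoded reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def find_memento(page_url, timestamp=None):
    """Ask the Wayback Machine for the capture closest to the timestamp."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

For example, `find_memento("unt.edu", "20100101")` would return the address and timestamp of the capture nearest to January 1, 2010, or `None` if the Wayback Machine holds no copy.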

There are some additional web archives for particular types of websites:

  • If you are interested in content from the website of a US federal government agency (many of which have addresses ending in .gov or .mil), use the End of Term Web Archive to search and browse just these websites. It contains copies of US government websites captured at the end of each recent presidential term. For a federal agency or commission that reached the end of its charge or was significantly changed, or for content from before 2008, try the CyberCemetery.
  • To access archives created by Texas state agencies, and to search across them, use TRAIL.
  • If you are interested in old versions of UNT websites:

Note that web archiving works by following links that appear on pages; therefore, websites with search forms that you need to use in order to reach documents on the site are generally poorly captured.
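To illustrate why this happens, here is a simplified Python sketch of the link-discovery step of a crawler. It is not the code of any particular web archive; the class name `LinkCollector` and the sample page are made up for illustration. The collector queues every address found in an ordinary link, but a search form only gives it an address to submit to, with no way of knowing what to type.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the addresses a simple crawler could follow from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # pages reachable by following <a href="..."> links
        self.forms = []  # form targets: seen, but not followable without input

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "form":
            self.forms.append(urljoin(self.base_url, attributes.get("action") or ""))


# Hypothetical page with two ordinary links and one search form.
SAMPLE_PAGE = """
<a href="/about">About</a>
<a href="reports/2019.pdf">2019 report</a>
<form action="/search"><input name="q"></form>
"""

collector = LinkCollector("https://example.org/docs/")
collector.feed(SAMPLE_PAGE)
print(collector.links)
print(collector.forms)
```

Documents that can only be reached by submitting the form never appear in `links`, so a crawler built this way never visits them.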

Saving a copy of a webpage or whole website

  • WebCite is a membership-supported organization that allows an author or editor to take a snapshot of a Web resource cited within an article and cite that snapshot.
  • Perma.cc is similar but dedicated just to legal literature.
  • Webrecorder captures your interactions with web pages and lets you save privately or publicly.
  • You can have the Internet Archive's Wayback Machine take a snapshot on demand (see instructions).

If you are not able to capture a webpage using one of these tools, or if you need to make a private, portable copy, consider using a tool such as FireShot to capture a webpage as a PDF or image file.
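The on-demand Wayback Machine snapshot mentioned in the list above can also be requested from a script. The following Python sketch assumes that fetching an address of the form https://web.archive.org/save/ followed by the page's URL triggers a capture, which is how the Wayback Machine's "Save Page Now" feature has worked for anonymous requests. The function names are hypothetical, and rate limits or login requirements may apply.

```python
from urllib.request import Request, urlopen

SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Build the address that asks the Wayback Machine to capture a page."""
    return SAVE_PREFIX + page_url


def request_snapshot(page_url):
    """Request a capture and return the address the service ends up at.

    Assumption: on success the service responds with, or redirects to,
    the archived copy. Check the result in a browser before citing it.
    """
    # The User-Agent value is a placeholder; identify your own script here.
    request = Request(save_page_now_url(page_url),
                      headers={"User-Agent": "snapshot-sketch/0.1"})
    with urlopen(request, timeout=120) as reply:
        return reply.geturl()
```

Calling `request_snapshot("https://example.org/page")` sends a single capture request. Captures can take a while, so a generous timeout is used.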

Building web or social-media archives (datasets of webpages or posts for study)

Twitter provides a special way to gain access to its archive of tweets for academic research.

YouTube has also begun offering API access through its YouTube Researcher Program.

The UNT Libraries can crawl the Web to collect news stories, social media posts, or other webpages related to certain topics, gathering the data for study by researchers. (For example, see a “Yes All Women” Twitter Dataset.) To request creation of a Web dataset, or if you have any questions about web archiving activities at UNT Libraries, contact Mark Phillips.

Alternatively, you can scrape the web on your own using a tool such as Web Scraper.

Perhaps, though, you don't need to build your own.  Some have been created by others and made available:

Analyzing your data

Sharing your dataset

While copyright or licensing restrictions will likely prevent you from sharing the collection of documents that you study, you can still share your list of sources, codebook, scripts, and other data that would allow another researcher to replicate your findings. The UNT Libraries can help you: see our information on research data management.
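One simple way to share a list of sources is a CSV file with one row per document. The Python sketch below shows the idea. The column names (`source_url`, `capture_date`, `archive_url`) and the sample row are hypothetical, not a required format; adapt them to your own codebook.

```python
import csv
import io

# Hypothetical columns for a shareable list of sources.
FIELDS = ["source_url", "capture_date", "archive_url"]


def write_manifest(records, handle):
    """Write one row per source so others can locate the same documents."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Placeholder record for illustration only.
records = [
    {
        "source_url": "https://example.org/",
        "capture_date": "2020-01-15",
        "archive_url": "https://web.archive.org/web/2020/https://example.org/",
    }
]

buffer = io.StringIO()
write_manifest(records, buffer)
print(buffer.getvalue())
```

To write a file instead of printing, pass a handle opened with `open("sources.csv", "w", newline="")` in place of `buffer`.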
