Digital Humanities - Introduction: Website Archiving

Introducing Digital Humanities methods, practices and support at Exeter

Why do websites need archiving?

As web technologies continually develop, websites require increasing time and effort to maintain and to keep running securely as they age. Older websites that are not maintained can become unusable, or can pose risks to users, editors or the host institution.
Archiving older websites ensures that their contents and interfaces remain accessible in the long term, and that their data is stored safely for future use.
A limited time frame for keeping a site live (usually 5 years), followed by an appropriate archiving process, is a standard part of data management for projects funded by public grants. Our website archiving processes follow guidance from the UK Web Archive and National Archives.

How can I view the archived version of a website?

Our archived websites are available in two different versions:

Wayback Machine snapshot

At the point of archiving, the website is submitted to the Wayback Machine, which uses its own capture method to create a snapshot of the website. These snapshots are stored within the Wayback Machine, and can be viewed online using the built-in viewer. Wayback Machine snapshots are easy to view quickly, but cannot be downloaded for offline viewing. As these snapshots are not created by the Digital Humanities Team, we cannot control the capture process, and it is possible that elements may be missing from the snapshot.
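Wayback Machine snapshots are addressed by a documented replay URL pattern, https://web.archive.org/web/&lt;timestamp&gt;/&lt;url&gt;, where the timestamp is the 14-digit capture time. As a minimal sketch of building such a URL (the helper name is our own, not part of any official API):

```python
# Build a Wayback Machine replay URL from a capture timestamp and the
# original URL. The pattern (web.archive.org/web/<timestamp>/<url>) is
# the one the Wayback Machine's own viewer uses; the function name is
# ours, introduced only for illustration.

def wayback_replay_url(timestamp: str, original_url: str) -> str:
    """timestamp is a 14-digit YYYYMMDDhhmmss capture time."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("expected a 14-digit YYYYMMDDhhmmss timestamp")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

print(wayback_replay_url("20240101000000", "https://example.org/"))
```

Opening such a URL in a browser replays the snapshot in the Wayback Machine's built-in viewer.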

Full website archive (available from various repositories)

This is a captured version of the website which has been created and checked by the Digital Humanities Team and uploaded to various repositories for secure storage. These archived websites are usually stored as a single .warc or .wacz file, and are available for download from Open Research Exeter (the University of Exeter's institutional repository), GitHub and Zenodo.
To view these files, you will need a free viewer such as ReplayWeb.page, which allows you to browse the archived version much like the original website.
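A .wacz file is, by specification, an ordinary ZIP package that bundles the WARC data (typically under an archive/ directory) with indexes and metadata, so its contents can be inspected with standard ZIP tooling. A minimal Python sketch, with the function name our own invention:

```python
# List the entries inside a .wacz web archive package. WACZ files are
# ZIP-based by specification, so Python's standard-library zipfile
# module can open them directly. The function name is ours, used only
# for illustration.
import zipfile

def list_wacz_entries(path: str) -> list[str]:
    """Return the file names stored inside a .wacz package."""
    with zipfile.ZipFile(path) as z:
        return z.namelist()
```

Note that this only inspects the package; actually replaying the archived site in a browser still requires a viewer such as ReplayWeb.page.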

Is the archived copy an exact copy of the original website?

The archived versions capture as much of the content, structure and appearance of the original live website as possible. However, the processes and tools available for copying sites are not always able to capture the full content of each website. Search features and interactive elements such as embedded maps are particularly difficult to capture and may not be available in archived versions.

The archived versions available through the Wayback Machine are captured using different tools from those used by the Digital Humanities Team, so the two versions may differ slightly.

Where specific content or data could not be captured as part of the site, it is sometimes possible to archive it as a dataset alongside the website files. Where available, these datasets are included in the ORE, Zenodo and/or GitHub repositories alongside the web archive file.

How long will the archived versions be available?

We have deposited archived copies in our institutional repository and in trusted third-party repositories; please see each repository's retention policy for details.

Contact us

For more help, feel free to contact the Digital Humanities team.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License