Digital Humanities - Introduction: Website Archiving

Introducing Digital Humanities methods, practices and support at Exeter

Why do websites need archiving?

As web technologies continually develop, websites require increasing time and effort to maintain and to keep running securely as they age. Older websites that are not maintained can become unusable, or can pose risks to users, editors or the host institution.
Archiving older websites ensures that their contents and interfaces remain accessible in the long term, and that their data is stored safely for future use.
A limited time frame for keeping a site live (usually 5 years), followed by an appropriate archiving process, is a standard part of data management for projects funded by public grants. Our website archiving processes follow guidance from the UK Web Archive and National Archives.

How can I view the archived version of a website?

Our archived websites are available in two different versions:

Wayback Machine snapshot

At the point of archiving, the website is submitted to the Wayback Machine, which uses its own capture method to create a snapshot of the website. These snapshots are stored within the Wayback Machine, and can be viewed online using the built-in viewer. Wayback Machine snapshots are easy to view quickly, but cannot be downloaded for offline viewing. As these snapshots are not created by the Digital Humanities Team, we cannot control the capture process, and it is possible that elements may be missing from the snapshot.
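Wayback Machine snapshots are addressed by a documented replay URL pattern, https://web.archive.org/web/&lt;timestamp&gt;/&lt;url&gt;, where the timestamp is the 14-digit capture time. As a minimal sketch of building such a URL (the helper name is our own, not part of any official API):

```python
# Build a Wayback Machine replay URL from a capture timestamp and the
# original URL. The pattern (web.archive.org/web/<timestamp>/<url>) is
# the one the Wayback Machine's own viewer uses; the function name is
# ours, introduced only for illustration.

def wayback_replay_url(timestamp: str, original_url: str) -> str:
    """timestamp is a 14-digit YYYYMMDDhhmmss capture time."""
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("expected a 14-digit YYYYMMDDhhmmss timestamp")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

print(wayback_replay_url("20240101000000", "https://example.org/"))
```

Opening such a URL in a browser replays the snapshot in the Wayback Machine's built-in viewer.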

Full website archive (available from various repositories)

This is a captured version of the website which has been created and checked by the Digital Humanities Team and uploaded to various repositories for secure storage. These archived websites are usually stored as a single .warc or .wacz file, and are available for download from Open Research Exeter (the University of Exeter's institutional repository), GitHub and Zenodo.
To view these files, you will need a free viewer such as ReplayWeb.page, which allows you to browse the archived version much like the original website.
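A .wacz file is, by specification, an ordinary ZIP package that bundles the WARC data (typically under an archive/ directory) with indexes and metadata, so its contents can be inspected with standard ZIP tooling. A minimal Python sketch, with the function name our own invention:

```python
# List the entries inside a .wacz web archive package. WACZ files are
# ZIP-based by specification, so Python's standard-library zipfile
# module can open them directly. The function name is ours, used only
# for illustration.
import zipfile

def list_wacz_entries(path: str) -> list[str]:
    """Return the file names stored inside a .wacz package."""
    with zipfile.ZipFile(path) as z:
        return z.namelist()
```

Note that this only inspects the package; actually replaying the archived site in a browser still requires a viewer such as ReplayWeb.page.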

Is the archived copy an exact copy of the original website?

The archived versions capture as much of the content, structure and appearance of the original live website as possible. However, the processes and tools available for copying sites are not always able to capture the full content of each website. Search features and interactive elements such as embedded maps are particularly difficult to capture and may not be available in archived versions.

The archived versions available through the Wayback Machine are captured using different tools from those used by the Digital Humanities Team, so the two versions may differ slightly.

Where specific content or data could not be captured as part of the site, it is sometimes possible to archive it as a dataset alongside the website files. Where available, these datasets are included in the ORE, Zenodo and/or GitHub repositories alongside the web archive file.

How long will the archived versions be available?

We have deposited archived copies in our institutional repository and in trusted third-party repositories; please see each repository's retention policy for details.

Contact us

For more help, feel free to contact the Digital Humanities team.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License