Digital Humanities - Introduction: TEI Text Encoding

Introducing Digital Humanities methods, practices and support at Exeter

What are XML and TEI?

XML stands for eXtensible Markup Language. A markup language is a type of computer code in which tags are used to store information about a text. It doesn't actually do anything in itself; rather, it is descriptive. XML is designed to record information about what something means, rather than just how it appears (which is largely what HTML is used for).

 

TEI stands for the Text Encoding Initiative. The TEI website states that the project exists to create "a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics" (https://tei-c.org/). It is a set of standards and guidelines that are applied to XML to make texts uniform in their structure, meaning they can be compiled and used with other texts. The data it encodes describes the features of texts, such as their content and the physical properties of the objects they are written on.

 

TEI and XML are, therefore, intrinsically related, as TEI is a set of guidelines for writing XML. It is perhaps best to think of TEI as a dialect of XML, in which the language is given a number of specific rules and meanings because it is used in a specific context.
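To make the "dialect of XML" idea concrete, here is a minimal sketch of a TEI document read with Python's standard library. The document content is invented for illustration; the element names and the namespace are those used by TEI, but a real project would normally record far more in the header.

```python
import xml.etree.ElementTree as ET

# The namespace that marks an XML document as TEI.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# An invented, minimal TEI document: a header describing the file,
# followed by the text itself.
MINIMAL_TEI = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample text</title></titleStmt>
      <publicationStmt><p>Unpublished teaching example.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Hello, TEI.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(MINIMAL_TEI)

# ElementTree reports each tag together with its namespace.
print(root.tag)

# Look up the title recorded in the header.
title = root.find(f".//{{{TEI_NS}}}title")
print(title.text)
```

Any XML tool can read this file because it is ordinary XML; the TEI namespace and element names are what give the tags an agreed meaning.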

TEI is a standard form of encoding texts that is used across the online study of the Humanities. You have probably come into contact with it without even knowing! Because it is a standard, it is used by encoders across multiple fields, so the data attached to texts can be read by all. As a result, TEI is used as the basis for many search functions within databases you use all the time, such as JSTOR, EEBO and our own library databases. By understanding how TEI works and gives texts meaning, you will be able to use enhanced search functions and search more effectively.

 

It is important to remember that TEI is also a form of editing. Choosing what information to encode and what to leave out will change the way that people can interact with texts. Sometimes students fall into the trap of just using the first online source that they find. Understanding TEI will give you the tools to evaluate sources and make sure that texts are used appropriately and critically. 

 

To see how TEI and XML are integrated into the search functions of databases, click on these links to their websites:

 

JSTOR

EEBO

Projects that utilise TEI

Famine and Dearth Archive

Famine and Dearth in India and Britain was created at the University of Exeter and uses TEI to study food security in early modern India and Britain through hundreds of source texts in several languages, and to explore one traveller’s vivid and moving descriptions of his experiences.

Cotton Famine Archive

Poetry of the Lancashire Cotton Famine is a project created here at the University of Exeter that uses TEI to create a searchable database of political poems from the Lancashire Cotton Famine, so that we can research the people, places and themes that were important to these poets.

Darwin Correspondence

The Darwin Correspondence Project is run by the University of Cambridge and uses TEI to allow readers to study the letters that Charles Darwin wrote, to find out more about his work and interests and about the people he corresponded with.


 

Jane Austen’s Fiction Manuscripts project is a collaboration between many groups using TEI to extract details of the author’s complex and often very messy original manuscripts of her novels, allowing us to explore her writing process.

Where can I go to find out more?

This course only covers the very basics of TEI. If you are looking to get more involved, here are some resources and projects that you might like to consider:

 

A Gentle Introduction to TEI


If you want to explore TEI further, this guide, created by the TEI Consortium, is a comprehensive explanation of every aspect of encoding in TEI. It is a dense read, but necessary and interesting if you want to explore the range of applications for TEI in an academic context or if you are going to be encoding something for a project.

 

TEI By Example


TEI By Example is a great website that allows you to build on the basic skills you have learnt here and apply them to your field of interest. There are sections that focus on poetry, plays and prose, as well as explorations of the mechanics and process of editing, and the importance of the TEI header.

 

Transcribe Bentham

The Transcribe Bentham project seeks out volunteer coders (such as yourself) to transcribe and encode the works of the philosopher Jeremy Bentham. This is a great opportunity to use your newly gained skills in a research environment! We recommend spending some time familiarising yourself with the guidelines, and having them open next to you as you encode to make sure you don't miss anything. All of the work will be checked by a professional, so it is a great environment in which to develop your skills and participate in active research.

 

GitHub TEI Guidelines


The GitHub TEI forums are a great place to keep up to date with current discussions surrounding TEI, and can be used as a resource for learning more about the community and the process by which the guidelines are created. The TEI community is very friendly and approachable, so spend time exploring threads and feel free to ask any questions that haven't been asked before.

TEI's Basic Structure

To explain how to use TEI, it is important to explain how it is put together. A TEI document is composed of a number of elements. These elements are containers for information, used to break a text down into its constituent parts and label them. This adds new data to the text, allowing computers to 'understand' things that may be obvious to human readers.

These elements include, but are not limited to:

Book 

Poem

Page

Paragraph

Sentence

Location

Name

Punctuation

 

These elements can then be given attributes. Attributes give a specific meaning to an element, for example a specific page number, a chapter name, or the name of a person or location.
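As a rough illustration of elements and attributes, the sketch below parses a small invented snippet. The element names (div, pb, p, persName, placeName) and attribute names (type, n, ref) are taken from TEI, but the content and the "#msmith" identifier are hypothetical, and the TEI namespace is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# An invented chapter: <div>, <pb/> and <persName> are elements;
# type="...", n="..." and ref="..." are attributes attached to them.
snippet = """<div type="chapter" n="1">
  <pb n="12"/>
  <p><persName ref="#msmith">Mary Smith</persName> lived in
     <placeName>Exeter</placeName>.</p>
</div>"""

chapter = ET.fromstring(snippet)

# The element says "this is a division"; its attributes say which kind and which number.
print(chapter.tag, chapter.attrib)

# The page break element carries the page number as an attribute.
page_break = chapter.find("pb")
print(page_break.get("n"))

# The name element wraps the text of the name; the attribute points to who is meant.
person = chapter.find(".//persName")
print(person.text, person.get("ref"))
```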

 

By giving computers the ability to understand these distinctions within a text, and giving them this specific knowledge, it becomes possible to search a text for all locations, people or page numbers, or for one specific location, person or page number. In doing this, however, it is vital to make sure that elements don't overlap with each other, and that each is contained separately within other elements, so that computers can read the texts properly.
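The difference between searching for a kind of thing and searching for one specific thing can be sketched as follows. The text and names are invented, and the TEI namespace is again omitted for brevity, so treat this as an illustration rather than how any particular database is built.

```python
import xml.etree.ElementTree as ET

# An invented two-page text with people, places and page breaks marked up.
text = ET.fromstring("""<body>
  <p><pb n="1"/><persName>Mary Smith</persName> walked from
     <placeName>Exeter</placeName> to <placeName>Plymouth</placeName>.</p>
  <p><pb n="2"/>She returned to <placeName>Exeter</placeName> a week later.</p>
</body>""")

# Search for every location mentioned, whatever it is.
all_places = [el.text for el in text.iter("placeName")]
print(all_places)

# Search for one specific location.
exeter_mentions = [el for el in text.iter("placeName") if el.text == "Exeter"]
print(len(exeter_mentions))

# Search for one specific page number, using the attribute on the page break.
page_two = text.find(".//pb[@n='2']")
print(page_two is not None)
```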

 

There are several ways to describe this structure; most refer to it as a 'nesting structure'. Perhaps the most helpful way to think of the structure is to imagine elements as a series of boxes inside boxes, or even Russian dolls - one feature must fit inside another, and so on until the entire text is contained within one (very large) box. You cannot have a box which starts in one box and ends in another, but you can have several boxes inside the same box (as long as none of them are tangled up). You can also have a specific box with something written on the outside. This is effectively what an attribute does.
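The box analogy can be tried out directly. In this small sketch (the sentences and the can_parse helper are made up for illustration), a parser accepts boxes that sit neatly inside one another and rejects a box that starts in one box and ends in another.

```python
import xml.etree.ElementTree as ET


def can_parse(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# <name> opens and closes inside <s>, which opens and closes inside <p>.
nested = "<p><s>One <name>box</name> inside another.</s></p>"

# <name> opens inside <s> but closes after <s> has ended: the boxes are tangled.
overlapping = "<p><s>One <name>box</s> crossing another.</name></p>"

print(can_parse(nested))
print(can_parse(overlapping))
```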

Does TEI XML have any rules?

Yes, just a few. Here they are:

  • All open (non-empty) elements must close within their parent element
  • A single root element must contain the whole of the rest of the document
  • No overlapping elements are allowed
  • Certain characters in the content must be ‘escaped’, as they form part of XML syntax. To use the characters < > and & you must write them as follows:

 

< becomes &lt;

> becomes &gt; 

& becomes &amp;

 

For example, if you were to have a text with an ampersand in it, you would need to escape it for it not to be misinterpreted by the computer:

Tate &amp; Lyle ✔

Tate & Lyle ❌
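The Tate & Lyle example can be checked with Python's standard library. This is only a sketch: the orgName wrapper is a TEI element chosen for illustration, and escape is the standard-library helper that rewrites &, < and > as their escaped forms.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Tate & Lyle"

# escape() rewrites &, < and > so they are safe to place inside an element.
escaped = escape(raw)
print(escaped)

# The escaped form parses, and the parser hands back the original ampersand.
element = ET.fromstring(f"<orgName>{escaped}</orgName>")
print(element.text)

# The unescaped form is rejected, because the parser reads & as the start of an escape.
try:
    ET.fromstring(f"<orgName>{raw}</orgName>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print(unescaped_ok)
```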

 

These rules will make more sense as you use them, but they are vital for you to learn in order to be able to encode your texts. If you follow these rules, you will be able to create a well-formed XML document.
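One way to see the rules in action is to give a parser one invented sample per rule and note which ones it refuses. The first_error helper and the samples below are hypothetical; the exact wording of the error messages depends on the parser, so none is shown here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def first_error(xml_text: str) -> Optional[str]:
    """Return the parser's complaint, or None if the text is well formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# One invented sample per rule, plus one that follows all of them.
samples = {
    "well formed": "<text><p>Tate &amp; Lyle</p></text>",
    "unclosed element": "<text><p>No closing tags",
    "two root elements": "<p>One</p><p>Two</p>",
    "overlapping elements": "<p><s>One <name>two</s> three</name></p>",
    "unescaped ampersand": "<p>Tate & Lyle</p>",
}

results = {label: first_error(xml_text) for label, xml_text in samples.items()}

for label, error in results.items():
    status = "accepted" if error is None else f"rejected ({error})"
    print(f"{label}: {status}")
```

Only the first sample is accepted; each of the others breaks exactly one of the rules listed above.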

 

This work is licensed under a Creative Commons Attribution 4.0 International License
