Digital Humanities - Introduction: Working with Data

Introducing Digital Humanities methods, practices and support at Exeter

Data visualisation

Working with data, whether networks of individual correspondents, tables of historical financial transactions, literary references, or results from archaeological surveys, often requires visual interpretation. A number of toolkits handle specific or general types of data, and they can uncover patterns that would be difficult to infer without digital tools.

D3: Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.

Data cleaning

When working with data, it is almost always necessary to spend a good portion of time cleaning and standardizing that data for analysis or presentation. This is especially true when re-purposing data created by others (always ensure you have permission and give due credit). Useful resources for this work include:

  • OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
  • Authority Lists: wherever possible you should use existing standards and commonly used authority files in order to create interoperable data. For example, VIAF (the Virtual International Authority File) provides identifiers for people and GeoNames provides identifiers for places.
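As an illustration of the kind of standardisation described above, the following Python sketch groups variant spellings of the same name under a shared key. It is loosely modelled on the "fingerprint" style of key-collision clustering and is not OpenRefine's own implementation; the function names `fingerprint` and `cluster` and the sample names are assumptions made for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalised key so near-duplicates collide."""
    # Decompose accented characters and drop the combining marks (ë -> e).
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then turn ASCII punctuation into spaces.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = unaccented.lower().translate(table)
    # Split on whitespace, de-duplicate and sort the tokens.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Return groups of values that share a fingerprint (likely variants)."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Brontë, Charlotte", "Charlotte  Bronte", "Austen, Jane"]
    for group in cluster(names):
        print(group)
```

Each group still needs human review before merging: two different people can share a name, which is one reason authority identifiers are preferable to name strings.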

Crowdsourcing and Citizen Science/Humanities

There are still many tasks that cannot easily be performed by computers, and especially in the Humanities, the creation of usable research data often involves human intervention, even if on a basic level. Crowdsourcing can often create or enhance research data by asking participants to make judgements about the properties of artefacts, by deciphering handwriting or by interpreting basic information.

One of the most popular crowdsourcing platforms is Zooniverse, which hosts a variety of projects that you can get directly involved with, from transcribing ships' logs to understand old weather, to deciphering ancient manuscripts and military field reports.
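When several participants answer the same question, their responses need to be combined into a single value. One simple approach is a majority vote with a minimum level of agreement, sketched below in Python. This is an illustrative assumption, not a description of how Zooniverse or any particular project aggregates its data; the function name `consensus` and the 0.6 threshold are hypothetical.

```python
from collections import Counter


def consensus(answers, min_agreement=0.6):
    """Return the most common answer if enough participants agree, else None."""
    if not answers:
        return None
    # Ignore differences in case and surrounding whitespace.
    normalised = [answer.strip().lower() for answer in answers]
    value, count = Counter(normalised).most_common(1)[0]
    if count / len(normalised) >= min_agreement:
        return value
    # No clear agreement: flag the item for expert review instead.
    return None


if __name__ == "__main__":
    print(consensus(["Exeter", "exeter ", "Exter"]))
    print(consensus(["Exeter", "Exmouth"]))
```

Items that return `None` are the ones worth passing to a researcher, since disagreement often signals a genuinely ambiguous source.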

Data management

Data management is the development of processes and procedures to suit the data requirements of a project or an organisation; these processes and procedures are supported by an infrastructure that protects and organises information assets. The concept emerged in the 1980s following the move from sequential processing to random access processing. Data management encompasses detailed consideration of the following areas: database systems, master data and metadata management, quality control, integration definition, warehousing, transformation, governance and architecture. A Data Management Framework (DMF) is a system of thinking that allows its users to view data-related concepts correctly, and such frameworks are often applied in data management.

Linked Data and Linked Open Data

Data in isolation can only give a very narrow view, whereas Linked Data, which is connected to other resources, contextual data and further authorities, can provide a much broader and richer understanding of a topic. The links are usually made using permanent identifiers that are maintained globally by these authorities, which can also provide narrative and onward links to other resources that reference them. Examples of authorities include VIAF (Virtual International Authority File), which identifies names, and GeoNames, which identifies places.
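The value of shared identifiers can be seen in a small Python sketch: two datasets that spell a person's name differently can still be joined when both record the same authority identifier. The datasets, field names and identifier values below are hypothetical placeholders, and the URI prefix is an assumption about how VIAF identifiers are commonly written; check the authority's own documentation before relying on it.

```python
from collections import defaultdict

# Assumed URI prefix for VIAF identifiers (verify against VIAF's documentation).
VIAF_BASE = "https://viaf.org/viaf/"

# Hypothetical records from two separate projects; the IDs are placeholders.
letters = [
    {"writer": "Author A.", "viaf_id": "000000001", "item": "Letter 12"},
    {"writer": "B, Author", "viaf_id": "000000002", "item": "Letter 40"},
]
editions = [
    {"author": "A, Author", "viaf_id": "000000001", "item": "Collected Works"},
]


def person_uri(viaf_id: str) -> str:
    """Build a full URI from a bare identifier."""
    return VIAF_BASE + viaf_id


def link_records(*datasets):
    """Group items from all datasets under the person URI they reference."""
    linked = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            linked[person_uri(record["viaf_id"])].append(record["item"])
    return dict(linked)


if __name__ == "__main__":
    for uri, items in link_records(letters, editions).items():
        print(uri, items)
```

Matching on the identifier rather than the name string means "Author A." and "A, Author" are recognised as the same person without any fuzzy matching.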

Linked Open Data is Linked Data that is released under an open licence, allowing it to be used more widely and often without royalty or charge. Always read the licence carefully and respect the copyright holder's wishes.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License
