Further Information

DARIAH-DE has developed the following definition of the term “research data” in the context of its project work:

Within DARIAH-DE, digital humanities and cultural studies research data encompasses all sources/materials and results that are collected, generated, described and/or evaluated in the context of a humanities or cultural studies research question and that can be stored in machine-readable form for the purposes of archiving, citability, and further processing.

This definition is intended to account for the special characteristics of humanities research and the resulting heterogeneity of the underlying data. The activities it names (collecting, generating, describing, and evaluating) loosely mirror the workflows of a Research Data Lifecycle as they would typically be carried out.

Minimum Requirements for Digital Research Data in the Context of DARIAH-DE

In order to make meaningful use of DARIAH-DE services for your own research data, your data should meet four basic requirements:

  • Validity
  • Reliability / documentation of the context of creation and collection
  • Machine readability (and thus processability)
  • Referenceability, including information on authorship and on the legal terms for further use (by third parties)

Where Can I Find More Information on Research Data?

DARIAH-DE continuously develops its information offering and communicates through multiple media formats.

On our YouTube channel you will find various clips on the topic of research data.

The DHd Blog provides information on current topics and developments.

In the Doing Digital Humanities bibliography you will find both introductory works and more advanced titles and subject areas. The open bibliography offers specialist literature on various areas of digital research, teaching and infrastructure in the humanities. The bibliography can be collaboratively expanded via the free, web-based reference management platform Zotero.

The following external pages contain comprehensive further information on the topic of research data:

Together with the definition of humanities research data, the Research Data Lifecycle forms the intellectual concept behind the central infrastructure for DARIAH-DE. The repository infrastructure is being expanded so that the entire Research Data Lifecycle for the digital humanities can be covered and, where necessary, extended.

Fundamental processes in the lifecycle of research data:

  • Planning and creation
  • Selection
  • Ingest / transfer
  • Storage / infrastructure
  • Preservation measures
  • Access / use

The DARIAH-DE Research Data Federation Architecture

The DARIAH-DE Research Data Federation Architecture (DFA) is the set of basic services that DARIAH-DE provides to cover the fundamental processes of the Research Data Lifecycle. The DFA currently includes the indexing and display of research data, the provision of description schemas for collection descriptions and their long-term storage, and comprehensive search functionality for heterogeneously structured data collections and archives. In addition, specific metadata standards and crosswalks between metadata schemas are stored to help with mapping research data of different origins and compositions.
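As a rough illustration of what a crosswalk between metadata schemas does, the following Python sketch translates a record from one flat schema into another. The field names, both schemas, and the sample record are invented for illustration; they are not the schemas or the mapping format used by the DFA.

```python
# Hypothetical crosswalk: source field name -> target field name.
# Neither schema is taken from DARIAH-DE; both are assumptions.
CROSSWALK = {
    "titel": "title",
    "urheber": "creator",
    "entstehungsjahr": "date",
}


def apply_crosswalk(record: dict, crosswalk: dict) -> dict:
    """Rename the fields of `record` according to `crosswalk`.

    Fields without a mapping are dropped, so the result only
    contains fields that the target schema knows about.
    """
    return {
        crosswalk[field]: value
        for field, value in record.items()
        if field in crosswalk
    }


# Made-up source record for demonstration purposes.
source_record = {
    "titel": "Briefe aus Weimar",
    "urheber": "Unbekannt",
    "entstehungsjahr": "1799",
    "signatur": "Ms. 12",  # no counterpart in the target schema
}

target_record = apply_crosswalk(source_record, CROSSWALK)
print(target_record)
```

Dropping unmapped fields is only one possible design choice; a real crosswalk might instead keep them under a generic field or report them for manual review.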

What is Metadata and What is it Used For?

Metadata is data or information about data: it describes the actual data (digital files or physical objects), containing information about the content, form, or authorship. To structure and process this metadata, various metadata formats exist.
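A minimal sketch of this distinction in Python: the object being described is a file, and the metadata is a separate, structured record about it. The field names loosely follow the content/form/authorship split above and are assumptions, not a DARIAH-DE metadata format; the file name and values are made up.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class Metadata:
    """Hypothetical metadata record; field names are illustrative."""
    title: str        # content: what the object is about
    media_type: str   # form: how the object is encoded
    creator: str      # authorship: who produced it


# The data itself would be a file such as "letter_001.xml" (a made-up name);
# the record below only describes it.
record = Metadata(
    title="Transcription of a letter",
    media_type="application/xml",
    creator="Jane Doe",
)

# Serializing to JSON gives one possible machine-readable form.
serialized = json.dumps(asdict(record), sort_keys=True)
print(serialized)
```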

Discipline-Specific Recommendations for Data and Metadata

Within the framework of DARIAH-DE, discipline-specific recommendations for data and metadata have been developed. This was initially done from the perspective of the humanities disciplines involved in DARIAH-DE. We would like to invite scholars from all humanities and cultural studies disciplines, as well as researchers from information science and computer science, to actively participate in expanding and developing these recommendations.

Recommendations for Research Data, Tools and Metadata in the DARIAH-DE Infrastructure

An overview of various metadata standards can be found in the Recommendations for Handling Data and Metadata.

DCDDM

The DCDDM (DARIAH Collection Description Data Model) is a data model for collection description. Institutions and researchers can use it to create descriptions of collections that are both human- and machine-readable. The aim of the DCDDM is to provide easy-to-implement guidelines to support the creation, publication, and management of collections. Collections can consist of both physical objects (books, images, coins) and digital objects (digitized texts, database entries).

The documentation can be found in the Wiki. The DCDDM XML schema and detailed documentation are available on GitHub.

The data model is used in the Collection Registry for the collection descriptions stored there.

Why Should I License My Data?

The German Research Foundation (DFG) advises in its Guidelines for Safeguarding Good Research Practice that primary data be stored on “durable and secure media” for ten years in the institution where they were created. However, archiving data alone does not guarantee the reproducibility of scientific results; making the data available is an equally important and legitimate requirement. Access that is as broad, cross-regional, and long-term as possible raises a series of legal questions that can be settled with the help of licenses. Among other things, licenses specify what researchers may (or may not) do with other researchers’ data.

forschungslizenzen.de

The portal forschungslizenzen.de, developed within the DARIAH-DE project, surveys research licenses and presents them in a practical context using examples from the humanities. The aim is to provide an overview, to connect contact persons, and to ease entry into the topic.

The portal responds to two needs that became apparent during work in DARIAH-DE: on the one hand the desire for knowledge exchange on the licensing of research data, and on the other hand the need for educational and advisory work regarding the corresponding decision-making processes.

The selection of examples presented focuses on the field of Digital Humanities. The texts are taken from current publications on the topic, and the content is supplemented as new developments emerge. In contributions on individual projects, contact information for persons at the participating institutions is presented. In this way, researchers are encouraged to exchange ideas with existing projects on licensing issues and to share experiences.

DARIAH-DE Publications on the Topic

A detailed discussion of copyright and recommendations for standard licenses for research data can be found in the DARIAH-DE Working Papers:

  • Nikolaos Beer, Kristin Herold, Maurice Heinrich, Wibke Kolbmann, Thomas Kollatz, Matteo Romanello, Sebastian Rose, Felix Falco Schäfer, Niels-Oliver Walkowski: Data Licenses for Humanities Research Data - Legal Conditions and Need for Action. DARIAH-DE Working Papers No. 6. Göttingen: DARIAH-DE, 2014. URN: urn:nbn:de:gbv:7-dariah-2014-4-8

  • Paul Klimpel, John H. Weitzmann: Researching in the Digital World. A Legal Guide for the Humanities. DARIAH-DE Working Papers No. 12. Göttingen: DARIAH-DE, 2015. URN: urn:nbn:de:gbv:7-dariah-2015-5-0

The DARIAH-DE “Research Data Federation Architecture” (DFA) is the name for services and tools with which research data and collection descriptions from different sources — such as cultural institutions, libraries, archives, research institutions, and data centers — can be found and used for analyses.

Search queries in a scientific context require a high degree of precision in determining the respective parameters. Ideally, researchers should be able to restrict their academic search in a digital environment to specific sources. In this way, XML structures of datasets of different provenance can be queried, the interoperability of various data and metadata schemas can be ensured, and heterogeneous data and metadata sources can be correlated through a common reference for places, names, dates, or other logical units.
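The idea of restricting a query to specific sources and correlating records through a common reference can be sketched as follows. The collections, field names, and the place-name table are all hypothetical; this is not the Generic Search API, only an in-memory illustration of the principle.

```python
# Hypothetical lookup that maps spelling variants to one shared place key.
PLACE_REFERENCE = {
    "Göttingen": "place:goettingen",
    "Goettingen": "place:goettingen",
    "Gottingen": "place:goettingen",
}

# Three made-up collections that record places under different field names.
COLLECTIONS = {
    "archive_a": [
        {"id": "a1", "ort": "Göttingen"},
        {"id": "a2", "ort": "Weimar"},
    ],
    "library_b": [
        {"id": "b1", "place": "Goettingen"},
    ],
    "museum_c": [
        {"id": "c1", "place": "Gottingen"},
    ],
}

# Per-collection name of the field that holds the place.
PLACE_FIELD = {"archive_a": "ort", "library_b": "place", "museum_c": "place"}


def search_by_place(place: str, sources: list[str]) -> list[str]:
    """Return ids of records in `sources` that refer to the same place."""
    wanted = PLACE_REFERENCE.get(place, place)
    hits = []
    for source in sources:
        field = PLACE_FIELD[source]
        for record in COLLECTIONS[source]:
            value = record[field]
            # Compare normalized keys, not the raw spellings.
            if PLACE_REFERENCE.get(value, value) == wanted:
                hits.append(record["id"])
    return hits


# Only two of the three sources are searched.
results = search_by_place("Göttingen", ["archive_a", "library_b"])
print(results)
```

Because both the query and the stored values are normalized against the same reference table, records match even when the sources spell the place differently or store it under different field names.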

<img src="/en/daten/DFA-federation-2018-06-13.png" class="img-fluid rounded" alt="Schematic structure of the DARIAH-DE Data Federation Architecture" />

The DARIAH-DE “Research Data Federation Architecture” visualized in the graphic above encompasses the indexing and display of research data, the provision of sustainable and persistent access for the use of technical tools to compare descriptions and contents of digital collections, and comprehensive search functionality for heterogeneously structured data collections and archives.

The DARIAH-DE Research Data Federation Architecture is modular in design, can be extended at any time with additional components, and currently includes the following tools and services:

  • The Collection Registry allows both the lookup of information on research data collections in DARIAH-DE and the registration of new collection information.

  • The DARIAH-DE Repository allows research data to be stored, annotated with metadata, permanently and machine-readably referenced through the use of Persistent Identifiers, and discovered through the Generic Search. The repository also makes it possible to archive data collections sustainably and securely.

  • Using the DARIAH-DE Publikator, research data can be conveniently uploaded to the DARIAH-DE Repository via a graphical interface and annotated with metadata. These can then be registered as a collection in the Collection Registry and are then indexed in the Generic Search.

  • The Data Modeling Environment (DME) is the place where data can be modeled and mappings between data models stored, managed long-term, and combined as needed. It offers conceptual support for researchers in the arts, humanities, and social sciences to connect heterogeneous data and thereby establish interoperability.

  • Mappings enable automated translation of data from one model into another. For this reason, the DME forms the basis for searching various collections in the Generic Search, for example. The functionality of the DME with regard to mapping between data models is illustrated in the following screenshot of the user interface:

  • The Generic Search provides a front-end for the data stored in the Collection Registry and the DARIAH-DE Repository. Using the Generic Search, distributed datasets can be searched. Furthermore, it is possible to search the indexed metadata, save this search in a personalized manner, and adjust or refine it at a later point in time.

  • The EPIC PID Service, as a basic service, ensures permanent referenceability of research data via so-called “Persistent Identifiers” (PIDs). These identifiers provide a durable reference to data, so that references (for example in scientific publications) remain stable even when the storage location of the referenced data changes. DARIAH-DE uses PIDs from the European Persistent Identifier Consortium (EPIC).

This set of digital tools forms a modular software architecture, in which each service enables access to heterogeneous data sources of various provenance. New methods for analyzing distributed data collections are thus made possible.
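The indirection that keeps references via Persistent Identifiers stable, as described for the EPIC PID Service above, can be sketched in a few lines of Python. The resolver table, the identifier, and the URLs are invented for illustration; this does not call the EPIC PID Service or reproduce its API.

```python
# Hypothetical resolver table: persistent identifier -> current location.
resolver = {
    "pid:example/0001": "https://old-server.example.org/data/0001",
}


def resolve(pid: str) -> str:
    """Look up where the data behind `pid` currently lives."""
    return resolver[pid]


# A publication cites the identifier, never the location itself.
citation = "pid:example/0001"
before_move = resolve(citation)

# The data moves; only the resolver entry is updated.
resolver["pid:example/0001"] = "https://new-server.example.org/data/0001"
after_move = resolve(citation)

print(before_move)
print(after_move)
```

The citation string never changes; only the resolver entry does, which is why a published reference keeps working after the data has moved.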