Filenames, Formats & Metadata

Filenames

When naming files and directories, use only alphanumeric characters and avoid special characters such as quotes, punctuation marks, characters with diacritics, spaces, slashes and the like. Underscores (_) and hyphens (-) may be used. For further guidance, see the recommendations from IANUS.
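The naming rule can be checked automatically before deposit. The following Python sketch is illustrative only: it assumes that "alphanumeric" means the ASCII letters A-Z, a-z and the digits 0-9, and that a single dot before an alphanumeric file extension is acceptable (the guidance above does not spell this out). `is_valid_name` and `suggest_name` are hypothetical helpers, not part of any ARCHE or IANUS tool.

```python
import re
import unicodedata

# Assumption: stem of ASCII letters, digits, "_" or "-", optionally followed
# by exactly one dot and an alphanumeric file extension.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name consists of permitted characters only."""
    return ALLOWED_NAME.fullmatch(name) is not None


def suggest_name(name: str) -> str:
    """Suggest a compliant name (hypothetical helper; review the result manually)."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    # NFKD splits e.g. "ä" into "a" + a combining diaeresis; the ASCII encode drops the mark.
    ascii_stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    ascii_stem = re.sub(r"\s+", "_", ascii_stem.strip())    # whitespace -> underscore
    ascii_stem = re.sub(r"[^A-Za-z0-9_-]", "", ascii_stem)  # drop remaining special characters
    ascii_ext = re.sub(r"[^A-Za-z0-9]", "", ext)
    return f"{ascii_stem}.{ascii_ext}" if ascii_ext else ascii_stem


# Example: "Grabung Gräberfeld (2017).tif" -> "Grabung_Graberfeld_2017.tif"
```

Characters without an ASCII decomposition (for example "ß") are simply dropped by this sketch, and a name made only of such characters would come back empty, so suggested names should always be reviewed by hand.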

Formats

You are strongly encouraged to provide the resources in standard formats acknowledged by the respective research communities. We will support you in converting the data if this is necessary and feasible.

Suitable formats should be widely used and, where possible, comply with open and non-proprietary standards. Files should not be password-protected, encrypted or compressed in a lossy way. If files depend on references to other files, fonts or other external data, these objects should be deposited as well, or at least described, e.g. in a plain-text README file. Wherever a choice of character encoding is possible, use UTF-8 without a byte order mark (BOM) (see FAQ).
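Text files can be checked for a UTF-8 BOM and re-encoded as BOM-free UTF-8 with the Python standard library alone. This is a minimal sketch: it assumes the original encoding of each file is known in advance (the `source_encoding` parameter), and the function names are hypothetical. It works on raw bytes so that line endings are left untouched.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def to_utf8_without_bom(data: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Re-encode text bytes as UTF-8 without a BOM.

    The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if one is present.
    For legacy files pass the known encoding instead, e.g. "cp1252" or "latin-1".
    Bytes that are invalid in source_encoding raise UnicodeDecodeError rather than
    being dropped silently, so no information is lost unnoticed.
    """
    text = data.decode(source_encoding)
    return text.encode("utf-8")  # Python's "utf-8" codec never writes a BOM


def convert_file(src: Path, dst: Path, source_encoding: str = "utf-8-sig") -> None:
    """Write a BOM-free UTF-8 copy of src to dst, keeping line endings unchanged."""
    dst.write_bytes(to_utf8_without_bom(src.read_bytes(), source_encoding))
```

Writing to a separate destination file keeps the original available, which matches the practice of retaining originals alongside converted versions.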

If file conversion becomes necessary, any potential loss of information should be minimised. If lossless conversion into an open or recommended format cannot be achieved, the original files will be kept together with the converted versions.

The preferred format for annotated textual data in our repository is TEI/XML (Text Encoding Initiative), with metadata in the teiHeader. Additionally, all language resources must be described in CMDI (Component Metadata Infrastructure); this description is generated automatically from the ARCHE metadata. We will gladly support you in creating this metadata. For an overview of recommended standard formats, see the CLARIN standards recommendations.
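For orientation, a TEI document places its metadata in a `teiHeader` whose minimal form contains a `fileDesc` with `titleStmt`, `publicationStmt` and `sourceDesc`, followed by the `text` itself. The Python sketch below builds such a skeleton with the standard library and writes it as UTF-8 without a BOM. The function names and sample values are hypothetical, and a real deposit will usually need a much richer header; the output should still be validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def q(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_minimal_tei(title: str, publisher: str, source_note: str,
                      paragraphs: list[str]) -> ET.Element:
    """Build a minimal TEI document: teiHeader (metadata) plus text/body."""
    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, q("titleStmt")), q("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, q("publicationStmt")), q("publisher")).text = publisher
    ET.SubElement(ET.SubElement(file_desc, q("sourceDesc")), q("p")).text = source_note
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    for para in paragraphs:
        ET.SubElement(body, q("p")).text = para
    return tei


def write_tei(tei: ET.Element, path: str) -> None:
    """Write the document as UTF-8 (no BOM) with an XML declaration."""
    ET.ElementTree(tei).write(path, encoding="utf-8", xml_declaration=True)
```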

For formats not covered by the CLARIN standards recommendations, as well as for general text formats and media formats, refer to our table of preferred and accepted formats. The table is based on the formats listed by IANUS and the Archaeology Data Service.

Preferred and accepted formats in ARCHE (August 2017). Preferred formats are suitable for long-term preservation; accepted formats require conversion.

Metadata

Metadata should answer basic questions about your data, allowing others to discover, understand and share it. Good metadata describes how the data was produced, who was involved in creating it and what the data is about. Metadata is essential for complying with the FAIR Data Principles, which aim to make data Findable, Accessible, Interoperable, and Reusable (see FAQ).

Metadata can be provided at different levels: the collection level, the file level and even the level of individual data units. Ideally, metadata is recorded accurately and as completely as possible, using a standard format. The Archaeology Data Service and IANUS provide format-agnostic collection-level metadata that can be applied across all domains. In addition, the respective sections of IANUS' IT-Empfehlungen (IT recommendations) describe file-level metadata, which is generally more technical and depends heavily on the data type and the methods used.
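Some generic technical file-level properties can be derived automatically from the files themselves. The Python sketch below computes size, a SHA-256 checksum and a guessed MIME type; the choice of these fields and their names are assumptions for illustration, not IANUS or ARCHE requirements, and the data-type-specific file-level metadata described in the IT-Empfehlungen still has to be supplied separately. The MIME type is guessed from the file extension only.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Iterable


def metadata_from_chunks(filename: str, chunks: Iterable[bytes]) -> dict[str, object]:
    """Compute generic technical metadata from a file's content, chunk by chunk."""
    sha256 = hashlib.sha256()
    size = 0
    for chunk in chunks:
        sha256.update(chunk)
        size += len(chunk)
    mime_type, _encoding = mimetypes.guess_type(filename)  # based on the extension only
    return {
        "filename": filename,
        "size_bytes": size,
        "sha256": sha256.hexdigest(),
        "mime_type": mime_type or "application/octet-stream",
    }


def technical_metadata(path: Path) -> dict[str, object]:
    """Read a file in 64 KiB chunks so that large files are not loaded into memory."""
    with path.open("rb") as f:
        return metadata_from_chunks(path.name, iter(lambda: f.read(65536), b""))
```

A checksum recorded at deposit time also allows later verification that a file has not changed.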

The metadata required when depositing in ARCHE is detailed in the metadata requirements table. In addition to collection-level and file-level metadata, ARCHE also uses project-level metadata. Fields that are mandatory in ARCHE are marked as such, but filling in the recommended fields is essential for making data easier to find, understand and cite. The ARCHE metadata schema is also available in OWL format, annotated with extensive documentation, and a tabular representation of it exists as well.

Properties are listed for projects, collections and resources.
m = mandatory, r = recommended, o = optional, and * = property can be used multiple times.
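The m/r/o/* codes can be read as simple cardinality rules, which makes it possible to check a metadata record before deposit. The Python sketch below applies one requirement specification (one per entity type: project, collection or resource) to a record. The property names in the example are hypothetical placeholders, not ARCHE's actual property identifiers; the real lists are those in the metadata requirements table. Treating missing recommended properties as warnings rather than errors is a design choice of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def check_record(record: dict[str, list[str]], spec: dict[str, str]) -> Report:
    """Check a record against codes such as "m", "r*" or "o" (m/r/o, * = repeatable)."""
    report = Report()
    for prop, code in spec.items():
        level = code.rstrip("*")      # "m", "r" or "o"
        repeatable = code.endswith("*")
        values = record.get(prop, [])
        if not values:
            if level == "m":
                report.errors.append(f"missing mandatory property: {prop}")
            elif level == "r":
                report.warnings.append(f"missing recommended property: {prop}")
        elif len(values) > 1 and not repeatable:
            report.errors.append(f"property used more than once: {prop}")
    for prop in record:
        if prop not in spec:
            report.warnings.append(f"unknown property: {prop}")
    return report


# Hypothetical specification for one entity type (placeholder names only).
EXAMPLE_SPEC = {"title": "m", "description": "r", "subject": "r*", "contributor": "o*"}
```

For example, a record with two titles and no description yields one error (title is not repeatable) and one warning (description is recommended).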