IDEALS Digital Preservation Support Policy

Committed to building and maintaining collections for the use of students, faculty, scholars, and the public long into the future, the University of Illinois at Urbana-Champaign assumes an obligation to ensure long-term access to the materials deposited into IDEALS and their intellectual content, but also acknowledges the inherent challenges involved in preserving digital content.

To this end, the IDEALS Digital Preservation Support Policy defines the categories of preservation support available and provides specific information about where different file formats fit within these categories. This policy is subject to change as new and emerging technologies impact our ability to preserve deposited content.

Background

Our ability to preserve digital objects deposited in IDEALS is dependent, among other things, on whether the file format used:

  • Is openly documented (more preservable) or proprietary (less preservable);
  • Is supported by a range of software platforms (more preservable) or by only one (less preservable);
  • Is widely adopted (more preservable) or has low use (less preservable);
  • Uses lossless data compression (more preservable) or lossy data compression (less preservable); and
  • Contains embedded files or embedded programs/scripts, like macros (less preservable).

All digital objects deposited to IDEALS will receive a basic level of preservation. Basic preservation means that IDEALS will preserve the viability of the original object through:

  • ensuring that the bitstream (the 1s and 0s that make up the digital file) remains exactly the same over time;
  • assigning a persistent, permanent identifier;
  • creating preservation metadata;
  • maintaining onsite and offsite backup copies;
  • performing regular virus and file corruption checks; and
  • performing periodic refreshments by copying files to new storage media.
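
As an illustration of the first and fifth points above, a file corruption (fixity) check is commonly done by recording a checksum at deposit and recomputing it later. The following is a minimal sketch only: the choice of SHA-256 and the function names `sha256_of` and `bitstream_unchanged` are assumptions made for this example, not a description of how IDEALS performs its checks.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large deposits do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitstream_unchanged(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with the digest recorded at deposit."""
    return sha256_of(path) == recorded_digest
```

A digest that no longer matches signals that the bitstream has changed, at which point the file could be restored from one of the backup copies.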

Basic preservation does not ensure that a digital object may be opened by a computer program or is understandable by a human in the future. For example, in 2006 a faculty member deposits a conference presentation in the Microsoft PowerPoint format (ppt), a proprietary format. In 2030, a graduate student would like to view that conference presentation, but the software program - Microsoft PowerPoint - used to open and read ppt files has been discontinued since 2020. Old versions of the software program are difficult to find, and, because the ppt file format had never been publicly documented, there exist no other software programs to open the file. Even though the original digital object (the conference presentation in ppt) is still technically viable, it is no longer renderable (able to be opened by a computer program), and thus not understandable by the graduate student in 2030.

Therefore, for digital objects that meet certain criteria (see below), IDEALS will strive to preserve not only the viability of the object but also the renderability and understandability of its content, while retaining the original file itself. For some objects in proprietary formats, this means that, in addition to the original digital object, IDEALS will also save a copy of the object transformed into a more preservable file format. For example, the conference presentation in ppt might also be saved as a pdf/a object; pdf/a is an open, publicly documented standard and therefore a more preservable format than ppt. What may be lost is the full functionality of the original digital object: the graduate student in our example may not be able to view the conference presentation as a slide show, as the Microsoft PowerPoint software program allows. However, the content of the conference presentation will be preserved.

IDEALS also recognizes that in some cases an access copy of a digital object is necessary due to the proprietary nature or cost of the software used to render it. For example, a Microsoft Word document relies on the Microsoft Word program to render it; IDEALS will also provide a pdf version of the document because pdf readers are freely and readily available. In some cases, the access copy and the preservable copy may be one and the same - a pdf/a version, for example.

Categories of Preservation Support

IDEALS categorizes digital objects into three categories of preservation support. These categories are defined below. Any format not yet reviewed and evaluated by IDEALS will receive Category 3 support on deposit. A different category may be assigned after format review takes place.
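
The default rule in this paragraph (unreviewed formats receive Category 3 until a review says otherwise) can be sketched as a simple lookup. This is an illustrative sketch only: the registry `REVIEWED_FORMATS`, the function `category_on_deposit`, and the idea of keying on file extension are assumptions for this example. The four entries are taken from the examples listed under each category in this policy.

```python
from enum import IntEnum


class SupportCategory(IntEnum):
    FULL = 1          # Category 1 - Highest Confidence - Full Support
    INTERMEDIATE = 2  # Category 2 - Moderate Confidence - Intermediate Support
    BASIC = 3         # Category 3 - Low Confidence - Basic Preservation Only


# Hypothetical registry of formats that have already been reviewed,
# keyed by file extension (an assumption made for this sketch).
REVIEWED_FORMATS = {
    "txt": SupportCategory.FULL,
    "doc": SupportCategory.INTERMEDIATE,
    "pcd": SupportCategory.BASIC,
    "wmv": SupportCategory.BASIC,
}


def category_on_deposit(filename: str) -> SupportCategory:
    """Return the reviewed category, or Category 3 for unreviewed formats."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return REVIEWED_FORMATS.get(extension, SupportCategory.BASIC)
```

Under this sketch, reassigning a format after review amounts to adding or changing one entry in the registry.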

Category 1 - Highest Confidence - Full Support

Description:

  • Most confidence in ability to provide long-term preservation of content and functionality;
  • Highest level of preservation support, in an effort to maintain the viability, renderability, understandability, and functionality of the original digital object.

Criteria:

  • Is in a format that is publicly documented (example: xml);
  • Is in a format that is widely adopted (example: xhtml);
  • Is in a format that may be rendered by multiple software packages (example: txt);
  • Is in a format that is uncompressed or uses lossless data compression (example: uncompressed tiff files); and
  • Contains no embedded files or dynamic content (example: txt).

Actions:

  • Monitor file format for changes that might warrant transformation or reassessment;
  • Migrate the document to a successor format when necessary;
  • Basic preservation including:
    • bitstream maintenance;
    • persistent, permanent identifier;
    • preservation metadata;
    • onsite and offsite backup copies;
    • regular virus and file corruption checks;
    • periodic refreshments to new storage media.

Examples:

  • Plain text document in Unicode
  • A tiff image

Category 2 - Moderate Confidence - Intermediate Support

Description:

  • Moderate confidence in ability to provide long-term preservation of the content of the file;
  • Intermediate level of preservation support, in an effort to maintain the viability, renderability, and understandability (but not functionality) of the original digital object.

Criteria:

  • Is in a format that is publicly documented;
  • AND is in a format that has lossy data compression (example: Ogg Vorbis);
  • OR is in a version of a format that has been deprecated in favor of a later version (example: HTML 3.0).

OR

  • Is in a proprietary format;
  • Is in a format that is widely adopted; and
  • Is in a format that is of enough public and/or commercial interest that tools are likely to be available to migrate them to successor formats.

NOTE: Files with embedded content (for example, a PowerPoint file (ppt) with an AVI video file (avi) inserted into it) are more preservable if the files are deposited as separate files within the same item in IDEALS. If the content remains embedded, it will likely not remain intact when the file is transformed to a more preservable format.

NOTE: Files with dynamic content (for example, an Excel spreadsheet (xls) with dynamic functions - even simple ones!) are more preservable if the dynamic content is either documented (for example, a note in an Excel spreadsheet explaining the functions that are included) or the document is saved as a static document (for example, a cell in an Excel spreadsheet that is the sum of a column is saved as the sum, not the function of adding the multiple cells).

Actions:

  • Monitor file format for changes that might warrant transformation or reassessment;
  • When possible, transform the object to a format that preserves the content and, where feasible, the formatting and style of the original, but not necessarily the functionality;
  • Basic preservation of original object including:
    • bitstream maintenance;
    • persistent, permanent identifier;
    • preservation metadata;
    • onsite and offsite backup copies;
    • regular virus and file corruption checks;
    • periodic refreshments to new storage media.

Examples:

  • Microsoft Word document (proprietary format)
  • A compressed TIFF file

Category 3 - Low Confidence - Basic Preservation Only

Description:

  • Low confidence in ability to provide long-term preservation of the content of the file;
  • Basic level of preservation support, in an effort to maintain the viability of the original digital object only.

Criteria:

  • Is in a proprietary format;
  • Is in a format about which little information is publicly available;
  • Is in a format that is not widely adopted;
  • Is in a format with lossy data compression;
  • Is supported by a single or very few software platforms; and/or
  • Is in a format that does not meet the criteria for any of Categories 1-2.

Actions:

  • Basic preservation of original object only, including:
    • bitstream maintenance;
    • persistent, permanent identifier;
    • preservation metadata;
    • onsite and offsite backup copies;
    • regular virus and file corruption checks;
    • periodic refreshments to new storage media.

Examples:

  • Kodak Photo CD format (pcd)
  • Windows Media Video (wmv)

Table of Preservation Actions

Preservation Action                                                  Category 1  Category 2  Category 3
-------------------------------------------------------------------  ----------  ----------  ----------
Provision of persistent identifier for object and/or its metadata    X           X           X
Creation of preservation metadata                                    X           X           X
Secure storage and backup                                            X           X           X
Regular fixity checks                                                X           X           X
Regular virus checks                                                 X           X           X
Periodic refreshment to new storage media                            X           X           X
Transformation to a more preservable format                          N/A         X           -
Storage of original digital object                                   X           X           X
Strategic monitoring of format for changes                           X           X           -
Migration to successive format upon obsolescence                     X           -           -

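
The same table can be written as a small data structure, which is convenient when a list of actions for one category is needed. This is a sketch for illustration: the names `ACTIONS_BY_CATEGORY` and `actions_for` are hypothetical, and only the cells marked X in the table are encoded (the N/A and empty cells are treated as "not performed").

```python
# Each preservation action maps to the set of categories marked X in the table.
ACTIONS_BY_CATEGORY = {
    "Provision of persistent identifier for object and/or its metadata": {1, 2, 3},
    "Creation of preservation metadata": {1, 2, 3},
    "Secure storage and backup": {1, 2, 3},
    "Regular fixity checks": {1, 2, 3},
    "Regular virus checks": {1, 2, 3},
    "Periodic refreshment to new storage media": {1, 2, 3},
    "Transformation to a more preservable format": {2},
    "Storage of original digital object": {1, 2, 3},
    "Strategic monitoring of format for changes": {1, 2},
    "Migration to successive format upon obsolescence": {1},
}


def actions_for(category: int) -> list[str]:
    """Return the actions marked X for the given category, in table order."""
    if category not in (1, 2, 3):
        raise ValueError("category must be 1, 2, or 3")
    return [
        action
        for action, categories in ACTIONS_BY_CATEGORY.items()
        if category in categories
    ]
```

For example, `actions_for(3)` returns only the basic preservation actions, while `actions_for(2)` adds transformation and format monitoring.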
Topic revision: r17 - 07 Feb 2008 - 12:59:37 - kcostel4
 
Copyright 2014 by University of Illinois at Urbana-Champaign.
All material on this collaboration platform is the property of the University of Illinois at Urbana-Champaign.