|
ARCHIVES |
An
archive is a collection of historical records, as well as the place they are located.
[1] Archives contain
primary source documents that have accumulated over the course of an individual or organization's lifetime.
In general, archives consist of records that have been selected for permanent or long-term preservation on grounds of their enduring cultural, historical, or evidentiary value. Archival records are normally unpublished and almost always unique, unlike
books or magazines for which many identical copies exist. This means that archives (the places) are quite distinct from
libraries with regard to their functions and organization, although archival collections can often be found within library buildings.
[2]
A person who works in archives is called an
archivist. The study and practice of organizing, preserving, and providing access to information and materials in archives is called
archival science.
When referring to historical records or the places they are kept, the plural form
archives is chiefly used.
[3] Archivists tend to prefer the term "archives" (with an S) as the correct terminology to serve as both the singular and plural, since "archive," as a noun or a verb, has acquired meanings related to computer science.
Etymology
First attested in English in the early 17th century, the word
archive (pronounced
/ˈɑrkaɪv/) is derived from the
French archives (plural), in turn from
Latin archīum or
archīvum,
[4] which is the
romanized form of the
Greek ἀρχεῖον (
arkheion), "public records, town-hall, residence or office of chief magistrates",
[5] itself from
ἀρχή (
arkhē), amongst others "magistracy, office, government"
[6] (compare an-archy, mon-archy), which comes from the verb
ἄρχω (
arkhō), "to begin, rule, govern".
[7]
The word originally developed from the Greek
ἀρχεῖον (arkheion) which refers to the home or dwelling of the
Archon, in which important official state documents were filed and interpreted under the authority of the Archon. The adjective formed from
archive is
archival.
Users and institutions
Historians,
genealogists,
lawyers,
demographers,
filmmakers, and others conduct
research at archives.
[8] The research process at each archive is unique, and depends upon the institution in which the archive is housed. While there are many different kinds of archives, the most recent census of archivists in the United States identified five major types: academic, business (for profit), government, non-profit, and other.
[9] There are also four main areas of inquiry involved with archives: material technologies, organizing principles, geographic locations, and tangled embodiments of humans and non-humans. These areas help to further categorize what kind of archive is being created.
Web archiving
Web archiving is the process of collecting data from the World Wide Web and preserving it in an archive, such as an archive site, for the web user to see. See
Website Archiving.
Examples of web archives
- Side bars
- Blogs
- Calendar
- Tag cloud
- News websites
History
The word "archive" can refer to any organised body of records fixed on media. The management of archives is essential for effective day-to-day organisational decision making, and even for the survival of organisations.
[citation needed] Archives were well developed by the ancient Chinese, the ancient Greeks, and ancient Romans. Modern archival thinking has many roots in the French Revolution. The
French National Archives, which possess perhaps the largest archival collection in the world, with records going as far back as A.D. 625, were created in 1790 during the
French Revolution from various government, religious, and private archives seized by the revolutionaries.
History of Archival Research
Many archives have been around for hundreds of years. For instance, the
Vatican Secret Archives were started in the 17th century AD and contain state papers, papal account books, and papal correspondence dating back to the 8th century. Most archives that are still in existence do not claim collections that date back quite as far as the Vatican Archives.
However, many national archives were established over one hundred years ago and contain collections going back three or four hundred years. The United States
National Archives and Records Administration was established originally in 1934.
[1] The NARA contains records and collections dating back to the founding of the United States in the 18th century. Among the collections of the NARA are the Declaration of Independence, the Constitution of the United States, and an original copy of the Magna Carta. Similarly, the
Archives nationales in France was founded in 1790 during the French Revolution and has holdings that date back to AD 625.
Universities are another historic venue for archival holdings. Most universities have archival holdings that chronicle the business of the university. Some universities also have cultural archives that focus on one aspect or another of the culture of the state or country in which the university is located.
The University of North Carolina at Chapel Hill has archival collections on the subjects of Southern History and Southern Folklife.
[2] Boston University's Howard Gottlieb Archival Research Library has collections dedicated to chronicling advances and famous moments in American art, drama, and public/political life.
[3]
The reason for highlighting the breadth and depth of historical archives is to give some idea of the difficulties facing archival researchers in the pre-digital age. Some of these archives were dauntingly vast in the number of records they held. For example, the Vatican Secret Archives had upwards of 52 miles of archival shelving. In an age when you could not simply enter your query into a search bar complete with Boolean operators, the task of finding material that pertained to your topic would have been difficult at the least. The
Finding aid made the work of sifting through these vast archives much more manageable.
[4] A finding aid is a document that is put together by an archivist or librarian that contains information about the individual documents in a specific collection in an archive. These documents can be used to determine if the collection is relevant to a designated topic. Finding aids made it so a researcher did not have to blindly search through collection after collection hoping to find pertinent information. However, in the pre-digital age a researcher still had to travel to the physical location of the archive and search through a card catalog of finding aids.
Pre-Internet Data Storage
Organizing, collecting, and archiving information using physical documents without the use of electronics is a daunting task.
Magnetic storage devices provided the first means of storing electronic data. As technology has progressed over the years, so too has the ability to archive data using electronics. Long before the internet, means of using technology to help archive information were in the works. The early forms of
magnetic storage devices that would later be used to archive information were invented as early as the late 19th century, but were not used for organizing information until 1951 with the invention of the
UNIVAC I.
UNIVAC I, which stands for Universal Automatic Computer 1, used
magnetic tape to store data and was also the first commercial computer produced in the United States. Early computers such as UNIVAC I were enormous and sometimes took up entire rooms, rendering them completely obsolete in today's technological society. But the central idea of using magnetic tape to store information is a concept that is still in use today.
While most magnetic storage devices have been replaced by optical storage devices such as
CDs and
DVDs, or by USB flash drives, some are still in use today.
[5] In fact, the floppy drive is one example of a magnetic storage device that became extremely popular in the 1970s through the 1990s. Older 5.25" floppy disks have not been used for quite some time but the smaller 3.5" floppy disks aren't obsolete yet. The 3.5" disks hold approximately 1.44 MB of data and for years have been used by millions of people to back up the information on their hard drives.
Magnetic tape has proven to be a very effective means of archiving data as large amounts of data that don’t need to be quickly accessed can be found on magnetic tape. That is especially true of aging data that may not need to be accessed again at all, but for different reasons still needs to be stored “just in case”.
[5]
Internet Age Archiving
With the explosion of the internet over the past couple of decades, archiving has begun to make its way online. The days of using electronic devices such as magnetic tape are coming to an end as people start to use the internet to archive their information.
Internet archiving has become extremely popular for several reasons. As mentioned earlier, storing as much information in as little space as possible is very helpful for many archivers, and using the internet to archive makes this possible, along with many other benefits. There is no limit to how much information one can store online in their archive. Internet archiving can store as little information as a single person needs or as much as a major company needs. Internet archives can contain large-scale digitization as well as provide long-term management and preservation of digital resources, similarly to the electronics used in the pre-internet data storage era. Along with storage benefits, archiving via the internet helps keep one's information safe: archiving on floppy disks, hard drives, and computers carries the risk of misplacing the information or having it destroyed by water, fire, and the like. Lastly, the ability to access the information from almost anywhere is one of the main attractions of online archiving. As long as one has access to the internet, they can edit and retrieve the information they are looking for.
Most institutions with physical archives have begun to
digitize their holdings and make them available on the internet. Notably, the National Archives and Records Administration in Washington, D.C. has a clearly defined initiative, started in 1998, to digitize many of its holdings and make them available on the internet.
Digital preservation is the active management of
digital information over time to ensure its accessibility. Preservation of digital information is widely considered to require more constant and ongoing attention than preservation of other media.
[1] This constant input of effort, time, and money to handle rapid technological and organizational advance is considered the main stumbling block for preserving digital information. Indeed, while we are still able to read our written heritage from several thousand years ago, the digital information created merely a decade ago is in serious danger of being lost, creating a
digital Dark Age.
[citation needed]
Digital preservation is the set of processes and activities that ensure continued access to information and all kinds of records, scientific and cultural heritage existing in digital formats. This includes the preservation of materials resulting from
digital reformatting, but particularly information that is
born-digital and has no analog counterpart. In the language of digital imaging and electronic resources, preservation is no longer just the product of a program but an ongoing process. In this regard the way digital information is stored is important in ensuring its longevity. The long-term storage of digital information is assisted by the inclusion of
preservation metadata.
|
DIGITAL PRESERVATION |
Digital preservation
Digital preservation is defined as: long-term, error-free storage of digital information, with means for retrieval and interpretation, for the entire time span the information is required for. Long-term is defined as "long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely".
[2] "Retrieval" means obtaining needed digital files from the long-term, error-free digital storage, without possibility of corrupting the continued error-free storage of the digital files. "Interpretation" means that the retrieved digital files, for example files of texts, charts, images or sounds, are decoded and transformed into usable representations. This is often interpreted as "rendering", i.e. making the content available for a human to access. However, in many cases it will mean being able to be processed by computational means.
Strategies
In 2006, the
Online Computer Library Center developed a four-point strategy for the long-term preservation of digital objects that consisted of:
- Assessing the risks for loss of content posed by technology variables such as commonly used proprietary file formats and software applications.
- Evaluating the digital content objects to determine what type and degree of format conversion or other preservation actions should be applied.
- Determining the appropriate metadata needed for each object type and how it is associated with the objects.
- Providing access to the content.[8]
There are several additional strategies that individuals and organizations may use to actively combat the loss of digital information.
Refreshing
Refreshing is the transfer of data between two types of the same storage medium so there are no
bitrate changes or alteration of data.
[9] For example, transferring
census data from an old preservation
CD to a new one. This strategy may need to be combined with migration when the
software or
hardware required to read the data is no longer available or is unable to understand the format of the data. Refreshing will likely always be necessary due to the deterioration of physical media.
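As a minimal sketch of refreshing, assuming the old and new media are both visible as ordinary file paths, the copy can be checked with a checksum so that any alteration during transfer is caught. The function names `sha256_of` and `refresh` are hypothetical, and SHA-256 is one reasonable choice of checksum rather than a method named in the text.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the bits are unchanged.

    Returns the shared digest, or raises OSError if the copy differs.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"refresh altered data: {source} -> {destination}")
    return after
```

Recording the returned digest alongside the new copy gives the next refresh cycle something to compare against.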
Migration
Migration is the transferring of data to newer system environments (Garrett et al., 1996). This may include conversion of resources from one
file format to another (e.g., conversion of
Microsoft Word to
PDF or
OpenDocument), from one
operating system to another (e.g.,
Windows to
Linux) or from one
programming language to another (e.g.,
C to
Java) so the resource remains fully accessible and functional. Resources that are migrated run the risk of losing some type of functionality since newer formats may be incapable of capturing all the functionality of the original format, or the converter itself may be unable to interpret all the nuances of the original format. The latter is often a concern with proprietary data formats.
The
US National Archives Electronic Records Archives and
Lockheed Martin are jointly developing a migration system that will preserve any type of document, created on any application or platform, and delivered to the archives on any type of digital media.
[10] In the system, files are translated into flexible formats, such as XML; they will therefore be accessible by technologies in the future.
[10] Lockheed Martin argues that it would be impossible to develop an emulation system for the National Archives ERA because the volume of records and cost would be prohibitive.
[10]
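A format migration of the kind described above might be sketched as follows. This is an illustration only, not the ERA system: the pipe-delimited legacy record layout, the field names, and the function `migrate_record` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def migrate_record(line: str, field_names: list[str]) -> str:
    """Convert one pipe-delimited legacy record into an XML string.

    The legacy layout (values separated by '|') is hypothetical.
    """
    values = line.rstrip("\n").split("|")
    if len(values) != len(field_names):
        raise ValueError("record does not match the expected field list")
    record = ET.Element("record")
    for name, value in zip(field_names, values):
        # Each legacy column becomes a named child element.
        ET.SubElement(record, name).text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(migrate_record("1790|Archives nationales|Paris",
                         ["founded", "name", "city"]))
```

The sketch also shows where functionality can be lost: anything the legacy format encoded outside its columns (for instance, positional meaning or escape rules) is dropped unless the converter handles it explicitly.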
Replication
Creating duplicate copies of data on one or more systems is called
replication. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental catastrophes like fire, flooding, etc. Digital data is more likely to survive if it is replicated in several locations. Replicated data may introduce difficulties in refreshing, migration, versioning, and
access control since the data is located in multiple places.
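One benefit of replication is that copies can be compared against each other. The sketch below, which assumes the replicas have already been read into memory, picks the content held by a strict majority of replicas and reports which copies disagree. The names `fingerprint` and `audit_replicas` are hypothetical, and majority voting is one simple policy among several.

```python
import hashlib
from collections import Counter


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used to compare replica contents."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> tuple[bytes, list[str]]:
    """Return the majority content and the names of replicas that differ."""
    if not replicas:
        raise ValueError("no replicas to audit")
    counts = Counter(fingerprint(data) for data in replicas.values())
    winner, votes = counts.most_common(1)[0]
    if votes <= len(replicas) / 2:
        raise ValueError("no strict majority among replicas")
    good = next(d for d in replicas.values() if fingerprint(d) == winner)
    damaged = [name for name, d in replicas.items() if fingerprint(d) != winner]
    return good, damaged
```

A damaged replica found this way can be overwritten from the majority copy, which is also where the coordination difficulties mentioned above arise: every location must be reachable and kept at the same version.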
Emulation
Emulation is the replicating of functionality of an obsolete system.
[11] Examples include emulating an
Atari 2600 on a
Windows system or emulating
WordPerfect 1.0 on a
Macintosh.
Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems, such as with the
MAME project. The feasibility of emulation as a catch-all solution has been debated in the academic community. (Granger, 2000)
Raymond A. Lorie has suggested a
Universal Virtual Computer (UVC) could be used to run any software in the future on a yet unknown platform.
[12] The UVC strategy uses a combination of emulation and migration. The UVC strategy has not yet been widely adopted by the digital preservation community.
Jeff Rothenberg, a major proponent of emulation for digital preservation in libraries, working in partnership with the Koninklijke Bibliotheek and the Nationaal Archief of the Netherlands, has helped launch
Dioscuri, a modular emulator that succeeds in running MS-DOS, WordPerfect 5.1, DOS games, and more.
[13]
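At its core, an emulator is a program that interprets the instructions of another machine. The toy below illustrates that idea for an invented four-instruction stack machine; the instruction set (`PUSH`, `ADD`, `MUL`, `PRINT`) and the function `run` are hypothetical and bear no relation to Dioscuri, MAME, or the UVC.

```python
def run(program: list[tuple]) -> list[int]:
    """Interpret a program for a hypothetical stack machine.

    Returns the values the program printed, in order.
    """
    stack: list[int] = []
    output: list[int] = []
    for instruction in program:
        op, *args = instruction
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output


if __name__ == "__main__":
    # (2 + 3) * 4
    print(run([("PUSH", 2), ("PUSH", 3), ("ADD",),
               ("PUSH", 4), ("MUL",), ("PRINT",)]))
```

Real emulators must additionally reproduce timing, memory layout, and peripheral behaviour, which is a large part of why their feasibility as a catch-all solution is debated.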
Metadata attachment
Metadata is data on a digital file that includes information on creation, access rights, restrictions, preservation history, and rights management.
[14] Metadata attached to digital files may be affected by file format obsolescence.
ASCII is considered to be the most durable format for metadata
[15] because it is widespread, backwards compatible when used with
Unicode, and utilizes human-readable characters, not numeric codes. It retains the information, but not the structure in which the information is presented. For higher functionality,
SGML or
XML should be used. Both markup languages are stored in ASCII format, but contain tags that denote structure and format.
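As a minimal sketch of structured metadata stored as ASCII, the function below serializes a flat set of fields to XML. The element names and the function `build_metadata` are assumptions for illustration, not a standard metadata schema; Python's `xml.etree.ElementTree` writes any non-ASCII character as a numeric character reference when the output encoding is `us-ascii`.

```python
import xml.etree.ElementTree as ET


def build_metadata(fields: dict[str, str]) -> bytes:
    """Serialize metadata fields as ASCII-encoded XML bytes.

    Keys must be valid XML element names; the schema is hypothetical.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="us-ascii")


if __name__ == "__main__":
    print(build_metadata({
        "creator": "Bibliothèque nationale",
        "rights": "public domain",
    }).decode("ascii"))
```

The tags carry the structure that plain ASCII text alone would lose, while the bytes on disk remain readable in any text editor.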
Trustworthy digital objects
Digital objects that can speak to their own authenticity are called
trustworthy digital objects (TDOs). TDOs were proposed by Henry M. Gladney to enable digital objects to maintain a record of their change history so future users can know with certainty that the contents of the object are authentic.
[16] Other preservation strategies like replication and migration are necessary for the long-term preservation of TDOs.
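The idea of an object carrying a verifiable change history can be illustrated with a hash chain, in which each history entry includes the hash of the entry before it. This is a simplified sketch and not Gladney's TDO design; the entry layout and the functions `add_entry` and `verify` are hypothetical.

```python
import hashlib
import json


def _hash_entry(entry: dict) -> str:
    """Hash the fields of an entry that the chain protects."""
    body = {key: entry[key] for key in ("action", "content_hash", "previous_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def add_entry(history: list[dict], action: str, content: bytes) -> list[dict]:
    """Return a new history with one more entry describing the current content."""
    entry = {
        "action": action,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "previous_hash": history[-1]["entry_hash"] if history else "",
    }
    entry["entry_hash"] = _hash_entry(entry)
    return history + [entry]


def verify(history: list[dict], content: bytes) -> bool:
    """Check that the chain is unbroken and its last entry matches the content."""
    previous = ""
    for entry in history:
        if entry["previous_hash"] != previous or entry["entry_hash"] != _hash_entry(entry):
            return False
        previous = entry["entry_hash"]
    return bool(history) and history[-1]["content_hash"] == hashlib.sha256(content).hexdigest()
```

A bare hash chain only detects partial tampering; someone able to rewrite every entry could forge a consistent history, so a practical scheme would also need digital signatures or an independently held copy of the final hash.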
Digital sustainability
Digital sustainability encompasses a range of issues and concerns that contribute to the longevity of digital information.
[17] Unlike traditional, temporary strategies and more permanent solutions, digital sustainability implies a more active and continuous process. Digital sustainability concentrates less on the solution and technology and more on building an infrastructure and approach that is flexible with an emphasis on interoperability, continued maintenance and continuous development.
[18] Digital sustainability incorporates activities in the present that will facilitate access and availability in the future.
Digital preservation standards
To standardize digital preservation practice and provide a set of recommendations for preservation program implementation, the Reference Model for an Open Archival Information System (
OAIS) was developed. The reference model (
ISO 14721:2003) includes the following responsibilities that an OAIS archive must abide by:
- Negotiate for and accept appropriate information from information Producers.
- Obtain sufficient control of the information provided to the level needed to ensure Long-Term Preservation.
- Determine, either by itself or in conjunction with other parties, which communities should become the Designated Community and, therefore, should be able to understand the information provided.
- Ensure that the information to be preserved is Independently Understandable to the Designated Community. In other words, the community should be able to understand the information without needing the assistance of the experts who produced the information.
- Follow documented policies and procedures which ensure that the information is preserved against all reasonable contingencies, and which enable the information to be disseminated as authenticated copies of the original, or as traceable to the original.
- Make the preserved information available to the Designated Community.[19]
OAIS is concerned with all technical aspects of a digital object’s life cycle: ingest into and storage in a preservation infrastructure, data management, accessibility, and distribution. The model also addresses metadata issues and recommends that five types of metadata be attached to a digital object: reference (identification) information, provenance (including preservation history), context, fixity (authenticity indicators), and representation (formatting, file structure, and what "imparts meaning to an object’s bitstream").
[20] Prior to Gladney's proposal of TDOs was the Research Libraries Group's (RLG) development of "attributes and responsibilities" that denote the practices of a "Trusted Digital Repository" (TDR). The seven attributes of a TDR are: "compliance with the Reference Model for an Open Archival Information System (OAIS), Administrative responsibility, Organizational viability, Financial sustainability, Technological and procedural suitability, System security, Procedural accountability." Among RLG’s attributes and responsibilities were recommendations calling for the collaborative development of digital repository certifications, models for cooperative networks, and sharing of research and information on digital preservation with regard to intellectual property rights.
[21]
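The five metadata types that OAIS recommends (reference, provenance, context, fixity, and representation) can be pictured as fields of a single record. The sketch below is illustrative only: the class `PreservationMetadata`, its field types, and the helper `missing_fields` are assumptions, not part of the OAIS reference model.

```python
from dataclasses import dataclass, fields


@dataclass
class PreservationMetadata:
    """One hypothetical record holding the five OAIS metadata types."""
    reference: str          # identifier for the object
    provenance: list[str]   # preservation history, oldest event first
    context: str            # relationship to other objects or collections
    fixity: str             # authenticity indicator, e.g. a checksum
    representation: str     # format and structure needed to interpret the bits


def missing_fields(record: PreservationMetadata) -> list[str]:
    """Return the names of metadata types that are empty on this record."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

A repository could run a check like `missing_fields` at ingest to refuse objects that arrive without, say, fixity information.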
Digital sound preservation standards
In January 2004, the Council on Library and Information Resources (CLIR) hosted a roundtable meeting of audio experts discussing best practices, which culminated in a report delivered March 2006. This report investigated procedures for reformatting sound from analog to digital, summarizing discussions and recommendations for best practices for digital preservation. Participants made a series of 15 recommendations for improving the practice of analog audio transfer for archiving:
- Develop core competencies in audio preservation engineering. Participants noted with concern that the number of experts qualified to transfer older recordings is shrinking and emphasized the need to find a way to ensure that the technical knowledge of these experts can be passed on.
- Develop arrangements among smaller institutions that allow for cooperative buying of esoteric materials and supplies.
- Pursue a research agenda for magnetic-tape problems that focuses on a less destructive solution for hydrolysis than baking, relubrication of acetate tapes, and curing of cupping.
- Develop guidelines for the use of automated transfer of analog audio to digital preservation copies.
- Develop a web-based clearinghouse for sharing information on how archives can develop digital preservation transfer programs.
- Carry out further research into nondestructive playback of broken audio discs.
- Develop a flowchart for identifying the composition of various types of audio discs and tapes.
- Develop a reference chart of problematic media issues.
- Collate relevant audio engineering standards from organizations.
- Research safe and effective methods for cleaning analog tapes and discs.
- Develop a list of music experts who could be consulted for advice on transfer of specific types of musical content (e.g., determining the proper key so that correct playback speed can be established).
- Research the life expectancy of various audio formats.
- Establish regional digital audio repositories.
- Cooperate to develop a common vocabulary within the field of audio preservation.
- Investigate the transfer of technology from such fields as chemistry and materials science to various problems in audio preservation.[22]
Updated technical guidelines on the creation and preservation of digital audio have been prepared by the
International Association of Sound and Audiovisual Archives (IASA).
Large-scale digital preservation initiatives (LSDIs)
Many research libraries and archives have begun or are about to begin large-scale digital preservation initiatives (LSDIs). The main players in LSDIs are cultural institutions, commercial companies such as Google and Microsoft, and non-profit groups including the
Open Content Alliance (OCA), the
Million Book Project (MBP), and
HathiTrust. The primary motivation of these groups is to expand access to scholarly resources.
|
BIT ROT |
Bit rot
Bit rot, also known as
bit decay,
data rot, or
data decay, is a colloquial
computing term used to describe either a gradual
decay of
storage media or (facetiously) the spontaneous degradation of a
software program over time. The latter use of the term implies that software can literally wear out or
rust like a physical tool. More commonly, bit rot refers to the decay of physical storage media.
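Decay of storage media typically shows up as individual bits changing value. The sketch below simulates a single flipped bit and shows how a stored checksum reveals it; the functions `flip_bit` and `is_intact` are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating media decay."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """Compare the current SHA-256 digest with the one recorded earlier."""
    return hashlib.sha256(data).hexdigest() == recorded_digest
```

A checksum can detect that rot has occurred but cannot undo it; repair requires an undamaged copy or an error-correcting code.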
Database preservation usually involves converting the information stored in a database, without losing the characteristics (Context, Content, Structure, Appearance and Behaviour) of the data, to a format which can be used in the long term, even if the technology and everyday knowledge change.
Database preservation projects
Several research groups have contributed solutions to the problems of database preservation. Past research projects include:
- Software independent archival of relational databases (SIARD)[1]
- Repository of Authentic Digital Objects (RODA)[2]
- Digital Preservation Testbed [3]
- Lots of Copies Keep Stuff Safe (LOCKSS)
|
SECTION 108 STUDY GROUP |
Section 108 Study Group
The
Section 108 Study Group is a select committee of
copyright experts, convened by the
Library of Congress, and charged with updating for the digital world the United States
Copyright Act's balance between the rights of creators and copyright owners and the needs of libraries and archives.
Information Lifecycle Management refers to a wide-ranging set of strategies for administering
storage systems on
computing devices. Specifically, four categories of storage strategies may be considered under the auspices of
ILM.
Policy
ILM Policy consists of the overarching storage and information policies that drive management processes. Policies are dictated by business goals and drivers. Therefore, policies generally tie into a framework of overall
IT governance and
management;
change control processes; requirements for system availability and recovery times; and
service level agreements (SLAs).
Operational
Operational aspects of ILM include backup and data protection;
disaster recovery, restore, and restart; archiving and long-term retention; data replication; and day-to-day processes and procedures necessary to manage a storage architecture.
Infrastructure
Infrastructure facets of ILM include the
logical and
physical architectures; the
applications dependent upon the storage platforms;
security of storage; and
data center constraints. Within the application realm, the relationship between applications and the production,
test, and
development requirements is generally most relevant for ILM.
Definition
Information Lifecycle Management (sometimes abbreviated
ILM) is the practice of applying certain policies to the effective management of information throughout its useful life. This practice has been used by Records and Information Management (RIM) Professionals for over three decades and had its basis in the management of information in paper or other physical forms (microfilm, negatives, photographs, audio or video recordings and other assets).
Video Lifecycle Management (VLM) is a video aware subset of ILM.
ILM includes every phase of a "record" from its beginning to its end. And while it is generally applied to information that rises to the classic definition of a record (
Records management), it applies to any and all informational assets. During its existence, information can become a record by being identified as documenting a business transaction or as satisfying a business need. In this sense ILM has been part of the overall approach of
Enterprise content management (ECM).
However, from a more general perspective, the term "business" must be taken in a broad sense and is not necessarily tied to direct commercial or enterprise contexts. While most records are thought of as having a relationship to enterprise business, not all do. Much recorded information serves to document an event or a critical point in history. Examples of these are birth, death, medical/health and educational records.
e-Science, for example, is an emerging area where ILM has become relevant.
In 2004, attempts were made by the Information Technology and Information Storage industries (
SNIA association) to assign a new broader definition to Information Lifecycle Management (ILM) according to this broad view:
- Information Lifecycle Management comprises the policies, processes, practices, and tools used to align the business value of information with the most appropriate and cost effective IT infrastructure from the time information is conceived through its final disposition. Information is aligned with business processes through management policies and service levels associated with applications, metadata, information, and data.
Functionality
For the purposes of business records, there are five phases identified as being part of the lifecycle continuum along with one exception. These are:
- Creation and Receipt
- Distribution
- Use
- Maintenance
- Disposition
Creation and Receipt deals with records from their point of origination. This could include their creation by a member of an organization at varying levels or receipt of information from an external source. It includes correspondence, forms, reports, drawings, computer input/output, or other sources.
Distribution is the process of managing the information once it has been created or received. This includes both internal and external distribution, as information that leaves an organization becomes a record of a transaction with others.
Use takes place after information is distributed internally, and can generate business decisions, document further actions, or serve other purposes.
Maintenance is the management of information. This can include processes such as filing, retrieval and transfers. While the connotation of 'filing' presumes the placing of information in a prescribed container and leaving it there, there is much more involved. Filing is actually the process of arranging information in a predetermined sequence and creating a system to manage it for its useful existence within an organization. Failure to establish a sound method for filing information makes its retrieval and use nearly impossible. Transferring information refers to the process of responding to requests, retrieval from files and providing access to users authorized by the organization to have access to the information. While removed from the files, the information is tracked by the use of various processes to ensure it is returned and/or available to others who may need access to it.
Disposition is the practice of handling information that is less frequently accessed or has met its assigned retention periods. Less frequently accessed records may be considered for relocation to an 'inactive records facility' until they have met their assigned retention period. "Although a small percentage of organizational information never loses its value, the value of most information tends to decline over time until it has no further value to anyone for any purpose. The value of nearly all business information is greatest soon after it is created and generally remains active for only a short time --one to three years or so-- after which its importance and usage declines. The record then makes its life cycle transition to a semi-active and finally to an inactive state."
[1] Retention periods are based on the creation of an organization-specific retention schedule, based on research of the regulatory, statutory and legal requirements for management of information for the industry in which the organization operates. Additional items to consider when establishing a retention period are any business needs that may exceed those requirements and consideration of the potential historic, intrinsic or enduring value of the information. If the information has met all of these needs and is no longer considered to be valuable, it should be disposed of by means appropriate for the content. This may include ensuring that others cannot obtain access to outdated or obsolete information as well as measures for protecting privacy and confidentiality.
Long-term records are those that are identified to have a continuing value to an organization. Based on the period assigned in the retention schedule, these may be held for periods of 25 years or longer, or may even be assigned a retention period of "indefinite" or "permanent". The term "permanent" is used much less frequently outside of the Federal Government, as it is impossible to establish a requirement for such a retention period. There is a need to ensure records of a continuing value are managed using methods that ensure they remain persistently accessible for the length of time they are retained. While this is relatively easy to accomplish with paper or microfilm based records by providing appropriate environmental conditions and adequate protection from potential hazards, it is less simple for electronic format records. There are unique concerns related to ensuring the format they are generated/captured in remains viable and the media they are stored on remains accessible. Media is subject to both degradation and obsolescence over its lifespan, and therefore, policies and procedures must be established for the periodic conversion and migration of information stored electronically to ensure it remains accessible for its required retention periods.
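A retention schedule can be pictured as a lookup from record type to retention period, from which a disposition date follows. The sketch below is an assumption-laden illustration: the record types and year counts in `RETENTION_YEARS` are invented, and real schedules are derived from the regulatory and legal research described above.

```python
from datetime import date

# Hypothetical schedule: record type -> years to retain after creation.
RETENTION_YEARS = {"invoice": 7, "correspondence": 3}


def disposition_date(created: date, record_type: str) -> date:
    """Return the earliest date on which the record may be disposed of."""
    years = RETENTION_YEARS[record_type]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year.
        return created.replace(year=created.year + years, day=28)


def is_due(created: date, record_type: str, today: date) -> bool:
    """True once the assigned retention period has been met."""
    return today >= disposition_date(created, record_type)
```

Records judged to have enduring value would simply be absent from such a table or marked as permanent, so that no disposition date is ever computed for them.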
Exceptions occur with non-recurring issues outside the normal day-to-day operations. One example of this is when a
legal hold, litigation hold or
legal freeze is requested by an attorney. The records manager then places a legal hold inside the records management application, which stops the files from being enqueued for disposition.
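The effect of a legal hold on disposition can be sketched as a filter on the disposition queue. The `Record` class and the function `disposition_queue` are hypothetical and stand in for whatever a real records management application provides.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical record as tracked by a records management application."""
    record_id: str
    retention_met: bool
    legal_hold: bool = False


def disposition_queue(records: list[Record]) -> list[str]:
    """Return IDs of records eligible for disposition.

    A record under legal hold is never enqueued, even if its
    retention period has been met.
    """
    return [r.record_id for r in records if r.retention_met and not r.legal_hold]
```

Lifting the hold later returns the record to the normal lifecycle without any change to its retention data.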