eReference Publishing: Current Issues and Trends in Preservation

The RUSA/Codes Reference Publishing Advisory Committee sponsored a program at the Annual Conference in New Orleans, “eReference Publishing: Current Issues and Trends in Preservation.” Joseph Yue, Chair of the committee, moderated the panel discussion. Speakers included:

  • Jacob Nadal, Preservation Officer, UCLA Library
  • Heather Ruland Staines, Sr. Manager eOperations at Springer Science + Business Media
  • Marie McCaffrey, Executive Director, HistoryLink.org
  • Ken DiFiore, Associate Director, Outreach & Participation Services, Portico

Jacob Nadal (jacobnadal.com/70) provided a broad discussion of preservation and how it relates to eReference.

3 Core Issues in Preservation:

  • Reliable storage – What is the best way to store the information? Backup and storage are different things: backup is for the recovery of systems, but systems are subject to obsolescence and require maintenance. Storage is for the recovery of content, which can be much more durable and technology independent. Clouds, tape libraries, and RAID arrays (arrays of inexpensive disks) are all viable.
  • Choosing formats and technology platforms – What is the best format and platform to store information? Open source? Proprietary?
  • Planning for obsolescence – There will come a point when the digital materials we use today won’t be usable. It’s a problem, but not a pressing problem.

What is the state of publishing? Craig Mod wrote an essay about pre-artifact publishing. Jacob shared a diagram showing an idea becoming an artifact and being distributed to a reader. A second diagram displayed post-artifact publishing, where the digital object changes over time and takes on a life of its own. Along the way, smaller artifacts are created, but they aren’t as great in size or importance as the original. This makes preservation difficult, because you can only save something once it is an artifact. Jacob compared the continuously updated eReference book to a dance recital: you can’t preserve the dancing, only a recording of it at a given time.

One example of preservation that does work is Wikipedia: http://en.wikipedia.org/wiki/Preservation_(library_and_archival_science)

Four years of changes to a file are tracked in this example, which addresses the core issues in preservation. Lessons learned: investigate citation systems and versioning (digital asset management); move providers towards known platforms, reliable file formats, and open standards; get clear rights, including post-cancellation access.
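The versioning lesson can be sketched in code. The following is a minimal illustration, not something presented in the talk: each saved revision of an entry becomes a fixed artifact identified by a timestamp and a content hash, so a citation can point at one specific version. The names `Revision`, `EntryHistory`, and `cite`, and the sample entry text, are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One saved state of an entry: the artifact that can be preserved."""
    text: str
    saved_at: datetime
    digest: str  # SHA-256 of the text, used as a stable version identifier


@dataclass
class EntryHistory:
    """Hypothetical revision log for a single reference entry."""
    title: str
    revisions: list = field(default_factory=list)

    def save(self, text, saved_at=None):
        """Record a new revision; skip it if the text has not changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.revisions and self.revisions[-1].digest == digest:
            return self.revisions[-1]
        rev = Revision(text, saved_at or datetime.now(timezone.utc), digest)
        self.revisions.append(rev)
        return rev

    def cite(self, index=-1):
        """Return a citation string that pins one specific revision."""
        rev = self.revisions[index]
        return f"{self.title}, version {rev.digest[:12]}, saved {rev.saved_at:%Y-%m-%d}"


# Made-up sample data: two revisions of the same entry, four years apart.
history = EntryHistory("Preservation")
history.save("Preservation is the care of collections.",
             datetime(2007, 6, 1, tzinfo=timezone.utc))
history.save("Preservation is the care of library and archival collections.",
             datetime(2011, 6, 1, tzinfo=timezone.utc))

print(history.cite(0))  # citation pinned to the first revision
print(history.cite())   # citation pinned to the latest revision
```

Pinning a citation to a content hash rather than to "the current page" is one way to make a continuously updated entry citable; real systems may use revision numbers or persistent identifiers instead.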

Heather Ruland Staines – Springer Science + Business Media

eReference from a publisher’s perspective includes encyclopedias, atlases, biographies, dictionaries, and databases. Springer just launched Springer eReference, and it participates in the KB, GNL, Portico, CLOCKSS, and LOCKSS preservation initiatives. Internal discussions ensued about the type of ownership model for eBooks: if libraries truly own the eBooks, shouldn’t they be responsible for archiving? Heather disagrees, and as a result Springer preserves eBooks and reference works in a variety of ways.

  • PDFs + metadata or XML files when available
  • preservation plan by the initiative (as applicable)
  • internally via content management system

When Heather reached out to other publishers, she found that preservation depended upon the business model. For example, if access was via subscription, once payment ceases, so does access.

Business models and digital preservation:

  • access models vs. ownership
  • versioning and updates
  • eReferences vs. digital collections
  • eReference paired with journals

How is the content collected/preserved at Springer?

  • media storage vs. FTP site
  • ONIX feed of content + metadata (updated content may replace the previous version or sit alongside it)
  • harvesting via LOCKSS box or similar crawl

What are we trying to preserve?

  • content
  • organizational structure
  • inter-connections and linking
  • user experience
  • user generated content
  • concepts and the information surrounding the concepts

Main concerns for Publishers:

  • reference works are becoming more dynamic and much closer to databases
  • proliferation of file types that are included in these works
  • ensuring that citations, updates, errata, and addenda are connected and resolve properly
  • what to preserve: a snapshot vs. the entire user experience
  • where do we go from here? Works are becoming interactive experiences, more like games

Marie McCaffrey, Executive Director, HistoryLink.org

HistoryLink has been building this encyclopedia for over 13 years. They started with the concept of keeping information internal and branded, and contributors are paid. They follow all the same standards as if they were printing a book. Every essay is signed, dated, and sourced, and when they make a change, it is noted. The essays are always being updated, corrected, or expanded.

Preservation includes backup tapes created at various times and stored in various locations. Marie commented that Google is a good source for archived content because of what is kept in its cached files.

Ken DiFiore, Associate Director, Outreach & Participation Services, Portico

Ken discussed the role of Portico in preservation. Portico is a non-profit organization. Last year it was certified as a trustworthy digital repository. It receives content from a variety of players in the scholarly community.

Where does the preservation of eReference fit? And, by the way, what is an eReference product?

Regardless of what the content is, they plan to apply the same preservation objectives to all content.  This includes:

  • Usability—the intellectual content of the item must remain usable via the delivery mechanism of current technology
  • Authenticity—the provenance of the content must be proven and the content an authentic replica of the original
  • Discoverability—the content must have logical bibliographic metadata so that it can be found by end users through time
  • Accessibility—the content must be available for use to the appropriate community
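One common way to support the authenticity objective is a fixity check: record a checksum when content is deposited and recompute it later. The sketch below is a general illustration, not a description of Portico's actual process; the function names and sample content are hypothetical.

```python
import hashlib


def fixity_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest to record when content is deposited."""
    return hashlib.sha256(content).hexdigest()


def is_authentic(content: bytes, recorded_digest: str) -> bool:
    """True if the content still matches the digest recorded at deposit."""
    return fixity_digest(content) == recorded_digest


# Made-up sample content standing in for a deposited file.
original = b"<article><title>Sample entry</title></article>"
recorded = fixity_digest(original)

unchanged_ok = is_authentic(original, recorded)        # byte-identical copy
altered_ok = is_authentic(original + b" ", recorded)   # copy with one extra byte
print(unchanged_ok, altered_ok)
```

A matching digest shows only that the bytes are unchanged since deposit; establishing provenance still depends on recording where the content came from.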

Portico goes through the content in great detail. The source files go through a series of processing steps: files are converted or normalized, a metadata wrapper is applied, quality control is performed, a standard archival package is created, and finally the package is deposited in the archive management system.
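The processing steps described above can be sketched as a small pipeline. This is only an illustrative outline under simplifying assumptions, not Portico's software: the class `ArchivalPackage`, the functions, the in-memory `archive` dictionary, and the sample file are all hypothetical, and "normalization" here just lowercases file names and trims whitespace.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivalPackage:
    """Hypothetical stand-in for a standard archival package."""
    identifier: str
    files: dict                                  # normalized file name -> content
    metadata: dict = field(default_factory=dict)


def normalize(source_files):
    """Step 1: convert or normalize files (here: lowercase names, trim text)."""
    return {name.lower(): text.strip() for name, text in source_files.items()}


def wrap_metadata(identifier, files, title):
    """Step 2: apply a metadata wrapper around the normalized files."""
    return ArchivalPackage(identifier, files,
                           {"title": title, "file_count": len(files)})


def quality_control(package):
    """Step 3: reject packages with a missing title or empty files."""
    if not package.metadata.get("title"):
        raise ValueError("missing title")
    empty = [name for name, text in package.files.items() if not text]
    if empty:
        raise ValueError(f"empty files: {empty}")
    return package


def deposit(package, archive):
    """Step 4: deposit the package in the (in-memory) archive."""
    archive[package.identifier] = package
    return package.identifier


# Made-up sample input: one source file with untidy name and whitespace.
archive = {}
source = {"Entry1.XML": "  <entry>Seattle</entry>\n"}
package = quality_control(
    wrap_metadata("pkg-001", normalize(source), "Sample encyclopedia"))
deposit(package, archive)
print(sorted(archive["pkg-001"].files))
```

Running quality control before deposit means a package that fails a check never reaches the archive, which mirrors the order of steps given in the talk.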

They can manage versions of content through their metadata.  They have a new metadata schema that they use.

What are the characteristics of eReference?

  • content appearance is similar to journal databases
  • each article is a discrete unit
  • generally modeled after a print resource
  • looking at the underlying file formats, the production process is to send the print file (PDF) off for conversion to XML, so that the files are in the same formats as those used in ejournal and ebook preservation.