Thursday, December 14, 2006

EAD Conclusion:

EAD is a rapidly changing metadata scheme. From SGML to XML, and from OAI rebellion to gradually increasing OAI compliance, EAD is changing and improving year to year. EAD has completely established itself as the archival metadata scheme (there is no real rival), but it has certain limitations in usability and interoperability that are slowly being addressed.

The XML foundation of EAD is admittedly complicated. It is not practical for an archivist to type in by hand all the angle-bracketed tags that the EAD DTD defines. Automatic EAD tools are being developed, but there is no universally accepted tool.

Overall, EAD has brought archivists, explorers of vanished historical worlds, to the forefront of technology. XML EAD is the 21st-century answer to the question of how to find information from the 17th, 16th, and 15th centuries and before.

Authority Control:

One of EAD's strengths, and a reason for its widespread acceptance, is that it is relatively flexible. More than most other schemas, EAD allows for variety in the detail of coding, the sequence, and the quantity of information. The LEVEL attribute can be entered high up in the hierarchy or low down in it. This flexibility of course does not mean that EAD is standards-free. It wouldn't be a proper metadata scheme without standards. The all-important XML tags do not vary at all.

EAD is maintained by the Society of American Archivists, the Research Libraries Group, and the Library of Congress.

The SAA is more involved with actually talking to archivists at workshops, but its role may be smaller than the roles played by RLG and the LOC.

The RLG manages the EAD Best Practices Manual, which is scripture for those who wish to make universally recognized EAD. The EAD Application Guidelines provide indispensable information about the tags that EAD runs on.

The LOC is the official master of EAD.

http://www.loc.gov/ead/

It is the LOC that maintains the databases of the EAD tags.

Overall, while EAD tags are standardized, the EAD hierarchy is not. Different archivists may use the tags in slightly different places in the XML hierarchy.
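To make the point about tags versus hierarchy concrete, here is a minimal Python sketch. The two fragments are hypothetical examples I wrote for illustration, not taken from any real finding aid, and the helper names tag_names and tag_paths are my own. Both fragments use the same tags, but one places the date beside the title and the other nests it inside the title.

```python
import xml.etree.ElementTree as ET

# Two hypothetical finding-aid fragments (invented for illustration).
# Both use the same EAD tags, but place the date differently.
FRAGMENT_A = """
<c01 level="series">
  <did>
    <unittitle>Correspondence</unittitle>
    <unitdate>1870-1880</unitdate>
  </did>
</c01>
"""

FRAGMENT_B = """
<c01 level="series">
  <did>
    <unittitle>Correspondence, <unitdate>1870-1880</unitdate></unittitle>
  </did>
</c01>
"""


def tag_names(xml_text):
    """Return the sorted set of tag names, ignoring where they appear."""
    return sorted({element.tag for element in ET.fromstring(xml_text).iter()})


def tag_paths(xml_text):
    """Return the sorted set of slash-separated tag paths in a fragment."""
    paths = set()

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}" if prefix else element.tag
        paths.add(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return sorted(paths)


# Same vocabulary of tags in both fragments...
print(tag_names(FRAGMENT_A) == tag_names(FRAGMENT_B))
# ...but the tags sit at different places in the hierarchy.
print(tag_paths(FRAGMENT_A) == tag_paths(FRAGMENT_B))
```

Any program that expects to find the date at one fixed path will miss it in the other fragment, which is the practical cost of the flexible hierarchy.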

EAD and OAI-PMH

As related in a previous post, EAD does not "play well with OAI-PMH."

OAI, the Open Archives Initiative (www.openarchives.org), the body behind the Protocol for Metadata Harvesting (OAI-PMH), is an organization of college-level librarians dedicated to making electronic materials publicly available and mutually translatable. Contrary to what the word "archives" in its title suggests, OAI has little to do with actual archives. The Society of American Archivists does not bless the OAI project.

From examining OAI's website, there are some issues in the OAI-EAD relationship. For its part, OAI gives less attention to EAD than it does to other metadata schemes. For their part, archivists rarely try to make their finding aids OAI compliant. (For verification of this, go to http://web.library.uiuc.edu/ahx/workpap/MARAC03.pdf.) Quoting Chris Prom, a pro-OAI archivist at the University of Illinois Urbana-Champaign, "We 'real' archivists have a lot to learn from those who are implementing [OAI]."

Archivists have their own reasons for not embracing OAI. According to William Landis, OAI fudges a finding aid's description of provenance and original order. (See "Nuts and Bolts: Implementing Descriptive Standards to Enable Virtual Collections," Journal of Archival Organization.)

A basic problem with OAI for EAD is that OAI is focused on making metadata for specific items, whereas archivists are more concerned with context, the order of the documents, and overall collections.

To give a specific example, EAD is capable of differentiating letters to Mark Twain, from Mark Twain, and about Mark Twain. Other metadata schemes, the kinds that OAI is made for, do not handle the to, from, about issue as well.
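Here is a rough Python sketch of one way the to, from, about distinction could be encoded and read back. It assumes the distinction is recorded in the role attribute of the persname tag. The fragment, the role values ("recipient", "author", "subject"), the titles, and the function name titles_by_role are all hypothetical choices of mine; a real finding aid might instead express the same thing through series such as incoming and outgoing correspondence.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the titles and the role values are invented
# for illustration and are not drawn from a real finding aid.
EAD_FRAGMENT = """
<c01 level="series">
  <did><unittitle>Correspondence and clippings</unittitle></did>
  <c02 level="file">
    <did><unittitle>Letter received</unittitle></did>
    <controlaccess><persname role="recipient">Twain, Mark</persname></controlaccess>
  </c02>
  <c02 level="file">
    <did><unittitle>Letter sent</unittitle></did>
    <controlaccess><persname role="author">Twain, Mark</persname></controlaccess>
  </c02>
  <c02 level="file">
    <did><unittitle>Newspaper clipping</unittitle></did>
    <controlaccess><persname role="subject">Twain, Mark</persname></controlaccess>
  </c02>
</c01>
"""


def titles_by_role(xml_text, name):
    """Group file titles by the role the named person plays in each file."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for component in root.iter("c02"):
        title = component.findtext("did/unittitle")
        for person in component.iter("persname"):
            if person.text == name:
                grouped.setdefault(person.get("role"), []).append(title)
    return grouped


print(titles_by_role(EAD_FRAGMENT, "Twain, Mark"))
```

Simple Dublin Core, by comparison, has creator and subject elements but no element for a recipient, so the "to" case has no obvious home there.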

EAD Quality Control:

The RLG (Research Libraries Group) unveiled a new EAD quality control program in February 2006. The new quality control program is an update on their preexisting EAD Report Card.

http://www.rlg.org/en/page.php?Page_ID=20513

The program itself is a complicated product, but it is extremely easy to use. Once one has downloaded the automated program (a web-based version is available, but RLG reports that it is slower), all a user has to do to check an EAD file's XML is use a Browse function, and, voila, errors are pointed out.

Wednesday, December 13, 2006

Interoperability:

EAD’s hierarchical nature and flexibility make it a more powerful tool for archivists, but that same hierarchical nature and flexibility make interoperability harder than it would be otherwise. In other words, translating a title into EAD is not as simple as crosswalking a MARC 245 title, "Last of the Mohicans," to a Dublin Core title, "Last of the Mohicans."
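For comparison, here is how little is needed for the flat MARC to Dublin Core case. This is only a sketch: the mapping table is a simplified assumption of mine that covers two fields, the record is a plain dictionary rather than a real MARC record, and the function name marc_to_dc is hypothetical.

```python
# Simplified, assumed mapping from MARC field tags to Dublin Core elements.
# A real crosswalk covers many more fields and subfields.
MARC_TO_DC = {
    "245": "title",
    "650": "subject",
}


def marc_to_dc(record):
    """Translate a flat {MARC tag: value} record into Dublin Core elements."""
    return {
        MARC_TO_DC[tag]: value
        for tag, value in record.items()
        if tag in MARC_TO_DC
    }


print(marc_to_dc({"245": "Last of the Mohicans"}))
# {'title': 'Last of the Mohicans'}
```

The whole crosswalk is a lookup table because each MARC field has one analogous Dublin Core element and neither record has any nesting.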

An interesting discussion of EAD conversion problems resulting from its hierarchical nature is at this blog, http://metadataintern.blogspot.com/. Go to the entry "The end is in sight. Or, is it?"

OAI conversion issues are a particular source of angst.

The debate over EAD and interoperability has received some attention from the archival community. At the 2002 SAA convention Christopher J. Prom gave a wittily titled presentation, "Does EAD Play Well with Other Metadata Standards: Searching and Retrieving EAD Using the OAI Protocols."

EAD crosswalk construction is not impossible, but it is complicated by EAD’s hierarchies, wrappers, etc. In other metadata schemes crosswalks can easily be constructed between neatly analogous elements. That is not so in EAD.
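To show where the complication comes from, here is a minimal Python sketch that flattens a small EAD hierarchy into one record per level. The fragment is a hypothetical example of my own, and the function name flatten, the COMPONENT_TAGS set, and the "context" field are assumptions made for illustration rather than part of any published crosswalk.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding-aid fragment, invented for illustration.
EAD_FRAGMENT = """
<archdesc level="collection">
  <did><unittitle>Example Family Papers</unittitle></did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="file">
        <did><unittitle>Letters, 1870-1880</unittitle></did>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""

# Tags treated as described levels in this sketch (an assumption; real
# finding aids can nest components more deeply).
COMPONENT_TAGS = {"archdesc", "c01", "c02"}


def flatten(element, ancestors=()):
    """Yield one flat record per described level, carrying ancestor titles."""
    title = element.findtext("did/unittitle")
    inherited = ancestors
    if element.tag in COMPONENT_TAGS and title is not None:
        yield {
            "title": title,
            "level": element.get("level"),
            "context": list(ancestors),
        }
        inherited = ancestors + (title,)
    for child in element:
        yield from flatten(child, inherited)


for record in flatten(ET.fromstring(EAD_FRAGMENT)):
    print(record)
```

One finding aid yields three titles, and the file-level title only makes sense alongside the series and collection titles above it. A flat target scheme has to decide whether to drop that context, repeat it in every record, or squeeze it into some other element.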

The Library of Congress and Getty provide for the following transferal options:
(the Getty crosswalk page is at: http://www.getty.edu/research/conducting_research/standards/intrometadata/metadata_element_sets.html)

ISAD-G to EAD
EAD to ISAD-G
Dublin Core to EAD
USMARC to EAD

Notice that, apart from ISAD-G, conversion from EAD to the other metadata schemes is not covered.

RLG (the Research Libraries Group, a non-profit university consortium) offers its own service for conversion to EAD, co-offered with a for-profit company known as Apex CoVantage:

http://www.rlg.org/en/page.php?Page_ID=448

The SAA also offers a conversion service.

Using an EAD Finding Aid: How hard is it?

Elizabeth Yakel has charged that EAD-based finding aids are difficult for the public to use. While the results of her study at the University of Pittsburgh seem damning, her study lacks scientific validity, since she does not compare success with EAD finding aids to success with non-EAD finding aids.

I lack the means to do a true scientific study, but I can share my own experiences.
My experience with EAD finding aids is that they have their limitations, but the experience one has with an EAD finding aid is a function of how detailed the finding aid itself is.

For this exercise, I decided to experiment with several finding aids on Michigan issues created by the University of Michigan's Bentley Historical Library.

http://bentley.umich.edu/EAD/

The search interface is very attractively designed. The interface allows one to search by "entire finding aid," names, places, subjects, call number, collection title, and repository. There are also simple and Boolean options.

For this exercise, I did a few experimental searches on subject.

I entered in the simple search field a few topics that I was certain would be covered in at least a few of these Michigan finding aids:

Detroit Riots
George Romney
Mesabi Range
Henry Ford

Simple Search

When I searched by subject, the Detroit Riots did not appear. Nor did "12th Street Riot" produce anything. Only when I searched by the entire finding aid did Detroit Riots hits come up. There were scores of Detroit Riots hits, so the inability of the finding aid to produce hits for "Detroit Riots" as a subject is possibly a weakness of EAD.

Aside from somehow not listing the Detroit Riots as a subject, using EAD was easy. George Romney hits came up when I used name and subject, Mesabi Range hits came up as a subject and a place, Henry Ford came up as a subject and a name. Curiously, there was only one hit for George Romney with either subject or name. Since he was an important governor, it seems difficult to believe that there would be only one collection that has materials relating to him.

Boolean Search

Boolean Search worked excellently. The interface was different from the standard Google/Yahoo interface in that it had separate cells for different terms, plus a dropdown menu for and/or/not, but it was intuitive. Detroit AND Riots produced relevant hits.

Overall, in my limited experience, I feel that criticisms of EAD for being difficult to use are mostly, but not completely, unfounded.

Tuesday, December 12, 2006

Metadata Schemes: Does the Public Care?

In class on December 11th we had a discussion on the issue of whether or not it mattered to the public what metadata scheme archivists used. The majority of the class seemed to be of the opinion that the public was indifferent to metadata, and that discussions of metadata had extremely limited impact outside of the information science field.

It is difficult to argue with this view. The public cares about as much about the inner workings of various finding aids as it does about the inner workings of microwave ovens, cellular metabolism, and the American legislative process. The term "metadata" is still highly specialized. When I tell people that I am taking a course on "data about data" I get some very quizzical looks, even from educated people.

Librarians/information science professionals are extremely keen on cataloging and organizing information. Organizing information is how we define our profession and without that goal we would lose our raison d'etre.

The unfortunate thing for librarians/information science professionals is that we are far from the cutting edge of the information field. Leaps in humanity's ability to find information come not from university information scientists, but from entrepreneurial computer scientists. At the same time in the 1990s that archivists were arguing about EAD and pushing for its acceptance, a few Comp Sci PhD students at Stanford were creating Google.

Google, a web search engine, not a programming language, has changed how we seek and find information more than any metadata scheme. Google's power comes from the fact that Larry Page and Sergey Brin changed how a program seeks information. By contrast, a metadata scheme attempts to be powerful based on how people input (i.e., tag) information.