Thursday, December 14, 2006

EAD Conclusion:

EAD is a rapidly changing metadata scheme. From SGML to XML, and from OAI rebellion to gradually increasing OAI compliance, EAD is changing and improving year to year. EAD has completely established itself as the archival metadata scheme -- there is no real rival -- but it has certain limitations in usability and interoperability that are slowly being addressed.

The XML foundation of EAD is admittedly complicated. It is not practical for an archivist to manually type out every angle-bracketed tag that the DTD calls for. Automatic EAD tools are being developed, but there is no universally accepted tool.

Overall, EAD has brought archivists, explorers of vanished historical worlds, into the forefront of technology. XML EAD is the 21st century answer to the question of how to find information from the 17th century, the 16th century, the 15th century, and before.

Authority Control:

One of EAD's strengths, and a reason for its widespread acceptance, is that it is relatively flexible. More than most other schemas, EAD allows for variety in the detail of coding, the sequence, and the quantity of information. The LEVEL attribute can be entered high up on the hierarchy or low down on it. This flexibility of course does not mean that EAD is standards free. It wouldn't be a proper metadata scheme without standards. The all-important XML tags vary not at all.

EAD is maintained by the Society of American Archivists, the Research Libraries Group, and the Library of Congress.

The SAA is more involved with actually talking to archivists at workshops; its role may be smaller than the roles played by RLG and the LOC.

The RLG manages the EAD Best Practices Manual, which is scripture for those who wish to make universally recognized EAD. The EAD Application Guidelines provide indispensable information about the tags that EAD runs on.

The LOC is the official master of EAD.

http://www.loc.gov/ead/

It is the LOC that maintains the databases of the EAD tags.

Overall, while EAD tags are standardized, the EAD hierarchy is not. Different archivists may use the tags in slightly different places in the XML hierarchy.
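
To make that concrete, here is a minimal sketch in Python. The two fragments are invented for illustration and do not come from any real finding aid; they use the same standardized tags and LEVEL values, but one archivist puts the "series" level higher in the hierarchy than the other. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two made-up finding-aid fragments (assumptions, for illustration only).
FINDING_AID_A = """
<archdesc level="collection">
  <dsc>
    <c01 level="series">
      <c02 level="file"/>
    </c01>
  </dsc>
</archdesc>
"""

FINDING_AID_B = """
<archdesc level="fonds">
  <dsc>
    <c01 level="subgrp">
      <c02 level="series">
        <c03 level="file"/>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""


def levels_by_depth(xml_text):
    """Return (depth, tag, level) for every element carrying a level attribute."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(element, depth):
        if "level" in element.attrib:
            found.append((depth, element.tag, element.get("level")))
        for child in element:
            walk(child, depth + 1)

    walk(root, 0)
    return found


def depth_of_level(xml_text, level_name):
    """Depth at which a given level value first appears, or None if absent."""
    for depth, _tag, level in levels_by_depth(xml_text):
        if level == level_name:
            return depth
    return None


print(depth_of_level(FINDING_AID_A, "series"))
print(depth_of_level(FINDING_AID_B, "series"))
```

Same tags, same attribute, different depths: a program that expects "series" at one fixed place in the tree will miss it in the other finding aid.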

EAD and OAI-PMH

As related in a previous post, EAD does not "play well with OAI-PMH."

OAI, the Open Archives Initiative (www.openarchives.org), is an organization of college-level librarians dedicated to making electronic materials publicly available and mutually translatable; OAI-PMH is its Protocol for Metadata Harvesting. Despite the word "archives" in its name, OAI has little to actually do with archives. The Society of American Archivists does not bless the OAI project.

From examining OAI's website, there are some issues in the OAI-EAD relationship. For its part, OAI gives less attention to EAD than it does to other metadata schemes. For their part, archivists rarely try to make their finding aids OAI compliant (for verification of this, go to http://web.library.uiuc.edu/ahx/workpap/MARAC03.pdf). Quoting Chris Prom, a pro-OAI archivist at the University of Illinois Urbana-Champaign, "We 'real' archivists have a lot to learn from those who are implementing [OAI]."

Archivists have their own reasons for not embracing OAI. According to William Landis, OAI fudges a finding aid's description of provenance and original order. ("Nuts and Bolts: Implementing Descriptive Standards to Enable Virtual Collections," Journal of Archival Organization.)

A basic problem with OAI for EAD is that OAI is focused on making metadata for specific items, whereas archivists are more concerned with context, the order of the documents, and overall collections.

To give a specific example, EAD is capable of differentiating letters to Mark Twain, from Mark Twain, and about Mark Twain. Other metadata schemes, the kinds that OAI is made for, do not handle the to, from, about issue as well.

EAD Quality Control:

The RLG (Research Libraries Group) unveiled a new EAD quality control program in February 2006. The new quality control program is an update on their preexisting EAD Report Card.

http://www.rlg.org/en/page.php?Page_ID=20513

The program itself is a complicated product, but it is extremely easy to use. If one has downloaded the automated program (a web-based version is available, but RLG reports that it is slower), all a user has to do to check his EAD's XML is use a Browse function, and, voila, errors are pointed out.
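
For a sense of what the simplest automated check looks like, here is a rough sketch in Python. This is not RLG's program and makes no claim about how it works; it only tests whether a file's XML is well-formed, using Python's standard library. The function name and the two sample strings are made up.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the text parses as XML, else (False, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        # error.position is a (line, column) tuple locating the problem.
        line, column = error.position
        return False, f"line {line}, column {column}: {error}"
    return True, None


# Hypothetical samples: the second one never closes its <did> tag.
GOOD = "<ead><archdesc level='collection'><did/></archdesc></ead>"
BAD = "<ead><archdesc level='collection'><did></archdesc></ead>"

print(check_well_formed(GOOD))
print(check_well_formed(BAD))
```

Well-formedness is only the first hurdle. Checking that the tags are legal EAD in legal places means validating against the EAD DTD, which needs a validating parser that Python's standard library does not include.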

Wednesday, December 13, 2006

Interoperability:

EAD’s hierarchical nature and flexibility make it a more powerful tool for archivists, but that same hierarchical nature and flexibility make interoperability harder than it would be otherwise. In other words, translating a title into EAD is not as simple as crosswalking MARC 245 "Last of the Mohicans" to the Dublin Core title "Last of the Mohicans."

An interesting discussion of EAD conversion problems resulting from its hierarchical nature is at this blog, http://metadataintern.blogspot.com/. Go to the entry "The end is in sight. Or, is it?"

There is particular angst over OAI conversion issues.

The debate over EAD and interoperability has received some attention from the archival community. At the 2002 SAA convention Christopher J. Prom gave a wittily titled presentation, "Does EAD Play Well with Other Metadata Standards: Searching and Retrieving EAD Using the OAI Protocols."

EAD crosswalk construction is not impossible, but it is complicated by EAD’s hierarchies, wrappers, etc. In other metadata schemes crosswalks can easily be constructed between neatly analogous elements. Not so in EAD.
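
The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anyone's official crosswalk: the MARC side is reduced to a plain dictionary, the two field mappings shown are the familiar 245-to-title and 100-to-creator pairs, and the EAD path chosen for the title (archdesc, then did, then unittitle) is just one plausible home for it. All function and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Flat crosswalk: one MARC field maps straight onto one Dublin Core element.
MARC_TO_DC = {"245": "title", "100": "creator"}


def marc_to_dublin_core(marc_record):
    """Translate a {field: value} MARC-like dict into a Dublin Core-like dict."""
    return {
        MARC_TO_DC[field]: value
        for field, value in marc_record.items()
        if field in MARC_TO_DC
    }


# Hierarchical target: the same title must be placed inside wrapper elements.
# This path is an assumption chosen for illustration.
EAD_TITLE_PATH = ["archdesc", "did", "unittitle"]


def dublin_core_title_to_ead(dc_record):
    """Build a tiny EAD-like tree that holds the Dublin Core title."""
    root = ET.Element("ead")
    parent = root
    for tag in EAD_TITLE_PATH:
        parent = ET.SubElement(parent, tag)
    parent.text = dc_record["title"]
    return root


marc = {"245": "Last of the Mohicans", "100": "Cooper, James Fenimore"}
dc = marc_to_dublin_core(marc)
ead = dublin_core_title_to_ead(dc)

print(dc)
print(ET.tostring(ead, encoding="unicode"))
```

The MARC to Dublin Core step is a lookup table. The EAD step has to decide where in the tree the value belongs, and a different archivist or a different level of description could reasonably call for a different path.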

The Library of Congress and Getty provide for the following transferal options:
(the Getty crosswalk page is at: http://www.getty.edu/research/conducting_research/standards/intrometadata/metadata_element_sets.html)

ISAD-G to EAD
EAD to ISAD-G
Dublin Core to EAD
USMARC to EAD

Notice that, apart from ISAD-G, conversion from EAD to the other metadata schemes is not covered.

RLG (the Research Libraries Group, a non-profit university consortium) offers its own service for conversion to EAD, co-offered with a for-profit company, Apex CoVantage:

http://www.rlg.org/en/page.php?Page_ID=448

The SAA also offers a conversion service.

Using an EAD Finding Aid: How hard is it?

Elizabeth Yakel has charged that EAD-based finding aids are difficult for the public to use. While the results of her study at the University of Pittsburgh seem damning, her study lacks scientific validity, since she does not compare success with EAD finding aids to success with non-EAD finding aids.

I lack the means to do a true scientific study, but I can share my own experiences.
My experience with EAD finding aids is that they have their limitations, but the experience one has with an EAD finding aid is a function of how detailed the finding aid itself is.

For this exercise, I decided to experiment with several finding aids on Michigan issues created by the University of Michigan's Bentley Historical Library.

http://bentley.umich.edu/EAD/

The search interface is very attractively designed. The interface allows one to search by "entire finding aid," names, places, subjects, call number, collection title, and repository. There are also simple and Boolean options.

For this exercise, I did a few experimental searches on subject.

I entered in the simple search field a few topics that I was certain would be covered in at least a few of these Michigan finding aids:

Detroit Riots
George Romney
Mesabi Range
Henry Ford

Simple Search

When I searched by subject, the Detroit Riots did not appear. Nor did "12th Street Riot" produce anything. Only when I searched by the entire finding aid did Detroit Riots hits come up. There were scores of Detroit Riots hits, so the inability of the finding aid to produce hits for "Detroit Riots" as a subject is possibly a weakness of EAD.

Aside from somehow not listing the Detroit Riots as a subject, using EAD was easy. George Romney hits came up when I used name and subject, Mesabi Range hits came up as a subject and a place, Henry Ford came up as a subject and a name. Curiously, there was only one hit for George Romney with either subject or name. Since he was an important governor, it seems difficult to believe that there would be only one collection that has materials relating to him.

Boolean Search

Boolean Search worked excellently. The interface was different from the standard Google/Yahoo interface in that it had separate cells for different terms, plus a dropdown menu for and/or/not, but it was intuitive. Detroit AND Riots produced relevant hits.

Overall, in my limited experience, I feel that criticisms of EAD for being difficult to use are mostly, but not completely, unfounded.

Tuesday, December 12, 2006

Metadata Schemes: Does the Public Care?

In class December 11th we had a discussion on the issue of whether or not it mattered to the public what metadata scheme archivists used. The majority of the class seemed to be of the opinion that the public was indifferent to metadata, and that discussions of metadata had extremely limited impact outside of the information science field.

It is difficult to argue with this view. The public cares about as much about the inner workings of various finding aids as it does the inner workings of microwave ovens, cellular metabolism, and the American legislative process. The term "metadata" is still a highly specialized term; when I tell people that I am taking a course on "data about data" I get some very quizzical looks, even from educated people.

Librarians/information science professionals are extremely keen on cataloging and organizing information. Organizing information is how we define our profession and without that goal we would lose our raison d'etre.

The unfortunate thing for librarians/information science professionals is that we are far from the cutting edge of the information field. Leaps in humanity's ability to find information come not from university information scientists, but from entrepreneurial computer scientists. At the same time in the 1990s that archivists were arguing about EAD and pushing for its acceptance, a few Comp Sci PhD students at Stanford were creating Google.

Google, a search engine, not a programming language, has changed how we seek and find information more than any metadata scheme. Google's power comes from the fact that Larry Page and Sergey Brin changed how a program seeks information. By contrast, a metadata scheme attempts to be powerful based on how people input (i.e., tag) information.

Friday, November 17, 2006

Authoring EAD Documents:

As SGML or XML, EAD is a complicated-looking markup language. To actually write out EAD code, with all its brackets, dashes, periods, and numbers, would be extremely time-consuming, even for an experienced programmer.

Naturally, a solution has been found in the creation of programs where archivists can enter metadata elements like title, description, creation year, etc. in fields, and then have the program translate the fields into SGML or XML. These programs are referred to as "Authoring Software."
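
The basic idea behind such a program can be sketched in Python. This is a toy under stated assumptions, not a description of any actual authoring product: the field names, the mapping from fields to EAD tags, and the sample record are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from form-field names to EAD tags placed inside <did>.
FIELD_TO_TAG = {
    "title": "unittitle",
    "creation_year": "unitdate",
    "description": "abstract",
}


def fields_to_ead(fields):
    """Turn a dict of form fields into an EAD-like XML string."""
    archdesc = ET.Element("archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    for field_name, tag in FIELD_TO_TAG.items():
        if field_name in fields:
            # str() lets the form hand over numbers as well as text.
            ET.SubElement(did, tag).text = str(fields[field_name])
    return ET.tostring(archdesc, encoding="unicode")


# Made-up form input; note the ampersand, which XML requires to be escaped.
form = {
    "title": "Sample Family Papers",
    "creation_year": 1935,
    "description": "Letters & sermons",
}

print(fields_to_ead(form))
```

The archivist types plain values into fields; the program supplies the brackets, the nesting, and the escaping of awkward characters like the ampersand.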

Authoring Software varies archive to archive. Some archives will use special XML/SGML editors. ASCII text editors, like Notepad, work as well. From what I have read, adaptations of Microsoft Access may be the most commonly used. According to my boss at the Jewish Historical Society, the technology moves very quickly. (In fact, EAD only recently switched from SGML to XML.)

If an archives can afford it, a native SGML/XML editor may be the best bet. These are programs which free an archivist from actually having to write the complicated SGML/XML code himself or herself. In these programs, the code is automatically generated, and all the archivist has to do is type the names, titles, dates, etc. There are other specialized native editors called "tree structured" editors. Interface Electronics' Internet Archivist is the best known EAD-specific program. There are many XML and SGML programs available which are not written for EAD, but can be used for EAD.

Friday, October 13, 2006

Since the 2003 conference of the Society of American Archivists, archivists have been more willing to express their issues with Encoded Archival Description. At that conference, a session was held, "Demystifying EAD," in which three archivists shared ways that they had improved Encoded Archival Description. One had used Word macros to create the metadata, and the two others had used Access.

In 2004 EAD suffered a further blow when Elizabeth Yakel released a study that demonstrated that EAD-based finding aids were actually difficult to use. The subjects of the study were information science students at the University of Pittsburgh. The finding aids pertained to Pittsburgh History. Yakel’s study compellingly demonstrated that EAD finding aids were not easy for non-archivists to use. If information science students could not efficiently use EAD finding aids, then what about historians? What about the general public?

A conclusion from the Yakel study is that perhaps archivists have focused too heavily on EAD metadata fields to the detriment of usability.

In common with other fields, there is a growing effort to make EAD more of an open-source application. A project called the Archivists' Toolkit – led by NYU, with UC-San Diego and Five Colleges, Inc. – has existed for several years to offer open source options for archivists. The project is funded by the Andrew W. Mellon Foundation.

Sunday, October 01, 2006

EAD’s Architecture: The Programming Itself


EAD formerly used SGML as its markup language. EAD’s tagging was much easier to use than MARC’s, since EAD’s tag names were derived intuitively from their function, whereas MARC’s are alphanumeric tags as random as phone numbers.

Compare:

Title in EAD - <title>For Whom the Bell Tolls</title>

Title in MARC - 245 04 $aFor Whom the Bell Tolls/

EAD is similarly easy for authors, places of publication, and copyright.

One can also use nested embeddings. If a title has a person’s name in it, one can use a name tag to tell the computer that a name is there.
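
A small Python sketch shows what that nesting buys. The snippet is a made-up example; using <unittitle> as the title tag and <persname> as the name tag is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Invented title with a personal name tagged inside it.
SNIPPET = "<unittitle>Letters to <persname>Mark Twain</persname></unittitle>"

title = ET.fromstring(SNIPPET)

# itertext() walks the element and its children, so the reader still
# sees one continuous title.
full_title = "".join(title.itertext())

# iter("persname") finds only the tagged names, which is what lets a
# computer index the person separately from the rest of the title.
names = [name.text for name in title.iter("persname")]

print(full_title)
print(names)
```

The human reader gets the whole title; the machine gets the name on its own.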

A good feature of EAD is its hierarchical nature. The entire archival description, including the title, author, creator, etc., is under a tag <archdesc>. For the inventory, one uses tags like <c01> (component, first level).


The <did> (Descriptive Identification) is one of the most important sections for the archivist. Under the <did> one enters information like <unittitle> and <unitdate>.




EAD’s problem is that it is overall difficult for computers to understand. EAD is less machine-readable than MARC and HTML. There really is no good browser for SGML, so EAD switched to an XML DTD relatively recently.

Tuesday, September 19, 2006

About EAD:


Background on the metadata, i.e., what community developed it, why it was developed, when it was developed.

Led by Daniel Pitti, EAD was developed by librarians and computer scientists at Berkeley to be a machine-readable metadata tool for archives, manuscript collections, and museums. From its inception, EAD has been designed to be accessible from the internet. The fathers and mothers of EAD had the following five hopes for their child: "1) ability to present extensive and interrelated descriptive information found in archival finding aids, 2) ability to preserve the hierarchical relationships existing between levels of description, 3) ability to represent descriptive information that is inherited by one hierarchical level from another, 4) ability to move within a hierarchical informational structure, and 5) support for element-specific indexing and retrieval."
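
The third hope, inheritance between levels, is the least obvious of the five, so here is a minimal sketch of it in Python. It is only an illustration under stated assumptions: the sample finding aid is invented, the function name is hypothetical, and the rule used (look in the component's own <did>, then climb to the nearest ancestor that has the value) is one simple reading of "inherited," not an official algorithm.

```python
import xml.etree.ElementTree as ET

# Invented finding aid: the series has no date of its own.
SAMPLE = """
<archdesc level="collection">
  <did><unittitle>Sample Collection</unittitle><unitdate>1930-1940</unitdate></did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="file">
        <did><unittitle>Passover letter</unittitle><unitdate>1935</unitdate></did>
      </c02>
    </c01>
  </dsc>
</archdesc>
"""


def inherited_value(root, component, tag):
    """Return the text of did/<tag> from the component or its nearest ancestor."""
    # ElementTree keeps no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = component
    while node is not None:
        found = node.find(f"did/{tag}")
        if found is not None:
            return found.text
        node = parent_of.get(node)  # None once we climb past the root
    return None


root = ET.fromstring(SAMPLE)
series = root.find("dsc/c01")
folder = root.find("dsc/c01/c02")

print(inherited_value(root, series, "unitdate"))
print(inherited_value(root, folder, "unitdate"))
```

The folder carries its own date, while the series borrows the collection's. A flat, item-by-item record has no parent to borrow from, which is why the hierarchy matters.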

The founders of EAD chose SGML (Standard Generalized Markup Language) for their language. SGML was chosen because different programs (i.e., machines) would be able to read it, which was the whole point of EAD. SGML's power comes from its use of DTD (Document Type Definition) tags. Because the DTD forces archivists creating EAD entries to use the same angle-bracketed tags, EAD is universally consistent.

As mentioned previously, EAD was developed for museums too, but it has really only caught on with archivists. Museums have their own separate metadata schemes. In 1995/1996 the Society of American Archivists made EAD their official metadata scheme.

Links to the Web Site that maintains the standard & information about the maintaining organization

A division of the LOC maintains EAD.

What elements are included in the scheme?

EAD has always had elements like title, author, and date of creation. Since EAD is primarily for archives, there is a special archival description element, <archdesc>. The newest EAD system has information like language and Materials Specific Details <materialspec>.

One useful feature of EAD is that it has tags for title, emphasis, and foreign words.

What projects are using the scheme?

The National Union Catalog of Manuscript Collections and its member collections are an example.

Significant readings & discussions about the scheme (articles, web sites, books, blogs, listservs)

There are a number of websites affiliated with the SAA and the LOC dedicated to EAD. I already mentioned the official EAD page, so here are a few more:

EAD receiving a best practices award from the Research Libraries

EAD report card.

The EAD listserv

This website is my personal favorite. Its language is very clear:

EAD site

Your opinion on the usefulness of the scheme and why

I have not even used MARC, so I concede that I may not have a solid basis for judging EAD. However, since EAD is so widely used, and it now starts with the well-known SGML format, I think it is an attractive system for any archive that does not have PastPerfect.

EAD - the archivist's friend

Introduction:

The topic of my blog is going to be Encoded Archival Description - EAD. The reason I am choosing EAD over a wide variety of other metadata schemes is its practicality for me as an archivist. I feel that learning EAD will make me a better, more professional archivist and technology user. Since my historical society does not use EAD, mastering EAD can possibly benefit my historical society too.

My archival employer is the Jewish Historical Society of Metrowest. In terms of how active we are and how many people see our exhibits, we are possibly the best historical society in New Jersey. Our archives wing has a number of fascinating and unique collections, but (until very recently), we have been relatively primitive in our metadata. Up until this past week, when we installed PastPerfect on all of our computers, we used Microsoft Word to create finding aids.

Instead of complicated forms with HTML elements like:
bracket title /bracket
bracket creator /bracket
bracket date /bracket

We just list something's subject. For instance, Rabbi Solomon Foster's 3/14/1935 handwritten letter about Passover would just be keyworded as "1935 Passover letter." Since the letter would be with other documents by Solomon Foster, it would be unnecessary to create all those element fields.

We readily concede that our system is inferior to a proper archival metadata scheme. You could not look up every letter by Solomon Foster in our system. Our system's advantage is that it is extremely easy to use. Since we rely on (often elderly) volunteers to do most of our work, our system is necessary.

Very recently we installed PastPerfect on all our computers. PastPerfect is a program that makes learning complicated metadata schemes unnecessary. With PastPerfect, making an EAD or Dublin Core record is just as easy as making a bibliography is with Noodlebib.