Like other electronic library catalogs, the ESTC stores its data in Machine-Readable Cataloging (MARC) format. MARC is widely used in library catalogs and provides an international standard for recording information about books and other materials and presenting it to catalog users. MARC was designed, and is still used, to let computers exchange information and present it in human-readable form; it was never intended to store data in a form that computers could use or process. The redesigned ESTC will be built around databases that store its data and make it accessible for computational use, and that can transform that data into the MARC format (or other library standards) for reuse in library systems.
Granulating Existing Data
Where applicable, data currently in the ESTC will be more finely “granulated” to improve searching. For example, a work with the publication date range of 1750-1760 will be indexed under every year within that range, so that searching “1752” retrieves the record. Similarly, data within the imprint field (place, publishers, and date of publication) will be separated out so that each element can be searched individually and linked to other data sources (see “Linked Data” below).
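As a rough illustration of what granulating might involve, the sketch below expands a date range into individual years and splits a simple imprint into its elements. The function names and the "Place: publisher, date" pattern are assumptions made for this example; real imprints are far less regular, and this is not the ESTC's actual method.

```python
import re

def expand_year_range(date_text):
    """Return every year covered by a string such as '1750-1760' or '1752'."""
    years = [int(y) for y in re.findall(r"\d{4}", date_text)]
    if not years:
        return []
    return list(range(min(years), max(years) + 1))

def split_imprint(imprint):
    """Split an imprint into place, publisher, and date.

    Assumes the simplified pattern 'Place: publisher, date'.
    """
    place, _, rest = imprint.partition(":")
    publisher, _, date = rest.rpartition(",")
    return {
        "place": place.strip(),
        "publisher": publisher.strip(),
        "date": date.strip().rstrip("."),
    }

# Illustrative values only, not taken from an ESTC record.
years = expand_year_range("1750-1760")
parts = split_imprint("London: printed for an example bookseller, 1752.")
```

With the years stored individually, a search for 1752 becomes a simple membership test against the expanded list.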
Collecting New Types of Data
We hope to collect information that is not currently available in the ESTC (whether or not the MARC format supports it) and that will be of use to researchers. One example is a date of publication suggested by outside sources that differs from the date in the imprint.
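One way to picture this kind of data is a record that keeps several dated claims side by side, each tagged with where it came from. The class and field names below are hypothetical, chosen only for this sketch, and the sample values are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class DateAssertion:
    year: int
    source: str  # "imprint", or the name of an outside source

@dataclass
class WorkRecord:
    title: str
    dates: list = field(default_factory=list)

    def conflicting_dates(self):
        """Return outside-source dates that differ from the imprint date."""
        imprint_years = {d.year for d in self.dates if d.source == "imprint"}
        return [
            d for d in self.dates
            if d.source != "imprint" and d.year not in imprint_years
        ]

# Placeholder record, not real ESTC data.
record = WorkRecord(
    "Example Title",
    [DateAssertion(1700, "imprint"), DateAssertion(1702, "example bibliography")],
)
```

Keeping the source alongside each date lets a researcher see both the imprint date and the competing suggestion rather than having one overwrite the other.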
The survey at the end of this blog asks you to suggest other kinds of information about a work to collect.
Linked Data
When possible, individual database entries will be linked to outside resources, such as the Library of Congress authority file, the Virtual International Authority File (VIAF), Wikipedia, etc. Moving to a linked data model will allow the ESTC to incorporate information from other projects and make ESTC data amenable to reuse by those projects. For example, armed with the appropriate VIAF identifier, the ESTC could “grab” and present information about Thomas Middleton from other sites using the same identifier. In turn, as ESTC records related to Middleton’s works were updated, those hypothetical websites would be able to present the most current information from the ESTC at all times.
Some entries will not have external authority records that the ESTC can link to, and for those, the ESTC will establish authority records that users can curate and other projects can link to (see “Curating”).
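The linking described above might be sketched as follows: each entry resolves to a shared URI, preferring an external VIAF identifier and falling back to an authority record minted by the ESTC. The `estc.example` address, the field names, and the identifier values are all placeholders invented for this sketch, and the VIAF URI pattern is assumed rather than taken from the project.

```python
VIAF_BASE = "https://viaf.org/viaf/"                # assumed VIAF URI pattern
LOCAL_BASE = "https://estc.example/authority/"      # hypothetical ESTC authority URI

def authority_uri(entry):
    """Prefer an external VIAF URI; fall back to a locally established one."""
    if entry.get("viaf_id"):
        return VIAF_BASE + entry["viaf_id"]
    return LOCAL_BASE + entry["local_id"]

def merge_external(entry, external_by_uri):
    """Attach whatever an outside project publishes under the same URI."""
    uri = authority_uri(entry)
    merged = dict(entry, uri=uri)
    merged["external"] = external_by_uri.get(uri, {})
    return merged

# Placeholder identifiers; "0000000" is not a real VIAF number.
linked = {"name": "Middleton, Thomas", "viaf_id": "0000000", "local_id": "a1"}
unlinked = {"name": "Example Printer", "viaf_id": None, "local_id": "a2"}
outside = {VIAF_BASE + "0000000": {"note": "data held by another project"}}
```

Because both sides key their data on the same URI, either project can pick up the other's updates without copying records by hand.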
The ESTC will periodically index the openly accessible URLs in ESTC records, where it has both the legal right and the technical means to access them, and make the full text at those URLs searchable (see “Searching”).
Researchers will be able to download discrete bits of data as well as entire records. For example, one might look for all works by Daniel Defoe published between 1700 and 1750 and then retrieve the publishers associated with those works. Research projects will be able to access the ESTC through an API to capture data for reuse.
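The Defoe example could look something like the sketch below once records are available as structured data. The API itself is not yet specified, so this runs against an in-memory list; the record layout, the function name, and the sample values are assumptions for illustration only.

```python
def publishers_for(records, author, start_year, end_year):
    """Collect the distinct publishers of an author's works in a year range."""
    found = set()
    for rec in records:
        if rec["author"] == author and start_year <= rec["year"] <= end_year:
            found.update(rec["publishers"])
    return sorted(found)

# Placeholder records standing in for data retrieved from the ESTC.
sample_records = [
    {"author": "Defoe, Daniel", "year": 1719, "publishers": ["Publisher A"]},
    {"author": "Defoe, Daniel", "year": 1722, "publishers": ["Publisher A", "Publisher B"]},
    {"author": "Defoe, Daniel", "year": 1760, "publishers": ["Publisher C"]},
    {"author": "Another Author", "year": 1720, "publishers": ["Publisher D"]},
]

defoe_publishers = publishers_for(sample_records, "Defoe, Daniel", 1700, 1750)
```

The result is a discrete slice of data (publisher names only) rather than the full records, which is the kind of download described above.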