Data

Like other electronic library catalogs, the ESTC preserves its data in Machine-Readable Cataloging (MARC) format. MARC is widely used in library catalogs, providing an international standard for recording information about books and other materials and presenting it to catalog users. MARC was, and still is, designed to let computers exchange information and present it in human-readable form; it was never intended to store data in a form that computers could use or process directly. The redesigned ESTC will be built around databases that store its data and make it accessible for computational use, and that can transform that data into the MARC format (or other library standards) for reuse in library systems.

Granulating Existing Data

Where applicable, data currently in the ESTC will be more finely “granulated” to improve searching. For example, a record whose publication date is given as the range 1750-1760 will list every year within that range, so that searching “1752” retrieves the record. Similarly, data within the imprint field (place, publishers and date of publication) will be separated out so that each element can be searched individually and linked to other data sources (see “Linked Data” below).
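
The date-range example can be sketched in a few lines of Python. This is only an illustration of the idea, not the ESTC's implementation: the function names, the record identifiers and the assumption that dates arrive as simple strings such as "1750-1760" are all hypothetical.

```python
import re
from collections import defaultdict


def expand_years(date_field):
    """Return every year covered by a string like '1750-1760' or '1752'."""
    match = re.fullmatch(r"\s*(\d{4})\s*(?:-\s*(\d{4}))?\s*", date_field)
    if match is None:
        return []  # unparseable dates are left for manual review
    start = int(match.group(1))
    end = int(match.group(2) or start)
    if end < start:
        return []  # a reversed range is treated as unparseable
    return list(range(start, end + 1))


def build_year_index(records):
    """Map each individual year to the set of record ids that cover it."""
    index = defaultdict(set)
    for record_id, date_field in records.items():
        for year in expand_years(date_field):
            index[year].add(record_id)
    return index


# Hypothetical record ids and date strings, for illustration only.
records = {"rec-A": "1750-1760", "rec-B": "1752", "rec-C": "1761"}
year_index = build_year_index(records)
print(sorted(year_index[1752]))  # ['rec-A', 'rec-B']
```

Storing one index entry per year trades a little space for a search that needs no range arithmetic at query time.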

Collecting New Types of Data

We hope to be able to collect information that is not currently available in the ESTC (whether it is supported in the MARC format or not) and that will be of use to researchers. One example is a date of publication suggested by outside sources that differs from the date given in the imprint.

The survey at the end of this blog asks you to suggest other kinds of information about a work to collect.

Linked Data

When possible, individual database entries will be linked to outside resources, such as the Library of Congress authority file, the Virtual International Authority File (VIAF), Wikipedia, etc. Moving to a linked data model will allow the ESTC to incorporate information from other projects and make ESTC data amenable to reuse by those projects. For example, armed with the appropriate VIAF identifier, the ESTC could “grab” and present information about Thomas Middleton from other sites using the same identifier. In turn, as ESTC records related to Middleton’s works were updated, those hypothetical websites would be able to present the most current information from the ESTC at all times.

Some entries will not have external authority records that the ESTC can link to, and for those, the ESTC will establish authority records that users can curate and other projects can link to (see “Curating”).
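
A minimal sketch of how a shared identifier could tie these sources together follows. Everything in it is an assumption made for illustration: the AuthorityEntry class, the "viaf:" and "estc-local:" key prefixes, the placeholder identifiers and the in-memory stand-ins for outside sites. It makes no network calls and does not reflect any real VIAF or ESTC interface.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityEntry:
    name: str
    external_ids: dict = field(default_factory=dict)  # e.g. {"viaf": "..."}
    local_id: str = ""  # used when no external authority record exists


def shared_key(entry):
    """Prefer an external identifier; fall back to a locally established one."""
    if "viaf" in entry.external_ids:
        return "viaf:" + entry.external_ids["viaf"]
    return "estc-local:" + entry.local_id


def gather(entry, sources):
    """Collect what each outside source holds under the same identifier."""
    key = shared_key(entry)
    return {name: data[key] for name, data in sources.items() if key in data}


# Placeholder identifiers and source contents, not real data.
middleton = AuthorityEntry("Thomas Middleton", {"viaf": "PLACEHOLDER-ID"})
unlinked = AuthorityEntry("An otherwise unrecorded printer", local_id="0001")
sources = {
    "site_one": {"viaf:PLACEHOLDER-ID": {"note": "entry held by site one"}},
    "site_two": {"viaf:PLACEHOLDER-ID": {"note": "entry held by site two"}},
}

print(gather(middleton, sources))
print(shared_key(unlinked))  # estc-local:0001
```

Because outside projects would look records up by the same key, the arrangement works in both directions: they could query ESTC data with the identifier just as the ESTC queries theirs.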

Full Text

The ESTC will periodically index the full text found at openly accessible URLs in ESTC records, wherever it has both legal and technical access, and make that full text searchable (see “Searching”).

Downloading

Researchers will be able to download discrete bits of data as well as entire records.  For example, one might look for all works by Daniel Defoe published between 1700 and 1750 and then retrieve the publishers associated with those works.  Research projects will be able to access the ESTC through an API to capture data for reuse.
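
The Defoe example amounts to a filter followed by a projection. The sketch below runs against an in-memory list because the API itself has not yet been specified; the field names ("author", "year", "publishers") and the sample records are hypothetical placeholders, not ESTC data.

```python
def publishers_for(records, author, first_year, last_year):
    """Return the distinct publishers of an author's works in a year range."""
    found = set()
    for record in records:
        in_range = first_year <= record["year"] <= last_year
        if record["author"] == author and in_range:
            found.update(record["publishers"])
    return sorted(found)


# Placeholder records, for illustration only.
sample_records = [
    {"author": "Daniel Defoe", "year": 1719, "publishers": ["Publisher X"]},
    {"author": "Daniel Defoe", "year": 1722,
     "publishers": ["Publisher X", "Publisher Y"]},
    {"author": "Daniel Defoe", "year": 1760, "publishers": ["Publisher Z"]},
    {"author": "Another Author", "year": 1720, "publishers": ["Publisher W"]},
]

print(publishers_for(sample_records, "Daniel Defoe", 1700, 1750))
# ['Publisher X', 'Publisher Y']
```

A query like this depends on the granulated data described above: it only works if author, year and publisher are stored as separate, searchable elements.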

4 responses to “Data”

  1. Nicolas K. Kiessling

    The ESTC would be a place to include provenance information in some imprint field. Most libraries do not have such entries in their electronic catalogs, but a few do. E.g., the Harry Ransom online catalogue has excellent provenance information in each entry (in printed items where such information exists). The Newberry online catalog does not have such information, and their entries, on the whole, are not as useful. The Folger has some such information, but it is not in every record.

    • JL

      Provenance information is already included, to the extent that reporting libraries are willing and able to include it (at least when reporting via the web interface). The issue of authority control also arises here, but having the name recorded exactly as it is (e.g.) on a bookplate is probably as helpful to someone looking for an individual’s books. Provenance and binding information is searchable.

  2. A major improvement to this already excellent database would be the implementation of a form (or a better form) of authority control for the printers and publishers. If I want to find all the works published by W. Rowland and I look for ‘Rowland’ I will also retrieve the books printed by J. Rowland. If I search for the books printed by W. Williamson, I will not retrieve the editions where he spelled his name VVilliamson.
    The STCN (Netherlands) and STCV (Flanders) do a terrific job at this, both in different ways, which is really helpful especially in the case of printers’ dynasties with many homonyms (there are at least five printers named Hieronymus Verdussen over a span of two centuries).
    The authority file could also include other information such as biographical details (life dates, name of parents and children, name of apprentices, …), street address, devices, gender, … which could all be separately searchable, and data could be made available to researchers for downloading. Links could be provided to the CERL Thesaurus.
    Since the ESTC includes the imprints in full, the data are already there and it isn’t even necessary to go back to the original documents.

  3. Pingback: What We’re Reading: June 18th-24th | JHIBlog
