The ESTC is both a bibliography and a union catalog. Its nearly 500,000 records for pre-1801 English printing provide information to identify individual imprints (author, title, place and date of publication) and to distinguish similar imprints from one another (by recording characteristic variations in features such as catchwords and misspellings). Each ESTC record also provides holdings information, identifying which institutions have copies of a particular title and, in many cases, providing information specific to each copy (such as provenance and annotations).
Conceived from the outset as an electronic resource, the ESTC maintains its data in the Machine-Readable Cataloging (MARC) format used by library catalogs everywhere. Ever since the project became available through the British Library’s online catalog, researchers have been able to freely search this MARC data and retrieve records based on their search criteria.
As the ESTC moves into the 21st century, it must change to accommodate shifts in the research landscape, the needs of its users, and its available resources. The new ESTC aims:
1) To harness the expertise of its users in the ongoing curation and enrichment of its data.
Over the last decade or so, through the use of computer algorithms and the efforts of in-house staff, the Center for Bibliographical Studies and Research (CBSR) has processed hundreds of thousands of MARC records contributed by libraries throughout the world and by digitizing projects such as Google Books and the HathiTrust, adding holdings to existing records and creating records for newly identified works. The ESTC simply does not have the staff to keep pace with this explosion of records. The new ESTC will both make this contributed data available to users to search and invite users to assist in the bibliographical “detective work” required to fully process the records.
At the same time, the new ESTC will welcome users’ contributions in correcting, refining, and expanding the information in both its bibliographic records (for instance, by supplying evidence for a publication date differing from that in the imprint) and its holdings data (for example, by linking digitized works to physical copies).
To aid in this “curatorial” work, the project will provide various ways in which to alert users to recent additions to the database or to areas in need of editorial oversight.
2) To become the “electronic hub” for relevant digitization projects and, as much as possible, make the universal corpus of digitized early modern English works accessible to scholars.
The ESTC has added to its records links to digital texts in the fee-based databases Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO). More recently it has also added links to thousands of freely available texts in projects like Google Books and the HathiTrust. The new ESTC will expand the number of links to openly accessible digitized works by allowing users to match contributed records for digitized works to ESTC records, and by making it easier for researchers and librarians to add links for electronic copies.
Moreover, the new ESTC aims to extend its role as a union catalog to the digital sphere by indexing links to these digital reproductions of ESTC texts. By doing so, the new ESTC will make it possible for users to search not only the bibliographical records for those works but also their full texts. Though achieving this goal will require overcoming substantial technical and legal hurdles, it is imperative if the ESTC is to remain relevant as scholars increasingly draw on digital surrogates.
3) To become a resource for new kinds of inquiry by making ESTC data more open to the wider web and more easily accessible for use in other digital projects.
Although the ESTC, like many online library catalogs, already refers to authority work done by other institutions (for instance, official author names and subject headings provided by the Library of Congress), it remains a self-contained catalog, its data largely cut off from other sources of information. The new ESTC will be able to connect to other sources around the web to deliver and retrieve data dynamically by using widely recognized identifiers, such as those from the Library of Congress and the Virtual International Authority File (VIAF), when possible, and by establishing identifiers for other data.
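As a rough illustration of how shared identifiers let a record point outward, the sketch below turns an authority identifier into a web address. The URI patterns are the ones commonly seen for VIAF and for Library of Congress name authorities; the scheme labels ("viaf", "lcnaf") and the function name are assumptions made for this example and do not describe how the ESTC stores or resolves identifiers.

```python
# Commonly seen URI patterns for two authority files. The keys are
# labels invented for this sketch, not ESTC field names.
URI_PATTERNS = {
    "viaf": "https://viaf.org/viaf/{local_id}",
    "lcnaf": "https://id.loc.gov/authorities/names/{local_id}",
}


def identifier_uri(scheme, local_id):
    """Build a linkable URI from an identifier scheme and a local ID."""
    try:
        pattern = URI_PATTERNS[scheme]
    except KeyError:
        raise ValueError(f"unknown identifier scheme: {scheme}") from None
    return pattern.format(local_id=local_id)
```

A record that carries such an identifier for an author can be matched against any other dataset that uses the same identifier, without comparing name strings.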
While the MARC format is crucial for facilitating exchange with libraries, data in the new ESTC can and should be extended beyond MARC specifications to better serve the needs of researchers, both by making the data more accessible and by collecting data not represented in MARC. Compatibility with library systems will be retained by mapping appropriate non-MARC data to applicable MARC fields.
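One way to picture this mapping is as a lookup table from extended field names to MARC tags and subfields. The sketch below is only an assumption about how such a table might look: the extended field names are invented for illustration, while the MARC tags (561 for ownership and custodial history, 856 $u for an electronic location, 500 for a general note) are standard MARC 21 bibliographic fields. Data with no suitable MARC field is set aside rather than forced into one.

```python
# Hypothetical extended field names mapped to (MARC tag, subfield code).
MARC_MAP = {
    "provenance": ("561", "a"),        # Ownership and custodial history
    "digital_copy_url": ("856", "u"),  # Electronic location and access
    "date_evidence": ("500", "a"),     # General note
}


def to_marc_fields(extended):
    """Split extended data into MARC-mappable fields and the remainder.

    Returns a list of (tag, subfield, value) tuples sorted by tag, plus
    a dict of entries that have no mapping and stay outside MARC.
    """
    mapped, unmapped = [], {}
    for name, value in extended.items():
        if name in MARC_MAP:
            tag, code = MARC_MAP[name]
            mapped.append((tag, code, value))
        else:
            unmapped[name] = value
    mapped.sort(key=lambda field: field[0])
    return mapped, unmapped
```

Under this design an export to a library system would use only the mapped list, while the ESTC's own interface could present both parts.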
The new ESTC will make it easier to extract data for use elsewhere, both through bulk downloads of records in a variety of formats (text, .xls, .csv, etc.) and through APIs opening up ESTC data for use on other sites. Whether you are a literary scholar wanting to retrieve bibliographical entries for all of an author’s works published during his or her lifetime or a developer wanting to present ESTC data as part of another web-based project, the new ESTC will make it possible to retrieve, extract, reuse, and remix ESTC data through either versioned downloads or real-time queries.
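The literary scholar's query mentioned above might look something like the following once records have been downloaded. This is a sketch under stated assumptions: the record layout (the keys estc_id, author, title, and year) and both function names are hypothetical, and no actual ESTC API or download format is implied. The first function keeps an author's works published within a given span of years; the second writes the result as CSV text.

```python
import csv
import io


def published_in_lifetime(records, author, born, died):
    """Keep records by `author` whose year falls within born..died."""
    return [
        r for r in records
        if r["author"] == author and born <= r["year"] <= died
    ]


def to_csv(records, fields=("estc_id", "author", "title", "year")):
    """Serialize records to CSV text, ignoring keys not in `fields`."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Invented sample data; the IDs, author, and dates are placeholders.
SAMPLE = [
    {"estc_id": "X000001", "author": "Doe, Jane", "title": "A Poem", "year": 1688},
    {"estc_id": "X000002", "author": "Doe, Jane", "title": "Works", "year": 1705},
    {"estc_id": "X000003", "author": "Roe, John", "title": "Sermons", "year": 1690},
]

lifetime_works = published_in_lifetime(SAMPLE, "Doe, Jane", 1650, 1700)
csv_text = to_csv(lifetime_works)
```

Matching on an exact author string is the simplest possible approach; matching on a shared authority identifier would be more robust where one is available.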