Cookbook

Examples of common operations encountered using the Digital Archive client.

Search for a resource by keyword

Run a keyword search across the title, description, and document content.

>>> from digitalarchive import Document
>>> # Find a document
>>> results = Document.match(description="Cuban Missile Crisis")
>>> # Acccess a single record.
From the Journal of S.M. Kudryavtsev, 'Record of a Conversation with Prime Minister of Cuba Fidel Castro Ruz, 21 January 1961'

Filter a Document search by language

Limit a search to documents in a certain language:
>>> from digitalarchive.models import Document, Language
>>> RYaN_docs = Document.match(description="project ryan", languages=[Language(id="ger")])
>>> RYaN_docs.count
32

Filter a Document search by date

Search for records after a certain date:
>>> from digitalarchive import Document
>>> from datetime import date
>>> postwar_docs = Document.match(start_date=date(1945, 9, 2))
Search for records before a certain date:
>>> from digitalarchive import Document
>>> from datetime import date
>>> prewar_docs = Document.match(end_date=date(1945, 9, 2))
Search for docs between two dates:
>>> from digitalarchive import Document
>>> from datetime import date
>>> coldwar_docs = Document.match(start_date=date(1945, 9, 2), end_date=date(1991, 12, 26))

Download the complete metadata for a document

>>> from digitalarchive import Document
>>> chernobyl_doc = Document.match(description="pripyat evacuation order").first()
>>> chernobyl_doc.repositories
>>> chernobyl_doc.repositories is None
True
>>> chernobyl_doc.hydrate()
>>> chernobyl_doc.repositories
[Repository(id='84', name='Central State Archive of Public Organizations of Ukraine (TsDAHOU)', uri=None, value=None), Repository(id='507', name='Archive of the Ukrainian National Chornobyl Museum', uri=None, value=None)]

Download the original scan of a document.

Original scans (referred to internally as MediaFile) are child records of Document. They must be hydrated before the PDF content can be accessed.

>>> from digitalarchive import Document
>>> chernobyl_doc = Document.match(id="208406").first()
>>> original_scan = chernobyl_doc.media_files[0]
>>> original_scan.pdf is None
True
>>> original_scan.hydrate()
>>> type(original_scan.pdf)
<class 'bytes'>
>>> len(original_scan.pdf)
10936093

Download the translation or transcript of a document.

Like original scans, Transcript and Translation are child records of Document. They must also be hydrated before their content can be accessed. Translations and transcripts are typically presented as HTML files, but may sometimes be presetened as PDFs.

>>> from digitalarchive import Document
>>> chernobyl_doc = Document.match(id="208406").first()
>>> translation = chernobyl_doc.translations[0]
>>> translation.hydrate()
>>> translation.filename
'TranslationFile_208406.html'

Serialize and dump a document to the filesystem.

>>> from digitalarchive import Document
>>> chernobyl_doc = Document.match(id="208406").first()
>>> chernobyl_doc.hydrate()
>>> chernobyl_doc_str = chernobyl_doc.json()
>>> chernobyl_doc == Document.parse_raw(chernobyl_doc_str)
True