Quickstart¶
The Document
is the basic unit of content in the Digital Archive. Every document is accompanied by metadata, including
a short description of its content, information about the archive it was obtained, subjects it is tagged with,
alongside other information.
Most of the Digital Archive’s documents originate from outside the United States. Translations are available for most
documents, as well as original scans in some cases. The Document
model describes the available
methods and attributes for documents.
The digitalarchive
package also provides models for other kinds of resources, such as
Subject
, Collection
,
Theme
, Coverage
, and
Repository
. These models can be used as filters when searching for
documents. Consult the Public API documentation for a full description of available models.
Searching¶
The Document
, Contributor
, Coverage
, Collection
, Subject
, and Repository
, models each expose a
match()
method that can be used to search for documents. The method accepts a
list of keyword arguments corresponding to the attributes of the matched for model.
>>> from digitalarchive import Document
>>> docs = Document.match(description="Cuban Missile Crisis")
The match method always returns an instance of digitalarchive.matching.ResourceMatcher
. ResourceMatcher
exposes a first()
method for to accessing a single document and an
all()
for accessing a list of all respondent records.
>>> from digitalarchive import Document
>>> docs = Document.match(description="Cuban Missile Crisis")
>>> docs.first().title
"From the Journal of S.M. Kudryavtsev, 'Record of a Conversation with Prime Minister of Cuba Fidel Castro Ruz, 21 January 1961'"
Searching for a record by its id
always returns a single record and ignores any other keyword arguments.
>>> from digitalarchive import Document
>>> test_search = Document.match(id="175898")
>>> test_search.count
1
>>> doc = test_search.first()
>>> doc.title
'Memorandum on a Discussion held by the Consul-General of the USSR in Ürümchi, G.S. DOBASHIN, with the Secretary of the Party Committee of the Xinjiang Uyghur Autonomous Region, Comrade LÜ JIANREN'
Filtering Searches¶
One can limit searches to records created between specific dates by passing a start_date
keyword, an end_date
keyword, or both.
>>> from digitalarchive import Document
>>> from datetime import date
>>> Document.match(start_date=date(1989, 4, 15), end_date=date(1989, 5, 4))
ResourceMatcher(model=<class 'digitalarchive.models.Document'>, query={'start_date': '19890415', 'end_date': '19890504', 'model': 'Record', 'q': '', 'itemsPerPage': 200}, count=22)
Searches can also be limited to records contained within a specific collection, subject, or other container. Matches for
Documents can be filtered by one or more Collection
, Repository
, Coverage
, Subject
, Contributor
,
and Donor
instances:
>>> from digitalarchive import Collection, Document
>>> xinjiang_collection = Collection.match(id="491").first()
>>> xinjiang_collection.name
'“Local Nationalism" in Xinjiang, 1957-1958'
>>> docs = Document.match(collections=[xinjiang_collection])
>>> docs.count
9
Hydrating Search Results¶
Most search results return “unhydrated” instances of resources with incomplete metadata. All attributes that are not yet
available are represented by NoneType
. Use the
hydrate()
method to download the full metadata for a resource.
>>> from digitalarchive import Document
>>> test_doc = Document.match(description="Vietnam War").first()
>>> test_doc.source is None
True
>>> test_doc.hydrate()
>>> test_doc.source
'AVPRF f. 0100, op. 34, 1946, p. 253, d. 18. Obtained and translated for CWIHP by Austin Jersild.'
It is also possible to hydrate all of the contents of a search result using the
hydrate()
method of ResourceMatcher
.
This operation can take some time for large result sets.
>>> from digitalarchive import Document
>>> docs = Document.match(description="Taiwan Strait Crisis")
>>> docs.hydrate()
When hydrating a result set, it it is also possible to recursively hydrate any child records (translations, transcripts,
etc.) in the result set by setting the recurse
parameter of
hydrate()
to True
.
>>> from digitalarchive import Document
>>> docs = Document.match(description="Taiwan Strait Crisis")
>>> docs.hydrate(recurse=True)