Here are three proposals concerning the structure and maintenance of formal, inter-referential, digitally stored texts: (1) include abstract atomic identifiers in texts, (2) identify these identifiers with references to text objects, and (3) keep among the texts records of computationally substantiated claims about those texts. We use "formal" in a narrow sense approximating computer-checkable.
We are informed by informal symbolic practices used in mathematical text and program source text, which we hope to enhance and exploit explicitly; the basic management problem is how to alter texts rather freely without ruining the bases for claims depending upon them, which becomes an issue of accounting for various dependencies between texts. We are not here proposing the use of abstract structured text; nonetheless, experience using it in Nuprl4 has led us to appreciate the benefits of distinguishing abstract form from concrete presentation, and also has shown us the cognitive and practical unimportance of just which identifiers occur in abstract structured texts when texts are mediated by a system that realizes concrete presentation.
Abstract treatment of identifiers involves concrete realization during communications between "text servers" and their clients. The benefit of treating identifiers abstractly is a radical avoidance of name collision, even at runtime, and is important for claims about texts that are based upon program execution. The notion of text collections and equivalence of text collections modulo change of identifiers is made precise by the second proposal. The complete identification of abstract identifiers with reference values is discussed, addressing the issues of dangling pointers, the association of ordinary symbolic identifiers with meaningful defining texts, the "flatness" of the pointer space, and the perhaps counterintuitive collapse of two abstract name spaces into one.
The notion of "certification system" is introduced as a formalization of generic computationally defined claims about texts, emphasizing the diversity of clients who may not agree on a common "logic." The notion of a certificate whose computational meaning, in the context of the texts it refers to, is completely specified (although perhaps non-deterministic), and the notion of a certificate further being deterministic, are introduced and elaborated with regard to their epistemic value. What it means to give certificate texts the force of factual records, and mechanisms to accomplish this, are discussed. Scenarios for practically exploiting identifier abstractness and fully deterministic certificates are considered, involving the combination of partially independently developed texts and the experimental modification of texts in a collection. The importance of implementing multiple certification systems is articulated.
Peter B. Hirtle, 2000:
Archival Authenticity in a Digital Age, CLIR
Anne J. Gilliland-Swetland, 2000: Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment, CLIR
Kenneth Thibodeau, 2001: Building the Archives of the Future: Advances in Preserving Electronic Records at the National Archives and Records Administration, D-Lib
Roy Levin, Paul R. McJones, 1993: The Vesta Approach to Precise Configuration of Large Software Systems, SRC
Jason Hickey et al.: MetaPRL Home Page, Cal Tech
Allen, Bickford, Constable, Eaton, Kreitz and Lorigo, 2002: FDL: A Prototype Formal Digital Library, Cornell
Erratum: Dummett's book on Frege is from 1981.
There are some companion notes for this report.