Network Working Group L. Masinter Internet-Draft Adobe Systems Intended status: Informational February 3, 2009 Expires: August 7, 2009 "duri" and "tdb" URN namespaces based on dated URIs draft-masinter-dated-uri-05 Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on August 7, 2009. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract This document defines two namespaces of URNs, based on using a timestamp with an (encoded) URI. The results are namespaces in which Masinter Expires August 7, 2009 [Page 1] Internet-Draft duri and tdb URN namespaces February 2009 names are readily assigned, offer the persistence of reference that is required by URNs, but do not require a stable authority to assign the name. The first namespace ("duri") is used to refer to URI- identified resources as they appeared at a particular time. The second namespace ("tdb") is useful as a way of creating URNs that refer to physical objects or even abstractions that are not themselves networked resources. The definition of these namespaces may reduce the need to define new URN namespaces merely for the purpose of creating stable identifiers. In addition, they provide a ready means for identifying "non- information resources" by semantic indirection. Note This document is not a product of any working group. Many of the ideas here have been discussed since 2001. This document has been discussed on the mailing list . Masinter Expires August 7, 2009 [Page 2] Internet-Draft duri and tdb URN namespaces February 2009 Table of Contents 1. Overview and Requirements . . . . . . . . . . . . . . . . . . 4 1.1. Easy URN assignment . . . . . . . . . . . . . . . . . . . 4 1.2. Persistent identifiers . . . . . . . . . . . . . . . . . . 4 1.3. URIs for abstractions . . . . . . . . . . . . . . . . . . 5 2. Namespace definitions . . . . . . . . . . . . . . . . . . . . 6 2.1. "duri" namespace . . . . . . . . . . . . . . . . . . . . . 6 2.2. "tdb" namespace . . . . . . . . . . . . . . . . . . . . . 6 3. Encoding URIs . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1. Characters that must be encoded . . . . . . . . . . . . . 7 3.2. Hierarchy and unencoded / . . . . . . . . . . . . . . . . 8 4. Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 5. Additional Considerations . . . . . . . . . . . . . . . . . . 8 5.1. Embedded URI schemes . . . . . . . . . . . . . . . . . . . 9 5.2. Why dates for both tdb and duri? . . . . . . . . . . . . . 10 5.3. Useful dates . . . . . . . . . . . . . . . . . . . . . . . 10 5.4. Free assignment . . . . . . . . . . . . . . . . . . . . . 10 5.5. Resolution . . . . . . . . . . . . . . . . . . . . . . . . 11 5.6. Why Names with Semantics? . . . . . . . . . . . . . . . . 11 5.7. Avoiding MetaData . . . . . . . . . . . . . . . . . . . . 11 5.8. Avoiding duri and tdb . . . . . . . . . . . . . . . . . . 11 5.9. tdb and levels of indirection . . . . . . . . . . . . . . 12 6. URN Specification Templates . . . . . . . . . . . . . . . . . 12 6.1. duri Specification Template . . . . . . . . . . . . . . . 12 6.2. tdb Specification Template . . . . . . . . . . . . . . . . 13 7. IANA considerations . . . . . . . . . . . . . . . . . . . . . 14 8. Security Considerations . . . . . . . . . . . . . . . . . . . 14 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 14 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 10.1. Normative References . . . . . . . . . . . . . . . . . . . 14 10.2. Informative References . . . . . . . . . . . . . . . . . . 15 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 15 Masinter Expires August 7, 2009 [Page 3] Internet-Draft duri and tdb URN namespaces February 2009 1. Overview and Requirements The URN namespaces defined here solve several related problems. 1.1. Easy URN assignment The URN specification [7] allows for many URN namespaces, and many have been registered. However, obtaining an appropriate URN in any of the currently defined URN namespaces may be difficult: a number of URN namespace registrations have been accompanied by comments that no other URN namespace was available for the class of documents for which identifiers were wanted. 1.2. Persistent identifiers RFC 1737 [7] defines several requirements for Uniform Resource Names. In particular, it requires "persistence": Persistence: It is intended that the lifetime of a URN be permanent. That is, the URN will be globally unique forever, and may well be used as a reference to a resource well beyond the lifetime of the resource it identifies or of any naming authority involved in the assignment of its name. Many people have wondered how to create globally unique and persistent identifiers. There are a number of URI schemes and URN namespaces already registered. However, an absolute guarantee of both uniqueness and persistence is very difficult. In some cases, the guarantee of persistence comes through a promise of good management practice, such as is encouraged in "Cool URLs don't change" [6]. However, relying on promise of good management practice is not the same as having a design that guarantees reliability independent of actual administrative practice. A primary design goal for URIs is that they are intended to mean the same thing, no matter in what context they appear: a "Uniform" way to Identify a Resource. However, even when URIs have Uniform meaning from the point of view of the source of the reference, they don't guarantee stability over time. Despite best efforts and intentions, identifying information can change in unpredictable ways: domain names can disappear or be reassigned, name assigning organizations can change structure, responsibility, disappear, merge, or change in unpredictable ways. There is a significant dependence in the interpretation of many URNs with the concept of "naming authority". The authority is presumably some individual or organization both to insure uniqueness of assignment and also to help with understanding the meaning of the Masinter Expires August 7, 2009 [Page 4] Internet-Draft duri and tdb URN namespaces February 2009 link between the name and the named. However, authorities, whether individuals or organizations, have a lifetime, and must be consulted at some point to understand the bindings. The functioning of names as unique identifiers and holders of meaning depends on having a reliable infrastructure of consulting the authority or the authorities records to determine the thing referenced. 1.3. URIs for abstractions The description of URIs [3] describes a range for 'Resource' that is quite broad: This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity). One might use a URI such as "mailto:" email address to identify a person, or a "http:" URI to identify an abstract comment. However, this leaves the question of how one might identify, within the same context, both the system mailbox and the person to which it is assigned, or the web page at a http URI and the person or concept it describes. The "tdb" URN scheme allows ready assignment of URIs for abstractions that are distinguished from the media content that describes them. The goal, then, of the "tdb" URN scheme proposed below is to provide a mechanism which is, at the same time: permanent: The identity of the resource identified is not subject to reinterpretation over time. explicitly bound: The mechanism by which the identified resource can be determined is explicitly included in the URI. useful for non-networked items: Allows identification of resources outside the network: people, organizations, abstract concepts. Masinter Expires August 7, 2009 [Page 5] Internet-Draft duri and tdb URN namespaces February 2009 no administration: The mechanism does not depend on reliable administrative processes of authorities for either assignment or interpretation. 2. Namespace definitions 2.1. "duri" namespace It is traditional in convention references and citations in printed works to include the date of publication; this practice serves the important purpose that the context of the naming can be determined. The "duri" URN namespace takes the form: urn:duri:: where is a digit string corresponding to a date (Section 4), and an is an absolute URI-reference [3] in which any character excluded from URN syntax has been escaped (Section 3). The meaning of a duri is "the resource (or fragment) that was identified by the (after hex decoding) at the very last instant of the date(time) given". For example, 'urn:duri:2001:http://www.ietf.org' is a persistent identifier to 'http://www.ietf.org' as of the very last instant of the year 2001. A duri may not be a resource locator in a practical sense if the time of location has not yet arrived or has passed. However, is an acceptable resource identifier, and fulfills all of the requirements for URNs [7]. 2.2. "tdb" namespace The second URN namespace defined is a parallel space which is useful for describing entities, concepts, abstractions, and other items which are not themselves network accessible resources, but have been at some point described by network accessible resources. The "tdb" namespace designates the "thing described by" a resource at a given URI at the given time. This URN namespace is described by 'tdb', e.g., urn:tdb:: with the same syntactic rules as 'duri'. The intent is to use the inversion of "is a document about". It is common practice to give a reference for a concept by including a Masinter Expires August 7, 2009 [Page 6] Internet-Draft duri and tdb URN namespaces February 2009 pointer to a document, segment, phrase that defines the concept. "tdb" attempts to capture this practice in URI space. For example, "urn:tdb:2001:http://www.ietf.org" can be used to designate the Internet Engineering Task Force organization, at least as it was described by or referenced by its home page at the last instant of 2001. The "tdb" namespace differs from most other URN methods for identifying abstractions because the designation of what is actually identified by the tdb doesn't depend on knowing the intention of the "assigner" of the identifier. Unlike many of the alternatives proposed, the identification is not dependent on the context of use. The "tdb" namespace can be thought of as adding a level of semantic indirection to URI resolution. While one could imagine using 'tdb' without a date, it would leave the possibility that a reference that is unambiguous at one time might become ambiguous at some other time. There are two ways that the date is useful for "tdb" URNs: it fixes the time of access of the resource, for variable descriptions, and it fixes the time of interpretation, for descriptions whose meaning (in natural language) might vary. 3. Encoding URIs Both "duri" and "tdb" URN namespaces require that some characters in the URI references be encoded. 3.1. Characters that must be encoded The characters that must be encoded are: o All characters marked excluded in RFC 2141 [1], section 2.4: "&<>[]^`{|}~ These are excluded because they are not allowed in URNs. o The character "#" Note that can include a fragment identifier; the "#" character used to delimit it must be encoded. This feature is intended for use with "tdb", where the fragment identified might contain the description. Including an encoded "#" with a "duri" is not as useful, since the fragment identifier might as well be applied to the duri itself. o The character "%" The encoded-URI can itself contain encoded characters, which are encoded with the same method. To insure that decoding happens at the right level of processing, the "%" itself must be encoded. Unfortunately, there may well be cases where there is a double encoding of characters, first to construct the embedded URI itself Masinter Expires August 7, 2009 [Page 7] Internet-Draft duri and tdb URN namespaces February 2009 and second to then embed the URI within the tdb or duri URN. 3.2. Hierarchy and unencoded / The URN specification [7] discourages the use of "/" in URNs because, in general, there is no good interpretation of hierarchy and relative URIs for assigned names. However, for the particular case of duris (at least), there seems to be no good reason to avoid the "/" because it corresponds fairly naturally (in many cases) to the hierarchy of the original space. Note that because of this, "duri" URNs can actually be used with relative URI references with some amount of reliability. (The double encoding of previously encoded URI characters causes some problems.) 4. Dates A is a simple expression of date, optional time, with arbitrary precision. The goal is to allow relatively short expressions of dates with no ambiguity, and with arbitrary precision. date = year [ month [ day [ hour [ minute [ second [ fraction ]]]]]] year = 4digit month = 2digit day = 2digit hour = 2digit minute = 2digit second = 2digit fraction = *digit The representation of a date or time refers to the very last instant of the given date/time range at the resolution supplied, so that 1999 and 1999123123595999999 are equivalent. If necessary, "dates" can include times and even fractional times, so that a generator of duris can be arbitrarily precise. Dates are interpreted relative to International Atomic Time (TAI) [4]. The syntax and semantics are similar to those in RFC 2550 [8]; in particular, using TAI avoids ambiguity about time zones and difficulties with leap seconds. 5. Additional Considerations Masinter Expires August 7, 2009 [Page 8] Internet-Draft duri and tdb URN namespaces February 2009 5.1. Embedded URI schemes The intent of "duri" and "tdb" is to use them with embedded URI references that identify documents or document fragments. That is, they are most useful for information resources. For example, use with a "http" URI can be used to refer to a web page or the subject of a web site at a given time. This can be a way of referring to a web site at some time in the past, or an organization that has changed or merged. Local systems that have known-to-be unique host names can use "file" URIs with "tdb", for example, urn:tdb:20010814142327:file://this.example.com/c|/temp/test.txt since this use is primarily focused on providing a unique way of identifying an abstraction, even if the referent of the abstraction is not widely known. (Using 'file:' URIs in this way without a fully qualified domain name is not recommended, because the interpretation is not uniform.) Some URI schemes are more problematic. For example, using tdb or duri with an embedded 'urn:' might not seem to be too useful. But it might be useful where the assignment of names in a URN namespace are not, in practice, permanent, or that one might want to refer to the assignment as of a given date. In this case, it is possible to use a "urn" within a "duri", e.g., urn:duri:2000:urn:ietf:std:50 might be used to refer to "the document that was STD 50 in effect as of the last instant of 2000". [2] One might consider using "tdb" with "data" to designate concepts that can be described uniquely briefly inline. For example, urn:tdb:2001:data:,The%2520US%2520president names the concept described by the (text/plain) string "The US president" at the very last instant of 2001. (Note the awkward double quoting of space as "%20" and then the "%" as "%25".) Of course, this practice is only useful if the referent of the data is (or was at the time) completely unique. Since "data" does not contain a way to designate content-language, the string in question would have to not be ambiguous as to its language. In the case of 'data', there is no assigning authority at all; the interpretation of the 'tdb' URN depend on the interpreting community. Masinter Expires August 7, 2009 [Page 9] Internet-Draft duri and tdb URN namespaces February 2009 5.2. Why dates for both tdb and duri? Some have suggested that perhaps only one of "tdb" or "duri" might have a date parameter, e.g., allow "tdb:" as a URI scheme, and "urn:duri::tdb:" if one really wanted to fix the date of interpretation. While doing this might be possible, it seems that the usage pattern which allows the date to be omitted when needed satisfies the instances where the resource doing the description is permanent to stand. There are actually two dates to consider, with "tdb". There is the date that the resource is obtained, and there is the date that the description it makes is read, understood, and used to denote. Normally in a literary work in natural language which makes a reference to another work, both the reference itself and the work referenced are dated, e.g., a footnote in an article written in 1967 might talk about a "private communication" which itself had a date. The difference between a URI and a conventional literary reference is the desire to be able to extract the URI from its context and still retain its meaning. 5.3. Useful dates Dates far in the future are suspect, because the meaning of the duri or tdb cannot readily be determined in advance reliably. Dates whose range ends before assignment of the resource to the embedded URI SHOULD NOT be used, because the meaning of the reference is left in question. For example, using http URIs with a date range which ends before a web service was available at the given URI doesn't make much sense. However, although these practices are NOT RECOMMENDED, there is no assurance that they haven't been used; by itself, a duri/tdb does not constitute an assertion that the encoded-URI was available or assigned at the date specified. Note that the use of the "very last instant" allows for the conventional bibliographic convention that a work published in 2009 can use "2009" as the date string, to refer to the work in the year of publication. 5.4. Free assignment Because of the many possible schemes that can be used in the portion, there should be no difficulty in almost any computational process being able to assign duris or tdbs at will. Of course, it is necessary for there to be some resource which is available at some point in time, and to have a clock which is Masinter Expires August 7, 2009 [Page 10] Internet-Draft duri and tdb URN namespaces February 2009 accurate to the granularity of the frequency of assignment. 5.5. Resolution There no accurate resolution servers for duri or tdb URNs. However, duri might be "resolvable" in the sense that a resource that was accessed at a point in time might have the result of that access cached or archived in an Internet archive service. See, for example, the "Internet Archive" project [9]. A "tdb" is only tesolvable in the sense that if the corresponding duri can be esolved, it may be possible that the result can be accessed and interpreted. Clients without access to an Internet archive service might take the decoded of a duri and attempt resolution of *that* identifier. This will give an approximation whose reliability depends on the amount of time elapsed since the date indicated. 5.6. Why Names with Semantics? There are a number of proposals for URN schemes that create otherwise unbound "names", where the URN scheme only provides for uniqueness. Neither "duri" nor "tdb" intrinsically have the property that the names assigned are without any resolution semantics. This is intentional; it's difficult to create names that carry no semantics whatsoever about the authority that assigned the name and the intention of the authority for what the name should designate. 5.7. Avoiding MetaData One might consider the date in a duri/tdb to be just one piece of additional metadata about the encoded-URI, and consider adding other pieces of metadata as annotation. However, the use of the date in a duri/tdb is intended primarily as a mechanism of accomplishing uniqueness over time. No other bit of metadata or description readily fills that purpose. Further, the date is not descriptive (an assertion about the encoded-URI) but merely refining. 5.8. Avoiding duri and tdb Many applications of URIs already provide a context of date. For example, one could imagine a hypertext system where the URIs contained within a document were intended to refer to the resources as of the date of the enclosing document. This would be a reasonable interpretation of URIs within an Internet archive system, for example. Masinter Expires August 7, 2009 [Page 11] Internet-Draft duri and tdb URN namespaces February 2009 And some applications of URIs arguably already contain the level of interpretive indirection that is explicit with "tdb". For example, one might consider the use of URIs as namespace names within XML [5] as a reference to the "thing described by" the URI used. 5.9. tdb and levels of indirection The "tdb" scheme introduces a level of semantic indirection. The puzzles and confusions about use and mention, name and reference, and levels of indirection have been puzzling and amusing for quite a while. "It's long," said the Knight, "but it's very, very beautiful. Everybody that hears me sing it--either it brings tears into their eyes, or else--" "Or else what?" said Alice, for the Knight had made a sudden pause. "Or else it doesn't, you know. The name of the song is called 'Haddock's Eyes.'" "Oh, that's the name of the song, is it?" Alice said, trying to feel interested. "No, you don't understand," the knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'" "Then I ought to have said 'That's what the song is called'?" Alice corrected herself. "No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!" "Well, what is the song, then?" said Alice, who was by this time completely bewildered. "I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention." [10] 6. URN Specification Templates 6.1. duri Specification Template Namespace ID: "duri" requested. Registration Information: Registration Version: 1 Registration Date: 2001-08-19 Declared registrant of the namespace: Larry Masinter Declaration of syntactic structure: Briefly, the syntax is urn:duri:: The syntax is described in this document. Masinter Expires August 7, 2009 [Page 12] Internet-Draft duri and tdb URN namespaces February 2009 Relevant ancillary documentation: (See References of this document) Identifier uniqueness considerations: Uniqueness is guaranteed by the structure of adding a designation of a specific instant to a URI. However, URIs with ambiguous interpretation at any given instant (e.g., "file" URIs without a given host name) will not be unique. Identifier persistence considerations: The designation of a dated URI is completely persistent for all time. Process of identifier assignment: Any date can be used with any URI independently by anyone. Process of identifier resolution: Identifiers can only be resolved approximately. See Section 5.5. Conformance with URN Syntax: Note that the use of "/" for hierarchy, while discouraged in the URN specification, is allowed in duris. Rules for Lexical Equivalent: For dates, YYYY is equivalent to YYYY01, YYYYMM is equivalent to YYYYMM01, while YYYYMMDD is equivalent to YYYYMMDD0... followed by any number of 0's. In considering equivalence of the encoded URI, if two duris with equivalent dates contain lexically equivalent URIs, the duris are equivalent. Validation mechanism: Dates should be reasonable and meet the syntactic requirements. The URI encoded within should meet the syntactic requirements of the URI scheme used. Scope: Global. 6.2. tdb Specification Template Namespace ID: "tdb" requested. Registration Information: Registration Version: 1 Registration Date: 2002-04-01 Declared registrant of the namespace: Larry Masinter Declaration of syntactic structure: Briefly, the syntax is urn:tdb:: The syntax is described in Section 2.1. Relevant ancillary documentation: (See References of this document) Identifier uniqueness considerations: Uniqueness is guaranteed by the structure of adding a designation of a specific instant to a URI. However, URIs with ambiguous interpretation at any given instant (e.g., "file" URIs without a given host name) will not be unique. Identifier persistence considerations: The designation of a dated URI is completely persistent for all time, although the intent of a resource that is no longer available will be hard to discern. Process of identifier assignment: Any date can be used with any URI independently by anyone. However, assigning an identifier to a non-networked resource such as a person or abstract concept requires (at least conceptually) first creating a networked resource that uniquely describes the target, and then constructing Masinter Expires August 7, 2009 [Page 13] Internet-Draft duri and tdb URN namespaces February 2009 the duri using the URI of the description. Process of identifier resolution: Resolution of "tdb" identifiers requires interpreting the resource identified by the corresponding "duri". See Section 2.2 of this document. Rules for Lexical Equivalent: As with "duri"; see Section 6.1. Conformance with URN Syntax: As with "duri"; see Section 6.1. Validation mechanism: As with "duri", see Section 6.1. Scope: Global. 7. IANA considerations This document includes two URN NID registrations (Section 6.1 and Section 6.2) that should be entered into the IANA registry of URN NIDs. 8. Security Considerations "duri" and "tdb" identifiers are not any more reliable because they have dates. URIs don't contain enough information to supply the authority for deciding what was or wasn't at a given URI at a given date. 9. Acknowledgements There have been many discussions over several years on the relationship of URLs, URNs, URIs, resources and resource identifiers, with many contributions. Particular thanks to Aaron Swartz, Brian McBride and Stuart Williams, Michael Mealling, and many others. 10. References 10.1. Normative References [1] Moats, R., "URN Syntax", RFC 2141, May 1997. [2] Moats, R., "A URN Namespace for IETF Documents", RFC 2648, August 1999. [3] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 3986, January 2005. [4] Bureau International des Poids et Mesures, "International Atomic Time". Masinter Expires August 7, 2009 [Page 14] Internet-Draft duri and tdb URN namespaces February 2009 [5] Bray, T., Hollander, D., and A. Layman, "Namespaces in XML", W3C Recommendation REC-xml-names, January 1999, . 10.2. Informative References [6] Berners-Lee, T., "Cool URIs don't change", 1998, . [7] Sollins, K., "Functional Requirements for Uniform Resource Names", RFC 1737, December 1994. [8] Glassman, S., Manasse, M., and J. Mogul, "Y10K and Beyond", RFC 2550, April 1 1999. [9] Kahle, B., "Preserving the Internet", Scientific American , March 1997, . [10] Carroll, L., "Through the Looking Glass", 1872, . Author's Address Larry Masinter Adobe Systems 345 Park Ave San Jose, CA 95110 US Phone: +1 408 536 3024 Email: LMM@acm.org URI: http://larry.masinter.net Masinter Expires August 7, 2009 [Page 15]