Why a Document Management System Should Care About Hyperlinks
* A link is a construct that represents a relationship between two or more things
* Historical use is straightforward association of two data items, e.g., a cross-reference
* Historical venue for links is hypermedia systems
* Emerging use of links:
o to locate distributed objects
o to specify dependencies
o to associate “metadata” with data
Some common link requirements
* Links to arbitrary data and to points within that data
* Links within and across documents
* Links with multiple endpoints
* Links that carry with them some set of semantics, such as a type and behavior
* Links into data that cannot be modified
* Control over the direction of traversal of a link
* Control over what types of objects a link can point to
* Notification when a link end is invalidated or modified
* Version history of a link
* Context-sensitive links
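Several of these requirements (multiple endpoints, link type, control over direction and endpoint types) can be pictured as a small data model. The following is only a sketch under stated assumptions: the names Endpoint, Link, Direction, and targets_from are hypothetical and do not come from any standard or product.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    FORWARD = "forward"      # traverse only from the first endpoint to the others
    BIDIRECTIONAL = "both"   # traverse from any endpoint to any other


@dataclass(frozen=True)
class Endpoint:
    document: str      # hypothetical document identifier
    address: str       # hypothetical address of a point or span in that document
    object_type: str   # e.g. "para", "figure", "image"


@dataclass
class Link:
    link_type: str                       # the semantics, e.g. "cross-reference"
    endpoints: list[Endpoint]            # two or more endpoints
    direction: Direction = Direction.BIDIRECTIONAL
    allowed_types: frozenset[str] = frozenset()   # empty means "any type"

    def __post_init__(self):
        if len(self.endpoints) < 2:
            raise ValueError("a link needs at least two endpoints")
        for ep in self.endpoints:
            if self.allowed_types and ep.object_type not in self.allowed_types:
                raise ValueError(f"endpoint type {ep.object_type!r} not allowed")

    def targets_from(self, start: Endpoint) -> list[Endpoint]:
        """Endpoints reachable when the link is traversed from `start`."""
        if start not in self.endpoints:
            return []
        if self.direction is Direction.FORWARD and start != self.endpoints[0]:
            return []
        return [ep for ep in self.endpoints if ep != start]
```

Notification, version history, and context sensitivity are left out; they depend on the repository that stores the links.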
Representing links
* SGML ID/IDREF
* HTML
* HyTime
* XML Link
Well-accepted properties of links
* Specification of the link itself usually via some combination of elements and/or attributes (link recognition)
* Specification of how to find the endpoints (addressing)
* What the link is for (role)
* What to do when the link is “activated” (behavior)
* Allowed types of things the link can point to
* Allowed direction of traversal between link endpoints
* Various other metadata about the link itself: a descriptor, who created it and when, system-specific instructions, etc.
Link lifecycle
* Typical users only use links: hypermedia systems present and traverse existing links
* Somebody has to create and manage links: authoring and document management systems must do interesting things with links
* Linking within constantly modified data presents some hard problems:
o Addressing that works when linked data is modified
o Ongoing validation that links are still valid
o Automated synthesis of links
Why ID/IDREF doesn't cut it
* Limited to one document
* Limited to addressing an element
* Not enough standard semantics
* Can't maintain links independently of the data
* Impossible if read-only data doesn't already have IDs
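A minimal sketch of ID/IDREF-style resolution illustrates the first two limits. It assumes attributes literally named "id" and "idref" (in real SGML the attribute types are declared in the DTD), and the sample document and the resolve_idrefs function are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <section id="intro"><title>Introduction</title></section>
  <para>See <xref idref="intro"/> and <xref idref="appendix-b"/>.</para>
</doc>"""


def resolve_idrefs(xml_text):
    """Map each idref value to the element carrying that id, or None."""
    root = ET.fromstring(xml_text)
    # The index is built from this one document only.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    return {
        ref.get("idref"): by_id.get(ref.get("idref"))
        for ref in root.iter()
        if ref.get("idref") is not None
    }


resolved = resolve_idrefs(DOC)
print(resolved["intro"].tag)     # the target is always a whole element
print(resolved["appendix-b"])    # a target in another document cannot be found
```

The lookup can only return a whole element from the same document, and only if that element already carries an ID.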
Fundamentals of HTML links
* The “A” tag is a link
* It has an “href” attribute whose value is a URL
This is an HTML link: <a href="http://www.texcel.no">
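As an illustration of how little there is to recognize, the sketch below collects the href value of every "a" tag using Python's standard html.parser module. The LinkCollector class name and the sample markup around the link are assumptions made for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None if omitted
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


collector = LinkCollector()
collector.feed('<p>Visit <a href="http://www.texcel.no">Texcel</a>.</p>')
print(collector.hrefs)
```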
HTML link shortcomings
* No links to spans of text and spans of content
* No links into arbitrary data types
* No links with multiple endpoints
* No links independent of data
* No control over the types of endpoints of a link
* No control over the direction of traversal of a link
Fundamentals of HyTime links
* Hypermedia/Time-based Structuring Language, defined in ISO/IEC 10744
* In the standard, HyTime defines architectural forms: a set of “meta” element classes and attributes with standard semantics
* When you write a DTD, you make an element a link by applying an architectural form via a “HyTime” attribute
* End result is that instances of the element in a document are links
* A link relates two or more link ends. Each link end is a locator to a piece of data known as an anchor.
* A link end addresses the anchor via various mechanisms such as nameloc, treeloc, queryloc, and dataloc.
* A contextual link gets one of its link ends from the link element's position in the document.
* An independent link resides independently of any of its link ends.
This is a HyTime link: <clink hytime="clink" linkend="TexcelLogo">
Shortcomings of HyTime
* According to some people, it is not possible for HyTime to have shortcomings because it can do anything
* This is its shortcoming
Fundamentals of XML links
* Any element becomes an XML Link when it has an attribute named “xml-link”
* The “href” attribute is the locator and is a URL
* XML Link additionally standardizes the fragment identifier and query portions of a URL as either ID references or TEI extended pointers (XPointers)
* Other attributes specify information about the link such as its role and behavior
* Simple links are like HyTime contextual links; extended links are like HyTime independent links
This is an XML link: <simple xml-link="simple" href="file:///C|/texcel/im/lib/texcel.gif">
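Link recognition by attribute can be sketched as follows: any element carrying an "xml-link" attribute is reported as a link, whatever its element name. The surrounding document, the "role" attribute value, and the find_xml_links function are assumptions for illustration, and the link element is written as an empty element so that the sample is well-formed.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <para>Our logo: <simple xml-link="simple"
      href="file:///C|/texcel/im/lib/texcel.gif" role="illustration"/></para>
  <para>No links here.</para>
</doc>"""


def find_xml_links(xml_text):
    """Return one record per element that has an xml-link attribute."""
    root = ET.fromstring(xml_text)
    links = []
    for el in root.iter():
        kind = el.get("xml-link")
        if kind is not None:
            links.append({
                "element": el.tag,       # the element name is irrelevant to recognition
                "kind": kind,            # "simple" or "extended"
                "href": el.get("href"),  # the locator
                "role": el.get("role"),  # assumed optional attribute
            })
    return links


for link in find_xml_links(DOC):
    print(link)
```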
Shortcomings of XML Links
* No links into arbitrary data types
* No control over the types of endpoints of a link
* No control over the direction of traversal of a link
Links “inside” an SGML document management system
* SGML document management systems typically have unique object identification for every SGML element
* These repository identifiers (RIDs) make complex addressing unnecessary: link resolution is simple
* Within its own domain, a system can provide efficient link storage and manipulation
* When data goes out of this domain, links can be exported to a standard form
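The idea can be sketched as a link table keyed by RID, with export to an href-style form only when data leaves the system. This is a hypothetical in-memory model, not any product's API: LinkStore, its methods, and the "linkmap" export format are all assumptions.

```python
import xml.etree.ElementTree as ET


class LinkStore:
    """Hypothetical in-memory repository: objects and links are keyed by RID."""

    def __init__(self):
        self._objects = {}   # RID -> (document name, element ID)
        self._links = []     # (source RID, target RID, role)

    def register(self, rid, document, element_id):
        self._objects[rid] = (document, element_id)

    def add_link(self, source_rid, target_rid, role):
        for rid in (source_rid, target_rid):
            if rid not in self._objects:
                raise KeyError(f"unknown RID {rid}")
        self._links.append((source_rid, target_rid, role))

    def resolve(self, rid):
        # Inside the repository, resolution is a single lookup.
        return self._objects[rid]

    def export(self):
        """Rewrite RID-based links as document#id addresses for use outside."""
        root = ET.Element("linkmap")
        for source, target, role in self._links:
            src_doc, src_id = self._objects[source]
            tgt_doc, tgt_id = self._objects[target]
            ET.SubElement(root, "link", {
                "from": f"{src_doc}#{src_id}",
                "href": f"{tgt_doc}#{tgt_id}",
                "role": role,
            })
        return ET.tostring(root, encoding="unicode")
```

Because a RID stays the same when an element moves or is renamed, only the export step has to compute an external address.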
Link creation
* Present candidates for link targets, e.g., via tree views, query results, views of content
* Generate an address to a link target
* Automatically generate ID values
* Ensure links are only to allowed types
* Associate link type and other information with the link
* Update an independent link map
* Automatically create links
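Three of these steps (checking allowed types, generating an ID for the target, and updating an independent link map) can be sketched together. The allowed-type policy, the "gen-N" ID scheme, and the function names are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

ALLOWED_TARGETS = {"figure", "section"}   # hypothetical policy


def ensure_id(root, element):
    """Give `element` an ID that is unused in the document if it lacks one."""
    if element.get("id") is None:
        used = {el.get("id") for el in root.iter()}
        n = 1
        while f"gen-{n}" in used:
            n += 1
        element.set("id", f"gen-{n}")
    return element.get("id")


def create_link(link_map, root, source_id, target, link_type):
    """Validate the target, address it by ID, and record the link."""
    if target.tag not in ALLOWED_TARGETS:
        raise ValueError(f"links to <{target.tag}> are not allowed")
    target_id = ensure_id(root, target)
    # The link map lives outside the document, so links stay independent of data.
    link_map.append({"from": source_id, "to": target_id, "type": link_type})
    return target_id


root = ET.fromstring('<doc><para id="p1"/><figure/><figure id="gen-1"/></doc>')
link_map = []
new_id = create_link(link_map, root, "p1", root.find("figure"), "illustrates")
print(new_id, link_map)
```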
Link maintenance
* Integrate with authoring systems to prevent deletion of link targets
* Notify when link target contents are modified
* Notify when an address locates a different target
* Potentially recalculate addresses
* Retrieve and update link metadata
* Maintain an independent link map
* Maintain context of link applicability
* Trace link lifecycle
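One possible way to drive the notifications is to store a fingerprint of the target's content with each link and compare it on every check. The sketch below assumes a hypothetical resolve(address) callback that returns the current content of a target, or None when the address no longer locates anything.

```python
import hashlib


def fingerprint(text):
    """Stable digest of the target content at the time of linking."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_link(links, link_id, address, content):
    links[link_id] = {"address": address, "fingerprint": fingerprint(content)}


def check_links(links, resolve):
    """Classify every link as 'ok', 'modified', or 'invalid'."""
    report = {}
    for link_id, link in links.items():
        current = resolve(link["address"])
        if current is None:
            report[link_id] = "invalid"     # target deleted or address broken
        elif fingerprint(current) != link["fingerprint"]:
            report[link_id] = "modified"    # content changed since linking
        else:
            report[link_id] = "ok"
    return report


repository = {"fig3": "Exploded view", "intro": "Welcome"}
links = {}
record_link(links, "L1", "fig3", repository["fig3"])
record_link(links, "L2", "intro", repository["intro"])
repository["fig3"] = "Exploded view, revised"
del repository["intro"]
print(check_links(links, repository.get))
```

A changed fingerprint cannot by itself tell a modified target from an address that now locates a different target; separating those cases needs a stable object identity in the repository.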
Link delivery
* Real-time link traversal
* Determine and export a web of linked data
* Export links in a form optimized for the delivery system
* Drive conversion and delivery processes
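Determining the web of linked data amounts to collecting everything reachable from a starting object by following links. A breadth-first traversal is one way to sketch it; the link table and object names here are made up for the example.

```python
from collections import deque


def linked_web(links, start):
    """Return every object reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in links.get(current, ()):
            if target not in seen:    # also guards against link cycles
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link table: object -> objects it links to
links = {
    "manual": ["chapter1", "logo.gif"],
    "chapter1": ["figure3", "manual"],
    "unrelated": ["figure9"],
}
print(sorted(linked_web(links, "manual")))
```

The resulting set is what would have to be exported together for the delivered links to resolve.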
There's more to links than viewing
* Planning for links across the entire document lifecycle brings tangible benefits
* Exploit the capabilities of your SGML document management system to support linking
* Quality gains are certain to follow