Why a Document Management System Should Care About Hyperlinks

 

* A link is a construct that represents a relationship between two or more things

* Historically, links have been used for straightforward association of two data items, e.g., a cross-reference

* Historically, links have lived in hypermedia systems

* Emerging use of links:

o to locate distributed objects

o to specify dependencies

o to associate “metadata” with data

Some common link requirements

* Links to arbitrary data and to points within that data

* Links within and across documents

* Links with multiple endpoints

* Links that carry with them some set of semantics, such as a type and behavior

* Links into data that cannot be modified

* Control over the direction of traversal of a link

* Control over what types of objects a link can point to

* Notification when a link end is invalidated or modified

* Version history of a link

* Context-sensitive links

Representing links

* SGML ID/IDREF

* HTML

* HyTime

* XML Link

Well-accepted properties of links

* Specification of the link itself, usually via some combination of elements and/or attributes (link recognition)

* Specification of how to find the endpoints (addressing)

* What the link is for (role)

* What to do when the link is “activated” (behavior)

* Allowed types of things the link can point to

* Allowed direction of traversal between link endpoints

* Various other metadata about the link itself: a descriptor, who created it and when, system-specific instructions, etc.
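These properties map naturally onto a record type. The Python sketch below shows one hypothetical way a system might hold them and enforce the endpoint-type and traversal-direction constraints; the names `Link`, `LinkEnd`, `Direction` and their fields are assumptions for illustration, not taken from HyTime, XML Link, or any product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Direction(Enum):
    """Allowed direction of traversal between link endpoints (hypothetical)."""
    FORWARD = "forward"       # only from the first end to the other ends
    BOTH = "both"             # from any end to any other end


@dataclass
class LinkEnd:
    address: str              # how to find the endpoint (addressing)
    target_type: str          # element type found at that address


@dataclass
class Link:
    role: str                 # what the link is for
    behavior: str             # what to do when the link is activated
    ends: list[LinkEnd]
    allowed_types: set[str]   # types of things the link may point to
    direction: Direction = Direction.FORWARD
    metadata: dict[str, str] = field(default_factory=dict)  # creator, date, ...

    def problems(self) -> list[str]:
        """List violations of the link's own constraints (empty if none)."""
        found = []
        if len(self.ends) < 2:
            found.append("a link needs at least two ends")
        for end in self.ends:
            if end.target_type not in self.allowed_types:
                found.append(f"{end.address}: type {end.target_type!r} not allowed")
        return found

    def can_traverse(self, src: int, dst: int) -> bool:
        """Is traversal from end number `src` to end number `dst` permitted?"""
        if src == dst:
            return False
        return self.direction is Direction.BOTH or src == 0


link = Link(role="cross-reference", behavior="replace",
            ends=[LinkEnd("manual#install", "section"), LinkEnd("manual#logo", "figure")],
            allowed_types={"section", "figure"},
            metadata={"creator": "editor"})
print(link.problems())          # []
print(link.can_traverse(1, 0))  # False: FORWARD links only run from end 0
```

Keeping the constraints on the link record itself means any tool that creates or edits links can apply the same checks.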

Link lifecycle

* Typical users simply follow links: hypermedia systems present and traverse existing links

* Somebody has to create and manage links: authoring and document management systems must support creating, validating, and maintaining them

* Linking within constantly modified data presents some hard problems:

o Addressing that works when linked data is modified

o Ongoing checks that links are still valid

o Automated synthesis of links

Why ID/IDREF doesn't cut it

* Limited to one document

* Limited to addressing an element

* Not enough standard semantics

* Can't maintain links independently of the data

* Impossible if read-only data doesn't already have IDs
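The single-document limit is easy to see in code: an IDREF can only be resolved against IDs in the same tree. A minimal Python sketch using the standard `xml.etree.ElementTree` module; it assumes, hypothetically, that `id` holds ID values and `linkend` holds IDREF values, whereas a real SGML/XML parser would learn which attributes are IDs and IDREFs from the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: "id" holds ID values, "linkend" holds IDREF values.
ID_ATTR, IDREF_ATTR = "id", "linkend"


def dangling_idrefs(doc_text: str) -> list[str]:
    """Return IDREF values that name no ID in the same document."""
    root = ET.fromstring(doc_text)
    ids = {el.get(ID_ATTR) for el in root.iter() if el.get(ID_ATTR) is not None}
    refs = [el.get(IDREF_ATTR) for el in root.iter() if el.get(IDREF_ATTR) is not None]
    # The only scope ID/IDREF offers is this one tree: a reference to an element
    # in another document can never resolve, and only elements with IDs are targets.
    return sorted(ref for ref in refs if ref not in ids)


doc = """<manual>
  <section id="install"><para>Install the product.</para></section>
  <para>See <xref linkend="install"/> and <xref linkend="other-doc-intro"/>.</para>
</manual>"""
print(dangling_idrefs(doc))  # ['other-doc-intro']
```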

Fundamentals of HTML links

* The “A” tag is a link

* It has an “href” attribute whose value is a URL

 

This is an HTML link: <a href="http://www.texcel.no">

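Because the `A` element and its `href` attribute are all there is to HTML link recognition, extracting HTML links needs little more than a standard parser. A minimal sketch using Python's standard `html.parser` and `urllib.parse` modules; the class name `HrefCollector` is chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Recognize HTML links: any <a> start tag that carries an href."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases names; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed('<p>This is an HTML link: <a href="http://www.texcel.no">Texcel</a></p>')
collector.close()
for href in collector.hrefs:
    url = urlsplit(href)
    # The URL is the entire addressing mechanism: one target, no span, no type.
    print(url.scheme, url.netloc, url.path or "/", url.fragment or "(no fragment)")
```

Nothing in the parsed result says what the link is for, what it may point to, or which way it may be traversed, which is the gap the following slide lists.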
HTML link shortcomings

* No links to arbitrary spans of text or content

* No links into arbitrary data types

* No links with multiple endpoints

* No links independent of data

* No control over the types of endpoints of a link

* No control over the direction of traversal of a link

Fundamentals of HyTime links

* Hypermedia/Time-based Structuring Language, defined in international standard ISO/IEC 10744

* In the standard, HyTime defines architectural forms: a set of “meta” element classes and attributes with standard semantics

* When you write a DTD, you make an element a link by applying an architectural form via a “HyTime” attribute

* End result is that instances of the element in a document are links

* A link relates two or more link ends. Each link end is a locator to a piece of data known as an anchor.

* A link end addresses the anchor via various mechanisms such as nameloc, treeloc, queryloc, and dataloc.

* A contextual link gets one of its link ends from the link element's position in the document.

* An independent link resides independently of any of its link ends.

This is a HyTime link: <clink hytime="clink" linkend="TexcelLogo">

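Of the addressing mechanisms listed above, `treeloc` locates an anchor by its position in the element tree. The Python sketch below is a simplified reading of that idea, not an implementation of the standard: it assumes the first mark (1) selects the root and each later mark picks the n-th child, and it ignores the rest of HyTime's location addressing. The sample document and the function name `treeloc` are invented for illustration.

```python
import xml.etree.ElementTree as ET


def treeloc(root: ET.Element, marks: list[int]) -> ET.Element:
    """Resolve a simplified tree location (hypothetical, not full HyTime).

    The first mark must be 1 (the root); each later mark picks the
    n-th child (1-based) of the node selected so far.
    """
    if not marks or marks[0] != 1:
        raise ValueError("first mark must be 1, selecting the root")
    node = root
    for n in marks[1:]:
        children = list(node)
        if not 1 <= n <= len(children):
            raise LookupError(f"no child {n} under <{node.tag}>")
        node = children[n - 1]
    return node


doc = ET.fromstring(
    "<doc><front/><body><sec/><sec><fig id='TexcelLogo'/></sec></body></doc>")
print(treeloc(doc, [1, 2, 2, 1]).get("id"))  # TexcelLogo
```

Inserting another `sec` at the start of `body` would make `[1, 2, 2, 1]` point at a different element, while a name-based address such as `nameloc` on `TexcelLogo` would still find the figure: the "addressing that works when linked data is modified" problem in miniature.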
Shortcomings of HyTime

* According to some people, it is not possible for HyTime to have shortcomings because it can do anything

* This is its shortcoming

Fundamentals of XML links

* Any element becomes an XML Link when it has an attribute named “xml-link”

* The “href” attribute is the locator and is a URL

* Additionally standardizes the fragment identifier and query portions of a URL as either ID references or TEI Extended Pointers (XPointer)

* Other attributes specify information about the link such as its role and behavior

* Simple links are like HyTime contextual links; extended links are like HyTime independent links

This is an XML link: <simple xml-link="simple" href="file:///C|/texcel/im/lib/texcel.gif">

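XML link recognition comes down to looking for the `xml-link` attribute, and the `href` splits into a resource URL plus an optional fragment. A simplified Python sketch with the standard library; it handles only `#`-separated fragments and assumes, as a heuristic, that a bare name is an ID reference and anything else (such as `id(intro)child(2,para)`) is an extended pointer. The function name and sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Heuristic (assumption): a bare name is an ID; anything else is an XPointer.
BARE_NAME = re.compile(r"[A-Za-z_][\w.\-]*\Z")


def find_xml_links(doc_text: str):
    """Yield (tag, link kind, resource URL, fragment kind, fragment) per link."""
    root = ET.fromstring(doc_text)
    for el in root.iter():
        kind = el.get("xml-link")          # recognition: the attribute's presence
        if kind is None:
            continue
        url, frag = urldefrag(el.get("href", ""))
        if not frag:
            frag_kind = None
        elif BARE_NAME.match(frag):
            frag_kind = "id"
        else:
            frag_kind = "xpointer"
        yield el.tag, kind, url, frag_kind, frag


doc = """<doc>
  <simple xml-link="simple" href="file:///C|/texcel/im/lib/texcel.gif"/>
  <simple xml-link="simple" href="glossary.xml#term-link"/>
  <simple xml-link="simple" href="manual.xml#id(intro)child(2,para)"/>
  <para>No link here.</para>
</doc>"""
for link in find_xml_links(doc):
    print(link)
```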
Shortcomings of XML Links

* No links into arbitrary data types

* No control over the types of endpoints of a link

* No control over the direction of traversal of a link

Links “inside” an SGML document management system

* SGML document management systems typically have unique object identification for every SGML element

* These repository identifiers (RIDs) make complex addressing unnecessary: link resolution is simple

* Within its own domain, a system can provide efficient link storage and manipulation

* When data goes out of this domain, links can be exported to a standard form
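Inside the repository a link can be stored as a plain pair of RIDs and turned into a standard form only on export. A toy Python sketch, assuming a hypothetical `Repository` class, an invented ID-naming scheme (`<type>-r<RID>`), and plain `id`/`linkend` attributes as the export form.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy repository in which every element has a repository identifier (RID)."""
    elements: dict[int, str] = field(default_factory=dict)      # RID -> element type
    links: list[tuple[int, int]] = field(default_factory=list)  # (source RID, target RID)
    _rids: itertools.count = field(default_factory=lambda: itertools.count(1), repr=False)

    def add_element(self, element_type: str) -> int:
        rid = next(self._rids)
        self.elements[rid] = element_type
        return rid

    def link(self, source: int, target: int) -> None:
        # Inside the repository a link is just a pair of RIDs; resolving it is a lookup.
        self.links.append((source, target))

    def export_idrefs(self) -> dict[int, dict[str, str]]:
        """Export links as ID/IDREFS attributes, generating IDs only for link targets."""
        attrs: dict[int, dict[str, str]] = {}
        for source, target in self.links:
            target_id = f"{self.elements[target]}-r{target}"
            attrs.setdefault(target, {})["id"] = target_id
            src = attrs.setdefault(source, {})
            # Several links from one element become a space-separated IDREFS value.
            src["linkend"] = f"{src['linkend']} {target_id}" if "linkend" in src else target_id
        return attrs


repo = Repository()
section = repo.add_element("section")
figure = repo.add_element("figure")
xref = repo.add_element("xref")
repo.link(xref, section)
repo.link(xref, figure)
print(repo.export_idrefs())
```

Only elements that actually become link targets receive an ID, so the exported document is not cluttered with identifiers nobody references.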

Link creation

* Present candidates for link targets, e.g., via tree views, query results, or content views

* Generate an address to a link target

* Automatically generate ID values

* Ensure links are only to allowed types

* Associate link type and other information with the link

* Update an independent link map

* Automatically create links
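Several of these creation steps (type checking, ID generation, updating an independent link map) fit in one small routine. A minimal Python sketch; `LinkMap`, its rule table, and the `lnk0001` ID format are hypothetical names for illustration.

```python
import itertools


class LinkMap:
    """Hypothetical independent link map: links are stored apart from the documents."""

    def __init__(self, allowed: dict[str, set[str]]):
        # link type -> element types a link of that type may point to
        self.allowed = allowed
        self.links: dict[str, dict[str, str]] = {}
        self._ids = itertools.count(1)

    def create(self, link_type: str, source: str, target: str, target_type: str) -> str:
        """Create a link after checking the target type; return its generated ID."""
        if target_type not in self.allowed.get(link_type, set()):
            raise ValueError(f"{link_type} links may not point to {target_type} elements")
        link_id = f"lnk{next(self._ids):04d}"   # automatically generated ID value
        self.links[link_id] = {"type": link_type, "source": source, "target": target}
        return link_id


link_map = LinkMap({"figref": {"figure"}, "xref": {"section", "figure"}})
print(link_map.create("figref", "manual#p3", "manual#logo", "figure"))  # lnk0001
```

The type check runs before an ID is drawn, so rejected links leave no gaps in the generated IDs.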

Link maintenance

* Integrate with authoring systems to prevent deletion of link targets

* Notify when link target contents are modified

* Notify when an address locates a different target

* Potentially recalculate addresses

* Retrieve and update link metadata

* Maintain an independent link map

* Maintain context of link applicability

* Trace the link lifecycle
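One way to "notify when link target contents are modified" is to record a digest of each target when the link is made and compare it later. A minimal Python sketch using the standard `hashlib` module; the link table layout and the `store` lookup standing in for the repository are assumptions for illustration.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Digest of a link target's content, recorded when the link is made."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_links(links: dict[str, tuple[str, str]], store: dict[str, str]) -> dict[str, str]:
    """Report each link whose target vanished or changed since it was recorded.

    links: link ID -> (target address, fingerprint at creation time)
    store: target address -> current content (hypothetical repository lookup)
    """
    report = {}
    for link_id, (address, recorded) in links.items():
        if address not in store:
            report[link_id] = "target deleted"
        elif fingerprint(store[address]) != recorded:
            report[link_id] = "target modified"
    return report


store = {"fig1": "Texcel logo", "sec2": "Installing the product"}
links = {
    "L1": ("fig1", fingerprint("Texcel logo")),
    "L2": ("sec2", fingerprint("Installation")),
    "L3": ("sec9", fingerprint("Old text")),
}
print(check_links(links, store))
```

A changed digest only says that something changed; whether the link still makes sense is a decision for the author the system notifies.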

Link delivery

* Real-time link traversal

* Determine and export a web of linked data

* Export links in a form optimized for the delivery system

* Drive conversion and delivery processes
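Determining the web of linked data to export is a graph traversal over the link map. A minimal breadth-first sketch in Python; the document names and the adjacency-list form of the link map are invented for illustration.

```python
from collections import deque


def linked_web(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first collection of every document reachable from `start`."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, []):
            if target not in seen:       # follow each document only once
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


links = {
    "manual": ["install", "glossary"],
    "install": ["glossary", "faq"],
    "faq": ["manual"],
    "orphan": ["manual"],
}
print(linked_web("manual", links))  # ['manual', 'install', 'glossary', 'faq']
```

Links are followed in their stored direction only, so `orphan`, which links to the manual but is not linked from it, stays out of the exported set.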

There's more to links than viewing

* Tangible benefits of planning for links across the entire document lifecycle

* Exploit the capabilities of your SGML document management system to support linking

* Quality gains are certain to follow

 

www.texcel.no

Texcel AB
