XMLNews-Meta Tutorial
Copyright (c) 1999 by XMLNews.org. Free redistribution permitted.
XMLNews-Meta is an extensible news industry metadata vocabulary conforming to the World Wide Web Consortium's Resource Description Framework (RDF) Recommendation. An XMLNews-Meta record provides information about a news story or other resource, while an XMLNews-Story document contains an actual news story; you can use XMLNews-Meta to exchange information about any kind of news resource in any format.
This tutorial introduces the properties in the core XMLNews-Meta vocabulary and demonstrates how you can extend an XMLNews-Meta record to include additional properties from different Namespaces. For more detailed (and authoritative) reference documentation, please see the XMLNews-Meta specification.
The core XMLNews-Meta vocabulary consists of over 40 properties that enable you to provide information such as the distributor, format and release time of a news resource (such as a news story, video clip, audio clip or photograph). All of the core properties belong to the XMLNews-Meta Namespace; you are free to extend XMLNews-Meta by adding properties from other Namespaces to match your technical and business requirements.
Most of the core properties in the XMLNews-Meta vocabulary provide information about the following areas: header information, milestones (key times in the life of a resource), origin and distribution, rights, subject matter, and links to related resources.
NOTE: In the examples in this tutorial, the prefix xn: is shorthand for the Namespace URI “http://www.xmlnews.org/namespaces/meta#”. XMLNews-Meta records are free to use different prefixes, as long as they map to the same Namespace URI. For more information, see the Namespaces in XML specification.
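A namespace-aware XML parser works with Namespace URIs rather than prefixes, which is why the prefix choice does not matter. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. Python is an assumption; the tutorial does not prescribe any processing language. The prefix meta: in the second record is a hypothetical alternative to xn:.

```python
import xml.etree.ElementTree as ET

# The same minimal record written with two different prefixes
# ("meta" is a hypothetical alternative to "xn").
RECORD_XN = (
    '<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#">'
    "<xn:resourceId>082098709870987</xn:resourceId></xn:Resource>"
)
RECORD_META = (
    '<meta:Resource xmlns:meta="http://www.xmlnews.org/namespaces/meta#">'
    "<meta:resourceId>082098709870987</meta:resourceId></meta:Resource>"
)


def expanded_names(xml_text):
    """Return the {Namespace URI}local-name of the root and of each child."""
    root = ET.fromstring(xml_text)
    return [root.tag] + [child.tag for child in root]


# ElementTree resolves prefixes to URIs, so the prefix choice disappears.
print(expanded_names(RECORD_XN) == expanded_names(RECORD_META))  # True
print(expanded_names(RECORD_XN)[0])
# {http://www.xmlnews.org/namespaces/meta#}Resource
```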
Every XMLNews-Meta record consists of the xn:Resource element, with declarations for all of the Namespaces used in the record (it is also good practice to include an XML declaration):
<?xml version="1.0"?> <xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#"> </xn:Resource>
All of the properties appear as elements between the start and the end tags of the xn:Resource element; the name of the element is the property name, and the contents are the value.
One property, xn:resourceId, must be present in every XMLNews-Meta record. It provides the unique identifier of the resource being described:
<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#"> <xn:resourceId>082098709870987</xn:resourceId> </xn:Resource>
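Because xn:resourceId is the one required property, a receiving system may want to check for it before accepting a record. Here is a minimal Python sketch using xml.etree.ElementTree. The function name resource_id is hypothetical. The sketch assumes, as described above, that properties appear as direct children of xn:Resource.

```python
import xml.etree.ElementTree as ET

NS = {"xn": "http://www.xmlnews.org/namespaces/meta#"}


def resource_id(xml_text):
    """Return the xn:resourceId of a record, raising ValueError if it is missing."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}Resource" % NS["xn"]:
        raise ValueError("root element is not xn:Resource")
    # Properties are child elements of xn:Resource, so look only one level down.
    element = root.find("xn:resourceId", NS)
    if element is None or not (element.text or "").strip():
        raise ValueError("record has no xn:resourceId")
    return element.text.strip()


record = (
    '<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#">'
    "<xn:resourceId>082098709870987</xn:resourceId></xn:Resource>"
)
print(resource_id(record))  # 082098709870987
```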
You may include any number of other properties within the xn:Resource element, in any order. If you use properties from outside the core XMLNews-Meta vocabulary, you must declare any Namespaces that they use as well as the XMLNews-Meta Namespace:
<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#" xmlns:spt="http://www.sportsonline.com/ns#"> <xn:resourceId>082098709870987</xn:resourceId> <xn:title>Jays beat Yankees</xn:title> <xn:classification>sports</xn:classification> <spt:score>Jays 8, Yankees 4</spt:score> </xn:Resource>
For more information on Namespaces, see Namespaces in XML.
Within an XMLNews-Meta record, you can use the following core properties to describe header (or envelope) information about a resource, such as its creator, dateline, and priority:
xn:title (the title of the resource, such as “Three die in Fire”; you can use this both for a headline and for subheadlines)
xn:creator (the creator of the resource, such as the reporter who wrote a news story or conducted an interview)
xn:dateline (the location and date where the resource originated, such as “Wichita, Kansas. May 6, 1999”)
xn:classification (the news category of the resource, such as “sports”)
xn:description (a detailed statement about the resource, such as a story abstract or a photo cutline)
xn:language (the ISO 639 code for the primary language used in the resource, such as “fr” for French)
xn:priority (a number indicating the importance of a resource)
xn:fixtureName (a human-readable name for a news fixture, such as “Weekend Wrapup” or “Closing Stocks”)
xn:fixtureCode (a machine-readable code for identifying a news fixture)
All of these properties are optional in an XMLNews-Meta record: you use them only when you have the information available.
For example, consider an imaginary publication Biz News that includes a daily fixture on the European markets. On February 28, 1999, the story is about a surge in trading on the London Stock Exchange, filed by John Smith. You can use the following XMLNews-Meta properties to capture the header information for this story (in an actual XMLNews-Meta record, all of these would appear within the xn:Resource element):
<xn:title>LSE Soars</xn:title> <xn:creator>John Smith</xn:creator> <xn:dateline>London, England, February 28, 1999</xn:dateline> <xn:language>en</xn:language> <xn:description>Heavy trading late in the day leaves London Stock Exchange up 500 points.</xn:description> <xn:classification>financial</xn:classification> <xn:fixtureName>European Markets</xn:fixtureName>
In this example, the xn:title element contains a copy of the story's headline, the xn:creator element contains a copy of the story's byline, and the xn:dateline element contains a copy of the story's dateline. The abbreviation “en” in xn:language is the ISO 639 code for English.
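Because every header property is optional and some (such as xn:title for subheadlines) may repeat, a reader of these records has to cope with properties that are missing or repeated. The following minimal Python sketch collects the core properties of a record into lists. The function name core_properties and the resourceId value are hypothetical. The example above does not give an identifier.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XN = "http://www.xmlnews.org/namespaces/meta#"

# The header example above, wrapped in xn:Resource; the resourceId value
# is hypothetical, since the example does not give one.
RECORD = f"""<xn:Resource xmlns:xn="{XN}">
  <xn:resourceId>hypothetical-lse-0001</xn:resourceId>
  <xn:title>LSE Soars</xn:title>
  <xn:creator>John Smith</xn:creator>
  <xn:dateline>London, England, February 28, 1999</xn:dateline>
  <xn:language>en</xn:language>
  <xn:classification>financial</xn:classification>
  <xn:fixtureName>European Markets</xn:fixtureName>
</xn:Resource>"""


def core_properties(xml_text):
    """Map each core property name to the list of its values, in document order."""
    prefix = "{" + XN + "}"
    props = defaultdict(list)
    for child in ET.fromstring(xml_text):
        if child.tag.startswith(prefix):
            props[child.tag[len(prefix):]].append((child.text or "").strip())
    return dict(props)


header = core_properties(RECORD)
print(header["title"])          # ['LSE Soars']
print(header.get("priority"))   # None, because optional properties may be absent
```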
Although the properties align well with the main header information (headline, byline, dateline) for a traditional printed news story, you can also use them to describe a non-textual resource like a photograph. The following example contains properties describing a photograph taken by Rachel Asa in Lisbon on June 28, 1999, and distributed by the (imaginary) ACME News Corporation:
<xn:title>Fishing boats</xn:title> <xn:creator>Rachel Asa</xn:creator> <xn:dateline>Lisbon, Portugal</xn:dateline> <xn:classification>http://www.acme.com/classifications/science</xn:classification> <xn:description>Fishermen fold their nets near Lisbon while the EU discusses fishing policy.</xn:description>
This time, the xn:creator element identifies the photographer (not the writer, as with a story); if you were describing a video clip, you might use xn:creator to identify the producer of the clip or the reporter who appears in it. If the photo is a file photo, you can use the xn:creator element as follows:
<xn:creator>Acme File Photo</xn:creator>
With a photograph, the description element might actually contain a copy of the photo's cutline (although it does not have to).
If this photograph is Acme's Photo of the Month, you might also want to add the following elements to identify it:
<xn:fixtureCode>http://www.acme.com/fixtures/photomonth</xn:fixtureCode> <xn:fixtureName>Photo of the Month</xn:fixtureName>
NOTE: as in this example, URLs make excellent unique, machine-readable codes, since they provide a natural scoping mechanism (codes from different providers are unlikely to overlap).
There are several times that mark the major milestones in the life of a news resource: the time the story is published, the time it may be released (if not immediately), the time it is received by a customer, and the time that the story expires (if any). XMLNews-Meta provides optional properties for recording any or all of these times:
xn:publicationTime (date and time when the resource was distributed)
xn:releaseTime (earliest date and time when the resource may be distributed)
xn:receivedTime (date and time when the resource was received on the current system)
xn:expireTime (date and time when the resource may no longer be distributed)
Let's assume that the London Stock Exchange story described in the Header Information section was filed on 28 February 1999 at 4:00 pm London time and was received by a New York news distributor at 11:15 am local time:
<xn:publicationTime>19990228T1600</xn:publicationTime> <xn:receivedTime>19990228T1115-0500</xn:receivedTime>
Notice that, since we used the local time the story was received, it's necessary to specify a five-hour offset from GMT (for New York). It would also have been possible to use GMT throughout:
<xn:publicationTime>19990228T1600</xn:publicationTime> <xn:receivedTime>19990228T1615</xn:receivedTime>
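The two receivedTime values above describe the same moment, and comparing them requires applying the offset. The following minimal Python sketch parses the compact date and time forms used in these examples. Those forms are YYYYMMDDTHHMM, with optional seconds and an optional +HHMM or -HHMM offset. The function name parse_xn_time is hypothetical. The sketch also assumes that a value with no offset is GMT, as in the "GMT throughout" example.

```python
import re
from datetime import datetime, timedelta, timezone

# Basic-format ISO 8601: YYYYMMDDTHHMM, optional SS, optional +HHMM/-HHMM.
_TIME = re.compile(
    r"(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})?(?:([+-])(\d{2})(\d{2}))?"
)


def parse_xn_time(value):
    """Parse a time such as 19990228T1115-0500 into an aware datetime.

    Assumption: a value without an offset is GMT, as in the tutorial's
    "GMT throughout" example.
    """
    m = _TIME.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognized time: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = m.groups()
    tz = timezone.utc
    if sign:
        offset = timedelta(hours=int(off_h), minutes=int(off_m))
        tz = timezone(-offset if sign == "-" else offset)
    return datetime(int(year), int(month), int(day), int(hour), int(minute),
                    int(second or 0), tzinfo=tz)


local = parse_xn_time("19990228T1115-0500")  # New York local time
gmt = parse_xn_time("19990228T1615")         # the same moment in GMT
print(local == gmt)                          # True
print(local.astimezone(timezone.utc).isoformat())  # 1999-02-28T16:15:00+00:00
```

An explicit regular expression is used instead of datetime.strptime because strptime accepts single-digit fields and could silently misread a compact value such as 1115.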
Many news resources, like press releases or election results, cannot be released until a specific time; for these, you can specify a delayed release time. In the following example, a resource is published at 4:00 pm EST on February 28, 1999, but is not allowed to be released until 9:00 am on March 1:
<xn:publicationTime>19990228T1600-0500</xn:publicationTime> <xn:releaseTime>19990301T0900-0500</xn:releaseTime>
Some resources also have an explicit expiry time, either because the information in them will be out of date (as in the case of stock quotes) or because redistribution rights are granted only for a limited period. If the photograph used for the second example in the Header Information section were the photo of the month, it would need to expire before the next photo of the month was released:
<xn:publicationTime>19990501T000000</xn:publicationTime> <xn:expireTime>19990531T235959</xn:expireTime>
The photo was published at midnight on May 1, and will expire just before midnight on May 31, so that another photo of the month can be issued.
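Together, xn:releaseTime and xn:expireTime define a window during which a resource may be distributed. Here is a minimal Python sketch of that check. The names parse_xn_time and may_distribute are hypothetical, and times without an offset are again assumed to be GMT. Following the property definitions above, a resource becomes distributable at its release time and stops being distributable at its expiry time.

```python
import re
from datetime import datetime, timedelta, timezone

_TIME = re.compile(
    r"(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})?(?:([+-])(\d{2})(\d{2}))?"
)


def parse_xn_time(value):
    """Basic-format ISO 8601 time; no offset is assumed to mean GMT."""
    m = _TIME.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unrecognized time: {value!r}")
    y, mo, d, h, mi, s, sign, oh, om = m.groups()
    tz = timezone.utc
    if sign:
        offset = timedelta(hours=int(oh), minutes=int(om))
        tz = timezone(-offset if sign == "-" else offset)
    return datetime(int(y), int(mo), int(d), int(h), int(mi), int(s or 0), tzinfo=tz)


def may_distribute(now, release_time=None, expire_time=None):
    """True if `now` lies in the window set by xn:releaseTime and xn:expireTime.

    Either bound may be None, since both properties are optional.
    """
    if release_time is not None and now < parse_xn_time(release_time):
        return False  # still embargoed
    if expire_time is not None and now >= parse_xn_time(expire_time):
        return False  # expired
    return True


embargo = "19990301T0900-0500"
print(may_distribute(parse_xn_time("19990301T0830-0500"), release_time=embargo))  # False
print(may_distribute(parse_xn_time("19990301T1400"), release_time=embargo))       # True
print(may_distribute(parse_xn_time("19990601T000000"), expire_time="19990531T235959"))  # False
```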
News resources often travel along a complex route, starting with a local provider or bureau and passing through wire services, amalgamators, value-added redistributors, and others before arriving at their final destination. XMLNews-Meta provides several optional properties for keeping track of where a story has come from:
xn:providerCode (a machine-readable identifier for the supplier of the resource)
xn:providerName (a human-readable identifier for the supplier of the resource)
xn:distributorCode (a machine-readable identifier for the distributor of the resource)
xn:distributorName (a human-readable identifier for the distributor of the resource)
xn:serviceCode (a machine-readable identifier for part of a newsfeed)
xn:serviceName (a human-readable identifier for part of a newsfeed)
xn:sourceCode (a machine-readable identifier for the organization that originally produced the resource)
xn:sourceName (a human-readable identifier for the organization that originally produced the resource)
The distinction between the provider, distributor and source properties is subtle, and may be determined by contractual agreements rather than clear definitions. XMLNews-Meta uses the source properties to identify the original creator of the resource (for example, a local paper or television station), the provider properties to identify the primary provider of the information (such as a major wire service), and the distributor properties to identify other members of the distribution chain, if any. The service properties identify a particular service of the provider, such as “technology news”.
Let's return to the London Stock Exchange story used in the Header Information section. The story is distributed by Biz News Incorporated, which received it from Acme News Corporation, which picked it up from the London Financial Times. Biz News distributes the story through its Today in Finance service. You can include all of this information in the XMLNews-Meta properties as follows:
<xn:sourceName>London Financial Times</xn:sourceName> <xn:distributorName>Acme News Corporation</xn:distributorName> <xn:providerName>Biz News Inc.</xn:providerName> <xn:serviceName>Today in Finance</xn:serviceName>
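A distributor that adds these properties programmatically can generate them with any namespace-aware XML library. The following is a minimal Python sketch using xml.etree.ElementTree. The helper add_property and the resourceId value are hypothetical. ET.register_namespace makes the output use the conventional xn: prefix.

```python
import xml.etree.ElementTree as ET

XN = "http://www.xmlnews.org/namespaces/meta#"
ET.register_namespace("xn", XN)  # serialize with the conventional xn: prefix


def add_property(resource, name, value):
    """Append an xn:<name> property with a text value to xn:Resource."""
    element = ET.SubElement(resource, f"{{{XN}}}{name}")
    element.text = value
    return element


resource = ET.Element(f"{{{XN}}}Resource")
add_property(resource, "resourceId", "hypothetical-lse-0001")  # hypothetical ID
for name, value in [
    ("sourceName", "London Financial Times"),
    ("distributorName", "Acme News Corporation"),
    ("providerName", "Biz News Inc."),
    ("serviceName", "Today in Finance"),
]:
    add_property(resource, name, value)

xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```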
These properties work similarly with a photograph or other non-textual resource.
What news vendors sell is usually not a news resource itself but the right to use that resource in certain ways and places for a certain period of time. You can use the properties described in the Milestones section to provide information about the period of time for which a resource is available; there are also two other, more general properties that relate to rights:
xn:copyright (contains a single copyright statement for the resource)
xn:distributionRights (contains a statement of redistribution rights for the resource)
A news resource will often carry more than one copyright statement, especially if it contains contributions from more than one source. Because distribution agreements vary so widely, the xn:distributionRights property simply contains plain prose.
Here is some sample rights information for a news story:
<xn:copyright>Portions copyright (c) 1999 by London Financial Times</xn:copyright> <xn:copyright>Copyright (c) 1999 by Acme News Corporation. All rights reserved.</xn:copyright> <xn:distributionRights>Distribution permitted within Canada, the United States, and Mexico.</xn:distributionRights>
Notice that this example contains two copyright statements, and that each one appears as a separate property.
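Because each copyright statement is a separate property, software that reads the record should collect every occurrence rather than just the first one. Here is a minimal Python sketch using xml.etree.ElementTree. The resourceId value is hypothetical. Since xn:distributionRights is plain prose, the sketch passes it through without interpreting it.

```python
import xml.etree.ElementTree as ET

NS = {"xn": "http://www.xmlnews.org/namespaces/meta#"}

RECORD = """<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#">
  <xn:resourceId>hypothetical-lse-0001</xn:resourceId>
  <xn:copyright>Portions copyright (c) 1999 by London Financial Times</xn:copyright>
  <xn:copyright>Copyright (c) 1999 by Acme News Corporation. All rights reserved.</xn:copyright>
  <xn:distributionRights>Distribution permitted within Canada, the United States, and Mexico.</xn:distributionRights>
</xn:Resource>"""

root = ET.fromstring(RECORD)
# findall keeps every repeated xn:copyright property, in document order.
copyrights = [element.text for element in root.findall("xn:copyright", NS)]
# xn:distributionRights is free prose, so it is passed through unparsed.
rights = root.findtext("xn:distributionRights", namespaces=NS)

for statement in copyrights:
    print(statement)
print(rights)
```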
XMLNews-Meta records can contain detailed information about the subject matter of a news resource, using the following properties:
xn:subjectCode (a unique machine-readable identifier for any subject matter related to the resource)
xn:subjectName (a human-readable name of any subject related to the resource, such as “fly fishing”)
xn:companyCode (a unique machine-readable symbol for a publicly-traded company, such as “NASDAQ:MSFT”)
xn:companyName (the full, human-readable name of a private or public company, such as “Microsoft”)
xn:eventCode (a unique machine-readable identifier for an event)
xn:eventName (a human-readable name for an event, such as “Operation Desert Storm” or “Yom Kippur”)
xn:industryCode (a unique machine-readable identifier for an industry)
xn:industryName (a human-readable name for an industry, such as “semiconductors”)
xn:locationCode (a unique machine-readable identifier for a location)
xn:locationName (a human-readable name for a location, such as “Middle East” or “Bourbon Street”)
xn:personCode (a unique machine-readable identifier for a person)
xn:personName (a human-readable name for a person, such as “Benazir Bhutto”)
xn:url (a URL associated with a resource, such as the location of a Web site described in a news feature, but not the URL of the resource itself)
The subject properties are the most general: all of the others are specializations.
Imagine, for example, a news story containing comments on Microsoft made by British Prime Minister Tony Blair in Seattle during a tour of the U.S. Northwest. The XMLNews-Meta record might contain the following properties:
<xn:personName>Tony Blair</xn:personName> <xn:locationName>Britain</xn:locationName> <xn:locationName>Seattle</xn:locationName> <xn:eventName>visit to U.S. Northwest</xn:eventName> <xn:companyCode>NASDAQ:MSFT</xn:companyCode> <xn:companyName>Microsoft</xn:companyName>
Some providers will have standard codes for well-known people and places to enable more accurate searching and filtering. Ideally, these codes should be fully-qualified URLs to avoid confusion between codes from different distributors or providers:
<xn:personCode>http://www.acmenews.com/codes/people/blair0516</xn:personCode> <xn:personName>Tony Blair</xn:personName> <xn:locationCode>http://www.acmenews.com/codes/regions/europe/uk</xn:locationCode> <xn:locationName>Britain</xn:locationName> <xn:locationCode>http://www.acmenews.com/codes/regions/na/us/wa/seattle</xn:locationCode> <xn:locationName>Seattle</xn:locationName> <xn:eventName>visit to U.S. Northwest</xn:eventName> <xn:companyCode>NASDAQ:MSFT</xn:companyCode> <xn:companyName>Microsoft</xn:companyName>
The same properties can apply to a photograph:
<xn:eventCode>http://www.acme.com/codes/events/1999/06/eu-fishing</xn:eventCode> <xn:eventName>EU Fishing Talks</xn:eventName> <xn:locationCode>http://www.acme.com/codes/regions/europe/pt/lisbon</xn:locationCode> <xn:locationName>Lisbon</xn:locationName> <xn:locationCode>http://www.acme.com/codes/regions/europe</xn:locationCode> <xn:locationName>Europe</xn:locationName>
Notice that with the photograph we have repeated the xn:locationCode and xn:locationName properties to include two different ways to categorize the location.
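Codes such as the ones above are what make searching and filtering accurate: a filter can compare codes exactly instead of guessing from free-text names. The following minimal Python sketch tests whether a record carries a given code. The function has_code and the resourceId value are hypothetical, and the record is abbreviated from the Tony Blair example.

```python
import xml.etree.ElementTree as ET

NS = {"xn": "http://www.xmlnews.org/namespaces/meta#"}


def has_code(xml_text, prop, code):
    """True if the record has an xn:<prop> property whose value equals `code`."""
    root = ET.fromstring(xml_text)
    # Properties may repeat, so check every occurrence.
    return any((element.text or "").strip() == code
               for element in root.findall(f"xn:{prop}", NS))


story = """<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#">
  <xn:resourceId>hypothetical-blair-0001</xn:resourceId>
  <xn:personCode>http://www.acmenews.com/codes/people/blair0516</xn:personCode>
  <xn:personName>Tony Blair</xn:personName>
  <xn:companyCode>NASDAQ:MSFT</xn:companyCode>
  <xn:companyName>Microsoft</xn:companyName>
</xn:Resource>"""

print(has_code(story, "companyCode", "NASDAQ:MSFT"))  # True
print(has_code(story, "companyCode", "NASDAQ:AAPL"))  # False
```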
News resources have various connections with each other: a simple story, for example, can contain a photograph, can be contained in a digest, and can be one in a series of different versions of the same story. It can also be based on a resource in a different format (such as video), and have resources in other formats (such as a radio report) based on it. XMLNews-Meta provides several properties for tracing these kinds of links (not to be confused with the general-purpose hypertext links used in HTML):
xn:nextVersion (the identifier of a resource containing a newer version of this resource)
xn:previousVersion (the identifier of a resource containing an earlier version of this resource)
xn:parent (the identifier of a resource that logically contains this resource, such as an article that contains a photograph)
xn:child (the identifier of a resource that is logically contained by this resource, such as an article contained in a digest)
xn:prototype (the identifier of a resource from which this one was derived, such as a longer news story that has been summarized)
xn:rendition (the identifier of a resource that has been derived from this one, such as a radio news report based on a newspaper article)
For example, the first version of a story about the London Stock market might have no links and only its resource identifier (see the Top Level section):
<xn:resourceId>098709870</xn:resourceId>
An hour later, changes in the market prompt a new version of the story; this time, there is a link to the previous version:
<xn:resourceId>86576586</xn:resourceId> <xn:previousVersion>098709870</xn:previousVersion>
Depending on the system architecture, the record for the first version of the story might also have been updated:
<xn:resourceId>098709870</xn:resourceId> <xn:nextVersion>86576586</xn:nextVersion>
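When records do carry xn:nextVersion links, a system can find the newest version of a story by following the chain. The following minimal Python sketch assumes a hypothetical in-memory index that maps each resource ID to its properties, for example as produced by parsing each record. A set of visited IDs guards against an accidental cycle in the links.

```python
def latest_version(records, resource_id):
    """Follow xn:nextVersion links until a record has no newer version.

    `records` is a hypothetical index mapping resource IDs to
    {property name: [values]} dictionaries.
    """
    seen = set()
    current = resource_id
    while current not in seen:
        seen.add(current)
        newer = records.get(current, {}).get("nextVersion")
        if not newer:
            return current
        current = newer[0]
    raise ValueError(f"cycle in xn:nextVersion links at {current!r}")


records = {
    "098709870": {"nextVersion": ["86576586"]},
    "86576586": {"previousVersion": ["098709870"]},
}
print(latest_version(records, "098709870"))  # 86576586
```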
The newswire might also send down a photograph related to the story, and the photograph's record would have a link to the resource ID of the story in which it appears:
<xn:resourceId>532543245</xn:resourceId> <xn:parent>86576586</xn:parent>
If the photograph were used in more than one story, its XMLNews-Meta record could contain a pointer to each one:
<xn:resourceId>532543245</xn:resourceId> <xn:parent>86576586</xn:parent> <xn:parent>39547547</xn:parent>
Likewise, the records for the stories can contain pointers to the photograph, if desired:
<xn:resourceId>86576586</xn:resourceId> <xn:previousVersion>098709870</xn:previousVersion> <xn:child>532543245</xn:child>
Finally, the record for the audio file of a radio broadcast based on the story can also point back to it:
<xn:resourceId>29576488</xn:resourceId> <xn:prototype>86576586</xn:prototype>
And the story's record, if desired, can point to the radio broadcast:
<xn:resourceId>86576586</xn:resourceId> <xn:rendition>29576488</xn:rendition>
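Each link property has a counterpart that points the other way: parent and child, prototype and rendition, previousVersion and nextVersion. A system that stores links in only one direction can therefore derive the reverse links when needed. The following is a minimal Python sketch of that idea, using the same hypothetical in-memory index of records keyed by resource ID. It is one possible approach, not a method prescribed by XMLNews-Meta.

```python
from collections import defaultdict

# Each link property and the property that points the other way.
INVERSE = {
    "parent": "child", "child": "parent",
    "prototype": "rendition", "rendition": "prototype",
    "previousVersion": "nextVersion", "nextVersion": "previousVersion",
}


def inverse_links(records):
    """Derive the reverse of every link found in `records`.

    `records` maps resource IDs to {property: [target IDs]}; the result maps
    each target ID to the links that point back to the source records.
    """
    derived = defaultdict(lambda: defaultdict(list))
    for source_id, props in records.items():
        for prop, targets in props.items():
            inverse = INVERSE.get(prop)
            if inverse is None:
                continue  # not a link property
            for target in targets:
                if source_id not in derived[target][inverse]:
                    derived[target][inverse].append(source_id)
    return {rid: dict(links) for rid, links in derived.items()}


records = {
    "532543245": {"parent": ["86576586", "39547547"]},  # the photograph
    "29576488": {"prototype": ["86576586"]},             # the radio report
}
print(inverse_links(records)["86576586"])
# {'child': ['532543245'], 'rendition': ['29576488']}
```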
Implementing links can be architecturally complex; XMLNews-Meta provides the properties necessary to represent the links if desired, but does not dictate a single method for maintaining and updating them.
The best way to extend the information available in an XMLNews-Meta record is to use (or invent) properties from another Namespace. For example, if the (fictional) Sports Online provider wanted to add an additional property for game scores, they could create their own Namespace and use the property score within it:
<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#" xmlns:spt="http://www.sportsonline.com/ns#"> <xn:resourceId>082098709870987</xn:resourceId> <xn:title>Jays beat Yankees</xn:title> <xn:classification>sports</xn:classification> <spt:score>Jays 8, Yankees 4</spt:score> </xn:Resource>
If the (fictional) rating agency News Ratings wanted to score articles based on user-supplied criteria, they could create their own Namespace and use the property score within it:
<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#" xmlns:rating="http://www.newsratings.com/xml/namespace#"> <xn:resourceId>082098709870987</xn:resourceId> <xn:title>Jays beat Yankees</xn:title> <xn:classification>sports</xn:classification> <rating:score>7.6</rating:score> </xn:Resource>
Even though the two properties have the same base name, “score”, there is no risk of confusion because they belong to separate Namespaces:
<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#" xmlns:spt="http://www.sportsonline.com/ns#" xmlns:rating="http://www.newsratings.com/xml/namespace#"> <xn:resourceId>082098709870987</xn:resourceId> <xn:title>Jays beat Yankees</xn:title> <xn:classification>sports</xn:classification> <spt:score>Jays 8, Yankees 4</spt:score> <rating:score>7.6</rating:score> </xn:Resource>
Processing software will simply ignore properties that it does not recognize, so providers can invent new properties as required without affecting existing software. Note that the Namespaces are arbitrary URIs: they do not actually need to point to anything that can be retrieved by a browser.
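Ignoring unrecognized properties is straightforward when software keys properties by Namespace URI and local name together. Keyed that way, spt:score and rating:score never collide. Here is a minimal Python sketch. The function names split_name and understood_properties are hypothetical.

```python
import xml.etree.ElementTree as ET

XN = "http://www.xmlnews.org/namespaces/meta#"
SPT = "http://www.sportsonline.com/ns#"
RATING = "http://www.newsratings.com/xml/namespace#"

RECORD = f"""<xn:Resource xmlns:xn="{XN}" xmlns:spt="{SPT}" xmlns:rating="{RATING}">
  <xn:resourceId>082098709870987</xn:resourceId>
  <xn:title>Jays beat Yankees</xn:title>
  <spt:score>Jays 8, Yankees 4</spt:score>
  <rating:score>7.6</rating:score>
</xn:Resource>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if not tag.startswith("{"):
        return "", tag
    uri, _, local = tag[1:].partition("}")
    return uri, local


def understood_properties(xml_text, namespaces):
    """Collect values of properties in the given Namespaces; skip all others."""
    found = {}
    for child in ET.fromstring(xml_text):
        uri, local = split_name(child.tag)
        if uri in namespaces:
            found.setdefault((uri, local), []).append(child.text)
    return found


core_only = understood_properties(RECORD, {XN})  # software that knows only XMLNews-Meta
everything = understood_properties(RECORD, {XN, SPT, RATING})
print(sorted(local for _, local in core_only))   # ['resourceId', 'title']
print(everything[(SPT, "score")], everything[(RATING, "score")])
```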
In some cases, however, there will be a need (contractual or technical) to pass on arbitrary NAME=VALUE pairs exactly as supplied by the provider. For this purpose, a special property, xn:vendorData, is available:
<xn:vendorData>XXX=YYY</xn:vendorData> <xn:vendorData>AAA=BBB</xn:vendorData>
It is usually dangerous to send vendor data outside of a closed system, since there is a risk of confusion (there is no partitioning into Namespaces), but this mechanism does provide an internal work-around for problems with legacy systems.
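Because vendor names are not partitioned into Namespaces, software reading xn:vendorData should preserve every pair as supplied rather than merging them. The following minimal Python sketch does this. The function name vendor_data is hypothetical. It also assumes that a NAME never contains "=" and splits each value at its first "=".

```python
import xml.etree.ElementTree as ET

NS = {"xn": "http://www.xmlnews.org/namespaces/meta#"}


def vendor_data(xml_text):
    """Return xn:vendorData values as (NAME, VALUE) tuples, in document order.

    A list is used instead of a dict so that repeated or colliding names
    survive intact; the text is split at the first "=" and not otherwise
    altered, so it can be passed on as supplied.
    """
    pairs = []
    for element in ET.fromstring(xml_text).findall("xn:vendorData", NS):
        name, sep, value = (element.text or "").partition("=")
        if not sep:
            raise ValueError(f"xn:vendorData without '=': {element.text!r}")
        pairs.append((name, value))
    return pairs


record = """<xn:Resource xmlns:xn="http://www.xmlnews.org/namespaces/meta#">
  <xn:resourceId>082098709870987</xn:resourceId>
  <xn:vendorData>XXX=YYY</xn:vendorData>
  <xn:vendorData>AAA=BBB</xn:vendorData>
</xn:Resource>"""
print(vendor_data(record))  # [('XXX', 'YYY'), ('AAA', 'BBB')]
```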