22 July 2004

Xerox Research Centre Europe (XRCE): Media Backgrounder (Document Structure Research Area)

Xerox Research Centre Europe (XRCE) is structured into four complementary research areas: content analysis; document structure; image processing; and work practice technology.

XRCE's document structure research area is aligned with the growing adoption of Extensible Markup Language (XML) by the IT and internet industries, and with XML's potential as a language of communication between disparate systems.

While the primary benefit of XML is in exchanging data, greater benefits can be gained in content and document management. First, XML is naturally suited to representing the logical structure of documents (e.g. titles, sections, chapters, paragraphs) independently of their visual rendering. More importantly, it can represent the semantics, or meaning, of documents (i.e. varied elements such as authors, dates, organisation or product names, financial data, copyright statements and legal warnings). This opens the way to advanced, semantic-enabled search and data mining, and to smart processes throughout the document lifecycle, including content reuse and repurposing, quality assurance and security. XML is also a natural bridge between databases and content for document validation and updating.
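As a purely illustrative sketch of the distinction drawn above, and not a depiction of any XRCE technology, the short Python example below marks up a small document with both logical elements (title, section, paragraph) and semantic elements (author, date, product), then retrieves content by meaning rather than by appearance. The element names, the sample text and the find_values helper are all assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the tag names are invented for illustration.
# Logical structure: report, title, section, paragraph.
# Semantic elements: author, date, product, copyright.
SAMPLE = """\
<report>
  <title>Quarterly Review</title>
  <author>A. Example</author>
  <date>2004-07-22</date>
  <section>
    <title>Results</title>
    <paragraph>Revenue for <product>Widget X</product> rose.</paragraph>
  </section>
  <copyright>Example Corp 2004</copyright>
</report>
"""


def find_values(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(tag)]


# Semantic search: ask for authors or products, wherever they appear.
authors = find_values(SAMPLE, "author")
products = find_values(SAMPLE, "product")

# Logical structure: list the title of each section, ignoring layout entirely.
section_titles = [
    section.findtext("title") for section in ET.fromstring(SAMPLE).iter("section")
]

print(authors, products, section_titles)
```

Nothing in the mark-up says how the document should look on a page or screen, which is what allows the same content to be searched, reused or repurposed independently of its rendering.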

However, the challenges of creating new documents automatically in XML, and of converting legacy documents to XML, remain. XRCE is developing and combining new methods for Legacy Document Conversion, where the research addresses the three faces of a structured document: layout, logical structure and semantics. The second research theme in this area is XML Schema management, where researchers are addressing ways to link together different XML stores, and to repurpose and reformulate XML documents in order to enable “smart processes”.

The document structure research area generates technologies with practical business applications, both on its own and in combination with the other three XRCE research areas. It combines expertise in machine learning, document mining and clustering, querying and visualisation, and hybrid methods for document acquisition. One technology that XRCE has guided from R&D concept through development to commercialisation is the SmartTagger, for which a separate fact sheet is available (see below).

For more information, please refer to www.xrce.xerox.com or contact...
