SCHEMA Blog (EN)

Corporate blog of SCHEMA GmbH

Automated support for quality assurance using Schematron


Quality assurance of technical documentation often relies on the four-eyes principle. Even so, small errors and imprecisions repeatedly find their way into texts despite careful checking. An automated check against formal rules, such as upper/lower case in headings, would help. This is what the validation language Schematron (an ISO/IEC standard) provides: it lets you analyse and check text content that is available as XML, both automatically and with sensitivity to context.

Unlike other checking mechanisms, Schematron is not based on a grammar or spell check. Instead, it checks XML documents according to specific rules that the user can set themselves.

A Schematron rule can be used to check the content of a document or to review specific XML elements within the document. This makes it easy to find forbidden terminology, for example. It can also flag double spaces, or empty headings and paragraphs.

Schematron can also analyse the structure of the document, for example to verify that elements such as the table of contents appear in the right place, or to check the number of items in a list.

How Schematron works

Schematron is based on XSL and XPath. Both languages are used to navigate the document and then to check conditions. XPath expressions work in a similar way to file paths or URLs, which most people are familiar with. A Schematron rule embeds the conditions against which the content of the XML elements is to be checked.

For example:

<rule context="ul | ol">
  <assert test="count(li) &lt;= 7">The list contains more than seven list items.</assert>
</rule>

This rule checks the number of list items in every list. If the condition in the test attribute evaluates to false, the content of the assert element is displayed.

Syntax

A Schematron schema always follows the same structure and mainly consists of five elements: <schema>, <pattern>, <rule>, <assert> or <report>, and <ns>.

In the rule element, the context attribute specifies one or more XML elements whose context or content should be checked. The condition in the test attribute of the assert or report element is applied to every node that matches this context. The content of the assert or report element is then displayed as the error message.
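To show how the five elements fit together, here is a minimal sketch of a complete schema built around the list rule from above. The XHTML namespace binding, the prefix h and the pattern id are illustrative assumptions; your own documents will probably use different element names and namespaces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch; namespace binding, prefix and pattern id are assumptions -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <!-- ns binds a prefix that the XPath expressions below can use -->
  <ns prefix="h" uri="http://www.w3.org/1999/xhtml"/>
  <!-- pattern groups rules that belong together -->
  <pattern id="list-length">
    <!-- rule selects the nodes to check via its context attribute -->
    <rule context="h:ul | h:ol">
      <!-- assert holds the condition (test) and the message text -->
      <assert test="count(h:li) &lt;= 7">The list contains more than seven list items.</assert>
    </rule>
  </pattern>
</schema>
```

Note that the less-than sign inside the test attribute has to be written as &lt; so that the schema itself remains well-formed XML.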

Error messages

There are essentially two options for triggering a message. The first option is the assert statement. This triggers a message whenever the condition linked with it is not fulfilled (see example). The second option is the report statement. In contrast to the assert statement, the report statement triggers a message if the condition with which it is linked is fulfilled. The two are interchangeable, however: by negating the condition, any rule can be formulated with either statement.
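As a small illustration, the list check can be written either way. The sketch below places both variants in one rule purely for comparison; in a real schema you would keep only one of them, otherwise the message would appear twice.

```xml
<rule context="ul | ol">
  <!-- assert: the message appears when the test evaluates to false -->
  <assert test="count(li) &lt;= 7">The list contains more than seven list items.</assert>
  <!-- report: the message appears when the test evaluates to true,
       so the condition is simply negated -->
  <report test="not(count(li) &lt;= 7)">The list contains more than seven list items.</report>
</rule>
```

The negated test could equally be written as count(li) &gt; 7.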

Error message hierarchy

Error messages can also be placed in a hierarchy. As with warnings, a level can be assigned to the error message that indicates its degree of severity. Error hierarchies are not predefined by Schematron. The choice of warning levels available depends on the respective implementation. For example, the SCHEMA ST4 content management system offers the following error message hierarchy:

  • Information
  • Warning
  • Error
  • Critical error
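Outside of ST4, one common way to attach a severity level in plain Schematron is the optional role attribute on assert and report. The sketch below is only an assumption about how such levels might be written: the values "warning" and "error" are not fixed by the standard, and how a particular tool (including ST4) interprets or assigns levels depends on that tool.

```xml
<rule context="ul | ol">
  <!-- role values are tool-dependent; "warning" and "error" are assumed labels -->
  <assert test="count(li) &lt;= 7" role="warning">The list contains more than seven list items.</assert>
  <assert test="count(li) &gt; 0" role="error">The list contains no list items.</assert>
</rule>
```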

There is also the option to generate error messages dynamically by retrieving values that are then displayed in the error message.
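In standard Schematron, this is typically done with the value-of element, whose select attribute holds an XPath expression evaluated against the current context node. The following sketch extends the list rule so that the message reports the actual number of items; the wording of the message is just an example.

```xml
<rule context="ul | ol">
  <assert test="count(li) &lt;= 7">
    <!-- value-of inserts the result of the XPath expression into the message -->
    The list contains <value-of select="count(li)"/> list items; no more than seven are allowed.
  </assert>
</rule>
```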

Schematron input mask

SCHEMA ST4 has supported Schematron for some time. A dialog in which rules can be formulated makes it very easy to use. You can switch between two modes of rule formulation: expert mode and auto mode. In expert mode, you write the code for the rule largely by yourself. In auto mode, the author receives more support and does not write the code themselves: an input screen asks for the key elements of a rule and then assembles them into a consistent rule.

Rule formulation is simplified in SCHEMA ST4. Only the assert statement, and not the report statement, may be used. Since any rule can be formulated with either statement by using negation, this is no restriction at all; it simply makes things easier for the user. For rule nesting, SCHEMA ST4 uses collective rules, which means that complex rules can also be implemented.

[Figure: SCHEMA ST4 Schematron Report. Self-created Schematron rules are used in ST4 with the Schematron Report.]

The integration of Schematron in SCHEMA ST4 makes it considerably easier for users with little programming knowledge to edit rules, and to formulate and apply their own rules.

Formulating a Schematron rule

To formulate a Schematron rule, it can help to begin by expressing the rule required in words. You can then use XPath and XSL to formulate this into a Schematron rule. You need to bear the following points in mind:

  • Formulate meaningful error messages so that the error and its degree of severity can be clearly identified.
  • Construct collective rules (e.g. group the rules that are specific to a project or customer).
  • Formulate consistent and meaningful names and descriptions for rules.
  • For use in SCHEMA ST4: copy and amend rules that have already been formulated.
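As a sketch of this workflow, take the requirement "A paragraph must not contain the forbidden term 'utilize'" and translate it step by step: the element to check becomes the context, the condition becomes the test, and the requirement itself becomes the message. The element name p and the term are placeholders, not names taken from ST4.

```xml
<!-- In words: "A paragraph must not contain the forbidden term 'utilize'." -->
<!-- The element name p and the term itself are hypothetical placeholders -->
<rule context="p">
  <!-- "." is the text of the current paragraph; contains() is case-sensitive -->
  <assert test="not(contains(., 'utilize'))">The paragraph contains the forbidden term "utilize".</assert>
</rule>
```

Because only assert is used, the condition states what a correct paragraph looks like, and the message describes the violation.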

Examples

Title with no double spaces:

<rule context="node | textmodule">
  <assert test="not(contains(title, '  '))">The title contains double spaces.</assert>
</rule>

Title page in project exists:

<rule context="node[@class='Project']">
  <assert test="node[@class='Title']">There is no title page in the project.</assert>
</rule>

Summary

Schematron can be a great help for quality assurance. With a little practice, you can quickly formulate and analyse new rules. Schematron checks XML documents in a different way from conventional schema languages, so combining several schema languages can be useful. Using Schematron in SCHEMA ST4 can save a lot of time and effort, as entire sets of rules can be executed automatically. Because SCHEMA ST4 provides additional support, rule formulation is quick and easy to learn even with little previous knowledge. For specific conventions and frequently occurring errors, it definitely makes sense to consider creating one or more Schematron rules.

Katharina Kirchner is a former intern and current working student at CARSTENS + PARTNER. She is studying “Technical Writing and Communication” at Munich University of Applied Sciences.
