JiBX: Schema Compatibility


The W3C XML Schema specification is considered by many to be complex and poorly structured, to the point where, several years after its release, compatibility problems are still being found that affect the few relatively "complete" implementations available. Even so, the backing of all the major industry players has largely succeeded in establishing schema as the preferred mechanism for defining XML document grammars. Compatibility with schema definitions is therefore an increasingly important issue for many Java developers working with XML.

JiBX binding definitions are designed primarily for ease of use by Java developers working with XML. Most schema constructs have JiBX equivalents, but some of the equivalents are not exact, and some JiBX binding definition constructs have no schema equivalent at all. This page covers both the similarities and the differences between bindings and schemas; if you need to work with schemas, it's best to keep these differences in mind. JiBX is extensible enough to provide work-arounds for most schema issues, but development will be easier if you avoid the issues in the first place by using compatible XML structures.

If you're starting from existing Java code, the Generator Tools subproject can give you a default binding definition, and can also build a basic schema definition from the combination of Java code and a binding definition (whether generated or constructed by hand). Likewise, if you're starting from a schema, the Xsd2Jibx subproject can generate both Java code and a binding definition to match the schema. Both of these subprojects have limitations on the features they can handle, though, so in complex cases you'll often need to modify the generated artifacts by hand.

Basic schema components

Global elements and named complex types are the most common components at the top level of a schema document. Each global element can be the root element of an instance document matching the schema, and can also be included by reference in other definitions. In JiBX terms, a non-abstract mapping definition that's a direct child of the binding element corresponds precisely to a global element of a schema definition. Each named complex type in a schema is essentially an anonymous element definition which can be referenced within other definitions, with the actual element name set at the point of use (and possibly different for each use). A named complex type roughly corresponds to an abstract mapping definition with no extension mappings, but the abstract mapping is more flexible.

The flexibility of the JiBX abstract mapping allows it to be used for several other schema constructs, including named model and attribute groups. Abstract mappings can even be used for a combined model and attribute group structure which has no equivalent in schema terms. The difference between using an abstract mapping as a complex type and using it as a model or attribute group equivalent just comes down to whether you specify a name on the reference to the abstract mapping. With a name specified you're using the mapping as a complex type-equivalent, where the specified name creates a wrapper element for the structure defined by the mapping. Without a name on the reference you're embedding the mapping structure directly within the enclosing definition.

Extension mappings provide the equivalent of schema substitution groups. In schema terms, a substitution group is a tree structure of element definitions, where the root of the tree sets a basic structure which can be extended or restricted by branches. When a mapping is used as the base of a substitution group it is extended by other mapping definitions, each defining a distinct element name.

Subclass relationships

Schema defines an extension mechanism for complex types which is roughly equivalent to the extends relationship between a subclass and its superclass. This can be modeled in JiBX binding definitions by using abstract mapping definitions for both the base class (or interface) and the subclass. The mapping for the subclass can then invoke the superclass mapping as part of its structure (by using a structure element with a map-as attribute specifying the superclass). It's not necessary for the subclass mapping to extend the base class mapping, and it would generally not be appropriate to do so, except where the definitions are being used as part of a substitution group.

The schema form of extension only allows an extension type to append to the base type definition, not insert new components into the base structure. In terms of JiBX bindings, this means that mapping definitions which are intended to be used as the equivalent of schema extensions should invoke the base class mapping as the first item in the ordered list of child components. Schema has no equivalent to the more flexible structure allowed by JiBX, where extension types can include the base class mapping at any point in the structure (or even replace the base class mapping completely).

Schema also defines a restriction mechanism for complex types which has no equivalent in programming language terms. The closest analogy is probably to a subclass which prohibits the use of some fields of the base class. This relationship is not one which is normally seen as desirable in object-oriented programming terms. If you really want to implement this type of relationship using JiBX you can do so, by defining a trivial subclass of the base class (one which does not add anything to the base class), then mapping only those inherited fields or properties of the base class corresponding to items included in the restricted schema type.
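A minimal sketch of the Java side of this approach follows. The class and field names are hypothetical, and the restriction itself does not appear in the Java code at all: it would come from a binding whose mapping for the subclass lists only the inherited fields kept by the restricted schema type.

```java
// Hypothetical base class corresponding to an unrestricted schema complex type.
class Address {
    String street;
    String city;
    String region;   // present in the base schema type
    String country;  // present in the base schema type
}

// Trivial subclass standing in for a restricted schema type: it adds nothing.
// The restriction lives entirely in the binding, where the mapping for this
// class would include only the inherited street and city fields.
class LocalAddress extends Address {
}
```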

Content models

Schema uses three separate model group variations for representing the order and sequence of child elements within an instance document. The sequence variation is the most common. It's used to represent a set of child elements occurring in a particular order. This is the same as the default form of grouping used in JiBX binding definitions, so no special handling is needed to represent a sequence model group in a binding. The other two model group variations are choice and all, and these do require some special handling.

The choice variation is the second most common content model. A choice element in a schema definition allows one and only one of the nested element definitions to be present at that point in an instance document. JiBX 1.1 binding definitions provide the loose equivalent of a choice model group through the choice="true" attribute of a mapping or structure definition. When unmarshalling, this properly accepts only one of the alternative elements in the group; when marshalling, however, it writes every alternative that has a value present, so the output can contain more than one (see Verification hooks if you need to enforce the schema limits).

Both sequence and choice model groups can contain nested model groups of these same types, in addition to actual element definitions. Each of the child components of these model groups, whether an element or a nested model group, can specify minimum and maximum occurrence counts for that component. These occurrence counts each default to 1, meaning that by default one and only one occurrence of a component can be present in an instance document. This is the same as the default in a JiBX binding definition. The common case of an optional component in the schema (minOccurs="0") is handled using optional="true" in the binding definition, while cases with repeated values (a maxOccurs value greater than 1, or "unbounded") correspond to collection elements in the binding definition. JiBX does not enforce the equivalent of schema limits on the number of times a repeated value occurs, but this is generally a minor issue (again, see Verification hooks if you need to enforce the schema limits).
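As a sketch of enforcing occurrence limits yourself, assume a hypothetical Contact class whose phone numbers are bound as a collection. A method like the one below could be named as the post-set method (see Verification hooks) for the mapping, so the count is checked after unmarshalling. The class, field names, and limits are illustrative only, and the generic List is used just for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class bound to an element whose schema declares
//   <xs:element name="phone" minOccurs="1" maxOccurs="3"/>
// In the binding the phone numbers would be a collection, which JiBX
// fills without counting the entries.
class Contact {
    static final int MIN_PHONES = 1;
    static final int MAX_PHONES = 3;

    List<String> phones = new ArrayList<>();

    // Intended as the post-set method for this mapping: runs after an
    // instance is unmarshalled and throws if the count is out of range.
    void verifyOccurrences() {
        int count = phones == null ? 0 : phones.size();
        if (count < MIN_PHONES || count > MAX_PHONES) {
            throw new IllegalStateException("expected " + MIN_PHONES + ".." + MAX_PHONES
                    + " phone elements, found " + count);
        }
    }
}
```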

The third schema content model variation is the all element. This allows at most one occurrence of each contained element, in any order. The all model group can only contain element definitions, and cannot be used as a child component of the other types of model groups. Because of these restrictions it's probably the least common model group variation. JiBX supports this model directly as a group of element definitions nested within a structure with ordered="false". Note that versions of JiBX prior to 1.1 did not fully support this model, because they required all components of an unordered grouping to be optional; this restriction has been removed in 1.1.

Schema definitions permit wildcard components (the any element) as part of a content model, meaning that any element (potentially restricted by namespace) is allowed at that point in an instance document. This type of arbitrary XML content cannot be handled directly by any data binding framework. However, JiBX supports using document models for portions of documents by way of custom marshallers/unmarshallers supplied as part of jibx-extras.jar. See Document models for details of using this extension to the basic JiBX framework.

One final content model issue relates to the handling of text content. JiBX lets you specify the structure of mixed content (a combination of text and child elements), which schema cannot express; the best you can do in schema terms is to flag the containing element as using mixed content. JiBX also allows you to specify CDATA text values in your output, but this is strictly a syntactic convenience which doesn't actually affect the grammar of the generated XML documents.

Simple types

Schema definitions can make use of a wide variety of types for both attribute values and text content. Starting from a base of forty-some predefined types, you can derive your own even more esoteric and specialized types by either restricting the possible values on a type, forming a list of whitespace-separated instances of a type, or merging multiple types in a union. Derived types may be named (when defined as top-level components of the schema definition) or anonymous.

The JiBX binding definition equivalent to a simple type is a serializer/deserializer method pair. These convert the text representation of a value to and from the representation used by the Java application code. The Java representation can be either an object or a primitive type. Named simple types in a schema generally correspond to format elements in a binding definition, while inline simple types correspond to using serializer="class-and-method" and deserializer="class-and-method" directly on a value element.
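For illustration, here is a minimal serializer/deserializer pair for a hypothetical restriction-derived type: an integer limited to 0 through 100. The class name PercentFormat and the schema fragment in the comment are assumptions. In a real project the class and methods would normally be public so the binding can reach them, for example through serializer="com.example.PercentFormat.serialize" on a value element (package name hypothetical).

```java
// Hypothetical conversion class for a schema simple type such as
//   <xs:simpleType name="percent">
//     <xs:restriction base="xs:int">
//       <xs:minInclusive value="0"/>
//       <xs:maxInclusive value="100"/>
//     </xs:restriction>
//   </xs:simpleType>
final class PercentFormat {
    private PercentFormat() {}

    // Java value -> text written to the XML document.
    static String serialize(int value) {
        checkRange(value);
        return Integer.toString(value);
    }

    // Text read from the XML document -> Java value.
    // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
    // for text that is not an integer at all.
    static int deserialize(String text) {
        int value = Integer.parseInt(text.trim());
        checkRange(value);
        return value;
    }

    // Enforces the minInclusive/maxInclusive facets of the restriction.
    private static void checkRange(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("percent out of range: " + value);
        }
    }
}
```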

The most common schema predefined types have default Java equivalents in JiBX terms (see Value Conversions for details). For other predefined types you can generally define your own conversions in the form of serializer/deserializer methods. The same holds for derived types: in the case of restriction-derived types, you can either use the base type directly and rely on checks in your code to enforce the restriction (such as a value range check in the setter method for a property), or define your own conversion methods to handle the derived type directly. For list types you can again define your own conversion methods (which would generally convert to and from arrays of some base type). For union types you probably need to define a corresponding class which wraps fields of each possible type, along with a custom conversion for this class. See Custom serializers and deserializers for an introduction to working with custom value conversions.
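The sketch below shows conversion methods for a list type and a union type, assuming the list items are xs:int and the union members are xs:int and xs:token. All class names are hypothetical. The union wrapper holds one field per member type, and its deserializer tries the member types in declaration order, which mirrors how schema picks the member type for a union value.

```java
// List type, e.g. <xs:list itemType="xs:int"/>: whitespace-separated ints <-> int[].
final class IntListFormat {
    private IntListFormat() {}

    static String serialize(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(v);
        }
        return sb.toString();
    }

    static int[] deserialize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new int[0];
        String[] tokens = trimmed.split("\\s+");  // any run of whitespace separates items
        int[] values = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Integer.parseInt(tokens[i]);
        }
        return values;
    }
}

// Union type, e.g. <xs:union memberTypes="xs:int xs:token"/>: a wrapper with
// one field per member type, exactly one of which is set.
final class IntOrToken {
    final Integer number;  // set when the text is a valid xs:int
    final String token;    // set otherwise

    private IntOrToken(Integer number, String token) {
        this.number = number;
        this.token = token;
    }

    static String serialize(IntOrToken value) {
        return value.number != null ? value.number.toString() : value.token;
    }

    // Tries xs:int first, then falls back to xs:token (whitespace collapsed).
    static IntOrToken deserialize(String text) {
        String collapsed = text.trim().replaceAll("\\s+", " ");
        try {
            return new IntOrToken(Integer.valueOf(collapsed), null);
        } catch (NumberFormatException e) {
            return new IntOrToken(null, collapsed);
        }
    }
}
```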

The only predefined schema types which cause problems when working with JiBX are QName and IDREFS. Both of these require access to the marshalling/unmarshalling context information in order to properly handle conversions (in the first case for the namespace information, in the second for access to ID definitions). These are not commonly used types in general XML, but are often used for special purposes. With JiBX 1.0 the only way to handle these types is with a custom marshaller/unmarshaller for the containing element. See Custom marshallers and unmarshallers for an introduction to extending JiBX in this manner. JiBX 1.1 includes a QName implementation as part of the standard runtime (the org.jibx.runtime.QName class).
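The dependency on context is easy to see in code: the text "xs:string" alone does not say which namespace the xs prefix refers to. In this sketch a plain Map stands in for the namespace declarations that a real unmarshalling context would provide (an assumption for illustration only), and the result uses the standard javax.xml.namespace.QName class rather than JiBX's org.jibx.runtime.QName.

```java
import java.util.Map;
import javax.xml.namespace.QName;

// Illustrates why a QName value cannot be converted from its text alone: the
// prefix only has meaning relative to the namespace declarations in scope,
// which a plain static deserializer never sees.
final class QNameText {
    private QNameText() {}

    // 'inScope' is a hypothetical stand-in for the prefix -> namespace URI
    // mappings that the unmarshalling context would supply.
    static QName resolve(String text, Map<String, String> inScope) {
        String trimmed = text.trim();
        int colon = trimmed.indexOf(':');
        String prefix = colon < 0 ? "" : trimmed.substring(0, colon);
        String local = colon < 0 ? trimmed : trimmed.substring(colon + 1);
        String uri = inScope.get(prefix);
        if (uri == null) {
            if (!prefix.isEmpty()) {
                throw new IllegalArgumentException("undeclared prefix: " + prefix);
            }
            uri = "";  // unprefixed name with no default namespace declared
        }
        return new QName(uri, local, prefix);
    }
}
```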

Verification hooks

JiBX user extensions can support selective verification of schema constraints when marshalling and unmarshalling. The pre-get attribute specifies a method to be called on an instance of a class before that instance is marshalled. A method of this type can check that all property values of the instance match the schema requirements, throwing an exception if any error is found. The post-set attribute specifies a method to be called on an instance of a class after that instance is unmarshalled. A method of this type can check that the property values unmarshalled from a document match the schema definition, again throwing an exception if any error is found.
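A minimal sketch of such methods for a hypothetical Part class with a pattern-restricted part number follows. The class, field, and method names are assumptions; the methods would be referenced from the binding through the pre-get and post-set attributes. The sketch signals a violation with an unchecked exception. Matcher.matches() is used because schema patterns implicitly match the entire value.

```java
import java.util.regex.Pattern;

// Hypothetical class bound to an element with a pattern-restricted value, e.g.
//   <xs:pattern value="[A-Z]{2}-[0-9]{4}"/>
class Part {
    private static final Pattern PART_NUMBER = Pattern.compile("[A-Z]{2}-[0-9]{4}");

    String partNumber;

    // Intended as the pre-get method: called before this instance is marshalled.
    void verifyBeforeMarshal() {
        check();
    }

    // Intended as the post-set method: called after this instance is unmarshalled.
    void verifyAfterUnmarshal() {
        check();
    }

    // matches() requires the whole value to match, like a schema pattern facet.
    private void check() {
        if (partNumber == null || !PART_NUMBER.matcher(partNumber).matches()) {
            throw new IllegalStateException(
                    "part number does not match schema pattern: " + partNumber);
        }
    }
}
```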

These user extension methods can verify any or all aspects of data with respect to the schema definition, including such things as making sure only one component of a choice model group is present when marshalling, or that restricted values match the expected patterns. There is currently no support for generating methods of these types automatically, but a future replacement for the current Xsd2Jibx code may add this functionality.
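For example, a check that a choice model group has exactly one alternative present could be written as below and used as the pre-get method, so that the constraint is enforced on output as well as on input. The class and its fields are hypothetical.

```java
// Hypothetical class for a schema choice between <email>, <phone>, and <fax>.
// When unmarshalling, choice="true" accepts only one alternative, but nothing
// stops application code from setting several fields before marshalling.
class ContactMethod {
    String email;
    String phone;
    String fax;

    // Intended as the pre-get method for this mapping: enforces that exactly
    // one alternative is present before the instance is written out.
    void verifyChoice() {
        int present = 0;
        if (email != null) present++;
        if (phone != null) present++;
        if (fax != null) present++;
        if (present != 1) {
            throw new IllegalStateException(
                    "choice requires exactly one alternative, found " + present);
        }
    }
}
```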

Unsupported schema features

There are a few aspects of schema usage which are not supported at all in the current JiBX 1.0 code. Probably the most important missing feature is support for xsi:type attributes in instance documents. This attribute effectively embeds schema metadata directly into the instance document, which seems a questionable practice from a document-structure standpoint. However, its use has become so widespread (especially in the context of web services) that support is probably necessary in a general XML framework. JiBX 2.0 will fully support this attribute.

The xsi:nil attribute, also used in instance documents, was not supported by JiBX 1.0. JiBX 1.1 added support for this feature using the nillable attribute in the object attribute group.

Another part of schema which is not supported by JiBX is the use of identity constraints in the form of key, keyref, and unique elements. These schema definition elements use XPath expressions to define database-like references between document components. Unfortunately, XPath navigation does not correspond well with programming language object structures. It's possible that some form of partial support for these components may be added in JiBX 2.0, but they are unlikely ever to be supported fully.

These XPath-based identity constraints are rarely used in practice. JiBX does include extended support for the more common ID-IDREF links between document components, which can often be substituted for the XPath-based alternatives. If you really need XPath-style linkages, you can use a combination of user extension methods and (in the most difficult cases) custom marshallers/unmarshallers to implement them in a flexible way.
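As a rough sketch of the user-extension approach, assume a hypothetical document in which order elements refer to product elements by a key string. A post-set method on the root object can resolve those strings to object references once the whole document has been unmarshalled, rejecting duplicate keys and dangling references, which approximates what key and keyref would check. All names here are illustrative, not part of JiBX.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical <product id="..."> element.
class Product {
    String id;
    Product(String id) { this.id = id; }
}

// Hypothetical <order productRef="..."> element.
class Order {
    String productRef;  // key value as read from the document
    Product product;    // resolved link, filled in by Catalog.resolveLinks()
    Order(String productRef) { this.productRef = productRef; }
}

// Hypothetical root element holding both kinds of components.
class Catalog {
    List<Product> products = new ArrayList<>();
    List<Order> orders = new ArrayList<>();

    // Intended as the post-set method for the root mapping, so that every
    // product and order has been unmarshalled before links are resolved.
    void resolveLinks() {
        Map<String, Product> byId = new HashMap<>();
        for (Product p : products) {
            if (byId.put(p.id, p) != null) {  // like xs:key, ids must be unique
                throw new IllegalStateException("duplicate product id: " + p.id);
            }
        }
        for (Order o : orders) {
            o.product = byId.get(o.productRef);  // like xs:keyref, must match a key
            if (o.product == null) {
                throw new IllegalStateException("no product with id " + o.productRef);
            }
        }
    }
}
```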

Finally, there's no JiBX equivalent to anyAttribute "wildcard" attributes in schema. These allow any attribute (potentially restricted by namespace) to be used with that element in an instance document. The only way to handle arbitrary attributes with JiBX is by using custom code such as a specialized pre-get method or a marshaller/unmarshaller for the element.