

This document explains the motivation for developing the SASAX framework.

  1. Problems with SAX (this page)
  2. Why not something other than SAX

Is SAX "simple" to use?

SAX is not simple to use, is it? Its "simplicity" seems to lie only in its API definition.

Context management

Suppose you want to parse the following XML document:

sample of XML document
  <idList>
   <id>12345</id>
   <id>23456</id>
  </idList>

You will receive the SAX events below:

  1. startElement("idList")
  2. characters("\n ")
  3. startElement("id")
  4. characters("12345")
  5. endElement("id")
  6. characters("\n ")
  7. startElement("id")
  8. characters("23456")
  9. endElement("id")
  10. characters("\n")
  11. endElement("idList")

You then have work to do on the "startElement" event, on the "characters" event, and on the "endElement" event.

So you have to manage (1) the transitions between three parsing modes ("root parsing", "list parsing" and "ID parsing"), and (2) a storage area for the intermediate string of the ID (because there is no guarantee that the content of "id" is delivered in a single notification).
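The two burdens above can be sketched as a plain SAX handler. This is only an illustration, assuming the default non-validating, non-namespace-aware JAXP parser; the class name IdListHandler, the Mode enum and the parseIds helper are hypothetical names, not part of SAX or SASAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of every "id" element.
class IdListHandler extends DefaultHandler {
    // (1) the three parsing modes
    enum Mode { ROOT, LIST, ID }

    private Mode mode = Mode.ROOT;
    // (2) storage area for the text of the "id" currently being read
    private final StringBuilder buffer = new StringBuilder();
    final List<String> ids = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (mode == Mode.ROOT && "idList".equals(qName)) {
            mode = Mode.LIST;
        } else if (mode == Mode.LIST && "id".equals(qName)) {
            mode = Mode.ID;
            buffer.setLength(0); // start a fresh ID
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (mode == Mode.ID) {
            // May be called more than once for a single element, so append.
            buffer.append(ch, start, length);
        }
        // Whitespace between elements arrives in LIST mode and is dropped.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (mode == Mode.ID && "id".equals(qName)) {
            ids.add(buffer.toString());
            mode = Mode.LIST;
        } else if (mode == Mode.LIST && "idList".equals(qName)) {
            mode = Mode.ROOT;
        }
    }

    // Convenience wrapper; checked exceptions are wrapped to keep the sketch short.
    static List<String> parseIds(String xml) {
        try {
            IdListHandler handler = new IdListHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```

Even for this tiny document, roughly half of the handler is bookkeeping for the mode and the buffer rather than anything about IDs.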

If you use a validating parser with a DTD or XML Schema definition, you may receive "ignorableWhitespace" events instead of "characters" events for "\n " (and you can ignore those events), so you can focus only on the "characters" events between the "startElement" and "endElement" events for "id". But you still have to manage (2), the storage area.

You will also notice the need for state transition management when you want to parse more complex XML documents, for example ones with heterogeneous or nested structures.


You have to validate and deserialize values from their string representation into other types (e.g. int, float, double, and so on), because they are given as strings in an XML document.

Deserialization of some types throws a runtime exception (e.g. NumberFormatException for numbers), requires a complex procedure (e.g. dateTime), or needs other utilities (e.g. hexBinary or base64Binary).
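As a rough sketch of what each of these cases looks like in plain Java: the class XmlValues and its method names are hypothetical, and the choice of javax.xml.datatype for dateTime and java.util.Base64 for base64Binary is an assumption about which JDK utilities you would reach for, not something SAX provides.

```java
import java.util.Base64;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical helper class; the names are illustrative only.
class XmlValues {
    // Integer.parseInt throws the unchecked NumberFormatException on bad input,
    // so every call site needs a try/catch or a wrapper like this one.
    static int parseIntOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // dateTime has its own lexical rules; here the JDK's javax.xml.datatype
    // factory does the parsing.
    static XMLGregorianCalendar parseDateTime(String text) {
        try {
            return DatatypeFactory.newInstance().newXMLGregorianCalendar(text.trim());
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }

    // base64Binary: the MIME decoder skips line breaks inside the value.
    static byte[] parseBase64(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    // hexBinary: decoded by hand, two hex digits per byte.
    static byte[] parseHex(String text) {
        String s = text.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

None of this code says anything about what the values mean to the application; it only turns strings into typed values.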

The important work should not be deserializing primitive values from strings, but examining those values (or the relations between them) in business logic.

Poor namespace support

The "startElement" and "endElement" SAX events have three arguments to identify the name of an element: "namespace URI", "local name" and "qualified name".

You receive the "namespace URI" and "local name" on both events only if you enable namespace awareness of the SAXParser (by invoking "setNamespaceAware(true)" on SAXParserFactory). Otherwise you receive only the "qualified name".

It is too complex to write code that works with both namespace-aware and non-aware SAXParsers, isn't it? It is wasteful to pass and receive arguments that may be unnecessary, isn't it?
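The difference between the two settings can be observed with a small probe. This is a sketch assuming the parser bundled with the JDK, which reports empty strings for the unused arguments; the class NameProbe and its probe method are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe that records the three name arguments of the root element.
class NameProbe extends DefaultHandler {
    String uri;
    String localName;
    String qName;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (this.qName == null) { // remember the first (root) element only
            this.uri = uri;
            this.localName = localName;
            this.qName = qName;
        }
    }

    // Parses the document with namespace awareness switched on or off.
    static NameProbe probe(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            NameProbe handler = new NameProbe();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```

For a root element such as <x:idList xmlns:x="urn:example"/>, the namespace-aware run fills in uri and localName, while the other run leaves only qName usable. A handler meant to work in both modes has to check which of the three arguments actually carries information.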

In addition, the SAX framework does not resolve namespace prefixes in element or attribute values, though it does so automatically for element and attribute names.

The SAX framework provides a notification API for namespace prefix mappings, but no way to look a mapping up from the parser, even though the complete mapping must be held inside the SAX parser. So you have to manage the mappings yourself if you want to examine prefixed element or attribute values.
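Managing the mappings yourself might look like the sketch below. It assumes a namespace-aware parser (otherwise "startPrefixMapping" is not reported), a hypothetical unqualified attribute named "type" whose value is a prefixed name, and a "{uri}local" notation chosen here only for illustration; PrefixTrackingHandler and resolveAll are likewise hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that keeps its own prefix table to resolve attribute values.
class PrefixTrackingHandler extends DefaultHandler {
    // One stack per prefix, because inner elements may redeclare a prefix.
    private final Map<String, Deque<String>> prefixes = new HashMap<>();
    final List<String> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        prefixes.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(uri);
    }

    @Override
    public void endPrefixMapping(String prefix) {
        Deque<String> stack = prefixes.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    // Turns "p:local" into "{uri}local" using the mappings currently in scope.
    String resolve(String value) {
        int colon = value.indexOf(':');
        String prefix = colon < 0 ? "" : value.substring(0, colon);
        String local = value.substring(colon + 1);
        Deque<String> stack = prefixes.get(prefix);
        String uri = (stack == null || stack.isEmpty()) ? "" : stack.peek();
        return "{" + uri + "}" + local;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String type = atts.getValue("", "type"); // unqualified attribute "type"
        if (type != null) {
            resolved.add(resolve(type));
        }
    }

    static List<String> resolveAll(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix mapping events
            PrefixTrackingHandler handler = new PrefixTrackingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.resolved;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }
}
```

The table built here duplicates information the parser already keeps internally, which is exactly the redundancy described above.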
