A parser-formatter using signals and slots

The previous section used signals and slots for GUI building. You can certainly use signals and slots to link GUI widgets with each other, and most of your slot implementations will be in subclasses of QWidget, but the mechanism works equally well under other circumstances. A GUI is not necessary.

In this section, I will show how signals and slots make a natural extension to the event-driven nature of XML parsers. As you probably know, XML is a fairly simple mark-up language that can be used to represent hierarchical data. There are basically two ways to look at XML data. One is to convert the data in one fell swoop into some hierarchical representation (for example, dictionaries containing dictionaries). This method is the DOM (Document Object Model) representation. Alternatively, you can parse the data character by character, generating an event every time a certain chunk has been completed; this is the SAX parser model.
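The difference between the two models can be sketched with the parsers that ship with a current Python 3. This is only an illustration under that assumption: the sample document and the TagRecorder class are made up for the occasion, and neither is part of the example developed in this section.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, kept free of whitespace between tags.
DOCUMENT = "<book><title>Signals</title><chapter/><chapter/></book>"

# Tree model: the whole document is converted into nested node objects first,
# and only then do we walk the result.
dom = xml.dom.minidom.parseString(DOCUMENT)
dom_tags = [node.tagName for node in dom.documentElement.childNodes]


class TagRecorder(xml.sax.ContentHandler):
    """Event model: record each tag at the moment the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = TagRecorder()
xml.sax.parseString(DOCUMENT.encode("utf-8"), recorder)

print(dom_tags)
print(recorder.events)
```

With the tree model you ask questions after parsing is finished; with the event model your code is called while parsing is still going on, which is what makes it such a good match for signals.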

Python contains support for both XML handling models in its standard libraries. The currently preferred module is xml.sax, which can make use of the fast expat parser. However, expat is not part of standard Python. There is an older, deprecated module, xmllib, which uses regular expressions for parsing. While deprecated, this module is still the most convenient introduction to XML handling with Python. It's also far more 'Pythonic' in feel than the SAX module, which is based on the way Java does things.

We'll create a special module that will use xmllib to parse an XML document and generate PyQt signals for all elements of that document. It is easy to connect these signals to another object (for instance, a PyQt QListView which can show the XML document in a treeview). But it would be just as easy to create a formatter object that would present the data as HTML. A slightly more complicated task would be to create a formatter object that would apply XSLT transformations to the XML document — that is, it would format the XML using stylesheets. Using signals and slots, you can connect more than one transformation to the same run of the parser. A good example would be a combination of a GUI interface, a validator, and a statistics calculator.
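To give an idea of what such an HTML formatter could look like, here is a minimal sketch. It assumes a current Python 3, where xmllib is no longer available, so xml.sax is swapped in and no PyQt signals are involved; the HtmlOutline class and its html() method are hypothetical names invented for this illustration.

```python
import xml.sax
from html import escape


class HtmlOutline(xml.sax.ContentHandler):
    """Hypothetical formatter: renders element nesting as nested HTML lists."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        # Every opened element starts a new nested list with one item.
        self.parts.append("<ul><li>%s" % escape(name))

    def characters(self, content):
        # Skip whitespace-only text between tags.
        if content.strip():
            self.parts.append(" <em>%s</em>" % escape(content.strip()))

    def endElement(self, name):
        self.parts.append("</li></ul>")

    def html(self):
        return "".join(self.parts)


formatter = HtmlOutline()
xml.sax.parseString(b"<widget><class>QDialog</class></widget>", formatter)
html_text = formatter.html()
print(html_text)
```

The formatter knows nothing about where its events come from, so the same three methods could just as well be connected as slots to the signals of the parser we are about to write.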

The next example is very simple. It is easy to extend, though, with special nodes for comments, a warning message box for errors, and more columns for attributes.

Example 7-9. An XML parser with signals and slots

#
# qtparser.py — a simple parser that, using xmllib,
# generates a signal for every element of a parsed XML document.
#

import sys
import xmllib                                              (1)
from qt import *

TRUE=1                                                     (2)
FALSE=0
        
(1)
We import the deprecated xmllib module. It is deprecated because the xml.sax module, which uses the expat library, is a lot faster. The xmllib module is far easier to use, however, and since it uses regular expressions for its parsing, it is available everywhere, while the expat library must be compiled separately.
(2)
It is often convenient to define constants for the boolean values true and false.
class Parser(xmllib.XMLParser):                            (1)

    def __init__(self, qObject,  *args):                   (2)
        xmllib.XMLParser.__init__(self)
        self.qObject=qObject

    def start(self, document):                             (3)
        xmllib.XMLParser.feed(self, document)
        xmllib.XMLParser.close(self)

        
(1)
This is the Parser class. It inherits the XMLParser class from the xmllib module. The XMLParser class can be used in two ways: by overriding a set of special methods that are called when the parser encounters a certain kind of XML element, or by overriding a variable, self.elements, which refers to a dictionary of tag-to-method mappings. Overriding self.elements is very helpful if you are writing a parser for a certain DTD or XML document type definition, though it is not the way to go for a generic XML structure viewer (such as the one we are making now).

An example for a Designer ui file could contain the following definition:

self.elements={'widget'  : (self.start_widget,
                            self.end_widget)
              ,'class'   : (self.start_class,
                            self.end_class)
              ,'property': (self.start_property,
                            self.end_property)
              ,'name'    : (self.start_name,
                            self.end_name)}
          

The keys to this dictionary are the actual tag strings. The tuple that follows the key consists of the functions that should be called for the opening and the ending tag. If you don't want a function to be called, enter None. Of course, you must implement these functions yourself, in the derived parser class.

(2)
The first argument (after self, of course) to the constructor is a QObject. Multiple inheritance isn't a problem in Python, generally speaking, but you cannot multiply inherit from PyQt classes. Sip gets hopelessly confused if you do so. So we pass a QObject to the constructor of the Parser class. Later, we will have this QObject object emit the necessary signals.
(3)
The start function takes a string as its parameter. This string should contain the entire XML document. It is also possible to rewrite this function to read a file line by line; the default approach makes it difficult to work with really large XML files. Reading a file line by line is a lot easier on your computer's memory. You should call close() after the last bit of text has been passed to the parser.
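Feeding the parser piece by piece could look roughly like the following sketch. It assumes a current Python 3 and therefore uses the incremental interface of xml.sax instead of xmllib; the feed() and close() calls play the same role as in the start function above. The TagCounter class and the parse_by_line function are hypothetical, and a BytesIO object stands in for a real file opened in binary mode.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that just counts opening tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_by_line(lines, handler):
    """Feed the parser one line at a time instead of one big string."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for line in lines:
        parser.feed(line)
    # Tell the parser that the last bit of text has been passed.
    parser.close()


# Stand-in for open("form.ui", "rb"); iterating over it yields lines.
source = io.BytesIO(b"<ui>\n<widget>\n<class>QDialog</class>\n</widget>\n</ui>\n")
counter = TagCounter()
parse_by_line(source, counter)
print(counter.count)
```

Only one line of the document is held by your own code at any time, which is what makes this approach easier on memory for really large files.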
    #
    # Data handling functions                              (1)
    #
    def handle_xml(self, encoding, standalone):            (2)
        self.qObject.emit(PYSIGNAL("sigXML"),
                          (encoding, standalone))
                                                           (3)
    def handle_doctype(self, tag, pubid, syslit, data):
        self.qObject.emit(PYSIGNAL("sigDocType"),
                         (tag, pubid, syslit, data,))

    def handle_data(self, data):
        self.qObject.emit(PYSIGNAL("sigData"),(data,))     (4)

    def handle_charref(self, ref):
        self.qObject.emit(PYSIGNAL("sigCharref"),(ref,))   (5)

    def handle_comment(self, comment):
        self.qObject.emit(PYSIGNAL("sigComment"),(comment,)) (6)

    def handle_cdata(self, data):
        self.qObject.emit(PYSIGNAL("sigCData"),(data,))    (7)

    def handle_proc(self, name, data):
        self.qObject.emit(PYSIGNAL("sigProcessingInstruction"), (8)
                         (name, data))

    def handle_special(self, data):                        (9)
        self.qObject.emit(PYSIGNAL("sigSpecial"), (data,))

    def syntax_error(self, message):                       (10)
        self.qObject.emit(PYSIGNAL("sigError"),(message,))

    def unknown_starttag(self, tag, attributes):           (11)
        self.qObject.emit(PYSIGNAL("sigStartTag"),
                         (tag,attributes))
                                                           (12)
    def unknown_endtag(self, tag):
        self.qObject.emit(PYSIGNAL("sigEndTag"),(tag,))
                                                           (13)
    def unknown_charref(self, ref):
        self.qObject.emit(PYSIGNAL("sigCharRef"),(ref,))

    def unknown_entityref(self, ref):                      (14)
        self.qObject.emit(PYSIGNAL("sigEntityRef"),(ref,))
        
(1)
The xmllib.XMLParser class defines a number of methods that should be overridden if you want special behavior. Even though we will only use the methods that are called when a document is started and when a simple element is opened and closed, I've implemented all possible functions here.
(2)
Every valid XML document should start with a magic text that declares itself to be XML (note that the .ui Designer files don't comply with this requirement). This method is called, and thus the signal is emitted, when the parser encounters this declaration. Normally, it looks like this: <?xml version="1.0" standalone="no"?>, with the minor variation that standalone can also have the value "yes".
(3)
If an XML document has a document type declaration, this method is called. A doctype declaration looks like this:
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
     "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">
            

and points to a DTD — a description of what's allowed in this particular kind of XML document.

(4)
There can be data in between the tags in an XML document, just as with the text in an HTML document. This function is called when the parser encounters such data.
(5)
In XML, you can use special characters that are entered with &#, a number, and closed with a semicolon. Python's xmllib will want to translate this to an ASCII character. You cannot use xmllib to parse documents that contain references to Unicode characters.
(6)
XML has the same kind of comments as HTML. Most parsers simply pass the comments, but if you want to show them (for instance, in a structured view of an XML document) or if you want to preserve the contents of the file exactly, you can connect a slot to the signal emitted by this function.
(7)
CDATA is literal data enclosed between <![CDATA[ and ]]>. A file containing
<![CDATA[surely you will be allowed to
starve to death in one of the royal parks.]]>
            

will present the quote ‘surely you will be allowed to starve to death in one of the royal parks.' to any slot that is connected to sigCData.

(8)
This is called when the XML document contains processing instructions. A processing instruction begins with <?. All special cases, such as the XML declaration itself, are handled by other methods.
(9)
You can declare entities in XML — references to something externally defined. Those start with <!. The contents of the declaration will be passed on in the data argument.
(10)
XML is far less forgiving than HTML (or at least, XML has both a stricter definition and less easy-going parsers), and whenever an error is encountered, such as forgetting to close a tag, this method is called.
(11)
unknown_starttag is the most interesting method in the xmllib.XMLParser class. This is called whenever the xmllib parser encounters a plain tag that is not present in its elements dictionary. That is, it will be called for all elements in our current implementation.
(12)
Likewise, unknown_endtag is called for the corresponding ending tags.
(13)
Whenever the parser encounters an unresolvable numeric character reference, this function is called.
(14)
Unknown entities are forbidden in XML — if you use an entity somewhere in your document (which you can do by placing the name of the entity between an ampersand and a semicolon), then it must be declared. However, you might want to catch occurrences of unknown entities and do something special. That's why the function unknown_entityref is implemented here. By default unknown_entityref calls the syntax_error() function of xmllib.XMLParser.
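If you want to see these kinds of events being reported without installing anything, the following sketch may help. It assumes a current Python 3 and uses the expat binding from the standard library rather than xmllib, so the handler names differ: the rough correspondence to the handle_ methods above (XmlDeclHandler for handle_xml, CommentHandler for handle_comment, and so on) is my own approximation. The sample document is made up.

```python
import xml.parsers.expat

# Hypothetical document with a declaration, a comment, a processing
# instruction, a numeric character reference, and a CDATA section.
DOCUMENT = (
    '<?xml version="1.0" standalone="yes"?>'
    "<!-- a comment -->"
    "<?target some instruction?>"
    "<quote>&#65;<![CDATA[<royal parks>]]></quote>"
)

events = []
parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = (
    lambda version, encoding, standalone: events.append(("xml", standalone)))
parser.CommentHandler = lambda data: events.append(("comment", data))
parser.ProcessingInstructionHandler = (
    lambda target, data: events.append(("pi", target)))
parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
parser.EndElementHandler = lambda name: events.append(("end", name))
parser.StartCdataSectionHandler = lambda: events.append(("cdata-start",))
parser.EndCdataSectionHandler = lambda: events.append(("cdata-end",))
parser.CharacterDataHandler = lambda data: events.append(("data", data))

# The second argument marks this as the final chunk of the document.
parser.Parse(DOCUMENT, True)

for event in events:
    print(event)
```

Every entry in the events list corresponds to a moment at which a class like our Parser could emit a signal.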

The TreeView class will show the contents of the XML file.

class TreeView(QListView):                                 (1)

    def __init__(self, *args):
        apply(QListView.__init__,(self, ) + args)
        self.stack=[]                                      (2)
        self.setRootIsDecorated(TRUE)                      (3)
        self.addColumn("Element")                          (4)

    def startDocument(self, tag, pubid, syslit, data):     (5)
        i=QListViewItem(self)
        if tag == None: tag = "None"
        i.setText(0, tag)
        self.stack.append(i)

    def startElement(self, tag, attributes):               (6)
        if tag == None: tag = "None"
        i=QListViewItem(self.stack[-1])
        i.setText(0, tag)
        self.stack.append(i)

    def endElement(self, tag):                             (7)
        del(self.stack[-1])
      
(1)
The TreeView class is a simple subclass of PyQt's versatile QListView class.
(2)
Because XML is a hierarchical file format, elements are neatly nested in each other. In order to be able to create the right treeview, we should keep a stack of the current element depth. The last element of the stack will be the parent element of all new elements.
(3)
This option sets the beginning of the tree at the first element, making it clear to the user that it's an expandable tree instead of a simple list.
(4)
We present only one column in the listview — if you want to show the attributes of elements, too, you might add a few more columns.
(5)
The startDocument function is called when the XML document is opened. It also starts the call stack by creating the first element. The first QListViewItem object has the listview as a parent; all others will have a QListViewItem object as parent. The constructor of QListViewItem is so overloaded that sip tends to get confused, so I create the item and set its text separately.
(6)
Whenever an element is opened, a QListViewItem item is created and pushed on the stack, where it becomes the parent for newly opened elements.
(7)
Conversely, when the element is closed, it is popped from the stack.
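The stack technique does not depend on a GUI at all. The following sketch assumes a current Python 3 and uses xml.sax in place of xmllib; the Node class is a hypothetical stand-in for QListViewItem, and TreeBuilder and outline are likewise names made up for this illustration.

```python
import xml.sax


class Node:
    """Hypothetical stand-in for QListViewItem: a label plus child nodes."""

    def __init__(self, label, parent=None):
        self.label = label
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a tree of Node objects, using a stack to track the depth."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def startElement(self, name, attrs):
        # The last item on the stack is the parent of the new element.
        self.stack.append(Node(name, self.stack[-1]))

    def endElement(self, name):
        # Closing an element makes its parent the current element again.
        self.stack.pop()


def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
xml.sax.parseString(b"<ui><widget><class/><property/></widget></ui>", builder)
tree_lines = outline(builder.root)
print("\n".join(tree_lines))
```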
def main(args):

    if (len(args) == 2):
        app = QApplication(sys.argv)

        QObject.connect(app, SIGNAL('lastWindowClosed()'),
                        app, SLOT('quit()'))
        w = TreeView()
        app.setMainWidget(w)

        o=QObject()                                        (1)
        p=Parser(o)                                        (2)
        QObject.connect(o, PYSIGNAL("sigDocType"),         (3)
                           w.startDocument)
        QObject.connect(o, PYSIGNAL("sigStartTag"),
                           w.startElement)
        QObject.connect(o, PYSIGNAL("sigEndTag"),
                           w.endElement)

        s=open(args[1]).read()                             (4)
        p.start(s)

        w.show()
        app.exec_loop()
    else:
        print "Usage: python qtparser.py FILE.xml"

if __name__=="__main__":
    main(sys.argv)
        
(1)
Here we create a QObject which is used to emit all necessary signals, since we cannot inherit from more than one PyQt class at the same time. Note that by using this technique, you don't have to subclass from QObject in order to be able to emit signals. Sometimes delegation works just as well.
(2)
A parser object is created, with the QObject object as its argument.
(3)
Before feeding the parser the text, all connections we want are made from the QObject object (which we passed to the parser to make sure it can emit signals) to the TreeView object that forms the main window.
(4)
The file whose name was given on the command line is read and passed on to the parser. I have included a very small test file, test.xml, but you can use any Designer UI design file.

This is a very simple and convenient way of working with XML files and PyQt GUIs, but it's generally useful, too. The standard way of working with XML files and parsers allows for only one function to be called for each tag. Using signals and slots, you can have as many slots connected to each signal as you want. For instance, you can have not only a GUI, but also an analyzer that produces statistics listening in on the same parsing run.
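The idea of several listeners on one parsing run can be tried out without PyQt. In the sketch below, which assumes a current Python 3, the Signal class is a hypothetical pure-Python stand-in for a PyQt signal and xml.sax stands in for xmllib; it is not how PyQt implements signals, only an imitation of the connect and emit pattern.

```python
import xml.sax
from collections import Counter


class Signal:
    """Hypothetical pure-Python stand-in for a PyQt signal."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        # Every connected slot receives the same arguments, in order.
        for slot in self._slots:
            slot(*args)


class EmittingHandler(xml.sax.ContentHandler):
    """Translates parser callbacks into signals, like our Parser class."""

    def __init__(self):
        super().__init__()
        self.start_tag = Signal()
        self.end_tag = Signal()

    def startElement(self, name, attrs):
        self.start_tag.emit(name, dict(attrs.items()))

    def endElement(self, name):
        self.end_tag.emit(name)


tag_counts = Counter()   # the statistics listener
open_tags = []           # a second listener, tracking the current nesting

handler = EmittingHandler()
handler.start_tag.connect(lambda name, attrs: tag_counts.update([name]))
handler.start_tag.connect(lambda name, attrs: open_tags.append(name))
handler.end_tag.connect(lambda name: open_tags.pop())

xml.sax.parseString(b"<ui><widget/><widget/><layout/></ui>", handler)
print(tag_counts)
```

Neither listener knows about the other, and adding a third one is a matter of one more connect call.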

The result of parsing a Designer .ui file.

On a final note, there is one bug in this code... See if you can find it, or consult the Section called QListView and QListViewItem in Chapter 10 for an explanation.