Using PLY for Parsing Without Using it for Lexing

Mon, Nov 23, 2009

Over the past week or so I’ve been struggling with attempting to write my own parser (or parser generator) by hand. A few days ago I finally decided to give up on this notion (after all the parser isn’t my end goal) as it was draining my time from the interesting work to be done. However, I wanted to keep my existing lexer. I wrote the lexer by hand in the method I described in a previous post, it’s fast, easy to read, and I rather like my handiwork, so I wanted to keep it if possible. I’ve used PLY before (as I described last year) so I set out to see if it would be possible to use it for parsing without using it for lexing.

As it turns out PLY expects only a very minimal interface from it’s lexer. In fact it only needs one method, token(), which returns a new token (or None at the end). Tokens are expected to have just 4 attributes. Having this knowledge I now set out to write a pair of compatibility classes for my existing lexer and token classes, I wanted to do this without altering the lexer/token API so that if and when I finally write my own parser I don’t have to remove legacy compatibility stuff. My compatibility classes are very small, just this:

class PLYCompatLexer(object):
    def __init__(self, text):
        self.text = text
        self.token_stream = Lexer(text).parse()

    def token(self):
        try:
            return PLYCompatToken(self.token_stream.next())
        except StopIteration:
            return None


class PLYCompatToken(object):
    def __init__(self, token):
        self.type = token.name
        self.value = token.value
        self.lineno = None
        self.lexpos = None

    def __repr__(self):
        return "<Token: %r %r>" % (self.type, self.value)

This is the entirety of the API that PLY needs. Now I can write my parser exactly as I would normally with PLY.