Using PLY for Parsing Without Using It for Lexing
Over the past week or so I’ve been struggling to write my own parser (or parser generator) by hand. A few days ago I finally gave up on the idea (after all, the parser isn’t my end goal), as it was draining time from the more interesting work to be done. However, I wanted to keep my existing lexer. I wrote it by hand using the method I described in a previous post; it’s fast, easy to read, and I rather like my handiwork. I’ve used PLY before (as I described last year), so I set out to see whether I could use it for parsing without using it for lexing.
As it turns out, PLY expects only a very minimal interface from its lexer. In fact, it only needs one method, token(), which returns the next token (or None at the end of input). Tokens are expected to have just 4 attributes: type, value, lineno, and lexpos. With this knowledge I set out to write a pair of compatibility classes for my existing lexer and token classes. I wanted to do this without altering the lexer/token API, so that if and when I finally write my own parser I don’t have to remove legacy compatibility stuff. My compatibility classes are very small, just this:
class PLYCompatLexer(object):
    def __init__(self, text):
        self.text = text
        self.token_stream = Lexer(text).parse()

    def token(self):
        # PLY calls token() repeatedly and treats None as end of input.
        try:
            # The next() builtin works on both Python 2.6+ and Python 3.
            return PLYCompatToken(next(self.token_stream))
        except StopIteration:
            return None


class PLYCompatToken(object):
    def __init__(self, token):
        # The four attributes PLY expects on every token.
        self.type = token.name
        self.value = token.value
        self.lineno = None
        self.lexpos = None

    def __repr__(self):
        return "<Token: %r %r>" % (self.type, self.value)
This is the entirety of the API that PLY needs. Now I can write my parser exactly as I would normally with PLY.
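To see the token() protocol in action without PLY installed, here is a minimal, self-contained sketch. The Lexer and Token classes in it are hypothetical toy stand-ins for the hand-written lexer (which isn't shown in this post), and drain() is an illustrative helper that mimics how a parser pulls tokens; none of these names come from PLY.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for the hand-written lexer's token type.
Token = namedtuple("Token", ["name", "value"])


class Lexer(object):
    """Toy stand-in lexer (an assumption): integers and '+' only."""

    TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\+))")

    def __init__(self, text):
        self.text = text

    def parse(self):
        # Yield tokens lazily, as the real lexer is assumed to do.
        text = self.text.rstrip()
        pos = 0
        while pos < len(text):
            match = self.TOKEN_RE.match(text, pos)
            if match is None:
                raise ValueError("unexpected character at %d" % pos)
            if match.group(1) is not None:
                yield Token("NUMBER", int(match.group(1)))
            else:
                yield Token("PLUS", "+")
            pos = match.end()


class PLYCompatToken(object):
    def __init__(self, token):
        self.type = token.name
        self.value = token.value
        self.lineno = None
        self.lexpos = None


class PLYCompatLexer(object):
    def __init__(self, text):
        self.token_stream = Lexer(text).parse()

    def token(self):
        try:
            return PLYCompatToken(next(self.token_stream))
        except StopIteration:
            return None


def drain(lexer):
    """Pull tokens the way a parser would: call token() until it returns None."""
    tokens = []
    while True:
        tok = lexer.token()
        if tok is None:
            return tokens
        tokens.append((tok.type, tok.value))


if __name__ == "__main__":
    print(drain(PLYCompatLexer("1 + 22")))
```

With PLY itself, the equivalent step should be handing the adapter to the parser built by yacc.yacc(), along the lines of parser.parse(lexer=PLYCompatLexer(text)). The grammar module still needs its usual tokens list, and the names in it have to match whatever the lexer puts in each token's type attribute.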