Parsing language with lots of "keywords" #1542
Unanswered
maurymarkowitz
asked this question in
Q&A
Replies: 2 comments 1 reply
-
@maurymarkowitz Yes, Lark's contextual lexer (the default when parser='lalr') supports this natively. See this sample code:

from lark import Lark

grammar = r"""
start: line_statement*
line_statement: NUMBER statement
statement: "FOR" variable "=" expression "TO" expression
expression: NUMBER
variable: CNAME
%import common.NUMBER
%import common.WS_INLINE
%import common.CNAME
%ignore WS_INLINE
"""

# parser='lalr' selects the contextual lexer by default
parser = Lark(grammar, start='start', parser='lalr')

inputs = [
    '10 FOR I = 1 TO 10',
    '10FORI=1TO10',
]
for text in inputs:
    tree = parser.parse(text)
    print(tree.pretty())
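As a rough illustration of why the sample above can handle '10FORI=1TO10': a contextual lexer only tries the terminals the parser can accept at the current point, so after the line number it looks for FOR and never considers CNAME. The sketch below is a toy model in plain Python, not Lark's implementation. The `TERMINALS` table, the `EXPECTED` list (a hard-coded stand-in for the parser state) and the `lex` function are all hypothetical names made up for this example.

```python
import re

# Toy terminal table, loosely mirroring the grammar in the sample above.
TERMINALS = {
    "NUMBER": re.compile(r"\d+"),
    "FOR": re.compile(r"FOR"),
    "TO": re.compile(r"TO"),
    "EQUAL": re.compile(r"="),
    "CNAME": re.compile(r"[A-Za-z_][A-Za-z_0-9]*"),
}
WS = re.compile(r"[ \t]+")

# Hypothetical stand-in for the parser state: the terminal that is
# acceptable at each step of  NUMBER "FOR" CNAME "=" NUMBER "TO" NUMBER.
EXPECTED = ["NUMBER", "FOR", "CNAME", "EQUAL", "NUMBER", "TO", "NUMBER"]


def lex(text, expected=None):
    """Return (terminal, text) pairs.

    With expected=None every terminal is tried at every position and the
    longest match wins. With a list, only the terminal expected at that
    step is tried, which is the "contextual" behaviour being modelled.
    """
    tokens, pos = [], 0
    while True:
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()  # skip ignored whitespace
        if pos >= len(text):
            return tokens
        if expected is None:
            candidates = list(TERMINALS)
        else:
            candidates = expected[len(tokens):len(tokens) + 1]
        best = None
        for name in candidates:
            m = TERMINALS[name].match(text, pos)
            # Strict '>' keeps the earlier terminal on a tie in length.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no acceptable terminal at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()


print(lex("10FORI=1TO10", EXPECTED))  # contextual: FOR is split from I
print(lex("10FORI=1TO10"))            # context-free: CNAME swallows FORI
```

In this toy model the context only helps where the expected terminal is a keyword. A variable followed directly by a keyword, as in 'FORI=ATOB', would still be read as one long name, so that case may need extra handling in the grammar.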
-
I really hope this is an allowable post here.
I came across Lark by accident while going down a Google rabbit hole, and I wonder if it might be able to solve a longstanding problem I've had.
I previously wrote a system using flex/bison that runs old dialects of BASIC. The biggest problem I face is that BASIC does not require whitespace, which makes picking out keywords difficult. Consider "10 FOR I=1 TO 10", which can also be entered as "10FORI=1TO10". So is that "FOR I" or "FORI"? In BASIC, scanning stops as soon as you hit a complete keyword, so the tokenizer emits FOR. Coding this in flex/bison is really annoying: basically you build an array of token strings and loop over it, so now you have two lists to maintain.
I'm wondering whether anyone has come across something similar, and whether Lark offers a solution. I'm sure there's a term for this, but reading the docs didn't turn up anything that caught my eye.
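For readers unfamiliar with the keyword-first scanning described above, here is a minimal sketch in Python of the "array of token strings and loop over it" approach. It is not the flex/bison code from the original system. The `KEYWORDS` list is a small hypothetical sample, and the sketch assumes upper-case input and single-letter variable names purely to keep it short.

```python
import re

# Hypothetical keyword table for illustration; a real dialect has many more.
KEYWORDS = ["FOR", "TO", "NEXT", "PRINT", "GOTO"]
# Try longer keywords first so a shorter one cannot shadow a longer one.
KEYWORDS_BY_LENGTH = sorted(KEYWORDS, key=len, reverse=True)

NUMBER_RE = re.compile(r"\d+")


def tokenize(line):
    """Split a BASIC line into (kind, text) pairs, checking keywords first."""
    tokens = []
    pos = 0
    while pos < len(line):
        ch = line[pos]
        if ch.isspace():
            pos += 1  # whitespace is optional, so simply skip it
            continue
        for kw in KEYWORDS_BY_LENGTH:
            if line.startswith(kw, pos):
                # A complete keyword wins as soon as it is seen.
                tokens.append(("KEYWORD", kw))
                pos += len(kw)
                break
        else:
            m = NUMBER_RE.match(line, pos)
            if m:
                tokens.append(("NUMBER", m.group()))
                pos = m.end()
            elif ch.isalpha():
                # Assumption: variable names are a single letter.
                tokens.append(("VARIABLE", ch))
                pos += 1
            else:
                tokens.append(("SYMBOL", ch))
                pos += 1
    return tokens


print(tokenize("10FORI=1TO10"))
print(tokenize("10 FOR I=1 TO 10"))
```

Both calls produce the same token list, which is the behaviour the post describes. The cost is the one mentioned above: the keyword list lives outside the grammar, so it has to be kept in sync by hand.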