-
This is the intended behavior, but I understand it isn't always convenient. You could try to require a space or comma as part of the token, like this (writing from memory):

    FRUIT1: ("apple" | "banana" | "strawberry" | "raspberry") /(?=[,\s])/
    FRUIT2: ("apple" | "banana" | "strawberry" | "raspberry") /(?!\w)/

Regarding the example you gave, you can rewrite it in a more efficient way:

    SEP: "," | " "
    fruits_sep: fruits SEP
    animals_sep: animals SEP
    start: fruits_sep? (animals_sep? vehicles | animals)
         | fruits

And you can move the optional operator into the rule, as lark supports empty rules, and that will help with the exponential growth of the rules. If none of these work well enough, we can discuss modifying the parser.
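To see how the two lookaheads in FRUIT1 and FRUIT2 differ, here is a minimal sketch using only Python's standard `re` module. It assumes Lark passes terminal patterns to `re` (its default, as far as I know), and it does not run Lark itself. The names `FRUITS`, `fruit_then_sep`, `fruit_word_end` and `matches_at_start` are made up for this illustration.

```python
import re

# Hypothetical stand-in for the alternation used in FRUIT1 / FRUIT2.
FRUITS = r"(?:apple|banana|strawberry|raspberry)"

# FRUIT1 style: the fruit must be followed by a comma or whitespace.
fruit_then_sep = re.compile(FRUITS + r"(?=[,\s])")

# FRUIT2 style: the fruit must not be followed by a word character.
fruit_word_end = re.compile(FRUITS + r"(?!\w)")


def matches_at_start(pattern, text):
    """Return the text matched at position 0, or None if nothing matches."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in ["apple plane", "apple,banana", "appleplane", "apple"]:
        print(
            repr(text),
            matches_at_start(fruit_then_sep, text),
            matches_at_start(fruit_word_end, text),
        )
```

Both patterns reject `appleplane`, and neither consumes the separator, since lookaheads are zero-width. The practical difference is at the end of the input: the FRUIT1-style pattern needs a comma or whitespace after the fruit, so it does not match a bare `apple`, while the FRUIT2-style pattern does.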
-
Hello and thank you for developing Lark!
This feels like a simple problem but I can't find a (clean) solution for it. I'm looking for a way to require some separator between rules, which must be present only if these rules are present. Let's use this basic grammar example.

So the following strings will be parsed successfully: `apple,banana cat,dog plane`, `raspberry boat`, `mouse plane`, etc.

However, it will also parse `appleplane` or `banana mouseplane`, as all spaces are ignored.

I could do something like `start: (fruits " ")? (animals " ")? vehicles?`, but it would require a trailing space if there are only fruits and/or animals (and no vehicles), like `apple mouse`.

What surprised me is that I tried the Python grammar with the online IDE and it seems to ignore spaces the same way, so that `whileTrue:pass` produces the same valid AST as `while True: pass`, even though it raises a SyntaxError in a REPL. (And just to be clear: that is really not a criticism or a complaint, it's just to explain my thought process.)

Of course I could solve this really simple example with something like this.

However, it would grow exponentially with the number of rules in `start` (and it makes the grammar quite difficult to read and more error-prone if there are many rules). So is there any way to specify a separator between EBNF rules that I don't know of, or is it an open question/problem?