Commit 08c9193

Merge pull request #1502 from nchammas/whitespace-example
Add guidance on handling comments in languages with significant indentation
2 parents aa3f6c2 + c3b83f0 commit 08c9193

5 files changed, with 42 additions and 11 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -13,3 +13,4 @@ tags
 /build
 docs/_build
 docs/examples
+docs/sg_execution_times.rst

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
-language = None
+language = 'en'

 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.

docs/how_to_develop.md

Lines changed: 12 additions & 0 deletions
@@ -65,3 +65,15 @@ Another way to run the tests is using setup.py:
 ```bash
 python setup.py test
 ```
+
+## Building the Documentation
+
+To build the documentation:
+
+```sh
+cd docs/
+pip install -r requirements.txt
+make html
+```
+
+To review the result, open the built HTML files under `_build/html/` in your browser.

docs/recipes.md

Lines changed: 11 additions & 0 deletions
@@ -79,6 +79,7 @@ Prints out:

 *Note: We don't have to return a token, because comments are ignored*

+
 ## CollapseAmbiguities

 Parsing ambiguous texts with earley and `ambiguity='explicit'` produces a single tree with `_ambig` nodes to mark where the ambiguity occurred.
@@ -193,3 +194,13 @@ def parse_with_progress(parser: Lark, text: str, start=None):
 ```

 Keep in mind that this implementation relies on the `InteractiveParser` and, therefore, only works with the `LALR(1)` parser, and not `Earley`.
+
+
+## Parsing a Language with Significant Indentation
+
+If your grammar needs to support significant indentation (e.g. Python, YAML), you will need to use
+the `Indenter` class. Take a look at the [indented tree example][indent] as well as the
+[Python grammar][python] for inspiration.
+
+[indent]: examples/indented_tree.html
+[python]: https://github.com/lark-parser/lark/blob/master/lark/grammars/python.lark

examples/indented_tree.py

Lines changed: 17 additions & 10 deletions
@@ -3,28 +3,34 @@
 ===================

 A demonstration of parsing indentation (“whitespace significant” language)
-and the usage of the Indenter class.
+and the usage of the ``Indenter`` class.

 Since indentation is context-sensitive, a postlex stage is introduced to
-manufacture INDENT/DEDENT tokens.
+manufacture ``INDENT``/``DEDENT`` tokens.

-It is crucial for the indenter that the NL_type matches
-the spaces (and tabs) after the newline.
+It is crucial for the indenter that the ``NL_type`` matches the spaces (and
+tabs) after the newline.
+
+If your whitespace-significant grammar supports comments, then ``NL_type``
+must match those comments too. Otherwise, comments that appear in the middle
+of a line will `confuse Lark`_.
+
+.. _`confuse Lark`: https://github.com/lark-parser/lark/issues/863
 """
 from lark import Lark
 from lark.indenter import Indenter

 tree_grammar = r"""
-    ?start: _NL* tree
-
-    tree: NAME _NL [_INDENT tree+ _DEDENT]
-
     %import common.CNAME -> NAME
     %import common.WS_INLINE
-    %declare _INDENT _DEDENT
+    %import common.SH_COMMENT
     %ignore WS_INLINE
+    %ignore SH_COMMENT
+    %declare _INDENT _DEDENT

-    _NL: /(\r?\n[\t ]*)+/
+    ?start: _NL* tree
+    tree: NAME _NL [_INDENT tree+ _DEDENT]
+    _NL: (/\r?\n[\t ]*/ | SH_COMMENT)+
 """

 class TreeIndenter(Indenter):
@@ -39,6 +45,7 @@ class TreeIndenter(Indenter):

 test_tree = """
 a
+    # check this comment out
     b
     c
         d
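
The docstring in this diff says that a postlex stage manufactures `INDENT`/`DEDENT` tokens and that comments must not disturb it. The following is a minimal pure-Python sketch of that general idea, using a stack of indentation widths. It is not Lark's `Indenter` implementation: the function name `indent_tokens`, the `(kind, value)` token shape, and the choice to skip comment-only lines outright are assumptions made for illustration.

```python
def indent_tokens(text, tab_len=8):
    """Yield (kind, value) pairs for an indentation-structured text.

    Hypothetical helper for illustration only. Blank lines and comment-only
    lines (starting with '#') are skipped, so they never touch the stack.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for raw in text.splitlines():
        line = raw.expandtabs(tab_len)
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            yield ("INDENT", width)
        while width < stack[-1]:
            # Shallower: close blocks until the widths line up again.
            stack.pop()
            yield ("DEDENT", width)
        if width != stack[-1]:
            raise ValueError(f"inconsistent dedent to column {width}")
        yield ("NAME", stripped)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", 0)


sample = """
a
    # check this comment out
    b
    c
        d
"""

tokens = list(indent_tokens(sample))
for kind, value in tokens:
    print(kind, value)
```

Because comment-only lines are dropped before their width is measured, a comment at an unusual indentation level cannot open or close a block in this sketch. It does not handle inline comments or bracketed expressions.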
