add more language support

fncokg · fncokg · commit 82c6238ab2da · 2025-03-26T11:55:44.000+08:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,7 +1,24 @@
+# 1.3.0 (2025-03-26)
+Support various custom numbering styles in a clearer and more flexible way:
+- support a variety of auto-extending numbering symbols: Arabic numbers, Roman numbers, Latin/Greek/Cyrillic letters (both upper and lower case), Chinese numbers, etc.
+- support custom numbering symbols for anything: any level of section, figure, table, equation, theorem, etc.
+
+Support appendix numbering.
+
+Metadata `{item_type}-symbols` and fields `{item_type}_sym` are no longer supported.
+
+## Migration Guide
+For users of 1.2.x versions, there are some **removals**:
+
+The following metadata keys were already marked as deprecated in the previous version, and they are now **removed**:
+- metadata keys `section-format-source-i` and `section-format-ref-i` for formatting the i-th level section numbering are now removed. You should use the new `section-src-format-i` and `section-cref-format-i` keys instead.
+
+The following features are now **removed**, and you should use the new API instead:
+- metadata keys `{item_type}-symbols` and formatting fields `{item_type}_sym` are now removed. Now you can simply use `{item_type}-numstyle` to specify the numbering style directly.
+
 # 1.2.5 (2025-03-13)
 Support theorem numbering. Also refer to a [StackExchange question](https://tex.stackexchange.com/questions/738132/simultaneously-cross-referencing-numbered-amsthm-theorems-and-numbered-equations).
 
-
 # 1.2.4 (2025-03-07)
 Support customizable spacing command in the `equation-src-format` field (default now is `"\\quad({num})"`). Also refer to issue #11.
 
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 # pandoc-tex-numbering
-This is an all-in-one pandoc filter for converting your LaTeX files to any format while keeping **numbering, hyperlinks, caption formats and (clever) cross references in (maybe multi-line) equations, sections, figures, tables and theorems**. The formating is highly customizable, easy-to-use, and even more flexible than the LaTeX default.
+This is an all-in-one pandoc filter for converting your LaTeX files to any format while keeping **numbering, hyperlinks, caption formats and (clever) cross references in (maybe multi-line) equations, sections, figures, tables, theorems and appendices**. The formatting is highly customizable, easy-to-use, and even more flexible than the LaTeX default.
 
 # Contents
 - [pandoc-tex-numbering](#pandoc-tex-numbering)
@@ -11,6 +11,7 @@ This is an all-in-one pandoc filter for converting your LaTeX files to any forma
 - [Quick Start](#quick-start)
 - [Customization](#customization)
   - [General](#general)
+  - [Numbering System](#numbering-system)
   - [Formatting System](#formatting-system)
     - [Prefix-based System](#prefix-based-system)
     - [Custom Formatting System (f-string formatting)](#custom-formatting-system-f-string-formatting)
@@ -20,8 +21,7 @@ This is an all-in-one pandoc filter for converting your LaTeX files to any forma
   - [Theorems](#theorems)
   - [List of Figures and Tables](#list-of-figures-and-tables)
   - [Multiple References](#multiple-references)
-- [Details](#details)
-  - [Equations](#equations-1)
+  - [Appendix](#appendix)
   - [List of Figures and Tables](#list-of-figures-and-tables-1)
   - [Data Export](#data-export)
   - [Log](#log)
@@ -41,7 +41,8 @@ This is an all-in-one pandoc filter for converting your LaTeX files to any forma
 - **`cleveref` Package**: `cref` and `Cref` commands are supported. You can customize the prefix of the references.
 - **Subfigures**: `subcaption` package is supported. Subfigures can be numbered with customized symbols and formats.
 - **Theorems**: Theorems are supported with customized formats.
-- **Non-Arabic Numbers**: Chinese numbers "第一章", "第二节" etc. are supported. You can customize the numbering format.
+- **Appendices**: Appendices are supported with customized formats.
+- **Non-Arabic Numbers**: Various non-Arabic numbering styles are supported, such as Latin letters, Chinese numbers, Roman numerals, Greek letters, Cyrillic letters, etc.
 - **Custom List of Figures and Tables**: **Short captions** as well as custom lof/lot titles are supported for figures and tables.
 - **Custom Formatting of Everything**: You can customize the format of the numbering and references with python f-string format based on various fields we provide.
 
@@ -84,6 +85,26 @@ You can set the following variables in the metadata of your LaTeX file to custom
 - `data-export-path`: Where to export the filter data. Default is `None`, which means no data will be exported. If set, the data will be exported to the specified path in the JSON format. This is useful for further usage of the filter data in other scripts or filter-debugging.
 - `auto-labelling`: Whether to automatically add identifiers (labels) to figures and tables without labels. Default is `true`. This has no effect on the output appearance but can be useful for cross-referencing in the future (for example, in the `.docx` output this will ensure that all your figures and tables have a unique auto-generated bookmark).
 
+## Numbering System
+- `{item_type}-numstyle`: The style of the numbering of figures, tables, equations, sections, theorems, subfigures. For example `figure-numstyle` represents the style of the numbering of figures.
+- `{item_type}-numstyle-{i}`: The style of the i-th level of the numbering of sections or appendices. For example, `section-numstyle-1` represents the style of the first level of the numbering of sections.
+
+Possible values are:
+- `arabic`: Arabic numbers (1, 2, 3, ...)
+- `roman`: Lowercase Roman numerals (i, ii, iii, ...)
+- `Roman`: Uppercase Roman numerals (I, II, III, ...)
+- `latin`: Lowercase Latin letters (a, b, c, ...)
+- `Latin`: Uppercase Latin letters (A, B, C, ...)
+- `greek`: Lowercase Greek letters (α, β, γ, ...)
+- `Greek`: Uppercase Greek letters (Α, Β, Γ, ...)
+- `cyrillic`: Lowercase Cyrillic letters (а, б, в, ...)
+- `Cyrillic`: Uppercase Cyrillic letters (А, Б, В, ...)
+- `zh`: Chinese numbers (一, 二, 三, ...)
+
+The default value for most items is `arabic`. The exceptions are:
+- The default value of `subfigure-numstyle` is `latin`.
+- The default value of `appendix-numstyle-1` is `Latin`.
+
 ## Formatting System
 
 We support a very flexible formatting system for the numbering and references. There are two different formatting systems for the numbering and references. You can use them together. The two systems are:
@@ -119,18 +140,28 @@ For sections, every level has its own formatting. You can set the metadata, for
 For equations, the default `src` format (i.e. `equation-src-format`) is `"\\qquad({num})"`. `\qquad` is used to offer a little space between the equation and the number. You can customize it as you like.
 
 #### Metadata Values
-The metadata values are python f-string format strings. Various fields are provided for you to customize the format. For example, if you set the `number-reset-level` to 2, `figure-prefix` to `figure` and `prefix-space` to `True`. Then, the fifth figure under subsection 2.3 will have the following fields:
-- `num`: `2.3.5`
-- `parent_num`: `2.3`
+The metadata values are Python f-string format strings. Various fields are provided for you to customize the format. For example, if you have the following settings:
+- `number-reset-level`: `2`
+- `figure-prefix`: `"figure"`
+- `prefix-space`: `True`
+- `section-numstyle-1`: `"Roman"`
+- `figure-numstyle`: `"latin"`
+
+Then, the fifth figure under subsection 2.3 will have the following fields:
+- `num`: `II.3.e`
+- `parent_num`: `II.3`
+- `this_num`: `e` (note that fields ending in `_num` keep the numbering style settings)
 - `fig_id`: `5`
 - `prefix`: `figure ` (note the space at the end)
 - `Prefix`: `Figure `
 - `h1`: `2`
 - `h2`: `3`（note that `h2` is accessible only when the `number-reset-level` >= 2 and so on）
-- `h1_zh`: `二` (Chinese number support)
-- `h2_zh`: `三`
-
-For the subfigures, a special field `subfig_sym` is provided to represent the symbol of the subfigure. For example, if you set the `subfigure-symbols` metadata to `"αβγδ"`, the second subfigure will have the `subfig_sym` field as `"β"` while the `subfig_id` field as `2`.
+- `h1_zh`: `二`
+- `h1_roman`: `ii`
+- `h1_Roman`: `II`
+- `h1_latin`: `b`
+- `h1_Latin`: `B`
+- ... (any supported languages or symbols, see the [Numbering System](#numbering-system) section)
 
 Here are some examples of the metadata values:
 - set the `fig-src-format` metadata to `"{prefix}{num}"`, the numbering before its caption will be shown as "Figure 2.3.5"
@@ -168,6 +199,15 @@ For more details, see the [List of Figures and Tables](#list-of-figures-and-tabl
 
 NOTE: in case of setting metadata in a yaml file, the spaces at the beginning and the end of the values are by default stripped. Therefore, if you want to keep the spaces in the yaml metadata file, **you should mannually escape those spaces via double slashes.** For example, if you want set `multiple-ref-last-separator` to `" and "` (spaces appear at the beginning and the end), you should set it as `"\\ and\\ "` in the yaml file. See pandoc's [issue #10539](https://github.com/jgm/pandoc/issues/10539) for more further discussions.
 
+## Appendix
+- `appendix-names`: The names of the appendices separated by "/,". If you have this in your tex file:
+  ```latex
+  \appendix
+  \chapter{First Appendix}
+  \chapter{Second Appendix}
+  ```
+  You should set the metadata `appendix-names` to `"First Appendix/,Second Appendix"`. Note that the names should be separated by `"/,"`, not by `","` (so as to avoid conflicts with the commas in the names).
+
 # Details
 
 ## Equations
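Review note on the README changes above: a minimal sketch of how the documented fields could be combined by a format string, and how `appendix-names` splits on `"/,"`. The field values are copied from the README example (fifth figure under subsection 2.3 with `section-numstyle-1` set to `"Roman"` and `figure-numstyle` set to `"latin"`). The `fields` dict, the variable names, and the use of `str.format` are assumptions for illustration, not the filter's actual internals.

```python
# Field values taken from the README example; the dict is illustrative only
# and is not the filter's internal data structure.
fields = {
    "num": "II.3.e",
    "parent_num": "II.3",
    "this_num": "e",
    "fig_id": 5,
    "prefix": "figure ",
    "Prefix": "Figure ",
    "h1": 2,
    "h2": 3,
    "h1_Roman": "II",
}

# Metadata values are format strings; fields that a string does not
# mention are simply ignored by str.format.
src_caption = "{Prefix}{num}".format(**fields)
custom = "{Prefix}{h1_Roman}-{h2}({this_num})".format(**fields)

# Splitting on "/," keeps plain commas inside an appendix name intact.
apx_names = "Proofs, Part One/,Extra Data".split("/,")

print(src_caption)
print(custom)
print(apx_names)
```

The `_num` fields already carry the configured numbering style, while `h1`, `h2` and `fig_id` stay as plain integers, so a custom format can mix the two.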
diff --git a/src/pandoc_tex_numbering/lang_num.py b/src/pandoc_tex_numbering/lang_num.py
@@ -33,7 +33,7 @@ def arabic2chinese(num):
         result = result[1:]
     return result
 
-def arabic2roman(num):
+def arabic2upper_roman(num):
     if num == 0: return "0"
     breaks = [1000,900,500,400,100,90,50,40,10,9,5,4,1]
     numerals = ["M","CM","D","CD","C","XC","L","XL","X","IX","V","IV","I"]
@@ -46,13 +46,16 @@ def arabic2roman(num):
                 continue
     return result
 
-def arabic2upper_latina(num):
-    upper_latina_numerals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-    return _from_seq(upper_latina_numerals,num)
+def arabic2lower_roman(num):
+    return arabic2upper_roman(num).lower()
 
-def arabic2lower_latina(num):
-    lower_latina_numerals = "abcdefghijklmnopqrstuvwxyz"
-    return _from_seq(lower_latina_numerals,num)
+def arabic2upper_latin(num):
+    upper_latin_numerals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+    return _from_seq(upper_latin_numerals,num)
+
+def arabic2lower_latin(num):
+    lower_latin_numerals = "abcdefghijklmnopqrstuvwxyz"
+    return _from_seq(lower_latin_numerals,num)
 
 def arabic2upper_greek(num):
     upper_greek_numerals = "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
@@ -62,11 +65,22 @@ def arabic2lower_greek(num):
     lower_greek_numerals = "αβγδεζηθικλμνξοπρστυφχψω"
     return _from_seq(lower_greek_numerals,num)
 
+def arabic2lower_cyrillic(num):
+    lower_cyrillic_numerals = "абвгдежзийклмнопрстуфхцчшщъыьэюя"
+    return _from_seq(lower_cyrillic_numerals,num)
+
+def arabic2upper_cyrillic(num):
+    upper_cyrillic_numerals = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
+    return _from_seq(upper_cyrillic_numerals,num)
+
 language_functions = {
     "zh": arabic2chinese,
-    "roman": arabic2roman,
-    "letter": arabic2lower_latina,
-    "Letter": arabic2upper_latina,
-    "gletter": arabic2lower_greek,
-    "Gletter": arabic2upper_greek,
+    "Roman": arabic2upper_roman,
+    "roman": arabic2lower_roman,
+    "latin": arabic2lower_latin,
+    "Latin": arabic2upper_latin,
+    "greek": arabic2lower_greek,
+    "Greek": arabic2upper_greek,
+    "cyrillic": arabic2lower_cyrillic,
+    "Cyrillic": arabic2upper_cyrillic
 }
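Review note on the lookup table above: a minimal sketch of how a `numstyle` value could be resolved to a rendered symbol. The Roman conversion reuses the break and numeral tables shown in the diff. `from_alphabet` is a hypothetical stand-in for `_from_seq`, whose body is not part of this diff, so it only handles indices that fit inside the alphabet. `styles` and `render` are illustrative names as well.

```python
def to_upper_roman(num):
    # Same break/numeral tables as arabic2upper_roman in the diff.
    if num == 0:
        return "0"
    breaks = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
    numerals = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
    result = ""
    for value, numeral in zip(breaks, numerals):
        while num >= value:
            result += numeral
            num -= value
    return result


def from_alphabet(alphabet, num):
    # Hypothetical stand-in for _from_seq: 1-based lookup that refuses
    # indices outside the alphabet instead of extending the sequence.
    if not 1 <= num <= len(alphabet):
        raise ValueError(f"{num} is outside the alphabet")
    return alphabet[num - 1]


# Keys must match the documented style names exactly.
styles = {
    "Roman": to_upper_roman,
    "roman": lambda n: to_upper_roman(n).lower(),
    "latin": lambda n: from_alphabet("abcdefghijklmnopqrstuvwxyz", n),
    "greek": lambda n: from_alphabet("αβγδεζηθικλμνξοπρστυφχψω", n),
}


def render(num, num_style="arabic"):
    # "arabic" bypasses the table, as in the num_style branch of nums2fields.
    if num_style == "arabic":
        return str(num)
    if num_style not in styles:
        raise ValueError(f"Invalid num_style: {num_style}")
    return styles[num_style](num)


print(render(2, "Roman"), render(14, "roman"), render(5, "latin"), render(2, "greek"))
```

Because the style name is used directly as a dictionary key, any mismatch between a key in the table and the name documented in the README surfaces only at lookup time.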
diff --git a/src/pandoc_tex_numbering/numbering.py b/src/pandoc_tex_numbering/numbering.py
@@ -12,9 +12,9 @@ def header_fields(header_nums):
         })
     return fields
 
-def nums2fields(nums,item_type,num_style="plain",prefix=None,pref_space=True,parent=None):
+def nums2fields(nums,item_type,num_style="arabic",prefix=None,pref_space=True,parent=None):
     parent_num = parent.ref if not parent is None else ""
-    if num_style == "plain":
+    if num_style == "arabic":
         this_num = str(nums[-1])
     else:
         assert num_style in language_functions, f"Invalid num_style: {num_style}, must be one of {list(language_functions.keys())}"
@@ -48,7 +48,7 @@ def nums2fields(nums,item_type,num_style="plain",prefix=None,pref_space=True,par
     return {**common_fields,**add_fields}
 
 class Formater:
-    def __init__(self,fmt_presets,item_type,num_style="plain",prefix=None,pref_space=True):
+    def __init__(self,fmt_presets,item_type,num_style="arabic",prefix=None,pref_space=True):
         self.fmt_presets = fmt_presets
         self.item_type = item_type
         self.num_style = num_style
diff --git a/src/pandoc_tex_numbering/pandoc_tex_numbering.py b/src/pandoc_tex_numbering/pandoc_tex_numbering.py
@@ -71,7 +71,7 @@ def prepare(doc):
         "lot_title": doc.get_metadata("lot-title", "List of Tables"),
 
         # Appendix Settings
-        "apx_names": doc.get_metadata("appendix-names", "Appendix").split("\,"),
+        "apx_names": doc.get_metadata("appendix-names", "Appendix").split("/,"),
 
         # Miscellaneous
         "data_export_path": doc.get_metadata("data-export-path", None),
@@ -135,7 +135,7 @@ def prepare(doc):
             item_type=item,
             prefix=doc.get_metadata(f"{aka[item]}-prefix", aka[item].capitalize()),
             pref_space=pref_space,
-            num_style=doc.get_metadata(f"{aka[item]}-numstyle", "plain")
+            num_style=doc.get_metadata(f"{aka[item]}-numstyle", "arabic")
         )
     
     for thm_type in doc.settings["theorem_names"]:
@@ -154,7 +154,7 @@ def prepare(doc):
             item_type=item_type,
             prefix=doc.get_metadata(f"theorem-{thm_type}-prefix", thm_type.capitalize()),
             pref_space=pref_space,
-            num_style=doc.get_metadata(f"theorem-{thm_type}-numstyle", "plain")
+            num_style=doc.get_metadata(f"theorem-{thm_type}-numstyle", "arabic")
         )
 
     
@@ -168,7 +168,7 @@ def prepare(doc):
         item_type="subfig",
         prefix=doc.get_metadata("subfigure-prefix", "Figure"),
         pref_space=pref_space,
-        num_style=doc.get_metadata("subfigure-numstyle", "letter")
+        num_style=doc.get_metadata("subfigure-numstyle", "latin")
     )
 
     formaters["sec"] = []
@@ -186,9 +186,9 @@ def prepare(doc):
                 fmt = doc.get_metadata(f"{aka[item]}-{preset}-format-{i}", default)
                 fmt_presets[preset] = fmt
             if item == "apx" and i == 1:
-                default_numstyle = "Letter"
+                default_numstyle = "Latin"
             else:
-                default_numstyle = "plain"
+                default_numstyle = "arabic"
             i_th_formater = Formater(
                 fmt_presets=fmt_presets,
                 item_type=item,
@@ -361,7 +361,6 @@ def add_label_to_caption(num_obj,label:str,elem:Union[Figure,Table]):
 
 
 def find_labels_header(elem,doc):
-    logger.info(f"Finding labels in header: {elem} with level {elem.level}")
     this_level = elem.level
     if this_level == 1:
         header_txt = to_string(elem)