
Commit fb18862

Update image regex to handle new page format (#254)
Fix test that fails due to being unable to parse image pages, and add a new test file for this case.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 701c6e4 commit fb18862


2 files changed: +613 -1 lines changed


wikiteam3/dumpgenerator/dump/image/html_regexs.py

Lines changed: 1 addition & 1 deletion
@@ -29,6 +29,6 @@
 r'(?im)<td class="TablePager_col_img_name">\s*<a href[^>]*?>(?P<filename>[^>]+)</a>[^<]*?<a href="(?P<url>[^>]+)">[^<]*?</a>[^<]*?</td>\s*'
 r'<td class="TablePager_col_thumb">[^\n\r]*?</td>\s*'
 r'<td class="TablePager_col_img_size">[^<]*?</td>\s*'
-r'<td class="(?:TablePager_col_img_user_text|TablePager_col_img_actor)">\s*(<a href="[^>]*?" title="[^>]*?">)?(?P<uploader>[^<]+?)(</a>)?\s*</td>'
+r'<td class="(?:TablePager_col_img_user_text|TablePager_col_img_actor)">\s*(?:<a href="[^>]*?" title="[^>]*?">)?(?:<bdi>)?(?P<uploader>[^<]+?)(?:</bdi>)?(?:</a>)?\s*(?:<span class="mw-usertoollinks">(?:(?!</span>)(?!</td>).)*?</span>)?</td>'
 ),
 ]
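To see what the new uploader pattern changes, the sketch below joins the four string fragments from this hunk into one pattern. It assumes, as the trailing `),` suggests, that they are adjacent literals concatenated in order. It then runs the old and new versions against hand-written ListFiles rows. The sample HTML, the user "Alice", and the names `ROW_PREFIX`, `OLD_RE`, `NEW_RE`, `NEW_STYLE_ROW` and `OLD_STYLE_ROW` are hypothetical. Only the regex fragments come from the commit.

```python
import re

# Name, thumbnail and size cells, copied unchanged from the hunk.
ROW_PREFIX = (
    r'(?im)<td class="TablePager_col_img_name">\s*<a href[^>]*?>(?P<filename>[^>]+)</a>[^<]*?<a href="(?P<url>[^>]+)">[^<]*?</a>[^<]*?</td>\s*'
    r'<td class="TablePager_col_thumb">[^\n\r]*?</td>\s*'
    r'<td class="TablePager_col_img_size">[^<]*?</td>\s*'
)
# Uploader cell before the change: optional <a>, bare name text, then </td>.
OLD_UPLOADER = r'<td class="(?:TablePager_col_img_user_text|TablePager_col_img_actor)">\s*(<a href="[^>]*?" title="[^>]*?">)?(?P<uploader>[^<]+?)(</a>)?\s*</td>'
# Uploader cell after the change: also accepts a <bdi> wrapper around the name
# and an optional trailing <span class="mw-usertoollinks">...</span>.
NEW_UPLOADER = r'<td class="(?:TablePager_col_img_user_text|TablePager_col_img_actor)">\s*(?:<a href="[^>]*?" title="[^>]*?">)?(?:<bdi>)?(?P<uploader>[^<]+?)(?:</bdi>)?(?:</a>)?\s*(?:<span class="mw-usertoollinks">(?:(?!</span>)(?!</td>).)*?</span>)?</td>'

OLD_RE = re.compile(ROW_PREFIX + OLD_UPLOADER)
NEW_RE = re.compile(ROW_PREFIX + NEW_UPLOADER)

# Hypothetical row in the newer markup: <bdi> name plus user tool links.
NEW_STYLE_ROW = (
    '<td class="TablePager_col_img_name">'
    '<a href="/wiki/File:Example.png" title="File:Example.png">Example.png</a>'
    ' (<a href="/images/a/a9/Example.png">file</a>)</td>\n'
    '<td class="TablePager_col_thumb"><a href="/wiki/File:Example.png">'
    '<img src="/thumb/Example.png" /></a></td>\n'
    '<td class="TablePager_col_img_size">12 KB</td>\n'
    '<td class="TablePager_col_img_actor">'
    '<a href="/wiki/User:Alice" title="User:Alice"><bdi>Alice</bdi></a> '
    '<span class="mw-usertoollinks">(<a href="/wiki/User_talk:Alice" '
    'title="User talk:Alice">talk</a> | <a href="/wiki/Special:Contributions/Alice" '
    'title="Special:Contributions/Alice">contribs</a>)</span></td>'
)

# Same row with the older uploader cell: plain link text, no <bdi>, no tool links.
OLD_STYLE_ROW = NEW_STYLE_ROW.split('<td class="TablePager_col_img_actor">')[0] + (
    '<td class="TablePager_col_img_user_text">'
    '<a href="/wiki/User:Alice" title="User:Alice">Alice</a></td>'
)

m = NEW_RE.search(NEW_STYLE_ROW)
print(m.group("filename"), m.group("url"), m.group("uploader"))
# Example.png /images/a/a9/Example.png Alice
print(OLD_RE.search(NEW_STYLE_ROW))
# None: the old pattern expects the name right after <a ...>, but finds <bdi>
print(NEW_RE.search(OLD_STYLE_ROW).group("uploader"))
# Alice: the new pattern still accepts the older markup
```

Only the `i` and `m` inline flags are set, not `s`. As a result, `.` in the tool-links group does not match newlines, and a `mw-usertoollinks` span that wraps across lines would not be consumed by this pattern. The sample keeps the span on one line for that reason.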
