Rotation metadata on scanned PDFs not handled by get_bitmap_rects in pypdfium2 backend #2038

### Bug

While working with scanned PDF files, I noticed that some pages carry rotation metadata (the page's `/Rotate` entry) instead of being physically rotated. In other words, a viewer displays the page rotated, but the underlying bitmap is stored unrotated.
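For context, the PDF specification defines `/Rotate` as a multiple of 90 degrees by which the page is rotated clockwise when displayed. The minimal sketch below shows how that value changes the size a viewer displays. It assumes the raw value has already been read from the page. `normalize_rotation` and `displayed_size` are hypothetical helpers for illustration, not part of docling or pypdfium2.

```python
def normalize_rotation(rotate: int) -> int:
    """Normalize a /Rotate value to one of 0, 90, 180, 270 (clockwise degrees)."""
    if rotate % 90 != 0:
        raise ValueError(f"/Rotate must be a multiple of 90, got {rotate}")
    # Python's % always returns a non-negative result here, so -90 becomes 270.
    return rotate % 360


def displayed_size(width: float, height: float, rotate: int) -> tuple[float, float]:
    """Return the (width, height) a viewer shows for an unrotated page of the given size."""
    rotation = normalize_rotation(rotate)
    # A quarter turn in either direction swaps width and height.
    if rotation in (90, 270):
        return height, width
    return width, height


# A portrait 595 x 842 pt page stored with /Rotate 90 is displayed as landscape.
print(displayed_size(595, 842, 90))   # (842, 595)
print(displayed_size(595, 842, -90))  # (842, 595)
```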

This causes a problem when detecting OCR regions with the pypdfium2 backend: its `get_bitmap_rects` method does not account for the rotation metadata, so the detected coordinates are misaligned with the displayed orientation of the page.
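To illustrate the geometry involved, one way to reconcile the two is to map each rectangle from the unrotated page's coordinate space into the displayed orientation. This is a minimal sketch under stated assumptions: rectangles are `(left, top, right, bottom)` with a top-left origin, in the same units as the unrotated page size, and `rotation` is the clockwise `/Rotate` value. `rotate_point` and `rotate_rect` are hypothetical helpers, not existing docling or pypdfium2 functions, and the sketch does not show where such a fix would belong in docling.

```python
Rect = tuple[float, float, float, float]  # (left, top, right, bottom), top-left origin


def rotate_point(x: float, y: float, width: float, height: float, rotation: int) -> tuple[float, float]:
    """Map a point on the unrotated page to the page as displayed after a clockwise rotation."""
    rotation %= 360
    if rotation == 0:
        return x, y
    if rotation == 90:
        # The left edge of the unrotated page becomes the top edge.
        return height - y, x
    if rotation == 180:
        return width - x, height - y
    if rotation == 270:
        # The left edge of the unrotated page becomes the bottom edge.
        return y, width - x
    raise ValueError(f"rotation must be a multiple of 90, got {rotation}")


def rotate_rect(rect: Rect, width: float, height: float, rotation: int) -> Rect:
    """Map a rectangle in unrotated page coordinates into the displayed orientation.

    `width` and `height` are the size of the unrotated page.
    """
    left, top, right, bottom = rect
    x1, y1 = rotate_point(left, top, width, height, rotation)
    x2, y2 = rotate_point(right, bottom, width, height, rotation)
    # Rotation can change which corner is top-left, so re-normalize with min/max.
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


# A 100 x 50 box near the top-left corner of an unrotated 595 x 842 page.
box = (10, 20, 110, 70)
print(rotate_rect(box, 595, 842, 90))   # (772, 10, 822, 110)
print(rotate_rect(box, 595, 842, 180))  # (485, 772, 585, 822)
```

Mapping two opposite corners and re-normalizing is sufficient because a rotation by a multiple of 90 degrees keeps rectangles axis-aligned. For 90 and 270 degrees, the box's width and height swap along with the page's.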

For example, I have two scanned PDFs: one includes rotation metadata and the other does not. When I enable `settings.debug.visualize_ocr = True` and inspect the debug output for the rotated file, the OCR bounding boxes are significantly misaligned with the page content. Because these pages have no directly extractable text, OCR is the only source of text, so the misalignment causes many result items to be missing content.

Expected:

<img width="595" height="842" alt="Image" src="https://github.com/user-attachments/assets/82733a84-b6db-41f7-ac26-c94c9063b724" />

What I get:

<img width="595" height="842" alt="Image" src="https://github.com/user-attachments/assets/a8ec062b-8e32-4426-964b-14c3f86ae526" />


### Steps to reproduce

...
```python
from docling.backend.pypdfium2_backend import PyPdfiumDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import (
    PdfPipelineOptions,
    RapidOcrOptions,
    TableFormerMode,
    TableStructureOptions,
)
from docling.datamodel.settings import settings
from docling.document_converter import DocumentConverter, PdfFormatOption

# Write debug visualizations (layout, cells, OCR boxes) for each page.
settings.debug.visualize_layout = True
settings.debug.visualize_raw_layout = True
# settings.debug.visualize_tables = True
settings.debug.visualize_cells = True
settings.debug.visualize_ocr = True

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.images_scale = 1.0
pipeline_options.generate_picture_images = True
pipeline_options.generate_table_images = True
pipeline_options.table_structure_options = TableStructureOptions(mode=TableFormerMode.ACCURATE)
# Do not force full-page OCR, so the OCR regions come from the backend's bitmap rectangles.
pipeline_options.ocr_options = RapidOcrOptions(force_full_page_ocr=False)

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,
            backend=PyPdfiumDocumentBackend,
        )
    }
)

pdf_paths = [
    "./pdfs/scan_smpl.pdf",          # no rotation metadata
    "./pdfs/scan_smpl_rotated.pdf",  # with rotation metadata
]

for pdf_path in pdf_paths:
    conv_res = doc_converter.convert(pdf_path)
    # Print every result (not just the last one) so the two files can be compared.
    print(conv_res.document.export_to_markdown())
```


Use the following two PDFs for comparison:

[scan_smpl.pdf](https://github.com/user-attachments/files/21594563/scan_smpl.pdf)
[scan_smpl_rotated.pdf](https://github.com/user-attachments/files/21594562/scan_smpl_rotated.pdf)

### Docling version

v2.43.0

### Python version

3.12
