Commit afdc397

Merge pull request #67 from TUDelftGeodesy/metadata_reading
Metadata reading
2 parents bdcbb91 + 894d614 commit afdc397

File tree

16 files changed: +2350 / -12 lines changed

.github/workflows/sonarcloud.yml

Lines changed: 17 additions & 1 deletion

@@ -12,8 +12,24 @@ jobs:
       - uses: actions/checkout@v4
         with:
           fetch-depth: 0  # Shallow clones should be disabled for a better relevancy of analysis
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install pytest pytest-cov
+          pip install -e .
+
+      - name: Run tests with coverage
+        run: |
+          pytest --cov=sarxarray --cov-report=xml --cov-report=term
+
       - name: SonarQube Scan
         uses: SonarSource/sonarqube-scan-action@v4
         env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}  # Needed to get PR information, if any
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

docs/data_loading.md

Lines changed: 73 additions & 0 deletions

@@ -55,3 +55,76 @@ The loading chunk size can also be specified manually:
 stack_smallchunk = sarxarray.from_binary(list_slcs, shape, chunks=(2000, 2000))
 ```
+
+## Reading metadata
+
+SARXarray provides a function to read metadata from an interferogram stack coregistered by Doris v4 or Doris v5. The metadata is read as a dictionary from the `slave.res` file under the folder of each SLC.
+
+### Doris v4 metadata
+
+A common Doris v4 output folder structure is as follows:
+
+```
+stack/
+├── YYYYMMDD1/
+│   ├── slc_1.res
+│   ├── slc_1.raw
+│   ├── ...
+├── YYYYMMDD2/
+│   ├── slc_2.res
+│   ├── slc_2.raw
+│   ├── ...
+...
+```
+
+Where `YYYYMMDD1`, `YYYYMMDD2`, etc. are the acquisition dates of the SLCs, and `slc_1.res`, `slc_2.res`, etc. are the metadata files of each SLC.
+
+To read the metadata from the Doris v4 stack, first build a list of the SLC metadata files:
+
+```python
+from pathlib import Path
+stack_folder = Path('stack/')
+res_file_list = list(stack_folder.glob('????????/slave.res'))
+```
+
+where the pattern `????????` matches the eight-character date folders. Then, you can use the `read_metadata` function with the `driver` argument set to `"doris4"`:
+
+```python
+import sarxarray
+metadata = sarxarray.read_metadata(res_file_list, driver="doris4")
+```
+
+### Doris v5 metadata
+A common Doris v5 output folder structure is as follows:
+
+```text
+├── YYYYMMDD1/
+│   ├── slc_1.res
+│   ├── ifgs_1.res
+│   ├── slc_1.raw
+│   ├── ...
+├── YYYYMMDD2/
+│   ├── slc_2.res
+│   ├── ifgs_2.res
+│   ├── slc_2.raw
+│   ├── ...
+...
+```
+
+Where `YYYYMMDD1`, `YYYYMMDD2`, etc. are the acquisition dates of the SLCs, and `slc_1.res`, `slc_2.res`, etc. are the metadata files of each SLC. The files `ifgs_1.res`, `ifgs_2.res`, etc. are the metadata files of each interferogram, which contain the sizes of the coregistered interferograms.
+
+To read the metadata from the Doris v5 stack, first build a list of the SLC metadata files:
+
+```python
+from pathlib import Path
+stack_folder = Path('stack/')
+res_file_list = list(stack_folder.glob('????????/slc_*.res'))
+```
+
+Then, you can use the `read_metadata` function with the `driver` argument set to `"doris5"`:
+
+```python
+import sarxarray
+metadata = sarxarray.read_metadata(res_file_list, driver="doris5")
+```
+
+`read_metadata` assumes that the `ifgs_*.res` files are in the same folder as the `slc_*.res` files, and will read the interferogram sizes from them.
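The glob step documented above can be sketched end to end on a synthetic stack. This is a minimal illustration only: the acquisition dates and the temporary folder layout below are placeholders, not real data, and the pattern assumes eight-character `YYYYMMDD` folder names as shown in the tree.

```python
# Sketch: build the metadata file list from a synthetic Doris v4-style stack.
# All folder names below are hypothetical placeholders.
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    stack_folder = Path(tmp) / "stack"
    # Create date-named folders (YYYYMMDD), each holding a slave.res file
    for date in ("20230101", "20230113", "20230125"):
        folder = stack_folder / date
        folder.mkdir(parents=True)
        (folder / "slave.res").touch()

    # Eight '?' wildcards match exactly the eight-character date folders
    res_file_list = sorted(stack_folder.glob("????????/slave.res"))
    date_folders = [p.parent.name for p in res_file_list]
    print(date_folders)  # → ['20230101', '20230113', '20230125']
```

The resulting list can then be passed to `sarxarray.read_metadata` as shown in the snippets above.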

examples/demo_sarxarray.ipynb

Lines changed: 1 addition & 1 deletion

@@ -77,7 +77,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "We will load a interferogram stack, which has been corregistered and saved as binary files. In this example we will demo 3 interferograms with a `(azimuth, range)` coverage of `(9914, 41174)`. We assume the shape and data type is known."
+   "We will load a interferogram stack, which has been coregistered and saved as binary files. In this example we will demo 3 interferograms with a `(azimuth, range)` coverage of `(9914, 41174)`. We assume the shape and data type is known."
  ]
 },
 {

pyproject.toml

Lines changed: 5 additions & 5 deletions

@@ -75,7 +75,7 @@ branch = true
 source = ["sarxarray"]

 [tool.ruff]
-select = [
+lint.select = [
     "E",    # pycodestyle
     "F",    # pyflakes
     "B",    # flake8-bugbear
@@ -85,18 +85,18 @@ select = [
     "UP",   # pyupgrade (upgrade syntax to current syntax)
     "PLE",  # Pylint error https://github.com/charliermarsh/ruff#error-ple
 ]
-ignore = [
+lint.ignore = [
     "D100", "D101", "D104", "D105", "D106", "D107", "D203", "D213", "D413"
 ]  # docstring style

 line-length = 88
 exclude = ["docs", "build", "examples"]
 # Allow unused variables when underscore-prefixed.
-dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
+lint.dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"
 target-version = "py310"

-[tool.ruff.per-file-ignores]
+[tool.ruff.lint.per-file-ignores]
 "tests/**" = ["D"]

-[tool.ruff.pydocstyle]
+[tool.ruff.lint.pydocstyle]
 convention = "numpy"

sarxarray/__init__.py

Lines changed: 9 additions & 2 deletions

@@ -1,5 +1,12 @@
 from sarxarray import stack
-from sarxarray._io import from_binary, from_dataset
+from sarxarray._io import from_binary, from_dataset, read_metadata
 from sarxarray.utils import complex_coherence, multi_look

-__all__ = ("stack", "from_binary", "from_dataset", "multi_look", "complex_coherence")
+__all__ = (
+    "stack",
+    "from_binary",
+    "from_dataset",
+    "read_metadata",
+    "multi_look",
+    "complex_coherence",
+)

sarxarray/_io.py

Lines changed: 214 additions & 3 deletions

@@ -1,17 +1,31 @@
 import logging
 import math
+import re
+from collections import defaultdict
+from datetime import datetime
+from pathlib import Path
+from typing import Literal

 import dask
 import dask.array as da
 import numpy as np
 import xarray as xr

-from .conf import _dtypes, _memsize_chunk_mb
+from .conf import (
+    META_FLOAT_KEYS,
+    META_INT_KEYS,
+    RE_PATTERNS_DORIS4,
+    RE_PATTERNS_DORIS5,
+    RE_PATTERNS_DORIS5_IFG,
+    TIME_FORMAT_DORIS4,
+    TIME_FORMAT_DORIS5,
+    TIME_STAMP_KEY,
+    _dtypes,
+    _memsize_chunk_mb,
+)

 logger = logging.getLogger(__name__)

-# Example: https://docs.dask.org/en/stable/array-creation.html#memory-mapping
-

 def from_dataset(ds: xr.Dataset) -> xr.Dataset:
     """Create a SLC stack or from an Xarray Dataset.

@@ -266,3 +280,200 @@ def _calc_chunksize(shape: tuple, dtype: np.dtype, ratio: int):
     chunks = (chunks_az, chunks_ra)

     return chunks
+
+
+def read_metadata(
+    files: str | list | Path,
+    driver: Literal["doris4", "doris5"] = "doris5",
+    ifg_file_name: str = "ifgs.res",
+) -> dict:
+    """Read metadata of a coregistered interferogram stack.
+
+    This function reads metadata from one or more metadata files of a coregistered
+    interferogram stack, and returns the metadata as a dictionary.
+
+    Two drivers are supported: "doris4" for Doris v4 metadata files, e.g.
+    coregistration results from TerraSAR-X, and "doris5" for Doris v5 metadata
+    files, e.g. coregistration results from Sentinel-1. Support for other drivers
+    will be added in the future.
+
+    For both drivers, the metadata is parsed with predefined regular expressions,
+    returning a dictionary with predefined keys. Check conf.py for the available
+    keys and regular expressions.
+
+    Specifically for the "doris5" driver, it is assumed that there is an "ifgs.res"
+    file next to each input metadata file, which contains the interferogram size
+    information. If the "ifgs.res" file is not found, the interferogram size
+    information will not be included in the metadata.
+
+    If a single file is provided, the metadata is read from that file.
+
+    If multiple files are provided, the metadata is read from each file and the
+    results are combined based on the following rules:
+
+    - If a metadata key has values in string or integer format, the values are
+      combined into a set.
+    - If a metadata key has values in float format, and the standard deviation is
+      less than 1% of the mean, the values are averaged.
+    - For the two Doris drivers, the metadata key TIME_STAMP_KEY is treated as the
+      acquisition timestamp and converted to a numpy array of datetime64 values,
+      sorted in ascending order.
+
+    Parameters
+    ----------
+    files : str | list | Path
+        Path(s) to the metadata files.
+    driver : str, optional
+        The driver to use for reading metadata. Supported drivers are "doris4" and
+        "doris5". Default is "doris5".
+    ifg_file_name : str, optional
+        The name of the interferogram size file for the "doris5" driver.
+        This file is assumed to be next to each metadata file and is used to read
+        the interferogram size information. If it is not found, the size
+        information will not be included in the metadata. Default is "ifgs.res".
+
+    Returns
+    -------
+    dict
+        Dictionary containing the metadata read from the files.
+
+    Raises
+    ------
+    NotImplementedError
+        If the driver is not "doris4" or "doris5".
+    """
+    # Check driver
+    if driver not in ["doris4", "doris5"]:
+        raise NotImplementedError(
+            f"Driver '{driver}' is not implemented. "
+            "Supported drivers are: 'doris4', 'doris5'."
+        )
+
+    # If there is only one file, convert it to a list
+    if not isinstance(files, list):
+        files = [files]
+
+    # Force all files to be Path objects in case files is a list of strings
+    files = [Path(file) for file in files]
+
+    # Parse metadata from each file
+    # If a key does not exist, a list will be created
+    metadata = defaultdict(list)
+    for file in files:
+        res = _parse_metadata(file, driver, ifg_file_name)
+        for key, value in res.items():
+            metadata[key].append(value)
+
+    # Regulate metadata for all files
+    metadata = _regulate_metadata(metadata, driver)
+
+    return metadata
+
+
+def _parse_metadata(file, driver, ifg_file_name="ifgs.res"):
+    """Parse a single metadata file to a dictionary of strings."""
+    # Select the appropriate patterns based on the driver
+    if driver == "doris5":
+        patterns = RE_PATTERNS_DORIS5
+        patterns_ifg = RE_PATTERNS_DORIS5_IFG
+    elif driver == "doris4":
+        patterns = RE_PATTERNS_DORIS4
+        patterns_ifg = None
+
+    # Open the file
+    with open(file) as f:
+        content = f.read()
+
+    # Read common metadata patterns
+    results = {}
+    for key, pattern in patterns.items():
+        match = re.search(pattern, content)
+        if match:
+            results[key] = match.group(1)
+        else:
+            results[key] = None
+
+    # Doris v5 keeps the size information in the ifgs.res file
+    # Try to get the ifg size from ifgs.res next to slave.res, if it exists
+    if patterns_ifg is not None:
+        file_ifg = file.with_name(ifg_file_name)
+        if file_ifg.exists():
+            with open(file_ifg) as f_ifg:
+                content_ifg = f_ifg.read()
+            for key, pattern in patterns_ifg.items():
+                match = re.search(pattern, content_ifg)
+                if match:
+                    results[key] = match.group(1)
+                else:
+                    results[key] = None
+
+    return results
+
+
+def _regulate_metadata(metadata, driver):
+    """Regulate metadata strings.
+
+    This function processes the metadata read from the Doris files, which are
+    strings, and converts them according to the types specified in META_FLOAT_KEYS
+    and META_INT_KEYS.
+
+    Check the documentation of `read_metadata` for the rules applied to the metadata.
+    """
+    # Convert time metadata from string to datetime
+    if driver == "doris5":
+        time_format = TIME_FORMAT_DORIS5
+    elif driver == "doris4":
+        time_format = TIME_FORMAT_DORIS4
+    list_time = []
+    # If the time is a single string, convert it to a list
+    if isinstance(metadata[TIME_STAMP_KEY], str):
+        metadata[TIME_STAMP_KEY] = [metadata[TIME_STAMP_KEY]]
+    for time in metadata[TIME_STAMP_KEY]:
+        try:
+            dt = datetime.strptime(time, time_format)
+            list_time.append(np.datetime64(dt).astype("datetime64[s]"))
+        except ValueError as e:
+            raise ValueError(
+                f"Invalid date format for key: '{TIME_STAMP_KEY}'. "
+                f"Expected format is '{time_format}'."
+            ) from e
+    metadata[TIME_STAMP_KEY] = np.sort(np.array(list_time))
+
+    for key, value in list(metadata.items()):
+        # Raise an error if different types are found in the value list
+        if len(set(type(v) for v in value)) > 1:
+            raise TypeError(
+                f"Inconsistency found in metadata key: {key}. "
+                "Different types are found in the value list."
+            )
+
+        # Only keep the unique values
+        if isinstance(metadata[key], list):
+            metadata[key] = set(value)
+
+        # Unfold a single-value set to a string
+        if len(metadata[key]) == 1:
+            metadata[key] = next(iter(metadata[key]))
+
+        # If float, take the average unless the std is larger than 1% of the mean
+        if key in META_FLOAT_KEYS:
+            # Convert to float
+            arr = np.array(value, dtype=np.float64)
+            if np.std(arr) / np.mean(arr) < 0.01:
+                metadata[key] = np.mean(arr).item()  # Convert to scalar
+            else:
+                raise ValueError(
+                    f"Inconsistency found in metadata key: {key}. "
+                    "Standard deviation is larger than 1% of the mean."
+                )
+        if key in META_INT_KEYS:
+            if isinstance(metadata[key], str):
+                metadata[key] = int(metadata[key])
+            elif len(metadata[key]) > 1:  # set with multiple values
+                metadata[key] = {int(v) for v in metadata[key]}
+
+        if key in ["number_of_lines", "number_of_pixels"]:
+            if isinstance(metadata[key], set):
+                warning_msg = f"Multiple values found in {key}: {metadata[key]}."
+                logger.warning(warning_msg)
+
+    return metadata
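The per-key combination rules documented in `read_metadata` above (collapse strings and integers into a set, average floats only when the spread is negligible) can be illustrated standalone. `combine_values` below is a hypothetical helper that mirrors the documented behaviour; it is not part of the sarxarray API, and the sample values are made up.

```python
# Sketch of the combination rules for one metadata key read from several
# .res files. `combine_values` is a hypothetical helper, not sarxarray API.
import numpy as np


def combine_values(values, kind):
    """Combine one metadata key's values collected from several files."""
    if kind == "float":
        arr = np.array(values, dtype=np.float64)
        # Average only when the spread is negligible (std < 1% of the mean)
        if np.std(arr) / np.mean(arr) < 0.01:
            return np.mean(arr).item()
        raise ValueError("Standard deviation is larger than 1% of the mean.")
    # Strings and integers collapse to a set; a single value is unfolded
    unique = set(values)
    return next(iter(unique)) if len(unique) == 1 else unique


# Nearly identical floats are averaged
print(combine_values(["0.05546", "0.05547"], "float"))  # ≈ 0.055465

# Identical strings collapse to one value; differing ones stay as a set
print(combine_values(["VV", "VV"], "str"))  # 'VV'
print(combine_values(["VV", "VH"], "str"))  # a set with both values
```

Widely scattered floats raise `ValueError`, matching the inconsistency check in `_regulate_metadata`.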

0 commit comments