Commit 8824bca

Merge pull request #1057 from koordinates/feature/point-cloud-import-override-crs

Add --override-crs option to point-cloud-import command

2 parents: 0712723 + 42c4510
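Per the option's help text further down, `--override-crs` accepts either an EPSG code (e.g. `EPSG:4326`) or a WKT file reference (e.g. `@myfile.wkt`). A minimal sketch of resolving such a value — `resolve_crs_spec` is a hypothetical helper for illustration, not Kart's actual `CoordinateReferenceString` type:

```python
from pathlib import Path


def resolve_crs_spec(spec):
    """Resolve a CRS spec of the form "EPSG:4326" or "@myfile.wkt".

    Hypothetical helper mirroring the two forms the --override-crs help
    text describes; Kart's real CoordinateReferenceString does more.
    """
    if spec.startswith("@"):
        # "@myfile.wkt" means: read the WKT definition from the named file
        return Path(spec[1:]).read_text()
    if spec.upper().startswith("EPSG:"):
        # Keep authority:code strings as-is (cf. keep_as_string=True below)
        return spec
    raise ValueError(f"Unrecognised CRS spec: {spec}")
```

With this sketch, `resolve_crs_spec("EPSG:2994")` returns the string unchanged, while `resolve_crs_spec("@nzgd2000.wkt")` would return the file's WKT contents.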

File tree

6 files changed: +245 -10 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -78,6 +78,7 @@ docs/_build/
 *.code-workspace
 .vscode/
 .idea/
+.claude/

 CMakeUserPresets.json
 build*/
```

CLAUDE.md

Lines changed: 123 additions & 0 deletions
New file:

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Development Commands

### Building Kart

Kart uses CMake for building. **Python 3.11 is required** for compatibility with CI builds.

**Recommended approach (using VCPKG):**
```bash
# Configure with VCPKG dependency management
cmake -B build -S . -DPython3_EXECUTABLE=/path/to/python3.11 -DUSE_VCPKG=ON

# Build development version
cmake --build build

# Test the build
./build/kart --version
```

**Alternative (using CI vendor archives for faster builds):**
```bash
# Download vendor-{os}-{arch}-py3.11.zip from recent CI builds
cmake -B build -S . -DPython3_EXECUTABLE=/path/to/python3.11 -DVENDOR_ARCHIVE=/path/to/vendor-archive.zip -DUSE_VCPKG=OFF
cmake --build build
```

**Production build:**
```bash
# Build bundled binary and install
cmake --build build --target bundle
cmake --install build
```

### Testing

```bash
# Run full test suite
./build/venv/bin/pytest -v

# Run tests excluding slow ones
./build/venv/bin/pytest -m "not slow"

# Run specific test categories
./build/venv/bin/pytest -m "mssql"   # SQL Server tests
./build/venv/bin/pytest -m "mysql"   # MySQL tests
./build/venv/bin/pytest -m "e2e"     # End-to-end tests
```

### Code Quality

```bash
# Linting (via Ruff)
ruff check kart/

# Type checking
mypy kart/

# Format code
ruff format kart/
```

**Note**: Pre-commit hooks automatically run Ruff, MyPy, and CMake formatting tools.

## Architecture Overview

Kart is a **distributed version control system for geospatial data** that extends Git's object model to handle tabular, point cloud, and raster datasets.

### Core Components

**Repository Management** (`kart/repo.py`):
- `KartRepo`: extends `pygit2.Repository` with geospatial-aware operations
- Supports both "bare-style" (legacy) and "tidy-style" repositories
- Built-in Git LFS integration and spatial filtering support

**Dataset Abstraction** (`kart/base_dataset.py`):
- `BaseDataset`: abstract base for all dataset types (table, point-cloud, raster)
- Immutable view pattern - datasets are views of git trees
- Common diff/merge interface across all data types

**Multi-Format Support**:
- **Tabular** (`kart/tabular/`): database-backed (GPKG, PostGIS, MySQL, SQL Server)
- **Point Cloud** (`kart/point_cloud/`): tile-based LAZ/LAS with PDAL integration
- **Raster** (`kart/raster/`): tile-based GeoTIFF with GDAL integration

**Working Copy System** (`kart/working_copy.py`):
- Multi-part architecture supporting both database and file-based backends
- Conflict detection and resolution for concurrent edits
- State tracking against repository HEAD

**CLI Framework** (`kart/cli.py`):
- Click-based modular command system
- Direct Git command pass-through for compatible operations
- Helper process architecture for environment isolation

### Key Design Patterns

- **Plugin Architecture**: different strategies for tabular vs. tile-based data
- **Repository Pattern**: clean separation between git operations and geospatial handling
- **Factory Pattern**: dataset instantiation based on directory structure
- **Streaming Processing**: incremental handling of large datasets

### Data Storage Strategy

- All geospatial data is stored as git objects (blobs/trees/commits)
- Features/tiles are organized in balanced hierarchical trees
- Metadata is stored separately from data for efficient operations
- Automatic Git LFS integration for large files
- Schema versioning with V2/V3 formats for tables

### Important File Locations

- **Main entry**: `kart/cli.py`
- **Core repository logic**: `kart/repo.py`, `kart/core.py`
- **Dataset base classes**: `kart/base_dataset.py`
- **Format-specific implementations**: `kart/tabular/`, `kart/point_cloud/`, `kart/raster/`
- **Working copy backends**: `kart/tabular/working_copy/`
- **Build configuration**: `CMakeLists.txt`, `pyproject.toml`
- **Test configuration**: `pytest.ini`

When working with this codebase, understand that Kart bridges the gap between Git's file-based version control and the specialized needs of geospatial data management, requiring careful handling of both git operations and geospatial data formats.
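The test markers referenced above (`slow`, `mssql`, `mysql`, `e2e`) would typically be declared in the repository's `pytest.ini` so that `-m` selection works without warnings. A sketch, assuming only the marker names used in the commands above (the descriptions are illustrative):

```ini
[pytest]
markers =
    slow: long-running tests
    mssql: tests requiring SQL Server
    mysql: tests requiring MySQL
    e2e: end-to-end tests
```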

kart/point_cloud/import_.py

Lines changed: 26 additions & 3 deletions
```diff
@@ -7,6 +7,7 @@
     MutexOption,
     KartCommand,
 )
+from kart.crs_util import CoordinateReferenceString
 from kart.completion_shared import file_path_completer
 from kart.exceptions import InvalidOperation, INVALID_FILE_FORMAT
 from kart.lfs_util import prefix_sha256
@@ -112,6 +113,14 @@
         "the authoritative source for the given data and data is fetched from there if needed."
     ),
 )
+@click.option(
+    "--override-crs",
+    type=CoordinateReferenceString(keep_as_string=True),
+    help=(
+        "Override the CRS of all source tiles and set the dataset CRS. "
+        "Can be specified as EPSG code (e.g., EPSG:4326) or as a WKT file (e.g., @myfile.wkt)."
+    ),
+)
 @click.argument(
     "args",
     nargs=-1,
@@ -131,6 +140,7 @@ def point_cloud_import(
     num_workers,
     dataset_path,
     do_link,
+    override_crs,
     args,
 ):
     """
@@ -162,6 +172,7 @@ def point_cloud_import(
         num_workers=num_workers,
         do_link=do_link,
         sources=sources,
+        override_crs=override_crs,
     ).import_tiles()


@@ -171,6 +182,12 @@ class PointCloudImporter(TileImporter):
     CLOUD_OPTIMIZED_VARIANT = "Cloud-Optimized Point Cloud"
     CLOUD_OPTIMIZED_VARIANT_ACRONYM = "COPC"

+    def extract_tile_metadata(self, tile_path, **kwargs):
+        """Override to pass override_crs parameter to point cloud metadata extraction."""
+        if self.override_crs:
+            kwargs["override_crs"] = self.override_crs
+        return self.DATASET_CLASS.extract_tile_metadata(tile_path, **kwargs)
+
     def get_default_message(self):
         return f"Importing {len(self.sources)} LAZ tiles as {self.dataset_path}"

@@ -199,7 +216,9 @@ def get_merged_source_metadata(self, all_metadata):
         else:
             rewrite_metadata = RewriteMetadata.DROP_OPTIMIZATION

-        return rewrite_and_merge_metadata(all_metadata, rewrite_metadata)
+        return rewrite_and_merge_metadata(
+            all_metadata, rewrite_metadata, override_crs=self.override_crs
+        )

     def get_predicted_merged_metadata(self, all_metadata):
         if self.convert_to_cloud_optimized:
@@ -215,7 +234,9 @@ def get_predicted_merged_metadata(self, all_metadata):
         # For --preserve-format we allow both COPC and non-COPC tiles, so we don't need to check or store this information.
         rewrite_metadata = RewriteMetadata.DROP_OPTIMIZATION

-        return rewrite_and_merge_metadata(all_metadata, rewrite_metadata)
+        return rewrite_and_merge_metadata(
+            all_metadata, rewrite_metadata, override_crs=self.override_crs
+        )

     def get_actual_merged_metadata(self, all_metadata):
         rewrite_metadata = (
@@ -224,7 +245,9 @@ def get_actual_merged_metadata(self, all_metadata):
             else RewriteMetadata.DROP_OPTIMIZATION
         )

-        return rewrite_and_merge_metadata(all_metadata, rewrite_metadata)
+        return rewrite_and_merge_metadata(
+            all_metadata, rewrite_metadata, override_crs=self.override_crs
+        )

     def get_conversion_func(self, tile_source):
         if self.convert_to_cloud_optimized and not is_copc(tile_source.metadata):
```
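The `extract_tile_metadata` override in this file injects `override_crs` into the kwargs only when the user supplied one, so the base extraction keeps its default behaviour otherwise. The pattern in isolation, with simplified stand-ins rather than Kart's actual classes:

```python
class BaseImporter:
    """Simplified stand-in for a TileImporter-style base class that
    delegates metadata extraction to a hook, forwarding any kwargs."""

    def __init__(self, override_crs=None):
        self.override_crs = override_crs

    @staticmethod
    def extract(tile_path, override_crs=None):
        # Stand-in for DATASET_CLASS.extract_tile_metadata: falls back to
        # the CRS "found in the file" when no override is given.
        return {"tile": tile_path, "crs": override_crs or "from-file"}


class Importer(BaseImporter):
    def extract_tile_metadata(self, tile_path, **kwargs):
        # Only inject override_crs when one was actually supplied, so
        # callers passing their own kwargs are otherwise unaffected.
        if self.override_crs:
            kwargs["override_crs"] = self.override_crs
        return self.extract(tile_path, **kwargs)
```

With no override, `Importer().extract_tile_metadata("a.laz")` reports the file's own CRS; with `Importer(override_crs="EPSG:2994")`, every tile reports the overridden CRS.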

kart/point_cloud/metadata_util.py

Lines changed: 28 additions & 7 deletions
```diff
@@ -7,7 +7,7 @@

 from osgeo import osr

-from kart.crs_util import normalise_wkt, wkt_equal
+from kart.crs_util import normalise_wkt, wkt_equal, make_crs
 from kart.exceptions import (
     InvalidOperation,
     INVALID_FILE_FORMAT,
@@ -49,7 +49,7 @@ class RewriteMetadata(IntFlag):


 def rewrite_and_merge_metadata(
-    tile_metadata_list, rewrite_metadata=RewriteMetadata.NO_REWRITE
+    tile_metadata_list, rewrite_metadata=RewriteMetadata.NO_REWRITE, override_crs=None
 ):
     """
     Given a list of tile metadata, merges the parts we expect to be homogenous into a single piece of tile metadata in
@@ -59,18 +59,32 @@
     drop those parts of the metadata in accordance with the rewrite_metadata option. This means a) the merge will happen
     cleanly in spite of possible differences and b) we won't store any metadata that can't describe every tile in the
     dataset (ie, we won't store anything about whether tiles are COPC if we're going to allow a mix of both COPC and not).
+
+    If override_crs is provided, it will be used to override the CRS of all tiles, setting the dataset CRS.
     """
     result = {}
+    # Normalize override CRS if provided
+    normalized_override_crs = None
+    if override_crs:
+        crs_obj = make_crs(override_crs)
+        normalized_override_crs = normalise_wkt(crs_obj.ExportToWkt())
+
     for tile_metadata in tile_metadata_list:
         _merge_metadata_field(
             result, "format.json", rewrite_format(tile_metadata, rewrite_metadata)
         )
         _merge_metadata_field(
             result, "schema.json", rewrite_schema(tile_metadata, rewrite_metadata)
         )
-        _merge_metadata_field(
-            result, "crs.wkt", tile_metadata["crs.wkt"], eq_func=wkt_equal
-        )
+
+        # Handle CRS with potential override
+        if override_crs:
+            # Override CRS for all tiles if specified
+            tile_crs = normalized_override_crs
+        else:
+            tile_crs = tile_metadata["crs.wkt"]
+
+        _merge_metadata_field(result, "crs.wkt", tile_crs, eq_func=wkt_equal)
     # Don't copy anything from "tile" to the result - these fields are tile specific and needn't be merged.
     return result

@@ -157,7 +171,7 @@ def get_native_extent(info):
     )


-def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):
+def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None, override_crs=None):
     """
     Use pdal to get any and all point-cloud metadata we can make use of in Kart.
     This includes metadata that must be dataset-homogenous and would be stored in the dataset's /meta/ folder,
@@ -178,6 +192,7 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):

     pc_tile_path - a pathlib.Path or a string containing the path to a file or an S3 url.
     oid_and_size - a tuple (sha256_oid, filesize) if already known, to avoid repeated work.
+    override_crs - if provided, override the CRS of the tile with this CRS.
     """
     pc_tile_path = str(pc_tile_path)

@@ -241,10 +256,16 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):
         "size": size,
     }

+    # Use override CRS if provided, otherwise use the CRS from the file
+    final_crs = compound_crs or horizontal_crs
+    if override_crs:
+        crs_obj = make_crs(override_crs)
+        final_crs = crs_obj.ExportToWkt()
+
     result = {
         "format.json": format_json,
         "schema.json": schema_json,
-        "crs.wkt": normalise_wkt(compound_crs or horizontal_crs),
+        "crs.wkt": normalise_wkt(final_crs),
         "tile": _remove_nones(tile_info),
     }

```
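The point of threading `override_crs` into `rewrite_and_merge_metadata` is that the merge requires homogenous fields: without an override, tiles whose `crs.wkt` values differ cannot merge, while an override makes every tile contribute the same CRS. A minimal sketch of that behaviour — `merge_field` and `merge_crs` are simplified illustrations, not Kart's `_merge_metadata_field`:

```python
def merge_field(result, key, value, eq_func=None):
    """Simplified merge helper: the first tile sets the value; every later
    tile must agree (per eq_func), otherwise the merge fails."""
    eq = eq_func or (lambda a, b: a == b)
    if key not in result:
        result[key] = value
    elif not eq(result[key], value):
        raise ValueError(f"Conflicting values for {key}")


def merge_crs(tile_metadata_list, override_crs=None):
    """Sketch of the CRS-override logic: when override_crs is given, every
    tile contributes the same CRS, so the merge always succeeds."""
    result = {}
    for tile_metadata in tile_metadata_list:
        merge_field(result, "crs.wkt", override_crs or tile_metadata["crs.wkt"])
    return result
```

Merging tiles with CRS `EPSG:2193` and `EPSG:4326` raises without an override, but succeeds with `override_crs="EPSG:2994"`, yielding a single dataset CRS.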
kart/tile/importer.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -76,6 +76,7 @@ def __init__(
         num_workers,
         do_link,
         sources,
+        override_crs=None,
     ):
         """
         repo - the Kart repo from the context.
@@ -95,6 +96,7 @@ def __init__(
         allow_empty - if True, the import commit will be created even if the dataset is not changed.
         num_workers - specify the number of workers to use, or set to None to use the number of detected cores.
         sources - paths to tiles to import.
+        override_crs - if specified, override the CRS of all source tiles and set the dataset CRS.
         """
         self.repo = repo
         self.ctx = ctx
@@ -111,6 +113,7 @@ def __init__(
         self.num_workers = num_workers
         self.do_link = do_link
         self.sources = sources
+        self.override_crs = override_crs

         need_to_store_tiles = not self.do_link
         need_tiles_for_wc = self.do_checkout and not self.repo.is_bare
```

tests/point_cloud/test_imports.py

Lines changed: 64 additions & 0 deletions
```diff
@@ -1177,3 +1177,67 @@ def test_point_cloud_import_from_s3__convert(
             check_tile_is_reflinked(
                 repo_path / "auckland" / f"auckland_{x}_{y}.copc.laz", repo
             )
+
+
+def test_import_with_crs_override(
+    tmp_path,
+    chdir,
+    cli_runner,
+    data_archive_readonly,
+    check_lfs_hashes,
+    requires_pdal,
+    requires_git_lfs,
+):
+    """Test that --override-crs option overrides the CRS of all imported tiles."""
+    with data_archive_readonly("point-cloud/laz-auckland.tgz") as auckland:
+        repo_path = tmp_path / "point-cloud-repo"
+        r = cli_runner.invoke(["init", repo_path])
+        assert r.exit_code == 0, r.stderr
+
+        repo = KartRepo(repo_path)
+        with chdir(repo_path):
+            # First, import a single tile to establish the dataset CRS
+            # This tile is actually in EPSG:2193 but we override it to EPSG:2994
+            r = cli_runner.invoke(
+                [
+                    "point-cloud-import",
+                    f"{auckland}/auckland_0_0.laz",
+                    "--dataset-path=auckland",
+                    "--override-crs=EPSG:2994",
+                    "--convert-to-copc",
+                ]
+            )
+            assert r.exit_code == 0, r.stderr
+
+            # Get the original CRS from the dataset
+            r = cli_runner.invoke(["meta", "get", "auckland", "crs.wkt"])
+            assert r.exit_code == 0, r.stderr
+            original_crs = r.stdout.strip()
+            assert "2994" in original_crs
+
+            # Import another tile with --override-crs=EPSG:2994 to override its CRS
+            r = cli_runner.invoke(
+                [
+                    "point-cloud-import",
+                    f"{auckland}/auckland_0_1.laz",
+                    "--dataset-path=auckland",
+                    "--update-existing",
+                    "--override-crs=EPSG:2994",
+                    "--convert-to-copc",
+                ]
+            )
+            assert r.exit_code == 0, r.stderr
+
+            # Verify that both tiles were imported successfully
+            r = cli_runner.invoke(["data", "ls"])
+            assert r.exit_code == 0, r.stderr
+            assert "auckland" in r.stdout
+
+            # Check that the CRS is still consistent in the dataset
+            r = cli_runner.invoke(["meta", "get", "auckland", "crs.wkt"])
+            assert r.exit_code == 0, r.stderr
+            final_crs = r.stdout.strip()
+
+            assert "2994" in final_crs
+
+            check_lfs_hashes(repo, 2)
```
