Cannot access file with name returned by ls via fsspec interface #198

@mxmlnkn

Considering this setup:

pip install wsgidav cheroot
mkdir -p /tmp/served
echo foo > /tmp/served/'#not-a-good-name!'
ruby -run -e httpd /tmp/served/ --port 8000 --bind-address=127.0.0.1 &
wsgidav --host=127.0.0.1 --port=8047 --root="/tmp/served" --auth=anonymous &

This works in fsspec.implementations.http:

import pprint
from fsspec.implementations.http import HTTPFileSystem as HFS
url = "http://127.0.0.1:8000"
fs = HFS(url)
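# What I would have expected to work:
# result = fs.ls("/")
result = fs.ls(url)
pprint.pprint(result)
# The listing order may vary between runs, so select the file by name rather than by index.
entry = next(e for e in result if e['name'].endswith('not-a-good-name!'))
pprint.pprint(fs.stat(entry['name']))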

Output:

[{'name': 'http://127.0.0.1:8000/?N=D', 'size': None, 'type': 'file'},
 {'name': 'http://127.0.0.1:8000/?S=D', 'size': None, 'type': 'file'},
 {'name': 'http://127.0.0.1:8000/%23not-a-good-name!',
  'size': None,
  'type': 'file'},
 {'name': 'http://127.0.0.1:8000/?M=D', 'size': None, 'type': 'file'}]
{'ETag': '4013af-4-670a7539',
 'mimetype': 'application/octet-stream',
 'name': 'http://127.0.0.1:8000/%23not-a-good-name!',
 'size': 4,
 'type': 'file',
 'url': 'http://127.0.0.1:8000/%23not-a-good-name!'}

However, with the webdav4 fsspec implementation:

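import pprint
import urllib.parse
from webdav4.fsspec import WebdavFileSystem as WFS
fs = WFS("http://127.0.0.1:8047")
result = fs.ls("/")
pprint.pprint(result)
pprint.pprint(fs.stat("/" + urllib.parse.quote(result[0]['name'])))  # works
pprint.pprint(fs.stat("/" + result[0]['name']))  # raises httpx.InvalidURL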

Output:

[{'content_language': None,
  'content_type': 'application/octet-stream',
  'created': datetime.datetime(2024, 10, 12, 13, 10, 17, tzinfo=tzutc()),
  'display_name': '#not-a-good-name!',
  'etag': '4199343-1728738617-4',
  'href': '/%23not-a-good-name!',
  'modified': datetime.datetime(2024, 10, 12, 13, 10, 17, tzinfo=datetime.timezone.utc),
  'name': '#not-a-good-name!',
  'size': 4,
  'type': 'file'}]

{'content_language': None,
 'content_type': 'application/octet-stream',
 'created': datetime.datetime(2024, 10, 12, 13, 10, 17, tzinfo=tzutc()),
 'display_name': '#not-a-good-name!',
 'etag': '4199343-1728738617-4',
 'href': '/%23not-a-good-name!',
 'modified': datetime.datetime(2024, 10, 12, 13, 10, 17, tzinfo=datetime.timezone.utc),
 'name': '#not-a-good-name!',
 'size': 4,
 'type': 'file'}

Traceback (most recent call last):
  File "/media/d/Myself/projects/ratarmount/worktrees/1/test-webdav.py", line 8, in <module>
    pprint.pprint(fs.stat("/" + result[0]['name']))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/fsspec/spec.py", line 1605, in stat
    return self.info(path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/fsspec.py", line 126, in info
    return translate_info(self.client.info(path))
                          ^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/client.py", line 519, in info
    result = self.propfind(path, headers={"Depth": "1"})
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/client.py", line 309, in propfind
    http_resp = self.with_retry(call)
                ^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/func_utils.py", line 47, in wrapped_function
    return func()
           ^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/func_utils.py", line 70, in wrapped
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/client.py", line 354, in _request
    url = self.join_url(path, add_trailing_slash=add_trailing_slash)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/client.py", line 291, in join_url
    return join_url(self.base_url, path, add_trailing_slash=add_trailing_slash)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/webdav4/urls.py", line 25, in join_url
    return base_url.copy_with(path=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/httpx/_urls.py", line 356, in copy_with
    return URL(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/httpx/_urls.py", line 119, in __init__
    self._uri_reference = url._uri_reference.copy_with(**kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/httpx/_urlparse.py", line 137, in copy_with
    return urlparse("", **defaults)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.local/lib/python3.12/site-packages/httpx/_urlparse.py", line 225, in urlparse
    raise InvalidURL(f"Invalid URL component '{key}'")
httpx.InvalidURL: Invalid URL component 'path'
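
A likely reading of this traceback is that an unescaped `#` cannot appear in a URL path, because `#` starts the fragment. The standard-library sketch below only illustrates that splitting rule. That httpx rejects the path for exactly this reason is an assumption, not something the traceback states.

```python
from urllib.parse import quote, urlsplit

# Unescaped: everything after '#' is parsed as the fragment, not the path.
raw = "http://127.0.0.1:8047/#not-a-good-name!"
parts = urlsplit(raw)
print(repr(parts.path), repr(parts.fragment))

# Percent-encoded: the whole name stays inside the path component.
escaped = "http://127.0.0.1:8047/" + quote("#not-a-good-name!")
parts_escaped = urlsplit(escaped)
print(repr(parts_escaped.path), repr(parts_escaped.fragment))
```

With the raw URL the path is just `/` and the file name ends up in the fragment. With the quoted URL the path carries the full, escaped name.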

I did not expect to have to call urllib.parse.quote on a name returned by ls, especially since WebdavFileSystem, in contrast to HTTPFileSystem, does not even require the full URL. It is also inconsistent that the returned name is a relative path instead of an absolute one, although I am not sure what it should be. I think fsspec does not specify this sufficiently.
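
Until the expected behavior is settled, the quoting can be wrapped in a small helper. This is a minimal sketch of the workaround shown above: `to_request_path` is a hypothetical name, not part of webdav4 or fsspec, and it assumes the names returned by `ls` are unescaped and relative to the server root.

```python
from urllib.parse import quote


def to_request_path(name: str) -> str:
    """Hypothetical helper: percent-encode a name from ls() and make it absolute."""
    # Keep '/' unescaped so nested paths survive; escape everything else
    # that is not an unreserved character, including '#'.
    return "/" + quote(name.lstrip("/"), safe="/")


print(to_request_path("#not-a-good-name!"))  # /%23not-a-good-name%21
```

With the setup from this report it would be used as `fs.stat(to_request_path(result[0]['name']))`. Note that `quote` also escapes `!`, so the result differs from the `href` value `/%23not-a-good-name!` in spelling. Both should decode to the same file name on the server.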