Description
I tried connecting to an HDFS storage through the default configuration
(core-site.xml). Connecting, as well as writing and reading a dataframe, worked
fine (not shown). However, when attempting to run the following code:
"""
import dask.array as da
N = 10_000
rng = da.random.default_rng()
x = rng.random((N, N), chunks=(2000, 2000))
x.to_zarr("hdfs:///user/eriksen/test2.zarr")
"""
it fails with the following traceback, which seems to be fsspec-related:
```
File /data/aai/scratch_ssd/eriksen/miniforge3/envs/dask/lib/python3.13/functools.py:1026, in cached_property.__get__(self, instance, owner)
   1024 val = cache.get(self.attrname, _NOT_FOUND)
   1025 if val is _NOT_FOUND:
---> 1026     val = self.func(instance)
   1027     try:
   1028         cache[self.attrname] = val

File /data/aai/scratch_ssd/eriksen/miniforge3/envs/dask/lib/python3.13/site-packages/fsspec/implementations/arrow.py:63, in ArrowFSWrapper.fsid(self)
    61 @cached_property
    62 def fsid(self):
---> 63     return "hdfs_" + tokenize(self.fs.host, self.fs.port)

AttributeError: 'pyarrow._hdfs.HadoopFileSystem' object has no attribute 'host'
```
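For context, the failure looks like a mismatch between `ArrowFSWrapper.fsid`, which assumes the wrapped filesystem exposes `host` and `port` attributes, and `pyarrow.fs.HadoopFileSystem`, which apparently does not. As a temporary local workaround, something like the sketch below might unblock the write; it is untested against a real cluster, and the fallback token format is my own invention, not fsspec's canonical scheme:

```python
# Hedged workaround sketch: replace the failing cached_property on
# ArrowFSWrapper with a plain property that derives a filesystem id
# without touching the missing host/port attributes.
import fsspec.implementations.arrow as arrow_impl

arrow_impl.ArrowFSWrapper.fsid = property(
    # Identify the filesystem by the wrapped class name only; this is an
    # assumed token format, just enough to get past the AttributeError.
    lambda self: "hdfs_" + type(self.fs).__name__
)
```

With the patch applied before calling `to_zarr`, the `fsid` lookup should no longer raise, though I haven't verified that the resulting id is unique enough for caching purposes.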
I set up the environment today with Python 3.13 and the following packages:
```
cloudpickle==3.1.1
dask==2025.5.1
distributed==2025.5.1
fsspec==2025.5.1
pyarrow==20.0.0
toolz==1.0.0
zarr==3.0.8
zict==3.0.0
```
Please let me know if you need any other information, or if I should be
reporting this issue elsewhere.