Getting a generator or iterator instead of lists of OpenFile objects or addresses. #1882

@MalteEbner

Functions like fsspec.open_files or FileSystem.ls return list-like objects when run on directories or with glob patterns. This has two main drawbacks:

  • The functions only return once the entire directory has been listed. When listing cloud buckets with millions of entries, this can take many minutes, which leads to:
    • Higher failure risk due to long runtimes.
    • No way to give user feedback in the meantime, e.g. a progress bar.
    • No way to start processing the first files found while the rest are still being listed.
  • All OpenFile objects or addresses are kept in memory at once.

Is it possible to get a generator or iterator instead of a list? I'm particularly interested in support for local, s3fs, and gcsfs.
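
To illustrate the behaviour I'm after, here is a minimal sketch for the local case only. It uses the standard library's os.scandir rather than fsspec, and iter_files is a hypothetical name, not an existing fsspec function. Paths are yielded as soon as they are found, so the caller can process them or update a progress bar while the listing is still running, and the full list is never held in memory.

```python
import os
from typing import Iterator


def iter_files(root: str) -> Iterator[str]:
    """Lazily yield file paths under root (hypothetical helper, not fsspec API)."""
    stack = [root]  # directories still waiting to be listed
    while stack:
        current = stack.pop()
        # os.scandir is itself an iterator, so entries are read incrementally.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    # Regular files (and symlinks) are handed to the caller
                    # immediately, before the rest of the tree has been listed.
                    yield entry.path


if __name__ == "__main__":
    # The caller can report progress while the listing is still in flight.
    for count, path in enumerate(iter_files("."), start=1):
        if count % 1000 == 0:
            print(f"{count} files found so far, latest: {path}")
```

An equivalent for s3fs or gcsfs would presumably need to yield results page by page from the underlying list requests; that part is an assumption about how it could work, not something the sketch above covers.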
