-
-
Notifications
You must be signed in to change notification settings - Fork 420
un-indenting head-safe check to apply for cached files too #3232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
8084f93
un-indenting head-safe to apply for cached files too
eas342 9571482
re-indent head_safe block, add a copy to the other section
keflavich 0c10fe9
fix @eas342's bug - which was that, if a partial file was detected and
keflavich 1bed399
A combination of many squashed commits for adding tests and several f…
keflavich 9dca39b
DOC: fixup changelog
bsipocz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block is causing some issues for our internal MAST test suite. We actually do have certain cases where the response returned by the download request doesn't contain a header for "Content-Length". I'm working to understand these edge cases better, but it may be that there's no way to get the content length, in which case trying to download a file that's already cached fails. This may have been why the
continuation
parameter is set toFalse
in MAST download calls.Would the best course of action be to re-download completely if no content length is found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to have the foresight of not including this in the release just yet hoping that this will get tested while in dev.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snbianco can you tell us the answer to that question? What does it mean if there's no content-length? I'd argue that continuation in that scenario is probably not possible, but I don't understand the case here - why are we downloading a file without a specified length?
iirc, I couldn't test this because httpbin wouldn't serve anything without that header keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the best of my knowledge, there are certain files that are generated and delivered by a different server than the MAST portal. For example, the catalog files for the Hubble Legacy Archive are generated by https://hla.stsci.edu/. So, content length may be unknown to the MAST portal, and the portal doesn't currently return it in the download response.
If it does turn out that we can't get the length of these files from the Portal, redownloading seems like the only way to go. The downside to that is the time lost redownloading when a complete cached file could have been used. Would it be possible to raise a warning instead of a
ValueError
when the content length header is not found? Or add a parameter to control whether a warning or an error is raised?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's safe to resume a partial download with no known endpoint. Maybe, but I feel uncomfortable about it.
Why is the MAST portal involved if the data are served from another portal? Shouldn't we just be downloading from the other server? Maybe you don't know and it's all weird server configuration stuff, but this feels super weird and difficult to handle robustly without understanding what's going on.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been talking about this issue with a few folks at MAST, and I know a bit more about why these particular files don't have a content length. They are generated on-the-fly by a service that queries our database and puts the results into the requested format. They are also streamed and not staged at all. There is a content length coming from the service, but I assume its being dropped somewhere between there and the portal API. There are a few people looking into it, and I'm hopeful that we can add the header to the files that come from that particular service.
That being said, we might have this issue with other files that are generated dynamically and then streamed from the database. Since this only applies in the case that the file is already cached and continuation is not possible (MAST sets
continuation
toFalse
anyway), maybe the default behavior should be to log a warning and re-download the file entirely? Or log a warning and leave the cached file without re-downloading at all?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning toward 're-download and warn' as the most sensible default. If there are large files for which you really need continuation, we can consider adding a MAST-specific workaround