When we rewrite resource URLs in pages or diffs to load things from archive.org instead of the live web, we currently always ask for unaltered (`id_`) versions of the resources:
web-monitoring-ui/src/scripts/html-transforms.js
Lines 121 to 157 in 790e2a6
/**
 * Creates a transform that will rewrite subresource URLs to point to the
 * Wayback Machine. This is useful when we have snapshots of the page itself,
 * but not its subresources. It won't always work (Wayback won't always have
 * a snapshot of the subresource from a similar point in time), but it'll work
 * a lot better than just pointing to the original URL, which might be missing
 * or significantly altered by the time a diff is viewed.
 *
 * Note this *creates* the transform and is not the transform itself (because
 * the transform must be custom to a particular source URL and point in time).
 * @param {WebMonitoringDb.Page} page
 * @param {WebMonitoringDb.Version} version
 */
export function loadSubresourcesFromWayback (page, version) {
  return document => {
    // In some rare instances, there is old, messy version data from Versionista
    // that doesn't have a URL for the version, so fall back to page URL. :(
    const url = versionUrl(version) || page.url;
    const timestamp = createWaybackTimestamp(version.capture_time);
    document.querySelectorAll('link[rel="stylesheet"]').forEach(node => {
      for (const attribute of ['href', 'data-href']) {
        const value = node.getAttribute(attribute);
        if (value) {
          node.setAttribute(attribute, createWaybackUrl(value, timestamp, url));
        }
      }
    });
    document.querySelectorAll('script[src],img[src]').forEach(node => {
      node.src = createWaybackUrl(node.getAttribute('src'), timestamp, url);
    });
    // TODO: handle <picture> with all its subelements
    // TODO: SVG <use> directives
    // TODO: video/audio (similar structure to <picture>)
    return document;
  };
}
web-monitoring-ui/src/scripts/html-transforms.js
Lines 184 to 191 in 790e2a6
function createWaybackUrl (originalUrl, timestamp, baseUrl) {
  if (typeof timestamp !== 'string') {
    timestamp = createWaybackTimestamp(timestamp);
  }
  const url = resolveUrl(originalUrl, baseUrl);
  return `https://web.archive.org/web/${timestamp}id_/${url}`;
}
Instead, we should ask for the appropriate mode based on how we’re using the resource: `js_` for scripts, `cs_` for stylesheets, and `im_` for images. We should only fall back to `id_` in cases where we don’t know what type to use.
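One possible shape for this change, as a rough sketch rather than a finished patch: `createWaybackUrl` takes an extra, optional `resourceType` argument and looks up the mode suffix, defaulting to `id_`. The `resourceType` parameter, the `WAYBACK_MODES` map, and its keys (`script`, `stylesheet`, `image`) are hypothetical names introduced here. To keep the snippet self-contained, the standard `URL` constructor stands in for the project's `resolveUrl`, and `createWaybackTimestamp` is a simplified stand-in for the real helper.

```javascript
// Hypothetical mapping from how a resource is used to the Wayback mode suffix.
const WAYBACK_MODES = new Map([
  ['script', 'js_'],
  ['stylesheet', 'cs_'],
  ['image', 'im_']
]);

// Simplified stand-in for the project's helper: formats a date as a
// 14-digit UTC timestamp (YYYYMMDDhhmmss).
function createWaybackTimestamp (date) {
  return new Date(date).toISOString().replace(/[-T:]/g, '').slice(0, 14);
}

function createWaybackUrl (originalUrl, timestamp, baseUrl, resourceType) {
  if (typeof timestamp !== 'string') {
    timestamp = createWaybackTimestamp(timestamp);
  }
  // Fall back to the unaltered (`id_`) mode when the type is unknown.
  const mode = WAYBACK_MODES.get(resourceType) ?? 'id_';
  // The standard URL constructor stands in for `resolveUrl` here.
  const url = new URL(originalUrl, baseUrl).href;
  return `https://web.archive.org/web/${timestamp}${mode}/${url}`;
}
```

With something like this in place, `loadSubresourcesFromWayback` would need to query `script[src]` and `img[src]` separately (they currently share one selector) so that each call can pass its own type, with stylesheet links passing `'stylesheet'`.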