Significantly improve performance of PalettedBlockArray::validate() (#35)
This change introduces a fast path for validation using carry-out vectors, as discussed here: https://devblogs.microsoft.com/oldnewthing/20190301-00/?p=101076
TL;DR: It's possible to detect invalid palette offsets by subtracting the current "word" from a bitfield in which every slot holds the max valid palette offset, then checking whether any slot had to borrow. This is significantly faster than shifting out and naively comparing each offset.
In addition, we allow the code to chew through the whole loop even if invalid offsets are detected, in the hope that compilers will use SIMD instructions and/or parallelize the loop, which is also substantially faster than breaking out early on a branch condition to throw an exception.
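A minimal sketch of the fast path described above, not the code from this commit. It assumes 32-bit words with offsets packed from the low bits upward; `replicate` and `fastValidate` are hypothetical names chosen for illustration. The borrow (carry-out) vector of `x - y` is computed as `(~x & y) | (~(x ^ y) & (x - y))`, and only the borrows leaving the top bit of each slot are kept. Bit 31 is always included in the mask, so a borrow caused by any bits left over above the last complete slot is caught as well.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy `value` into every bitsPerBlock-wide slot of a
// 32-bit word. Bits above the last complete slot are left as zero.
static uint32_t replicate(uint32_t value, unsigned bitsPerBlock) {
    uint32_t out = 0;
    for (unsigned shift = 0; shift + bitsPerBlock <= 32; shift += bitsPerBlock) {
        out |= value << shift;
    }
    return out;
}

// Returns true if no slot in any word exceeds paletteSize - 1.
// Assumes 1 <= bitsPerBlock <= 16 and 1 <= paletteSize <= 2^bitsPerBlock.
static bool fastValidate(const std::vector<uint32_t>& words,
                         unsigned bitsPerBlock, uint32_t paletteSize) {
    assert(bitsPerBlock >= 1 && bitsPerBlock <= 16);
    assert(paletteSize >= 1);

    // Every slot holds the max valid palette offset.
    const uint32_t expected = replicate(paletteSize - 1, bitsPerBlock);
    // Top bit of every slot, plus bit 31 to catch a borrow out of the word.
    const uint32_t topBits =
        replicate(1u << (bitsPerBlock - 1), bitsPerBlock) | 0x80000000u;

    uint32_t invalid = 0;
    for (uint32_t word : words) {
        const uint32_t diff = expected - word;
        // Borrow-out of each bit position of (expected - word).
        const uint32_t borrow = (~expected & word) | (~(expected ^ word) & diff);
        // Accumulate instead of branching, so the loop body stays branch-free.
        invalid |= borrow & topBits;
    }
    return invalid == 0;
}
```

A borrow can ripple from an invalid slot into the slots above it, so the set bits in `invalid` do not identify which slot was wrong. The check only answers whether the array as a whole is valid.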
The downside of the fast path is that it cannot report the exact offset an error occurred at, only that an error is present. For this reason, the old, slower code is retained, which allows generating detailed errors if an error is detected.
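For comparison, the retained slow path might look roughly like the sketch below. This is an illustration under the same assumptions (32-bit words, offsets packed from the low bits upward), not the code from this commit; `slowValidate` and the error message format are hypothetical. It shifts out one offset at a time and throws as soon as it finds a bad one, which is what lets it name the exact position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Checks every offset individually and throws std::out_of_range naming the
// first invalid one. Assumes 1 <= bitsPerBlock <= 16.
static void slowValidate(const std::vector<uint32_t>& words,
                         unsigned bitsPerBlock, uint32_t paletteSize) {
    assert(bitsPerBlock >= 1 && bitsPerBlock <= 16);

    const unsigned blocksPerWord = 32 / bitsPerBlock;
    const uint32_t slotMask = (1u << bitsPerBlock) - 1;

    for (std::size_t w = 0; w < words.size(); ++w) {
        // Only complete slots are read; leftover high bits are never examined.
        for (unsigned b = 0; b < blocksPerWord; ++b) {
            const uint32_t offset = (words[w] >> (b * bitsPerBlock)) & slotMask;
            if (offset >= paletteSize) {
                throw std::out_of_range(
                    "block " + std::to_string(w * blocksPerWord + b) +
                    " has invalid palette offset " + std::to_string(offset));
            }
        }
    }
}
```

A caller would run the branch-free check first and invoke this function only when that check reports a problem, so the per-offset shifting and branching cost is paid only for data that is already known to be suspect.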
Note: This does cause a change of behaviour for palettes with padded words (3, 5, and 6 bits per block). As a side effect of this change, the fast path now also checks that the padding is zero. If non-zero padding is found, the fast path will fail and fall back to the slow path, which ignores padding and so will not report an error. We may want to explicitly ignore padding or force it to be zero.