Description
SentinelArrays.jl uses `undef` constructors to initialize the array to "missing" values. I propose to do the following instead:
- Use `missing` constructors to initialize arrays to "missing" (sentinel) values.
- Use `undef` constructors to skip initialization (or more precisely, do whatever the `undef` constructor does for the underlying array).
Background: `undef` constructors of Base arrays have been (ab)used to initialize arrays with missing values. This relies on undocumented behavior: the `undef` constructor zero-initializes the memory for union arrays, to avoid invalid element types. And due to implementation details, arrays of `Union{Missing,...}` usually end up with all elements initialized to `missing`.
Relying on undocumented behavior is not great, so an attempt was made to document it: JuliaLang/julia#31091. It turned out that this use of `undef` works almost always, but can fail with unions of `Missing` and other singleton types.
The issue was solved by adding `missing` constructors for Base arrays: JuliaLang/julia#25054.
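For illustration, here is a minimal sketch of the two Base constructors discussed above. It deliberately asserts nothing about the contents of the `undef` array, since those are not documented; the variable names are only for the example.

```julia
# Documented way: explicitly initialize every element to `missing`.
a = Vector{Union{Missing,Float64}}(missing, 3)
@assert all(ismissing, a)

# `undef` only promises an array of the requested type and size.
# Its contents are unspecified, so nothing here depends on them.
b = Vector{Union{Missing,Float64}}(undef, 3)
@assert length(b) == 3

# With a union of two singleton types, the `missing` constructor is still
# well defined, whereas the `undef` result should not be relied upon.
c = Vector{Union{Missing,Nothing}}(missing, 3)
@assert all(ismissing, c)
```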
Arguments:
- Using `undef` to initialize to a particular value makes no sense from a semantics point of view. It doesn't feel like a good API when "define to xxx" is done by calling "leave undefined"! Please note that the problem is not in the behavior of the Base constructors: a "leave undefined" constructor can return anything, so why not all-missing values? But users shouldn't rely on it (at least not when there is a better way).
- Using `undef` for this purpose in SentinelArrays will encourage people to do the same with Base arrays, where it can introduce subtle bugs (since it works almost always, but not always).
- It leaves performance on the table. In Base, zero-initialization is necessary to guarantee valid union types. This constraint doesn't apply to SentinelArrays! And the whole point of SentinelArrays is to give better performance for particular use cases, so I think SentinelArrays should define `undef` constructors that do what it says on the tin: leave values uninitialized.
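To make the proposed split concrete, here is a self-contained toy sketch. `ToySentinelVector` and its `sentinel` keyword are hypothetical names invented for this example; this is not the SentinelArrays.jl implementation or API, only an illustration of the semantics argued for above.

```julia
# Hypothetical toy type, NOT the SentinelArrays.jl implementation.
# Elements equal to `sentinel` (compared with ===) are read back as `missing`.
struct ToySentinelVector{T} <: AbstractVector{Union{T,Missing}}
    data::Vector{T}
    sentinel::T
end

# `undef`: allocate the underlying array without initializing it.
# The contents are unspecified.
ToySentinelVector{T}(::UndefInitializer, n::Integer; sentinel::T) where {T} =
    ToySentinelVector{T}(Vector{T}(undef, n), sentinel)

# `missing`: fill with the sentinel so every element reads as `missing`.
ToySentinelVector{T}(::Missing, n::Integer; sentinel::T) where {T} =
    ToySentinelVector{T}(fill(sentinel, n), sentinel)

Base.size(v::ToySentinelVector) = size(v.data)
Base.IndexStyle(::Type{<:ToySentinelVector}) = IndexLinear()

function Base.getindex(v::ToySentinelVector, i::Int)
    x = v.data[i]
    return x === v.sentinel ? missing : x
end

Base.setindex!(v::ToySentinelVector{T}, x, i::Int) where {T} =
    (v.data[i] = convert(T, x); v)
Base.setindex!(v::ToySentinelVector, ::Missing, i::Int) =
    (v.data[i] = v.sentinel; v)

# Initialized to missing: safe to read immediately.
a = ToySentinelVector{Float64}(missing, 3; sentinel = NaN)
@assert all(ismissing, a)

# Uninitialized: write every element before reading it.
b = ToySentinelVector{Int}(undef, 4; sentinel = typemin(Int))
b[1] = 7
b[2] = missing
@assert b[1] == 7 && ismissing(b[2])
```

In this sketch the `undef` path skips the fill entirely, which is where the performance argument comes from, while the `missing` path pays for the fill and in return guarantees the contents.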
Relation to Base:
A downside of this proposal is that `SentinelVector{Float64}(undef, 3)` would no longer behave the same as `Array{Union{Missing,Float64}}(undef, 3)`. But the latter is undefined behavior. Is agreeing on this undefined behavior more important than having an API that makes sense and offering the best performance?
Another way to look at it:
- SentinelArrays should promote patterns that also work with Base arrays, so it should not promote using `undef` to get `missing`.
- Base does the semantically correct thing: `undef` for uninitialized (where possible), and `missing` for initialized-to-missing. In particular, Base does not document the values returned by `undef` constructors. I think SentinelArrays should do the same.