Skip to content

Define missing and undef constructors according to their semantics #40

@knuesel

Description

@knuesel

SentinelArrays.jl uses undef constructors to initialize the array to "missing" values. I propose to do the following instead:

  1. Use missing constructors to initialize arrays to "missing" (sentinel) values.
  2. Use undef constructors to skip initialization (or more precisely, do whatever the undef constructor does for the underlying array)

Background: undef constructors of Base arrays have been (ab)used to initialize arrays with missing values. This relies on undocumented behavior: the undef constructor does zero-initialization of the memory for union arrays, to avoid invalid element types. And due to implementation details, arrays of Union{Missing,...} usually end up with all elements initialized to missing.

Relying on undocumented behavior is not great, so an attempt was made to document it: JuliaLang/julia#31091. It turned out that this use of undef works almost always, but can fail with unions of Missing and other singleton types.

The issue was solved by addingmissing constructors for Base arrays: JuliaLang/julia#25054 .

Arguments:

  • Using undef to initialize to a particular value makes no sense from a semantics point of view. It doesn't feel like a good API when "define to xxx" is made by calling "leave undefined"!

    Please note the problem is not in the behavior of Base constructors: a "leave undefined" constructor can return anything, so why not all-missing values. But users shouldn't rely on it (at least not when there is a better way).

  • Using undef for this purpose in SentinelArrays will encourage people to do the same with Base arrays, where it can introduce subtle bugs (since it works almost always, but not always).

  • It leaves performance on the table. In Base, zero-initialization is necessary to guarantee valid union types. This constraint doesn't apply to SentinelArrays! And the whole point of SentinelArrays is to give better performance for particular use cases, so I think SentinelArrays should define undef constructor that do what it says on the tin: leave values uninitialized.

Relation to Base:

A downside of this proposal is that SentinelVector{Float64}(undef, 3) would not longer behave the same as Array{Union{Missing,Float64}}(undef, 3). But the latter is undefined behavior. Is agreeing on this undefined behavior more important than having an API that makes sense and offering the best performance?

Another way to look at it:

  1. SentinelArrays should promote patterns that also work with Base arrays, so it should not promote using undef to get missing.
  2. Base does the semantically correct thing: undef for uninitialized (where possible), and missing for initialized-to-missing. In particular, Base does not document the values returned by undef constructors. I think SentinelArrays should do the same.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions