-
Notifications
You must be signed in to change notification settings - Fork 1.6k
repr(ordered_fields) #3845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
repr(ordered_fields) #3845
Changes from 12 commits
e0cdce8
0637bbf
7e4b0c8
61ce0f2
4fa572e
755644c
0d0a79b
4f4bf18
063af08
0c0e429
9e316ff
a1700b6
d75a497
6526f5a
ed96c1b
01cfb40
f11567c
b17f192
488068e
93f53a5
a2737d5
bb8a392
612b99e
283df46
8fe1576
489b31a
a5fb9bc
88f631e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,326 @@ | ||
- Feature Name: `repr_ordered_fields` | ||
- Start Date: 2025-08-05 | ||
- RFC PR: [rust-lang/rfcs#3845](https://github.com/rust-lang/rfcs/pull/3845) | ||
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Introduce a new `repr` (let's call it `repr(ordered_fields)`, but that can be bikeshedded if this RFC is accepted) that can be applied to `struct`, `enum`, and `union` types, which guarantees a simple and predictable layout. Then provide an initial migration plan to switch users from `repr(C)` to `repr(ordered_fields)`. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Currently `repr(C)` serves two roles | ||
1. Provide a consistent, cross-platform, predictable layout for a given type | ||
2. Match the target C compiler's struct/union layout algorithm and ABI | ||
Comment on lines
+19
to
+21
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Big fan of doing this split, especially for Pondering the bikeshed: (This could be contrasted with other potential reprs that I wouldn't expect this RFC to add, but could consider as future work, like a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is also useful for unions, so we don't need to rely on
This also works as an argument against names like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do think that It makes me ponder whether we should just have (Which makes me think of other things like addressing rust-lang/unsafe-code-guidelines#494 by having a different constructs for "bag of maybeuninit stuff that overlap" vs "distinct options with an active-variant rule for enum-but-without-stored-discriminant". But those are definitely not this RFC.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't mind spelling it as |
||
|
||
But in some cases, these two cases are in tension due to platform weirdness (even on major platforms like Windows MSVC) | ||
* https://github.com/rust-lang/unsafe-code-guidelines/issues/521 | ||
* https://github.com/rust-lang/rust/issues/81996 | ||
|
||
Providing any fix for case 2 would subtly break any users of case 1, which makes this difficult to fix within a single edition. | ||
|
||
As an example of this tension: on Windows MSVC, `repr(C)` doesn't always match what MSVC does for ZST structs (see this [issue](https://github.com/rust-lang/rust/issues/81996) for more details) | ||
|
||
```rust | ||
// should have size 8, but has size 0 | ||
#[repr(C)] | ||
struct SomeFFI([i64; 0]); | ||
``` | ||
|
||
Of course, making `SomeFFI` size 8 doesn't work for anyone using `repr(C)` for case 1. They want it to be size 0 (as it currently is). | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
`repr(ordered_fields)` is a new representation that can be applied to `struct`, `enum`, and `union` to give them a consistent, cross-platform, and predictable in-memory layout. | ||
|
||
`repr(C)` in edition <= 2024 is an alias for `repr(ordered_fields)` and in all other editions, it matches the default C compiler for the given target for structs, unions, and field-less enums. Enums with fields will be laid out as if they are a union of structs with the corresponding fields. | ||
RustyYato marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Using `repr(C)` in editions <= 2024 triggers a lint to use `repr(ordered_fields)` as a future compatibility lint with a machine-applicable fix. If you are using `repr(C)` for FFI, then you may silence this lint. If you are using `repr(C)` for anything else, please switch over to `repr(ordered_fields)` so updating to future editions doesn't change the meaning of your code. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is too noisy. Most code out there using repr(C) is probably fine - IIUC, if you're not targeting Windows or AIX, maybe definitely fine? - and having a bunch of allow(...) across a bunch of projects seems unfortunate. Maybe we can either (a) only enable the lint for migration, i.e., the next edition's cargo fix would add allows for you or (b) we find some new name... C2 for the existing repr(C) usage to avoid allows. But (b) also seems too noisy to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe it could just be an optional edition compatibility lint, so if someone enables e.g. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That was the intention, hence the name Rustfix would update to Overall, I think this scheme only significantly burdens those who have a mix of FFI and non-FFI uses of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the new wording/lints too noisy still? |
||
|
||
``` | ||
warning: use of `repr(C)` in type `Foo` | ||
--> src/main.rs:14:10 | ||
| | ||
14 | #[repr(C)] | ||
| ^^^^^^^ help: consider switching to `repr(ordered_fields)` | ||
| struct Foo { | ||
| | ||
= note: `#[warn(edition_2024_repr_c)]` on by default | ||
= note: `repr(C)` is planned to change meaning in the next edition to match the target platform's layout algorithm. This may change the layout of this type on certain platforms. To keep the current layout, switch to `repr(ordered_fields)` | ||
``` | ||
|
||
After enough time has passed, and the community has switched over: | ||
This makes it easier to tell *why* the `repr` was applied to a given struct. If `repr(C)`, it's about FFI and interop. If `repr(ordered_fields)`, then it's for a dependable layout. | ||
# Reference-level explanation | ||
RustyYato marked this conversation as resolved.
Show resolved
Hide resolved
|
||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
## `repr(C)` | ||
|
||
> The `C` representation is designed for one purpose: creating types that are interoperable with the C Language. | ||
> | ||
> This representation can be applied to structs, unions, and enums. The exception is [zero-variant enums](https://doc.rust-lang.org/stable/reference/items/enumerations.html#zero-variant-enums) for which the `C` representation is an error. | ||
> | ||
> - edited version of the [reference](https://doc.rust-lang.org/stable/reference/type-layout.html#the-c-representation) on `repr(C)` | ||
The exact algorithm is deferred to whatever the default target C compiler does with default settings (or if applicable, the most commonly used settings). | ||
## `repr(ordered_fields)` | ||
|
||
> The `ordered_fields` representation is designed for one purpose: create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type | ||
> | ||
> This representation can be applied to structs, unions, and enums. | ||
> | ||
> - edited version of the [reference](https://doc.rust-lang.org/stable/reference/type-layout.html#the-c-representation) on `repr(C)` | ||
### struct | ||
Structs are laid out in memory in declaration order, with padding bytes added as necessary to preserve alignment. | ||
And their alignment is the same as their most aligned field. | ||
RustyYato marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
```rust | ||
// assuming that u32 is aligned to 4 bytes | ||
// size 16, align 4 | ||
#[repr(ordered_fields)] | ||
struct FooStruct { | ||
RustyYato marked this conversation as resolved.
Show resolved
Hide resolved
|
||
a: u8, | ||
b: u32, | ||
c: u16, | ||
d: u32, | ||
} | ||
``` | ||
|
||
Would be laid out in memory like so | ||
|
||
``` | ||
a...bbbbcc..dddd | ||
``` | ||
### union | ||
Unions would be laid out with the same size as their largest field, and the same alignment as their most aligned field. | ||
|
||
```rust | ||
// assuming that u32 is aligned to 4 bytes | ||
// size 4, align 4 | ||
#[repr(ordered_fields)] | ||
union FooUnion { | ||
a: u8, | ||
b: u32, | ||
c: u16, | ||
d: u32, | ||
} | ||
``` | ||
|
||
`FooUnion` has the same layout as `u32`, since `u32` has both the biggest size and alignment. | ||
### enum | ||
The enum's tag is the smallest signed integer type which can hold all of the discriminant values (unless otherwise specified). The discriminants are assigned such that each variant without an explicit discriminant is exactly one more than the previous variant in declaration order. | ||
|
||
If an enum doesn't have any fields, then it is represented exactly by it's discriminant. | ||
```rust | ||
// tag = i16 | ||
// represented as i16 | ||
#[repr(ordered_fields)] | ||
enum FooEnum { | ||
VarA = 1, | ||
VarB, // discriminant = 2 | ||
VarC = 500, | ||
VarD, // discriminant = 501 | ||
} | ||
|
||
// tag = u16 | ||
// represented as u16 | ||
#[repr(ordered_fields, u16)] | ||
enum FooEnumUnsigned { | ||
VarA = 1, | ||
VarB, // discriminant = 2 | ||
VarC = 500, | ||
VarD, // discriminant = 501 | ||
} | ||
``` | ||
|
||
Enums with fields will be laid out as if they were a union of structs. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This statement matches the algorithm currently used for "primitive representations" ( The difference between the two algorithms (besides the integer type used for the tag) is that in Example of the difference: #![feature(offset_of_enum)]
#[repr(align(8))]
struct A64(u64);
#[repr(C, i32)]
enum ExampleC {
Foo(u8),
Bar(A64),
}
#[repr(i32)]
enum ExamplePrim {
Foo(u8),
Bar(A64),
}
fn main() {
dbg!(
std::mem::offset_of!(ExampleC, Foo.0), // prints 8
std::mem::offset_of!(ExamplePrim, Foo.0), // prints 4
);
} I think that either There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good point. I didn't realize there was that difference, thanks for bringing this up. I think I think making the migration as painless as possible will make it much easier to split up I'm open to having the repr be named differently for the different kinds of type. (For example, |
||
|
||
For example, this would be laid out the same as the union below | ||
```rust | ||
#[repr(ordered_fields)] | ||
enum BarEnum { | ||
VarFieldless, | ||
VarTuple(u8, u32), | ||
VarStruct { | ||
a: u16, | ||
b: u32, | ||
}, | ||
} | ||
``` | ||
|
||
```rust | ||
#[repr(ordered_fields)] | ||
union BarUnion { | ||
var1: VarFieldless, | ||
var2: VarTuple, | ||
var3: VarStruct, | ||
} | ||
|
||
#[repr(ordered_fields)] | ||
enum BarTag { | ||
VarFieldless, | ||
VarTuple, | ||
VarStruct, | ||
} | ||
|
||
#[repr(ordered_fields)] | ||
struct VarFieldless { | ||
tag: BarTag, | ||
} | ||
|
||
#[repr(ordered_fields)] | ||
struct VarTuple(BarTag, u8, u32); | ||
|
||
#[repr(ordered_fields)] | ||
struct VarStruct { | ||
tag: BarTag, | ||
a: u16, | ||
b: u32 | ||
} | ||
``` | ||
|
||
In Rust, the algorithm for calculating the layout is defined precisely as follows: | ||
|
||
```rust | ||
/// Takes in the layout of each field (in declaration order) | ||
/// and returns the offsets of each field, and the layout of the entire struct | ||
fn get_layout_for_struct(field_layouts: &[Layout]) -> Result<(Vec<usize>, Layout), LayoutError> { | ||
let mut layout = Layout::new::<()>(); | ||
let mut field_offsets = Vec::new(); | ||
|
||
for &field in field_layouts { | ||
let (next_layout, offset) = layout.extend(field)?; | ||
|
||
field_offsets.push(offset); | ||
layout = next_layout; | ||
} | ||
|
||
Ok((field_offsets, layout.pad_to_align())) | ||
} | ||
|
||
fn layout_max(a: Layout, b: Layout) -> Result<Layout, LayoutError> { | ||
Layout::from_size_align( | ||
a.size().max(b.size()), | ||
a.align().max(b.align()), | ||
) | ||
} | ||
|
||
/// Takes in the layout of each field (in declaration order) | ||
/// and returns the layout of the entire union | ||
/// NOTE: all fields of the union are located at offset 0 | ||
fn get_layout_for_union(field_layouts: &[Layout]) -> Result<Layout, LayoutError> { | ||
let mut layout = Layout::new::<()>(); | ||
|
||
for &field in field_layouts { | ||
layout = layout_max(layout, field)?; | ||
} | ||
|
||
Ok(layout.pad_to_align()) | ||
} | ||
|
||
/// Takes in the layout of each variant (and their fields) (in declaration order) | ||
/// and returns the offsets of all fields of the enum, and the layout of the entire enum | ||
/// NOTE: the enum tag is always at offset 0 | ||
fn get_layout_for_enum( | ||
// the discriminants may be negative for some enums | ||
// or u128::MAX for some enums, so there is no one primitive integer type which works. So BigInteger | ||
discriminants: &[BigInteger], | ||
variant_layouts: &[&[Layout]] | ||
) -> Result<(Vec<Vec<usize>>, Layout), LayoutError> { | ||
assert_eq!(discriminants.len(), variant_layouts.len()); | ||
|
||
let mut layout = Layout::new::<()>(); | ||
let mut variant_field_offsets = Vec::new(); | ||
|
||
let mut variant_with_tag = Vec::new(); | ||
// gives the smallest integer type which can represent the variants and the specified discriminants | ||
let tag_layout = get_layout_for_tag(discriminants); | ||
// ensure that the tag is the first field | ||
variant_with_tag.push(tag_layout); | ||
|
||
for &variant in variant_layouts { | ||
variant_with_tag.truncate(1); | ||
// put all other fields of the variant after this one | ||
variant_with_tag.extend_from_slice(variant); | ||
let (mut offsets, variant_layout) = get_layout_for_struct(&variant_with_tag)?; | ||
// remove the tag so the caller only gets the fields they provided in order | ||
offsets.remove(0); | ||
|
||
variant_field_offsets.push(offsets); | ||
layout = layout_max(layout, variant_layout)?; | ||
} | ||
|
||
Ok((variant_field_offsets, layout)) | ||
} | ||
``` | ||
### Migration to `repr(ordered_fields)` | ||
|
||
The migration will be handled as follows: | ||
* after `repr(ordered_fields)` is implemented | ||
* add a future compatibility warning for `repr(C)` in all current editions | ||
* at this point both `repr(ordered_fields)` and `repr(C)` will have identical behavior | ||
* the warning will come with a machine-applicable fix | ||
* Any crate which does no FFI can just apply the autofix | ||
* Any crate which uses `repr(C)` for FFI can ignore the warning crate-wide | ||
* Any crate which mixes both must do extra work to figure out which is which. (This is likely a tiny minority of crate) | ||
* Once the next edition rolls around (2027?), `repr(C)` on the new edition will *not* warn. Instead the meaning will have changed to mean *only* compatibility with C. The docs should be adjusted to mention this edition wrinkle. | ||
* The warning for previous editions will continue to be in effect | ||
* In some future edition (2033+), when it is deemed safe enough, the future compatibility warnings may be removed in editions <= 2024 | ||
* This should have given plenty of time for anyone who was going to update their code to do so. And doesn't burden the language indefinitely | ||
* This part isn't strictly necessary, and can be removed if needed | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
* This will cause a large amount of churn in the Rust ecosystem | ||
* If we don't end up switching `repr(C)` to mean the system layout/ABI, then we will have two identical reprs, which may cause confusion. | ||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
* `crabi`: http://github.com/rust-lang/rfcs/pull/3470 | ||
* Currently stuck in limbo since it has a much larger scope. doesn't actually serve to give a consistent cross-platform layout, since it defers to `repr(C)` (and it must, for its stated goals) | ||
* https://internals.rust-lang.org/t/consistent-ordering-of-struct-fileds-across-all-layout-compatible-generics/23247 | ||
* This doesn't give a predictable layout that can be used to match the layouts (or prefixes) of different structs | ||
* https://github.com/rust-lang/rfcs/pull/3718 | ||
* This one is currently stuck due to a larger scope than this RFC | ||
* do nothing | ||
* We keep getting bug reports on Windows (and other platforms), where `repr(C)` doesn't actually match the target C compiler, or we break a bunch of subtle unsafe code to match the target C compiler. | ||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
See Rationale and Alternatives as well | ||
|
||
* https://rust-lang.zulipchat.com/#narrow/channel/213817-t-lang/topic/expand.2Frevise.20repr.28.7BC.2Clinear.2C.2E.2E.2E.7D.29.20for.202024.20edition | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
* The migration plan, as a whole, needs to be ironed out | ||
* Currently, it is just a sketch, but we need timelines, dates, and guarantees to switch `repr(C)` to match the layout algorithm of the target C compiler. | ||
* Before this RFC is accepted, t-compiler will need to commit to fixing the layout algorithm sometime in the next edition. | ||
* The name of the new repr `repr(ordered_fields)` is a mouthful (intentionally for this RFC), maybe we could pick a better name? This could be done after the RFC is accepted. | ||
* `repr(linear)` | ||
* `repr(ordered)` | ||
* `repr(sequential)` | ||
* `repr(consistent)` | ||
* something else? | ||
* Is the ABI of `repr(ordered_fields)` specified (making it safe for FFI)? Or not? | ||
* Should unions expose some niches? | ||
* For example, if all variants of the union are structs which have a common prefix, then any niches of that common prefix could be exposed (i.e. in the enum case, making union of structs behave more like an enum). | ||
* This must be answered before stabilization, as it is set in stone after that | ||
* Should this `repr` be versioned? | ||
* This way we can evolve the repr (for example, by adding new niches) | ||
* Should we change the meaning of `repr(C)` in editions <= 2024 after we have reached edition 2033? Yes, it's a breaking change, but at that point it will likely only be breaking code no one uses. | ||
* Leaning towards no | ||
* Should we warn on `repr(ordered_fields)` when explicit tag type is missing (i.e. no `repr(u8)`/`repr(i32)`) | ||
* Since it's likely they didn't want the same tag type as `C`, and wanted the smallest possible tag type. | ||
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
* Add more reprs for each target C compiler, for example `repr(C_gcc)` or `repr(C_msvc)`, etc. | ||
* This would allow a single Rust app to target multiple compilers robustly, and would make it easier to specify `repr(C)` | ||
* This would also allow fixing code in older editions | ||
* https://internals.rust-lang.org/t/consistent-ordering-of-struct-fileds-across-all-layout-compatible-generics/23247 |
Uh oh!
There was an error while loading. Please reload this page.