Skip to content

run_qemu.sh: do not automatically modprobe 'cxl_test' at boot time #185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Apr 30, 2025

Boot time is not a good time to debug issues like
pmem/ndctl#278 or any other really.

git -C ndctl/ grep 'modprobe.*cxl_test' shows that pretty much every test already loads and unloads cxl_test by itself. If some exotic test managed to avoid doing this until today, then falling back in line is overdue.

PS: the "depmod_load_..." variable names were wrong: this was relying on systemd, not depmod.

Boot time is not a good time to debug issues like
pmem/ndctl#278 or any other really.

`git -C ndctl/ grep 'modprobe.*cxl_test'` shows that pretty much every
test already loads and unloads cxl_test by itself. If some exotic test
managed to avoid doing this until today, then falling back in line is
overdue.

PS: the "depmod_load_..." variable names were wrong: this was relying on
systemd, not depmod.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb marc-hb marked this pull request as ready for review April 30, 2025 23:25
@marc-hb marc-hb requested a review from stellarhopper as a code owner April 30, 2025 23:25
@AlisonSchofield
Copy link
Contributor

I don't know about this one. The history has been that when I select --cxl-test on the cmdline I get the cxl-test environment upon boot up and am ready to explore. I may be trying out any number of things in that environment, including, but not limited to running unit tests. Remember cxl-test is an environment, not only a unit test vehicle.
I'd rather it stay as is and present me with the usable environment upon boot up.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 30, 2025

The history has been that when I select --cxl-test on the cmdline I get the cxl-test environment upon boot up and am ready to explore.

Even after this PR, you are only one, simple and obvious modprobe cxl_test command away from that exploration. Choice.

Before this PR, no choice: pmem/ndctl#278 crash (and others) immediately at boot time and very obscure how to "opt out". For weeks, I didn't even realize run_qemu.sh was responsible and really wondered how to disable this.
I also struggled to connect messages in the logs to loading that module: logs are extremely noisy at boot time.
Depending on the issue it's not obvious the system can be recovered at all. cxl_test is a kernel module test hack after all.

@AlisonSchofield
Copy link
Contributor

The 'why' in this pull request is: "Boot time is not a good time to debug issues like pmem/ndctl#278 or any other really."

Following the link to issue #278 the story doesn't get much crisper. Why does this change need to be made?

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 1, 2025

"Boot time is not a good time to debug issues"

Boot time is a very busy time, so it's impossible to focus on one thing at time. This means:

  • logs are very noisy, hard to see what's relevant and what is not
  • races are very common

Making environment changes (log levels, tracing, init state,...) is also MUCH harder because it can require rebuilding the image as opposed to simply making the changes in place. Depending on the change desired, this can require advanced mkosi knowledge or even hacking. It also requires one to keep track of all image changes made as opposed to just rebooting a fresh image when needed.

There's really nothing specific to this PR there.

Applied to cxl_test: this PR allows "exploration" before and after loading.

@AlisonSchofield
Copy link
Contributor

If this is a change to make debugging something else easier, why not do that on a debug branch and only commit a change like this to main when it's deemed required?

FWIW adjusting the cxl dyndbg settings is useful for controlling the spew during boot and runtime.

@stellarhopper
Copy link
Member

I'd added this as a convenience thing - agree with @AlisonSchofield in that a lot of times you just want to look at something in cxl_test manually, not just run the test suite, and for those times it is nice to have it pre loaded by the time you get to a shell.

If it is causing actual problems (I'm not yet sure what the motivation for this is other than general cleanliness?) - sure removing it makes sense, but otherwise I think it's nice to have. I'll leave it to the folks who use this day to day for an ack/nack.

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 19, 2025

I'd added this as a convenience thing

There is indeed a big convenience difference. Both use cases are valid. But carrying this patch locally on every test system is pretty inconvenient and that's assuming you even know this patch exists. When you don't know it exists, the inconvenience is huge.

On the other, modprobe cxl_test is obvious and its inconvenience is very small.

@marc-hb

This comment was marked as resolved.

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 19, 2025

I found a surprising modprobe -r cxl_test behavior ...

That's because modprobe -r cxl_test removes cxl_acpi. It does not happen with rmmod cxl_test. That behavior is not intuitive because it causes cxl_test to affect non-cxl_test devices but there seems to be some logic to it. Thanks @davejiang for solving this mystery! EDIT: cxl list -i shows the devices, thanks @AlisonSchofield

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants