Skip to content

Make cleanups and ct-master introduction #11778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Sep 2, 2024
Merged

Make cleanups and ct-master introduction #11778

merged 20 commits into from
Sep 2, 2024

Conversation

lhoguin
Copy link
Contributor

@lhoguin lhoguin commented Jul 22, 2024

More cleanups.

@mergify mergify bot added the make label Jul 22, 2024
@lhoguin lhoguin force-pushed the loic-make-it-big branch from f783df5 to 90cfc39 Compare July 22, 2024 13:55
@lukebakken lukebakken self-requested a review July 22, 2024 15:03
@lhoguin lhoguin marked this pull request as ready for review August 27, 2024 14:38
@lhoguin
Copy link
Contributor Author

lhoguin commented Aug 27, 2024

This is ready for review.

@lhoguin
Copy link
Contributor Author

lhoguin commented Aug 27, 2024

I probably broke tests.

@lhoguin lhoguin force-pushed the loic-make-it-big branch 2 times, most recently from 75b9f28 to e9f59bd Compare August 29, 2024 09:51
@mergify mergify bot added the bazel label Aug 29, 2024
@lhoguin
Copy link
Contributor Author

lhoguin commented Aug 29, 2024

Finally green. I will clean up the commit history before this is ready for review.

lhoguin added 16 commits August 29, 2024 15:18
Ideally we wouldn't need it, but until applications are
in apps/ it will be necessary for a thing or two.

Note that rabbitmq_server_release is required to be
there for prepare-dist:: to work when building the
generic unix package.
Everything in this file seems to be dead code except
ct-slow/ct-fast, which have been replaced by their
equivalent in the rabbit Makefile.
No longer relevant because of the monorepo
Not useful in the monorepo.
This was only relevant before the monorepo.
They haven't been necessary for quite some time.
They are not useful for the monorepo.
Hasn't been used for a long time.
The relevant files have been symlinked to the root file
for the past two years.
Because of the monorepo most components do not need to be
listed. Only the community plugins and third party dependencies.

Community plugins can now be fetched and acted on from the top
level Makefile by adding COMMUNITY_PLUGINS=1 to the command line
or the environment. This will fetch and build community plugins:

  make COMMUNITY_PLUGINS=1

Once fetched they can be targeted directly as usual:

  make -C deps/rabbitmq_metronome

This cleanup has a net positive effect on build performance,
especially the performance of the top-level Makefile:

  make nope  0,04s user 0,02s system 106% cpu 0,061 total
  make nope  0,02s user 0,01s system 104% cpu 0,033 total

But also a minor improvement for application Makefiles:

  make -C deps/rabbit nope  0,02s user 0,00s system 98% cpu 0,022 total
  make -C deps/rabbit nope  0,01s user 0,00s system 98% cpu 0,020 total

And that improvement adds up when going through dependencies:

  make -C deps/rabbitmq_management  0,59s user 0,23s system 100% cpu 0,808 total
  make -C deps/rabbitmq_management  0,60s user 0,19s system 101% cpu 0,780 total
No real need to have two files, especially since it contains
only a few variable definitions. Plan is to only keep
separate files for larger features such as dist or run.
This has no real impact on performance[1] but should
make it clear which application can run the broker
and/or publish to Hex.pm. In particular, applications
that we can't run the broker from will now give up
early if we try to.

Note that while the broker can't normally run from the
amqp_client application's directory, it can run from
tests and some of the tests start the broker.

[1] on my machine
Doing it on a per test suite basis leads to issues if multiple
suites try to configure it, and there's no cleanup performed
anyway.
Because `ct_master` is yet another Erlang node, and it is used
to run multiple CT nodes, meaning it is in a cluster of CT
nodes, the tests that change the net_ticktime could not
work properly anymore. This is because net_ticktime must
be the same value across the cluster.

The same value had to be set for all tests in order to solve
this. This is why it was changed to 5s across the board. The
lower net_ticktime was used in most places to speed up tests
that must deal with cluster failures, so that value is good
enough for these cases.

One test in amqp_client was using the net_ticktime to test
the behavior of the direct connection timeout with varying
net_ticktime configurations. The test now mocks the
`net_kernel:get_net_ticktime()` function to achieve the
same result.
@lhoguin lhoguin changed the title DO NOT MERGE: Loic's next Make PR Make cleanups and ct-master introduction Aug 29, 2024
@lhoguin
Copy link
Contributor Author

lhoguin commented Aug 29, 2024

This is ready for review (for real, this time).

Copy link
Collaborator

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been using this PR for a couple of days, mostly by starting nodes and running test suites "in the background" (in parallel to doing something else).

The speed-up is easy to observe. However, when I run all deps/rabbit suites, some fail with

rabbit_stream_queue_SUITE > cluster_size_3_parallel_4
    {error,
        {{assertEqual,
             [{module,rabbit_ct_broker_helpers},
              {line,1157},
              {expression,"CrashesCount"},
              {expected,0},
              {value,3}]},
         [{rabbit_ct_broker_helpers,find_crashes_in_logs,2,
              [{file,"rabbit_ct_broker_helpers.erl"},{line,1157}]},
          {rabbit_ct_broker_helpers,stop_rabbitmq_nodes,1,
              [{file,"rabbit_ct_broker_helpers.erl"},{line,1117}]},
          {rabbit_ct_helpers,run_steps,2,
              [{file,"rabbit_ct_helpers.erl"},{line,135}]},
          {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
          {test_server,run_test_case_eval1,6,
              [{file,"test_server.erl"},{line,1390}]},
          {test_server,run_test_case_eval,9,
              [{file,"test_server.erl"},{line,1234}]}]}}

unit_cluster_formation_locking_mocks_SUITE > non_parallel_tests > init_with_lock_ignore_after_errors
    #1. {error,
            {{assertEqual,
                 [{module,unit_cluster_formation_locking_mocks_SUITE},
                  {line,63},
                  {expression,
                      "rabbit_peer_discovery : join_selected_node ( rabbit_peer_discovery_classic_config , missing@localhost , disc )"},
                  {expected,
                      {error,
                          {aborted_feature_flags_compat_check,
                              {error,feature_flags_file_not_set}}}},
                  {value,
                      {error,
                          {aborted_feature_flags_compat_check,
                              {error,{erpc,noconnection}}}}}]},
             [{unit_cluster_formation_locking_mocks_SUITE,
                  init_with_lock_ignore_after_errors,1,
                  [{file,"unit_cluster_formation_locking_mocks_SUITE.erl"},
                   {line,63}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1302}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,1234}]}]}}

unit_cluster_formation_locking_mocks_SUITE > non_parallel_tests > init_with_lock_not_supported
    #1. {error,
            {{assertEqual,
                 [{module,unit_cluster_formation_locking_mocks_SUITE},
                  {line,71},
                  {expression,
                      "rabbit_peer_discovery : join_selected_node ( rabbit_peer_discovery_classic_config , missing@localhost , disc )"},
                  {expected,
                      {error,
                          {aborted_feature_flags_compat_check,
                              {error,feature_flags_file_not_set}}}},
                  {value,
                      {error,
                          {aborted_feature_flags_compat_check,
                              {error,{erpc,noconnection}}}}}]},
             [{unit_cluster_formation_locking_mocks_SUITE,
                  init_with_lock_not_supported,1,
                  [{file,"unit_cluster_formation_locking_mocks_SUITE.erl"},
                   {line,71}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1302}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,1234}]}]}}

unit_cluster_formation_locking_mocks_SUITE > non_parallel_tests > init_with_lock_supported
    #1. {error,
            {{assertEqual,
                 [{module,unit_cluster_formation_locking_mocks_SUITE},
                  {line,80},
                  {expression,
                      "rabbit_peer_discovery : join_selected_node ( rabbit_peer_discovery_classic_config , missing@localhost , disc )"},
                  {expected,
                      {error,
                          {aborted_feature_flags_compat_check,
                              {error,feature_flags_file_not_set}}}},
                  {value,
                      {error,
                          {aborted_feature_flags_compat_check,
                              {error,{erpc,noconnection}}}}}]},
             [{unit_cluster_formation_locking_mocks_SUITE,
                  init_with_lock_supported,1,
                  [{file,"unit_cluster_formation_locking_mocks_SUITE.erl"},
                   {line,80}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1302}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,1234}]}]}}

unit_config_value_encryption_SUITE > sequential_tests > decrypt_config
    #1. {error,
            {{badmatch,{error,{already_loaded,rabbit}}},
             [{unit_config_value_encryption_SUITE,do_decrypt_config,1,
                  [{file,"unit_config_value_encryption_SUITE.erl"},{line,71}]},
              {unit_config_value_encryption_SUITE,
                  '-decrypt_config/1-lc$^2/1-2-',6,
                  [{file,"unit_config_value_encryption_SUITE.erl"},{line,66}]},
              {unit_config_value_encryption_SUITE,decrypt_config,1,
                  [{file,"unit_config_value_encryption_SUITE.erl"},{line,67}]},
              {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1793}]},
              {test_server,run_test_case_eval1,6,
                  [{file,"test_server.erl"},{line,1302}]},
              {test_server,run_test_case_eval,9,
                  [{file,"test_server.erl"},{line,1234}]}]}}

which might have something to do with the changes to how CT suites are started.

Otherwise functionally it works as expected and the speed-up of many operations is really appreciated.

@lhoguin
Copy link
Contributor Author

lhoguin commented Sep 2, 2024

Make sure to make -C deps/rabbitmq_ct_helpers so it catches the changes. Right now it is not always rebuilt (because our projects are in deps/). This should help solve the noconnection issues. If not, then perhaps something else is afoot.

The first one about crashes in logs is expected and should not result in test failure.

The last one about rabbit already being loaded... I'm not sure. But since it's loading rabbit within the test we can change the test to accomodate. I will look into it.

Seems that this can happen if multiple test suites are running
one after the other and a previous test suite did not clean up
perfectly.
@lhoguin lhoguin merged commit f0932e3 into main Sep 2, 2024
197 checks passed
@lhoguin lhoguin deleted the loic-make-it-big branch September 2, 2024 11:54
@michaelklishin
Copy link
Collaborator

FTR, there was an automatically generated follow-up PR #12197.

@lhoguin
Copy link
Contributor Author

lhoguin commented Sep 3, 2024

Right, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants