Julia bindings to BSC's Extrae HPC profiler.
It supports automatic instrumentation (through the LD_PRELOAD mechanism; DynInst support is on the way) of MPI, CUDA and pthreads, as well as PAPI/PMAPI hardware counters and call stack sampling.
Generated traces can be viewed with Paraver.
It was presented at JuliaCon 2024 in Eindhoven.
If you want to perform a system-wide installation of Extrae, we recommend following this guide. You can find more details in section 3, "Configuration, build and installation", of the documentation.
If you want to use the BinaryBuilder-built artifact, you only need to add Extrae.jl as a dependency of your project.
First, you need to set the Extrae configuration using environment variables or an XML configuration file. An example configuration file can be found in scripts/extrae.xml and in section 9, "An example of Extrae XML configuration file", of the documentation.
More information about the configuration options can be found in section 4, "Extrae XML configuration file", and in section 10, "Environment variables", of the documentation.
Extrae's functionality is very basic: every registered event is just a pair of integers, the event type and the event value.
Some events are automatically registered, such as MPI call names when you are tracing or PAPI hardware counters when performing sampling.
You can also emit your own custom events using emit:
# emit event 80000 with value 4
emit(80_000, 4)
Event types are encoded as Int32 and values must always be Int64.
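As a small illustration of the integer widths involved, the following sketch uses plain Julia only. The helper `as_event` is hypothetical and not part of Extrae.jl; it simply shows how a type code and a value can be normalized to the `(Int32, Int64)` pair described above, and that an out-of-range type code raises an error instead of being silently truncated.

```julia
# Hypothetical helper (not part of Extrae.jl): normalize an event to the
# (Int32 type, Int64 value) pair described above.
as_event(type::Integer, value::Integer) = (Int32(type), Int64(value))

ev = as_event(80_000, 4)

# Int32(...) throws an InexactError for a type code that does not fit in
# 32 bits, rather than wrapping around.
overflowed = try
    as_event(Int64(2)^40, 0)
    false
catch err
    err isa InexactError
end
```

Checking the range on your side like this can make a mistyped type code easier to spot before it reaches the tracer.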
If you want to assign a string descriptor to the event, you should call Extrae.register before initialization.
const BANANAS_TYPECODE::Int32 = 80_000
Extrae.register(BANANAS_TYPECODE, "Bananas")
You can also add string descriptors to the values.
Extrae.register(Int32(80_001), "Monkey name", Int64[0,1,2], String["no monkey", "louis", "george"])
Extrae can be initialized just by calling Extrae.init(). If you are planning to use Distributed, you should call
@everywhere Extrae.init(Val(:Distributed))
to properly initialize the profiler in all workers. If you plan to use MPI, you should use the LD_PRELOAD mechanism.
Profiling is finished with Extrae.finish().
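Putting the pieces together, a minimal single-process session might look like the sketch below. It only uses the calls mentioned in this README (`Extrae.register`, `Extrae.init`, `emit`, `Extrae.finish`). The type code `80_100`, the descriptor "Iteration" and the function `traced_sum` are made up for the illustration, and the sketch assumes Extrae.jl is installed and configured as described above.

```julia
using Extrae  # assumes Extrae.jl is installed and that `emit` is exported

# Hypothetical event type code, chosen only for this illustration.
const ITERATION_TYPECODE = 80_100

# Descriptors are registered before initialization.
Extrae.register(Int32(ITERATION_TYPECODE), "Iteration")

# Hypothetical workload: emit one custom event per loop iteration.
function traced_sum(n)
    total = 0
    for i in 1:n
        emit(ITERATION_TYPECODE, i)  # event value = current iteration
        total += i
    end
    return total
end

Extrae.init()
total = traced_sum(3)
Extrae.finish()
```

The resulting trace can then be opened in Paraver, where the custom event should appear under the registered descriptor.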
Often the profiler captures much more information than we want. One way to filter it is to mark which parts of the trace were devoted to user code. This can be done by calling Extrae.user_function(1) to start and Extrae.user_function(0) to end the marked region.
We also provide an Extrae.@user_function macro for cleaner code.
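If you mark regions by hand, it may be worth making sure the region is closed even when the marked code throws. The sketch below wraps the two `Extrae.user_function` calls in a `try`/`finally` block. The helper `with_user_region` is hypothetical, not part of Extrae.jl, and is not meant to describe how `Extrae.@user_function` is implemented.

```julia
using Extrae  # assumes Extrae.jl is installed

# Hypothetical helper (not part of Extrae.jl): run `f` inside a marked
# user-code region, closing the region even if `f` throws.
function with_user_region(f)
    Extrae.user_function(1)      # start of the marked region
    try
        return f()
    finally
        Extrae.user_function(0)  # end of the marked region
    end
end

Extrae.init()
result = with_user_region() do
    sum(abs2, 1:10)  # stand-in for real user code
end
Extrae.finish()
```

Without the `finally` clause, an exception inside the marked code would leave the region open for the rest of the trace.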
See #23. Base and Core functions appear in the profile, but user functions do not. This is because Extrae and Julia do not yet communicate the symbols of JIT-compiled functions. It does not affect sysimages, because Extrae can read function names from the sysimage file.
A solution is in the works; meanwhile, you can work around the issue by compiling the user functions into a sysimage.
Binaries built through BinaryBuilder are dynamically linked, but the system linker is unable to find the remaining dependencies because the rpaths are configured differently. The current workaround is to manually preload all the required dependencies. We are looking for a way to make this easier.
Alternatively, you can use system-installed binaries in your cluster (recommended option).