Labels #789
Open: Aske-Rosted wants to merge 99 commits into graphnet-team:main from Aske-Rosted:labels
+1,895
−100
Merge from main
merge from main
merge from main
This was referenced Apr 21, 2025
Merged
Merged
Merged
Merged
This PR is closed by #807, isn't it?
Hey all.
This PR has been in progress for a lot longer than I initially expected and has also grown to be quite unruly in size. I will do my best to describe everything that has been added and/or changed.
Discussion about the resulting labels is both encouraged and very much appreciated. It seems to me that there might be others in the IceCube collaboration who have been working on similar MC labels for training machine learning algorithms, and I would especially like to hear your input.
The aim was to add some labels that could be used for training machine learning algorithms. Two main approaches have been taken: a calorimetric approach, in which the labels try to describe all the energy deposited in and around the detector during the event window, and a pseudo-truth particle approach, in which the labels try to describe the particle (or close bunch of particles) that produced a signal in the detector.
I am currently processing a large dataset using the new labels and could upload some plots at a later time if necessary. I have created some plots comparing the energy distributions of these new labels with the Homogenized-Qtot and the InIceNeutrino energy in order to verify, at least to some degree, that the labels are working as intended. (Example below)


I also looked into the SHAP values of BDTs trained on the truth MC labels, with the regression target being the energy of the in-ice neutrino.
As there are quite a lot of these plots and they are quite difficult to parse, I will not upload all of them here.
The new label extractors
- `e_entrance_track_`: Total energy at the time of entrance of all tracks entering the detector volume.
- `e_deposited_track_`: Total deposited energy of all tracks entering the detector volume.
- `e_cascade_`: Total energy of cascades contained inside the detector volume.
- `e_visible_`: `e_entrance_track_` + `e_cascade_`
- `fraction_primary_`: `e_visible_` as a fraction of the primary particle(s) energy
- `fraction_cascade_`: `e_cascade_` / `e_visible_`
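As a minimal sketch of how the derived calorimetric labels above relate to one another, assuming the three base energies are already available as plain floats in the same unit. The function name `derived_calorimetric_labels` and the zero-energy guards are illustrative assumptions, not code from this PR:

```python
def derived_calorimetric_labels(e_entrance_track, e_cascade, e_primary):
    """Combine base energies (same unit, e.g. GeV) into the derived labels."""
    e_visible = e_entrance_track + e_cascade
    # Guarding against division by zero with 0.0 is an assumption of this sketch.
    fraction_primary = e_visible / e_primary if e_primary > 0 else 0.0
    fraction_cascade = e_cascade / e_visible if e_visible > 0 else 0.0
    return {
        "e_visible_": e_visible,
        "fraction_primary_": fraction_primary,
        "fraction_cascade_": fraction_cascade,
    }


# Hypothetical event: 30 GeV entering as tracks, 10 GeV in contained cascades,
# 100 GeV primary.
labels = derived_calorimetric_labels(
    e_entrance_track=30.0, e_cascade=10.0, e_primary=100.0
)
```

With these made-up inputs, `e_visible_` is the sum of the two base energies, and both fractions fall between 0 and 1.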
- `e_fraction_`: `e_on_entrance_` divided by the energy of the primary
- `distance_`: For cascades and for starting and contained tracks, this is the distance from the center of the detector to the interaction vertex; for stopping and throughgoing tracks, it is the distance to where the particle first enters the detector volume.
- `e_on_entrance_`: Energy of the Highest Energy Particle (HEP) entering the detector volume, taken when the particle, or production from the particle, first becomes visible. The definition of this energy varies depending on whether the HEP is starting, a track entering, or a bundle.
- `zenith_`: zenith angle of the HEP
- `azimuth_`: azimuth angle of the HEP
- `dir_x_`: x direction of the HEP
- `dir_y_`: y direction of the HEP
- `dir_z_`: z direction of the HEP
- `pos_x_`: x position of the HEP (see `distance_`)
- `pos_y_`: y position of the HEP (see `distance_`)
- `pos_z_`: z position of the HEP (see `distance_`)
- `time_`: time of the HEP at the given position
- `length_`: full length of the HEP track (should maybe be removed so as not to be confused with `visible_length_`)
- `visible_length_`: visible length of the track, or length of the maximum expansion of the cascade, inside the detector.
- `trackness_`: fraction of energy produced by track-like interactions.
- `interaction_shape_`: shape of the interaction
- `particle_type_`: particle type of the HEP
- `containment_`: containment of the HEP
- `parent_type_`: the particle ID of the parent of the HEP
`scipy.spatial` `ConvexHull` class, which can be used to effectively determine whether a point or an array of points is located inside or outside of a volume spanned by a collection of points.
`MuonGun.extruded_polygon` to also be able to determine the intersection points of vectors with the surface of the hull.
Additions: Completely new additions to GraphNet
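To illustrate the containment question described above (is a point inside the detector volume?), here is a dependency-free sketch. It is not the `ConvexHull`-based code from this PR; it swaps in a simpler technique, a half-plane test against a convex polygon footprint extruded along z. The function name and the toy geometry are hypothetical.

```python
def inside_extruded_polygon(point, footprint, z_min, z_max):
    """Return True if `point` (x, y, z) lies inside or on the prism.

    `footprint` lists the (x, y) vertices of a convex polygon in
    counter-clockwise order; the prism spans z_min <= z <= z_max.
    """
    x, y, z = point
    if not (z_min <= z <= z_max):
        return False
    n = len(footprint)
    for i in range(n):
        x0, y0 = footprint[i]
        x1, y1 = footprint[(i + 1) % n]
        # 2D cross product of the edge with the vector to the point:
        # negative means the point is to the right of a CCW edge, i.e. outside.
        if (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0) < 0:
            return False
    return True


# Toy geometry (not the real detector): a 1000 m square footprint, 1000 m tall.
square = [(-500.0, -500.0), (500.0, -500.0), (500.0, 500.0), (-500.0, 500.0)]
contained = inside_extruded_polygon((0.0, 0.0, 0.0), square, -500.0, 500.0)
outside = inside_extruded_polygon((600.0, 0.0, 0.0), square, -500.0, 500.0)
```

A general convex hull in three dimensions needs the same kind of test against every facet plane, which is what a hull library provides; the prism case only needs the footprint edges plus a z-range check.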
Major change: Files that have seen major changes necessitated by the file conversion
- `get_primaries` function which can be inherited by classes. This leads to having to define whether the file is a CORSIKA file, since determining the primary of the muon bundle has to be handled differently.
- `check_primary_energy` function, which tries to handle cases where the energy of the primary particle is not set. Usually an identical particle exists as a daughter of the particle with the missing energy, which can then be inserted as the new primary. This function can be called on both a list of primaries and a single primary.
- `remove_originals`: Allows the user to toggle whether or not the input files should be deleted after merging. This is done only after all files have been merged, so files are not removed if the script terminates with an error. (This, however, also limits the usefulness of removing the files, since you still need to have the full space available.)
- `reset_integer_primary_key`: Allows the user to indicate that the key used for indexing the SQLite db needs to be reset. This is useful in instances where different processes were used to create the db files which are being merged.
Minor changes: all the minor stuff I encountered and had to fix while working on this contribution.
- `gcd_file` value for combined extractors
- `SQLiteWriter`