
Conversation

@vinothk-master (Contributor) commented Aug 13, 2025

Hi @kayoub5

I have made the following changes:

  • provide folder with multiple pdx files as argument
  • load all pdx files in folder, sort them alphabetically, compare databases pairwise
  • When multiple pdx files are analyzed, a summary of all changes might be helpful
  • calculate metrics (number of added services, number of changed services, number of removed services, total number of services per ecu variant, ...)
  • display metrics graphically

Thanks
Vinoth Kannan
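The folder-based pairwise flow described in the bullet points above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `compare` callback stands in for the odxtools comparison logic, and all names here are illustrative.

```python
from pathlib import Path
from typing import Callable


def compare_pdx_folder(folder: str,
                       compare: Callable[[str, str], dict]) -> list[dict]:
    """Collect all .pdx files in `folder`, sort them alphabetically and
    compare consecutive pairs (1st vs 2nd, 2nd vs 3rd, ...)."""
    pdx_files = sorted(str(p) for p in Path(folder).glob("*.pdx"))
    summary_results = []
    for file_a, file_b in zip(pdx_files, pdx_files[1:]):
        summary_results.append(compare(file_a, file_b))
    return summary_results
```

With n files this produces n-1 pairwise comparison results, which is what the summary metrics are later aggregated from.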

vinothk-master and others added 30 commits February 22, 2025 17:19
New changes from parent repo
@andlaus (Member) commented Aug 13, 2025

thanks for the contribution. @kakoeh: can you review this?

@andlaus (Member) left a comment

this looks promising, but it still needs a bit of work. (I have not yet looked at the "meat" of the PR, and I have some doubts about the usefulness of this feature as implemented by this PR. Any opinions, @kakoeh?)

One thing I noticed is that the example PDX files get modified even though there is absolutely no reason why they should be. (GitHub does not allow this to be commented on inline in reviews. The reason this happens is that there are timestamps in PDX files, so mksomersaultpdx.py creates a slightly different file every time it is called...)

summary_results = []
for i in range(len(pdx_files) - 1):
    file_a = pdx_files[i]
    file_b = pdx_files[i + 1]
Member:

this does not really make sense IMO: the desired order of files to be compared pairwise might not be lexicographical. Perhaps the whole pdx_files list ought to be specified as command-line parameters? (This would also remove the need to walk over the specified directory.)
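One way to sketch that alternative: take the file list as positional arguments, so the user's order is authoritative. The argument name here is illustrative and does not reflect the existing odxtools CLI.

```python
import argparse

parser = argparse.ArgumentParser(description="Compare PDX files pairwise")
parser.add_argument("pdx_files", nargs="+",
                    help="two or more PDX files, compared in the given order")
args = parser.parse_args(["b.pdx", "a.pdx", "c.pdx"])

# the order is taken verbatim -- no lexicographic sort, no directory walk
pairs = list(zip(args.pdx_files, args.pdx_files[1:]))
```

Here the user's ordering `b.pdx, a.pdx, c.pdx` is preserved, yielding the pairs (b, a) and (a, c).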

Contributor Author:

Hi @andlaus
Thanks for the suggestion!

Actually, as per the comment from @kayoub5:

"load all pdx files in folder, sort them alphabetically, compare databases pairwise"

This is the stated condition. Should we stick with alphabetical order, or compare the files without sorting?

Could you please provide your input?

Thanks
Vinoth Kannan

Contributor:

There is still the opportunity to pass multiple PDX files in a custom order with the command-line parameter -db. From my perspective, -f FOLDER would be used when many files exist and/or typing the list of databases is considered too cumbersome. But @andlaus, you're right, lexicographical order might not be desired. Assuming that the files in the folder already sort as wanted, pdx_files.sort() could be omitted.

    rich_print("Please specify either a database or variant for a comparison")
#elif args.folder:
elif hasattr(args, "folder") and args.folder:
    print("Now printing the pdx files in folder")
Member:

Suggested change
print("Now printing the pdx files in folder")
print(f"PDX files in folder {args.folder}: {','.join(pdx_files)}")

(you need to move this to line 743)

if args.output:
    with open(args.output, "w") as f:
        json.dump(summary_results, f, indent=4)
print_change_metrics(summary_results)
Member:

isn't there already a function which compares two complete PDX files? why not call it here?

Contributor Author:

Hi @andlaus

I have reused the compare_diagnostic_layers() function to compare different pdx files

Suggested change
print_change_metrics(summary_results)
changes = task.compare_diagnostic_layers(layer_a, layer_b)

Thanks
Vinoth

Contributor:

Hi @vinothk-master,
@andlaus is right, there is already a function which compares 2 PDX files: compare_databases()
Thus, db_changes = task.compare_databases(db_a, db_b) could be used and replace the loop for name in diagnostic_layer_names where the change metrics are calculated per diagnostic layer.
Instead of printing changes for each diagnostic layer individually with print_dl_metrics([layer_a, layer_b]), task.print_database_changes(db_changes) could then be used.
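Sketched against the names quoted in this thread (the actual odxtools signatures may differ), the per-pair flow would shrink to two calls:

```python
def summarize_pair(task, db_a, db_b):
    """Compare two loaded databases in one call instead of looping over
    diagnostic layers by name (API names as quoted in this thread)."""
    db_changes = task.compare_databases(db_a, db_b)   # replaces the per-layer loop
    task.print_database_changes(db_changes)           # replaces print_dl_metrics(...)
    return db_changes
```

This keeps the comparison logic in one place, so the folder mode reuses exactly what -db already does.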

@@ -693,7 +710,7 @@ def run(args: argparse.Namespace) -> None:
    dl for db in task.databases for dl in db.diag_layers
    if dl.short_name in task.diagnostic_layer_names
]

# print("NAMES: ",task.diagnostic_layer_names)
Member:

dead code

@vinothk-master (Contributor Author) left a comment

Hi Team

  • Removed the dead code
  • Changed hasattr to getattr

Thanks
Vinoth



table.add_column("Services Changed", justify="right", style="yellow")
table.add_column("Services Deleted", justify="right", style="yellow")
table.add_column("Changed Parameters", justify="right", style="yellow")
#
Contributor:

Variants and Diag Layer refer to the same object, therefore I wouldn't print the comparison details for each Diag Layer. See suggestion for table design here.

table.add_row(file_pair, str(m["diag_layer"]), str(m["diag_layer_type"]),
              str(m["num_variants_added"]), str(m["num_variants_changed"]),
              str(m["num_variants_deleted"]), str(m["num_new_services"]),
              str(m["num_deleted_services"]), str(m["num_renamed_services"]),
Contributor:

According to the column headers, the order is Services Added, Services Changed, Services Deleted. Moreover, both renamed services and services with parameter changes should be characterized as Services Changed from my point of view.

Suggested change
str(m["num_deleted_services"]), str(m["num_renamed_services"]),
str(m["num_renamed_services"]+m["num_changed_parameters"]),
str(m["num_deleted_services"]))

except Exception:
    old_names = [
        item for sublist in changes.changed_name_of_service for item in sublist
    ]
Contributor:

The list changes.changed_name_of_service is defined as [new_names, old_names] where

  • new_names is of type list[DiagService] and
  • old_names is of type list[str].

Moreover, changed_name_of_service is a mandatory parameter of ServiceDiff, so lines 769-776 could be deleted and num_renamed calculated as num_renamed = len(changes.changed_name_of_service[0])
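To illustrate the [new_names, old_names] layout described above (the sample service names are made up, and plain strings stand in for the DiagService objects):

```python
# changed_name_of_service = [new_names, old_names]; in the real ServiceDiff
# new_names holds DiagService objects, old_names plain strings
changed_name_of_service = [
    ["Session_Start_v2", "Session_Stop_v2"],  # new names (list[DiagService])
    ["Session_Start", "Session_Stop"],        # old names (list[str])
]

# both sublists have the same length, so the first one suffices for counting
num_renamed = len(changed_name_of_service[0])
```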

getattr(changes, "changed_parameters_of_service", []) or [])

# collect which services had parameter changes (unique)
for param_detail in getattr(changes, "changed_parameters_of_service", []) or []:
Contributor:

Suggested change
for param_detail in getattr(changes, "changed_parameters_of_service", []) or []:
for param_detail in changes.changed_parameters_of_service:

"num_deleted_services": len(changes.deleted_services),
"num_renamed_services": len(changes.changed_name_of_service[0]),
"num_changed_parameters": len(changes.changed_parameters_of_service)
})
Contributor:

I would only add the change metrics for services here, as "variant" is synonymous with Diag Layer.
Suggestion:
Add before the loop for name in diagnostic_layer_names:

            summary_file = {
                "base_file": os.path.basename(file_a),
                "comparison_file": os.path.basename(file_b), 
                "num_variants_added": len(variants_added),
                "num_variants_changed": variants_changed_count,
                "num_variants_deleted": len(variants_deleted),
                "changed_variants": []
            }

Add at the end of the loop for name in diagnostic_layer_names:

                summary_file.get("changed_variants").append(
                    {
                        "diag_layer": changes.diag_layer,
                        "diag_layer_type": changes.diag_layer_type,
                        "num_new_services": num_new,
                        "num_deleted_services": num_deleted,
                        "num_renamed_services": num_renamed,
                        "num_changed_parameters": num_changed_params
                    }
                )

Add after the loop for name in diagnostic_layer_names:

            summary_results.append(summary_file)

print(json.dumps(summary_results, indent=4))
if args.output:
    with open(args.output, "w") as f:
        json.dump(summary_results, f, indent=4)
Contributor:

Idea:
If the contents of SpecsChangesVariants and ServiceDiff were mapped to the dictionary summary_results in an independent function, the changes could also be printed to a JSON file when the arguments -db or -v are given.
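A rough sketch of such an independent mapping function, using the attribute names quoted in this thread (the real SpecsChangesVariants/ServiceDiff fields may differ):

```python
def changes_to_summary(changes) -> dict:
    """Map a diff object to a JSON-serializable dict so that -db, -v and
    the folder mode can all reuse the same export path."""
    return {
        "diag_layer": changes.diag_layer,
        "diag_layer_type": changes.diag_layer_type,
        "num_new_services": len(changes.new_services),
        "num_deleted_services": len(changes.deleted_services),
        "num_renamed_services": len(changes.changed_name_of_service[0]),
        "num_changed_parameters": len(changes.changed_parameters_of_service),
    }
```

A caller could then json.dump() the result regardless of which command-line mode produced the diff.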


print("New services:", services_b - services_a)
print("Deleted services:", services_a - services_b)
print_dl_metrics([layer_a, layer_b])
Contributor:

Services are defined per diagnostic layer, as is the function print_dl_metrics([layer_a, layer_b]). Lines 804-811 should therefore be part of the loop for name in diagnostic_layer_names.
Moreover, for bigger PDX files with many services, printing all services for each file might be overwhelming. I'd suggest printing only the changes.

Suggested change
print_dl_metrics([layer_a, layer_b])
services_a = {srv.short_name for srv in layer_a.services}
services_b = {srv.short_name for srv in layer_b.services}
print("New services:", services_b - services_a)
print("Deleted services:", services_a - services_b)
print_dl_metrics([layer_a, layer_b])

Contributor:

I agree with @andlaus, based on the added code, there is no necessity to upload the files somersault.pdx, somersault_modified.pdx and somersault_modified_1.pdx. Maybe it's best to add *.pdx to your .gitignore file

Member:

Maybe it's best to add *.pdx to your .gitignore file

the problem with that is that .gitignore is version controlled, i.e., it is quite easy to accidentally commit such a change. That said, maybe PDX files in the odxtools source tree ought to be ignored for everyone, i.e., changes to the somersault PDX files would need to be explicitly staged using git add -f? Opinions?

Contributor:

Sounds reasonable

@andlaus (Member) commented Aug 27, 2025

@vinothk-master: how do you intend to proceed with this PR? (If you want to continue working on it, it would IMO be great if you coordinated your efforts with @kakoeh and her work on #442 ...)

@vinothk-master (Contributor Author)

@vinothk-master: how do you intend to proceed with this PR? (if you want to continue working on it, it would IMO be great if you coordinated you efforts with @kakoeh and her work on #442 ...)

Hi @andlaus, I'd like to keep working on this PR. Let me cross-check with #442 and provide my input.
Thanks
Vinoth Kannan

vinothk-master and others added 4 commits August 27, 2025 19:01
Co-authored-by: Katja Köhler <katja.koehler@mercedes-benz.com>
Co-authored-by: Katja Köhler <katja.koehler@mercedes-benz.com>
Co-authored-by: Katja Köhler <katja.koehler@mercedes-benz.com>