-
Notifications
You must be signed in to change notification settings - Fork 93
Improve compare tool #440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Improve compare tool #440
Conversation
…rmat-source.sh etc.
…o refractor_compare_tool
…o refractor_compare_tool
…o refractor_compare_tool
New changes from parent repo
…o improve_compare_tool
thanks for the contribution. @kakoeh: can you review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks promising, but it still needs a bit of work. (I have not yet looked at the "meat" of the PR and I have some doubts about the usefulness of this feature as it is implemented by this PR (any opinions @kakoeh ?))
One thing I noticed is that the example PDX files get modified even though there is absolutely no reason why they should be (github does not allow this to be commented inline in reviews. The reason why this is there is that there are timestamps in PDX files, thus mksomersaultpdx.py
creates a slightly different file every time it is called...)
summary_results = [] | ||
for i in range(len(pdx_files) - 1): | ||
file_a = pdx_files[i] | ||
file_b = pdx_files[i + 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not really make sense IMO: the desired order of files to be compared pairwise might not be lexicographical. Possibly the whole pdx_files
list ought to be specified as command line parameters? (this would also remove the necessity of walking over the specified directory.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andlaus
Thanks for the suggestion!
Actually as per the comment from @kayoub5,
load all pdx files in folder, sort them alphabetically, compare databases pairwise
this is the condition, should we go with alphabetically or do we have to compare each and every file without sorting ?
Could you please provide me your input?
Thanks
Vinoth Kannan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still the opportunity to pass multiple PDX files in a custom order with the command line parameter -db
. From my perspective, -f FOLDER
would be used when many files exists and / or typing the list of databases is considered too cumbersome. But @andlaus you're right, lexicographical order might not be desired. Assuming that the files in the folder are already sorted as wanted, pdx_files.sort()
could be omitted.
odxtools/cli/compare.py
Outdated
rich_print("Please specify either a database or variant for a comparison") | ||
#elif args.folder: | ||
elif hasattr(args, "folder") and args.folder: | ||
print("Now printing the pdx files in folder") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print("Now printing the pdx files in folder") | |
print(f"PDX files in folder {args.folder}: {','.join(pdx_files)}") |
(you need to move this to line 743)
if args.output: | ||
with open(args.output, "w") as f: | ||
json.dump(summary_results, f, indent=4) | ||
print_change_metrics(summary_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't there already a function which compares two complete PDX files? why not calling that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andlaus
I have reused the compare_dianostic_layer() function to compare different pdx files
print_change_metrics(summary_results) | |
changes = task.compare_diagnostic_layers(layer_a, layer_b) |
Thanks
Vinoth
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vinothk-master,
@andlaus is right, there is already a function which compares 2 PDX files: compare_databases()
Thus, db_changes = task.compare_databases(db_a, db_b)
could be used and replace the loop for name in diagnostic_layer_names
where the change metrics are calculated per diagnostic layer.
Instead of printing changes for each diagnostic layer individually with print_dl_metrics([layer_a, layer_b])
, task.print_database_changes(db_changes)
could then be used.
odxtools/cli/compare.py
Outdated
@@ -693,7 +710,7 @@ def run(args: argparse.Namespace) -> None: | |||
dl for db in task.databases for dl in db.diag_layers | |||
if dl.short_name in task.diagnostic_layer_names | |||
] | |||
|
|||
# print("NAMES: ",task.diagnostic_layer_names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Team
- Remove the dead codes
- Made the changes from hasattr to getattr
Made the above changes
Thanks
Vinoth
if args.output: | ||
with open(args.output, "w") as f: | ||
json.dump(summary_results, f, indent=4) | ||
print_change_metrics(summary_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andlaus
I have reused the compare_dianostic_layer() function to compare different pdx files
print_change_metrics(summary_results) | |
changes = task.compare_diagnostic_layers(layer_a, layer_b) |
Thanks
Vinoth
summary_results = [] | ||
for i in range(len(pdx_files) - 1): | ||
file_a = pdx_files[i] | ||
file_b = pdx_files[i + 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andlaus
Thanks for the suggestion!
Actually as per the comment from @kayoub5,
load all pdx files in folder, sort them alphabetically, compare databases pairwise
this is the condition, should we go with alphabetically or do we have to compare each and every file without sorting ?
Could you please provide me your input?
Thanks
Vinoth Kannan
table.add_column("Services Changed", justify="right", style="yellow") | ||
table.add_column("Services Deleted", justify="right", style="yellow") | ||
table.add_column("Changed Parameters", justify="right", style="yellow") | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variants
and Diag Layer
refer to the same object, therefore I wouldn't print the comparison details for each Diag Layer
. See suggestion for table design here.
table.add_row(file_pair, str(m["diag_layer"]), str(m["diag_layer_type"]), | ||
str(m["num_variants_added"]), str(m["num_variants_changed"]), | ||
str(m["num_variants_deleted"]), str(m["num_new_services"]), | ||
str(m["num_deleted_services"]), str(m["num_renamed_services"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the column headers, the order is Services Added
, Services Changed
, Services Deleted
. Moreover, both renamed services and services with parameter changes should be characterized as Services Changed
from my point of view.
str(m["num_deleted_services"]), str(m["num_renamed_services"]), | |
str(m["num_renamed_services"]+m["num_changed_parameters"]), | |
str(m["num_deleted_services"])) |
except Exception: | ||
old_names = [ | ||
item for sublist in changes.changed_name_of_service for item in sublist | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list changes.changed_name_of_service
is defined as [new_names, old_names]
where
new_names
is of typelist[DiagService]
andold_names
is of typelist[str]
.
Moreover, changed_name_of_service
is a mandatory parameter of ServiceDiff
, so lines 769-776 could be deleted and num_renamed
calculated as num_renamed = len(changes.changed_name_of_service[0])
getattr(changes, "changed_parameters_of_service", []) or []) | ||
|
||
# collect which services had parameter changes (unique) | ||
for param_detail in getattr(changes, "changed_parameters_of_service", []) or []: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for param_detail in getattr(changes, "changed_parameters_of_service", []) or []: | |
for param_detail in changes.changed_parameters_of_service: |
"num_deleted_services": len(changes.deleted_services), | ||
"num_renamed_services": len(changes.changed_name_of_service[0]), | ||
"num_changed_parameters": len(changes.changed_parameters_of_service) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add here only the change metrics for services as variant
is synonymous with Diag Layer
Suggestion:
Add before loop for name in diagnostic_layer_names
:
summary_file = {
"base_file": os.path.basename(file_a),
"comparison_file": os.path.basename(file_b),
"num_variants_added": len(variants_added),
"num_variants_changed": variants_changed_count,
"num_variants_deleted": len(variants_deleted),
"changed_variants": []
}
Add at the end of loop for name in diagnostic_layer_names
:
summary_file.get("changed_variants").append(
{
"diag_layer": changes.diag_layer,
"diag_layer_type": changes.diag_layer_type,
"num_new_services": num_new,
"num_deleted_services": num_deleted,
"num_renamed_services": num_renamed,
"num_changed_parameters": num_changed_params
}
)
Add after the loop for name in diagnostic_layer_names
:
summary_results.append(summary_file)
print(json.dumps(summary_results, indent=4)) | ||
if args.output: | ||
with open(args.output, "w") as f: | ||
json.dump(summary_results, f, indent=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea:
If the content of SpecsChangesVariants
and ServiceDiff
are mapped to the dictionary summary_results
in an independent function, the changes could be printed to a json file as well if arguments -db
or -v
are given.
|
||
print("New services:", services_b - services_a) | ||
print("Deleted services:", services_a - services_b) | ||
print_dl_metrics([layer_a, layer_b]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Services are defined per diagnostic layer, as well as the function print_dl_metrics([layer_a, layer_b])
. Lines 804-811 should therefore be part of loop for name in diagnostic_layer_names
.
Moreover, for bigger PDX files with multiple services, printing of all services for each file might be overwhelming. I'd suggest to print only the changes.
print_dl_metrics([layer_a, layer_b]) | |
services_a = {srv.short_name for srv in layer_a.services} | |
services_b = {srv.short_name for srv in layer_b.services} | |
print("New services:", services_b - services_a) | |
print("Deleted services:", services_a - services_b) | |
print_dl_metrics([layer_a, layer_b]) |
pdx_files_folder/somersault.pdx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @andlaus, based on the added code, there is no necessity to upload the files somersault.pdx
, somersault_modified.pdx
and somersault_modified_1.pdx
. Maybe it's best to add *.pdx
to your .gitignore
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's best to add *.pdx to your .gitignore file
the problem with that is that .gitignore
is version controlled, i.e., it is quite easy to accidentally commit such a change. that said, maybe PDX files in the odxtools source tree ought to be ignored by everyone, i.e. changes to the somersault pdx files would need to be explicitly staged using git add -f
? opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable
@vinothk-master: how do you intend to proceed with this PR? (if you want to continue working on it, it would IMO be great if you coordinated you efforts with @kakoeh and her work on #442 ...) |
Hi @andlaus, I like to work on this PR, Let me cross check with #442 and provide my input. |
Co-authored-by: Katja Köhler <katja.koehler@mercedes-benz.com>
Co-authored-by: Katja Köhler <katja.koehler@mercedes-benz.com>
Co-authored-by: Katja Köhler <katja.koehler@mercedes-benz.com>
Hi @kayoub5
I have made the below changes,
Thanks
Vinoth Kannan