Mraymon5
Collaborator

Modifies several aspects of post_process, mainly to make it more amenable to being run more than once.
- Fixes an issue where the 'level_0' column of an input table would prevent re-runs.
- Allows (but does not force) importing the desired number of splits from the input table. This includes flexible auto-detection of the split number, allowing for more permissive conventions.
- Allows (but defaults to not) wiping pre-existing saved units from the HDF5 file, helping prevent redundant entries.
- Adds a bit of text to some plots prompting the user to 'press q to close plot'. I found myself forgetting that was how it worked, so I figured a reminder wouldn't hurt and would be helpful to people new to the workflow.

Added a value to the get_clustering_params call for use in that function
1: set up wiping of the 'level_0' column on re-runs
2: added "press q" to some of the plots as a tip
3: set up importing of the value of `Split` column to be used to determine # of splits
added in a findall() to make the spreadsheet inputs more flexible
Add an option to clear preexisting saved units when re-running post_process
@Mraymon5 Mraymon5 linked an issue Aug 16, 2024 that may be closed by this pull request
@abuzarmahmood
Member

Thanks for the edits! Apologies for the delay in getting to them.
I've added some comments. Could you please respond to them/make edits and push the commit.

In the future, please also try to make commit messages more descriptive of the changes that have been made. Once the commits are merged, it is difficult to tell what changes they represent without a descriptive commit message.

@abuzarmahmood
Member

Also @Mraymon5, I just merged #204, so you will have to merge master into your branch again before making further edits.

@Mraymon5
Collaborator Author

Mraymon5 commented Aug 21, 2024

I have been adding comments; for some reason GitHub hides them, and you have to click the ellipsis to expand them. I can add more specific descriptions if that's what you mean, such as listing the specific lines edited?

edit: I get what you mean now; I've been commenting my commits, but I haven't been giving them descriptive names. Bad habit; I'll work on that.

Screenshot from 2024-08-21 09-52-40

Added a message reporting completion in the event that pre-existing saved units are deleted

Homogenized message convention: "==== X ====\n"
@@ -34,6 +34,8 @@ def __init__(self, sort_file_path):
sort_table.sort_values(
['len_cluster','Split'],
ascending=False, inplace=True)
if 'level_0' in sort_table.columns:
Collaborator Author

level_0 is added by the .reset_index() function: I guess when pandas resets the index, it keeps the old index as a column, and names that column level_0 when an 'index' column is already present.

The issue is that it then permanently writes that column to the output CSV file. If you're using the CSV as the input for cell sorting, you create it manually, yes, but the first run through post_process writes that column into it. If you run the same CSV through post_process AGAIN, it tries to write level_0 again, can't write it over the existing level_0, and throws an error.

So this is a little bit of a niche problem; you need to be running post_process with a csv input, and then re-run the same csv to create the error. I mainly ran into the issue as a result of testing my code as I familiarize myself with the pipeline. That said, I could imagine scenarios where you accidentally added the wrong cell to the spreadsheet, or you're not happy with a split/merge outcome, or something similar, and want to run post_process again, and if you're using the csv input (which I like a lot, being very pro-automation), then this saves you from needing to manually go in and delete level_0 from the spreadsheet.
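
For anyone who wants to reproduce the failure outside the pipeline, here is a minimal sketch. The table contents and the helper name `reset_index_safely` are made up for illustration and are not taken from post_process; it assumes the table already carries an 'index' column from an earlier reset, which is the situation in which pandas falls back to the name level_0.

```python
import pandas as pd


def reset_index_safely(table: pd.DataFrame) -> pd.DataFrame:
    """Drop a stale 'level_0' column (if present) before resetting the index.

    Hypothetical helper; it mirrors the `if 'level_0' in sort_table.columns`
    check in the diff rather than reproducing the pipeline's own code.
    """
    if "level_0" in table.columns:
        table = table.drop(columns="level_0")
    return table.reset_index()


# Toy table that already has an 'index' column, as after an earlier reset_index()
table = pd.DataFrame({"index": [0, 1], "Chan": [3, 7]})

# First pass: 'index' is taken, so pandas stores the old index as 'level_0'
first_pass = table.reset_index()

# Naive re-run: pandas refuses to insert a second 'level_0' column
try:
    first_pass.reset_index()
    rerun_failed = False
except ValueError:
    rerun_failed = True

# Guarded re-run: the stale column is dropped first, so the reset succeeds
second_pass = reset_index_safely(first_pass)
```

Dropping the column before the reset keeps the re-run safe without asking the user to edit the spreadsheet by hand.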

Pre-commit behavior first tests whether there's any value in the split column, initiates splitting, and produces an error if the column value does not contain a numeral

Altered the first test to ignore whitespace inputs; "   " previously triggered splitting, and does not now.
Additionally, once splitting has been initiated, checks if there is a numeral in the csv input. If there is a numeral, it asks for user input but defaults to that numeral if none is given. If there is no numeral, it asks for user input and defaults to 5, as with manual sorting.

This is now pretty airtight, I think.
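
The decision logic described above could be sketched roughly as follows. This is not the code from the commit: the function names are hypothetical, the user's reply is passed in as an argument instead of being read with input(), and only the whitespace check, the findall() for a numeral, and the default of 5 come from the description.

```python
import re

DEFAULT_SPLITS = 5  # default described above for manual sorting


def cell_text(cell) -> str:
    """Return the Split cell as stripped text; None and NaN count as empty."""
    if cell is None or cell != cell:  # NaN is the only value unequal to itself
        return ""
    return str(cell).strip()


def wants_split(cell) -> bool:
    """Splitting is requested only when the cell holds non-whitespace text."""
    return bool(cell_text(cell))


def suggested_splits(cell, fallback: int = DEFAULT_SPLITS) -> int:
    """Use the first numeral found in the cell, or the fallback if there is none."""
    numerals = re.findall(r"\d+", cell_text(cell))
    return int(numerals[0]) if numerals else fallback


def resolve_splits(cell, user_reply: str = "") -> int:
    """An explicit user reply wins; an empty reply accepts the suggestion."""
    reply = user_reply.strip()
    return int(reply) if reply else suggested_splits(cell)
```

With these helpers, a cell of "   " does not trigger splitting, "split 3" suggests 3, and "yes" triggers splitting with the default of 5 unless the user types another number.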
@Mraymon5
Collaborator Author

Did we not merge this? I'm running into all of these same problems again, and it looks like the patches never made it to master.

@abuzarmahmood abuzarmahmood force-pushed the 203-misc-blech_post_process-tweaks-and-issues branch from eb2a311 to 694ada5 Compare April 29, 2025 13:52
@abuzarmahmood
Member

Working on it now... sorry for the delay.

@abuzarmahmood abuzarmahmood merged commit e85ca98 into master Apr 29, 2025
7 checks passed
Development

Successfully merging this pull request may close these issues.

Misc blech_post_process tweaks and issues
2 participants