203 misc blech post process tweaks and issues #205
Conversation
Added a value to the get_clustering_params call for use in that function
1: set up wiping of the 'level_0' column on re-runs
2: added "press q" to some of the plots as a tip
3: set up importing of the value of the `Split` column to be used to determine # of splits
added in a findall() to make the spreadsheet inputs more flexible
Add an option to clear preexisting saved units when re-running post_process
I have been adding comments; for some reason git hides them, and you've got to click the ellipses to expand them. I can add more specific descriptions if that's what you mean, like adding in the specific lines edited? Edit: I get what you mean now; I've been commenting my commits, but I haven't been giving them descriptive names. Bad habit; I'll work on that.
Added a message reporting completion in the event that pre-existing saved units are deleted
Homogenized message convention: "==== X ====\n"
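For illustration, the convention amounts to something like this tiny helper (the function name and message text are hypothetical, not the actual code):

```python
def banner(msg):
    # Homogenized message convention: "==== X ====\n"
    print(f"==== {msg} ====\n")

banner("Pre-existing saved units deleted")
```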
utils/blech_post_process_utils.py
@@ -34,6 +34,8 @@ def __init__(self, sort_file_path):
        sort_table.sort_values(
            ['len_cluster','Split'],
            ascending=False, inplace=True)
+       if 'level_0' in sort_table.columns:
`level_0` is added by the .reset_index() function: I guess when pandas resets the index, it creates that column as a record of the new index.

The issue is that it then permanently writes that column to the output csv file. If you're using the csv as the input for cell sorting, you create it manually, yes, but then on the first run through post_process, it writes in that column. If you run the same CSV through post_process AGAIN, it tries to write `level_0` in again, and can't write it over top of the existing `level_0`, which throws an error.

So this is a little bit of a niche problem; you need to be running post_process with a csv input, and then re-run the same csv to create the error. I mainly ran into the issue as a result of testing my code as I familiarize myself with the pipeline. That said, I could imagine scenarios where you accidentally added the wrong cell to the spreadsheet, or you're not happy with a split/merge outcome, or something similar, and want to run post_process again; if you're using the csv input (which I like a lot, being very pro-automation), then this saves you from needing to manually go in and delete `level_0` from the spreadsheet.
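To illustrate the mechanism, here is a minimal pandas sketch with made-up data; the exact call sequence inside post_process may differ, but the failure mode is the same:

```python
import pandas as pd

df = pd.DataFrame({'len_cluster': [10, 4], 'Split': ['', '3']})

# reset_index() moves the current index into a column. pandas names that
# column 'index', and falls back to 'level_0' if 'index' is already taken.
df.reset_index(inplace=True)   # adds an 'index' column
df.reset_index(inplace=True)   # adds a 'level_0' column

# Once a table carrying 'level_0' is written to the csv and read back in,
# the next reset_index() has nowhere left to put the new column:
df.to_csv('sort_table.csv', index=False)
df = pd.read_csv('sort_table.csv')
try:
    df.reset_index(inplace=True)
except ValueError as err:
    print(err)   # "cannot insert level_0, already exists"

# The patch wipes the stale column before resetting the index
# (drop() is one way to do that wipe):
if 'level_0' in df.columns:
    df.drop(columns='level_0', inplace=True)
df.reset_index(inplace=True)   # runs cleanly again
```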
Pre-commit behavior: first tests whether there's any value in the Split column, initiates splitting, and produces an error if the column value does not contain a numeral.

Altered the first test to ignore whitespace inputs; " " previously triggered splitting, and now does not. Additionally, once splitting has been initiated, it checks whether there is a numeral in the csv input. If there is a numeral, it asks for user input but defaults to that numeral if none is given. If there is no numeral, it asks for user input and defaults to 5, as with manual sorting. This is now pretty airtight, I think.
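A hedged sketch of that decision logic (the function name, prompt text, and variable names are illustrative; the actual implementation in the post_process code may differ):

```python
import re

def resolve_split_count(csv_value, fallback=5):
    """Decide how many splits to perform based on the csv 'Split' entry.

    Blank or whitespace-only entries mean no split. A numeral anywhere in the
    entry becomes the default offered at the prompt; otherwise the
    manual-sorting default (5) is offered instead.
    """
    text = '' if csv_value is None else str(csv_value)
    if not text.strip():
        return None                           # whitespace no longer triggers splitting

    numerals = re.findall(r'\d+', text)       # permissive: "3", "split 3", " 3x" all work
    suggested = int(numerals[0]) if numerals else fallback

    entered = input(f'Number of splits [{suggested}]: ').strip()
    return int(entered) if entered else suggested
```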
… into 203-misc-blech_post_process-tweaks-and-issues
Did we not merge this? I'm running into all of these same problems again, and it looks like the patches never made it to master.
eb2a311 to 694ada5
Working on it now... sorry for the delay.
… into 203-misc-blech_post_process-tweaks-and-issues
for more information, see https://pre-commit.ci
Modifies various things about post_process, mainly making it more amenable to being run more than once.
- fixes an issue where the 'level_0' column of an input table would prevent re-runs
- allows (but does not force) importing of the # of desired splits from the input table. This includes flexible auto-detection of the split #, allowing for more permissive conventions
- allows (but defaults to not) wiping pre-existing saved units from the hdf5 file, helping prevent redundant entries (see the sketch after this list)
- adds a bit of text to some plots prompting the user to 'press q to close plot'. I found myself forgetting that was how it worked, so figured a reminder wouldn't hurt, and it should be helpful to people new to the workflow.
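For the unit-wiping option, a rough sketch of what that can look like with PyTables; the group path '/sorted_units', the prompt text, and the function name are assumptions about the pipeline, not necessarily what the PR implements:

```python
import tables

def maybe_clear_saved_units(hdf5_path, unit_group='/sorted_units'):
    """Optionally remove previously saved units so a re-run starts clean.

    Assumes sorted units live under `unit_group` in the hdf5 file; defaults
    to leaving them untouched unless the user explicitly opts in.
    """
    with tables.open_file(hdf5_path, 'r+') as hf5:
        if unit_group not in hf5:
            return
        answer = input('Delete pre-existing saved units? (y/N): ').strip().lower()
        if answer == 'y':
            hf5.remove_node(unit_group, recursive=True)
            hf5.flush()
            print('==== Pre-existing saved units deleted ====\n')
```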