Off-by-one errors in filtering? #139

@fedarko

It looks like the filtering code uses > when it should use >=:

songbird/songbird/util.py

Lines 154 to 158 in 2727c04

def sample_filter(val, id_, md):
    return id_ in metadata.index and np.sum(val) > min_sample_count

def read_filter(val, id_, md):
    return np.sum(val > 0) > min_feature_count

Because of this, features present in exactly 10 samples (or whatever min-feature-count is set to) and samples with exactly 1000 counts (or whatever min-sample-count is set to) get filtered out, even though the parameter descriptions present these values as the minimum acceptable ones:

'min-sample-count': (
    "The minimum number of counts a sample needs for it to be included in "
    "the analysis."
),
'min-feature-count': (
    "The minimum number of samples a feature needs to be observed in "
    "for it to be included in the analysis."
),
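
To make the boundary case concrete, here is a minimal sketch of the two comparisons side by side. It is not the songbird code: the helper names (`keeps_sample`, `keeps_feature`), the `inclusive` flag, and the toy data are all made up for illustration, the thresholds 1000 and 10 are just the example values mentioned above, and the built-in `sum` stands in for `np.sum` so the snippet runs without dependencies.

```python
# Hypothetical thresholds, taken from the example values in this issue.
MIN_SAMPLE_COUNT = 1000
MIN_FEATURE_COUNT = 10


def keeps_sample(counts, threshold=MIN_SAMPLE_COUNT, inclusive=True):
    """Return True if a sample's total count passes the threshold."""
    total = sum(counts)
    # inclusive=True uses >= (threshold is a true minimum);
    # inclusive=False uses > (the comparison quoted from util.py).
    return total >= threshold if inclusive else total > threshold


def keeps_feature(counts, threshold=MIN_FEATURE_COUNT, inclusive=True):
    """Return True if a feature is observed in enough samples."""
    n_present = sum(1 for c in counts if c > 0)
    return n_present >= threshold if inclusive else n_present > threshold


# Toy data sitting exactly on each threshold (illustrative only).
sample_at_threshold = [400, 600, 0]        # total count is exactly 1000
feature_at_threshold = [1] * 10 + [0] * 5  # present in exactly 10 samples

strict_results = (
    keeps_sample(sample_at_threshold, inclusive=False),
    keeps_feature(feature_at_threshold, inclusive=False),
)
inclusive_results = (
    keeps_sample(sample_at_threshold, inclusive=True),
    keeps_feature(feature_at_threshold, inclusive=True),
)

print("strict (>):    ", strict_results)
print("inclusive (>=):", inclusive_results)
```

With the strict comparison both boundary cases are dropped, while the inclusive comparison keeps them, which is the behavior the parameter descriptions seem to promise.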
