fix: make col_sample min equals to 1 #385

Merged (1 commit) on Apr 4, 2025

Conversation

Contributor

@HEGAB7 HEGAB7 commented Apr 3, 2025

Improved Column Sampling for Small Datasets

In our current implementation, col_sample is treated as the fraction of columns to sample. When both the number of columns and col_sample are small, computing the number of sampled columns with:

col_size = int(self.col_sample * X.shape[1])

can result in col_size being 0.

For example, with 4 columns and col_sample = 0.1, int(0.1 * 4) evaluates to 0, which then causes an error during model generation because there are 0 input features.
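The truncation can be reproduced in isolation. The snippet below is a minimal sketch, not the library's code: `naive_col_size` is a hypothetical standalone helper that mirrors the expression above, taking the column count as a plain integer in place of `X.shape[1]`.

```python
def naive_col_size(col_sample: float, n_cols: int) -> int:
    """Hypothetical helper mirroring int(self.col_sample * X.shape[1])."""
    # int() truncates toward zero, so any product below 1.0 becomes 0.
    return int(col_sample * n_cols)


# With 4 columns, every col_sample below 0.25 yields zero sampled columns.
for col_sample in (0.1, 0.2, 0.25, 0.5):
    print(col_sample, naive_col_size(col_sample, 4))
```

In this sketch, a fraction below 1 / n_cols always rounds down to zero columns, which is the case the pull request addresses.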

This pull request updates the logic to ensure that at least one column is always selected when col_sample is positive by using:

col_size = max(1, int(self.col_sample * X.shape[1])) if self.col_sample > 0 else 0

This change prevents errors in cases with a small number of features, improving the robustness of the model creation process.
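As an illustration of the updated behavior, here is a small sketch under stated assumptions: `safe_col_size` and `sample_columns` are hypothetical helper names, and the standard-library `random` module stands in for whatever sampling routine the project actually uses. Only the `max(1, ...)` expression is taken from this pull request.

```python
import random


def safe_col_size(col_sample: float, n_cols: int) -> int:
    """Hypothetical helper wrapping the expression proposed in this PR."""
    # At least one column whenever col_sample is positive; zero otherwise.
    return max(1, int(col_sample * n_cols)) if col_sample > 0 else 0


def sample_columns(col_sample: float, n_cols: int, rng: random.Random) -> list:
    """Pick column indices without replacement (illustrative only)."""
    size = safe_col_size(col_sample, n_cols)
    return sorted(rng.sample(range(n_cols), size))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(safe_col_size(0.1, 4))        # at least one column is kept
print(sample_columns(0.1, 4, rng))  # a single column index in [0, 4)
```

Note that `col_sample = 0` still yields zero columns, so the guard only changes behavior for small positive fractions.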

@alejandroschuler alejandroschuler self-assigned this Apr 3, 2025
@alejandroschuler alejandroschuler self-requested a review April 3, 2025 20:03
@ryan-wolbeck ryan-wolbeck self-requested a review April 3, 2025 23:59
Collaborator

@ryan-wolbeck ryan-wolbeck left a comment

Looks good to me, thanks for the contribution!

@ryan-wolbeck ryan-wolbeck merged commit fcf77c9 into stanfordmlgroup:master Apr 4, 2025
4 checks passed
3 participants