Skip to content

Conversation

e-strauss
Copy link
Contributor

@e-strauss e-strauss commented Feb 13, 2025

This PR consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly.

The changes also resulted in a couple changes in the test script for the builtin scripts. For the discovery of the usages of the changed builtins, I had to manually "grep" the ./src/test/scripts folder, so that I dont break any tests, but also the ./script folder to catch the usages in tutorials, perftest, etc. I hope, I've catched everything there, but I can not guaranty it, since the github test just run on test script folder.

I applied the following formatting convention:

  • camelCase, no "_"
  • matrix / vector arguments must be capitalized
  • for scalars, the first character must be lower case

allowed:
X, Xtest, ScaleFactor,
tol, maxIter

not allowed:
X_test, max_iter

Furtheremore, I used the following python script / notebook for the discovery of all unique parameter names:
https://gist.github.com/e-strauss/ccec59f504ed3882a2141d859ff681ec

In the following, I want to give a quick overview over the consolidations, that I applied.
In total, I found 62 matches for consolidation and reduced the total number of unique parameters to 340, which was previously at 402.

I changed the spelling of the following 76 parameters names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa

Through either consolidation or applying the naming convention, I removed the following 138 parameter names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa

A list of all current parameters can be found here:
https://gist.github.com/e-strauss/8aad1143b016d7bda28f3884038d62d3

Copy link

codecov bot commented Feb 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 21.64%. Comparing base (1609470) to head (2b29972).
Report is 28 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #2228       +/-   ##
=============================================
- Coverage     72.47%   21.64%   -50.83%     
+ Complexity    45440    14140    -31300     
=============================================
  Files          1469     1469               
  Lines        170873   170873               
  Branches      33315    33315               
=============================================
- Hits         123836    36993    -86843     
- Misses        37622   130547    +92925     
+ Partials       9415     3333     -6082     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@e-strauss e-strauss force-pushed the DML_parameter_consolidation branch from d60724e to 2b29972 Compare February 14, 2025 11:42
@e-strauss
Copy link
Contributor Author

I just squashed the commits and removed the changes in the github action configs

@Baunsgaard
Copy link
Contributor

LGTM, we just need to resolve the merge conflicts.

This commit consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly.

The changes in the builtin scripts arguments:

order_rand -> Order
search_iterations -> iter
batch_size -> batchSize
n_bins -> n
ytest -> Ytest
max_iter -> maxIter
max_func_invoc, max_iterations -> maxIter
include_mean -> includeMean
icpt -> intercept
maxi -> maxIteration
maxii -> maxInnerIteration
cMask -> ctypes
is_verbose -> verbose
k_max -> maxK
kmax -> iter
(start/end)_stepsize -> (start/end)Stepsize
(start/end)_vicinity -> (start/end)Vicinity
sim_seed -> seed
k_value -> k
CL_T -> ctypes
trans_continuous -> tranCont
select_k -> selectK
k_min -> minK
k_max -> maxK
select_feature -> selectFeature
feature_max -> maxFeatures
feature_importance -> featureImportance
predict_con_tg -> predictCont
START_SELECTED -> initSelectFeature
frequency_threshold -> frequencyThreshold
distance_threshold -> distanceThreshold
is_verbose -> verbose
thresh -> threshold
reduced_dims -> reducedDims
max_iter -> maxIter
is_verbose -> verbose
print_iter -> printIter
min_leaf -> minLeaf
min_split -> minSplit
num_trees -> numTrees
max_depth -> maxDepth
max_features -> maxFeatures
max_values -> maxValues
max_dataratio -> maxDataRatio
sample_frac -> sampleFrac
feature_frac -> featureFrac
window_size -> windowSize
sample_percent -> sampleFrac
is_verbose -> verbose
alpha -> lr
batch_size -> batchSize
learning_rate -> lr
out_activation -> outActivation
loss_fcn -> lossFn
validation_split -> validationSplit
types -> ctypes
sml_type -> smlType
num_trees -> numTrees
learning_rate -> lr
max_depth -> maxDepth
lambda -> reg
icpt -> intercept
maxi, moi, iter -> maxIter
mii -> maxInnerIter
epsilon -> tol
maxIterations -> maxIter
maxii -> maxInnerIter
eps -> tol
max_iter, iterations -> maxIter
is_verbose -> verbose
avg_sample_size_per_centroid -> avgSampleSizePerCentroid
space_decomp -> spaceDecomp
n_components -> nComponents
init_param -> initParams
reg_covar -> reg
rnk -> rank
eps -> tolerance
maxi -> maxIter
dth, thr -> threshold
maxi -> maxIter
dth, thr -> threshold
maxi -> maxIter
@e-strauss e-strauss force-pushed the DML_parameter_consolidation branch from 2b29972 to 7e0ae37 Compare May 14, 2025 09:14
@e-strauss
Copy link
Contributor Author

@mboehm7 Hi, can we merge in the changes ? or is there something I should change first

@Baunsgaard
Copy link
Contributor

@mboehm7 ,
We need to move forward on this PR, since it is starting to be containing conflicts with the dependent documentation PR.

I did look though the changes and they seem reasonable.

However, you (@mboehm7) voiced some concerns on the changes. Could you give a LGTM, merge, or voice your concerns here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants