Release/glm 2.0.0 #106

Open
wants to merge 828 commits into
base: main
d4516f4
bugfix, variable train test on comparison tab
Jul 31, 2024
fe6b8cc
fix for interaction vars not being returned from prev model
Jul 31, 2024
f27a793
error handling when interaction variables are requested in model tria…
Jul 31, 2024
1617aed
verbose error handling for model retrival
Jul 31, 2024
b8c1789
not local
Jul 31, 2024
74fc70f
not local
Jul 31, 2024
193d5c3
bugfix: target not defined in ml task settings
Aug 20, 2024
dcf391b
interactions in debug
Sep 4, 2024
85ee447
frontend change for base level for all
david-behar Sep 5, 2024
37052d8
working training of model with base
Sep 5, 2024
216c572
add rescale
david-behar Sep 6, 2024
e2ce139
handle rescale in the frontend
david-behar Sep 6, 2024
89bed41
get base levels from the front
Sep 6, 2024
d85568a
working version except for suspect
Sep 9, 2024
d4fda12
updating structure based on PR comments
Sep 10, 2024
542dba8
Merge pull request #78 from dataiku/feature/select-base-relativity-fo…
david-behar Sep 10, 2024
d22230f
interactions algo
Sep 11, 2024
ede5600
Merge branch 'release/1.0.5' into feature/interactions
david-behar Sep 11, 2024
ae46120
typo
Sep 12, 2024
cb4f6e4
typo
Sep 12, 2024
38f7223
interaction frontend work in progress
david-behar Sep 12, 2024
f077df0
bugfix so you are not predicting on all column
Sep 12, 2024
3983187
bugfix
Sep 12, 2024
73dd442
Feature/improved robustness (#79)
matthewgalloway Sep 12, 2024
9f9a1c4
update front build
david-behar Sep 13, 2024
db3d262
add some logging
david-behar Sep 13, 2024
6c4e0c6
refactor clean up (#80)
matthewgalloway Sep 13, 2024
66a27e4
working version with categorical numeric column
Sep 16, 2024
38b9871
update axis one-way-variable chart
david-behar Sep 16, 2024
3b21976
remove some logs
Sep 16, 2024
7c8e321
moving exposure/target to settings
Sep 17, 2024
c93c4ee
switch to not local
Sep 17, 2024
d5959b6
rebuild js and css
david-behar Sep 17, 2024
c1ea6ad
Merge branch 'release/1.0.5' into feature/base_levels_based_on_exposure
david-behar Sep 17, 2024
e00d025
Merge pull request #81 from dataiku/feature/base_levels_based_on_expo…
david-behar Sep 17, 2024
5737702
Release/1.0.5 (#82)
matthewgalloway Sep 17, 2024
a4030a4
fix base levels when export one way chart
Sep 17, 2024
37a1068
model training for interaction
Sep 17, 2024
8750182
model training not local
Sep 17, 2024
775db93
fixed one-way variable charts
Sep 18, 2024
78dfeba
fix the lift chart
Sep 18, 2024
3fa8b50
Merge branch 'release/1.0.5' into feature/interactions
david-behar Sep 19, 2024
25fef21
interactions in progress
Sep 20, 2024
a81ce59
working variable level stats
Sep 20, 2024
deaa1d1
interaction export
Sep 25, 2024
942354b
bug fix lift chart bins (#84)
matthewgalloway Oct 7, 2024
b7d83c6
Bugfix/exposure column stoping model retrival (#85)
matthewgalloway Oct 7, 2024
5ea4897
working export
Oct 8, 2024
70c5e47
order modalities in export
Oct 8, 2024
d57ef64
merginging 1.0.5 bug fixes
Oct 8, 2024
bfa5428
fix variable level stats to show marginal interactions
Oct 8, 2024
00f12b7
Move model metrics to be calculated at training time (#87)
matthewgalloway Oct 16, 2024
8bb60d9
Naming models and bug fix on lift charts (#88)
matthewgalloway Oct 21, 2024
5862466
fix variable level stats
Oct 21, 2024
97c00cf
remove the prints
Oct 21, 2024
db2e138
rounding in the front
david-behar Oct 21, 2024
5970d65
no rounding in the backend
Oct 21, 2024
0a276df
minor bug fix lift charts
Oct 23, 2024
8985407
Merge pull request #89 from dataiku/bugfix/missing-variable-stats
david-behar Oct 23, 2024
501af9e
Merge pull request #90 from dataiku/bugfix/base_prediction_relativity…
david-behar Oct 23, 2024
f1ed90e
merge 1.0.5
david-behar Oct 25, 2024
b5364f9
fix metrics
Oct 25, 2024
eeeaa19
changelog
david-behar Oct 25, 2024
65be5cb
add the rescale
david-behar Oct 25, 2024
1d5c12d
remove useless tabs
david-behar Oct 25, 2024
bde1d82
Bugfix/retrieving interactions (#91)
matthewgalloway Oct 27, 2024
f5c3ec6
bug fix model training with interaction
Oct 27, 2024
3cc9fab
fix merge conflict
Oct 27, 2024
d5a07bf
shipping webaiku with plugin
Nov 1, 2024
de8269b
removing webaiku requirements
Nov 1, 2024
c2e3cd1
fix base values
david-behar Nov 5, 2024
3f245db
add interactions
david-behar Nov 7, 2024
214c35a
working store
david-behar Nov 15, 2024
3360617
in progress training screen
david-behar Nov 19, 2024
59aae74
in progress
david-behar Nov 21, 2024
aea0afc
working front
david-behar Nov 26, 2024
baf0007
Feature/interactions (#83)
matthewgalloway Nov 27, 2024
1778702
fix ordering and loading of metrics and models
Nov 28, 2024
34a0238
handle loadings
david-behar Nov 28, 2024
1dcebec
update front
david-behar Nov 29, 2024
097b985
update loading
david-behar Nov 29, 2024
18258cb
fix tabs
david-behar Nov 29, 2024
4a80c05
text color
david-behar Nov 29, 2024
a01eef0
Merge pull request #97 from dataiku/bugfix/arrow-menu
david-behar Nov 29, 2024
f76cd7d
Merge branch 'release/1.0.5' into release/0.0.1alpha
david-behar Dec 2, 2024
6f469b5
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
20dc89f
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
86bd7e2
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
956da0e
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
d28f1eb
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 8, 2025
28e8454
Edited file 'python-lib/backend/model_cache.py'
Apr 8, 2025
e51e612
Edited file 'python-lib/backend/model_cache.py'
Apr 8, 2025
bd6dd35
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
37ae7aa
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
feb5f38
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
c55009d
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
a6fc797
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
c4d12d1
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
843e72e
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
c0fc2e3
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
d0bb645
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
72eacd7
Merge pull request #98 from dataiku/feature/225464-remove-saved-model
david-behar Apr 8, 2025
7d8a088
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
23f1f24
Merge pull request #99 from dataiku/feature/225466-skip-expensive-rep…
david-behar Apr 8, 2025
00d1162
Edited file 'python-lib/backend/dataiku_api.py'
Apr 8, 2025
5760c3d
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
916e69b
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
7b52c7f
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
ab612cf
Merge pull request #100 from dataiku/feature/225467-force-code-env
david-behar Apr 8, 2025
2879724
Removed 'resource/params_helper.py'
Apr 8, 2025
23793ea
Edited file 'resource/web_app_setting_list.py'
Apr 8, 2025
d7b3835
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
b26d5e5
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
1ec1337
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
a42d70a
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
efbf759
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
b17d193
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
9cb2c79
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
58d1bab
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
2ce4c8f
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
2c889be
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
4fd9c58
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
04cbed0
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 8, 2025
10d05c0
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
d610923
Edited file 'webapps/glm-analysis/webapp.json'
Apr 8, 2025
b7c1df1
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
03ea8f0
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
11b81d1
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
9439511
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
0a3f0c8
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
8129efb
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
0c2bf48
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
76cd94c
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
95afd5c
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 8, 2025
08c1357
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
bfa67fb
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
efcc64c
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
b1613a5
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
e1d7616
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
d9c97c1
Edited file 'python-lib/backend/fetch_api.py'
Apr 8, 2025
45d2b83
Merge pull request #101 from dataiku/feature/225465-webapp-create-ana…
david-behar Apr 8, 2025
96a991e
Edited files
Apr 9, 2025
ec51c53
Edited file 'resource/web_app_setting_list.py'
Apr 9, 2025
a6a07a1
Edited file 'webapps/glm-analysis/webapp.json'
Apr 9, 2025
04eb5cc
Edited file 'webapps/glm-analysis/webapp.json'
Apr 9, 2025
aadbc12
Edited file 'resource/web_app_setting_list.py'
Apr 9, 2025
7c68bd7
Edited file 'webapps/glm-analysis/webapp.json'
Apr 9, 2025
79f9345
Edited file 'webapps/glm-analysis/webapp.json'
Apr 9, 2025
89037f3
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 9, 2025
22ffa6d
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 9, 2025
7584dde
Edited file 'python-lib/backend/fetch_api.py'
Apr 9, 2025
cf781e9
Edited file 'python-lib/backend/fetch_api.py'
Apr 9, 2025
fae95c1
Edited file 'python-lib/backend/fetch_api.py'
Apr 9, 2025
7971f7c
Edited file 'python-lib/backend/fetch_api.py'
Apr 9, 2025
6ed9d14
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 9, 2025
051fdb6
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 9, 2025
e54157b
Edited file 'python-lib/dku_visual_ml/dku_train_model_config.py'
Apr 9, 2025
03276f1
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 9, 2025
205c8e8
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 9, 2025
a879dcc
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 9, 2025
20e30c8
Edited file 'python-lib/dku_visual_ml/dku_model_trainer.py'
Apr 9, 2025
2083afc
Edited file 'python-lib/backend/dataiku_api.py'
Apr 9, 2025
5aaf881
send webapp id from the front end
david-behar Apr 10, 2025
259b6dd
working front
david-behar Apr 10, 2025
920756f
Edited file 'python-lib/backend/fetch_api.py'
Apr 10, 2025
1f8a169
Edited file 'python-lib/backend/fetch_api.py'
Apr 10, 2025
736f870
Edited file 'python-lib/backend/fetch_api.py'
Apr 10, 2025
69a0e70
Created file 'python-lib/processors'
Apr 10, 2025
66c9646
Removed 'python-lib/processors'
Apr 10, 2025
9d60afd
Created file 'python-lib/processors'
Apr 10, 2025
c3e930e
Created file 'python-lib/processors/processors.py'
Apr 10, 2025
67968e2
Edited file 'python-lib/processors/processors.py'
Apr 10, 2025
1ab3916
Edited file 'python-lib/processors/processors.py'
Apr 10, 2025
20a3328
Edited file 'python-lib/processors/processors.py'
Apr 10, 2025
8132765
Edited file 'python-lib/processors/processors.py'
Apr 10, 2025
099ffb8
Edited file 'webapps/glm-analysis/webapp.json'
Apr 25, 2025
cd0334a
working key value
david-behar Jun 6, 2025
5aa0aac
Merge pull request #103 from dataiku/feature/225463-class-settings-as…
david-behar Jun 6, 2025
c48f4ed
move to multiple tabs
david-behar Jun 6, 2025
b200960
work in progress
david-behar Jun 12, 2025
508db28
Merge branch 'feature/225471-ux' into feature/237571-scalability
david-behar Jun 12, 2025
6dd9fd3
fix variable level stats
david-behar Jun 12, 2025
491eaa4
Merge pull request #104 from dataiku/feature/237571-scalability
david-behar Jun 13, 2025
2ccc7ce
upgrade ui components
david-behar Jun 17, 2025
90fdc5e
still some work to do to work properly
david-behar Jun 17, 2025
b62bcad
start cleaning
david-behar Jun 17, 2025
881b25e
build front
david-behar Jun 17, 2025
a90f170
Merge branch 'feature/225471-ux' into feature/237571-scalability
david-behar Jun 17, 2025
ebfa470
better management of model cache
david-behar Jun 17, 2025
e98b5d5
backend refactoring
david-behar Jun 19, 2025
a35fad8
store for each page, refactoring
david-behar Jun 20, 2025
a0a2390
fix base value retrieval
david-behar Jun 23, 2025
155046a
Merge pull request #105 from dataiku/feature/225471-ux
david-behar Jun 23, 2025
198a2fc
remove calculate relativity recipe
david-behar Jun 23, 2025
b0e09ed
update versions
david-behar Jun 23, 2025
b00c517
intereactions in release notes
david-behar Jun 23, 2025
cd0defd
add deployment capability
david-behar Jun 26, 2025
2d71ff6
Merge pull request #107 from dataiku/feature/237827-deploy-model
david-behar Jun 26, 2025
865fe81
improved training screen and adding deletion, not fully functional
david-behar Jul 4, 2025
3aa26e0
fix train button
david-behar Jul 4, 2025
6c61a14
add pvalue
david-behar Aug 6, 2025
9ae3346
fix train/test
david-behar Aug 6, 2025
916ba3e
add chart rescaling and distribution
david-behar Aug 7, 2025
5261f8c
with a create chart button
david-behar Aug 8, 2025
f842d0c
design progress
david-behar Aug 8, 2025
95f6c4c
lift chart
david-behar Aug 8, 2025
7479d8e
link to the analysis
david-behar Aug 8, 2025
b91a5e3
numeric categorical toggle with inference
david-behar Aug 11, 2025
2d11752
some more fixes
david-behar Aug 12, 2025
e4f2196
update names
david-behar Aug 13, 2025
945e9ef
centralize all the loading
david-behar Aug 13, 2025
d96e303
update models simplification
david-behar Aug 13, 2025
8da1bad
fix delete and dependencies between tabs
david-behar Aug 13, 2025
7654003
a few more fixes
david-behar Aug 13, 2025
562dc46
Merge pull request #108 from dataiku/feature/225471-ux
david-behar Aug 13, 2025
c3dd7ab
fix update when delete
david-behar Aug 13, 2025
af87878
code review resolution
david-behar Aug 18, 2025
bbf2918
Merge pull request #109 from dataiku/feature/260508-code-review-resol…
david-behar Aug 18, 2025
1f0c9ca
fix Variable Model config
david-behar Aug 18, 2025
3882039
Merge pull request #110 from dataiku/feature/260524-training-screen-d…
david-behar Aug 18, 2025
9b35f24
fix loading when getting existing model config
david-behar Aug 19, 2025
a17d305
Add separator for tabs
guidataiku Aug 20, 2025
d23988c
Merge pull request #111 from dataiku/feature/dss14-sc-260241-add-the-…
guidataiku Aug 20, 2025
e2ff5b7
Add title in header
guidataiku Aug 20, 2025
70c28ae
add dist
guidataiku Aug 20, 2025
820731c
Merge pull request #112 from dataiku/feature/dss14-sc-260240-titles-f…
guidataiku Aug 20, 2025
bf0dcf9
a few changes but doesnt work
david-behar Aug 20, 2025
7306f7f
Fix black background, change padding and marging from app to each com…
guidataiku Aug 20, 2025
fb0d810
Merge pull request #113 from dataiku/feature/260609-model-management-…
guidataiku Aug 20, 2025
b17be0f
Add fix
guidataiku Aug 20, 2025
72e629b
add log
guidataiku Aug 20, 2025
1abf961
add log
guidataiku Aug 20, 2025
b25f962
remove os.environ project key setup
guidataiku Aug 20, 2025
dd0750c
Merge pull request #114 from dataiku/feature/dss14-sc-261019-fix-weba…
guidataiku Aug 20, 2025
61c2ca0
fix new analysis
david-behar Aug 21, 2025
d9fb279
Merge branch 'release/glm-2.0.0' into bug/260983-two-models
david-behar Aug 21, 2025
51f6808
Merge pull request #115 from dataiku/bug/260983-two-models
david-behar Aug 21, 2025
260b2f2
Change first tab ui
guidataiku Aug 21, 2025
1348dc8
Merge pull request #116 from dataiku/feature/dss14-sc-261164-fix-mode…
guidataiku Aug 21, 2025
46bf897
Fix left panel
guidataiku Aug 21, 2025
3f87c3d
Merge pull request #117 from dataiku/feature/dss14-sc-261195-fix-2nd-…
guidataiku Aug 21, 2025
cd897ad
Fix left panel
guidataiku Aug 22, 2025
f46b55a
add dist
guidataiku Aug 22, 2025
70d743d
Merge pull request #118 from dataiku/feature/dss14-sc-261218-fix-lift…
guidataiku Aug 22, 2025
ff94284
Add sticky button for onewayvariabletab
guidataiku Aug 22, 2025
67663ba
Merge pull request #119 from dataiku/feature/dss14-sc-261281-fix-butt…
guidataiku Aug 22, 2025
80cc467
ajust scroll in the scrollbox
guidataiku Aug 22, 2025
0bce0a9
Merge pull request #120 from dataiku/feature/dss14-sc-261281-fix-butt…
guidataiku Aug 22, 2025
bd2ee52
default to Poisson
david-behar Aug 22, 2025
dfb726d
enfore allowed links
david-behar Aug 22, 2025
3 changes: 1 addition & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -3,15 +3,14 @@ tests/allure_report
__pycache__/
*.py[cod]
*$py.class

*.wlock
# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,15 @@
# Changelog

## [Version 2.0.0] - New Feature Release - 2025-09

* Visual Webapp to train GLMs and assess their fit
* Removal of the custom Model View
* Interactions added to the GLM in the Visual ML

## [Version 1.1.1] - Bugfix Release - 2024-02

* Update the PredictionModelHandler interface

## [Version 1.1.0] - Upgrade Release - 2024-01

* Switched the GLM library from statsmodels to glum
2 changes: 1 addition & 1 deletion Makefile
@@ -44,7 +44,7 @@ integration-tests:
source env/bin/activate; \
pip install --upgrade pip;\
pip install --no-cache-dir -r tests/python/integration/requirements.txt; \
-pytest tests/python/integration --alluredir=tests/allure_report || ret=$$?; exit $$ret \
+pytest tests/python/integration --exclude-dss-targets="DSS11" --alluredir=tests/allure_report || ret=$$?; exit $$ret \
)

tests: unit-tests integration-tests
2 changes: 1 addition & 1 deletion README.md
@@ -7,9 +7,9 @@ and its accuracy therefore is not able to be guaranteed. The regression spline c
# Components

The plugin contains the following components:
- Visual Webapp to train GLM and assess their fit using One-Way Variable Charts, Variable-Level Stats and Lift Charts
- Generalized Linear Model Regression to run GLM regression inside the visual ML interface
- Generalized Linear Model Classification to run GLM binary classification inside the visual ML interface
- Actual vs Expected view inside the Visual Analysis and the Deployed Model windows
- Regression Spline Prepare step to compute B-Spline basis row by row
- Regression Spline Recipe to compute B-spline basis in a separate recipe

7 changes: 7 additions & 0 deletions code-env/python/spec/requirements.dev.txt
@@ -0,0 +1,7 @@
Flask>=0.9
pandas>=0.23.0
numpy<1.24
python-dotenv==0.19.0
webaiku @ git+https://github.com/dataiku/solutions-contrib.git@main#egg=webaiku&subdirectory=bs-infra
dataiku-api-client>=11.0.0
https://design.solutions.dataiku-dss.io/public/packages/dataiku-internal-client.tar.gz
7 changes: 5 additions & 2 deletions code-env/python/spec/requirements.txt
@@ -1,5 +1,5 @@
dash==2.14.2
-dash_bootstrap_components==1.5.0
+dash_bootstrap_components
scipy==1.11.4; python_version=='3.9'
scipy==1.10.1; python_version=='3.8'
scipy==1.7.3; python_version=='3.7'
@@ -12,4 +12,7 @@ glum==2.5.0; python_version=='3.7'
patsy==0.5.4; python_version=='3.9'
patsy==0.5.3; python_version<'3.9'
cloudpickle==1.5.0
-urllib3<2
+urllib3<2
+statsmodels
+python-dotenv==0.19.0
+# webaiku @ git+https://github.com/dataiku/solutions-contrib.git@main#egg=webaiku&subdirectory=bs-infra
2 changes: 1 addition & 1 deletion custom-recipes/regression-splines/recipe.json
@@ -76,4 +76,4 @@

"resourceKeys": []

}
}
2 changes: 1 addition & 1 deletion custom-recipes/regression-splines/recipe.py
@@ -49,4 +49,4 @@
output_dataset_df = regression_splines.run_spline_creation(df)

# Write recipe outputs
output_dataset.write_with_schema(output_dataset_df)
output_dataset.write_with_schema(output_dataset_df)
4 changes: 2 additions & 2 deletions plugin.json
@@ -1,10 +1,10 @@
{
"id": "generalized-linear-models",
-"version": "1.1.0",
+"version": "2.0.0",
"meta": {
"label": "Generalized Linear Models",
"description": "Train and deploy Generalized Linear Models",
-"author": "Dataiku (Matthew Galloway, David Behar, Nicolas Vallée)",
+"author": "Dataiku",
"icon": "icon-bullseye",
"tags": ["Machine Learning"],
"url": "https://www.dataiku.com/product/plugins/glm",
Empty file added python-lib/backend/__init__.py
Empty file.
161 changes: 161 additions & 0 deletions python-lib/backend/api_utils.py
@@ -0,0 +1,161 @@
import re
from flask import current_app
from logging_assist.logging import logger
from model_cache.model_conformity_checker import ModelConformityChecker
from .dataiku_api import dataiku_api
import numpy as np  # required by np_encode below
import pandas as pd
from dku_visual_ml.dku_model_retrival import VisualMLModelRetriver
from glm_handler.dku_relativites_calculator import RelativitiesCalculator
from chart_formatters.variable_level_stats import VariableLevelStatsFormatter

def format_models(global_dku_mltask):
    logger.info("Formatting Models")
    # The display name is the text inside the first pair of parentheses of the model name
    model_id_pattern = r'\((.*?)\)'
    mcc = ModelConformityChecker()

    list_ml_id = global_dku_mltask.get_trained_models_ids()
    project_key = dataiku_api.default_project.project_key
    ml_task_id = global_dku_mltask.mltask_id
    analysis_id = global_dku_mltask.analysis_id
    models = []
    for ml_id in list_ml_id:
        model_details = global_dku_mltask.get_trained_model_details(ml_id)
        is_conform = mcc.check_model_conformity(ml_id)
        if is_conform:
            model_name = model_details.get_user_meta()['name']
            matches = re.findall(model_id_pattern, model_name)
            date = [v['value'] for v in model_details.get_user_meta()['labels'] if v['key'] == 'model:date'][0]
            models.append({"id": ml_id, "name": matches[0], "date": date, "project_key": project_key, "ml_task_id": ml_task_id, "analysis_id": analysis_id})
        else:
            current_app.logger.info(f"model {ml_id} is not conform")
    return models
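
Review note on `format_models`: the name is taken from `matches[0]`, so a model whose name contains no parenthesised part would raise `IndexError`. The sketch below only illustrates how the pattern behaves; the two model names are hypothetical, and the fallback to the full name is one possible guard, not something this PR implements.

```python
import re

# Same pattern as in format_models: lazily capture the text inside parentheses.
model_id_pattern = r"\((.*?)\)"

# Hypothetical model names, for illustration only.
named = "GLM (baseline_v2)"
unnamed = "GLM without a label"

matches_named = re.findall(model_id_pattern, named)
matches_unnamed = re.findall(model_id_pattern, unnamed)

# Guarded lookup: fall back to the full name instead of indexing an empty list.
display_name = matches_unnamed[0] if matches_unnamed else unnamed
```

The same kind of guard would apply to the `model:date` label lookup, which also indexes `[0]` on a filtered list.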

def np_encode(obj):
    # Convert numpy 64-bit integers to plain Python ints; other objects pass through unchanged
    if isinstance(obj, np.int64):
        return int(obj)
    return obj
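
Review note on `np_encode`: it looks like a `default=` hook for `json.dumps`, although that usage is an assumption because no call site appears in this file. It converts only `np.int64`; any other value is returned unchanged, so `json.dumps` still cannot serialise it. The sketch below shows the function as written, next to a hypothetical broader variant (`np_encode_any` is not part of the PR) that relies on `np.generic.item()`.

```python
import json
import numpy as np

def np_encode(obj):
    # As in the PR: only 64-bit numpy integers are converted.
    if isinstance(obj, np.int64):
        return int(obj)
    return obj

def np_encode_any(obj):
    # Hypothetical variant: every numpy scalar is converted to its Python equivalent.
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoded_int = json.dumps({"count": np.int64(3)}, default=np_encode)
encoded_float = json.dumps({"ratio": np.float32(1.5)}, default=np_encode_any)
```

Raising `TypeError` for unknown types follows the convention described for the `default` hook in the standard `json` module.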


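def natural_sort_key(s):
    # Split into digit and non-digit runs so that digit runs compare numerically
    # (`re` is already imported at module level)
    return [int(c) if c.isdigit() else c.lower() for c in re.split(r'(\d+)', str(s))]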
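
Review note on `natural_sort_key`: a small illustration of how it differs from plain lexicographic sorting. The column names are made up for the example.

```python
import re

def natural_sort_key(s):
    # Digit runs become ints, everything else is lower-cased text.
    return [int(c) if c.isdigit() else c.lower() for c in re.split(r"(\d+)", str(s))]

# Hypothetical column names.
columns = ["var10", "var2", "Var1", "age"]

plain = sorted(columns)
natural = sorted(columns, key=natural_sort_key)
```

Because `re.split` with a capturing group always alternates text and digit chunks, text is only ever compared with text and numbers with numbers, so the mixed list does not raise a `TypeError` during sorting.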

def calculate_base_levels(df, exposure_column=None):
    cols_json = []
    # Sort the columns using natural sorting
    sorted_columns = sorted(df.columns, key=natural_sort_key)

    for col in sorted_columns:
        if col == exposure_column:
            continue

        # Determine if the column contains numeric or non-numeric data
        is_numeric = pd.api.types.is_numeric_dtype(df[col])

        if is_numeric:
            options = sorted([str(val) for val in df[col].unique()], key=float)
        else:
            options = sorted([str(val) for val in df[col].unique()], key=natural_sort_key)

        if exposure_column and exposure_column in df.columns:
            # Exposure-based calculation
            weighted_counts = df.groupby(col)[exposure_column].sum()
            base_level = str(weighted_counts.idxmax())
        else:
            # Original mode-based calculation
            base_level = str(df[col].mode().iloc[0])

        cols_json.append({
            'column': col,
            'options': options,
            'baseLevel': base_level,
            'type': ('numerical' if is_numeric else 'categorical')
        })

    return cols_json
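
Review note on `calculate_base_levels`: the two branches can pick different base levels for the same column. A minimal sketch with invented data, where `region` and `exposure` are hypothetical column names:

```python
import pandas as pd

# Invented data: "north" has more rows, "south" carries more exposure.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south"],
    "exposure": [0.1, 0.2, 0.1, 2.0],
})

# Mode-based base level: the most frequent value by row count.
mode_level = str(df["region"].mode().iloc[0])

# Exposure-based base level: the value with the largest summed exposure.
exposure_level = str(df.groupby("region")["exposure"].sum().idxmax())
```

Here the row count favours `north` while the summed exposure favours `south`, which is the case the exposure-weighted branch is meant to handle.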

def get_model_train_set(full_model_id, model_cache, data_handler):
    model_retriever = VisualMLModelRetriver(full_model_id)
    relativities_calculator = RelativitiesCalculator(data_handler, model_retriever)
    train_set = relativities_calculator.train_set
    return train_set

def get_model_test_set(full_model_id, model_cache, data_handler):
    model_retriever = VisualMLModelRetriver(full_model_id)
    relativities_calculator = RelativitiesCalculator(data_handler, model_retriever)
    test_set = relativities_calculator.test_set
    return test_set

def get_model_base_values_modalities_types(full_model_id, model_cache, data_handler):
    creation_args = {"data_handler": data_handler,
                     "model_cache": model_cache,
                     "full_model_id": full_model_id}
    train_set = model_cache.get_or_create_cached_item(full_model_id, 'train_set', get_model_train_set, **creation_args)
    test_set = model_cache.get_or_create_cached_item(full_model_id, 'test_set', get_model_test_set, **creation_args)
    model_retriever = VisualMLModelRetriver(full_model_id)
    relativities_calculator = RelativitiesCalculator(data_handler, model_retriever, train_set, test_set)
    base_values = relativities_calculator.get_base_values()
    return {'base_values': base_values,
            'modalities': relativities_calculator.modalities,
            'types': relativities_calculator.variable_types}

def get_model_relativities(full_model_id, model_cache, data_handler):
    creation_args = {"data_handler": data_handler,
                     "model_cache": model_cache,
                     "full_model_id": full_model_id}
    train_set = model_cache.get_or_create_cached_item(full_model_id, 'train_set', get_model_train_set, **creation_args)
    test_set = model_cache.get_or_create_cached_item(full_model_id, 'test_set', get_model_test_set, **creation_args)
    base_values_modalities_types = model_cache.get_or_create_cached_item(full_model_id, 'base_values_modalities_types', get_model_base_values_modalities_types, **creation_args)
    base_values = base_values_modalities_types['base_values']
    modalities = base_values_modalities_types['modalities']
    variable_types = base_values_modalities_types['types']
    model_retriever = VisualMLModelRetriver(full_model_id)
    relativities_calculator = RelativitiesCalculator(data_handler, model_retriever, train_set, test_set, base_values=base_values, modalities=modalities, variable_types=variable_types)
    relativities = relativities_calculator.get_relativities_df()
    relativities_dict = relativities_calculator.relativities
    return {'relativities': relativities, 'relativities_dict': relativities_dict}

def get_model_relativities_interaction(full_model_id, model_cache, data_handler):
    creation_args = {"data_handler": data_handler,
                     "model_cache": model_cache,
                     "full_model_id": full_model_id}
    train_set = model_cache.get_or_create_cached_item(full_model_id, 'train_set', get_model_train_set, **creation_args)
    test_set = model_cache.get_or_create_cached_item(full_model_id, 'test_set', get_model_test_set, **creation_args)
    base_values_modalities_types = model_cache.get_or_create_cached_item(full_model_id, 'base_values_modalities_types', get_model_base_values_modalities_types, **creation_args)
    base_values = base_values_modalities_types['base_values']
    modalities = base_values_modalities_types['modalities']
    variable_types = base_values_modalities_types['types']
    model_retriever = VisualMLModelRetriver(full_model_id)
    relativities_calculator = RelativitiesCalculator(data_handler, model_retriever, train_set, test_set, base_values=base_values, modalities=modalities, variable_types=variable_types)
    relativities_interaction = relativities_calculator.get_relativities_interactions_df()
    return relativities_interaction

def get_model_variable_level_stats(full_model_id, model_cache, data_handler):
    creation_args = {"data_handler": data_handler,
                     "model_cache": model_cache,
                     "full_model_id": full_model_id}
    train_set = model_cache.get_or_create_cached_item(full_model_id, 'train_set', get_model_train_set, **creation_args)
    test_set = model_cache.get_or_create_cached_item(full_model_id, 'test_set', get_model_test_set, **creation_args)
    relativities = model_cache.get_or_create_cached_item(full_model_id, 'relativities', get_model_relativities, **creation_args)['relativities']
    relativities_interaction = model_cache.get_or_create_cached_item(full_model_id, 'relativities_interaction', get_model_relativities_interaction, **creation_args)
    base_values_modalities_types = model_cache.get_or_create_cached_item(full_model_id, 'base_values_modalities_types', get_model_base_values_modalities_types, **creation_args)
    base_values = base_values_modalities_types['base_values']
    model_retriever = VisualMLModelRetriver(full_model_id)
    variable_level_stats = VariableLevelStatsFormatter(model_retriever, data_handler, relativities, relativities_interaction, base_values, train_set, test_set)
    variable_stats = variable_level_stats.get_variable_level_stats()
    return variable_stats

def get_model_predicted_base(full_model_id, model_cache, data_handler, variable):
    creation_args = {"data_handler": data_handler,
                     "model_cache": model_cache,
                     "full_model_id": full_model_id}
    train_set = model_cache.get_or_create_cached_item(full_model_id, 'train_set', get_model_train_set, **creation_args)
    test_set = model_cache.get_or_create_cached_item(full_model_id, 'test_set', get_model_test_set, **creation_args)
    base_values_modalities_types = model_cache.get_or_create_cached_item(full_model_id, 'base_values_modalities_types', get_model_base_values_modalities_types, **creation_args)
    base_values = base_values_modalities_types['base_values']
    modalities = base_values_modalities_types['modalities']
    variable_types = base_values_modalities_types['types']
    model_retriever = VisualMLModelRetriver(full_model_id)
    relativities_calculator = RelativitiesCalculator(data_handler, model_retriever, train_set, test_set, base_values, modalities, variable_types)
    predicted_base_variable = relativities_calculator.get_formated_predicted_base_variable(variable)
    return predicted_base_variable
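
Review note on the `get_model_*` helpers: they all follow the same get-or-create pattern through `model_cache.get_or_create_cached_item(full_model_id, key, creator, **creation_args)`. The cache implementation is not shown in this part of the diff, so the class below is only a guess at the contract those calls rely on. `SketchModelCache` and `load_train_set` are hypothetical names used for illustration.

```python
class SketchModelCache:
    """Hypothetical stand-in for the plugin's model cache (an assumption, not the real class)."""

    def __init__(self):
        self._items = {}
        self.creator_calls = 0

    def get_or_create_cached_item(self, model_id, key, creator, **creation_args):
        # One dictionary per model; the creator runs only on a cache miss.
        per_model = self._items.setdefault(model_id, {})
        if key not in per_model:
            self.creator_calls += 1
            per_model[key] = creator(**creation_args)
        return per_model[key]


def load_train_set(full_model_id, model_cache, data_handler):
    # Placeholder for an expensive computation such as get_model_train_set.
    return [f"{full_model_id}-row-{i}" for i in range(3)]


cache = SketchModelCache()
creation_args = {"full_model_id": "m1", "model_cache": cache, "data_handler": None}

first = cache.get_or_create_cached_item("m1", "train_set", load_train_set, **creation_args)
second = cache.get_or_create_cached_item("m1", "train_set", load_train_set, **creation_args)
```

Under this contract the repeated `train_set` and `test_set` lookups at the top of each helper cost a dictionary access after the first call, which would explain why every helper can request them unconditionally.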
44 changes: 44 additions & 0 deletions python-lib/backend/dataiku_api.py
@@ -0,0 +1,44 @@
import logging
from typing import Any, Dict
import dataiku
from dataiku.customwebapp import get_webapp_config
import pandas as pd
from dataiku.customrecipe import get_recipe_config
import os
import pwd
from typing import Optional

class DataikuApi:
    def __init__(self):
        self._webapp_config = None
        self._default_project = None
        self._default_project_key = None
        self._client = dataiku.api_client()

    def setup(self, webapp_config: Dict, default_project_key: str):
        self._webapp_config = webapp_config
        self._default_project_key = default_project_key

    @property
    def client(self):
        if self._client is None:
            raise Exception("Please set the client before using it.")
        else:
            return self._client

    @property
    def default_project(self):
        # Fall back to the explicitly configured project key when no default project is available
        try:
            return self.client.get_default_project()
        except Exception as err:
            if self._default_project_key:
                return self.client.get_project(self._default_project_key)
            else:
                raise Exception("Please define the default project before using it.")

    @property
    def plugin_code_env(self):
        plugin = self.client.get_plugin('generalized-linear-models')
        return plugin.get_settings().get_raw()['codeEnvName']

dataiku_api = DataikuApi()