The `modelStudio` package **automates the Explanatory Analysis of Machine Learning predictive models**. Generate advanced, interactive and animated model explanations in the form of a **serverless HTML site** with only one line of code. This tool is model-agnostic and therefore compatible with most black-box predictive models and frameworks (e.g. `mlr/mlr3`, `xgboost`, `caret`, `h2o`, `scikit-learn`, `lightGBM`, `keras/tensorflow`).
The main `modelStudio()` function computes various (instance-level and dataset-level) model explanations and produces an **interactive, customisable dashboard made with D3.js**. It consists of multiple panels for plots with their short descriptions. Easily **save and share** the dashboard with others. Tools for model exploration unite with tools for EDA (Exploratory Data Analysis) to give a broad overview of the model behavior.
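As a minimal sketch of the save-and-share step (assuming a `DALEX` explainer named `explainer` already exists, and relying on the fact that the dashboard is saved through `r2d3` tooling, which the README mentions via `r2d3::save_d3_html()`):

```r
library(modelStudio)
library(r2d3)

# compute the explanations and build the dashboard with one line
ms <- modelStudio(explainer)

# save the serverless dashboard as a standalone HTML file to share
save_d3_html(ms, file = "dashboard.html")
```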
<!-- - [explain FIFA19](https://pbiecek.github.io/explainFIFA19/) -->
<!-- - [explain Lung Cancer](https://github.com/hbaniecki/transparent_xai/) -->

&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;
[**explain FIFA20**](https://pbiecek.github.io/explainFIFA20/) &emsp;
[**R & Python examples**](http://modelstudio.drwhy.ai/articles/ms-r-python-examples.html) &emsp;
[**More Resources**](http://modelstudio.drwhy.ai/#more-resources) &emsp;
[**FAQ & Troubleshooting**](https://github.com/ModelOriented/modelStudio/issues/54)
```r
library("DALEX")
library("modelStudio")

# fit a model
model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

# create an explainer for the model
explainer <- explain(model,
                     data = titanic_imputed,
                     y = titanic_imputed$survived)

# make a studio for the model
modelStudio(explainer)
```

![](man/figures/long.gif)
## R & Python Examples [more](http://modelstudio.drwhy.ai/articles/ms-r-python-examples.html)
The `modelStudio()` function uses `DALEX` explainers created with `DALEX::explain()` or `DALEXtra::explain_*()`.
```r
# packages for explainer objects
install.packages("DALEX")
install.packages("DALEXtra")

# update main dependencies
install.packages("ingredients")
install.packages("iBreakDown")
```
### mlr [dashboard](https://modeloriented.github.io/modelStudio/mlr.html)
```r
library(mlr)
library(DALEXtra)
library(modelStudio)

data <- DALEX::titanic_imputed

# split the data
index <- sample(1:nrow(data), 0.7 * nrow(data))
train <- data[index, ]
test <- data[-index, ]

# mlr ClassifTask takes target as factor
train$survived <- as.factor(train$survived)

# fit a model
task <- makeClassifTask(id = "titanic", data = train, target = "survived")

learner <- makeLearner("classif.ranger", predict.type = "prob")

model <- train(learner, task)

# create an explainer for the model
explainer <- explain_mlr(model,
                         data = test,
                         y = test$survived,
                         label = "mlr")

# pick observations
new_observation <- test[1:2, ]
rownames(new_observation) <- c("id1", "id2")

# make a studio for the model
modelStudio(explainer, new_observation)
```
### xgboost [dashboard](https://modeloriented.github.io/modelStudio/xgboost.html)

```r
library(xgboost)
library(DALEX)
library(modelStudio)

data <- DALEX::titanic_imputed

# split the data
index <- sample(1:nrow(data), 0.7 * nrow(data))
train <- data[index, ]
test <- data[-index, ]

train_matrix <- model.matrix(survived ~ . - 1, train)
test_matrix <- model.matrix(survived ~ . - 1, test)

# fit a model
xgb_matrix <- xgb.DMatrix(train_matrix, label = train$survived)

params <- list(max_depth = 7, objective = "binary:logistic", eval_metric = "auc")

model <- xgb.train(params, xgb_matrix, nrounds = 500)

# create an explainer for the model
explainer <- explain(model,
                     data = test_matrix,
                     y = test$survived,
                     label = "xgboost")

# pick observations
new_observation <- test_matrix[1:2, , drop = FALSE]
rownames(new_observation) <- c("id1", "id2")

# make a studio for the model
modelStudio(explainer, new_observation)
```
### scikit-learn [dashboard](https://modeloriented.github.io/modelStudio/scikitlearn.html)

```
pip3 install dalex --force
```

Use the `pickle` Python module and the `reticulate` R package to easily make a studio for a model.

```{r eval = FALSE}
# package for pickle load
install.packages("reticulate")
```

In this example we will fit a `Pipeline MLPClassifier` model on the `titanic` data.

First, use `dalex` in Python:
```python
import dalex as dx

from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

data = dx.datasets.load_titanic()

X = data.drop(columns='survived')
y = data.survived

X_train, X_test, y_train, y_test = train_test_split(X, y)

# fit a pipeline model
numerical_features = ['age', 'fare', 'sibsp', 'parch']
numerical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ]
)
categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

classifier = MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=500)

model = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('classifier', classifier)
    ]
)
model.fit(X_train, y_train)

# create an explainer for the model
explainer = dx.Explainer(model, data=X_test, y=y_test, label='scikit-learn')

# ! remove residual_function before dump !
explainer.residual_function = None

# pack the explainer into a pickle file
import pickle
pickle_out = open('explainer_scikitlearn.pickle', 'wb')
pickle.dump(explainer, pickle_out)
pickle_out.close()
```
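The hand-off between Python and R relies only on the standard `pickle` protocol; here is a minimal, self-contained sketch of the dump/load round trip, with a plain dict standing in for the explainer object:

```python
import pickle

# a plain dict stands in for the dalex explainer object here
explainer = {"label": "scikit-learn", "residual_function": None}

# pack the object into a pickle file, as in the dump above
with open("explainer_scikitlearn.pickle", "wb") as f:
    pickle.dump(explainer, f)

# reticulate::py_load_object() in R performs the equivalent of this load
with open("explainer_scikitlearn.pickle", "rb") as f:
    restored = pickle.load(f)

assert restored == explainer
print(restored["label"])  # → scikit-learn
```

This is why `residual_function` must be set to `None` before the dump: only picklable attributes survive the round trip.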
Then, use `modelStudio` in R:

```r
# load the explainer from the pickle file
library(reticulate)
explainer <- py_load_object("explainer_scikitlearn.pickle", pickle = "pickle")

# make a studio for the model
library(modelStudio)
modelStudio(explainer)
```

The dashboard can be saved as a standalone HTML file, e.g. with [`r2d3::save_d3_html()`](https://rstudio.github.io/r2d3/articles/publish.html).
## More Resources

- Theoretical introduction to the plots: [Explanatory Model Analysis. Explore, Explain and Examine Predictive Models.](https://pbiecek.github.io/ema)
- Vignette: [modelStudio - R & Python examples](https://modeloriented.github.io/modelStudio/articles/ms-r-python-examples.html)
- Vignette: [modelStudio - perks and features](https://modeloriented.github.io/modelStudio/articles/ms-perks-features.html)
- Conference poster: [MLinPL2019](misc/MLinPL2019_modelStudio_poster.pdf)