vignettes/smcfcs-vignette.Rmd
3 additions & 3 deletions
@@ -21,7 +21,7 @@ When multiple variables are affected by missingness, the traditional approach to
## Imputation model compatibility
-When missing values are imputed from a misspecified model, in general invalid inferences will result. One way in which misspecification can occur is when the imputation and substantive (analysis) model of interest are incompatible. Loosely speaking, this means there exists no joint model which contains the imputation model and the substantive model as the corresponding conditionals. In this case, as described by <a href="http://doi.org/10.1177/0962280214521348">Bartlett *et al* (2015)</a>, assuming that the substantive model is correctly specified, unless the imputation and substantive models can be made compatible by imposing a restriction on the imputation model, incompatiblity implies the imputation model is misspecified.
+When missing values are imputed from a misspecified model, in general invalid inferences will result. One way in which misspecification can occur is when the imputation and substantive (analysis) model of interest are incompatible. Loosely speaking, this means there exists no joint model which contains the imputation model and the substantive model as the corresponding conditionals. In this case, as described by <a href="http://doi.org/10.1177/0962280214521348">Bartlett *et al* (2015)</a>, assuming that the substantive model is correctly specified, unless the imputation and substantive models can be made compatible by imposing a restriction on the imputation model, incompatibility implies the imputation model is misspecified.
Such incompatibility between the imputation model used to impute a partially observed covariate and the substantive/outcome model can arise for example when the latter includes interactions or non-linear effects of variables. A further example is when the substantive model is a Cox proportional hazards model for a censored time to event outcome. In these cases, it may be difficult or impossible to specify an imputation model for a covariate which is compatible with the model for the outcome (the substantive model) using standard imputation models as available in existing packages.
@@ -43,7 +43,7 @@ In certain situations it may be advantageous to use SMC-FCS rather than traditio
The `smcfcs` function in the `smcfcs` package implements the SMC-FCS procedure. Currently linear, logistic and Cox proportional hazards substantive models are supported. Competing risks outcome data can also be accommodated, with a Cox proportional hazards model used to model each cause specific hazard function. Partially observed variables can be imputed using normal linear regression, logistic regression (for binary variables), proportional odds regression (sometimes known as ordinal logistic regression, suitable for ordered categorical variables), multinomial logistic regression (for unordered categorical variables), and Poisson regression (for count variables). In the following we describe some of the important aspects of using `smcfcs` by way of an example data frame.
## Example - linear regression substantive model with quadratic covariate effects
-To illustrate the package, we use the simple example data frame `ex_linquad`, which is included with the package. This data frame was simulated for `n=1000` independent rows. For each row, variables `y,x,z,v` were intended to be collected, but there are missing values in `x`. The values have been made artifically missing, with the probability of missingness dependent on (the fully observed) `y` variable. Below the first 10 rows of the data frame are shown:
+To illustrate the package, we use the simple example data frame `ex_linquad`, which is included with the package. This data frame was simulated for `n=1000` independent rows. For each row, variables `y,x,z,v` were intended to be collected, but there are missing values in `x`. The values have been made artificially missing, with the probability of missingness dependent on (the fully observed) `y` variable. Below the first 10 rows of the data frame are shown:
```{r}
library(smcfcs)
@@ -118,7 +118,7 @@ summary(MIcombine(models))
Sometimes when running `smcfcs` you may receive warnings that the rejection sampling that `smcfcs` uses has failed to draw from the required distribution on a couple of occasions. Upon receiving this warning, it is generally a good idea to re-run `smcfcs`, specifying a value for `rjlimit` which is larger than the default, until the warning is no longer issued. Having said that, when only a small number of warnings are issued, it may be fine to ignore the warnings, especially when the dataset is large.
## Assessing convergence
-Like standard chained equations or FCS imputation, the SMC-FCS algorithm must be run for a sufficient number of iterations for the process to converge to its stationary distribution. The default number of iterations used is 10, but this may not be sufficient in any given dataset and model specifcication. To assess convergence, the object returned by `smcfcs` includes an object called `smCoefIter`. This matrix contains the parameter estimates of the substantive model, and is indexed by imputation number, parameter number, and iteration number. To assess convergence, one can call smcfcs with `m=1` and `numit` suitably chosen (e.g. `numit=100`). The values in the resulting smCoefIter matrix can then be plotted to assess convergence. To illustrate, we re-run the imputation model used previously with the example data, but asking for only `m=1` imputation to be generated, and with 100 iterations.
+Like standard chained equations or FCS imputation, the SMC-FCS algorithm must be run for a sufficient number of iterations for the process to converge to its stationary distribution. The default number of iterations used is 10, but this may not be sufficient in any given dataset and model specification. To assess convergence, the object returned by `smcfcs` includes an object called `smCoefIter`. This matrix contains the parameter estimates of the substantive model, and is indexed by imputation number, parameter number, and iteration number. To assess convergence, one can call smcfcs with `m=1` and `numit` suitably chosen (e.g. `numit=100`). The values in the resulting smCoefIter matrix can then be plotted to assess convergence. To illustrate, we re-run the imputation model used previously with the example data, but asking for only `m=1` imputation to be generated, and with 100 iterations.
```{r, fig.width = 6, fig.height = 4}
# impute once with a larger number of iterations than the default 10
vignettes/smcfcs_coverror-vignette.Rmd
1 addition & 1 deletion
@@ -9,7 +9,7 @@ vignette: >
%\usepackage[utf8]{inputenc}
---
-This short vignette introduces the capabilites of `smcfcs` to accommodate classical covariate measurement error. We consider the cases where internal validation data and then internal replication data are available.
+This short vignette introduces the capabilities of `smcfcs` to accommodate classical covariate measurement error. We consider the cases where internal validation data and then internal replication data are available.
#Validation data
We will simulate a dataset with internal validation data where the true covariate (x) is observed for 10\% of the sample, while every subject has an error-prone measurement (w) observed:
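The diff context stops before the vignette's own simulation code. As a rough illustration only, a dataset of this shape could be generated with base R as sketched below. The sample size, the standard normal distributions, the unit error variance, the outcome model and the names `validation` and `simData` are all assumptions made for this sketch and are not taken from the vignette.

```r
# Hypothetical sketch: every numeric choice here is an illustrative assumption
set.seed(2024)
n <- 1000

x <- rnorm(n)          # true covariate
w <- x + rnorm(n)      # error-prone measurement under classical error: w = x + u
y <- 1 + x + rnorm(n)  # outcome generated from the true covariate

# Internal validation data: keep the true x for the first 10% of subjects only
validation <- seq_len(n) <= 0.1 * n
x[!validation] <- NA

simData <- data.frame(x, w, y)

# Proportion of subjects with the true covariate observed
mean(!is.na(simData$x))
```

Here `w` is fully observed while `x` is missing outside the validation subsample, which is the missingness pattern the sentence above describes.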