-
Notifications
You must be signed in to change notification settings - Fork 28
Open
Description
This builds off of dead issue #104, but I've talked to @Lewis-Barnett-NOAA and @James-Thorson-NOAA about this, and am not remembering who has been working up model based age comps in assessments this cycle with sdmTMB. But I thought it'd be worth putting together a vignette showing how to do this, and add some examples. Here's a simple non-spatial model
library(sdmTMB)
library(ggplot2)
set.seed(42)
# Parameters for the multinomial distribution
p <- exp(seq(-3.5, 0, length.out=10))
p <- p/sum(p) # normalize to sum to 1
n_categories <- length(p)
# Simulate multinomial data for 10 rows with unequal sample sizes
n_rows <- 10
# Unequal sample sizes (randomly between 50 and 200)
n_trials <- sample(50:200, n_rows, replace = TRUE)
# list to store simulated data -- these are observations per age
simulated_data_list <- list()
for (i in 1:n_rows) {
simulated_data_list[[i]] <- rmultinom(1, size = n_trials[i], prob = p)
}
# Convert the simulated data into a long format data frame
long_data <- do.call(rbind, lapply(1:n_rows, function(i) {
data.frame(
category = paste0("age", 1:n_categories),
count = simulated_data_list[[i]],
total = rep(n_trials[i], n_categories),
row_id = rep(i, n_categories) # Create a row ID for each row of data
)
}))
# Convert row_id to grouping factor and fit the model.
long_data$row_id <- as.factor(long_data$row_id)
fit_sdmTMB <- sdmTMB(count ~ category + (1 | row_id),
family = poisson(),
spatial = "off",
spatiotemporal = "off",
offset = log(long_data$total),
data = long_data)
# Calculate the expected proportions by dividing the expected counts by the total sample size for each row
long_data$expected_proportion <- exp(predict(fit_sdmTMB)$est) / long_data$total
# bring in true_p
p_df <- data.frame(category = paste0("age", 1:n_categories),
true_p = p)
long_data <- dplyr::left_join(long_data, p_df)
#> Joining with `by = join_by(category)`
head(long_data)
#> category count total row_id expected_proportion true_p
#> 1 age1 0 98 1 0.008695652 0.009932609
#> 2 age2 1 98 1 0.014229249 0.014653982
#> 3 age3 5 98 1 0.018181818 0.021619614
#> 4 age4 7 98 1 0.033201581 0.031896295
#> 5 age5 2 98 1 0.041106719 0.047057900
#> 6 age6 6 98 1 0.072727273 0.069426430
ggplot(long_data, aes(expected_proportion, true_p, col = row_id)) +
geom_point() +
geom_abline(aes(intercept=0, slope = 1)) +
theme_bw() +
xlab("Expected proportion") +
ylab("True proportion")
Created on 2025-03-28 with reprex v2.1.1
Estimates from each row are basically right on top of one another, which is why we only see the pink dots here.
Metadata
Metadata
Assignees
Labels
No labels