Skip to content

Add vignette for model based age comps? #430

@ericward-noaa

Description

@ericward-noaa

This builds off of dead issue #104, but I've talked to @Lewis-Barnett-NOAA and @James-Thorson-NOAA about this, and am not remembering who has been working up model based age comps in assessments this cycle with sdmTMB. But I thought it'd be worth putting together a vignette showing how to do this, and add some examples. Here's a simple non-spatial model

library(sdmTMB)
library(ggplot2)
set.seed(42)  

# Parameters for the multinomial distribution
p <- exp(seq(-3.5, 0, length.out=10))
p <- p/sum(p) # normalize to sum to 1
n_categories <- length(p) 

# Simulate multinomial data for 10 rows with unequal sample sizes
n_rows <- 10
# Unequal sample sizes (randomly between 50 and 200)
n_trials <- sample(50:200, n_rows, replace = TRUE)  

# list to store simulated data -- these are observations per age
simulated_data_list <- list()
for (i in 1:n_rows) {
  simulated_data_list[[i]] <- rmultinom(1, size = n_trials[i], prob = p)
}

# Convert the simulated data into a long format data frame
long_data <- do.call(rbind, lapply(1:n_rows, function(i) {
  data.frame(
    category = paste0("age", 1:n_categories),
    count = simulated_data_list[[i]],
    total = rep(n_trials[i], n_categories),
    row_id = rep(i, n_categories)  # Create a row ID for each row of data
  )
}))

# Convert row_id to grouping factor and fit the model. 
long_data$row_id <- as.factor(long_data$row_id)
fit_sdmTMB <- sdmTMB(count ~ category + (1 | row_id), 
                        family = poisson(), 
                        spatial = "off",
                        spatiotemporal = "off",
                        offset = log(long_data$total),
                        data = long_data)

# Calculate the expected proportions by dividing the expected counts by the total sample size for each row
long_data$expected_proportion <- exp(predict(fit_sdmTMB)$est) / long_data$total

# bring in true_p
p_df <- data.frame(category = paste0("age", 1:n_categories), 
                   true_p = p)
long_data <- dplyr::left_join(long_data, p_df)
#> Joining with `by = join_by(category)`

head(long_data)
#>   category count total row_id expected_proportion      true_p
#> 1     age1     0    98      1         0.008695652 0.009932609
#> 2     age2     1    98      1         0.014229249 0.014653982
#> 3     age3     5    98      1         0.018181818 0.021619614
#> 4     age4     7    98      1         0.033201581 0.031896295
#> 5     age5     2    98      1         0.041106719 0.047057900
#> 6     age6     6    98      1         0.072727273 0.069426430

ggplot(long_data, aes(expected_proportion, true_p, col = row_id)) + 
  geom_point() + 
  geom_abline(aes(intercept=0, slope = 1)) + 
  theme_bw() + 
  xlab("Expected proportion") + 
  ylab("True proportion")

Created on 2025-03-28 with reprex v2.1.1

Estimates from each row are basically right on top of one another, which is why we only see the pink dots here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions