Add vignette for model based age comps?

This builds off of dead issue https://github.com/pbs-assess/sdmTMB/issues/104, but I've talked to @Lewis-Barnett-NOAA and @James-Thorson-NOAA about this, and am not remembering who has been working up model based age comps in assessments this cycle with sdmTMB. But I thought it'd be worth putting together a vignette showing how to do this, and add some examples. Here's a simple non-spatial model

``` r
library(sdmTMB)
library(ggplot2)
set.seed(42)  

# Parameters for the multinomial distribution
p <- exp(seq(-3.5, 0, length.out=10))
p <- p/sum(p) # normalize to sum to 1
n_categories <- length(p) 

# Simulate multinomial data for 10 rows with unequal sample sizes
n_rows <- 10
# Unequal sample sizes (randomly between 50 and 200)
n_trials <- sample(50:200, n_rows, replace = TRUE)  

# list to store simulated data -- these are observations per age
simulated_data_list <- list()
for (i in 1:n_rows) {
  simulated_data_list[[i]] <- rmultinom(1, size = n_trials[i], prob = p)
}

# Convert the simulated data into a long format data frame
long_data <- do.call(rbind, lapply(1:n_rows, function(i) {
  data.frame(
    category = paste0("age", 1:n_categories),
    count = simulated_data_list[[i]],
    total = rep(n_trials[i], n_categories),
    row_id = rep(i, n_categories)  # Create a row ID for each row of data
  )
}))

# Convert row_id to grouping factor and fit the model. 
long_data$row_id <- as.factor(long_data$row_id)
fit_sdmTMB <- sdmTMB(count ~ category + (1 | row_id), 
                        family = poisson(), 
                        spatial = "off",
                        spatiotemporal = "off",
                        offset = log(long_data$total),
                        data = long_data)

# Calculate the expected proportions by dividing the expected counts by the total sample size for each row
long_data$expected_proportion <- exp(predict(fit_sdmTMB)$est) / long_data$total

# bring in true_p
p_df <- data.frame(category = paste0("age", 1:n_categories), 
                   true_p = p)
long_data <- dplyr::left_join(long_data, p_df)
#> Joining with `by = join_by(category)`

head(long_data)
#>   category count total row_id expected_proportion      true_p
#> 1     age1     0    98      1         0.008695652 0.009932609
#> 2     age2     1    98      1         0.014229249 0.014653982
#> 3     age3     5    98      1         0.018181818 0.021619614
#> 4     age4     7    98      1         0.033201581 0.031896295
#> 5     age5     2    98      1         0.041106719 0.047057900
#> 6     age6     6    98      1         0.072727273 0.069426430

ggplot(long_data, aes(expected_proportion, true_p, col = row_id)) + 
  geom_point() + 
  geom_abline(aes(intercept=0, slope = 1)) + 
  theme_bw() + 
  xlab("Expected proportion") + 
  ylab("True proportion")
```

![](https://i.imgur.com/7Vuo8o4.png)

<sup>Created on 2025-03-28 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

Estimates from each row are basically right on top of one another, which is why we only see the pink dots here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add vignette for model based age comps? #430

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add vignette for model based age comps? #430

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions