| Title: | Continuous Norming |
|---|---|
| Description: | A toolbox for continuous norming of psychological and educational tests, supporting regression-based norming where norms can vary as a continuous function of age or another norm predictor. Norms are estimated using Generalized Additive Models for Location, Scale, and Shape (GAMLSS), enabling flexible modelling of the full score distribution in a normative sample. The package supports applications in psychometrics and psychological testing, and includes functions for model selection, reliability estimation, norm calculation, including confidence intervals, and sample size planning. For more details, see Timmerman et al. (2021) <doi:10.1037/met0000348>. |
| Authors: | Klazien de Vries [aut] (ORCID: <https://orcid.org/0009-0007-9302-1562>), Hannah Heister [aut] (ORCID: <https://orcid.org/0009-0001-1512-5549>), Julian Urban [aut] (ORCID: <https://orcid.org/0000-0001-8886-4724>), Lieke Voncken [ctb] (ORCID: <https://orcid.org/0000-0002-6710-271X>), Marieke Timmerman [aut, cre] (ORCID: <https://orcid.org/0000-0003-3480-5918>) |
| Maintainer: | Marieke Timmerman <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-06-09 06:32:37 UTC |
| Source: | https://github.com/cran/normref |
centiles_bin() plots centile curves and the sample data for
binomial-type distributions (see gamlss::.gamlss.bi.list) based on a
fitted GAMLSS object.
centiles_bin(
  model, xvar,
  cent = c(0.4, 2, 10, 25, 50, 75, 90, 98, 99.6),
  legend = TRUE, ylab = "y", xlab = "x",
  main = NULL, main.gsub = "@",
  xleg = min(xvar), yleg = max(model$y),
  xlim = range(xvar), ylim = range(model$y),
  save = FALSE, plot = TRUE, points = TRUE,
  pch = 15, cex = 0.5, col = "grey",
  col.centiles = seq_along(cent) + 2,
  lty.centiles = 1, lwd.centiles = 1,
  colors = "rainbow", ...
)
model |
a GAMLSS fitted model, for example the result of |
xvar |
the unique explanatory variable |
cent |
a vector with elements the % centile values for which the centile curves have to be evaluated |
legend |
whether a legend is required in the plot or not; the default is TRUE |
ylab |
the y-variable label |
xlab |
the x-variable label |
main |
the main title, as a character string. If NULL, the default title "centile curves using NO" (or the relevant distribution's name) is shown |
main.gsub |
if the |
xleg |
position of the legend in the x-axis |
yleg |
position of the legend in the y-axis |
xlim |
the limits of the x-axis |
ylim |
the limits of the y-axis |
save |
whether to save the sample percentages or not; the default is FALSE |
plot |
whether to plot the centiles. This option is useful for |
points |
whether the data points should be plotted; the default is TRUE |
pch |
the character to be used as the default in plotting points see |
cex |
size of character see |
col |
plotting colour see |
col.centiles |
Plotting colours for the centile curves |
lty.centiles |
line type for the centile curves |
lwd.centiles |
The line width for the centile curves |
colors |
the different colour schemes to be used for the fan-chart. The following are available
|
... |
for extra arguments |
No return value, only graphical output.
data("ids_data")
mydata_BB_y14 <- shape_data(
  data = ids_data, age_name = "age", score_name = "y14", family = "BB"
)
mod_BB_y14 <- fb_select(
  data = mydata_BB_y14, age_name = "age", score_name = "shaped_score",
  family = "BB", selcrit = "BIC"
)
centiles_bin(mod_BB_y14, xvar = age)
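To make the idea of a centile curve for a binomial-type score concrete, the following base R sketch computes centiles directly from a binomial distribution whose success probability changes with age. It is an illustration only, not the code behind centiles_bin(): the logistic age trend, the maximum score of 34, and all object names are assumptions made for the example.

```r
# Illustrative only: centile curves for a binomial score whose success
# probability rises with age (hypothetical curve, not a fitted GAMLSS model).
age   <- seq(5, 20, by = 0.5)
bd    <- 34                               # assumed maximum score (binomial denominator)
p_age <- plogis(-2 + 0.25 * age)          # assumed age trend in the success probability
cent  <- c(2, 10, 25, 50, 75, 90, 98)     # centiles, in percent

# One row per age, one column per centile
centile_mat <- sapply(cent, function(pc) qbinom(pc / 100, size = bd, prob = p_age))
colnames(centile_mat) <- paste0("c", cent)

# Step lines, because the scores are discrete
matplot(age, centile_mat, type = "s", lty = 1, col = seq_along(cent) + 2,
        xlab = "age", ylab = "score", main = "Centile curves (binomial sketch)")
legend("topleft", legend = paste0(cent, "%"), col = seq_along(cent) + 2, lty = 1)
```

Each column of centile_mat is one curve: the score below which roughly the stated percentage of the reference population falls at that age.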
composite_shape() creates a data.frame with age values and the sum of normalized
z-scores from multiple NormTable objects, suitable for use as input to fb_select().
composite_shape(normtables)
normtables |
list of NormTable objects created by normtable_create() |
A data.frame with:
age: Age values from the first NormTable
z_sum: Unweighted sum of normalized z-scores across all objects
shape_data(), fb_select(), normtable_create()
invisible(data("ids_data"))
# Example with two normtables
mydata1 <- shape_data(ids_data, age_name = "age", score_name = "y7", family = "BCPE")
mod1 <- fb_select(mydata1, age_name = "age", score_name = "shaped_score",
                  family = "BCPE", selcrit = "BIC")
norm1 <- normtable_create(mod1, mydata1, age_name = "age", score_name = "shaped_score")
mydata2 <- shape_data(ids_data, age_name = "age", score_name = "y14", family = "BCPE")
mod2 <- fb_select(mydata2, age_name = "age", score_name = "shaped_score",
                  family = "BCPE", selcrit = "BIC")
norm2 <- normtable_create(mod2, mydata2, age_name = "age", score_name = "shaped_score")
composite_data <- composite_shape(list(norm1, norm2))
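The composite itself is simple arithmetic. The sketch below builds the same kind of data.frame by hand from two hypothetical vectors of normalized z-scores; the values and object names are invented for illustration and do not come from the package.

```r
# Hypothetical normalized z-scores of the same five people on two tests
age     <- c(6.2, 7.5, 9.1, 11.4, 13.0)
z_test1 <- c(-0.8, 0.1, 1.2, -0.3, 0.5)
z_test2 <- c(-0.4, 0.6, 0.9, -1.1, 0.2)

# Unweighted sum of the z-scores, paired with age
composite <- data.frame(age = age, z_sum = z_test1 + z_test2)
composite
```

A sum of z-scores is generally not itself standard normal (its variance depends on how strongly the tests correlate), which is presumably why the result is meant to be normed again with fb_select().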
The data are perturbed variants of the scores on the raw speed of block 1 (rt) and the raw number of errors of block 7 (error) in the normative sample of the COTAPP test (Rommelse et al., 2018).
More information on the COTAPP test: https://www.boom.nl/productgroep/101-45_COTAPP
data(cotapp_data)
A dataframe with three columns:
age: age in years
rt: reaction time: scores on the raw speed of block 1
error: number of errors of block 7
Rommelse N, Brinkman A, Slaats-Willemse D, Timmerman ME, Voncken L, de Zeeuw P, Luman M, Hartman C (2020). “De Cognitieve Test Applicatie (COTAPP): geavanceerde computertest voor het meten van aandacht, informatieverwerking en executieve functies bij kinderen.” Kind en Adolescent, 41, 50–80.
Estimates reliability curves across various combinations of window widths and age step sizes, with optional per-individual estimation.
different_rel(
  data, item_variables, age_name, step_window,
  min_agegroup = NULL, max_agegroup = NULL, step_agegroup,
  include_window_per_person = FALSE, complete.obs = TRUE
)
data |
data.frame containing item scores and age variable. |
item_variables |
character vector. Names of the columns with item scores. |
age_name |
string. Name of the age variable. Default is |
step_window |
numeric vector. Window widths to evaluate. |
min_agegroup |
numeric. Minimum age to include. Defaults to the floor of the minimum age in the data. |
max_agegroup |
numeric. Maximum age to include. Defaults to the ceiling of the maximum age in the data. |
step_agegroup |
numeric vector. Step sizes between evaluated age points. |
include_window_per_person |
logical. If |
complete.obs |
logical. If |
An object of class Drel (a data.frame) with:
rel: Reliability estimates
age: Corresponding evaluated ages
window_width: Width of the window used
age_group_width: Step size between evaluated age groups
version: Type of estimation ("step" or "window_per_person")
invisible(data("ids_kn_data"))
rel_int <- different_rel(
  data = ids_kn_data,
  item_variables = colnames(ids_kn_data),
  age_name = "age_years",
  step_window = c(0.5, 1, 2, 5, 10, 20),
  min_agegroup = 5, max_agegroup = 20,
  step_agegroup = c(0.5, 1, 1.5, 2)
)
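To get a feel for how much work such a call implies, the sketch below enumerates the combinations of window width and age step used in the example above. It assumes that every window width is crossed with every step size and that ages are evaluated on a regular grid from min_agegroup to max_agegroup; the package may organise this differently.

```r
step_window   <- c(0.5, 1, 2, 5, 10, 20)
step_agegroup <- c(0.5, 1, 1.5, 2)

# Assumed: each window width is combined with each age step size
grid <- expand.grid(window_width = step_window, age_group_width = step_agegroup)
nrow(grid)

# Assumed: ages evaluated on a regular grid between 5 and 20 for each step size
ages_per_step <- lapply(step_agegroup, function(s) seq(5, 20, by = s))
n_ages <- lengths(ages_per_step)
n_ages
```

Small steps combined with many window widths multiply quickly, so a coarse grid is a reasonable starting point before refining.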
fb_select() applies the free order model selection procedure, using forward–backward selection
(Voncken et al. 2019).
For a given GAMLSS distribution and model selection criterion, it selects the optimal
polynomial degrees for all distribution parameters.
fb_select(
  data, age_name, score_name, family,
  selcrit = "BIC", spline = FALSE, method = "RS(10000)",
  max_poly = c(5, 5, 2, 2), min_poly = c(0, 0, 0, 0), start_poly = c(2, 1, 0, 0),
  trace = TRUE, seed = 123, parallel = FALSE
)
data |
data.frame. Sample on which to fit the distribution; contains the scores and ages. |
age_name |
string. Name of the age variable. |
score_name |
string. Name of the score variable. |
family |
string. For example, |
selcrit |
string. Model selection criterion: |
spline |
logical. If |
method |
string. Estimation method for |
max_poly |
vector. Maximum polynomial degrees for each parameter. |
min_poly |
vector. Minimum polynomial degrees for each parameter. |
start_poly |
vector. Starting polynomial degrees for each parameter. |
trace |
logical. If |
seed |
integer. Random seed for cross-validation folds. |
parallel |
logical. If |
If parallel = TRUE, candidate models are evaluated in parallel using the
future and future.apply packages. If these packages are not installed,
a message is printed and the function continues with sequential evaluation.
Parallelization can reduce elapsed time for large datasets, complex models and cross-validation,
but may be slower than sequential evaluation for smaller problems.
A selected GAMLSS model with the chosen polynomial degrees and the final criterion value.
Voncken L, Albers CJ, Timmerman ME (2019). “Model selection in continuous test norming with GAMLSS.” Assessment, 26(7), 1329–1346. doi:10.1177/1073191117715113.
shape_data(), fb_select(), normtable_create()
invisible(data("ids_data"))
mydata <- shape_data(ids_data, age_name = "age", score_name = "y14", family = "BB")
mod <- fb_select(mydata, age_name = "age", score_name = "shaped_score",
                 family = "BB", selcrit = "BIC")
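The search logic can be illustrated in a deliberately reduced setting. The sketch below applies a forward-backward search to the polynomial degree of a single parameter (the mean), using ordinary least squares and BIC on simulated data. It is a simplification for intuition only: fb_select() works on all distribution parameters of a GAMLSS model, and the data, the helper bic_for_degree(), and the bounds used here are assumptions of the example.

```r
set.seed(42)
age   <- runif(300, 5, 20)
score <- 10 + 2 * age - 0.05 * age^2 + rnorm(300, sd = 2)   # simulated scores

# BIC of a polynomial regression of the given degree (hypothetical helper)
bic_for_degree <- function(d) {
  fit <- if (d == 0) lm(score ~ 1) else lm(score ~ poly(age, d))
  BIC(fit)
}

degree   <- 1        # starting degree
min_deg  <- 0
max_deg  <- 5
best_bic <- bic_for_degree(degree)

repeat {
  # Candidate moves: one degree down (backward) or one degree up (forward)
  candidates <- c(degree - 1, degree + 1)
  candidates <- candidates[candidates >= min_deg & candidates <= max_deg]
  cand_bic   <- vapply(candidates, bic_for_degree, numeric(1))
  if (min(cand_bic) >= best_bic) break      # no move improves the criterion
  degree   <- candidates[which.min(cand_bic)]
  best_bic <- min(cand_bic)
}
degree
```

Because every accepted move strictly lowers the BIC, the loop always terminates; with several parameters the same idea applies, with one candidate move per parameter and direction.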
The data are perturbed data, based on scores on Test 14 (“naming antonyms”) and Test 7 (“naming categories”) of the intelligence test IDS-2 (Grob & Hagmann-von Arx, 2018a; Grob et al., 2018b). The data are provided as supplementary material to Timmerman et al. (2021).
data(ids_data)
A dataframe with three columns:
age: age in years
y7: raw test score on Test 7
y14: raw test score on Test 14
Grob A, Hagmann-von Arx P (2018). IDS 2: Intelligence and Development Scales-2. Hogrefe.
Grob A, Hagmann-von Arx P, Ruiter S, Timmerman M, Visser L (2018). IDS-2: Intelligentie-en ontwikkelingsschalen voor kinderen en jongeren. Hogrefe Publishing.
Timmerman ME, Voncken L, Albers CJ (2021). “A tutorial on regression-based norming of psychological tests with GAMLSS.” Psychological methods, 26(3), 357. doi:10.1037/met0000348.
The data are simulated for demonstration purposes, akin to Test 7 (“naming categories”) of the intelligence test IDS-2 (Grob & Hagmann-von Arx, 2018). They consist of the binary scores on 34 items (KN_1,...,KN_34). The raw test score is the sum of the 34 item scores. The data are provided as supplementary material to Heister et al. (2024).
data(ids_kn_data)
A dataframe with 36 columns:
KN_1: binary score on item 1
KN_2: binary score on item 2
...
KN_34: binary score on item 34
rawscore: raw test score as the unweighted sum of the scores on item 1 to item 34
age_years: age in years
https://osf.io/dc5k9/files/osfstorage
Grob A, Hagmann-von Arx P (2018). IDS 2: Intelligence and Development Scales-2. Hogrefe.
Heister HM, Albers CJ, Wiberg M, Timmerman ME (2024). “Item response theory-based continuous test norming.” Psychological methods. doi:10.1037/met0000686.
A dataframe with the vectors age (the ages evaluated) and rel (the fictional test reliability at each age).
data(ids_rel_data)
A dataframe with two columns:
age: age in years
rel: reliability
Constructed by the authors.
normtable_create() creates a norm table based on a fitted GAMLSS model.
normtable_create(
  model, data, age_name, score_name,
  datarel = NULL, normtype = "Z",
  min_age = NULL, max_age = NULL, min_score = NULL, max_score = NULL,
  step_size_score = 1, step_size_age = NULL,
  cont_cor = FALSE, ci_level = 0.95, trim = 3,
  excel = FALSE, excel_name = tempfile("norms", fileext = ".xlsx"),
  new_data = FALSE
)
model |
a GAMLSS fitted model, for example the result of |
data |
data.frame. The sample on which the model has been fitted, or new data;
must contain the score variable (with name given in |
age_name |
string. Name of the age variable. |
score_name |
string. Name of the score variable. |
datarel |
data.frame or numeric. If a data.frame, must contain columns |
normtype |
string. Norm score type: |
min_age |
numeric. Lowest age value in the norm table; default is the first integer below the minimum observed age. |
max_age |
numeric. Highest age value in the norm table; default is the first integer above the maximum observed age. |
min_score |
numeric. Lowest score value in the norm table; default is the minimum observed score. |
max_score |
numeric. Highest score value in the norm table; default is the maximum observed score. |
step_size_score |
numeric. Increment of the scores in the norm table; default is 1. |
step_size_age |
numeric. Increment of the ages in the norm table; defaults to approximately 100 ages in total. |
cont_cor |
logical. If |
ci_level |
numeric. Confidence interval level (if |
trim |
numeric. Trim norm scores at ± |
excel |
logical. If |
excel_name |
character. Path to the Excel file. Defaults to a temporary file.
Ignored if |
new_data |
logical. If |
If excel = TRUE, results are written to an Excel file via the openxlsx2 package.
If the package is not installed, a message is printed and the function continues
without writing an Excel file. By default, the file is written to a temporary path
(see tempfile()); if you want to keep the file permanently, provide your own file
name via the excel_name argument (e.g., "norms.xlsx").
A list of class NormTable containing:
norm_sample: Estimated norm scores (normtype) in the sample, trimmed at trim.
norm_sample_lower, norm_sample_upper: Lower and upper ci_level confidence bounds of norm_sample.
norm_matrix: Norm scores (normtype) by age (only if new_data = FALSE).
norm_matrix_lower, norm_matrix_upper: Lower and upper ci_level bounds of norm_matrix.
znorm_sample: Estimated Z scores in the sample.
cdf_sample: Estimated percentiles in the sample.
cdf_matrix: Percentile table by age (only if new_data = FALSE).
data, age_name, score_name: Copies of respective function arguments.
pop_age: Evaluated ages in the norm table (only if new_data = FALSE).
Timmerman ME, Voncken L, Albers CJ (2021). “A tutorial on regression-based norming of psychological tests with GAMLSS.” Psychological methods, 26(3), 357. doi:10.1037/met0000348.
# Load example data
invisible(data("ids_data"))
# Prepare data for modeling
mydata_BB_y14 <- shape_data(
  data = ids_data, age_name = "age", score_name = "y14", family = "BB"
)
# Fit model using BIC as selection criterion
mod_BB_y14 <- fb_select(
  data = mydata_BB_y14, age_name = "age", score_name = "shaped_score",
  family = "BB", selcrit = "BIC"
)
# Create norm table from fitted model
norm_mod_BB_y14 <- normtable_create(
  model = mod_BB_y14, data = mydata_BB_y14,
  age_name = "age", score_name = "shaped_score"
)
# Calculate norms for a new sample using reliability data
invisible(data("ids_rel_data"))
newdata <- ids_data[1:5, c("age", "y14")]
norm_mod_BB_newdata <- normtable_create(
  model = mod_BB_y14, data = newdata,
  age_name = "age", score_name = "y14",
  new_data = TRUE, datarel = ids_rel_data
)
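The relation between percentiles, Z norm scores, trimming, and reliability-based confidence intervals can be sketched in base R. The percentiles and the reliability of 0.85 below are made up, and the interval uses the textbook standard error of measurement on the Z scale; this is an assumption for illustration and not necessarily the formula that normtable_create() applies.

```r
cdf  <- c(0.0004, 0.02, 0.25, 0.50, 0.90, 0.9999)   # hypothetical estimated percentiles
trim <- 3

z         <- qnorm(cdf)                  # percentile -> normalized Z score
z_trimmed <- pmax(pmin(z, trim), -trim)  # trim extreme values at +/- trim

# Assumed textbook interval: standard error of measurement on the Z scale (sd = 1)
rel      <- 0.85                          # hypothetical reliability
ci_level <- 0.95
sem      <- sqrt(1 - rel)
q        <- qnorm(1 - (1 - ci_level) / 2)
lower    <- z_trimmed - q * sem
upper    <- z_trimmed + q * sem

round(data.frame(cdf, z, z_trimmed, lower, upper), 2)
```

Trimming only affects the two most extreme percentiles here; the interval width depends on the reliability alone, so a lower reliability at some ages would widen the intervals there.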
plot_drel() plots reliability estimates as a function of age,
based on different window widths, using a Drel object.
plot_drel(drel, ncol = 3, nrow = 2, ...)
drel |
a |
ncol |
number of plots per row (default: 3) |
nrow |
number of plots per column (default: 2) |
... |
additional arguments passed to plotting functions. |
graphical output and the ggplot object used to create it.
data("ids_kn_data")
rel_int <- different_rel(
  data = ids_kn_data,
  item_variables = colnames(ids_kn_data),
  age_name = "age_years",
  step_window = c(0.5, 1, 2, 5, 10, 20),
  min_agegroup = 5, max_agegroup = 20,
  step_agegroup = c(0.5, 1, 1.5, 2)
)
plot_drel(rel_int, ncol = 2)
plot_normtable() plots norm curves as a function of the predictor,
along with the sample data, based on a NormTable object.
plot_normtable(
  normtable, lty = 1, lwd = 3, pch = 1, cex = 0.5,
  col = "aquamarine4", xlab = "Age", ylab = "Percentile", ...
)
normtable |
a |
lty |
line type(s) for curves. |
lwd |
line width(s) for curves. |
pch |
symbol for sample points. |
cex |
point size (default: 0.5) |
col |
point colour (default: "aquamarine4") |
xlab |
x-axis label (default: "Age") |
ylab |
y-axis label (default: "Percentile") |
... |
additional graphical parameters passed to
|
graphical output and the ggplot object used to create it.
data("ids_data")
mydata_BB_y14 <- shape_data(
  data = ids_data, age_name = "age", score_name = "y14", family = "BB"
)
mod_BB_y14 <- fb_select(
  data = mydata_BB_y14, age_name = "age", score_name = "shaped_score",
  family = "BB", selcrit = "BIC"
)
norm_mod_BB_y14 <- normtable_create(
  model = mod_BB_y14, data = mydata_BB_y14,
  age_name = "age", score_name = "shaped_score"
)
# default plot
plot_normtable(norm_mod_BB_y14)
Estimates reliability across age using a sliding window approach, either at fixed age points or per individual.
reliability_window(
  data, age_name, item_variables, window_width,
  window_version = "step",
  min_agegroup = NULL, max_agegroup = NULL, step_agegroup = 1,
  complete.obs = TRUE
)
data |
data.frame containing the item scores and age variable. |
age_name |
string. Name of the age variable. |
item_variables |
numeric or character vector. Column indices or names of the item variables. |
window_width |
numeric. Width of the sliding window used to group individuals by age. |
window_version |
string. Type of windowing:
|
min_agegroup |
numeric. Minimum age to include. Defaults to the floor of the minimum age in the data. |
max_agegroup |
numeric. Maximum age to include. Defaults to the ceiling of the maximum age in the data. |
step_agegroup |
numeric. Step size between evaluated ages. Used only when |
complete.obs |
logical. If |
A data.frame with:
rel: Reliability estimates
age: Corresponding age values
window_width: The width of the sliding window
window_per: Description of age step or observation unit
This output can be used as the datarel argument in normtable_create().
Heister HM, Albers CJ, Wiberg M, Timmerman ME (2024). “Item response theory-based continuous test norming.” Psychological methods. doi:10.1037/met0000686.
invisible(data("ids_kn_data"))
rel_est <- reliability_window(
  data = ids_kn_data,
  age_name = "age_years",
  item_variables = colnames(ids_kn_data),
  window_width = 2
)
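The sliding window idea can be sketched in base R on simulated item data. The example below uses Cronbach's alpha as the reliability coefficient and windows centred on whole ages; both choices, the simulated items, and the helper cronbach_alpha() are assumptions made for illustration, and the package may estimate reliability differently.

```r
set.seed(1)
n <- 400
k <- 10
age     <- runif(n, 5, 20)
ability <- rnorm(n)
# Simulated binary item scores driven by a common ability (n x k matrix)
items <- sapply(seq_len(k), function(j) as.integer(ability + rnorm(n) > 0))

# Cronbach's alpha for a persons-by-items matrix (hypothetical helper)
cronbach_alpha <- function(x) {
  k <- ncol(x)
  (k / (k - 1)) * (1 - sum(apply(x, 2, var)) / var(rowSums(x)))
}

window_width <- 2
centres <- seq(floor(min(age)), ceiling(max(age)), by = 1)   # evaluated ages

rel <- sapply(centres, function(a) {
  in_window <- abs(age - a) <= window_width / 2   # people within the window
  cronbach_alpha(items[in_window, , drop = FALSE])
})

rel_df <- data.frame(age = centres, rel = rel, window_width = window_width)
rel_df
```

A wider window uses more people per estimate and so gives smoother but less age-specific reliabilities, which is the trade-off that different_rel() and plot_drel() are meant to help explore.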
Computes optimal sample sizes per group under a distribution-free polynomial regression model for group means, following Hessen (2026).
sample_size_poly(
  n_groups, poly_degree, n0,
  variances = NULL,
  solution_type = c("balanced", "all", "range")
)
n_groups |
Integer. Number of groups (e.g., age groups). |
poly_degree |
Integer scalar or vector. Polynomial degree(s) of polynomial regression model. |
n0 |
Integer scalar or vector. Reference sample size per group under traditional norming. Typical values are 200, 300, or 400 (see e.g., Egberink et al. (2026)). |
variances |
Optional numeric vector of length |
solution_type |
Character string indicating which solution(s) to return. Must be one of:
|
In the assumed continuous norming model, group means are first estimated and then smoothed using a polynomial model that regresses group means on age. This function determines group sample sizes such that the precision of the estimated means (not higher-order moments) is at least as high as under traditional norming.
This function implements the linear programming (LP) approach described in
Hessen (2026). Multiple optimal solutions may exist:
different combinations of group sample sizes can yield the same minimal
total sample size while satisfying the precision constraints.
By default (solution_type = "balanced"), a single solution is returned,
defined as the solution minimizing the difference between the largest and smallest group sample sizes. This yields
a practically balanced design. Alternatively, users can inspect all optimal
solutions (solution_type = "all") or examine the range of optimal sample
sizes per group (solution_type = "range").
How to use:
Before data collection: do not provide variances (assumes homoscedasticity).
During/after data collection: provide estimated variances per group.
A data.frame. The structure depends on solution_type:
solution_type = "balanced": returns one optimal solution with:
Reference sample size per group
Group index
Polynomial degree
Sample size per group
Expected variance of estimated mean
Variance under traditional norming
Total sample size across all groups
Total sample size under traditional norming
Lower bound used in LP
Upper bound used in LP
solution_type = "all": returns all optimal solutions with the same columns as above, plus:
Identifier for solution number
solution_type = "range": returns one row per group with:
Reference sample size per group
Group index
Polynomial degree
Range of sample sizes over optimal solutions
Egberink I, De Leng W, Evers A, Hemker B, Lucassen W (2026). COTAN Beoordelingssysteem voor de Kwaliteit van Tests. Geheel herziene versie 2026. Nederlands Instituut van Psychologen.
Hessen DJ (2026). “Richtlijnen voor steekproefgroottes bij continu normeren.” Utrecht University, https://github.com/djhessen/COTAN.
# Example 1: Planning before data collection (homoscedastic)
## Not run:
res1 <- sample_size_poly(n_groups = 5, poly_degree = 2, n0 = 400)
# Example 2: Midway planning with variance estimates (heteroscedastic)
ids_data$age_group <- cut(ids_data$age, breaks = seq(6, 18, by = 1))
v <- tapply(ids_data$y7, ids_data$age_group, var, na.rm = TRUE)
res2 <- sample_size_poly(
  n_groups = length(v), poly_degree = c(1, 2, 3), n0 = c(300, 400), variances = v
)
## End(Not run)
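One way to read the precision constraint is through weighted least squares on the group means. The sketch below checks, for a hypothetical candidate design of 360 people in each of five groups, whether the variance of every polynomial-smoothed group mean stays at or below the variance of a raw group mean based on n0 = 400. It assumes equal within-group variances and equally spaced groups, and it is not the LP formulation of Hessen (2026); it only verifies one given design.

```r
n_groups    <- 5
poly_degree <- 2
n0          <- 400

x <- seq_len(n_groups)                     # equally spaced group positions (assumed)
X <- cbind(1, poly(x, poly_degree))        # design matrix for the group means
n <- rep(360, n_groups)                    # hypothetical candidate design

# Variance of each smoothed mean in units of the within-group variance:
# diag( X (X' W X)^-1 X' ) with W = diag(n); n * X scales row g of X by n[g]
var_smoothed    <- diag(X %*% solve(t(X) %*% (n * X)) %*% t(X))
var_traditional <- 1 / n0

design <- data.frame(group = x, n = n,
                     var_smoothed = var_smoothed,
                     ok = var_smoothed <= var_traditional)
design
sum(n)   # total sample size of the candidate design
```

In this sketch the outer groups are the binding ones, because fitted values at the ends of the age range borrow the least strength from neighbouring groups.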
fb_select()
shape_data() reshapes the response variable into the right format for the specified
distribution and removes all cases with missing data on the score or age variable.
The result is suitable for use as input to fb_select().
shape_data(data, age_name, score_name, family, max_score = NULL, verbose = TRUE)
data |
data.frame. Sample on which to fit the distribution; contains the scores and ages. |
age_name |
string. Name of the age variable. |
score_name |
string. Name of the score variable. |
family |
string. For example, |
max_score |
numeric. Highest possible score in the norm table. Defaults to the maximum observed score in the sample. |
verbose |
logical. If |
The function checks whether the response values are valid for the specified
GAMLSS distribution family. If not, transformations are applied to ensure compatibility.
Messages are printed (if verbose = TRUE) to describe each transformation.
Unexpected transformations should prompt inspection of the original data. Note that the function does not assess whether the chosen family is appropriate for the data; it only ensures compatibility.
Compatible with all gamlss distributions, with the exception of distributions in the multinomial family (gamlss::.gamlss.multin.list). This includes user-defined distributions, such as truncated distributions.
A data.frame containing the original variables and a new column shaped_score,
with the response variable in the correct format for GAMLSS modeling.
Voncken L, Albers CJ, Timmerman ME (2019). “Model selection in continuous test norming with GAMLSS.” Assessment, 26(7), 1329–1346. doi:10.1177/1073191117715113.
invisible(data("ids_data"))
mydata_BB <- shape_data(ids_data, age_name = "age", score_name = "y14", family = "BB")
mydata_BCPE <- shape_data(ids_data, age_name = "age", score_name = "y14", family = "BCPE")
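As a rough picture of what reshaping means for a binomial-type family such as BB, the sketch below drops incomplete cases and builds a two-column response of successes and failures, the form that gamlss accepts for binomial-type responses. The toy data, the maximum score of 34, and the exact layout of shaped_score are assumptions for illustration; the output of shape_data() may differ in detail.

```r
# Toy data with missing values on age and score (hypothetical)
raw <- data.frame(age = c(6.1, 7.3, NA, 9.8, 12.4),
                  y14 = c(3, 10, 8, NA, 27))
max_score <- 34                            # assumed highest possible score

# Keep only cases that are complete on both age and score
complete <- raw[complete.cases(raw[, c("age", "y14")]), ]

# Two-column response: number correct and number incorrect
complete$shaped_score <- cbind(complete$y14, max_score - complete$y14)
complete
```

For continuous families the response would typically remain a single numeric column, possibly after a transformation that the function reports when verbose = TRUE.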