semTools/ 0000755 0001762 0000144 00000000000 15144023322 012053 5 ustar ligges users semTools/MD5 0000644 0001762 0000144 00000015400 15144023322 012363 0 ustar ligges users 4ab99f79fb9d8acb5d96a9f4841be38b *DESCRIPTION
808f94a2886fa919395c8a540b39d6ae *NAMESPACE
a5d4cb7e8fa52b310296e873a9d6e2eb *NEWS.md
35e3b92136a02ce4a8a3b19817054398 *R/EmpKaiser.R
deeb23e9dd0dda60ea5ec933c8b93c24 *R/NET.R
528e54b04c168f9818d962f24976ce26 *R/PAVranking.R
1830a4f07e22584dc6fb6d508c232949 *R/TSML.R
b493f1e2905a9f6c68e09b577f7bcc39 *R/aa_semTools-deprecated.R
4af8fe4548dc80d75c65bbb033a57299 *R/auxiliary.R
4cf360fd729e1237d7c33b54c99d48cd *R/clipboard.R
70f9c4579fc2b88b2a141c7c77880db5 *R/compareFit.R
9452cfe3870212e48397e951991b27bf *R/data.R
328a6ba7b743ef477175cd7b61a19948 *R/dataDiagnosis.R
36efcb04956d27597da57d2edb00e7d4 *R/discriminantValidity.R
1c7a99c8c46cc8bf2817f8459bec1023 *R/emmeans_lavaan.R
6fcd9d1bb62aea3b1ef262fd06ed4107 *R/fitIndices.R
7f95ac609c99baf8953fd0145c1da9c0 *R/fmi.R
3d26bbf0d7143f21e1f76db128a9649d *R/gorica.R
5116fa966b876ab74cccca6f18d414dd *R/htmt.R
f70a1de02115ad9cef51820ebaa1565e *R/imposeStart.R
69ffc885d351e32fe641f5520b5fc0ec *R/indProd.R
ed3337c42eecfba61fc37e863ea61f65 *R/kd.R
664b1dee784db1f51dfc2b82984d21c4 *R/loadingFromAlpha.R
c898a7ada480f97ea6108029b9160d3e *R/measEq.R
8d6bf89ee0daa474b8661d7d948d59f2 *R/miPowerFit.R
f1ba0cfbec8a1c191c2e58e9368ba3c7 *R/missingBootstrap.R
7f5c671fde4a8df18a7bb9dac1af798b *R/monteCarloCI.R
f0f13700b0f7cc300f5c6326c4653acc *R/mvrnonnorm.R
7704490623667921dff2868bd2392dd6 *R/ordMoments.R
b3a5236c5120109f909904c212692c87 *R/parcelAllocation.R
c262cd01cb66abc27d880a2faec62583 *R/partialInvariance.R
a6a76a6d36e0a3fc95b27784c322dcaf *R/permuteMeasEq.R
efb66bb80b54f5d99c205fee7a8ae232 *R/plausibleValues.R
464377d7f7fdabcaeef7806675102e28 *R/poolMAlloc.R
bab9b4f4590967574d0955e9bb8621e8 *R/powerAnalysisNested.R
9aa6c828bb164c4b94266e6cae9c248c *R/powerAnalysisRMSEA.R
35eca65deb80aa5c9a03080b3a5fd09a *R/powerAnalysisSS.R
7918c8b6be077cb98212cafb5f99e05e *R/probeInteraction.R
2fe1e05a38e10ca178678b663baaad24 *R/quark.R
851f645e83be31ccdc0f39beb42b4d88 *R/reliability.R
d740abd5f729b37d44c58007747aeb84 *R/residualCovariate.R
3b0088f1cac74ea1e359bbb9997ddcdc *R/semTools.R
9b3c48a01d24ab652a1033cc43e09edf *R/singleParamTest.R
db5ec4cf1a5310d50dbb09dc664dc561 *R/splitSample.R
a7a44eb40f895592c9c2f5ddc3008e6d *R/tukeySEM.R
540f327d33c85cc090c6fda8c361efde *R/zzz.R
ebcb52eff7ecb5000bb00ad17957036b *README.md
76e123f5dcc8dee07678412adc0a2a84 *build/partial.rdb
648320faee991423d38e37988c44ef80 *build/vignette.rds
4d1090db8b160303f235075ebbda16d7 *data/dat2way.rda
c56c2358c6e4d82c4cd7855e4df37282 *data/dat3way.rda
d3f489a4fefd07ccfcd3875bc71e0b83 *data/datCat.rda
ba21a4b5113fd7515fbf26fccfe0fe17 *data/exLong.rda
35cb60d186fd440645d81b56e22a3fa2 *data/simParcel.rda
cf344eafafe90f08e687a259663b0c4d *inst/CITATION
81dc319ba4e3739b20cbcd3196f00e1e *inst/doc/partialInvariance.Rnw
8b2420f605f3592021ab4778724e2bed *inst/doc/partialInvariance.pdf
fbdb4902cc1df8b273d644fb4d1de132 *inst/examples/lavaan2emmeans.R
4d59b5a289ca1e7b3cf7f487570dd3cd *man/AVE.Rd
a28c22979c817fda070bbc8b191a36df *man/BootMiss-class.Rd
9b843422caf2145783b42860a0ad55ba *man/FitDiff-class.Rd
b559b20ac662d06cf943a39472653a57 *man/Net-class.Rd
84758fa1b8f55b0af087d11a1da6d0fb *man/PAVranking.Rd
13d9de2a40e21e2fec56b0135fcf719e *man/SSpower.Rd
a70ea2c6d04e74909d8a0e0b3dc30beb *man/auxiliary.Rd
c255973f9fea33e476d5e165e90ab685 *man/bsBootMiss.Rd
c1fc42a0fe2643a04f4f3db48c9595c8 *man/chisqSmallN.Rd
7bb6582205f7affb17e5364fb6c6c82b *man/clipboard.Rd
32205b01e78c1e2f7d1ecee7d19d5e9e *man/combinequark.Rd
5c9c190d74f3fc7b8263b8a7f1cd3ac9 *man/compRelSEM.Rd
ab6be7f82ae5a019f5cddbca4b91c828 *man/compareFit.Rd
e3b7ebbd49ee04307b7c2a9269ccd5c2 *man/dat2way.Rd
bc676545cc6f99640bee85f1d124637d *man/dat3way.Rd
bb535d599ad9096582dc0256ef785a1f *man/datCat.Rd
928b3be970ee728a5b63bf356c5550bc *man/discriminantValidity.Rd
b953186f103230d01c7cf98d5c245e15 *man/efa.ekc.Rd
e5a00c7b7f1515795af3e2c60f4a7b10 *man/epcEquivCheck.Rd
6f79741769dd5671559834b27b4fe292 *man/epcEquivFit.Rd
91c710004d40191cc92a7b7d4e27139f *man/exLong.Rd
8e886463f55e5edeedddbb77f601a1a2 *man/findRMSEApower.Rd
4388dbbe39c7f66ddc58d3543c742864 *man/findRMSEApowernested.Rd
181b47581b794e42091883a3f2715a4a *man/findRMSEAsamplesize.Rd
3b4214558a5f6b4ad5a7df6e411f2e90 *man/findRMSEAsamplesizenested.Rd
75895615d5f9bb188af3d355a5bce1fa *man/fmi.Rd
bfbfda5bca607c81efda738b31d4a7b4 *man/goricaSEM.Rd
4fef726eb173e9ab4aafca30452a5849 *man/htmt.Rd
2865323337743b69f0111bfe8d3763f4 *man/imposeStart.Rd
ccd52b293e9c542277f9f6892271f189 *man/indProd.Rd
050966fdeb43eeb3c23176c0791d88b4 *man/kd.Rd
31a27322da8b705c6ec8774e518bf321 *man/kurtosis.Rd
25dc38a53f52a8e4eddf6a169f1e117a *man/lavaan2emmeans.Rd
b24d6f6dc617e522f1bdb7d2478e9ec0 *man/loadingFromAlpha.Rd
a1416c3b9594ff872e601a11f69ebe56 *man/lrv2ord.Rd
e62aff85d844043eeb417a94b065a10d *man/mardiaKurtosis.Rd
3e5c5ac5ce4923a87133e02fdd87ca7e *man/mardiaSkew.Rd
7d7f26b6540e5176a7b44c086bb6857b *man/maximalRelia.Rd
97c1b90dab0910110c6363dcbadba2ba *man/measEq.syntax-class.Rd
0c1cf8f72c256eb8408d49751313f6f0 *man/measEq.syntax.Rd
40ed34cecc0037913fdf88e3dc1ab28e *man/monteCarloCI.Rd
8050a1cb961375d1a6b4c8af0a760109 *man/moreFitIndices.Rd
a46444a8b87a06d19a8446a21f58749c *man/mvrnonnorm.Rd
f51137adb8cb4587d0e5eb7a63e4778f *man/net.Rd
f2ff63ff40108e5bb828c609a5ce89c8 *man/nullRMSEA.Rd
dad21e0350e2d1b05dd0291b25d8ea07 *man/parcelAllocation.Rd
90ba9c83203c83d1da6a9c0760f7d02d *man/partialInvariance.Rd
dbe9310ae109bb5b17b671de55b77196 *man/permuteMeasEq-class.Rd
51e94c734eb2a3d1cd4c616b1df7dae2 *man/permuteMeasEq.Rd
baaf08afcf07ad8d0b67894e1e346c29 *man/plausibleValues.Rd
bfd140211baf83ac057f7caaf356a66f *man/plotProbe.Rd
981a1c3ef734038ade2d80f6f1c8e1fb *man/plotRMSEAdist.Rd
936d9d577112a1f242ade5c29cb3dd87 *man/plotRMSEApower.Rd
1003bb6dba12f7f08a7d0e97b9ca043f *man/plotRMSEApowernested.Rd
86ff2b2c35eea574ed046f6cd8e93bf3 *man/poolMAlloc.Rd
7c19934693578e00fd5a2cd8f30b47f3 *man/probe2WayMC.Rd
d2c032bc6a9e4fae589dc25b0fa3bf14 *man/probe2WayRC.Rd
ce1daf746eb6cbdd47e2334b0d1fd6ce *man/probe3WayMC.Rd
f1d63c788d3faaff8946ea539b6c1c28 *man/probe3WayRC.Rd
9093b87793638e47a6890826d44a4a7a *man/quark.Rd
af445dcb2bd7c487bc053fe1f47c6898 *man/reliability-deprecated.Rd
ee1fcbc69047bfe60c047ee4d549c198 *man/reliabilityL2-deprecated.Rd
99dbb956123d40f686e2d77d842b0886 *man/residualCovariate.Rd
08ae3d10b66d42f98ff203e1ab30bca5 *man/semTools-deprecated.Rd
c84f43ff30c7e34caee0a38905b96493 *man/semTools.Rd
2d0c2f4e2f10d8c84feb8f1705bef4fb *man/simParcel.Rd
00af0ddb36e99e38f796bc8896fd663d *man/singleParamTest.Rd
e89797620eddc5bcc1bafe0bb5babddb *man/skew.Rd
2da1028ad38bb0531d85ef2b95f9273f *man/splitSample.Rd
84b89acd41d31adce3db39a5b89f0009 *man/tukeySEM.Rd
21b218fa85360e1ae22c0d31b1fb8306 *man/twostage-class.Rd
b21e87ffc30baacb82e92cea8c87e8e7 *man/twostage.Rd
81dc319ba4e3739b20cbcd3196f00e1e *vignettes/partialInvariance.Rnw
4f5891dc46f7212c1ce6189c4467adba *vignettes/partialInvariance.bib
semTools/R/ 0000755 0001762 0000144 00000000000 15142657341 012270 5 ustar ligges users semTools/R/PAVranking.R 0000644 0001762 0000144 00000170470 14764274542 014433 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 12 March 2025
##' Parcel-Allocation Variability in Model Ranking
##'
##' This function quantifies and assesses the consequences of parcel-allocation
##' variability for model ranking of structural equation models (SEMs) that
##' differ in their structural specification but share the same parcel-level
##' measurement specification (see Sterba & Rights, 2016). This function calls
##' [parcelAllocation()]---which can be used with only one SEM in
##' isolation---to fit two (assumed) nested models to each of a specified number
##' of random item-to-parcel allocations. Output includes summary information
##' about the distribution of model selection results (including plots) and the
##' distribution of results for each model individually, across allocations
##' within-sample. Note that this function can be used when selecting among more
##' than two competing structural models as well (see instructions below
##' involving the `seed=` argument).
##'
##' This is based on a SAS macro `ParcelAlloc` (Sterba & MacCallum, 2010).
##' The `PAVranking()` function produces results discussed in Sterba and
##' Rights (2016) relevant to the assessment of parcel-allocation variability in
##' model selection and model ranking. Specifically, the `PAVranking()`
##' function first calls [parcelAllocation()] to generate a given
##' number (`nAlloc=`) of item-to-parcel allocations, fitting both specified
##' models to each allocation, and providing summaryies of PAV for each model.
##' Additionally, `PAVranking()` provides the following new summaries:
##'
##' \itemize{
##' \item{PAV in model selection index values and model ranking between
##' Models `model0=` and `model1=`.}
##' \item{The proportion of allocations that converged and the proportion of
##' proper solutions (results are summarized for allocations with both
##' converged and proper allocations only).}
##' }
##'
##' For further details on the benefits of the random allocation of items to
##' parcels, see Sterba (2011) and Sterba and MacCallum (2010).
##'
##' To test whether nested models have equivalent fit, results can be pooled
##' across allocations using the same methods available for pooling results
##' across multiple imputations of missing data (see **Examples**).
##'
##' *Note*: This function requires the `lavaan` package. Missing data
##' must be coded as `NA`. If the function returns `"Error in
##' plot.new() : figure margins too large"`, the user may need to increase
##' size of the plot window (e.g., in RStudio) and rerun the function.
##'
##'
##' @importFrom stats sd
##' @importFrom lavaan parTable lavListInspect lavaanList
##' @importFrom graphics hist
##'
##' @param model0,model1 [lavaan::lavaan()] model syntax specifying
##' nested models (`model0` within `model1`) to be fitted
##' to the same parceled data. Note that there can be a mixture of
##' items and parcels (even within the same factor), in case certain items
##' should never be parceled. Can be a character string or parameter table.
##' Also see [lavaan::lavaanify()] for more details.
##' @param data A `data.frame` containing all observed variables appearing
##' in `model0=` and `model1=`, as well as those in the `item.syntax=` used to
##' create parcels. If the data have missing values, multiple imputation
##' before parceling is recommended: submit a stacked data set (with a variable
##' for the imputation number, so they can be separated later) and set
##' `do.fit=FALSE` to return the list of `data.frame`s (one per
##' allocation), each of which is a stacked, multiply imputed data set with
##' parcels created using the same allocation scheme.
##' @param parcel.names `character` vector containing names of all parcels
##' appearing as indicators in `model0=` or `model1=`.
##' @param item.syntax [lavaan::lavaan()] model syntax specifying the model
##' that would be fit to all of the unparceled items, including items that
##' should be randomly allocated to parcels appearing in `model0=` and `model1=`.
##' @param nAlloc The number of random items-to-parcels allocations to generate.
##' @param fun `character` string indicating the name of the
##' [lavaan::lavaan()] function used to fit `model0=` and `model1=` to `data=`.
##' Can only take the values `"lavaan"`, `"sem"`, `"cfa"`, or `"growth"`.
##' @param alpha Alpha level used as criterion for significance.
##' @param bic.crit Criterion for assessing evidence in favor of one model
##' over another. See Rafferty (1995) for guidelines (default is "very
##' strong evidence" in favor of the model with lower BIC).
##' @param fit.measures `character` vector containing names of fit measures
##' to request from each fitted [lavaan::lavaan-class] model. See the
##' output of [lavaan::fitMeasures()] for a list of available measures.
##' @param \dots Additional arguments to be passed to
##' [lavaan::lavaanList()]. See also [lavaan::lavOptions()]
##' @param show.progress If `TRUE`, show a [utils::txtProgressBar()]
##' indicating how fast each model-fitting iterates over allocations.
##' @param iseed (Optional) Random seed used for parceling items. When the same
##' random seed is specified and the program is re-run, the same allocations
##' will be generated. The seed argument can be used to assess parcel-allocation
##' variability in model ranking when considering more than two models. For each
##' pair of models under comparison, the program should be rerun using the same
##' random seed. Doing so ensures that multiple model comparisons will employ
##' the same set of parcel datasets. *Note*: When using \pkg{parallel}
##' options, you must first type `RNGkind("L'Ecuyer-CMRG")` into the R
##' Console, so that the seed will be controlled across cores.
##' @param warn Whether to print warnings when fitting models to each allocation
##'
##' @return
##' A `list` with 3 elements. The first two (`model0.results` and
##' `model1.results`) are results returned by [parcelAllocation()]
##' for `model0` and `model1`, respectively.
##' The third element (`model0.v.model1`) is a `list` of
##' model-comparison results, including the following:
##' \item{`LRT_Summary:`}{ The average likelihood ratio test across
##' allocations, as well as the *SD*, minimum, maximum, range, and the
##' proportion of allocations for which the test was significant.}
##' \item{`Fit_Index_Differences:`}{ Differences in fit indices, organized
##' by what proportion favored each model and among those, what the average
##' difference was.}
##' \item{`Favored_by_BIC:`}{ The proportion of allocations in which each
##' model met the criterion (`bic.crit`) for a substantial difference
##' in fit.}
##' \item{`Convergence_Summary:`}{ The proportion of allocations in which
##' each model (and both models) converged on a solution.}
##'
##' Histograms are also printed to the current plot-output device.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso [parcelAllocation()] for fitting a single model,
##' [poolMAlloc()] for choosing the number of allocations
##'
##' @references
##' Raftery, A. E. (1995). Bayesian model selection in social
##' research. *Sociological Methodology, 25*, 111--163. \doi{10.2307/271063}
##'
##' Sterba, S. K. (2011). Implications of parcel-allocation variability for
##' comparing fit of item-solutions and parcel-solutions. *Structural
##' Equation Modeling, 18*(4), 554--577.\doi{10.1080/10705511.2011.607073}
##'
##' Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates
##' and model fit across repeated allocations of items to parcels.
##' *Multivariate Behavioral Research, 45*(2), 322--358.
##' \doi{10.1080/00273171003680302}
##'
##' Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation
##' variability in practice: Combining sources of uncertainty and choosing the
##' number of allocations. *Multivariate Behavioral Research, 51*(2--3),
##' 296--313. \doi{10.1080/00273171.2016.1144502}
##'
##' Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model
##' selection: Parcel-allocation variability in model ranking.
##' *Psychological Methods, 22*(1), 47--68. \doi{10.1037/met0000067}
##'
##' @examples
##'
##' ## Specify the item-level model (if NO parcels were created)
##' ## This must apply to BOTH competing models
##'
##' item.syntax <- c(paste0("f1 =~ f1item", 1:9),
##' paste0("f2 =~ f2item", 1:9))
##' cat(item.syntax, sep = "\n")
##' ## Below, we reduce the size of this same model by
##' ## applying different parceling schemes
##'
##' ## Specify a 2-factor CFA with correlated factors, using 3-indicator parcels
##' mod1 <- '
##' f1 =~ par1 + par2 + par3
##' f2 =~ par4 + par5 + par6
##' '
##' ## Specify a more restricted model with orthogonal factors
##' mod0 <- '
##' f1 =~ par1 + par2 + par3
##' f2 =~ par4 + par5 + par6
##' f1 ~~ 0*f2
##' '
##' ## names of parcels (must apply to BOTH models)
##' (parcel.names <- paste0("par", 1:6))
##'
##' \donttest{
##' ## override default random-number generator to use parallel options
##' RNGkind("L'Ecuyer-CMRG")
##'
##' PAVranking(model0 = mod0, model1 = mod1, data = simParcel, nAlloc = 100,
##' parcel.names = parcel.names, item.syntax = item.syntax,
##' # parallel = "multicore", # parallel available on Mac/Linux
##' std.lv = TRUE) # any addition lavaan arguments
##'
##'
##'
##' ## POOL RESULTS by treating parcel allocations as multiple imputations.
##' ## Details provided in Sterba & Rights (2016); see ?poolMAlloc.
##'
##' ## save list of data sets instead of fitting model yet
##' dataList <- parcelAllocation(mod0, # or mod1 (either uses same allocations)
##' data = simParcel, nAlloc = 100,
##' parcel.names = parcel.names,
##' item.syntax = item.syntax,
##' do.fit = FALSE)
##' ## now fit each model to each data set
##' if(requireNamespace("lavaan.mi")){
##' library(lavaan.mi)
##' fit0 <- cfa.mi(mod0, data = dataList, std.lv = TRUE)
##' fit1 <- cfa.mi(mod1, data = dataList, std.lv = TRUE)
##' anova(fit0, fit1) # Pooled test statistic comparing models.
##' help(package = "lavaan.mi") # Find more methods for pooling results.
##' }
##'
##' }
##'
##' @export
PAVranking <- function(model0, model1, data, parcel.names, item.syntax,
nAlloc = 100, fun = "sem", alpha = .05, bic.crit = 10,
fit.measures = c("chisq","df","cfi","tli","rmsea",
"srmr","logl","aic","bic","bic2"), ...,
show.progress = FALSE, iseed = 12345, warn = FALSE) {
if (alpha >= 1 | alpha <= 0) stop('alpha level must be between 0 and 1')
bic.crit <- abs(bic.crit)
## fit each model
out0 <- parcelAllocation(model = model0, data = data, nAlloc = nAlloc,
parcel.names = parcel.names, item.syntax = item.syntax,
fun = fun, alpha = alpha, fit.measures = fit.measures,
..., show.progress = show.progress, iseed = iseed,
return.fit = TRUE, warn = warn)
out1 <- parcelAllocation(model = model1, data = data, nAlloc = nAlloc,
parcel.names = parcel.names, item.syntax = item.syntax,
fun = fun, alpha = alpha, fit.measures = fit.measures,
..., show.progress = show.progress, iseed = iseed,
return.fit = TRUE, warn = warn)
## convergence summary
conv0 <- out0$Model@meta$ok
conv1 <- out1$Model@meta$ok
conv01 <- conv0 & conv1
conv <- data.frame(Proportion_Converged = sapply(list(conv0, conv1, conv01), mean),
row.names = c("model0","model1","Both_Models"))
## add proper solutions? I would advise against
## check df matches assumed nesting
DF0 <- out0$Fit["df", "Avg"]
DF1 <- out1$Fit["df", "Avg"]
if (DF0 == DF1) stop('Models have identical df, so they cannot be compared.')
if (DF0 < DF1) warning('model0 should be nested within model1, ',
'but df_0 < df_1. Should models be swapped?')
temp <- out0
out0 <- out1
out1 <- temp
## Re-run lavaanList to collect model-comparison results
if (show.progress) message('Re-fitting model0 to collect model-comparison ',
'statistics\n')
oldCall <- out0$Model@call
oldCall$model <- parTable(out0$Model)
oldCall$dataList <- out0$Model@external$dataList[conv01]
if (!is.null(oldCall$parallel)) {
if (oldCall$parallel == "snow") {
oldCall$parallel <- "no"
oldCall$ncpus <- 1L
if (warn) warning("Unable to pass lavaan::lavTestLRT() arguments when ",
"parallel = 'snow'. Switching to parallel = 'no'. ",
"Unless using Windows, parallel = 'multicore' works.")
}
}
PT1 <- parTable(out1$Model)
op1 <- lavListInspect(out1$Model, "options")
oldCall$FUN <- function(obj) {
fit1 <- try(lavaan::lavaan(model = PT1, slotOptions = op1,
slotData = obj@Data), silent = TRUE)
if (inherits(fit1, "try-error")) return("fit failed")
out <- lavaan::lavTestLRT(obj, fit1)
if (inherits(out, "try-error")) return("lavTestLRT() failed")
out
}
fit01 <- eval(as.call(oldCall))
## check if there are any results
noLRT <- sapply(fit01@funList, is.character)
if (all(noLRT)) stop("No success using lavTestScore() on any allocations.")
## anova() results
CHIs <- sapply(fit01@funList[!noLRT], "[", i = 2, j = "Chisq diff")
pVals <- sapply(fit01@funList[!noLRT], "[", i = 2, j = "Pr(>Chisq)")
LRT <- c(`Avg LRT` = mean(CHIs), df = abs(DF0 - DF1), SD = sd(CHIs),
Min = min(CHIs), Max = max(CHIs), Range = max(CHIs) - min(CHIs),
`% Sig` = mean(pVals < alpha))
class(LRT) <- c("lavaan.vector","numeric")
## differences in fit indices
indices <- fit.measures[!grepl(pattern = "chisq|df|pvalue", fit.measures)]
Fit0 <- do.call(cbind, out0$Model@funList[conv01])[indices, ]
Fit1 <- do.call(cbind, out1$Model@funList[conv01])[indices, ]
## higher values for model0
Fit01 <- Fit0 - Fit1
higher0 <- Fit0 > Fit1
perc0 <- rowMeans(higher0)
avg0 <- rowMeans(Fit01 * higher0)
## higher values for model1
Fit10 <- Fit1 - Fit0
higher1 <- Fit1 > Fit0
perc1 <- rowMeans(higher1)
avg1 <- rowMeans(Fit10 * higher1)
fitDiff <- data.frame(Prop0 = perc0, Avg0 = avg0, Prop1 = perc1, Avg1 = avg1)
class(fitDiff) <- c("lavaan.data.frame","data.frame")
attr(fitDiff, "header") <- paste("Note: Higher values of goodness-of-fit",
"indices (e.g., CFI) favor that model, but",
"higher values of badness-of-fit indices",
"(e.g., RMSEA) indicate the competing model",
"is favored.\n\n'Prop0' indicates the",
"proportion of allocations for which each",
"index was higher for model0 (likewise,",
"'Prop1' indicates this for model1).\n",
"\nAmong those allocations, 'Avg0' or 'Avg1'",
"indicates the average amount by which the",
"index was higher for that model.")
## favored by BIC
favorBIC <- NULL
if (any(grepl(pattern = "bic", fit.measures))) {
if ("bic" %in% fit.measures) {
highBIC <- abs(Fit01["bic",]) >= bic.crit
favor0 <- mean(higher1["bic",] & highBIC)
favor1 <- mean(higher0["bic",] & highBIC)
favorBIC <- data.frame("bic" = c(favor0, favor1),
row.names = paste("Evidence Favoring Model", 0:1))
}
if ("bic2" %in% fit.measures) {
highBIC <- abs(Fit01["bic2",]) >= bic.crit
favor0 <- mean(higher1["bic2",] & highBIC)
favor1 <- mean(higher0["bic2",] & highBIC)
favorBIC2 <- data.frame("bic2" = c(favor0, favor1),
row.names = paste("Evidence Favoring Model", 0:1))
if (is.null(favorBIC)) {
favorBIC <- favorBIC2
} else favorBIC <- cbind(favorBIC, favorBIC2)
}
class(favorBIC) <- c("lavaan.data.frame","data.frame")
attr(favorBIC, "header") <- paste("Percent of Allocations with |BIC Diff| >",
bic.crit)
}
## return results
list(Model0_Results = out0[c("Estimates","SE","Fit")],
Model1_Results = out1[c("Estimates","SE","Fit")],
Model0.v.Model1 = list(LRT_Summary = LRT,
Fit_Index_Differences = fitDiff,
Favored_by_BIC = favorBIC,
Convergence_Summary = conv))
}
## ------------
## old function
## ------------
## @param nPerPar A list in which each element is a vector, corresponding to
## each factor, indicating sizes of parcels. If variables are left out of
## parceling, they should not be accounted for here (i.e., there should not be
## parcels of size "1").
## @param facPlc A list of vectors, each corresponding to a factor, specifying
## the item indicators of that factor (whether included in parceling or not).
## Either variable names or column numbers. Variables not listed will not be
## modeled or included in output datasets.
## @param nAlloc The number of random allocations of items to parcels to
## generate.
## @param syntaxA lavaan syntax for Model A. Note that, for likelihood ratio
## test (LRT) results to be interpreted, Model A should be nested within Model
## B (though the function will still provide results when Models A and B are
## nonnested).
## @param syntaxB lavaan syntax for Model B. Note that, for likelihood ratio
## test (LRT) results to be appropriate, Model A should be nested within Model
## B (though the function will still provide results when Models A and B are
## nonnested).
## @param dataset Item-level dataset
## @param parceloutput folder where parceled data sets will be outputted (note
## for Windows users: file path must specified using forward slashes).
## @param names (Optional) A character vector containing the names of parceled
## variables.
## @param leaveout (Optional) A vector of variables to be left out of
## randomized parceling. Either variable names or column numbers are allowed.
## @examples
##
## \dontrun{
## ## lavaan syntax for Model A: a 2 Uncorrelated
## ## factor CFA model to be fit to parceled data
##
## parmodelA <- '
## f1 =~ NA*p1f1 + p2f1 + p3f1
## f2 =~ NA*p1f2 + p2f2 + p3f2
## p1f1 ~ 1
## p2f1 ~ 1
## p3f1 ~ 1
## p1f2 ~ 1
## p2f2 ~ 1
## p3f2 ~ 1
## p1f1 ~~ p1f1
## p2f1 ~~ p2f1
## p3f1 ~~ p3f1
## p1f2 ~~ p1f2
## p2f2 ~~ p2f2
## p3f2 ~~ p3f2
## f1 ~~ 1*f1
## f2 ~~ 1*f2
## f1 ~~ 0*f2
## '
##
## ## lavaan syntax for Model B: a 2 Correlated
## ## factor CFA model to be fit to parceled data
##
## parmodelB <- '
## f1 =~ NA*p1f1 + p2f1 + p3f1
## f2 =~ NA*p1f2 + p2f2 + p3f2
## p1f1 ~ 1
## p2f1 ~ 1
## p3f1 ~ 1
## p1f2 ~ 1
## p2f2 ~ 1
## p3f2 ~ 1
## p1f1 ~~ p1f1
## p2f1 ~~ p2f1
## p3f1 ~~ p3f1
## p1f2 ~~ p1f2
## p2f2 ~~ p2f2
## p3f2 ~~ p3f2
## f1 ~~ 1*f1
## f2 ~~ 1*f2
## f1 ~~ f2
## '
##
## ## specify items for each factor
## f1name <- colnames(simParcel)[1:9]
## f2name <- colnames(simParcel)[10:18]
##
## ## run function
## PAVranking(nPerPar = list(c(3,3,3), c(3,3,3)), facPlc = list(f1name,f2name),
## nAlloc = 100, parceloutput = 0, leaveout = 0,
## syntaxA = parmodelA, syntaxB = parmodelB, dataset = simParcel,
## names = list("p1f1","p2f1","p3f1","p1f2","p2f2","p3f2"))
## }
##
# PAVranking <- function(nPerPar, facPlc, nAlloc = 100, parceloutput = 0, syntaxA, syntaxB,
# dataset, names = NULL, leaveout = 0, seed = NA, ...) {
# if (is.null(names))
# names <- matrix(NA, length(nPerPar), 1)
# ## set random seed if specified
# if (is.na(seed) == FALSE)
# set.seed(seed)
# ## allow many tables to be outputted
# options(max.print = 1e+06)
#
# ## Create parceled datasets
# if (is.character(dataset)) dataset <- utils::read.csv(dataset)
# dataset <- as.matrix(dataset)
#
# if (nAlloc < 2)
# stop("Minimum of two allocations required.")
#
# if (is.list(facPlc)) {
# if (is.numeric(facPlc[[1]][1]) == FALSE) {
# facPlcb <- facPlc
# Namesv <- colnames(dataset)
#
# for (i in 1:length(facPlc)) {
# for (j in 1:length(facPlc[[i]])) {
# facPlcb[[i]][j] <- match(facPlc[[i]][j], Namesv)
# }
# facPlcb[[i]] <- as.numeric(facPlcb[[i]])
# }
# facPlc <- facPlcb
# }
#
# # facPlc2 <- rep(0, sum(sapply(facPlc, length)))
# facPlc2 <- rep(0, ncol(dataset))
#
# for (i in 1:length(facPlc)) {
# for (j in 1:length(facPlc[[i]])) {
# facPlc2[facPlc[[i]][j]] <- i
# }
# }
# facPlc <- facPlc2
# }
#
# if (leaveout != 0) {
# if (is.numeric(leaveout) == FALSE) {
# leaveoutb <- rep(0, length(leaveout))
# Namesv <- colnames(dataset)
#
# for (i in 1:length(leaveout)) {
# leaveoutb[i] <- match(leaveout[i], Namesv)
# }
# leaveout <- as.numeric(leaveoutb)
# }
# k1 <- 0.001
# for (i in 1:length(leaveout)) {
# facPlc[leaveout[i]] <- facPlc[leaveout[i]] + k1
# k1 <- k1 + 0.001
# }
# }
#
# if (0 %in% facPlc == TRUE) {
# Zfreq <- sum(facPlc == 0)
# for (i in 1:Zfreq) {
# Zplc <- match(0, facPlc)
# dataset <- dataset[, -Zplc]
# facPlc <- facPlc[-Zplc]
# }
# ## this allows for unused variables in dataset, which are specified by zeros, and
# ## deleted
# }
#
# if (is.list(nPerPar)) {
# nPerPar2 <- c()
# for (i in 1:length(nPerPar)) {
# Onesp <- sum(facPlc > i & facPlc < i + 1)
# nPerPar2 <- c(nPerPar2, nPerPar[i], rep(1, Onesp), recursive = TRUE)
# }
# nPerPar <- nPerPar2
# }
#
# Npp <- c()
# for (i in 1:length(nPerPar)) {
# Npp <- c(Npp, rep(i, nPerPar[i]))
# }
#
# Locate <- sort(round(facPlc))
# Maxv <- max(Locate) - 1
#
# if (length(Locate) != length(Npp))
# stop("Parcels incorrectly specified.\",
# \" Check input!")
#
# if (Maxv > 0) {
# ## Bug was here. With 1 factor Maxv=0. Skip this with a single factor
# for (i in 1:Maxv) {
# Mat <- match(i + 1, Locate)
# if (Npp[Mat] == Npp[Mat - 1])
# stop("Parcels incorrectly specified.\",
# \" Check input!")
# }
# }
# ## warning message if parcel crosses into multiple factors vector, parcel to which
# ## each variable belongs vector, factor to which each variables belongs if
# ## variables are in the same parcel, but different factors error message given in
# ## output
#
# Onevec <- facPlc - round(facPlc)
# NleaveA <- length(Onevec) - sum(Onevec == 0)
# NleaveP <- sum(nPerPar == 1)
#
# if (NleaveA < NleaveP)
# warning("Single-variable parcels have been requested.\",
# \" Check input!")
#
# if (NleaveA > NleaveP)
# warning("More non-parceled variables have been", " requested than provided for in parcel",
# " vector. Check input!")
#
# if (length(names) > 1) {
# if (length(names) != length(nPerPar))
# warning("Number of parcel names provided not equal to number", " of parcels requested")
# }
#
# Data <- c(1:ncol(dataset))
# ## creates a vector of the number of indicators e.g. for three indicators, c(1, 2,
# ## 3)
# Nfactors <- max(facPlc)
# ## scalar, number of factors
# Nindicators <- length(Data)
# ## scalar, number of indicators
# Npar <- length(nPerPar)
# ## scalar, number of parcels
# Rmize <- runif(Nindicators, 1, Nindicators)
# ## create vector of randomly ordered numbers, length of number of indicators
#
# Data <- rbind(facPlc, Rmize, Data)
# ## 'Data' becomes object of three rows, consisting of 1) factor to which each
# ## indicator belongs (in order to preserve indicator/factor assignment during
# ## randomization) 2) randomly order numbers 3) indicator number
#
# Results <- matrix(numeric(0), nAlloc, Nindicators)
# ## create empty matrix for parcel allocation matrix
#
# Pin <- nPerPar[1]
# for (i in 2:length(nPerPar)) {
# Pin <- c(Pin, nPerPar[i] + Pin[i - 1])
# ## creates vector which indicates the range of columns (endpoints) in each parcel
# }
#
# for (i in 1:nAlloc) {
# Data[2, ] <- runif(Nindicators, 1, Nindicators)
# ## Replace second row with newly randomly ordered numbers
#
# Data <- Data[, order(Data[2, ])]
# ## Order the columns according to the values of the second row
#
# Data <- Data[, order(Data[1, ])]
# ## Order the columns according to the values of the first row in order to preserve
# ## factor assignment
#
# Results[i, ] <- Data[3, ]
# ## assign result to allocation matrix
# }
#
# Alpha <- rbind(Results[1, ], dataset)
# ## bind first random allocation to dataset 'Alpha'
#
# Allocations <- list()
# ## create empty list for allocation data matrices
#
# for (i in 1:nAlloc) {
# Ineff <- rep(NA, ncol(Results))
# Ineff2 <- c(1:ncol(Results))
# for (inefficient in 1:ncol(Results)) {
# Ineff[Results[i, inefficient]] <- Ineff2[inefficient]
# }
#
# Alpha[1, ] <- Ineff
# ## replace first row of dataset matrix with row 'i' from allocation matrix
#
# Beta <- Alpha[, order(Alpha[1, ])]
# ## arrangle dataset columns by values of first row assign to temporary matrix
# ## 'Beta'
#
# Temp <- matrix(NA, nrow(dataset), Npar)
# ## create empty matrix for averaged parcel variables
#
# TempAA <- if (length(1:Pin[1]) > 1)
# Beta[2:nrow(Beta), 1:Pin[1]] else cbind(Beta[2:nrow(Beta), 1:Pin[1]], Beta[2:nrow(Beta), 1:Pin[1]])
# Temp[, 1] <- rowMeans(TempAA, na.rm = TRUE)
# ## fill first column with averages from assigned indicators
# for (al in 2:Npar) {
# Plc <- Pin[al - 1] + 1
# ## placeholder variable for determining parcel width
# TempBB <- if (length(Plc:Pin[al]) > 1)
# Beta[2:nrow(Beta), Plc:Pin[al]] else cbind(Beta[2:nrow(Beta), Plc:Pin[al]], Beta[2:nrow(Beta), Plc:Pin[al]])
# Temp[, al] <- rowMeans(TempBB, na.rm = TRUE)
# ## fill remaining columns with averages from assigned indicators
# }
# if (length(names) > 1)
# colnames(Temp) <- names
# Allocations[[i]] <- Temp
# ## assign result to list of parcel datasets
# }
#
# ## Write parceled datasets
# if (as.vector(regexpr("/", parceloutput)) != -1) {
# replist <- matrix(NA, nAlloc, 1)
# for (i in 1:nAlloc) {
# ## if (is.na(names)==TRUE) names <- matrix(NA,nrow(
# colnames(Allocations[[i]]) <- names
# utils::write.table(Allocations[[i]], paste(parceloutput, "/parcelruns", i,
# ".dat", sep = ""),
# row.names = FALSE, col.names = TRUE)
# replist[i, 1] <- paste("parcelruns", i, ".dat", sep = "")
# }
# utils::write.table(replist, paste(parceloutput, "/parcelrunsreplist.dat",
# sep = ""),
# quote = FALSE, row.names = FALSE, col.names = FALSE)
# }
#
#
# ## Model A estimation
#
# {
# Param_A <- list()
# ## list for parameter estimated for each imputation
# Fitind_A <- list()
# ## list for fit indices estimated for each imputation
# Converged_A <- list()
# ## list for whether or not each allocation converged
# ProperSolution_A <- list()
# ## list for whether or not each allocation has proper solutions
# ConvergedProper_A <- list()
# ## list for whether or not each allocation converged and has proper solutions
#
# for (i in 1:nAlloc) {
# data_A <- as.data.frame(Allocations[[i]], row.names = NULL, optional = FALSE)
# ## convert allocation matrix to dataframe for model estimation
# fit_A <- lavaan::sem(syntaxA, data = data_A, ...)
# ## estimate model in lavaan
# if (lavInspect(fit_A, "converged") == TRUE) {
# Converged_A[[i]] <- 1
# } else Converged_A[[i]] <- 0
# ## determine whether or not each allocation converged
# Param_A[[i]] <- lavaan::parameterEstimates(fit_A)[, c("lhs", "op", "rhs",
# "est", "se", "z", "pvalue", "ci.lower", "ci.upper")]
# ## assign allocation parameter estimates to list
# if (lavInspect(fit_A, "post.check") == TRUE & Converged_A[[i]] == 1) {
# ProperSolution_A[[i]] <- 1
# } else ProperSolution_A[[i]] <- 0
# ## determine whether or not each allocation has proper solutions
# if (any(is.na(Param_A[[i]][, 5] == TRUE)))
# ProperSolution_A[[i]] <- 0
# ## make sure each allocation has existing SE
# if (Converged_A[[i]] == 1 & ProperSolution_A[[i]] == 1) {
# ConvergedProper_A[[i]] <- 1
# } else ConvergedProper_A[[i]] <- 0
# ## determine whether or not each allocation converged and has proper solutions
#
# if (ConvergedProper_A[[i]] == 0)
# Param_A[[i]][, 4:9] <- matrix(data = NA, nrow(Param_A[[i]]), 6)
# ## make parameter estimates null for nonconverged, improper solutions
#
# if (ConvergedProper_A[[i]] == 1) {
# Fitind_A[[i]] <- lavaan::fitMeasures(fit_A, c("chisq", "df", "cfi",
# "tli", "rmsea", "srmr", "logl", "bic", "aic"))
# } else Fitind_A[[i]] <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA)
# ### assign allocation parameter estimates to list
#
# }
#
#
# nConverged_A <- Reduce("+", Converged_A)
# ## count number of converged allocations
#
# nProperSolution_A <- Reduce("+", ProperSolution_A)
# ## count number of allocations with proper solutions
#
# nConvergedProper_A <- Reduce("+", ConvergedProper_A)
# ## count number of allocations with proper solutions
#
# if (nConvergedProper_A == 0)
# stop("All allocations failed to converge and/or yielded improper solutions for Model A and/or B.")
# ## stop program if no allocations converge
#
# Parmn_A <- Param_A[[1]]
# ## assign first parameter estimates to mean dataframe
#
# ParSE_A <- matrix(NA, nrow(Parmn_A), nAlloc)
# ParSEmn_A <- Parmn_A[, 5]
#
# Parsd_A <- matrix(NA, nrow(Parmn_A), nAlloc)
# ## assign parameter estimates for S.D. calculation
#
# Fitmn_A <- Fitind_A[[1]]
# ## assign first fit indices to mean dataframe
#
# Fitsd_A <- matrix(NA, length(Fitmn_A), nAlloc)
# ## assign fit indices for S.D. calculation
#
# Sigp_A <- matrix(NA, nrow(Parmn_A), nAlloc)
# ## assign p-values to calculate percentage significant
#
# Fitind_A <- data.frame(Fitind_A)
# ### convert fit index table to data frame
#
# for (i in 1:nAlloc) {
#
# Parsd_A[, i] <- Param_A[[i]][, 4]
# ## assign parameter estimates for S.D. estimation
#
# ParSE_A[, i] <- Param_A[[i]][, 5]
#
# if (i > 1) {
# ParSEmn_A <- rowSums(cbind(ParSEmn_A, Param_A[[i]][, 5]), na.rm = TRUE)
# }
#
# Sigp_A[, ncol(Sigp_A) - i + 1] <- Param_A[[i]][, 7]
# ## assign p-values to calculate percentage significant
#
# Fitsd_A[, i] <- Fitind_A[[i]]
# ## assign fit indices for S.D. estimation
#
# if (i > 1) {
# Parmn_A[, 4:ncol(Parmn_A)] <- rowSums(cbind(Parmn_A[, 4:ncol(Parmn_A)],
# Param_A[[i]][, 4:ncol(Parmn_A)]), na.rm = TRUE)
# }
# ## add together all parameter estimates
#
# if (i > 1)
# Fitmn_A <- rowSums(cbind(Fitmn_A, Fitind_A[[i]]), na.rm = TRUE)
# ## add together all fit indices
#
# }
#
#
# Sigp_A <- Sigp_A + 0.45
# Sigp_A <- apply(Sigp_A, c(1, 2), round)
# Sigp_A <- 1 - as.vector(rowMeans(Sigp_A, na.rm = TRUE))
# ## calculate percentage significant parameters
#
# Parsum_A <- cbind(apply(Parsd_A, 1, mean, na.rm = TRUE),
# apply(Parsd_A, 1, sd, na.rm = TRUE),
# apply(Parsd_A, 1, max, na.rm = TRUE),
# apply(Parsd_A, 1, min, na.rm = TRUE),
# apply(Parsd_A, 1, max, na.rm = TRUE) - apply(Parsd_A, 1, min, na.rm = TRUE), Sigp_A * 100)
# colnames(Parsum_A) <- c("Avg Est.", "S.D.", "MAX", "MIN", "Range", "% Sig")
# ## calculate parameter S.D., minimum, maximum, range, bind to percentage
# ## significant
#
# ParSEmn_A <- Parmn_A[, 1:3]
# ParSEfn_A <- cbind(ParSEmn_A, apply(ParSE_A, 1, mean, na.rm = TRUE),
# apply(ParSE_A, 1, sd, na.rm = TRUE),
# apply(ParSE_A, 1, max, na.rm = TRUE),
# apply(ParSE_A, 1, min, na.rm = TRUE),
# apply(ParSE_A, 1, max, na.rm = TRUE) - apply(ParSE_A, 1, min, na.rm = TRUE))
# colnames(ParSEfn_A) <- c("lhs", "op", "rhs", "Avg SE", "S.D.",
# "MAX", "MIN", "Range")
#
# Fitsum_A <- cbind(apply(Fitsd_A, 1, mean, na.rm = TRUE),
# apply(Fitsd_A, 1, sd, na.rm = TRUE),
# apply(Fitsd_A, 1, max, na.rm = TRUE),
# apply(Fitsd_A, 1, min, na.rm = TRUE),
# apply(Fitsd_A, 1, max, na.rm = TRUE) - apply(Fitsd_A, 1, min, na.rm = TRUE))
# rownames(Fitsum_A) <- c("chisq", "df", "cfi", "tli", "rmsea", "srmr", "logl",
# "bic", "aic")
# ## calculate fit S.D., minimum, maximum, range
#
# Parmn_A[, 4:ncol(Parmn_A)] <- Parmn_A[, 4:ncol(Parmn_A)]/nConvergedProper_A
# ## divide totalled parameter estimates by number converged allocations
# Parmn_A <- Parmn_A[, 1:3]
# ## remove confidence intervals from output
# Parmn_A <- cbind(Parmn_A, Parsum_A)
# ## bind parameter average estimates to cross-allocation information
# Fitmn_A <- Fitmn_A/nConvergedProper_A
# ## divide totalled fit indices by number converged allocations
#
# pChisq_A <- list()
# ## create empty list for Chi-square p-values
# sigChisq_A <- list()
# ## create empty list for Chi-square significance
#
# for (i in 1:nAlloc) {
# pChisq_A[[i]] <- (1 - pchisq(Fitsd_A[1, i], Fitsd_A[2, i]))
# ## calculate p-value for each Chi-square
# if (is.na(pChisq_A[[i]]) == FALSE & pChisq_A[[i]] < 0.05) {
# sigChisq_A[[i]] <- 1
# } else sigChisq_A[[i]] <- 0
# }
# ## count number of allocations with significant chi-square
#
# PerSigChisq_A <- (Reduce("+", sigChisq_A))/nConvergedProper_A * 100
# PerSigChisq_A <- round(PerSigChisq_A, 3)
# ## calculate percent of allocations with significant chi-square
#
# PerSigChisqCol_A <- c(PerSigChisq_A, "n/a", "n/a", "n/a", "n/a", "n/a", "n/a",
# "n/a", "n/a")
# ## create list of Chi-square Percent Significant and 'n/a' (used for fit summary
# ## table)
#
# options(stringsAsFactors = FALSE)
# ## set default option to allow strings into dataframe without converting to factors
#
# Fitsum_A <- data.frame(Fitsum_A, PerSigChisqCol_A)
# colnames(Fitsum_A) <- c("Avg Ind", "S.D.", "MAX", "MIN", "Range", "% Sig")
# ### bind to fit averages
#
# options(stringsAsFactors = TRUE)
# ## unset option to allow strings into dataframe without converting to factors
#
# ParSEfn_A[, 4:8] <- apply(ParSEfn_A[, 4:8], 2, round, digits = 3)
# Parmn_A[, 4:9] <- apply(Parmn_A[, 4:9], 2, round, digits = 3)
# Fitsum_A[, 1:5] <- apply(Fitsum_A[, 1:5], 2, round, digits = 3)
# ## round output to three digits
#
# Fitsum_A[2, 2:5] <- c("n/a", "n/a", "n/a", "n/a")
# ## Change df row to 'n/a' for sd, max, min, and range
#
# Output_A <- list(Parmn_A, ParSEfn_A, Fitsum_A)
# names(Output_A) <- c("Estimates_A", "SE_A", "Fit_A")
# ## output summary for model A
#
# }
#
# ## Model B estimation
#
# {
# Param <- list()
# ## list for parameter estimated for each imputation
# Fitind <- list()
# ## list for fit indices estimated for each imputation
# Converged <- list()
# ## list for whether or not each allocation converged
# ProperSolution <- list()
# ## list for whether or not each allocation has proper solutions
# ConvergedProper <- list()
# ## list for whether or not each allocation is converged and proper
#
# for (i in 1:nAlloc) {
# data <- as.data.frame(Allocations[[i]], row.names = NULL, optional = FALSE)
# ## convert allocation matrix to dataframe for model estimation
# fit <- lavaan::sem(syntaxB, data = data, ...)
# ## estimate model in lavaan
# if (lavInspect(fit, "converged") == TRUE) {
# Converged[[i]] <- 1
# } else Converged[[i]] <- 0
# ## determine whether or not each allocation converged
# Param[[i]] <- lavaan::parameterEstimates(fit)[, c("lhs", "op", "rhs",
# "est", "se", "z", "pvalue", "ci.lower", "ci.upper")]
# ## assign allocation parameter estimates to list
# if (lavInspect(fit, "post.check") == TRUE & Converged[[i]] == 1) {
# ProperSolution[[i]] <- 1
# } else ProperSolution[[i]] <- 0
# ## determine whether or not each allocation has proper solutions
# if (any(is.na(Param[[i]][, 5] == TRUE)))
# ProperSolution[[i]] <- 0
# ## make sure each allocation has existing SE
# if (Converged[[i]] == 1 & ProperSolution[[i]] == 1) {
# ConvergedProper[[i]] <- 1
# } else ConvergedProper[[i]] <- 0
# ## determine whether or not each allocation converged and has proper solutions
#
# if (ConvergedProper[[i]] == 0)
# Param[[i]] <- matrix(data = NA, nrow(Param[[i]]), ncol(Param[[i]]))
# ## make parameter estimates null for nonconverged, improper solutions
#
# if (ConvergedProper[[i]] == 1) {
# Fitind[[i]] <- lavaan::fitMeasures(fit, c("chisq", "df", "cfi", "tli",
# "rmsea", "srmr", "logl", "bic", "aic"))
# } else Fitind[[i]] <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA)
# ### assign allocation parameter estimates to list
#
#
# }
#
#
#
#
# nConverged <- Reduce("+", Converged)
# ## count number of converged allocations
#
# nProperSolution <- Reduce("+", ProperSolution)
# ## count number of allocations with proper solutions
#
# nConvergedProper <- Reduce("+", ConvergedProper)
# ## count number of allocations with proper solutions
#
# if (nConvergedProper == 0)
# stop("All allocations failed to converge", " and/or yielded improper solutions for",
# " Model A and/or B.")
# ## stop program if no allocations converge
#
# Parmn <- Param[[1]]
# ## assign first parameter estimates to mean dataframe
#
# ParSE <- matrix(NA, nrow(Parmn), nAlloc)
# ParSEmn <- Parmn[, 5]
#
# Parsd <- matrix(NA, nrow(Parmn), nAlloc)
# ## assign parameter estimates for S.D. calculation
#
# Fitmn <- Fitind[[1]]
# ## assign first fit indices to mean dataframe
#
# Fitsd <- matrix(NA, length(Fitmn), nAlloc)
# ## assign fit indices for S.D. calculation
#
# Sigp <- matrix(NA, nrow(Parmn), nAlloc)
# ## assign p-values to calculate percentage significant
#
# Fitind <- data.frame(Fitind)
# ### convert fit index table to dataframe
#
#
# for (i in 1:nAlloc) {
#
# Parsd[, i] <- Param[[i]][, 4]
# ## assign parameter estimates for S.D. estimation
#
# ParSE[, i] <- Param[[i]][, 5]
#
# if (i > 1)
# ParSEmn <- rowSums(cbind(ParSEmn, Param[[i]][, 5]), na.rm = TRUE)
#
# Sigp[, ncol(Sigp) - i + 1] <- Param[[i]][, 7]
# ## assign p-values to calculate percentage significant
#
#
# Fitsd[, i] <- Fitind[[i]]
# ## assign fit indices for S.D. estimation
#
# if (i > 1) {
# Parmn[, 4:ncol(Parmn)] <- rowSums(cbind(Parmn[, 4:ncol(Parmn)], Param[[i]][,
# 4:ncol(Parmn)]), na.rm = TRUE)
# }
# ## add together all parameter estimates
#
# if (i > 1)
# Fitmn <- rowSums(cbind(Fitmn, Fitind[[i]]), na.rm = TRUE)
# ## add together all fit indices
#
# }
#
#
# Sigp <- Sigp + 0.45
# Sigp <- apply(Sigp, c(1, 2), round)
# Sigp <- 1 - as.vector(rowMeans(Sigp, na.rm = TRUE))
# ## calculate percentage significant parameters
#
# Parsum <- cbind(apply(Parsd, 1, mean, na.rm = TRUE), apply(Parsd, 1, sd, na.rm = TRUE),
# apply(Parsd, 1, max, na.rm = TRUE), apply(Parsd, 1, min, na.rm = TRUE),
# apply(Parsd, 1, max, na.rm = TRUE) - apply(Parsd, 1, min, na.rm = TRUE),
# Sigp * 100)
# colnames(Parsum) <- c("Avg Est", "S.D.", "MAX", "MIN", "Range", "% Sig")
# ## calculate parameter S.D., minimum, maximum, range, bind to percentage
# ## significant
#
# ParSEmn <- Parmn[, 1:3]
# ParSEfn <- cbind(ParSEmn, apply(ParSE, 1, mean, na.rm = TRUE), apply(ParSE,
# 1, sd, na.rm = TRUE), apply(ParSE, 1, max, na.rm = TRUE), apply(ParSE,
# 1, min, na.rm = TRUE), apply(ParSE, 1, max, na.rm = TRUE) - apply(ParSE,
# 1, min, na.rm = TRUE))
# colnames(ParSEfn) <- c("lhs", "op", "rhs", "Avg SE", "S.D.", "MAX", "MIN",
# "Range")
#
# Fitsum <- cbind(apply(Fitsd, 1, mean, na.rm = TRUE), apply(Fitsd, 1, sd, na.rm = TRUE),
# apply(Fitsd, 1, max, na.rm = TRUE), apply(Fitsd, 1, min, na.rm = TRUE),
# apply(Fitsd, 1, max, na.rm = TRUE) - apply(Fitsd, 1, min, na.rm = TRUE))
# rownames(Fitsum) <- c("chisq", "df", "cfi", "tli", "rmsea", "srmr", "logl",
# "bic", "aic")
# ## calculate fit S.D., minimum, maximum, range
#
# Parmn[, 4:ncol(Parmn)] <- Parmn[, 4:ncol(Parmn)]/nConvergedProper
# ## divide totalled parameter estimates by number converged allocations
# Parmn <- Parmn[, 1:3]
# ## remove confidence intervals from output
# Parmn <- cbind(Parmn, Parsum)
# ## bind parameter average estimates to cross-allocation information
# Fitmn <- as.numeric(Fitmn)
# ## make fit index values numeric
# Fitmn <- Fitmn/nConvergedProper
# ## divide totalled fit indices by number converged allocations
#
# pChisq <- list()
# ## create empty list for Chi-square p-values
# sigChisq <- list()
# ## create empty list for Chi-square significance
#
# for (i in 1:nAlloc) {
#
# pChisq[[i]] <- (1 - pchisq(Fitsd[1, i], Fitsd[2, i]))
# ## calculate p-value for each Chi-square
#
# if (is.na(pChisq[[i]]) == FALSE & pChisq[[i]] < 0.05) {
# sigChisq[[i]] <- 1
# } else sigChisq[[i]] <- 0
# }
# ## count number of allocations with significant chi-square
#
# PerSigChisq <- (Reduce("+", sigChisq))/nConvergedProper * 100
# PerSigChisq <- round(PerSigChisq, 3)
# ## calculate percent of allocations with significant chi-square
#
# PerSigChisqCol <- c(PerSigChisq, "n/a", "n/a", "n/a", "n/a", "n/a", "n/a",
# "n/a", "n/a")
# ## create list of Chi-square Percent Significant and 'n/a' (used for fit summary
# ## table)
#
# options(stringsAsFactors = FALSE)
# ## set default option to allow strings into dataframe without converting to factors
#
# Fitsum <- data.frame(Fitsum, PerSigChisqCol)
# colnames(Fitsum) <- c("Avg Ind", "S.D.", "MAX", "MIN", "Range", "% Sig")
# ### bind to fit averages
#
# options(stringsAsFactors = TRUE)
# ## unset option to allow strings into dataframe without converting to factors
#
# ParSEfn[, 4:8] <- apply(ParSEfn[, 4:8], 2, round, digits = 3)
# Parmn[, 4:9] <- apply(Parmn[, 4:9], 2, round, digits = 3)
# Fitsum[, 1:5] <- apply(Fitsum[, 1:5], 2, round, digits = 3)
# ## round output to three digits
#
# Fitsum[2, 2:5] <- c("n/a", "n/a", "n/a", "n/a")
# ## Change df row to 'n/a' for sd, max, min, and range
#
# Output_B <- list(Parmn, ParSEfn, Fitsum)
# names(Output_B) <- c("Estimates_B", "SE_B", "Fit_B")
# ## output summary for model A
#
# }
#
# ## Model Comparison (everything in this section is new)
#
# {
# Converged_AB <- list()
# ## create list of convergence comparison for each allocation
# ProperSolution_AB <- list()
# ## create list of proper solution comparison for each allocation
# ConvergedProper_AB <- list()
# ## create list of convergence and proper solution comparison for each allocation
# lrtest_AB <- list()
# ## create list for likelihood ratio test for each allocation
# lrchisq_AB <- list()
# ## create list for likelihood ratio chi square value
# lrchisqp_AB <- list()
# ## create list for likelihood ratio test p-value
# lrsig_AB <- list()
# ## create list for likelihood ratio test significance
#
# for (i in 1:nAlloc) {
# if (Converged_A[[i]] == 1 & Converged[[i]] == 1) {
# Converged_AB[[i]] <- 1
# } else Converged_AB[[i]] <- 0
# ## compare convergence
#
# if (ProperSolution_A[[i]] == 1 & ProperSolution[[i]] == 1) {
# ProperSolution_AB[[i]] <- 1
# } else ProperSolution_AB[[i]] <- 0
# ## compare existence of proper solutions
#
# if (ConvergedProper_A[[i]] == 1 & ConvergedProper[[i]] == 1) {
# ConvergedProper_AB[[i]] <- 1
# } else ConvergedProper_AB[[i]] <- 0
# ## compare existence of proper solutions and convergence
#
#
#
# if (ConvergedProper_AB[[i]] == 1) {
#
# data <- as.data.frame(Allocations[[i]], row.names = NULL, optional = FALSE)
# ## convert allocation matrix to dataframe for model estimation
# fit_A <- lavaan::sem(syntaxA, data = data, ...)
# ## estimate model A in lavaan
# fit <- lavaan::sem(syntaxB, data = data, ...)
# ## estimate model B in lavaan
# lrtest_AB[[i]] <- lavaan::lavTestLRT(fit_A, fit)
# ## likelihood ratio test comparing A and B
# lrtestd_AB <- as.data.frame(lrtest_AB[[i]], row.names = NULL, optional = FALSE)
# ## convert lrtest results to dataframe
# lrchisq_AB[[i]] <- lrtestd_AB[2, 5]
# ## write lrtest chisq as single numeric variable
# lrchisqp_AB[[i]] <- lrtestd_AB[2, 7]
# ## write lrtest p-value as single numeric variable
# if (lrchisqp_AB[[i]] < 0.05) {
# lrsig_AB[[i]] <- 1
# } else {
# lrsig_AB[[i]] <- 0
# }
# ## determine statistical significance of lrtest
#
# }
# }
#
# lrchisqp_AB <- unlist(lrchisqp_AB, recursive = TRUE, use.names = TRUE)
# ## convert lrchisqp_AB from list to vector
# lrchisqp_AB <- as.numeric(lrchisqp_AB)
# ## make lrchisqp_AB numeric
# lrsig_AB <- unlist(lrsig_AB, recursive = TRUE, use.names = TRUE)
# ## convert lrsig_AB from list to vector
# lrsig_AB <- as.numeric(lrsig_AB)
# ### make lrsig_AB numeric
#
#
# nConverged_AB <- Reduce("+", Converged_AB)
# ## count number of allocations that converged for both A and B
# nProperSolution_AB <- Reduce("+", ProperSolution_AB)
# ## count number of allocations with proper solutions for both A and B
# nConvergedProper_AB <- Reduce("+", ConvergedProper_AB)
# ## count number of allocations that converged and have proper solutions for both A
# ## and B
# ProConverged_AB <- (nConverged_AB/nAlloc) * 100
# ## calc proportion of allocations that converged for both A and B
# nlrsig_AB <- Reduce("+", lrsig_AB)
# ## count number of allocations with significant lrtest between A and B
# Prolrsig_AB <- (nlrsig_AB/nConvergedProper_AB) * 100
# ## calc proportion of allocations with significant lrtest between A and B
# lrchisq_AB <- unlist(lrchisq_AB, recursive = TRUE, use.names = TRUE)
# ### convert lrchisq_AB from list to vector
# lrchisq_AB <- as.numeric(lrchisq_AB)
# ### make lrchisq_AB numeric
# AvgLRT_AB <- (Reduce("+", lrchisq_AB))/nConvergedProper_AB
# ## calc average LRT
#
# LRTsum <- cbind(AvgLRT_AB, lrtestd_AB[2, 3], sd(lrchisq_AB, na.rm = TRUE),
# max(lrchisq_AB), min(lrchisq_AB),
# max(lrchisq_AB) - min(lrchisq_AB), Prolrsig_AB)
# colnames(LRTsum) <- c("Avg LRT", "df", "S.D.", "MAX", "MIN", "Range", "% Sig")
# ## calculate LRT distribution statistics
#
# FitDiff_AB <- Fitsd_A - Fitsd
# ## compute fit index difference matrix
#
# for (i in 1:nAlloc) {
# if (ConvergedProper_AB[[i]] != 1)
# FitDiff_AB[1:9, i] <- 0
# }
# ### make fit differences zero for each non-converged allocation
#
# BICDiff_AB <- list()
# AICDiff_AB <- list()
# RMSEADiff_AB <- list()
# CFIDiff_AB <- list()
# TLIDiff_AB <- list()
# SRMRDiff_AB <- list()
# BICDiffGT10_AB <- list()
# ## create list noting each allocation in which A is preferred over B
#
# BICDiff_BA <- list()
# AICDiff_BA <- list()
# RMSEADiff_BA <- list()
# CFIDiff_BA <- list()
# TLIDiff_BA <- list()
# SRMRDiff_BA <- list()
# BICDiffGT10_BA <- list()
# ## create list noting each allocation in which B is preferred over A
#
# for (i in 1:nAlloc) {
# if (FitDiff_AB[8, i] < 0) {
# BICDiff_AB[[i]] <- 1
# } else BICDiff_AB[[i]] <- 0
# if (FitDiff_AB[9, i] < 0) {
# AICDiff_AB[[i]] <- 1
# } else AICDiff_AB[[i]] <- 0
# if (FitDiff_AB[5, i] < 0) {
# RMSEADiff_AB[[i]] <- 1
# } else RMSEADiff_AB[[i]] <- 0
# if (FitDiff_AB[3, i] > 0) {
# CFIDiff_AB[[i]] <- 1
# } else CFIDiff_AB[[i]] <- 0
# if (FitDiff_AB[4, i] > 0) {
# TLIDiff_AB[[i]] <- 1
# } else TLIDiff_AB[[i]] <- 0
# if (FitDiff_AB[6, i] < 0) {
# SRMRDiff_AB[[i]] <- 1
# } else SRMRDiff_AB[[i]] <- 0
# if (FitDiff_AB[8, i] < (-10)) {
# BICDiffGT10_AB[[i]] <- 1
# } else BICDiffGT10_AB[[i]] <- 0
# }
# nBIC_AoverB <- Reduce("+", BICDiff_AB)
# nAIC_AoverB <- Reduce("+", AICDiff_AB)
# nRMSEA_AoverB <- Reduce("+", RMSEADiff_AB)
# nCFI_AoverB <- Reduce("+", CFIDiff_AB)
# nTLI_AoverB <- Reduce("+", TLIDiff_AB)
# nSRMR_AoverB <- Reduce("+", SRMRDiff_AB)
# nBICDiffGT10_AoverB <- Reduce("+", BICDiffGT10_AB)
# ## compute number of 'A preferred over B' for each fit index
#
# for (i in 1:nAlloc) {
# if (FitDiff_AB[8, i] > 0) {
# BICDiff_BA[[i]] <- 1
# } else BICDiff_BA[[i]] <- 0
# if (FitDiff_AB[9, i] > 0) {
# AICDiff_BA[[i]] <- 1
# } else AICDiff_BA[[i]] <- 0
# if (FitDiff_AB[5, i] > 0) {
# RMSEADiff_BA[[i]] <- 1
# } else RMSEADiff_BA[[i]] <- 0
# if (FitDiff_AB[3, i] < 0) {
# CFIDiff_BA[[i]] <- 1
# } else CFIDiff_BA[[i]] <- 0
# if (FitDiff_AB[4, i] < 0) {
# TLIDiff_BA[[i]] <- 1
# } else TLIDiff_BA[[i]] <- 0
# if (FitDiff_AB[6, i] > 0) {
# SRMRDiff_BA[[i]] <- 1
# } else SRMRDiff_BA[[i]] <- 0
# if (FitDiff_AB[8, i] > (10)) {
# BICDiffGT10_BA[[i]] <- 1
# } else BICDiffGT10_BA[[i]] <- 0
# }
# nBIC_BoverA <- Reduce("+", BICDiff_BA)
# nAIC_BoverA <- Reduce("+", AICDiff_BA)
# nRMSEA_BoverA <- Reduce("+", RMSEADiff_BA)
# nCFI_BoverA <- Reduce("+", CFIDiff_BA)
# nTLI_BoverA <- Reduce("+", TLIDiff_BA)
# nSRMR_BoverA <- Reduce("+", SRMRDiff_BA)
# nBICDiffGT10_BoverA <- Reduce("+", BICDiffGT10_BA)
# ## compute number of 'B preferred over A' for each fit index
#
# BICDiffAvgtemp <- list()
# AICDiffAvgtemp <- list()
# RMSEADiffAvgtemp <- list()
# CFIDiffAvgtemp <- list()
# TLIDiffAvgtemp <- list()
# SRMRDiffAvgtemp <- list()
# BICgt10DiffAvgtemp <- list()
# ## create empty list for average fit index differences
#
# for (i in 1:nAlloc) {
# if (BICDiff_AB[[i]] != 1) {
# BICDiffAvgtemp[[i]] <- 0
# } else BICDiffAvgtemp[[i]] <- FitDiff_AB[8, i]
# if (AICDiff_AB[[i]] != 1) {
# AICDiffAvgtemp[[i]] <- 0
# } else AICDiffAvgtemp[[i]] <- FitDiff_AB[9, i]
# if (RMSEADiff_AB[[i]] != 1) {
# RMSEADiffAvgtemp[[i]] <- 0
# } else RMSEADiffAvgtemp[[i]] <- FitDiff_AB[5, i]
# if (CFIDiff_AB[[i]] != 1) {
# CFIDiffAvgtemp[[i]] <- 0
# } else CFIDiffAvgtemp[[i]] <- FitDiff_AB[3, i]
# if (TLIDiff_AB[[i]] != 1) {
# TLIDiffAvgtemp[[i]] <- 0
# } else TLIDiffAvgtemp[[i]] <- FitDiff_AB[4, i]
# if (SRMRDiff_AB[[i]] != 1) {
# SRMRDiffAvgtemp[[i]] <- 0
# } else SRMRDiffAvgtemp[[i]] <- FitDiff_AB[6, i]
# if (BICDiffGT10_AB[[i]] != 1) {
# BICgt10DiffAvgtemp[[i]] <- 0
# } else BICgt10DiffAvgtemp[[i]] <- FitDiff_AB[8, i]
# }
# ## make average fit index difference list composed solely of values where A is
# ## preferred over B
#
# BICDiffAvg_AB <- Reduce("+", BICDiffAvgtemp)/nBIC_AoverB * (-1)
# AICDiffAvg_AB <- Reduce("+", AICDiffAvgtemp)/nAIC_AoverB * (-1)
# RMSEADiffAvg_AB <- Reduce("+", RMSEADiffAvgtemp)/nRMSEA_AoverB * (-1)
# CFIDiffAvg_AB <- Reduce("+", CFIDiffAvgtemp)/nCFI_AoverB
# TLIDiffAvg_AB <- Reduce("+", TLIDiffAvgtemp)/nTLI_AoverB
# SRMRDiffAvg_AB <- Reduce("+", SRMRDiffAvgtemp)/nSRMR_AoverB * (-1)
# BICgt10DiffAvg_AB <- Reduce("+", BICgt10DiffAvgtemp)/nBICDiffGT10_AoverB *
# (-1)
# ## calc average fit index difference when A is preferred over B
#
# FitDiffAvg_AoverB <- list(BICDiffAvg_AB, AICDiffAvg_AB, RMSEADiffAvg_AB, CFIDiffAvg_AB,
# TLIDiffAvg_AB, SRMRDiffAvg_AB)
# ## create list of all fit index differences when A is preferred over B
#
# FitDiffAvg_AoverB <- unlist(FitDiffAvg_AoverB, recursive = TRUE, use.names = TRUE)
# ### convert from list to vector
#
# for (i in 1:nAlloc) {
# if (BICDiff_BA[[i]] != 1) {
# BICDiffAvgtemp[[i]] <- 0
# } else BICDiffAvgtemp[[i]] <- FitDiff_AB[8, i]
# if (AICDiff_BA[[i]] != 1) {
# AICDiffAvgtemp[[i]] <- 0
# } else AICDiffAvgtemp[[i]] <- FitDiff_AB[9, i]
# if (RMSEADiff_BA[[i]] != 1) {
# RMSEADiffAvgtemp[[i]] <- 0
# } else RMSEADiffAvgtemp[[i]] <- FitDiff_AB[5, i]
# if (CFIDiff_BA[[i]] != 1) {
# CFIDiffAvgtemp[[i]] <- 0
# } else CFIDiffAvgtemp[[i]] <- FitDiff_AB[3, i]
# if (TLIDiff_BA[[i]] != 1) {
# TLIDiffAvgtemp[[i]] <- 0
# } else TLIDiffAvgtemp[[i]] <- FitDiff_AB[4, i]
# if (SRMRDiff_BA[[i]] != 1) {
# SRMRDiffAvgtemp[[i]] <- 0
# } else SRMRDiffAvgtemp[[i]] <- FitDiff_AB[6, i]
# if (BICDiffGT10_BA[[i]] != 1) {
# BICgt10DiffAvgtemp[[i]] <- 0
# } else BICgt10DiffAvgtemp[[i]] <- FitDiff_AB[8, i]
# }
# ## make average fit index difference list composed solely of values where B is
# ## preferred over A
#
# BICDiffAvg_BA <- Reduce("+", BICDiffAvgtemp)/nBIC_BoverA
# AICDiffAvg_BA <- Reduce("+", AICDiffAvgtemp)/nAIC_BoverA
# RMSEADiffAvg_BA <- Reduce("+", RMSEADiffAvgtemp)/nRMSEA_BoverA
# CFIDiffAvg_BA <- Reduce("+", CFIDiffAvgtemp)/nCFI_BoverA * (-1)
# TLIDiffAvg_BA <- Reduce("+", TLIDiffAvgtemp)/nTLI_BoverA * (-1)
# SRMRDiffAvg_BA <- Reduce("+", SRMRDiffAvgtemp)/nSRMR_BoverA
# BICgt10DiffAvg_BA <- Reduce("+", BICgt10DiffAvgtemp)/nBICDiffGT10_BoverA
# ## calc average fit index difference when B is preferred over A
#
# FitDiffAvg_BoverA <- list(BICDiffAvg_BA, AICDiffAvg_BA, RMSEADiffAvg_BA, CFIDiffAvg_BA,
# TLIDiffAvg_BA, SRMRDiffAvg_BA)
# ## create list of all fit index differences when B is preferred over A
#
# FitDiffAvg_BoverA <- unlist(FitDiffAvg_BoverA, recursive = TRUE, use.names = TRUE)
# ### convert from list to vector
#
# FitDiffBICgt10_AoverB <- nBICDiffGT10_AoverB/nConvergedProper_AB * 100
# ### calculate portion of allocations where A strongly preferred over B
#
# FitDiffBICgt10_BoverA <- nBICDiffGT10_BoverA/nConvergedProper_AB * 100
# ### calculate portion of allocations where B strongly preferred over A
#
# FitDiffBICgt10 <- rbind(FitDiffBICgt10_AoverB, FitDiffBICgt10_BoverA)
# rownames(FitDiffBICgt10) <- c("Very Strong evidence for A>B", "Very Strong evidence for B>A")
# colnames(FitDiffBICgt10) <- "% Allocations"
# ### create table of proportions of 'A strongly preferred over B' and 'B strongly
# ### preferred over A'
#
# FitDiff_AoverB <- list(nBIC_AoverB/nConvergedProper_AB * 100, nAIC_AoverB/nConvergedProper_AB *
# 100, nRMSEA_AoverB/nConvergedProper_AB * 100, nCFI_AoverB/nConvergedProper_AB *
# 100, nTLI_AoverB/nConvergedProper_AB * 100, nSRMR_AoverB/nConvergedProper_AB *
# 100)
# ### create list of all proportions of 'A preferred over B'
# FitDiff_BoverA <- list(nBIC_BoverA/nConvergedProper_AB * 100, nAIC_BoverA/nConvergedProper_AB *
# 100, nRMSEA_BoverA/nConvergedProper_AB * 100, nCFI_BoverA/nConvergedProper_AB *
# 100, nTLI_BoverA/nConvergedProper_AB * 100, nSRMR_BoverA/nConvergedProper_AB *
# 100)
# ### create list of all proportions of 'B preferred over A'
#
# FitDiff_AoverB <- unlist(FitDiff_AoverB, recursive = TRUE, use.names = TRUE)
# ### convert from list to vector
#
# FitDiff_BoverA <- unlist(FitDiff_BoverA, recursive = TRUE, use.names = TRUE)
# ### convert from list to vector
#
# FitDiffSum_AB <- cbind(FitDiff_AoverB, FitDiffAvg_AoverB, FitDiff_BoverA,
# FitDiffAvg_BoverA)
# colnames(FitDiffSum_AB) <- c("% A>B", "Avg Amount A>B", "% B>A", "Avg Amount B>A")
# rownames(FitDiffSum_AB) <- c("bic", "aic", "rmsea", "cfi", "tli", "srmr")
# ## create table showing number of allocations in which A>B and B>A as well as
# ## average difference values
#
# for (i in 1:nAlloc) {
# is.na(FitDiff_AB[1:9, i]) <- ConvergedProper_AB[[i]] != 1
# }
# ### make fit differences missing for each non-converged allocation
#
# LRThistMax <- max(hist(lrchisqp_AB, plot = FALSE)$counts)
# BIChistMax <- max(hist(FitDiff_AB[8, 1:nAlloc], plot = FALSE)$counts)
# AIChistMax <- max(hist(FitDiff_AB[9, 1:nAlloc], plot = FALSE)$counts)
# RMSEAhistMax <- max(hist(FitDiff_AB[5, 1:nAlloc], plot = FALSE)$counts)
# CFIhistMax <- max(hist(FitDiff_AB[3, 1:nAlloc], plot = FALSE)$counts)
# TLIhistMax <- max(hist(FitDiff_AB[4, 1:nAlloc], plot = FALSE)$counts)
# ### calculate y-axis height for each histogram
#
# LRThist <- hist(lrchisqp_AB, ylim = c(0, LRThistMax), xlab = "p-value", main = "LRT p-values")
# ## plot histogram of LRT p-values
#
# BIChist <- hist(FitDiff_AB[8, 1:nAlloc], ylim = c(0, BIChistMax), xlab = "BIC_modA - BIC_modB",
# main = "BIC Diff")
# AIChist <- hist(FitDiff_AB[9, 1:nAlloc], ylim = c(0, AIChistMax), xlab = "AIC_modA - AIC_modB",
# main = "AIC Diff")
# RMSEAhist <- hist(FitDiff_AB[5, 1:nAlloc], ylim = c(0, RMSEAhistMax), xlab = "RMSEA_modA - RMSEA_modB",
# main = "RMSEA Diff")
# CFIhist <- hist(FitDiff_AB[3, 1:nAlloc], ylim = c(0, CFIhistMax), xlab = "CFI_modA - CFI_modB",
# main = "CFI Diff")
# TLIhist <- hist(FitDiff_AB[4, 1:nAlloc], ylim = c(0, TLIhistMax), xlab = "TLI_modA - TLI_modB",
# main = "TLI Diff")
# ### plot histograms for each index_modA - index_modB
# BIChist
# AIChist
# RMSEAhist
# CFIhist
# TLIhist
#
# ConvergedProperSum <- rbind(nConverged_A/nAlloc, nConverged/nAlloc, nConverged_AB/nAlloc,
# nConvergedProper_A/nAlloc, nConvergedProper/nAlloc, nConvergedProper_AB/nAlloc)
# rownames(ConvergedProperSum) <- c("Converged_A", "Converged_B", "Converged_AB",
# "ConvergedProper_A", "ConvergedProper_B", "ConvergedProper_AB")
# colnames(ConvergedProperSum) <- "Proportion of Allocations"
# ### create table summarizing proportions of converged allocations and allocations
# ### with proper solutions
#
# Output_AB <- list(round(LRTsum, 3), "LRT results are interpretable specifically for nested models",
# round(FitDiffSum_AB, 3), round(FitDiffBICgt10, 3), ConvergedProperSum)
# names(Output_AB) <- c("LRT Summary, Model A vs. Model B", "Note:", "Fit Index Differences",
# "Percent of Allocations with |BIC Diff| > 10", "Converged and Proper Solutions Summary")
# ### output for model comparison
#
# }
#
# return(list(Output_A, Output_B, Output_AB))
# ## returns output for model A, model B, and the comparison of these
# }
semTools/R/permuteMeasEq.R 0000644 0001762 0000144 00000214777 15142327465 015213 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 9 February 2026
### permutation randomization test for measurement equivalence and DIF
## -----------------
## Class and Methods
## -----------------
##' Class for the Results of Permutation Randomization Tests of Measurement
##' Equivalence and DIF
##'
##' This class contains the results of tests of Measurement Equivalence and
##' Differential Item Functioning (DIF).
##'
##'
##' @name permuteMeasEq-class
##' @aliases permuteMeasEq-class show,permuteMeasEq-method
##' summary,permuteMeasEq-method hist,permuteMeasEq-method
##' @docType class
##'
##' @slot PT A `data.frame` returned by a call to
##' [lavaan::parTable()] on the constrained model
##' @slot modelType A character indicating the specified `modelType` in the
##' call to `permuteMeasEq`
##' @slot ANOVA A `numeric` vector indicating the results of the observed
##' (\eqn{\Delta})\eqn{\chi^2} test, based on the central \eqn{\chi^2}
##' distribution
##' @slot AFI.obs A vector of observed (changes in) user-selected fit measures
##' @slot AFI.dist The permutation distribution(s) of user-selected fit measures.
##' A `data.frame` with `n.Permutations` rows and one column for each
##' `AFI.obs`.
##' @slot AFI.pval A vector of *p* values (one for each element in slot
##' `AFI.obs`) calculated using slot `AFI.dist`, indicating the
##' probability of observing a change at least as extreme as `AFI.obs`
##' if the null hypothesis were true
##' @slot MI.obs A `data.frame` of observed Lagrange Multipliers
##' (modification indices) associated with the equality constraints or fixed
##' parameters specified in the `param` argument. This is a subset of the
##' output returned by a call to [lavaan::lavTestScore()] on the
##' constrained model.
##' @slot MI.dist The permutation distribution of the maximum modification index
##' (among those seen in slot `MI.obs$X2`) at each permutation of group
##' assignment or of `covariates`
##' @slot extra.obs If `permuteMeasEq` was called with an `extra`
##' function, the output when applied to the original data is concatenated
##' into this vector
##' @slot extra.dist A `data.frame`, each column of which contains the
##' permutation distribution of the corresponding statistic in slot
##' `extra.obs`
##' @slot n.Permutations An `integer` indicating the number of permutations
##' requested by the user
##' @slot n.Converged An `integer` indicating the number of permuation
##' iterations which yielded a converged solution
##' @slot n.nonConverged An `integer` vector of length
##' `n.Permutations` indicating how many times group assignment was
##' randomly permuted (at each iteration) before converging on a solution
##' @slot n.Sparse Only relevant with `ordered` indicators when
##' `modelType == "mgcfa"`. An `integer` vector of length
##' `n.Permutations` indicating how many times group assignment was
##' randomly permuted (at each iteration) before obtaining a sample with all
##' categories observed in all groups.
##' @slot oldSeed An `integer` vector storing the value of
##' `.Random.seed` before running `permuteMeasEq`. Only relevant
##' when using a parallel/multicore option and the original
##' `RNGkind() != "L'Ecuyer-CMRG"`. This enables users to restore their
##' previous `.Random.seed` state, if desired, by running:
##' `.Random.seed[-1] <- permutedResults@oldSeed[-1]`
##' @section Objects from the Class: Objects can be created via the
##' [semTools::permuteMeasEq()] function.
##'
##' @return
##' \itemize{
##' \item The `show` method prints a summary of the multiparameter
##' omnibus test results, using the user-specified AFIs. The parametric
##' (\eqn{\Delta})\eqn{\chi^2} test is also displayed.
##' \item The `summary` method prints the same information from the
##' `show` method, but when `extra = FALSE` (the default) it also
##' provides a table summarizing any requested follow-up tests of DIF using
##' modification indices in slot `MI.obs`. The user can also specify an
##' `alpha` level for flagging modification indices as significant, as
##' well as `nd` (the number of digits displayed). For each modification
##' index, the *p* value is displayed using a central \eqn{\chi^2}
##' distribution with the *df* shown in that column. Additionally, a
##' *p* value is displayed using the permutation distribution of the
##' maximum index, which controls the familywise Type I error rate in a manner
##' similar to Tukey's studentized range test. If any indices are flagged as
##' significant using the `tukey.p.value`, then a message is displayed for
##' each flagged index. The invisibly returned `data.frame` is the
##' displayed table of modification indices, unless
##' [semTools::permuteMeasEq()] was called with `param = NULL`,
##' in which case the invisibly returned object is `object`. If
##' `extra = TRUE`, the permutation-based *p* values for each
##' statistic returned by the `extra` function are displayed and returned
##' in a `data.frame` instead of the modification indices requested in the
##' `param` argument.
##' \item The `hist` method returns a list of `length == 2`,
##' containing the arguments for the call to `hist` and the arguments
##' to the call for `legend`, respectively. This list may facilitate
##' creating a customized histogram of `AFI.dist`, `MI.dist`, or
##' `extra.dist`
##' }
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @seealso [semTools::permuteMeasEq()]
##'
##' @examples
##'
##' # See the example from the permuteMeasEq function
##'
setClass("permuteMeasEq", slots = c(PT = "data.frame",
modelType = "character",
ANOVA = "vector",
AFI.obs = "vector",
AFI.dist = "data.frame",
AFI.pval = "vector",
MI.obs = "data.frame",
MI.dist = "vector",
extra.obs = "vector",
extra.dist = "data.frame",
n.Permutations = "integer",
n.Converged = "integer",
n.nonConverged = "vector",
n.Sparse = "vector",
oldSeed = "integer"))
##' @rdname permuteMeasEq-class
##' @aliases show,permuteMeasEq-method
##' @export
setMethod("show", "permuteMeasEq", function(object) {
## print warning if there are nonConverged permutations
if (object@n.Permutations != object@n.Converged) {
warning(paste("Only", object@n.Converged, "out of",
object@n.Permutations, "models converged within",
max(object@n.nonConverged), "attempts per permutation.\n\n"))
}
## print ANOVA
cat("Omnibus p value based on parametric chi-squared difference test:\n\n")
print(round(object@ANOVA, digits = 3))
## print permutation results
cat("\n\nOmnibus p values based on nonparametric permutation method: \n\n")
AFI <- data.frame(AFI.Difference = object@AFI.obs, p.value = object@AFI.pval)
class(AFI) <- c("lavaan.data.frame","data.frame")
print(AFI, nd = 3)
invisible(object)
})
##' @rdname permuteMeasEq-class
##' @aliases summary,permuteMeasEq-method
##' @export
setMethod("summary", "permuteMeasEq", function(object, alpha = .05, nd = 3,
extra = FALSE) {
## print warning if there are nonConverged permutations
if (object@n.Permutations != object@n.Converged) {
warning(paste("Only", object@n.Converged, "out of",
object@n.Permutations, "models converged within",
max(object@n.nonConverged), "attempts per permutation.\n\n"))
}
## print ANOVA
cat("Omnibus p value based on parametric chi-squared difference test:\n\n")
print(round(object@ANOVA, digits = nd))
## print permutation results
cat("\n\nOmnibus p values based on nonparametric permutation method: \n\n")
AFI <- data.frame(AFI.Difference = object@AFI.obs, p.value = object@AFI.pval)
class(AFI) <- c("lavaan.data.frame","data.frame")
print(AFI, nd = nd)
## print extras or DIF test results, if any were requested
if (extra && length(object@extra.obs)) {
cat("\n\nUnadjusted p values of extra statistics,\n",
"based on permutation distribution of each statistic: \n\n")
MI <- data.frame(Statistic = object@extra.obs)
class(MI) <- c("lavaan.data.frame","data.frame")
MI$p.value <- sapply(names(object@extra.dist), function(nn) {
mean(abs(object@extra.dist[,nn]) >= abs(object@extra.obs[nn]), na.rm = TRUE)
})
MI$flag <- ifelse(MI$p.value < alpha, "* ", "")
print(MI, nd = nd)
} else if (length(object@MI.dist)) {
cat("\n\n Modification indices for equality constrained parameter estimates,\n",
"with unadjusted 'p.value' based on chi-squared distribution and\n",
"adjusted 'tukey.p.value' based on permutation distribution of the\n",
"maximum modification index per iteration: \n\n")
MI <- do.call(paste("summ", object@modelType, sep = "."),
args = list(object = object, alpha = alpha))
print(MI, nd = nd)
## print messages about potential DIF
if (all(MI$tukey.p.value > alpha)) {
cat("\n\n No equality constraints were flagged as significant.\n\n")
return(invisible(MI))
}
if (object@modelType == "mgcfa") {
cat("\n\nThe following equality constraints were flagged as significant:\n\n")
for (i in which(MI$tukey.p.value < alpha)) {
cat("Parameter '", MI$parameter[i], "' may differ between Groups '",
MI$group.lhs[i], "' and '", MI$group.rhs[i], "'.\n", sep = "")
}
cat("\nUse lavTestScore(..., epc = TRUE) on your constrained model to",
"display expected parameter changes for these equality constraints\n\n")
}
} else return(invisible(object))
invisible(MI)
})
summ.mgcfa <- function(object, alpha) {
MI <- object@MI.obs
class(MI) <- c("lavaan.data.frame","data.frame")
PT <- object@PT
eqPar <- rbind(PT[PT$plabel %in% MI$lhs, ], PT[PT$plabel %in% MI$rhs, ])
MI$flag <- ""
MI$parameter <- ""
MI$group.lhs <- ""
MI$group.rhs <- ""
for (i in 1:nrow(MI)) {
par1 <- eqPar$par[ eqPar$plabel == MI$lhs[i] ]
par2 <- eqPar$par[ eqPar$plabel == MI$rhs[i] ]
MI$parameter[i] <- par1
MI$group.lhs[i] <- eqPar$group.label[ eqPar$plabel == MI$lhs[i] ]
MI$group.rhs[i] <- eqPar$group.label[ eqPar$plabel == MI$rhs[i] ]
if (par1 != par2) {
myMessage <- paste0("Constraint '", MI$lhs[i], "==", MI$rhs[i],
"' refers to different parameters: \n'",
MI$lhs[i], "' is '", par1, "' in group '",
MI$group.lhs[i], "'\n'",
MI$rhs[i], "' is '", par2, "' in group '",
MI$group.rhs[i], "'\n")
warning(myMessage)
}
if (MI$tukey.p.value[i] < alpha) MI$flag[i] <- "* -->"
}
MI
}
summ.mimic <- function(object, alpha) {
MI <- object@MI.obs
class(MI) <- c("lavaan.data.frame","data.frame")
MI$flag <- ifelse(MI$tukey.p.value < alpha, "* ", "")
MI
}
##' @rdname permuteMeasEq-class
##' @aliases hist,permuteMeasEq-method
##' @importFrom stats qchisq dchisq quantile
##' @param object,x object of class `permuteMeasEq`
##' @param ... Additional arguments to pass to [graphics::hist()]
##' @param AFI `character` indicating the fit measure whose permutation
##' distribution should be plotted
##' @param alpha alpha level used to draw confidence limits in `hist` and
##' flag significant statistics in `summary` output
##' @param nd number of digits to display
##' @param extra `logical` indicating whether the `summary` output
##' should return permutation-based *p* values for each statistic returned
##' by the `extra` function. If `FALSE` (default), `summary`
##' will return permutation-based *p* values for each modification index.
##' @param printLegend `logical`. If `TRUE` (default), a legend will
##' be printed with the histogram
##' @param legendArgs `list` of arguments passed to the
##' [graphics::legend()] function. The default argument is a list
##' placing the legend at the top-left of the figure.
##' @export
setMethod("hist", "permuteMeasEq", function(x, ..., AFI, alpha = .05, nd = 3,
printLegend = TRUE,
legendArgs = list(x = "topleft")) {
histArgs <- list(...)
histArgs$x <- x@AFI.dist[[AFI]]
if (is.null(histArgs$col)) histArgs$col <- "grey69"
histArgs$freq <- !grepl("chi", AFI)
histArgs$ylab <- if (histArgs$freq) "Frequency" else "Probability Density"
if (printLegend) {
if (is.null(legendArgs$box.lty)) legendArgs$box.lty <- 0
if (nd < length(strsplit(as.character(1 / alpha), "")[[1]]) - 1) {
warning(paste0("The number of digits argument (nd = ", nd ,
") is too low to display your p value at the",
" same precision as your requested alpha level (alpha = ",
alpha, ")"))
}
if (x@AFI.pval[[AFI]] < (1 / 10^nd)) {
pVal <- paste(c("< .", rep(0, nd - 1),"1"), collapse = "")
} else {
pVal <- paste("=", round(x@AFI.pval[[AFI]], nd))
}
}
delta <- length(x@MI.dist) > 0L && x@modelType == "mgcfa"
if (grepl("chi", AFI)) { ####################################### Chi-squared
ChiSq <- x@AFI.obs[AFI]
DF <- x@ANOVA[2]
histArgs$xlim <- range(c(ChiSq, x@AFI.dist[[AFI]], qchisq(c(.01, .99), DF)))
xVals <- seq(histArgs$xlim[1], histArgs$xlim[2], by = .1)
theoDist <- dchisq(xVals, df = DF)
TheoCrit <- round(qchisq(p = alpha, df = DF, lower.tail = FALSE), 2)
Crit <- quantile(histArgs$x, probs = 1 - alpha)
if (ChiSq > histArgs$xlim[2]) histArgs$xlim[2] <- ChiSq
if (delta) {
histArgs$main <- expression(Permutation~Distribution~of~Delta*chi^2)
histArgs$xlab <- expression(Delta*chi^2)
if (printLegend) {
legendArgs$legend <- c(bquote(Theoretical~Delta*chi[Delta*.(paste("df =", DF))]^2 ~ Distribution),
bquote(Critical~chi[alpha~.(paste(" =", alpha))]^2 == .(round(TheoCrit, nd))),
bquote(.(paste("Permuted Critical Value =", round(Crit, nd)))),
bquote(Observed~Delta*chi^2 == .(round(ChiSq, nd))),
expression(paste("")),
bquote(Permuted~italic(p)~.(pVal)))
}
} else {
histArgs$main <- expression(Permutation~Distribution~of~chi^2)
histArgs$xlab <- expression(chi^2)
if (printLegend) {
legendArgs$legend <- c(bquote(Theoretical~chi[.(paste("df =", DF))]^2 ~ Distribution),
bquote(Critical~chi[alpha~.(paste(" =", alpha))]^2 == .(round(TheoCrit, nd))),
bquote(.(paste("Permuted Critical Value =", round(Crit, nd)))),
bquote(Observed~chi^2 == .(round(ChiSq, nd))),
expression(paste("")),
bquote(Permuted~italic(p)~.(pVal)))
}
}
H <- do.call(hist, c(histArgs["x"], plot = FALSE))
histArgs$ylim <- c(0, max(H$density, theoDist))
if (printLegend) {
legendArgs <- c(legendArgs, list(lty = c(2, 2, 1, 1, 0, 0),
lwd = c(2, 2, 2, 3, 0, 0),
col = c("black","black","black","red","","")))
}
} else { ################################################### other AFIs
badness <- grepl(pattern = "fmin|aic|bic|rmr|rmsea|cn|sic|hqc",
x = AFI, ignore.case = TRUE)
if (badness) {
Crit <- quantile(histArgs$x, probs = 1 - alpha)
} else {
Crit <- quantile(histArgs$x, probs = alpha)
}
histArgs$xlim <- range(histArgs$x, x@AFI.obs[AFI])
if (delta) {
histArgs$main <- bquote(~Permutation~Distribution~of~Delta*.(toupper(AFI)))
histArgs$xlab <- bquote(~Delta*.(toupper(AFI)))
if (printLegend) {
legendArgs$legend <- c(bquote(Critical~Delta*.(toupper(AFI))[alpha~.(paste(" =", alpha))] == .(round(Crit, nd))),
bquote(Observed~Delta*.(toupper(AFI)) == .(round(x@AFI.obs[AFI], nd))),
expression(paste("")),
bquote(Permuted~italic(p)~.(pVal)))
}
} else {
histArgs$main <- paste("Permutation Distribution of", toupper(AFI))
histArgs$xlab <- toupper(AFI)
if (printLegend) {
legendArgs$legend <- c(bquote(Critical~.(toupper(AFI))[alpha~.(paste(" =", alpha))] == .(round(Crit, nd))),
bquote(Observed~.(toupper(AFI)) == .(round(x@AFI.obs[AFI], nd))),
expression(paste("")),
bquote(Permuted~italic(p)~.(pVal)))
}
}
if (printLegend) {
legendArgs <- c(legendArgs, list(lty = c(1, 1, 0, 0),
lwd = c(2, 3, 0, 0),
col = c("black","red","","")))
}
}
## print histogram (and optionally, print legend)
suppressWarnings({
do.call(hist, histArgs)
if (grepl("chi", AFI)) {
lines(x = xVals, y = theoDist, lwd = 2, lty = 2)
abline(v = TheoCrit, col = "black", lwd = 2, lty = 2)
}
abline(v = Crit, col = "black", lwd = 2)
abline(v = x@AFI.obs[AFI], col = "red", lwd = 3)
if (printLegend) do.call(legend, legendArgs)
})
## return arguments to create histogram (and optionally, legend)
invisible(list(hist = histArgs, legend = legendArgs))
})
## --------------------
## Constructor Function
## --------------------
##' Permutation Randomization Tests of Measurement Equivalence and Differential
##' Item Functioning (DIF)
##'
##' The function `permuteMeasEq` provides tests of hypotheses involving
##' measurement equivalence, in one of two frameworks: multigroup CFA or MIMIC
##' models.
##'
##'
##' The function `permuteMeasEq` provides tests of hypotheses involving
##' measurement equivalence, in one of two frameworks:
##' \enumerate{
##' \item{1} For multiple-group CFA models, provide a pair of nested lavaan objects,
##' the less constrained of which (`uncon`) freely estimates a set of
##' measurement parameters (e.g., factor loadings, intercepts, or thresholds;
##' specified in `param`) in all groups, and the more constrained of which
##' (`con`) constrains those measurement parameters to equality across
##' groups. Group assignment is repeatedly permuted and the models are fit to
##' each permutation, in order to produce an empirical distribution under the
##' null hypothesis of no group differences, both for (a) changes in
##' user-specified fit measures (see `AFIs` and `moreAFIs`) and for
##' (b) the maximum modification index among the user-specified equality
##' constraints. Configural invariance can also be tested by providing that
##' fitted lavaan object to `con` and leaving `uncon = NULL`, in which
##' case `param` must be `NULL` as well.
##'
##' \item{2} In MIMIC models, one or a set of continuous and/or discrete
##' `covariates` can be permuted, and a constrained model is fit to each
##' permutation in order to provide a distribution of any fit measures (namely,
##' the maximum modification index among fixed parameters in `param`) under
##' the null hypothesis of measurement equivalence across levels of those
##' covariates.
##' }
##'
##' In either framework, modification indices for equality constraints or fixed
##' parameters specified in `param` are calculated from the constrained
##' model (`con`) using the function [lavaan::lavTestScore()].
##'
##' For multiple-group CFA models, the multiparameter omnibus null hypothesis of
##' measurement equivalence/invariance is that there are no group differences in
##' any measurement parameters (of a particular type). This can be tested using
##' the `anova` method on nested `lavaan` objects, or by inspecting
##' the change in alternative fit indices (AFIs) such as the CFI. The
##' permutation randomization method employed by `permuteMeasEq` generates
##' an empirical distribution of any `AFIs` under the null hypothesis, so
##' the user is not restricted to using fixed cutoffs proposed by Cheung &
##' Rensvold (2002), Chen (2007), or Meade, Johnson, & Braddy (2008).
##'
##' If the multiparameter omnibus null hypothesis is rejected, partial
##' invariance can still be established by freeing invalid equality constraints,
##' as long as equality constraints are valid for at least two indicators per
##' factor. Modification indices can be calculated from the constrained model
##' (`con`), but multiple testing leads to inflation of Type I error rates.
##' The permutation randomization method employed by `permuteMeasEq`
##' creates a distribution of the maximum modification index if the null
##' hypothesis is true, which allows the user to control the familywise Type I
##' error rate in a manner similar to Tukey's *q* (studentized range)
##' distribution for the Honestly Significant Difference (HSD) post hoc test.
##'
##' For MIMIC models, DIF can be tested by comparing modification indices of
##' regression paths to the permutation distribution of the maximum modification
##' index, which controls the familywise Type I error rate. The MIMIC approach
##' could also be applied with multiple-group models, but the grouping variable
##' would not be permuted; rather, the covariates would be permuted separately
##' within each group to preserve between-group differences. So whether
##' parameters are constrained or unconstrained across groups, the MIMIC
##' approach is only for testing null hypotheses about the effects of
##' `covariates` on indicators, controlling for common factors.
##'
##' In either framework, [lavaan::lavaan()]'s `group.label`
##' argument is used to preserve the order of groups seen in `con` when
##' permuting the data.
##'
##'
##' @importFrom lavaan lavInspect parTable
##'
##' @param nPermute An integer indicating the number of random permutations used
##' to form empirical distributions under the null hypothesis.
##' @param modelType A character string indicating type of model employed:
##' multiple-group CFA (`"mgcfa"`) or MIMIC (`"mimic"`).
##' @param con The constrained `lavaan` object, in which the parameters
##' specified in `param` are constrained to equality across all groups when
##' `modelType = "mgcfa"`, or which regression paths are fixed to zero when
##' `modelType = "mimic"`. In the case of testing *configural*
##' invariance when `modelType = "mgcfa"`, `con` is the configural
##' model (implicitly, the unconstrained model is the saturated model, so use
##' the defaults `uncon = NULL` and `param = NULL`). When
##' `modelType = "mimic"`, `con` is the MIMIC model in which the
##' covariate predicts the latent construct(s) but no indicators (unless they
##' have already been identified as DIF items).
##' @param uncon Optional. The unconstrained `lavaan` object, in which the
##' parameters specified in `param` are freely estimated in all groups.
##' When `modelType = "mgcfa"`, only in the case of testing
##' *configural* invariance should `uncon = NULL`. When
##' `modelType = "mimic"`, any non-`NULL uncon` is silently set to
##' `NULL`.
##' @param null Optional. A `lavaan` object, in which an alternative null
##' model is fit (besides the default independence model specified by
##' `lavaan`) for the calculation of incremental fit indices. See Widamin &
##' Thompson (2003) for details. If `NULL`, `lavaan`'s default
##' independence model is used.
##' @param param An optional character vector or list of character vectors
##' indicating which parameters the user would test for DIF following a
##' rejection of the omnibus null hypothesis tested using
##' (`more`)`AFIs`. Note that `param` does not guarantee certain
##' parameters *are* constrained in `con`; that is for the user to
##' specify when fitting the model. If users have any "anchor items" that they
##' would never intend to free across groups (or levels of a covariate), these
##' should be excluded from `param`; exceptions to a type of parameter can
##' be specified in `freeParam`. When `modelType = "mgcfa"`,
##' `param` indicates which parameters of interest are constrained across
##' groups in `con` and are unconstrained in `uncon`. Parameter names
##' must match those returned by `names(coef(con))`, but omitting any
##' group-specific suffixes (e.g., `"f1~1"` rather than `"f1~1.g2"`)
##' or user-specified labels (that is, the parameter names must follow the rules
##' of lavaan's [lavaan::model.syntax()]). Alternatively (or
##' additionally), to test all constraints of a certain type (or multiple types)
##' of parameter in `con`, `param` may take any combination of the
##' following values: `"loadings"`, `"intercepts"`,
##' `"thresholds"`, `"residuals"`, `"residual.covariances"`,
##' `"means"`, `"lv.variances"`, and/or `"lv.covariances"`. When
##' `modelType = "mimic"`, `param` must be a vector of individual
##' parameters or a list of character strings to be passed one-at-a-time to
##' `lavaan::lavTestScore(object = con, add = param[i])`,
##' indicating which (sets of) regression paths fixed to zero in `con` that
##' the user would consider freeing (i.e., exclude anchor items). If
##' `modelType = "mimic"` and `param` is a list of character strings,
##' the multivariate test statistic will be saved for each list element instead
##' of 1-*df* modification indices for each individual parameter, and
##' `names(param)` will name the rows of the `MI.obs` slot (see
##' [permuteMeasEq-class]). Set `param = NULL` (default) to avoid
##' collecting modification indices for any follow-up tests.
##' @param freeParam An optional character vector, silently ignored when
##' `modelType = "mimic"`. If `param` includes a type of parameter
##' (e.g., `"loadings"`), `freeParam` indicates exceptions (i.e.,
##' anchor items) that the user would *not* intend to free across groups
##' and should therefore be ignored when calculating *p* values adjusted
##' for the number of follow-up tests. Parameter types that are already
##' unconstrained across groups in the fitted `con` model (i.e., a
##' *partial* invariance model) will automatically be ignored, so they do
##' not need to be specified in `freeParam`. Parameter names must match
##' those returned by `names(coef(con))`, but omitting any group-specific
##' suffixes (e.g., `"f1~1"` rather than `"f1~1.g2"`) or
##' user-specified labels (that is, the parameter names must follow the rules of
##' lavaan [lavaan::model.syntax()]).
##' @param covariates An optional character vector, only applicable when
##' `modelType = "mimic"`. The observed data are partitioned into columns
##' indicated by `covariates`, and the rows are permuted simultaneously for
##' the entire set before being merged with the remaining data. Thus, the
##' covariance structure is preserved among the covariates, which is necessary
##' when (e.g.) multiple dummy codes are used to represent a discrete covariate
##' or when covariates interact. If `covariates = NULL` when
##' `modelType = "mimic"`, the value of `covariates` is inferred by
##' searching `param` for predictors (i.e., variables appearing after the
##' "`~`" operator).
##' @param AFIs A character vector indicating which alternative fit indices (or
##' chi-squared itself) are to be used to test the multiparameter omnibus null
##' hypothesis that the constraints specified in `con` hold in the
##' population. Any fit measures returned by [lavaan::fitMeasures()]
##' may be specified (including constants like `"df"`, which would be
##' nonsensical). If both `AFIs` and `moreAFIs` are `NULL`, only
##' `"chisq"` will be returned.
##' @param moreAFIs Optional. A character vector indicating which (if any)
##' alternative fit indices returned by [semTools::moreFitIndices()]
##' are to be used to test the multiparameter omnibus null hypothesis that the
##' constraints specified in `con` hold in the population.
##' @param maxSparse Only applicable when `modelType = "mgcfa"` and at
##' least one indicator is `ordered`. An integer indicating the maximum
##' number of consecutive times that randomly permuted group assignment can
##' yield a sample in which at least one category (of an `ordered`
##' indicator) is unobserved in at least one group, such that the same set of
##' parameters cannot be estimated in each group. If such a sample occurs, group
##' assignment is randomly permuted again, repeatedly until a sample is obtained
##' with all categories observed in all groups. If `maxSparse` is exceeded,
##' `NA` will be returned for that iteration of the permutation
##' distribution.
##' @param maxNonconv An integer indicating the maximum number of consecutive
##' times that a random permutation can yield a sample for which the model does
##' not converge on a solution. If such a sample occurs, permutation is
##' attempted repeatedly until a sample is obtained for which the model does
##' converge. If `maxNonconv` is exceeded, `NA` will be returned for
##' that iteration of the permutation distribution, and a warning will be
##' printed when using `show` or `summary`.
##' @param showProgress Logical. Indicating whether to display a progress bar
##' while permuting. Silently set to `FALSE` when using parallel options.
##' @param warn Sets the handling of warning messages when fitting model(s) to
##' permuted data sets. See [base::options()].
##' @param datafun An optional function that can be applied to the data
##' (extracted from `con`) after each permutation, but before fitting the
##' model(s) to each permutation. The `datafun` function must have an
##' argument named `data` that accepts a `data.frame`, and it must
##' return a `data.frame` containing the same column names. The column
##' order may differ, the values of those columns may differ (so be careful!),
##' and any additional columns will be ignored when fitting the model, but an
##' error will result if any column names required by the model syntax do not
##' appear in the transformed data set. Although available for any
##' `modelType`, `datafun` may be useful when using the MIMIC method
##' to test for nonuniform DIF (metric/weak invariance) by using product
##' indicators for a latent factor representing the interaction between a factor
##' and one of the `covariates`, in which case the product indicators would
##' need to be recalculated after each permutation of the `covariates`. To
##' access other R objects used within `permuteMeasEq`, the arguments to
##' `datafun` may also contain any subset of the following: `"con"`,
##' `"uncon"`, `"null"`, `"param"`, `"freeParam"`,
##' `"covariates"`, `"AFIs"`, `"moreAFIs"`, `"maxSparse"`,
##' `"maxNonconv"`, and/or `"iseed"`. The values for those arguments
##' will be the same as the values supplied to `permuteMeasEq`.
##' @param extra An optional function that can be applied to any (or all) of the
##' fitted lavaan objects (`con`, `uncon`, and/or `null`). This
##' function will also be applied after fitting the model(s) to each permuted
##' data set. To access the R objects used within `permuteMeasEq`, the
##' arguments to `extra` must be any subset of the following: `"con"`,
##' `"uncon"`, `"null"`, `"param"`, `"freeParam"`,
##' `"covariates"`, `"AFIs"`, `"moreAFIs"`, `"maxSparse"`,
##' `"maxNonconv"`, and/or `"iseed"`. The values for those arguments
##' will be the same as the values supplied to `permuteMeasEq`. The
##' `extra` function must return a named `numeric` vector or a named
##' `list` of scalars (i.e., a `list` of `numeric` vectors of
##' `length == 1`). Any unnamed elements (e.g., `""` or `NULL`)
##' of the returned object will result in an error.
##' @param parallelType The type of parallel operation to be used (if any). The
##' default is `"none"`. Forking is not possible on Windows, so if
##' `"multicore"` is requested on a Windows machine, the request will be
##' changed to `"snow"` with a message.
##' @param ncpus Integer: number of processes to be used in parallel operation.
##' If `NULL` (the default) and `parallelType %in%
##' c("multicore","snow")`, the default is one less than the maximum number of
##' processors detected by [parallel::detectCores()]. This default is
##' also silently set if the user specifies more than the number of processors
##' detected.
##' @param cl An optional \pkg{parallel} or \pkg{snow} cluster for use when
##' `parallelType = "snow"`. If `NULL`, a `"PSOCK"` cluster on
##' the local machine is created for the duration of the `permuteMeasEq`
##' call. If a valid [parallel::makeCluster()] object is supplied,
##' `parallelType` is silently set to `"snow"`, and `ncpus` is
##' silently set to `length(cl)`.
##' @param iseed Integer: Only used to set the states of the RNG when using
##' parallel options, in which case [base::RNGkind()] is set to
##' `"L'Ecuyer-CMRG"` with a message. See
##' [parallel::clusterSetRNGStream()] and Section 6 of
##' `vignette("parallel", "parallel")` for more details. If user supplies
##' an invalid value, `iseed` is silently set to the default (12345). To
##' set the state of the RNG when not using parallel options, call
##' [base::set.seed()] before calling `permuteMeasEq`.
##'
##' @return The [permuteMeasEq-class] object representing the results of
##' testing measurement equivalence (the multiparameter omnibus test) and DIF
##' (modification indices), as well as diagnostics and any `extra` output.
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @seealso [stats::TukeyHSD()], [lavaan::lavTestScore()]
##'
##' @references
##'
##' **Papers about permutation tests of measurement equivalence:**
##'
##' Jorgensen, T. D., Kite, B. A., Chen, P.-Y., & Short, S. D. (2018).
##' Permutation randomization methods for testing measurement equivalence and
##' detecting differential item functioning in multiple-group confirmatory
##' factor analysis. *Psychological Methods, 23*(4), 708--728.
##' \doi{10.1037/met0000152}
##'
##' Kite, B. A., Jorgensen, T. D., & Chen, P.-Y. (2018). Random permutation
##' testing applied to measurement invariance testing with ordered-categorical
##' indicators. *Structural Equation Modeling 25*(4), 573--587.
##' \doi{10.1080/10705511.2017.1421467}
##'
##' Jorgensen, T. D. (2017). Applying permutation tests and multivariate
##' modification indices to configurally invariant models that need
##' respecification. *Frontiers in Psychology, 8*(1455).
##' \doi{10.3389/fpsyg.2017.01455}
##'
##' **Additional reading:**
##'
##' Chen, F. F. (2007). Sensitivity of goodness of fit indexes to
##' lack of measurement invariance. *Structural Equation Modeling, 14*(3),
##' 464--504. \doi{10.1080/10705510701301834}
##'
##' Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes
##' for testing measurement invariance. *Structural Equation Modeling,
##' 9*(2), 233--255. \doi{10.1207/S15328007SEM0902_5}
##'
##' Meade, A. W., Johnson, E. C., & Braddy, P. W. (2008). Power and sensitivity
##' of alternative fit indices in tests of measurement invariance. *Journal
##' of Applied Psychology, 93*(3), 568--592. \doi{10.1037/0021-9010.93.3.568}
##'
##' Widamin, K. F., & Thompson, J. S. (2003). On specifying the null model for
##' incremental fit indices in structural equation modeling. *Psychological
##' Methods, 8*(1), 16--37. \doi{10.1037/1082-989X.8.1.16}
##'
##' @examples
##'
##' \donttest{
##'
##' ########################
##' ## Multiple-Group CFA ##
##' ########################
##'
##' ## create 3-group data in lavaan example(cfa) data
##' HS <- lavaan::HolzingerSwineford1939
##' HS$ageGroup <- ifelse(HS$ageyr < 13, "preteen",
##' ifelse(HS$ageyr > 13, "teen", "thirteen"))
##'
##' ## specify and fit an appropriate null model for incremental fit indices
##' mod.null <- c(paste0("x", 1:9, " ~ c(T", 1:9, ", T", 1:9, ", T", 1:9, ")*1"),
##' paste0("x", 1:9, " ~~ c(L", 1:9, ", L", 1:9, ", L", 1:9, ")*x", 1:9))
##' fit.null <- cfa(mod.null, data = HS, group = "ageGroup")
##'
##' ## fit target model with varying levels of measurement equivalence
##' mod.config <- '
##' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' '
##' fit.config <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup")
##' fit.metric <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup",
##' group.equal = "loadings")
##' fit.scalar <- cfa(mod.config, data = HS, std.lv = TRUE, group = "ageGroup",
##' group.equal = c("loadings","intercepts"))
##'
##'
##' ####################### Permutation Method
##'
##' ## fit indices of interest for multiparameter omnibus test
##' myAFIs <- c("chisq","cfi","rmsea","mfi","aic")
##' moreAFIs <- c("gammaHat","adjGammaHat")
##'
##' ## Use only 20 permutations for a demo. In practice,
##' ## use > 1000 to reduce sampling variability of estimated p values
##'
##' ## test configural invariance
##' set.seed(12345)
##' out.config <- permuteMeasEq(nPermute = 20, con = fit.config)
##' out.config
##'
##' ## test metric equivalence
##' set.seed(12345) # same permutations
##' out.metric <- permuteMeasEq(nPermute = 20, uncon = fit.config, con = fit.metric,
##' param = "loadings", AFIs = myAFIs,
##' moreAFIs = moreAFIs, null = fit.null)
##' summary(out.metric, nd = 4)
##'
##' ## test scalar equivalence
##' set.seed(12345) # same permutations
##' out.scalar <- permuteMeasEq(nPermute = 20, uncon = fit.metric, con = fit.scalar,
##' param = "intercepts", AFIs = myAFIs,
##' moreAFIs = moreAFIs, null = fit.null)
##' summary(out.scalar)
##'
##' ## Not much to see without significant DIF.
##' ## Try using an absurdly high alpha level for illustration.
##' outsum <- summary(out.scalar, alpha = .50)
##'
##' ## notice that the returned object is the table of DIF tests
##' outsum
##'
##' ## visualize permutation distribution
##' hist(out.config, AFI = "chisq")
##' hist(out.metric, AFI = "chisq", nd = 2, alpha = .01,
##' legendArgs = list(x = "topright"))
##' hist(out.scalar, AFI = "cfi", printLegend = FALSE)
##'
##'
##' ####################### Extra Output
##'
##' ## function to calculate expected change of Group-2 and -3 latent means if
##' ## each intercept constraint were released
##' extra <- function(con) {
##' output <- list()
##' output["x1.vis2"] <- lavTestScore(con, release = 19:20, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[70]
##' output["x1.vis3"] <- lavTestScore(con, release = 19:20, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[106]
##' output["x2.vis2"] <- lavTestScore(con, release = 21:22, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[70]
##' output["x2.vis3"] <- lavTestScore(con, release = 21:22, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[106]
##' output["x3.vis2"] <- lavTestScore(con, release = 23:24, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[70]
##' output["x3.vis3"] <- lavTestScore(con, release = 23:24, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[106]
##' output["x4.txt2"] <- lavTestScore(con, release = 25:26, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[71]
##' output["x4.txt3"] <- lavTestScore(con, release = 25:26, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[107]
##' output["x5.txt2"] <- lavTestScore(con, release = 27:28, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[71]
##' output["x5.txt3"] <- lavTestScore(con, release = 27:28, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[107]
##' output["x6.txt2"] <- lavTestScore(con, release = 29:30, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[71]
##' output["x6.txt3"] <- lavTestScore(con, release = 29:30, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[107]
##' output["x7.spd2"] <- lavTestScore(con, release = 31:32, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[72]
##' output["x7.spd3"] <- lavTestScore(con, release = 31:32, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[108]
##' output["x8.spd2"] <- lavTestScore(con, release = 33:34, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[72]
##' output["x8.spd3"] <- lavTestScore(con, release = 33:34, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[108]
##' output["x9.spd2"] <- lavTestScore(con, release = 35:36, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[72]
##' output["x9.spd3"] <- lavTestScore(con, release = 35:36, univariate = FALSE,
##' epc = TRUE, warn = FALSE)$epc$epc[108]
##' output
##' }
##'
##' ## observed EPC
##' extra(fit.scalar)
##'
##' ## permutation results, including extra output
##' set.seed(12345) # same permutations
##' out.scalar <- permuteMeasEq(nPermute = 20, uncon = fit.metric, con = fit.scalar,
##' param = "intercepts", AFIs = myAFIs,
##' moreAFIs = moreAFIs, null = fit.null, extra = extra)
##' ## summarize extra output
##' summary(out.scalar, extra = TRUE)
##'
##'
##' ###########
##' ## MIMIC ##
##' ###########
##'
##' ## Specify Restricted Factor Analysis (RFA) model, equivalent to MIMIC, but
##' ## the factor covaries with the covariate instead of being regressed on it.
##' ## The covariate defines a single-indicator construct, and the
##' ## double-mean-centered products of the indicators define a latent
##' ## interaction between the factor and the covariate.
##' mod.mimic <- '
##' visual =~ x1 + x2 + x3
##' age =~ ageyr
##' age.by.vis =~ x1.ageyr + x2.ageyr + x3.ageyr
##'
##' x1 ~~ x1.ageyr
##' x2 ~~ x2.ageyr
##' x3 ~~ x3.ageyr
##' '
##'
##' HS.orth <- indProd(var1 = paste0("x", 1:3), var2 = "ageyr", match = FALSE,
##' data = HS[ , c("ageyr", paste0("x", 1:3))] )
##' fit.mimic <- cfa(mod.mimic, data = HS.orth, meanstructure = TRUE)
##' summary(fit.mimic, stand = TRUE)
##'
##' ## Whereas MIMIC models specify direct effects of the covariate on an indicator,
##' ## DIF can be tested in RFA models by specifying free loadings of an indicator
##' ## on the covariate's construct (uniform DIF, scalar invariance) and the
##' ## interaction construct (nonuniform DIF, metric invariance).
##' param <- as.list(paste0("age + age.by.vis =~ x", 1:3))
##' names(param) <- paste0("x", 1:3)
##' # param <- as.list(paste0("x", 1:3, " ~ age + age.by.vis")) # equivalent
##'
##' ## test both parameters simultaneously for each indicator
##' do.call(rbind, lapply(param, function(x) lavTestScore(fit.mimic, add = x)$test))
##' ## or test each parameter individually
##' lavTestScore(fit.mimic, add = as.character(param))
##'
##'
##' ####################### Permutation Method
##'
##' ## function to recalculate interaction terms after permuting the covariate
##' datafun <- function(data) {
##' d <- data[, c(paste0("x", 1:3), "ageyr")]
##' indProd(var1 = paste0("x", 1:3), var2 = "ageyr", match = FALSE, data = d)
##' }
##'
##' set.seed(12345)
##' perm.mimic <- permuteMeasEq(nPermute = 20, modelType = "mimic",
##' con = fit.mimic, param = param,
##' covariates = "ageyr", datafun = datafun)
##' summary(perm.mimic)
##'
##' }
##'
##' @export
permuteMeasEq <- function(nPermute, modelType = c("mgcfa","mimic"),
con, uncon = NULL, null = NULL,
param = NULL, freeParam = NULL, covariates = NULL,
AFIs = NULL, moreAFIs = NULL,
maxSparse = 10, maxNonconv = 10, showProgress = TRUE,
warn = -1, datafun, extra,
parallelType = c("none","multicore","snow"),
ncpus = NULL, cl = NULL, iseed = 12345) {
## save arguments from call
availableArgs <- as.list(formals(permuteMeasEq))
argNames <- names(availableArgs)
if (missing(datafun)) argNames <- setdiff(argNames, "datafun")
if (missing(extra)) argNames <- setdiff(argNames, "extra")
for (aa in argNames) {
if (!is.null(eval(as.name(aa))))
suppressWarnings(availableArgs[[aa]] <- eval(as.name(aa)))
}
## check and return them
fullCall <- do.call(checkPermArgs, availableArgs)
## assign them to workspace (also adds old_RNG & oldSeed to workspace)
for (aa in names(fullCall)) assign(aa, fullCall[[aa]])
###################### SAVE OBSERVED RESULTS ##########################
AFI.obs <- do.call(getAFIs, fullCall)
## save modification indices if !is.null(param)
if (is.null(param)) {
MI.obs <- data.frame(NULL)
} else MI.obs <- do.call(getMIs, fullCall)
## anything extra?
if (!missing(extra)) {
extraArgs <- formals(extra)
neededArgs <- intersect(names(extraArgs), names(fullCall))
extraArgs <- do.call(c, lapply(neededArgs, function(nn) fullCall[nn]))
extraOut <- do.call(extra, extraArgs)
## check that extra() returns a named list of scalars
if (!is.list(extraOut)) extraOut <- as.list(extraOut)
wrongFormat <- paste('Function "extra" must return a numeric vector or a',
'list of scalars, with each element named.')
if (!all(sapply(extraOut, is.numeric))) stop(wrongFormat)
if (!all(sapply(extraOut, length) == 1L)) stop(wrongFormat)
if (is.null(names(extraOut)) | any(names(extraOut) == "")) stop(wrongFormat)
extra.obs <- do.call(c, extraOut)
} else extra.obs <- numeric(length = 0L)
######################### PREP DATA ##############################
argList <- fullCall[c("con","uncon","null","param","freeParam","covariates",
"AFIs","moreAFIs","maxSparse","maxNonconv","warn","iseed")]
argList$G <- lavInspect(con, "group")
## check for categorical variables
# catVars <- lavaan::lavNames(con, type = "ov.ord")
# numVars <- lavaan::lavNames(con, type = "ov.num")
# latentVars <- lavaan::lavNames(con, type = "lv.regular")
## assemble data to which the models were fit
if (length(argList$G)) {
dataList <- mapply(FUN = function(x, g, n) {
y <- data.frame(as.data.frame(x), g, stringsAsFactors = FALSE)
names(y) <- c(n, argList$G)
y
}, SIMPLIFY = FALSE,
x = lavInspect(con, "data"), g = lavInspect(con, "group.label"),
n = lavaan::lavNames(con, type = "ov",
group = seq_along(lavInspect(con, "group.label"))))
argList$d <- do.call(rbind, dataList)
} else {
argList$d <- as.data.frame(lavInspect(con, "data"))
names(argList$d) <- lavaan::lavNames(con, type = "ov")
}
## check that covariates are actual variables
if (modelType == "mimic") {
if (length(covariates) && !all(covariates %in% names(argList$d)))
stop('These specified covariates are not columns in the data.frame:\n',
paste(setdiff(covariates, names(argList$d)), collapse = ", "))
}
## anything extra?
if (!missing(extra)) argList$extra <- extra
if (!missing(datafun)) argList$datafun <- datafun
###################### PERMUTED RESULTS ###########################
## permute and return distributions of (delta)AFIs, largest MI, and extras
if (showProgress) {
mypb <- utils::txtProgressBar(min = 1, max = nPermute, initial = 1,
char = "=", width = 50, style = 3, file = "")
permuDist <- list()
for (j in 1:nPermute) {
permuDist[[j]] <- do.call(paste("permuteOnce", modelType, sep = "."),
args = c(argList, i = j))
utils::setTxtProgressBar(mypb, j)
}
close(mypb)
} else if (parallelType == "multicore") {
if (length(iseed)) set.seed(iseed)
argList$FUN <- as.name(paste("permuteOnce", modelType, sep = "."))
argList$X <- 1:nPermute
argList$mc.cores <- ncpus
argList$mc.set.seed <- TRUE
pmcl <- function(...) { parallel::mclapply(...) }
permuDist <- do.call(pmcl, args = argList)
## restore old RNG type
if (fullCall$old_RNG[1] != "L'Ecuyer-CMRG") RNGkind(fullCall$old_RNG[1])
} else if (parallelType == "snow") {
stopTheCluster <- FALSE
if (is.null(cl)) {
stopTheCluster <- TRUE
cl <- parallel::makePSOCKcluster(rep("localhost", ncpus))
}
parallel::clusterSetRNGStream(cl, iseed = iseed)
argList$cl <- cl
argList$X <- 1:nPermute
argList$fun <- paste("permuteOnce", modelType, sep = ".")
parallel::clusterExport(cl, varlist = c(argList$fun, "getAFIs","getMIs")) #FIXME: need update?
tempppl <- function(...) { parallel::parLapply(...) }
permuDist <- do.call(tempppl, args = argList)
if (stopTheCluster) parallel::stopCluster(cl)
## restore old RNG type
if (fullCall$old_RNG[1] != "L'Ecuyer-CMRG") RNGkind(fullCall$old_RNG[1])
} else {
argList$X <- 1:nPermute
argList$FUN <- paste("permuteOnce", modelType, sep = ".")
permuDist <- do.call(lapply, args = argList)
}
## extract AFI distribution
if (length(AFI.obs) > 1) {
AFI.dist <- as.data.frame(t(sapply(permuDist, function(x) x$AFI)))
}
if (length(AFI.obs) == 1L) {
AFI.dist <- data.frame(sapply(permuDist, function(x) x$AFI))
colnames(AFI.dist) <- names(AFI.obs)
}
## identify badness-of-fit measures
badness <- grepl(pattern = "fmin|chi|aic|bic|rmr|rmsea|cn|sic|hqc",
x = names(AFI.obs), ignore.case = TRUE)
## calculate all one-directional p-values
AFI.pval <- mapply(FUN = function(x, y, b) {
if (b) return(mean(x >= y, na.rm = TRUE))
mean(x <= y, na.rm = TRUE)
}, x = unclass(AFI.dist), y = AFI.obs, b = badness)
## extract distribution of maximum modification indices
MI.dist <- as.numeric(unlist(lapply(permuDist, function(x) x$MI)))
## calculate Tukey-adjusted p values for modification indices
if (!is.null(param)) {
MI.obs$tukey.p.value <- sapply(MI.obs$X2,
function(i) mean(i <= MI.dist, na.rm = TRUE))
MI.obs <- as.data.frame(unclass(MI.obs))
rownames(MI.obs) <- names(param)
}
## anything extra?
if (!missing(extra)) {
extra.dist <- do.call(rbind, lapply(permuDist, function(x) x$extra))
} else extra.dist <- data.frame(NULL)
## save parameter table for show/summary methods
PT <- as.data.frame(parTable(con))
PT$par <- paste0(PT$lhs, PT$op, PT$rhs)
if (length(lavInspect(con, "group")))
PT$group.label[PT$group > 0] <- lavInspect(con, "group.label")[PT$group[PT$group > 0] ]
## return observed results, permutation p values, and ANOVA results
if (is.null(uncon)) {
delta <- lavaan::anova(con)
} else {
delta <- lavaan::anova(uncon, con)
}
ANOVA <- sapply(delta[,c("Chisq diff","Df diff","Pr(>Chisq)")], function(x) x[2])
out <- new("permuteMeasEq", PT = PT, modelType = modelType, ANOVA = ANOVA,
AFI.obs = AFI.obs, AFI.dist = AFI.dist, AFI.pval = AFI.pval,
MI.obs = MI.obs, MI.dist = MI.dist,
extra.obs = extra.obs, extra.dist = extra.dist,
n.Permutations = nPermute, n.Converged = sum(!is.na(AFI.dist[,1])),
n.nonConverged = sapply(permuDist, function(x) x$n.nonConverged),
n.Sparse = sapply(permuDist, function(x) x$n.Sparse),
oldSeed = fullCall$oldSeed)
out
}
## ----------------
## Hidden Functions
## ----------------
## function to check validity of arguments to permuteMeasEq()
#' @importFrom lavaan lavInspect parTable
checkPermArgs <- function(nPermute, modelType, con, uncon, null,
param, freeParam, covariates, AFIs, moreAFIs,
maxSparse, maxNonconv, showProgress, warn,
datafun, extra, parallelType, ncpus, cl, iseed) {
fixedCall <- as.list(match.call())[-1]
fixedCall$nPermute <- as.integer(nPermute[1])
fixedCall$modelType <- modelType[1]
if (!fixedCall$modelType %in% c("mgcfa","mimic","long"))
stop('modelType must be one of c("mgcfa","mimic","long")')
if (fixedCall$modelType == "long") stop('modelType "long" is not yet available.')
if (fixedCall$modelType == "mgcfa" && lavInspect(con, "ngroups") == 1L)
stop('modelType = "mgcfa" applies only to multigroup models.')
if (fixedCall$modelType == "mimic") {
uncon <- NULL
fixedCall$uncon <- NULL
fixedCall <- c(fixedCall, list(uncon = NULL))
}
## strip white space
if (is.list(param)) {
fixedCall$param <- lapply(param, function(cc) gsub("[[:space:]]+", "", cc))
} else if (!is.null(param)) fixedCall$param <- gsub("[[:space:]]+", "", param)
if (!is.null(freeParam)) fixedCall$freeParam <- gsub("[[:space:]]+", "", freeParam)
if (fixedCall$modelType == "mimic") {
# PT <- lavaan::lavaanify(fixedCall$param)
# checkCovs <- unique(PT$rhs[PT$op == "~"])
# if (is.null(covariates)) covariates <- checkCovs
# if (length(setdiff(covariates, checkCovs)))
# warning('Argument "covariates" includes predictors not in argument "param"')
##### ordVars <- lavaan::lavNames(con, type = "ov.ord")
fixedCall$covariates <- as.character(covariates)
}
fixedCall$maxSparse <- as.integer(maxSparse[1])
fixedCall$maxNonconv <- as.integer(maxNonconv[1])
fixedCall$showProgress <- as.logical(showProgress[1])
fixedCall$warn <- as.integer(warn[1])
fixedCall$oldSeed <- as.integer(NULL)
parallelType <- as.character(parallelType[1])
if (!parallelType %in% c("none","multicore","snow")) parallelType <- "none"
if (!is.null(cl)) {
if (!is(cl, "cluster")) stop("Invalid cluster object. Check class(cl)")
parallelType <- "snow"
ncpus <- length(cl)
}
if (parallelType == "multicore" && .Platform$OS.type == "windows") {
parallelType <- "snow"
message("'multicore' option unavailable on Windows. Using 'snow' instead.")
}
## parallel settings, adapted from boot::boot()
if (parallelType != "none") {
if (is.null(ncpus) || ncpus > parallel::detectCores()) {
ncpus <- parallel::detectCores() - 1
}
if (ncpus <= 1L) {
parallelType <- "none"
} else {
fixedCall$showProgress <- FALSE
fixedCall$old_RNG <- RNGkind()
fixedCall$oldSeed <- .Random.seed
if (fixedCall$old_RNG[1] != "L'Ecuyer-CMRG") {
RNGkind("L'Ecuyer-CMRG")
message("Your RNGkind() was changed from ", fixedCall$old_RNG[1],
" to L'Ecuyer-CMRG, which is required for reproducibility ",
" in parallel jobs. Your RNGkind() has been returned to ",
fixedCall$old_RNG[1], " but the seed has not been set. ",
" The state of your previous RNG is saved in the slot ",
" named 'oldSeed', if you want to restore it using ",
" the syntax:\n",
".Random.seed[-1] <- permuteMeasEqObject@oldSeed[-1]")
}
fixedCall$iseed <- as.integer(iseed[1])
if (is.na(fixedCall$iseed)) fixedCall$iseed <- 12345
}
}
fixedCall$parallelType <- parallelType
if (is.null(ncpus)) {
fixedCall$ncpus <- NULL
fixedCall <- c(fixedCall, list(ncpus = NULL))
} else fixedCall$ncpus <- ncpus
## Check that "param" is NULL if uncon is NULL, and check for lavaan class.
## Also check that models are fitted to raw data, not summary stats.
notLavaan <- "Non-NULL 'con', 'uncon', or 'null' must be fitted lavaan object."
notRawData <- "lavaan models ('con', 'uncon', or 'null') must be fitted to raw data=, not summary statistics (e.g., sample.cov=)"
if (is.null(uncon)) {
if (!is.null(fixedCall$param) && fixedCall$modelType == "mgcfa") {
message(c(" When 'uncon = NULL', only configural invariance is tested.",
"\n So the 'param' argument was changed to NULL."))
fixedCall$param <- NULL
fixedCall <- c(fixedCall, list(param = NULL))
}
if (!inherits(con, "lavaan")) stop(notLavaan)
stopifnot(con@Data@data.type == "full")
} else {
if (!inherits(con, "lavaan")) stop(notLavaan)
if (!inherits(uncon, "lavaan")) stop(notLavaan)
stopifnot( con@Data@data.type == "full")
stopifnot(uncon@Data@data.type == "full")
}
if (!is.null(null)) {
if (!inherits(null, "lavaan")) stop(notLavaan)
stopifnot(null@Data@data.type == "full")
}
############ FIXME: check that lavInspect(con, "options")$conditional.x = FALSE (find defaults for continuous/ordered indicators)
if (!is.null(fixedCall$param)) {
## Temporarily warn about testing thresholds without necessary constraints. FIXME: check for binary indicators
if ("thresholds" %in% fixedCall$param | any(grepl("\\|", fixedCall$param))) {
warning(c("This function is not yet optimized for testing thresholds.\n",
"Necessary identification contraints might not be specified."))
}
## collect parameter types for "mgcfa"
if (fixedCall$modelType != "mimic") {
## save all estimates from constrained model
PT <- parTable(con)[ , c("lhs","op","rhs","group","plabel")]
## extract parameters of interest
paramTypes <- c("loadings","intercepts","thresholds","residuals","means",
"residual.covariances","lv.variances","lv.covariances")
params <- PT[paste0(PT$lhs, PT$op, PT$rhs) %in% setdiff(fixedCall$param,
paramTypes), ]
## add parameters by type, if any are specified
types <- intersect(fixedCall$param, paramTypes)
ov.names <- lavaan::lavNames(con, "ov")
isOV <- PT$lhs %in% ov.names
lv.names <- con@pta$vnames$lv[[1]]
isLV <- PT$lhs %in% lv.names & PT$rhs %in% lv.names
if ("loadings" %in% types) params <- rbind(params, PT[PT$op == "=~", ])
if ("intercepts" %in% types) {
params <- rbind(params, PT[isOV & PT$op == "~1", ])
}
if ("thresholds" %in% types) params <- rbind(params, PT[PT$op == "|", ])
if ("residuals" %in% types) {
params <- rbind(params, PT[isOV & PT$lhs == PT$rhs & PT$op == "~~", ])
}
if ("residual.covariances" %in% types) {
params <- rbind(params, PT[isOV & PT$lhs != PT$rhs & PT$op == "~~", ])
}
if ("means" %in% types) {
params <- rbind(params, PT[PT$lhs %in% lv.names & PT$op == "~1", ])
}
if ("lv.variances" %in% types) {
params <- rbind(params, PT[isLV & PT$lhs == PT$rhs & PT$op == "~~", ])
}
if ("lv.covariances" %in% types) {
params <- rbind(params, PT[isLV & PT$lhs != PT$rhs & PT$op == "~~", ])
}
## remove parameters specified by "freeParam" argument
params <- params[!paste0(params$lhs, params$op, params$rhs) %in% fixedCall$freeParam, ]
fixedCall$param <- paste0(params$lhs, params$op, params$rhs)
}
}
if (is.null(AFIs) & is.null(moreAFIs)) {
message("No AFIs were selected, so only chi-squared will be permuted.\n")
fixedCall$AFIs <- "chisq"
AFIs <- "chisq"
}
if ("ecvi" %in% AFIs & lavInspect(con, "ngroups") > 1L)
stop("ECVI is not available for multigroup models.")
## check estimators
leastSq <- grepl("LS", lavInspect(con, "options")$estimator)
if (!is.null(uncon)) {
if (uncon@Options$estimator != lavInspect(con, "options")$estimator)
stop("Models must be fit using same estimator.")
}
if (!is.null(null)) {
if (lavInspect(null, "options")$estimator != lavInspect(con, "options")$estimator)
stop("Models must be fit using same estimator.")
}
## check extra functions, if any
restrictedArgs <- c("con","uncon","null","param","freeParam","covariates",
"AFIs","moreAFIs","maxSparse","maxNonconv","iseed")
if (!missing(datafun)) {
if (!is.function(datafun)) stop('Argument "datafun" must be a function.')
extraArgs <- formals(datafun)
if (!all(names(extraArgs) %in% c(restrictedArgs, "data")))
stop('The user-supplied function "datafun" can only have any among the ',
'following arguments:\n', paste(restrictedArgs, collapse = ", "))
}
if (!missing(extra)) {
if (!is.function(extra)) stop('Argument "extra" must be a function.')
extraArgs <- formals(extra)
if (!all(names(extraArgs) %in% restrictedArgs))
stop('The user-supplied function "extra" can only have any among the ',
'following arguments:\n', paste(restrictedArgs, collapse = ", "))
}
## return evaluated list of other arguments
lapply(fixedCall, eval)
}
## function to extract fit measures
#' @importFrom lavaan lavInspect
getAFIs <- function(...) {
dots <- list(...)
AFI1 <- list()
AFI0 <- list()
leastSq <- grepl("LS", lavInspect(dots$con, "options")$estimator)
## check validity of user-specified AFIs, save output
if (!is.null(dots$AFIs)) {
IC <- grep("ic|logl", dots$AFIs, value = TRUE)
if (leastSq & length(IC)) {
stop(paste("Argument 'AFIs' includes invalid options:",
paste(IC, collapse = ", "),
"Information criteria unavailable for least-squares estimators.",
sep = "\n"))
}
if (!is.null(dots$uncon))
AFI1[[1]] <- lavaan::fitMeasures(dots$uncon, fit.measures = dots$AFIs,
baseline.model = dots$null)
AFI0[[1]] <- lavaan::fitMeasures(dots$con, fit.measures = dots$AFIs,
baseline.model = dots$null)
}
## check validity of user-specified moreAFIs
if (!is.null(dots$moreAFIs)) {
IC <- grep("ic|hqc", dots$moreAFIs, value = TRUE)
if (leastSq & length(IC)) {
stop(paste("Argument 'moreAFIs' includes invalid options:",
paste(IC, collapse = ", "),
"Information criteria unavailable for least-squares estimators.",
sep = "\n"))
}
if (!is.null(dots$uncon))
AFI1[[2]] <- moreFitIndices(dots$uncon, fit.measures = dots$moreAFIs)
AFI0[[2]] <- moreFitIndices(dots$con, fit.measures = dots$moreAFIs)
}
## save observed AFIs or delta-AFIs
if (is.null(dots$uncon)) {
AFI.obs <- unlist(AFI0)
} else {
AFI.obs <- unlist(AFI0) - unlist(AFI1)
}
AFI.obs
}
## Function to extract modification indices for equality constraints
#' @importFrom lavaan parTable
getMIs <- function(...) {
dots <- list(...)
if (dots$modelType == "mgcfa") {
## save all estimates from constrained model
PT <- parTable(dots$con)[ , c("lhs","op","rhs","group","plabel")]
## extract parameters of interest
params <- PT[paste0(PT$lhs, PT$op, PT$rhs) %in% dots$param, ]
## return modification indices for specified constraints (param)
MIs <- lavaan::lavTestScore(dots$con)$uni
MI.obs <- MIs[MIs$lhs %in% params$plabel, ]
} else if (dots$modelType == "mimic") {
if (is.list(dots$param)) {
MI <- lapply(dots$param, function(x) lavaan::lavTestScore(dots$con, add = x)$test)
MI.obs <- do.call(rbind, MI)
} else MI.obs <- lavaan::lavTestScore(dots$con, add = dots$param)$uni
} else if (dots$modelType == "long") {
## coming soon
}
MI.obs
}
## Functions to find delta-AFIs & maximum modification index in one permutation
permuteOnce.mgcfa <- function(i, d, G, con, uncon, null, param, freeParam,
covariates, AFIs, moreAFIs, maxSparse, maxNonconv,
iseed, warn, extra = NULL, datafun = NULL) {
old_warn <- options()$warn
options(warn = warn)
## save arguments from call
argNames <- names(formals(permuteOnce.mgcfa))
availableArgs <- lapply(argNames, function(x) eval(as.name(x)))
names(availableArgs) <- argNames
group.label <- lavaan::lavInspect(con, "group.label")
nSparse <- 0L
nTries <- 1L
while ( (nSparse <= maxSparse) & (nTries <= maxNonconv) ) {
## permute grouping variable
d[ , G] <- sample(d[ , G])
## transform data?
if (!is.null(datafun)) {
extraArgs <- formals(datafun)
neededArgs <- intersect(names(extraArgs), names(availableArgs))
extraArgs <- do.call(c, lapply(neededArgs, function(nn) availableArgs[nn]))
extraArgs$data <- d
originalNames <- colnames(d)
d <- do.call(datafun, extraArgs)
## coerce extraOut to data.frame
if (!is.data.frame(d)) stop('Argument "datafun" did not return a data.frame')
if (!all(originalNames %in% colnames(d)))
stop('The data.frame returned by argument "datafun" did not contain ',
'column names required by the model:\n',
paste(setdiff(originalNames, colnames(d)), collapse = ", "))
}
## for ordered indicators, check that groups have same observed categories
ordVars <- lavaan::lavNames(con, type = "ov.ord")
if (length(ordVars) > 0) {
try(onewayTables <- lavaan::lavTables(d, dimension = 1L,
categorical = ordVars, group = G),
silent = TRUE)
if (exists("onewayTables")) {
if (any(onewayTables$obs.prop == 1)) {
nSparse <- nSparse + 1L
next
}
} else {
## no "onewayTables" probably indicates empty categories in 1+ groups
nSparse <- nSparse + 1L
next
}
}
## fit null model, if it exists
if (!is.null(null)) {
out.null <- lavaan::update(null, data = d, group.label = group.label)
}
## fit constrained model, check for convergence
try(out0 <- lavaan::update(con, data = d, group.label = group.label))
if (!exists("out0")) {
nTries <- nTries + 1L
next
}
if (!lavaan::lavInspect(out0, "converged")) {
nTries <- nTries + 1L
next
}
## fit unconstrained model (unless NULL), check for convergence
if (!is.null(uncon)) {
try(out1 <- lavaan::update(uncon, data = d, group.label = group.label))
if (!exists("out1")) {
nTries <- nTries + 1L
next
}
if (!lavaan::lavInspect(out1, "converged")) {
nTries <- nTries + 1L
next
}
}
## If you get this far, everything converged, so break WHILE loop
break
}
## if WHILE loop ended before getting results, return NA
if ( (nSparse == maxSparse) | (nTries == maxNonconv) ) {
allAFIs <- c(AFIs, moreAFIs)
AFI <- rep(NA, sum(!is.na(allAFIs)))
names(AFI) <- allAFIs[!is.na(allAFIs)]
MI <- if (is.null(param)) NULL else NA
extra.obs <- NA
nTries <- nTries + 1L
} else {
availableArgs$con <- out0
if (exists("out1")) availableArgs$uncon <- out1
if (exists("out.null")) availableArgs$null <- out.null
AFI <- do.call(getAFIs, availableArgs)
## save max(MI) if !is.null(param)
if (is.null(param)) {
MI <- NULL
} else {
MI <- max(do.call(getMIs, c(availableArgs, modelType = "mgcfa"))$X2)
}
## anything extra?
if (!is.null(extra)) {
extraArgs <- formals(extra)
neededArgs <- intersect(names(extraArgs), names(availableArgs))
extraArgs <- do.call(c, lapply(neededArgs, function(nn) availableArgs[nn]))
extraOut <- do.call(extra, extraArgs)
## coerce extraOut to data.frame
if (!is.list(extraOut)) extraOut <- as.list(extraOut)
extra.obs <- data.frame(extraOut)
} else extra.obs <- data.frame(NULL)
}
options(warn = old_warn)
list(AFI = AFI, MI = MI, extra = extra.obs,
n.nonConverged = nTries - 1L, n.Sparse = nSparse)
}
permuteOnce.mimic <- function(i, d, G, con, uncon, null, param, freeParam,
covariates, AFIs, moreAFIs, maxSparse, maxNonconv,
iseed, warn, extra = NULL, datafun = NULL) {
old_warn <- options()$warn
options(warn = warn)
## save arguments from call
argNames <- names(formals(permuteOnce.mimic))
availableArgs <- lapply(argNames, function(x) eval(as.name(x)))
names(availableArgs) <- argNames
group.label <- lavaan::lavInspect(con, "group.label")
nTries <- 1L
while (nTries <= maxNonconv) {
## permute covariate(s) within each group
if (length(G)) {
for (gg in group.label) {
dG <- d[ d[[G]] == gg, ]
N <- nrow(dG)
newd <- dG[sample(1:N, N), covariates, drop = FALSE]
for (COV in covariates) d[d[[G]] == gg, COV] <- newd[ , COV]
}
} else {
N <- nrow(d)
newd <- d[sample(1:N, N), covariates, drop = FALSE]
for (COV in covariates) d[ , COV] <- newd[ , COV]
}
## transform data?
if (!is.null(datafun)) {
extraArgs <- formals(datafun)
neededArgs <- intersect(names(extraArgs), names(availableArgs))
extraArgs <- do.call(c, lapply(neededArgs, function(nn) availableArgs[nn]))
extraArgs$data <- d
originalNames <- colnames(d)
d <- do.call(datafun, extraArgs)
## coerce extraOut to data.frame
if (!is.data.frame(d)) stop('Argument "datafun" did not return a data.frame')
if (!all(originalNames %in% colnames(d)))
stop('The data.frame returned by argument "datafun" did not contain ',
'column names required by the model:\n',
paste(setdiff(originalNames, colnames(d)), collapse = ", "))
}
## fit null model, if it exists
if (!is.null(null)) {
out.null <- lavaan::update(null, data = d, group.label = group.label)
}
## fit constrained model
try(out0 <- lavaan::update(con, data = d, group.label = group.label))
## check for convergence
if (!exists("out0")) {
nTries <- nTries + 1L
next
}
if (!lavaan::lavInspect(out0, "converged")) {
nTries <- nTries + 1L
next
}
## If you get this far, everything converged, so break WHILE loop
break
}
## if WHILE loop ended before getting results, return NA
if (nTries == maxNonconv) {
allAFIs <- c(AFIs, moreAFIs)
AFI <- rep(NA, sum(!is.na(allAFIs)))
names(AFI) <- allAFIs[!is.na(allAFIs)]
MI <- if (is.null(param)) NULL else NA
extra.obs <- NA
nTries <- nTries + 1L
} else {
availableArgs$con <- out0
if (exists("out.null")) availableArgs$null <- out.null
AFI <- do.call(getAFIs, availableArgs)
if (is.null(param)) {
MI <- NULL
} else {
MI <- max(do.call(getMIs, c(availableArgs, modelType = "mimic"))$X2)
}
## anything extra?
if (!is.null(extra)) {
extraArgs <- formals(extra)
neededArgs <- intersect(names(extraArgs), names(availableArgs))
extraArgs <- do.call(c, lapply(neededArgs, function(nn) availableArgs[nn]))
extraOut <- do.call(extra, extraArgs)
## coerce extraOut to data.frame
if (!is.list(extraOut)) extraOut <- as.list(extraOut)
extra.obs <- data.frame(extraOut)
} else extra.obs <- data.frame(NULL)
}
options(warn = old_warn)
list(AFI = AFI, MI = MI, extra = extra.obs,
n.nonConverged = nTries - 1L, n.Sparse = integer(length = 0))
}
semTools/R/residualCovariate.R 0000644 0001762 0000144 00000003576 14632143377 016076 0 ustar ligges users ### Sunthud Pornprasertmanit
### Last updated: 10 January 2021
##' Residual-center all target indicators by covariates
##'
##' This function will regress target variables on the covariate and replace the
##' target variables by the residual of the regression analysis. This procedure
##' is useful to control the covariate from the analysis model (Geldhof,
##' Pornprasertmanit, Schoemann, & Little, 2013).
##'
##'
##' @importFrom stats lm
##'
##' @param data The desired data to be transformed.
##' @param targetVar Varible names or the position of indicators that users wish
##' to be residual centered (as dependent variables)
##' @param covVar Covariate names or the position of the covariates using for
##' residual centering (as independent variables) onto target variables
##'
##' @return The data that the target variables replaced by the residuals
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##' @seealso [indProd()] For creating the indicator products with no
##' centering, mean centering, double-mean centering, or residual centering.
##'
##' @references Geldhof, G. J., Pornprasertmanit, S., Schoemann, A. M., &
##' Little, T. D. (2013). Orthogonalizing through residual centering:
##' Extended applications and caveats. *Educational and Psychological
##' Measurement, 73*(1), 27--46. \doi{10.1177/0013164412445473}
##'
##' @examples
##'
##' dat <- residualCovariate(attitude, 2:7, 1)
##'
##' @export
residualCovariate <- function(data, targetVar, covVar) {
x <- as.list(match.call())
cov <- eval(x$covVar)
target <- eval(x$targetVar)
if (all(is.numeric(cov))) cov <- colnames(data)[cov]
if (all(is.numeric(target))) target <- colnames(data)[target]
express <- paste("cbind(", paste(target, collapse = ", "), ") ~ ",
paste(cov, collapse = " + "), sep = "")
data[, target] <- lm(express, data = data)$residuals
return(data)
}
semTools/R/powerAnalysisSS.R 0000644 0001762 0000144 00000020014 14632143377 015520 0 ustar ligges users ### Alexander M. Schoemann & Terrence D. Jorgensen
### Last updated: 2 June 2022
### Function to apply Satorra & Saris method for chi-squared power analysis
## Steps:
## 1. Specify model (use lavaan syntax based on simulateData)
## 2. get model implied covariance matrix
## 3. Fit model with parameter constrained to 0 (or take a model specification for multiparameter tests?)
## 4. Use chi square from step 3 as non-centrality parameter to get power.
## Alternatively, combine steps 1 and 2 by providing population moments directly
##' Power for model parameters
##'
##' Apply Satorra & Saris (1985) method for chi-squared power analysis.
##'
##' Specify all non-zero parameters in a population model, either by using
##' lavaan syntax (`popModel`) or by submitting a population covariance
##' matrix (`Sigma`) and optional mean vector (`mu`) implied by the
##' population model. Then specify an analysis model that places at least
##' one invalid constraint (note the number in the `nparam` argument).
##'
##' There is also a Shiny app called "power4SEM" that provides a graphical user
##' interface for this functionality (Jak et al., in press). It can be accessed
##' at .
##'
##'
##' @importFrom stats qchisq pchisq
##'
##' @param powerModel lavaan [lavaan::model.syntax()] for the model to
##' be analyzed. This syntax should constrain at least one nonzero parameter
##' to 0 (or another number).
##' @param n `integer`. Sample size used in power calculation, or a vector
##' of sample sizes if analyzing a multigroup model. If
##' `length(n) < length(Sigma)` when `Sigma` is a list, `n` will
##' be recycled. If `popModel` is used instead of `Sigma`, `n`
##' must specify a sample size for each group, because that is used to infer
##' the number of groups.
##' @param nparam `integer`. Number of invalid constraints in `powerModel`.
##' @param popModel lavaan [lavaan::model.syntax()] specifying the
##' data-generating model. This syntax should specify values for all nonzero
##' parameters in the model. If `length(n) > 1`, the same population
##' values will be used for each group, unless different population values are
##' specified per group, either in the lavaan [lavaan::model.syntax()]
##' or by utilizing a list of `Sigma` (and optionally `mu`).
##' @param mu `numeric` or `list`. For a single-group model, a vector
##' of population means. For a multigroup model, a list of vectors (one per
##' group). If `mu` and `popModel` are missing, mean structure will
##' be excluded from the analysis.
##' @param Sigma `matrix` or `list`. For a single-group model,
##' a population covariance matrix. For a multigroup model, a list of matrices
##' (one per group). If missing, `popModel` will be used to generate a
##' model-implied Sigma.
##' @param fun character. Name of `lavaan` function used to fit
##' `powerModel` (i.e., `"cfa"`, `"sem"`, `"growth"`, or
##' `"lavaan"`).
##' @param alpha Type I error rate used to set a criterion for rejecting H0.
##' @param ... additional arguments to pass to [lavaan::lavaan()].
##' See also [lavaan::lavOptions()].
##'
##' @author
##' Alexander M. Schoemann (East Carolina University; \email{schoemanna@@ecu.edu})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio
##' test in covariance structure analysis. *Psychometrika, 50*(1), 83--90.
##' \doi{10.1007/BF02294150}
##'
##' Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L.
##' (2021). Analytical power calculations for structural equation modeling:
##' A tutorial and Shiny app. *Behavior Research Methods, 53*, 1385--1406.
##' \doi{10.3758/s13428-020-01479-0}
##'
##' @examples
##' ## Specify population values. Note every parameter has a fixed value.
##' modelP <- '
##' f1 =~ .7*V1 + .7*V2 + .7*V3 + .7*V4
##' f2 =~ .7*V5 + .7*V6 + .7*V7 + .7*V8
##' f1 ~~ .3*f2
##' f1 ~~ 1*f1
##' f2 ~~ 1*f2
##' V1 ~~ .51*V1
##' V2 ~~ .51*V2
##' V3 ~~ .51*V3
##' V4 ~~ .51*V4
##' V5 ~~ .51*V5
##' V6 ~~ .51*V6
##' V7 ~~ .51*V7
##' V8 ~~ .51*V8
##' '
##' ## Specify analysis model. Note parameter of interest f1~~f2 is fixed to 0.
##' modelA <- '
##' f1 =~ V1 + V2 + V3 + V4
##' f2 =~ V5 + V6 + V7 + V8
##' f1 ~~ 0*f2
##' '
##' ## Calculate power
##' SSpower(powerModel = modelA, popModel = modelP, n = 150, nparam = 1,
##' std.lv = TRUE)
##'
##' ## Get power for a range of sample sizes
##' Ns <- seq(100, 500, 40)
##' Power <- rep(NA, length(Ns))
##' for(i in 1:length(Ns)) {
##' Power[i] <- SSpower(powerModel = modelA, popModel = modelP,
##' n = Ns[i], nparam = 1, std.lv = TRUE)
##' }
##' plot(x = Ns, y = Power, type = "l", xlab = "Sample Size")
##'
##'
##' ## Optionally specify different values for multiple populations
##'
##' modelP2 <- '
##' f1 =~ .7*V1 + .7*V2 + .7*V3 + .7*V4
##' f2 =~ .7*V5 + .7*V6 + .7*V7 + .7*V8
##' f1 ~~ c(-.3, .3)*f2 # DIFFERENT ACROSS GROUPS
##' f1 ~~ 1*f1
##' f2 ~~ 1*f2
##' V1 ~~ .51*V1
##' V2 ~~ .51*V2
##' V3 ~~ .51*V3
##' V4 ~~ .51*V4
##' V5 ~~ .51*V5
##' V6 ~~ .51*V6
##' V7 ~~ .51*V7
##' V8 ~~ .51*V8
##' '
##' modelA2 <- '
##' f1 =~ V1 + V2 + V3 + V4
##' f2 =~ V5 + V6 + V7 + V8
##' f1 ~~ c(psi21, psi21)*f2 # EQUALITY CONSTRAINT ACROSS GROUPS
##' '
##' ## Calculate power
##' SSpower(powerModel = modelA2, popModel = modelP2, n = c(100, 100), nparam = 1,
##' std.lv = TRUE)
##' ## Get power for a range of sample sizes
##' Ns2 <- cbind(Group1 = seq(10, 100, 10), Group2 = seq(10, 100, 10))
##' Power2 <- apply(Ns2, MARGIN = 1, FUN = function(nn) {
##' SSpower(powerModel = modelA2, popModel = modelP2, n = nn,
##' nparam = 1, std.lv = TRUE)
##' })
##' plot(x = rowSums(Ns2), y = Power2, type = "l", xlab = "Total Sample Size",
##' ylim = 0:1)
##' abline(h = c(.8, .9), lty = c("dotted","dashed"))
##' legend("bottomright", c("80% Power","90% Power"), lty = c("dotted","dashed"))
##'
##' @export
SSpower <- function(powerModel, n, nparam, popModel, mu, Sigma,
fun = "sem", alpha = .05, ...) {
if (missing(Sigma)) {
## specify (vector of) sample size(s) for optional multigroup syntax to work
popMoments <- lavaan::fitted(do.call(fun, list(model = popModel,
sample.nobs = n),
envir = getNamespace("lavaan")))
## without data, can't apply fitted() to multigroup model syntax, so
## save the same fitted moments for each group
if (length(n) > 1L) {
Sigma <- lapply(popMoments, "[[", i = "cov")
mu <- if (!is.null(popMoments[[1]]$mean)) {
lapply(popMoments, "[[", i = "mean")
} else NULL
} else {
## single group
Sigma <- popMoments$cov
mu <- popMoments$mean
}
} else {
## otherwise, user-supplied moments
if (is.list(Sigma)) {
nG <- length(Sigma)
if (length(n) < nG) n <- rep(n, length.out = nG)
if (length(n) > nG) n <- n[1:nG]
no.mu <- missing(mu)
if (!no.mu) null.mu <- any(sapply(mu, is.null))
if (no.mu || null.mu) {
mu <- NULL
}
} else if (is.matrix(Sigma)) {
n <- n[[1]]
if (missing(mu)) {
mu <- NULL
} else if (!is.numeric(mu) || !!is.null(mu)) {
stop('mu must be a numeric vector, or a list (one vector per group)')
}
} else stop('Sigma must be a covariance matrix, or a list (one matrix per group)')
}
## Fit (probably constrained) analysis model
dots <- list(...)
funArgs <- list(model = powerModel, sample.cov = Sigma,
sample.mean = mu, sample.nobs = n)
useArgs <- c(funArgs, dots[setdiff(names(dots), names(funArgs))])
fit <- do.call(fun, useArgs, envir = getNamespace("lavaan"))
## get NCP from chi square
ncp <- lavaan::fitmeasures(fit)["chisq"]
## critical value under H0
critVal <- qchisq(alpha, df = nparam, lower.tail = FALSE)
## return power
pchisq(critVal, df = nparam, ncp = ncp, lower.tail = FALSE)
}
semTools/R/parcelAllocation.R 0000644 0001762 0000144 00000042730 14764334616 015703 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 12 March 2025
##' Random Allocation of Items to Parcels in a Structural Equation Model
##'
##' @description
##' This function generates a given number of randomly generated item-to-parcel
##' allocations, fits a model to each allocation, and provides averaged results
##' over all allocations.
##'
##' @details
##' This function implements the random item-to-parcel allocation procedure
##' described in Sterba (2011) and Sterba and MacCallum (2010). The function
##' takes a single data set with item-level data, randomly assigns items to
##' parcels, fits a structural equation model to the parceled data using
##' [lavaan::lavaanList()], and repeats this process for a user-specified
##' number of random allocations. Results from all fitted models are summarized
##' in the output. For further details on the benefits of randomly allocating
##' items to parcels, see Sterba (2011) and Sterba and MacCallum (2010).
##'
##' @importFrom stats sd qnorm
##' @importFrom lavaan parTable lavInspect lavaanList lavaanify lavNames
##'
##' @param model [lavaan::lavaan()] model syntax specifying the model
##' fit to (at least some) parceled data. Note that there can be a mixture of
##' items and parcels (even within the same factor), in case certain items
##' should never be parceled. Can be a character string or parameter table.
##' Also see [lavaan::lavaanify()] for more details.
##' @param data A `data.frame` containing all observed variables appearing
##' in the `model`, as well as those in the `item.syntax` used to
##' create parcels. If the data have missing values, multiple imputation
##' before parceling is recommended: submit a stacked data set (with a variable
##' for the imputation number, so they can be separateed later) and set
##' `do.fit = FALSE` to return the list of `data.frame`s (one per
##' allocation), each of which is a stacked, imputed data set with parcels.
##' @param parcel.names `character` vector containing names of all parcels
##' appearing as indicators in `model`.
##' @param item.syntax [lavaan::model.syntax()] specifying the model
##' that would be fit to all of the unparceled items, including items that
##' should be randomly allocated to parcels appearing in `model`.
##' @param nAlloc The number of random items-to-parcels allocations to generate.
##' @param fun `character` string indicating the name of the
##' [lavaan::lavaan()] function used to fit `model` to
##' `data`. Can only take the values `"lavaan"`, `"sem"`,
##' `"cfa"`, or `"growth"`.
##' @param alpha Alpha level used as criterion for significance.
##' @param fit.measures `character` vector containing names of fit measures
##' to request from each fitted [lavaan::lavaan()] model. See the
##' output of [lavaan::fitMeasures()] for a list of available measures.
##' @param \dots Additional arguments to be passed to
##' [lavaan::lavaanList()]. See also [lavaan::lavOptions()]
##' @param show.progress If `TRUE`, show a [utils::txtProgressBar()]
##' indicating how fast the model-fitting iterates over allocations.
##' @param iseed (Optional) Random seed used for parceling items. When the same
##' random seed is specified and the program is re-run, the same allocations
##' will be generated. Using the same `iseed` argument will ensure the
##' any model is fit to the same parcel allocations. *Note*: When using
##' \pkg{parallel} options, you must first type `RNGkind("L'Ecuyer-CMRG")`
##' into the R Console, so that the seed will be controlled across cores.
##' @param do.fit If `TRUE` (default), the `model` is fitted to each
##' parceled data set, and the summary of results is returned (see the Value
##' section below). If `FALSE`, the items are randomly parceled, but the
##' model is not fit; instead, the `list` of `data.frame`s is
##' returned (so assign it to an object).
##' @param return.fit If `TRUE`, a [lavaan::lavaanList-class] object
##' is returned with the `list` of results across allocations
##' @param warn Whether to print warnings when fitting `model` to each allocation
##'
##' @return
##' \item{Estimates}{A `data.frame` containing results related to
##' parameter estimates with columns corresponding to their names; average
##' and standard deviation across allocations; minimum, maximum, and range
##' across allocations; and the proportion of allocations in which each
##' parameter estimate was significant.}
##' \item{SE}{A `data.frame` containing results similar to
##' `Estimates`, but related to the standard errors of parameter
##' estimates.}
##' \item{Fit}{A `data.frame` containing results related to model fit,
##' with columns corresponding to fit index names; their average and
##' standard deviation across allocations; the minimum, maximum, and range
##' across allocations; and (if the test statistic or RMSEA is included in
##' `fit.measures`) the proportion of allocations in which each
##' test of (exact or close) fit was significant.}
##' \item{Model}{A [lavaan::lavaanList-class] object containing results
##' of the `model` fitted to each parcel allocation. Only returned if
##' `return.fit = TRUE`.}
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso [PAVranking()] for comparing 2 models,
##' [poolMAlloc()] for choosing the number of allocations
##'
##' @references
##'
##' Sterba, S. K. (2011). Implications of parcel-allocation
##' variability for comparing fit of item-solutions and parcel-solutions.
##' *Structural Equation Modeling, 18*(4), 554--577.
##' \doi{10.1080/10705511.2011.607073}
##'
##' Sterba, S. K. & MacCallum, R. C. (2010). Variability in parameter estimates
##' and model fit across random allocations of items to parcels.
##' *Multivariate Behavioral Research, 45*(2), 322--358.
##' \doi{10.1080/00273171003680302}
##'
##' Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation
##' variability in practice: Combining sources of uncertainty and choosing the
##' number of allocations. *Multivariate Behavioral Research, 51*(2--3),
##' 296--313. \doi{10.1080/00273171.2016.1144502}
##'
##' Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model
##' selection: Parcel-allocation variability in model ranking.
##' *Psychological Methods, 22*(1), 47--68. \doi{10.1037/met0000067}
##'
##' @examples
##'
##' ## Fit 2-factor CFA to simulated data. Each factor has 9 indicators.
##'
##' ## Specify the item-level model (if NO parcels were created)
##' item.syntax <- c(paste0("f1 =~ f1item", 1:9),
##' paste0("f2 =~ f2item", 1:9))
##' cat(item.syntax, sep = "\n")
##' ## Below, we reduce the size of this same model by
##' ## applying different parceling schemes
##'
##'
##' ## 3-indicator parcels
##' mod.parcels <- '
##' f1 =~ par1 + par2 + par3
##' f2 =~ par4 + par5 + par6
##' '
##' ## names of parcels
##' (parcel.names <- paste0("par", 1:6))
##'
##' \donttest{
##' ## override default random-number generator to use parallel options
##' RNGkind("L'Ecuyer-CMRG")
##'
##' parcelAllocation(mod.parcels, data = simParcel, nAlloc = 100,
##' parcel.names = parcel.names, item.syntax = item.syntax,
##' # parallel = "multicore", # parallel available in Mac/Linux
##' std.lv = TRUE) # any addition lavaan arguments
##'
##'
##'
##' ## POOL RESULTS by treating parcel allocations as multiple imputations
##' ## Details provided in Sterba & Rights (2016); see ?poolMAlloc.
##'
##' ## save list of data sets instead of fitting model yet
##' dataList <- parcelAllocation(mod.parcels, data = simParcel, nAlloc = 100,
##' parcel.names = parcel.names,
##' item.syntax = item.syntax,
##' do.fit = FALSE)
##' ## now fit the model to each data set
##' library(lavaan.mi)
##' fit.parcels <- cfa.mi(mod.parcels, data = dataList, std.lv = TRUE)
##' summary(fit.parcels) # pooled using Rubin's rules
##' anova(fit.parcels) # pooled test statistic
##' help(package = "lavaan.mi") # find more methods for pooling results
##' }
##'
##'
##' ## multigroup example
##' simParcel$group <- 0:1 # arbitrary groups for example
##' mod.mg <- '
##' f1 =~ par1 + c(L2, L2)*par2 + par3
##' f2 =~ par4 + par5 + par6
##' '
##' ## names of parcels
##' (parcel.names <- paste0("par", 1:6))
##'
##' parcelAllocation(mod.mg, data = simParcel, parcel.names, item.syntax,
##' std.lv = TRUE, group = "group", group.equal = "loadings",
##' nAlloc = 20, show.progress = TRUE)
##'
##'
##'
##' ## parcels for first factor, items for second factor
##' mod.items <- '
##' f1 =~ par1 + par2 + par3
##' f2 =~ f2item2 + f2item7 + f2item8
##' '
##' ## names of parcels
##' (parcel.names <- paste0("par", 1:3))
##'
##' parcelAllocation(mod.items, data = simParcel, parcel.names, item.syntax,
##' nAlloc = 20, std.lv = TRUE)
##'
##'
##'
##' ## mixture of 1- and 3-indicator parcels for second factor
##' mod.mix <- '
##' f1 =~ par1 + par2 + par3
##' f2 =~ f2item2 + f2item7 + f2item8 + par4 + par5 + par6
##' '
##' ## names of parcels
##' (parcel.names <- paste0("par", 1:6))
##'
##' parcelAllocation(mod.mix, data = simParcel, parcel.names, item.syntax,
##' nAlloc = 20, std.lv = TRUE)
##'
##' @export
parcelAllocation <- function(model, data, parcel.names, item.syntax,
nAlloc = 100, fun = "sem", alpha = .05,
fit.measures = c("chisq","df","cfi",
"tli","rmsea","srmr"), ...,
show.progress = FALSE, iseed = 12345,
do.fit = TRUE, return.fit = FALSE, warn = FALSE) {
if (nAlloc < 2) stop("Minimum of two allocations required.")
if (!fun %in% c("sem","cfa","growth","lavaan"))
stop("'fun' argument must be either 'lavaan', 'cfa', 'sem', or 'growth'")
lavArgs <- list(...)
lavArgs$model <- item.syntax
lavArgs$data <- data
lavArgs$do.fit <- FALSE
## fit item-level model to data
item.fit <- do.call(fun, lavArgs)
item.PT <- parTable(item.fit)
## construct parameter table for parcel-level model
if (is.character(model)) {
## default lavaanify arguments
ptArgs <- formals(lavaanify)
## arguments passed to lavaan by user
fitArgs <- lavInspect(item.fit, "call")[-1]
## overwrite defaults with user's values
sameArgs <- intersect(names(ptArgs), names(fitArgs))
ptArgs[sameArgs] <- fitArgs[sameArgs]
ptArgs$model <- model
if (is.null(ptArgs$model.type)) ptArgs$model.type <- "sem"
if (ptArgs$model.type != "growth") ptArgs$model.type <- "sem"
ptArgs$ngroups <- lavInspect(item.fit, "ngroups")
PT <- do.call("lavaanify", ptArgs)
} else if (is.data.frame(model)) {
PT <- model
} else stop("'model' argument must be a character string of lavaan model",
" syntax or a lavaan parameter table. See ?lavaanify help page.")
## check that both models specify the same factors
## NOTE: lv.names remains unaltered, to omit latent indicators from $items
factorNames <- lv.names <- lavNames(PT, type = "lv")
if (!all(sort(lavNames(item.PT, type = "lv")) == sort(factorNames))) {
stop("'model' and 'item.syntax' arguments specify different factors.\n",
"'model' specifies: ", paste(sort(factorNames), collapse = ", "), "\n",
"'item.syntax' specifies: ", paste(sort(lavNames(item.PT,
type = "lv")),
collapse = ", "))
}
## for each factor, assign item sets to parcel sets
assignments <- list()
for (i in factorNames) {
## all indicators from parcel-level model
parcels <- PT$rhs[PT$lhs == i & PT$op == "=~"]
## all indicators from item-level model
items <- item.PT$rhs[item.PT$lhs == i & item.PT$op == "=~"]
## exclude observed indicators from parceling scheme if specified
## in parcel-level model
assignments[[i]]$parcels <- setdiff(parcels, c(names(data), lv.names))
assignments[[i]]$items <- setdiff(items , c( parcels , lv.names))
## Does this factor have parcels? If not, omit this factor from next loop
if (length(assignments[[i]]$parcels) == 0L) {
factorNames <- factorNames[-which(factorNames == i)]
next
}
## how many items per parcel?
nItems <- length(assignments[[i]]$items)
nParcels <- length(assignments[[i]]$parcels)
assignments[[i]]$nPerParcel <- rep(nItems %/% nParcels, nParcels)
if (nItems %% nParcels > 0) for (j in 1:(nItems %% nParcels)) {
assignments[[i]]$nPerParcel[j] <- assignments[[i]]$nPerParcel[j] + 1
}
names(assignments[[i]]$nPerParcel) <- assignments[[i]]$parcels
}
## for each allocation, create parcels from items
dataList <- list()
for (i in 1:nAlloc) {
dataList[[i]] <- data
for (j in factorNames) {
## create a random assignment pattern
ranAss <- sample(rep(names(assignments[[j]]$nPerParcel),
times = assignments[[j]]$nPerParcel))
## add each parcel to a copy of the original data set
for (k in assignments[[j]]$parcels) {
## which items were selected for this parcel?
ranVars <- assignments[[j]]$items[ranAss == k]
## calculate row means of those items, save as parcel
dataList[[i]][ , k] <- rowMeans(data[ , ranVars])
}
}
}
if (!do.fit) return(dataList)
## fit parcel-level model to list of data sets
set.seed(iseed) # in case not using parallel
fitList <- lavaanList(model, dataList, cmd = fun, ..., warn = warn, iseed = iseed,
FUN = lavaan::fitMeasures, show.progress = show.progress)
## for which data sets did the model converge?
conv <- fitList@meta$ok
if (!any(conv)) stop("The model did not converge for any allocations.")
if (!all(conv)) message("The model did not converge for the following ",
"allocations: ", paste(which(!conv), collapse = ", "))
## tools to extract output
getOutput <- function(x, sig = FALSE) {
c(Avg = mean(x, na.rm = TRUE), SD = sd(x, na.rm = TRUE),
Min = min(x, na.rm = TRUE), Max = max(x, na.rm = TRUE),
Range = max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
out <- list()
myCols <- c("lhs","op","rhs","group", "block","label")
template <- data.frame(fitList@ParTableList[[which(conv)[1]]][myCols])
## parameter estimates
Est <- sapply(fitList@ParTableList[conv], function(x) x$est)
out$Estimates <- cbind(template, t(apply(Est, 1, getOutput)))
## standard errors
SE <- sapply(fitList@ParTableList[conv], function(x) x$se)
## Any for which SE could not be calculated?
missingSE <- apply(SE, 2, function(x) any(is.na(x)))
if (!all(missingSE)) {
if (any(missingSE)) message("Standard errors could not be computed for ",
"the following allocations: ",
paste(which(missingSE), collapse = ", "))
out$SE <- cbind(template, t(apply(SE[ , !missingSE], 1, getOutput)))
## add significance test results to $Estimates
Sig <- abs(Est[, !missingSE] / SE[, !missingSE]) > qnorm(alpha / 2,
lower.tail = FALSE)
out$Estimates$Percent_Sig <- rowMeans(Sig)
out$Estimates$Percent_Sig[fitList@ParTableList[[which(conv)[1]]]$free == 0L] <- NA
} else {
message("Standard errors could not be calculated for any converged",
" data sets, so no significance tests could be conducted.")
out$SE <- NULL
}
## fit measures
Fit <- do.call(cbind, fitList@funList[conv])[fit.measures, ]
out$Fit <- data.frame(t(apply(Fit, 1, getOutput)))
if (any(grepl(pattern = "chisq", fit.measures))) {
out$Fit$Percent_Sig <- NA
if ("chisq" %in% fit.measures) {
pvalues <- sapply(fitList@funList[conv], "[", i = "pvalue")
out$Fit["chisq", "Percent_Sig"] <- mean(pvalues < alpha, na.rm = TRUE)
}
if ("chisq.scaled" %in% fit.measures) {
pvalues <- sapply(fitList@funList[conv], "[", i = "pvalue.scaled")
out$Fit["chisq.scaled", "Percent_Sig"] <- mean(pvalues < alpha, na.rm = TRUE)
}
}
if (any(grepl(pattern = "rmsea", fit.measures))) {
if (is.null(out$Fit$Percent_Sig)) out$Fit$Percent_Sig <- NA
if ("rmsea" %in% fit.measures) {
pvalues <- sapply(fitList@funList[conv], "[", i = "rmsea.pvalue")
out$Fit["rmsea", "Percent_Sig"] <- mean(pvalues < alpha, na.rm = TRUE)
}
if ("rmsea.scaled" %in% fit.measures) {
pvalues <- sapply(fitList@funList[conv], "[", i = "rmsea.pvalue.scaled")
out$Fit["rmsea.scaled", "Percent_Sig"] <- mean(pvalues < alpha, na.rm = TRUE)
}
}
## check for robust test
if (any(grepl(pattern = "scaled", names(fitList@funList[conv][[1]]))) &
!any(grepl(pattern = "scaled", fit.measures))) {
warning('Robust test requested, but "fit.measures" argument does not',
' include any scaled measures (e.g., "chisq.scaled", ',
'"rmsea.scaled", or "rmsea.robust").')
}
## remove rows that do not correspond to estimates
out$Estimates <- out$Estimates[fitList@ParTableList[[which(conv)[1]]]$group > 0L, ]
if (!is.null(out$SE)) out$SE <- out$SE[fitList@ParTableList[[which(conv)[1]]]$group > 0L, ]
## assign class for lavaan's print method
class(out$Estimates) <- c("lavaan.data.frame","data.frame")
if (!is.null(out$SE)) class(out$SE) <- c("lavaan.data.frame","data.frame")
class(out$Fit) <- c("lavaan.data.frame","data.frame")
## return output
if (return.fit) {
out$Model <- fitList
out$Model@external$dataList <- dataList
}
out
}
semTools/R/compareFit.R 0000644 0001762 0000144 00000053351 15137362711 014511 0 ustar ligges users ### Terrence D. Jorgensen & Sunthud Pornprasertmanit
### Last updated: 31 January 2026
### source code for compareFit() function and FitDiff class
## -----------------
## Class and Methods
## -----------------
##' Class For Representing A Template of Model Fit Comparisons
##'
##' This class contains model fit measures and model fit comparisons among
##' multiple models
##'
##'
##' @name FitDiff-class
##' @aliases FitDiff-class show,FitDiff-method summary,FitDiff-method
##' @docType class
##'
##' @slot name `character`. The name of each model
##' @slot model.class `character`. One class to which each model belongs
##' @slot nested `data.frame`. Model fit comparisons between adjacently
##' nested models that are ordered by their degrees of freedom (*df*)
##' @slot fit `data.frame`. Fit measures of all models specified in the
##' `name` slot, ordered by their *df*
##' @slot fit.diff `data.frame`. Sequential differences in fit measures in
##' the `fit` slot
##'
##' @section Objects from the Class: Objects can be created via the
##' [compareFit()] function.
##'
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso [compareFit()]; [clipboard()]
##'
##' @examples
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##' fit.config <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")
##' ## invariance constraints
##' fit.metric <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = "loadings")
##' fit.scalar <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = c("loadings","intercepts"))
##' fit.strict <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = c("loadings","intercepts","residuals"))
##' measEqOut <- compareFit(fit.config, fit.metric, fit.scalar, fit.strict)
##' summary(measEqOut)
##' summary(measEqOut, fit.measures = "all")
##' summary(measEqOut, fit.measures = c("aic", "bic"))
##'
##' if(interactive()){
##' ## Save results to a file
##' saveFile(measEqOut, file = "measEq.txt")
##'
##' ## Copy to a clipboard
##' clipboard(measEqOut)
##' }
##'
setClass("FitDiff", slots = c(name = "character", # list of model names
model.class = "character", # lavaan or lavaan.mi
nested = "data.frame", # anova() table
fit = "data.frame", # fitMeasures() output
fit.diff = "data.frame")) # index differences
##' @rdname FitDiff-class
##' @aliases show,FitDiff-method
##' @importFrom methods getMethod
##' @export
setMethod("show", signature(object = "FitDiff"), function(object) {
cat("The following", object@model.class, "models were compared:\n ")
cat(object@name, sep = "\n ")
cat("To view results, assign the compareFit() output to an object and ",
"use the summary() method; see the class?FitDiff help page.\n")
invisible(object)
})
##' @rdname FitDiff-class
##' @aliases summary,FitDiff-method
##'
##' @param object object of class `FitDiff`
##' @param fit.measures `character` vector naming fit indices the user can
##' request from [lavaan::fitMeasures()]. If `"default"`, the
##' fit measures will be `c("chisq", "df", "pvalue", "cfi", "tli",
##' "rmsea", "srmr", "aic", "bic")`. If `"all"`, all available fit
##' measures will be returned.
##' @param nd number of digits printed
##' @param tag single `character` used to flag the model preferred by each
##' fit index. To omit tags, set to `NULL` or `NA`.
##'
##' @export
setMethod("summary", signature(object = "FitDiff"),
function(object, fit.measures = "default", nd = 3, tag = "\u2020") {
if (nrow(object@nested) > 0L) {
cat("################### Nested Model Comparison #########################\n")
test.statistics <- object@nested
if (object@model.class == "lavaan") {
print(test.statistics, nd = nd)
} else {
class(test.statistics) <- c("lavaan.data.frame","data.frame")
stats::printCoefmat(test.statistics, P.values = TRUE, has.Pvalue = TRUE)
}
cat("\n")
}
noFit <- ncol(object@fit) == 1L && names(object@fit)[1] == "df"
if (!noFit) {
if (is.null(fit.measures)) fit.measures <- colnames(object@fit)
if ("all" %in% fit.measures) fit.measures <- colnames(object@fit)
if (length(fit.measures) == 1 && fit.measures == "default") {
## robust or scaled test statistics?
if (is.null(object@fit$cfi.scaled)) {
fit.measures <- c("chisq","df","pvalue","rmsea","cfi","tli","srmr")
} else if (all(!is.na(object@fit$cfi.robust)) && !is.null(object@fit$cfi.robust)) {
fit.measures <- c("chisq.scaled","df.scaled","pvalue.scaled",
"rmsea.robust","cfi.robust","tli.robust","srmr")
} else {
fit.measures <- c("chisq.scaled","df.scaled","pvalue.scaled",
"rmsea.scaled","cfi.scaled","tli.scaled","srmr")
}
if ("aic" %in% colnames(object@fit)) {
fit.measures <- c(fit.measures, "aic", "bic")
}
}
cat("####################### Model Fit Indices ###########################\n")
## this is the object to return (numeric, no printed daggers)
fit.indices <- object@fit[ , fit.measures , drop = FALSE]
## print with daggers marking each fit index's preferred model
## (turns "numeric" vectors into "character")
badness <- grepl(pattern = c("chisq|rmsea|ic|rmr|ecvi|fmin|hqc"),
x = colnames(fit.indices))
goodness <- grepl(pattern = c("cfi|tli|rfi|nfi|ifi|rni|cn|gfi|mfi|Hat"),
x = colnames(fit.indices))
minvalue <- badness & !goodness
minvalue[!badness & !goodness] <- NA
fit.integer <- grepl(pattern = c("df|npar|ntotal"),
x = colnames(fit.indices))
suppressWarnings(fitTab <- as.data.frame(mapply(tagCharacter, nd = nd,
char = tag,
vec = fit.indices,
minvalue = minvalue,
print_integer = fit.integer),
stringsAsFactors = FALSE))
rownames(fitTab) <- object@name
colnames(fitTab) <- colnames(fit.indices)
class(fitTab) <- c("lavaan.data.frame","data.frame")
print(fitTab, nd = nd)
cat("\n")
if (nrow(object@nested) > 0L) {
fit.diff.measures <- fit.measures[!grepl(pattern = "chisq|pvalue|ntotal",
x = fit.measures)]
cat("################## Differences in Fit Indices #######################\n")
fit.diff <- object@fit.diff[ , fit.diff.measures, drop = FALSE]
class(fit.diff) <- c("lavaan.data.frame","data.frame")
print(fit.diff, nd = nd)
cat("\n")
}
}
invisible(object)
})
## "method" for saveFile() function (see "clipboard.R")
saveFileFitDiff <- function(object, file, what = "summary",
tableFormat = FALSE, fit.measures = "default",
writeArgs = list()) {
if (tableFormat) {
writeArgs$file <- file
writeArgs$append <- TRUE
if (is.null(writeArgs$sep)) writeArgs$sep <- "\t"
if (is.null(writeArgs$quote)) writeArgs$quote <- FALSE
if (is.null(writeArgs$row.names)) writeArgs$row.names <- FALSE
if (nrow(object@nested) > 0L) {
cat("Nested Model Comparison\n\n", file = file, append = TRUE)
out <- object@nested
#out <- data.frame(model.diff = rownames(out), out)
writeArgs$x <- out
do.call("write.table", writeArgs)
cat("\n\n", file = file, append = TRUE)
}
out2 <- getFitSummary(object, fit.measures)
out2 <- data.frame(model = object@name, out2)
cat("Fit Indices Summaries\n\n", file = file, append = TRUE)
writeArgs$x <- out2
do.call("write.table", writeArgs)
} else {
write(paste(utils::capture.output(getMethod("summary",
signature = "FitDiff")(object)),
collapse = "\n"), file = file)
}
}
## --------------------
## Constructor Function
## --------------------
##' Build an object summarizing fit indices across multiple models
##'
##' This function will create the template to compare fit indices across
##' multiple fitted lavaan objects. The results can be exported to a clipboard
##' or a file later.
##'
##' @importFrom lavaan lavTestLRT
##' @importMethodsFrom lavaan fitMeasures
##'
##' @param ... fitted `lavaan` models or list(s) of `lavaan` objects.
##' [lavaan.mi::lavaan.mi-class] objects are also accepted, but all models
##' must belong to the same class.
##' @param nested `logical` indicating whether the models in `...` are
##' nested. See [net()] for an empirical test of nesting.
##' @param argsLRT `list` of arguments to pass to
##' [lavaan::lavTestLRT()], as well as to
##' [lavaan.mi::lavTestLRT.mi()] and [lavaan::fitMeasures()] when
##' comparing [lavaan.mi::lavaan.mi-class] models.
##' @param indices `logical` indicating whether to return fit indices from
##' the [lavaan::fitMeasures()] function. Selecting particular
##' indices is controlled in the `summary` method; see
##' [FitDiff-class].
##' @param moreIndices `logical` indicating whether to return fit indices
##' from the [moreFitIndices()] function. Selecting particular
##' indices is controlled in the `summary` method; see
##' [FitDiff-class].
##' @param baseline.model optional fitted [lavaan::lavaan-class] model passed to
##' [lavaan::fitMeasures()] to calculate incremental fit indices.
##' @param nPrior passed to [moreFitIndices()], if relevant
##'
##' @return A [FitDiff-class] object that saves model fit
##' comparisons across multiple models. If the models are not nested, only
##' fit indices for each model are returned. If the models are nested, the
##' differences in fit indices are additionally returned, as well as test
##' statistics comparing each sequential pair of models (ordered by their
##' degrees of freedom).
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso [FitDiff-class], [clipboard()]
##'
##' @examples
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' ## non-nested models
##' fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
##'
##' m2 <- ' f1 =~ x1 + x2 + x3 + x4
##' f2 =~ x5 + x6 + x7 + x8 + x9 '
##' fit2 <- cfa(m2, data = HolzingerSwineford1939)
##'
##' (out1 <- compareFit(fit1, fit2, nested = FALSE))
##' summary(out1)
##'
##'
##' ## nested model comparisons: measurement equivalence/invariance
##' fit.config <- cfa(HS.model, data = HolzingerSwineford1939, group = "school")
##' fit.metric <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = "loadings")
##' fit.scalar <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = c("loadings","intercepts"))
##' fit.strict <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = c("loadings","intercepts","residuals"))
##'
##' measEqOut <- compareFit(fit.config, fit.metric, fit.scalar, fit.strict,
##' moreIndices = TRUE) # include moreFitIndices()
##' summary(measEqOut)
##' summary(measEqOut, fit.measures = "all")
##' summary(measEqOut, fit.measures = c("aic", "bic", "sic", "ibic"))
##'
##'
#FIXME: why doesn't this example work?
## \donttest{
## ## also applies to lavaan.mi objects (fit model to multiple imputations)
## library(lavaan.mi)
## data("HS20imps", package = "lavaan.mi") # example data: 20 imputations
##
## ## request robust test statistics
## mgfit2 <- cfa.mi(HS.model, data = HS20imps, group = "school", estimator = "mlm")
## mgfit1 <- cfa.mi(HS.model, data = HS20imps, group = "school", estimator = "mlm",
## group.equal = "loadings")
## mgfit0 <- cfa.mi(HS.model, data = HS20imps, group = "school", estimator = "mlm",
## group.equal = c("loadings","intercepts"))
##
## ## request the strictly-positive robust test statistics
## out2 <- compareFit(scalar = mgfit0, metric = mgfit1, config = mgfit2,
## argsLRT = list(asymptotic = TRUE,
## method = "satorra.bentler.2010"))
## ## note that moreFitIndices() does not work for lavaan.mi objects
## summary(out2, fit.measures = c("crmr","srmr", "cfi.robust","tli.robust",
## "rmsea.robust",
## "rmsea.ci.lower.robust",
## "rmsea.ci.upper.robust"))
## }
##'
##' @export
compareFit <- function(..., nested = TRUE, argsLRT = list(), indices = TRUE,
moreIndices = FALSE, baseline.model = NULL, nPrior = 1) {
## make sure there is something to do
if (!(nested || indices || moreIndices)) {
message('User requested neither indices nor tests of nested models.')
return(NULL)
}
## separate models from lists of models
dots <- list(...)
idx.list <- sapply(dots, is.list)
modLists <- dots[ idx.list]
mods <- dots[!idx.list]
## capture names of any arguments passed via dots
allnames <- sapply(substitute(list(...))[-1], deparse)
listnames <- allnames[ idx.list]
modnames <- allnames[!idx.list]
## make sure models are named
if (length(mods) && is.null(names(mods))) {
names(mods) <- modnames
} else for (nn in seq_along(mods)) {
if (names(mods)[nn] == "") names(mods)[nn] <- modnames[nn]
}
## make sure lists are named
if (length(modLists) && is.null(names(modLists))) {
names(modLists) <- listnames
} else for (nn in seq_along(modLists)) {
if (names(modLists)[nn] == "") names(modLists)[nn] <- listnames[nn]
}
## within each list, make sure models are named
for (i in seq_along(modLists)) {
if (length(modLists[[i]]) && is.null(names(modLists[[i]]))) {
names(modLists[[i]]) <- seq_along(modLists[[i]])
} else for (nn in seq_along(modLists[[i]])) {
if (names(modLists[[i]])[nn] == "") names(modLists[[i]])[nn] <- nn
}
}
## collapse into a single list of models
if (length(modLists)) mods <- c(mods, unlist(modLists))
## check for lavaan models
not.lavaan <- !sapply(mods, inherits, what = c("lavaan","lavaan.mi"))
if (any(not.lavaan)) stop("The following are not fitted lavaan(.mi) models:\n",
paste0(names(which(not.lavaan)), collapse = ", "))
modClass <- unique(sapply(mods, class))
if (length(modClass) > 1L) stop('All models must be of the same class (e.g.,',
' cannot compare lavaan objects to lavaan.mi)')
if (inherits(mods[[1]], "lavaan")) {
nonConv <- !sapply(mods, lavInspect, what = "converged")
} else if (inherits(mods[[1]], "lavaan.mi")) {
## attach lavaan.mi to access LRT + fitMeasures
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
nonConv <- !sapply(mods, function(fit) {
any(sapply(fit@convergence, "[", i = "converged"))
})
}
if (all(nonConv)) {
stop('No models converged')
} else if (any(nonConv)) {
message('The following models did not converge, so they are ignored:\n',
paste(names(nonConv)[nonConv], collapse = ",\t"))
mods <- mods[which(!nonConv)]
}
## grab lavaan.mi options, if relevant
if (inherits(mods[[1]], "lavaan.mi")) {
if (is.null(argsLRT$pool.robust)) {
pool.robust <- formals(lavaan.mi::lavTestLRT.mi)$pool.robust # default value
} else {
pool.robust <- argsLRT$pool.robust # user-specified value
}
if (is.null(argsLRT$pool.method)) {
pool.method <- eval(formals(lavaan.mi::lavTestLRT.mi)$pool.method) # default value
} else {
pool.method <- argsLRT$pool.method # user-specified value
}
}
## FIT INDICES
if (indices || moreIndices) {
if (inherits(mods[[1]], "lavaan.mi")) {
fitList <- lapply(mods, fitMeasures, baseline.model = baseline.model,
## extra arguments about pooling
pool.robust = pool.robust, pool.method = pool.method)
} else {
## must be a lavaan-class object
fitList <- lapply(mods, fitMeasures, baseline.model = baseline.model)
if (moreIndices) {
moreFitList <- lapply(mods, moreFitIndices, nPrior = nPrior)
fitList <- mapply(c, fitList, moreFitList, SIMPLIFY = FALSE)
}
}
if (length(unique(sapply(fitList, length))) > 1L) {
warning('fitMeasures() returned vectors of different lengths for different',
' models, probably because certain options are not the same. Check',
' lavInspect(fit, "options")[c("estimator","test","meanstructure")]',
' for each model, or run fitMeasures() on each model to investigate.')
indexList <- lapply(fitList, names)
useNames <- names(which(table(unlist(indexList)) == length(fitList)))
fitList <- lapply(fitList, "[", i = useNames)
}
fit <- as.data.frame(do.call(rbind, fitList))
} else {
## No fit indices requested, but still call fitMeasures() for df?
if (inherits(mods[[1]], "lavaan.mi")) {
#FIXME: This seems terribly inefficient. Only need df to sort below.
fitList <- lapply(mods, fitMeasures, fit.measures = "df",
pool.robust = pool.robust, pool.method = pool.method)
} else {
fitList <- lapply(mods, fitMeasures, fit.measures = "df")
}
## check for scaled tests
nDF <- sapply(fitList, length)
if (any(nDF != nDF[1])) stop('Some (but not all) models have robust tests,',
' so they cannot be compared as nested models.')
## started failing with R 4.6.0:
# fit <- data.frame(df = sapply(fitList, "[", i = if (any(nDF > 1L)) 2L else 1L))
fit <- data.frame(df = mapply(function(x, i) x[[i]],
x = fitList,
i = ifelse(nDF > 1L, 2L, 1L)))
}
## order models by increasing df (least-to-most constrained)
ord <- order(fit$df) #FIXME: what if test == "mean.var.adjusted"?
fit <- fit[ord, , drop = FALSE]
mods <- mods[ord]
## TEST STATISTICS
if (nested) {
argsLRT$model.names <- names(mods)
argsLRT$object <- mods[[1]]
if (inherits(mods[[1]], "lavaan")) {
nestedout <- do.call(lavTestLRT, c(mods[-1], argsLRT))
} else if (inherits(mods[[1]], "lavaan.mi")) {
nestedout <- do.call(lavaan.mi::lavTestLRT.mi, c(mods[-1], argsLRT))
}
## not nested
} else nestedout <- data.frame()
## DIFFERENCES IN FIT INDICES
if (indices && length(names(mods)) > 1L) {
fitSubset <- colnames(fit)[!grepl(pattern = "chisq|pvalue|ntotal",
x = colnames(fit))]
fitTab <- fit[ , fitSubset, drop = FALSE]
diffTab <- as.data.frame(do.call(cbind, lapply(fitTab, diff)))
rownames(diffTab) <- paste(names(mods)[-1], "-", names(mods)[-length(names(mods))])
} else diffTab <- data.frame(df = diff(fit$df))
new("FitDiff", name = names(mods), model.class = modClass,
nested = nestedout, fit = fit, fit.diff = diffTab)
}
## ----------------
## Hidden Functions
## ----------------
noLeadingZero <- function(vec, fmt, nd = 3L) {
out <- sprintf(fmt, vec)
upper.limit <- paste0(".", paste(rep(9, nd - 1L), collapse = ""), "5")
used <- vec < as.numeric(upper.limit) & vec >= 0
used[is.na(used)] <- FALSE
out[used] <- substring(out[used], 2)
out
}
tagCharacter <- function(vec, char = "\u2020", minvalue = NA,
print_integer = FALSE, nd = 3L) {
char <- if (is.null(char)) as.character(NA) else as.character(char)[1]
if (nchar(char) != 1L) {
message('Only a single character can be used to tag= preferred models, so ',
'no tags were added. To omit tags, specify tag=NULL or tag=NA.')
char <- as.character(NA)
}
if (print_integer) {
vec <- noLeadingZero(vec, fmt = "%.0f", nd = nd)
} else if (is.na(minvalue)) {
vec <- noLeadingZero(vec, fmt = paste0("%.", nd, "f"), nd = nd)
} else {
target <- if (minvalue) min(vec, na.rm = TRUE) else max(vec, na.rm = TRUE)
tag <- rep(" ", length(vec))
if (!is.na(char)) tag[vec == target] <- char
vec <- noLeadingZero(vec, fmt = paste0("%.", nd, "f"), nd = nd)
vec <- paste0(vec, tag)
}
vec
}
getFitSummary <- function(object, fit.measures = "default", return.diff = FALSE) {
if (is.null(fit.measures)) fit.measures <- colnames(object@fit)
if ("all" %in% fit.measures) fit.measures <- colnames(object@fit)
if (length(fit.measures) == 1 && fit.measures == "default") {
## robust or scaled test statistics?
if (is.null(object@fit$cfi.scaled)) {
fit.measures <- c("chisq","df","pvalue","rmsea","cfi","tli","srmr")
} else if (all(!is.na(object@fit$cfi.robust)) && !is.null(object@fit$cfi.robust)) {
fit.measures <- c("chisq.scaled","df.scaled","pvalue.scaled",
"rmsea.robust","cfi.robust","tli.robust","srmr")
} else {
fit.measures <- c("chisq.scaled","df.scaled","pvalue.scaled",
"rmsea.scaled","cfi.scaled","tli.scaled","srmr")
}
if ("aic" %in% colnames(object@fit)) {
fit.measures <- c(fit.measures, "aic", "bic")
}
}
## chi-squared difference test already reported, so remove (diff in p-value)
if (return.diff) {
fit.measures <- fit.measures[!grepl(pattern = "chisq|pvalue|ntotal",
x = fit.measures)]
}
## return numeric values
fitTab <- object@fit[ , colnames(object@fit) %in% fit.measures, drop = FALSE]
if (!return.diff) return(fitTab)
## or return differences in fit indices
diffTab <- as.data.frame(do.call(cbind, lapply(fitTab, diff)))
rownames(diffTab) <- paste(object@name[-1], "-", object@name[-length(object@name)])
diffTab
}
semTools/R/discriminantValidity.R 0000644 0001762 0000144 00000022453 14753073506 016615 0 ustar ligges users ### Mikko Rönkkö (Roxygen edits by TDJ)
### Last updated: 12 February 2025
##' Calculate discriminant validity statistics
##'
##' Calculate discriminant validity statistics based on a fitted lavaan object
##'
##' Evaluated on the measurement scale level, discriminant validity is commonly
##' evaluated by checking if each pair of latent correlations is sufficiently
##' below one (in absolute value) that the latent variables can be thought of
##' representing two distinct constructs.
##'
##' `discriminantValidity` function calculates two sets of statistics that
##' are commonly used in discriminant validity evaluation. The first set are
##' factor correlation estimates and their confidence intervals. The second set
##' is a series of nested model tests, where the baseline model is compared
##' against a set of constrained models that are constructed by constraining
##' each factor correlation to the specified cutoff one at a time.
##'
##' The function assume that the `object` is set of confirmatory
##' factor analysis results where the latent variables are scaled by fixing their
##' variances to 1s. If the model is not a CFA model, the function will calculate
##' the statistics for the correlations among exogenous latent variables, but
##' for the *residual* variances with endogenous variables. If the
##' latent variables are scaled in some other way (e.g. fixing the first loadings),
##' the function issues a warning and re-estimates the model by fixing latent
##' variances to 1 (and estimating all loadings) so that factor covariances are
##' already estimated as correlations.
##'
##' The likelihood ratio tests are done by comparing the original baseline model
##' against more constrained alternatives. By default, these alternatives are
##' constructed by fixing each correlation at a time to a cutoff value. The
##' typical purpose of this test is to demonstrate that the estimated factor
##' correlation is well below the cutoff and a significant \eqn{chi^2} statistic
##' thus indicates support for discriminant validity. In some cases, the original
##' correlation estimate may already be greater than the cutoff, making it
##' redundant to fit a "restricted" model. When this happens, the likelihood
##' ratio test will be replaced by comparing the baseline model against itself.
##' For correlations that are estimated to be negative, a negation of the cutoff
##' is used in the constrained model.
##'
##' Another alternative is to do a nested model comparison against a model where
##' two factors are merged as one by setting the `merge` argument to
##' `TRUE`. In this comparison, the constrained model is constructed by
##' removing one of the correlated factors from the model and assigning its
##' indicators to the factor that remains in the model.
##'
##'
##' @importFrom lavaan lavInspect lavNames parTable
##'
##' @param object The [lavaan::lavaan-class] model object returned by
##' the [lavaan::cfa()] function.
##' @param cutoff A cutoff to be used in the constrained models in likelihood
##' ratio tests.
##' @param merge Whether the constrained models should be constructed by merging
##' two factors as one. Implies `cutoff` = 1.
##'
##' @inheritParams lavaan::parameterEstimates
##'
##' @return A `data.frame` of latent variable correlation estimates, their
##' confidence intervals, and a likelihood ratio tests against constrained models.
##' with the following attributes:
##' \describe{
##' \item{baseline}{The baseline model after possible rescaling.}
##' \item{constrained}{A `list` of the fitted constrained models
##' used in the likelihood ratio test.}
##' }
##'
##' @author
##' Mikko Rönkkö (University of Jyväskylä; \email{mikko.ronkko@jyu.fi}):
##' @references
##'
##' Rönkkö, M., & Cho, E. (2022). An updated guideline for assessing
##' discriminant validity. *Organizational Research Methods*, 25(1), 6–14.
##' \doi{10.1177/1094428120968614}
##'
##' @examples
##'
##' library(lavaan)
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' fit <- cfa(HS.model, data = HolzingerSwineford1939)
##' discriminantValidity(fit)
##' discriminantValidity(fit, merge = TRUE)
##'
##' @export
discriminantValidity <- function(object, cutoff = .9, merge = FALSE, level = .95,
boot.ci.type = "perc") {
free <- lavInspect(object, "free", add.class = FALSE)
#FIXME: adapt for multiple blocks by looping over groups/levels
if (lavInspect(object, "ngroups") > 1L | lavInspect(object, "nlevels") > 1L)
stop("Only implemented for single-group, single-level models so far.")
# Identify the latent variables that we will use
lvs <- lavNames(object,"lv")
if (cutoff <= 0 | cutoff > 1) stop("The cutoff must be between (0,1]")
if (merge & !missing(cutoff) & cutoff != 1)
message("Merging factors imply constraining factor correlation to 1. ",
"Cutoff will be ignored.")
if (length(lvs) == 0) stop("The model does not have any exogenous latent variables.")
if (length(lvs) == 1) stop("The model has only one exogenous latent variable. ",
"At least two are required for assessing discriminant validity.")
if (length(lavNames(object, "lv.y")) > 0)
warning("The model has at least one endogenous latent variable (",
paste(lavNames(object, "lv.y"), collapse = ", "),
"). The correlations of these variables will be estimated after ",
"conditioning on their predictors.")
# Extract the part of psi that contains latent variables
psi <- free$psi[lvs,lvs]
# Identify exogenous variances and covariances
pt <- parTable(object)
varIndices <- which(pt$lhs == pt$rhs & pt$lhs %in% lvs & pt$op == "~~")
covIndices <- which(pt$lhs != pt$rhs & pt$lhs %in% lvs & pt$rhs %in% lvs & pt$op == "~~")
# Check that the diagonal of psi is all zeros
if (any(diag(psi) != 0)) {
message("Some of the latent variable variances are estimated instead of ",
"fixed to 1. The model is re-estimated by scaling the latent ",
"variables by fixing their variances and freeing all factor loadings.")
# Identify free exogenous variances
i <- intersect(varIndices,which(pt$free != 0))
pt$free[i] <- 0
pt$ustart[i] <- 1
pt$user[i] <- 1
# Free all factor loadings corresponding of lvs where the covariances were
# just freed
i <- which(pt$lhs %in% pt$lhs[i] & pt$op == "=~")
pt$free[i] <- -1
pt$ustart[i] <- NA
# Update parameter numbers
i <- which(pt$free != 0)
pt$free[i] <- seq_along(i)
object <- lavaan::update(object, model = pt[,1:12]) # Leave out starting
# values, estimates, and
# ses from pt
# Update pt based on the new model
pt <- parTable(object)
}
# At this point we can be sure that all exogenous variances are fixed instead
# of being estimated. We need to still check that they are fixed to 1s
est <- lavInspect(object,"est")$psi[lvs,lvs]
if (any(diag(est) != 1)) {
message("Some of the latent variable variances are fixed to values other ",
"than 1. The model is re-estimated by scaling the latent variables",
" based on the first factor loading.")
# constrain the exogenous variances to 1
pt$ustart[varIndices] <- 1
object <- lavaan::update(object, model = pt[,1:12]) # Leave out starting
# values, estimates, and
# ses from pt
# Update pt based on the new estimates
pt <- parTable(object)
}
# At this point we can be sure that all exogenous LVs have their variances
# fixed to ones and can start constructing the matrix to be returned
ret <- lavaan::parameterEstimates(object, ci = TRUE,
level = level,
boot.ci.type = boot.ci.type)[covIndices,
c("lhs","op","rhs","est",
"ci.lower","ci.upper")]
rownames(ret) <- seq_len(nrow(ret))
# Add the chi^2 test to all correlation pairs
constrainedModels <- lapply(covIndices, function(i) {
thisPt <- pt
if (merge) {
lhs <- pt$lhs[i]
rhs <- pt$rhs[i]
# Merge the factors by assigning indicator of lhs to rhs
thisPt$lhs[thisPt$lhs == lhs & thisPt$op == "=~"] <- rhs
# Then remove all other parameters concering lhs
thisPt <- thisPt[!(thisPt$lhs == lhs | thisPt$rhs == lhs), ]
thisPt$id <- seq_len(nrow(thisPt))
} else {
# If the correlation is estimated to be greater than the cuttof, constrain
# it to the estimated alue
if (abs(pt$est[i]) > cutoff) {
thisCutoff <- pt$est[i]
} else {
thisCutoff <- ifelse(pt$est[i] < 0, - cutoff, cutoff)
}
thisPt$free[i] <- 0
thisPt$ustart[i] <- thisCutoff
}
# Update parameter numbers
j <- which(thisPt$free != 0)
thisPt$free[j] <- seq_along(j)
lavaan::update(object, model = thisPt[,1:12],
se = "none") # Disable SEs to save computational time
})
lrTests <- lapply(constrainedModels, function(constrained) {
lavaan::lavTestLRT(object,constrained)[2,] # Return the second row of the
# test
})
ret <- cbind(ret,do.call(rbind,lrTests))
# Store the baseline model
attr(ret,"baseline") <- object
attr(ret,"constrained") <- constrainedModels
ret
}
semTools/R/NET.R 0000644 0001762 0000144 00000031213 14764263766 013056 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 12 March 2025
### semTools functions for Nesting and Equivalence Testing
## -----------------
## Class and Methods
## -----------------
##' Class For the Result of Nesting and Equivalence Testing
##'
##' This class contains the results of nesting and equivalence testing among
##' multiple models
##'
##'
##' @name Net-class
##' @aliases Net-class show,Net-method summary,Net-method
##' @docType class
##'
##' @slot test Logical `matrix` indicating nesting/equivalence among models
##' @slot df The degrees of freedom of tested models
##'
##' @section Objects from the Class: Objects can be created via the
##' [net()] function.
##'
##' @param object An object of class `Net`.
##'
##' @return
##' \item{show}{`signature(object = "Net")`: prints the logical matrix of
##' test results. `NA` indicates a model did not converge.}
##' \item{summary}{`signature(object = "Net")`: prints a narrative
##' description of results. The original `object` is invisibly returned.}
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso [net()]
##'
##' @examples
##'
##' # See the example in the net function.
##'
setClass("Net", representation(test = "matrix", df = "vector"))
##' @rdname Net-class
##' @aliases show,Net-method
##' @export
setMethod("show", "Net",
function(object) {
if (length(object@test)) {
m <- as.matrix(unclass(object@test))
m[upper.tri(m, diag = TRUE)] <- ""
cat("
If cell [R, C] is TRUE, the model in row R is nested within column C.
If the models also have the same degrees of freedom, they are equivalent.
NA indicates the model in column C did not converge when fit to the
implied means and covariance matrix from the model in row R.
The hidden diagonal is TRUE because any model is equivalent to itself.
The upper triangle is hidden because for models with the same degrees
of freedom, cell [C, R] == cell [R, C]. For all models with different
degrees of freedom, the upper diagonal is all FALSE because models with
fewer degrees of freedom (i.e., more parameters) cannot be nested
within models with more degrees of freedom (i.e., fewer parameters).
\n")
print(m, quote = FALSE)
} else {
cat(data.class(object@test), "(0)\n", sep = "")
}
invisible(object)
})
##' @rdname Net-class
##' @aliases summary,Net-method
##' @export
setMethod("summary", "Net",
function(object) {
DFs <- object@df
x <- object@test
mods <- colnames(x)
## keep track of how many are printed
nPrinted <- 0L
for (R in 2:nrow(x)) {
for (C in (R - 1):1) {
## if model didn't converge (logical value is missing), go to next iteration
if (is.na(x[R, C])) next
## if the models are not nested, go to next iteration
if (!x[R, C]) next
## choose message based on whether models are equivalent or nested
if (identical(DFs[R], DFs[C])) {
rel <- "equivalent to"
nPrinted <- nPrinted + 1L
} else {
rel <- "nested within"
nPrinted <- nPrinted + 1L
}
cat("Model \"", mods[R], "\" is ", rel, " model \"", mods[C], "\"\n", sep = "")
}
}
if (nPrinted == 0L) cat('No models were determined as nested/equivalent.\n')
invisible(object)
})
## --------------------
## Constructor Function
## --------------------
##' Nesting and Equivalence Testing
##'
##' This test examines whether pairs of SEMs are nested or equivalent.
##'
##' The concept of nesting/equivalence should be the same regardless of
##' estimation method. However, the particular method of testing
##' nesting/equivalence (as described in Bentler & Satorra, 2010) employed by
##' the `net` function analyzes summary statistics (model-implied means and
##' covariance matrices, not raw data). In the case of robust methods like MLR,
##' the raw data is only utilized for the robust adjustment to SE and chi-sq,
##' and the net function only checks the unadjusted chi-sq for the purposes of
##' testing nesting/equivalence. This method also applies to models for
##' categorical data, following the procedure described by Asparouhov & Muthen
##' (2019).
##'
##'
##' @importFrom lavaan lavInspect
##'
##' @param \dots The `lavaan` objects used for test of nesting and
##' equivalence
##' @param crit The upper-bound criterion for testing the equivalence of models.
##' Models are considered nested (or equivalent) if the difference between
##' their \eqn{\chi^2} fit statistics is less than this criterion.
##'
##' @return The [Net-class] object representing the outputs for nesting
##' and equivalent testing, including a logical matrix of test results and a
##' vector of degrees of freedom for each model.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##'
##' Bentler, P. M., & Satorra, A. (2010). Testing model nesting and equivalence.
##' *Psychological Methods, 15*(2), 111--123. \doi{10.1037/a0019625}
##'
##' Asparouhov, T., & Muthen, B. (2019). Nesting and equivalence testing for
##' structural equation models. *Structural Equation Modeling, 26*(2),
##' 302--309. \doi{10.1080/10705511.2018.1513795}
##'
##' @examples
##'
##' m1 <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##'
##' m2 <- ' f1 =~ x1 + x2 + x3 + x4
##' f2 =~ x5 + x6 + x7 + x8 + x9 '
##'
##' m3 <- ' visual =~ x1 + x2 + x3
##' textual =~ eq*x4 + eq*x5 + eq*x6
##' speed =~ x7 + x8 + x9 '
##'
##' fit1 <- cfa(m1, data = HolzingerSwineford1939)
##' fit1a <- cfa(m1, data = HolzingerSwineford1939, std.lv = TRUE) # Equivalent to fit1
##' fit2 <- cfa(m2, data = HolzingerSwineford1939) # Not equivalent to or nested in fit1
##' fit3 <- cfa(m3, data = HolzingerSwineford1939) # Nested in fit1 and fit1a
##'
##' \donttest{
##' tests <- net(fit1, fit1a, fit2, fit3)
##' tests
##' summary(tests)
##' }
##'
##' @export
net <- function(..., crit = .0001) {
## put fitted objects in a list
fitList <- list(...)
## check that they are all lavaan objects
notLavaan <- !sapply(fitList, inherits, what = "lavaan")
if (any(notLavaan)) {
fitNames <- sapply(as.list(substitute(list(...)))[-1], deparse)
stop(paste("The following arguments are not fitted lavaan objects:\n",
paste(fitNames[notLavaan], collapse = "\t")))
}
## remove any that did not converge
nonConv <- !sapply(fitList, lavInspect, what = "converged")
if (all(nonConv)) {
stop('No models converged')
} else if (any(nonConv)) {
fitNames <- sapply(as.list(substitute(list(...)))[-1], deparse)
message('The following models did not converge, so they are ignored:\n',
paste(fitNames[nonConv], collapse = ",\t"))
fitList <- fitList[which(!nonConv)]
}
## check for meanstructure
meanstructure <- sapply(fitList, function(x) lavInspect(x, "options")$meanstructure)
if (!(all(meanstructure) || !any(meanstructure)))
stop('Some (but not all) fitted lavaan objects include a mean structure. ',
'Please re-fit all models with the argument meanstructure=TRUE.')
## get degrees of freedom for each model
DFs <- sapply(fitList, function(x) lavInspect(x, "fit")["df"])
## name according to named objects, with DF in parentheses
fitNames <- names(fitList)
dotNames <- sapply(as.list(substitute(list(...)))[-1], deparse)
if (any(nonConv)) dotNames <- dotNames[which(!nonConv)]
if (is.null(names(fitList))) {
fitNames <- dotNames
} else {
noName <- which(fitNames == "")
fitNames[noName] <- dotNames[noName]
}
names(fitList) <- paste(fitNames, " (df = ", DFs, ")", sep = "")
## sort list according to DFs
fitList <- fitList[order(DFs)]
fitNames <- fitNames[order(DFs)]
orderedDFs <- DFs[order(DFs)]
## create structure for sequence of tests (logical matrix), FALSE by default
nestMat <- matrix(FALSE, length(fitList), length(fitList),
dimnames = list(names(fitList), fitNames))
diag(nestMat) <- TRUE # every model is equivalent with itself
## Loop through sorted models in sequence of most to least restricted model
for (R in 2:nrow(nestMat)) {
for (C in (R - 1):1) {
## test for nesting/equivalence
nestMat[R, C] <- x.within.y(x = fitList[[R]], y = fitList[[C]], crit = crit)
## if models are equivalent, set above-diagonal value to TRUE
if (identical(orderedDFs[R], orderedDFs[C])) nestMat[C, R] <- nestMat[R, C]
if (C == 1) next # to prevent the next 2 tests from returning an error
## if model didn't converge (logical value is missing), go to next iteration
if (is.na(nestMat[R, C]) | is.na(nestMat[R - 1, C - 1])) next
## check whether nesting is implied, to skip unnecessary tests
if (nestMat[R, C] & nestMat[R - 1, C - 1]) {
nestMat[R, C - 1] <- TRUE
next
}
}
}
out <- new("Net", test = nestMat, df = orderedDFs)
out
}
## --------------------------------------------------------------------
## Hidden Function to test whether model "x" is nested within model "y"
## --------------------------------------------------------------------
#' @importFrom lavaan lavInspect lavNames parTable
x.within.y <- function(x, y, crit = .0001) {
if (!lavInspect(x, "converged")) return(NA)
if (!lavInspect(y, "converged")) return(NA)
## not possible for clustered data
if (length(lavInspect(x, "cluster")) || length(lavInspect(y, "cluster")))
stop('The net() function does not work with models for clustered data.')
## not currently implemented unless all variables are considered random
exoX <- lavInspect(x, "options")$fixed.x & length(lavNames(x, "ov.x"))
exoY <- lavInspect(y, "options")$fixed.x & length(lavNames(y, "ov.x"))
if (exoX | exoY) {
stop(c("The net() function does not work with exogenous variables.\n",
"Fit the model again with 'fixed.x = FALSE'"))
}
## variable names
Xnames <- sort(lavNames(x))
Ynames <- sort(lavNames(y))
if (!identical(Xnames, Ynames))
stop("Models do not contain the same variables")
## check that the analyzed data matches
xData <- sort(unlist(lavInspect(x, "sampstat")))
yData <- sort(unlist(lavInspect(y, "sampstat")))
names(xData) <- NULL
names(yData) <- NULL
if (!isTRUE(all.equal(xData, yData, tolerance = crit)))
stop("Sample statistics differ. Models must apply to the same data")
#FIXME: this method requires raw data
# xData <- lavInspect(x, "data")
# if (is.list(xData)) xData <- do.call(rbind, xData)
# xData <- xData[ , order(Xnames)]
# yData <- lavInspect(y, "data")
# if (is.list(yData)) yData <- do.call(rbind, yData)
# yData <- yData[ , order(Ynames)]
# if (!identical(xData, yData)) stop("Models must apply to the same data")
## check degrees of freedom support nesting structure
if (lavInspect(x, "fit")["df"] < lavInspect(y, "fit")["df"])
stop("x cannot be nested within y because y is more restricted than x")
## check sample sizes
N <- lavInspect(x, "nobs")
if (!all(N == lavInspect(y, "nobs"))) {
stop("Sample sizes differ. Models must apply to the same data")
}
## model-implied moments
Sigma <- lavInspect(x, "cov.ov")
nBlocks <- if (is.list(Sigma)) length(Sigma) else 1L
## mean structure?
Mu <- lavInspect(x, "mean.ov")
if (nBlocks == 1L) {
if (!length(Mu)) Mu <- NULL
} else {
if (all(sapply(Mu, length) == 0)) Mu <- NULL
}
## thresholds?
Thr <- lavInspect(x, "thresholds")
if (nBlocks == 1L) {
if (!length(Thr)) Thr <- NULL
} else {
if (all(sapply(Thr, length) == 0)) Thr <- NULL
}
if (!is.null(Thr)) attr(Thr, "th.idx") <- lavInspect(x, "th.idx")
## If DWLS, extract WLS.V and NACOV
estimator <- lavInspect(x, "options")$estimator
if (estimator == "DWLS") {
WLS.V <- lavInspect(x, "WLS.V")
NACOV <- lavInspect(x, "gamma")
#TODO: check against same output from y
} else {
WLS.V <- NULL
NACOV <- NULL
}
## fit model and check that chi-squared < crit
PT <- parTable(y)
PT$start <- PT$est
PT$est <- PT$se <- NULL
CALL <- lavInspect(y, "call")
CALL$model <- PT
CALL$data <- NULL
CALL$sample.cov <- Sigma
CALL$sample.mean <- Mu
CALL$sample.nobs <- N
CALL$sample.th <- Thr
CALL$estimator <- estimator
CALL$WLS.V <- WLS.V
CALL$NACOV <- NACOV
CALL$se <- "none" # to save time
CALL$test <- "standard"
CALL$group.label <- NULL # unnecessary, and causes an error when is.integer()
suppressWarnings(try( newFit <- eval(as.call(CALL)) ))
if (!lavInspect(newFit, "converged")) return(NA) else {
result <- lavInspect(newFit, "fit")[["chisq"]] < crit
if (lavInspect(x, "fit")["df"] ==
lavInspect(y, "fit")["df"]) return(c(Equivalent = result))
}
c(Nested = result)
}
semTools/R/splitSample.R 0000644 0001762 0000144 00000013304 14632016456 014710 0 ustar ligges users ### Corbin Quick
### Last updated: 4 April 2017
#' Randomly Split a Data Set into Halves
#'
#' This function randomly splits a data set into two halves, and saves the
#' resulting data sets to the same folder as the original.
#'
#' This function randomly orders the rows of a data set, divides the data set
#' into two halves, and saves the halves to the same folder as the original
#' data set, preserving the original formatting. Data set type (*.csv or *.dat)
#' and formatting (headers) are automatically detected, and output data sets
#' will preserve input type and formatting unless specified otherwise. Input
#' can be in the form of a file path (*.dat or *.csv), or an R object (matrix or
#' dataframe). If input is an R object and path is default, output data sets
#' will be returned as a list object.
#'
#'
#' @importFrom stats runif
#'
#' @param dataset The original data set to be divided. Can be a file path to a
#' *.csv or *.dat file (headers will automatically be detected) or an R object
#' (matrix or dataframe). (Windows users: file path must be specified using
#' FORWARD SLASHES (`/`) ONLY.)
#' @param path File path to folder for output data sets. NOT REQUIRED if
#' dataset is a filename. Specify ONLY if dataset is an R object, or desired
#' output folder is not that of original data set. If path is specified as
#' "object", output data sets will be returned as a list, and not saved to hard
#' drive.
#' @param div Number of output data sets. NOT REQUIRED if default, 2 halves.
#' @param type Output file format ("dat" or "csv"). NOT REQUIRED unless desired
#' output formatting differs from that of input, or dataset is an R object and
#' csv formatting is desired.
#' @param name Output file name. NOT REQUIRED unless desired output name
#' differs from that of input, or input dataset is an R object. (If input is an
#' R object and name is not specified, name will be "splitSample".)
#' @return If `path = "object"`, `list` of output data sets.
#' Otherwise, output will saved to hard drive in the same format as input.
#' @author Corbin Quick (University of Michigan; \email{corbinq@@umich.edu})
#' @examples
#'
#' #### Input is .dat file
#' #splitSample("C:/Users/Default/Desktop/MYDATA.dat")
#' #### Output saved to "C:/Users/Default/Desktop/" in .dat format
#' #### Names are "MYDATA_s1.dat" and "MYDATA_s2.dat"
#'
#' #### Input is R object
#' ## Split C02 dataset from the datasets package
#' library(datasets)
#' splitMyData <- splitSample(CO2, path = "object")
#' summary(splitMyData[[1]])
#' summary(splitMyData[[2]])
#' #### Output object splitMyData becomes list of output data sets
#'
#' #### Input is .dat file in "C:/" folder
#' #splitSample("C:/testdata.dat", path = "C:/Users/Default/Desktop/", type = "csv")
#' #### Output saved to "C:/Users/Default/Desktop/" in *.csv format
#' #### Names are "testdata_s1.csv" and "testdata_s2.csv"
#'
#' #### Input is R object
#' #splitSample(myData, path = "C:/Users/Default/Desktop/", name = "splitdata")
#' #### Output saved to "C:/Users/Default/Desktop/" in *.dat format
#' #### Names are "splitdata_s1.dat" and "splitdata_s2.dat"
#'
#' @export
splitSample <- function(dataset, path = "default", div = 2,
type = "default", name = "splitSample") {
type1 <- type
hea = FALSE
file <- dataset
if (is.character(file)) {
temp <- strsplit(file, "/", fixed = TRUE)
if (path == "default") {
path <- paste(temp[[1]][1:(length(temp[[1]]) - 1)], "/",
sep = "", collapse = "")
}
fileN <- temp[[1]][length(temp[[1]])]
temp <- strsplit(fileN, ".", fixed = TRUE)
type <- temp[[1]][2]
name <- temp[[1]][1]
if (type == "dat") {
if (is.numeric(as.matrix(utils::read.table(file, nrows = 1))) == FALSE) {
data <- as.matrix(utils::read.table(file, header = TRUE))
hea = TRUE
} else {
data <- as.matrix(utils::read.table(file))
}
}
if (type == "csv") {
if (is.numeric(as.matrix(utils::read.table(file, nrows = 1))) == FALSE) {
data <- as.matrix(utils::read.csv(file, header = TRUE))
hea = TRUE
} else {
data <- as.matrix(utils::read.csv(file))
}
}
} else {
if (is.matrix(file) | is.data.frame(file)) {
data <- as.matrix(file)
} else {
stop("Provide data in *.dat or *.csv format")
}
}
if (type1 != "default") {
type <- type1
}
if (is.character(colnames(data))) {
hea = TRUE
}
random <- runif(nrow(data), 1, nrow(data))
data <- cbind(random, data)
data <- data[order(random), ]
data <- data[, 2:ncol(data)]
size <- split((1:nrow(data)), cut((1:nrow(data)), div, labels = FALSE))
size <- as.matrix(as.data.frame(lapply(size, length)))
dataL <- list()
dataL[[1]] <- data[1:size[1, 1], ]
for (i in 2:div) {
size[1, i] <- size[1, (i - 1)] + size[1, i]
dataL[[i]] <- data[(size[1, (i - 1)] + 1):size[1, i], ]
}
if (path == "default") {
return(dataL)
} else {
if (path == "object") {
return(dataL)
} else {
for (i in 1:div) {
if (type == "dat") {
utils::write.table(dataL[[i]],
paste(path, name, "_s", i, ".dat", sep = ""),
sep = " ", row.names = FALSE, col.names = hea)
}
if (type == "csv") {
utils::write.table(dataL[[i]],
paste(path, name, "_s", i, ".csv", sep = ""),
sep = ",", row.names = FALSE, col.names = hea)
}
if (type == "default") {
utils::write.table(dataL[[i]],
paste(path, name, "_s", i, ".dat", sep = ""),
sep = " ", row.names = FALSE, col.names = hea)
}
}
}
}
}
semTools/R/aa_semTools-deprecated.R 0000644 0001762 0000144 00000001011 14632016456 016747 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated 25 August 2018
### automatically create documentation for "deprecated" help page
#' @title Deprecated functions in package \pkg{semTools}.
#' @description The functions listed below are deprecated and will be defunct
#' in the near future. When possible, alternative functions with similar
#' functionality are also mentioned. Help pages for deprecated functions are
#' available at `help("semTools-deprecated")`.
#' @name semTools-deprecated
#' @keywords internal
NULL
semTools/R/reliability.R 0000644 0001762 0000144 00000424107 15144013656 014731 0 ustar ligges users ### Terrence D. Jorgensen
### - omegaCat() and deprecated functionality: Sunthud Pornprasertmanit
### Last updated: 14 February 2026
## -----------------------------
## AVE()
## average variance extracted
## (not a reliability index)
## -----------------------------
##' Calculate average variance extracted
##'
##' Calculate average variance extracted (AVE) per factor from `lavaan` object
##'
##' The average variance extracted (AVE) can be calculated by
##'
##' \deqn{ AVE = \frac{\bold{1}^\prime
##' \textrm{diag}\left(\Lambda\Psi\Lambda^\prime\right)\bold{1}}{\bold{1}^\prime
##' \textrm{diag}\left(\hat{\Sigma}\right) \bold{1}}, }
##'
##' Note that this formula is modified from Fornell & Larcker (1981) in the case
##' that factor variances are not 1. The proposed formula from Fornell & Larcker
##' (1981) assumes that the factor variances are 1. Note that AVE will not be
##' provided for factors consisting of items with dual loadings. AVE is the
##' property of items but not the property of factors. AVE is calculated with
##' polychoric correlations when ordinal indicators are used.
##'
##' @importFrom lavaan lavInspect
##' @importFrom methods getMethod
##'
##' @param object A [lavaan::lavaan-class] or [lavaan.mi::lavaan.mi-class] object,
##' expected to contain only exogenous common factors (i.e., a CFA model).
##' Cross-loadings are not allowed and will result in `NA` for any factor with
##' indicator(s) that cross-load.
##' @param obs.var `logical` indicating whether to compute AVE using
##' observed variances in the denominator. Setting `FALSE` triggers
##' using model-implied variances in the denominator.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##' @param omit.factors `character` vector naming any common factors
##' modeled in `object` whose indicators' AVE is not of interest.
##' @param dropSingle `logical` indicating whether to exclude factors
##' defined by a single indicator from the returned results. If `TRUE`
##' (default), single indicators will still be included in the `total`
##' column when `return.total = TRUE`.
##' @param return.df `logical` indicating whether to return reliability
##' coefficients in a `data.frame` (one row per group/level), which is
##' possible when every model block includes the same factors (after excluding
##' those in `omit.factors` and applying `dropSingle`).
##'
##' @return `numeric` vector of average variance extracted from indicators
##' per factor. For models with multiple "blocks" (any combination of groups
##' and levels), vectors may be returned as columns in a `data.frame`
##' with additional columns indicating the group/level (see `return.df=`
##' argument description for caveat).
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models
##' with unobservable variables and measurement errors. *Journal of
##' Marketing Research, 18*(1), 39--50. \doi{10.2307/3151312}
##'
##' @seealso [compRelSEM()] for composite reliability estimates
##'
##' @examples
##' data(HolzingerSwineford1939)
##' HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
##' HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
##' names(HSbinary) <- c("y7","y8","y9")
##' HS <- cbind(HolzingerSwineford1939, HSbinary)
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ y7 + y8 + y9 '
##'
##' fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
##'
##' ## works for factors with exclusively continuous OR categorical indicators
##' AVE(fit) # uses observed (or unconstrained polychoric/polyserial) by default
##' AVE(fit, obs.var = FALSE)
##'
##'
##' ## works for multigroup models and for multilevel models (and both)
##' data(Demo.twolevel)
##' ## assign clusters to arbitrary groups
##' Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
##' model2 <- ' group: type1
##' level: within
##' fac =~ y1 + L2*y2 + L3*y3
##' level: between
##' fac =~ y1 + L2*y2 + L3*y3
##'
##' group: type2
##' level: within
##' fac =~ y1 + L2*y2 + L3*y3
##' level: between
##' fac =~ y1 + L2*y2 + L3*y3
##' '
##' fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
##' AVE(fit2)
##'
##'@export
AVE <- function(object, obs.var = TRUE, omit.imps = c("no.conv","no.se"),
omit.factors = character(0), dropSingle = TRUE,
return.df = TRUE) {
## numbers of blocks
ngroups <- lavInspect(object, "ngroups")
nLevels <- lavInspect(object, "nlevels")
nblocks <- ngroups*nLevels #FIXME: always true?
## labels for groups
if (ngroups > 1L) {
group.label <- lavInspect(object, "group.label")
blk.g.lab <- if (!length(group.label)) paste0("g", 1:ngroups) else group.label
} else {
group.label <- blk.g.lab <- NULL
}
## labels for clusters
if (nLevels > 1L) {
#FIXME? lavInspect(object, "level.label") is always ==
# c("within", lavInspect(object, "cluster"))
PT <- parTable(object)
clus.label <- unique(PT$level)
clus.label <- clus.label[which(clus.label != "")]
clus.label <- clus.label[which(clus.label != 0)]
blk.clus.lab <- if (is.numeric(clus.label)) {
c("within", lavInspect(object, "cluster"))
} else clus.label
} else clus.label <- blk.clus.lab <- NULL
## labels for blocks
if (nblocks > 1L) {
block.label <- paste(rep(blk.g.lab, each = nLevels), blk.clus.lab,
sep = if (ngroups > 1L && nLevels > 1L) "_" else "")
} else block.label <- NULL
## check for categorical
anyCategorical <- lavInspect(object, "categorical")
if (inherits(object, "lavaan")) {
PHI <- lavInspect(object, "cov.lv") # common-factor variance
EST <- lavInspect(object, "est") # to extract loadings
SIGMA <- lavInspect(object, # total variance
ifelse(obs.var, "sampstat", "fitted"))
if (nblocks == 1L) {
PHI <- list(PHI)
LAMBDA <- list(EST$lambda)
SIGMA <- list(SIGMA$cov)
} else {
LAMBDA <- sapply(EST, "[[", i = "lambda", simplify = FALSE)
SIGMA <- sapply(SIGMA, "[[", i = "cov", simplify = FALSE)
}
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
## common-factor variance
phiList <- object@phiList[useImps]
if (nblocks == 1L) for (i in 1:m) phiList[[i]] <- list(phiList[[i]])
PHI <- list()
for (b in 1:nblocks) {
PHI[[b]] <- Reduce("+", lapply(phiList, "[[", i = b) ) / m
}
## loadings
LAMBDA <- vector("list", nblocks)
if (nblocks == 1L) {
lamList <- lapply(object@coefList[useImps], "[[", i = "lambda")
LAMBDA[[1]] <- Reduce("+", lamList) / length(lamList)
} else {
for (b in 1:nblocks) {
lamList <- lapply(object@coefList[useImps], function(i) i[[b]]$lambda)
LAMBDA[[b]] <- Reduce("+", lamList) / length(lamList)
}
}
## total variance
if (obs.var) {
SIGMA <- vector("list", nblocks)
## loop over blocks to pool saturated-model (observed) matrices
for (b in 1:nblocks) {
covList <- lapply(object@h1List[useImps], function(i) i$implied$cov[[b]])
SIGMA[[b]] <- Reduce("+", covList) / m
rownames(SIGMA[[b]]) <- colnames(SIGMA[[b]]) <- lavNames(object, block = b)
}
} else {
## pooled model-implied matrices
if (nblocks == 1L) {
SIGMA <- getMethod("fitted", class(object))(object)["cov"] # retain list format
} else {
SIGMA <- sapply(getMethod("fitted", class(object))(object),
"[[", "cov", simplify = FALSE)
}
}
} # end lavaan vs. lavaan.mi conditional
## scale polychoric/polyserial to modeled LRV scale
if (anyCategorical) {
SDs <- sapply(getScales(object, omit.imps = omit.imps),
FUN = function(x) diag(1 / as.numeric(x)),
simplify = FALSE)
for (b in 1:nblocks) {
dimnames(SDs[[b]]) <- dimnames(SIGMA[[b]])
SIGMA[[b]] <- SDs[[b]] %*% SIGMA[[b]] %*% SDs[[b]]
}
}
avevar <- list()
for (b in 1:nblocks) {
## extract factor and indicator names
LY <- LAMBDA[[b]]
allIndNames <- rownames(LY)
allFacNames <- colnames(LY)
myFacNames <- setdiff(allFacNames, omit.factors)
if (dropSingle) {
multInd <- sapply(myFacNames, function(fn) sum(LY[,fn] != 0) > 1L)
myFacNames <- myFacNames[multInd]
}
subLY <- LY[ , myFacNames, drop = FALSE]
myIndNames <- rownames(subLY)[apply(subLY, 1L, function(x) any(x != 0))]
## check for cross-loadings
Xload <- apply(subLY, 1L, function(x) sum(round(x, 5) != 0) > 1L)
avevar[[b]] <- setNames(rep(NA, length(myFacNames)), nm = myFacNames)
## loop over factors
for (fn in myFacNames) {
idx <- which(subLY[,fn] != 0)
if (any(Xload[idx])) next # cross-loading violates AVE definition
commonVar <- sum(subLY[idx, fn]^2) * PHI[[b]][fn, fn]
avevar[[b]][fn] <- commonVar / sum(diag(SIGMA[[b]])[ myIndNames[idx] ])
}
}
## drop list structure?
if (nblocks == 1L) {
avevar <- avevar[[1]]
class(avevar) <- c("lavaan.vector","numeric")
return(avevar)
} else {
facList <- lapply(avevar, names)
sameNames <- all(sapply(2:nblocks, function(i) {
isTRUE(all.equal(facList[[1]], facList[[i]]))
} ))
if (!(sameNames && return.df)) {
## can't simplify, return as a list
for (i in seq_along(avevar)) class(avevar[[i]]) <- c("lavaan.vector","numeric")
names(avevar) <- block.label
return(avevar)
}
}
## concatenate each factor's AVE across blocks
facRel <- sapply(facList[[1]], simplify = FALSE, FUN = function(nn) {
sapply(avevar, "[[", i = nn, USE.NAMES = FALSE) # extract AVE for factor i
})
if (ngroups > 1L && nLevels > 1L) {
out <- data.frame(group = rep(blk.g.lab, each = nLevels),
level = rep(blk.clus.lab, times = ngroups),
facRel)
} else if (ngroups > 1L) {
out <- data.frame(group = blk.g.lab, facRel)
} else if (nLevels > 1L) {
out <- data.frame(level = blk.clus.lab, facRel)
}
class(out) <- c("lavaan.data.frame","data.frame")
out
}
## ------------
## compRelSEM()
## ------------
##' Composite Reliability using SEM
##'
##' Calculate composite reliability from estimated factor-model parameters
##'
##' Several coefficients for factor-analysis reliability have been termed
##' "omega", which Cho (2021) argues is a misleading misnomer and argues for
##' using \eqn{\rho} to represent them all, differentiated by descriptive
##' subscripts. In our package, we strive to provide unlabeled coefficients,
##' leaving it to the user to decide on a label in their report. But we do
##' use the symbols \eqn{\alpha} and \eqn{\omega} in the formulas below in order
##' to distinguish coefficients that do (not) assume essential tau-equivalence.
##'
##' Bentler (1968) first introduced factor-analysis reliability for a
##' unidimensional factor model with congeneric indicators, labeling the
##' coefficients \eqn{\alpha}. McDonald (1999) later referred to this
##' *and other reliability coefficients*, first as \eqn{\theta} (in 1970),
##' then as \eqn{\omega}, which is a source of confusion when reporting
##' coefficients (Cho, 2021). Coefficients based on factor models were later
##' generalized to account for multidimenisionality (possibly with
##' cross-loadings) and correlated errors. The general \eqn{\omega} formula
##' implemented in this function is:
##'
##' \deqn{\omega=\frac{\bold{w}^{\prime} \Lambda \Phi \Lambda^{\prime} \bold{w}
##' }{ \bold{w}^{\prime} \hat{\Sigma} \bold{w} }, }
##'
##' where \eqn{\hat{\Sigma}} can be the model-implied covariance matrix from
##' either the saturated model (i.e., the "observed" covariance matrix, used by
##' default) or from the hypothesized CFA model, controlled by the `obs.var=`
##' argument. All elements of matrices in the numerator and denominator are
##' effectively summed by the multiplication of the outer terms \eqn{\bold{w}},
##' a \eqn{k}-dimensional vector of composite weights typically consisting of
##' \eqn{\bold{1}}s, unless otherwise specified with the `W=` argument), and
##' \eqn{k} is the number of variables in the composite. Reliability of subscale
##' composites (or simply for separate factors in a joint CFA) can be calculated
##' by setting omitted-indicator weights to 0. For unidimensional constructs
##' with simple structure, the equation above is often simplified to a scalar
##' representation (e.g., McDonald, 1999, Eq. 6.20b):
##'
##' \deqn{ \omega = \frac{ \left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
##' Var\left( \psi \right) }{ \left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
##' Var\left( \psi \right) + \sum^{k}_{i = 1} \theta_{ii} }, }
##'
##' Note that all coefficients are calculated from *total* factor variances:
##' `lavInspect(object, "cov.lv")`, which assumes the fitted `object=` is a CFA,
##' not a full SEM with latent regression slopes. If there is a Beta matrix, it
##' should only contain higher-order factor loadings (see details below).
##'
##'
##' When the fitted CFA imposes constraints consistent with (essential)
##' tau-equivalence, \eqn{\omega} is equivalent to coefficient \eqn{\alpha}
##' (Cronbach, 1951):
##'
##' \deqn{ \alpha = \frac{k}{k - 1}\left[ 1 -
##' \frac{ \textrm{tr} \left( \hat{\Sigma} \right)
##' }{ \bold{1}^{\prime} \hat{\Sigma} \bold{1} }
##' \right],}
##'
##' where \eqn{\textrm{tr} \left( . \right)} is the trace operation (i.e., the
##' sum of diagonal elements). Setting `tau.eq=TRUE` triggers the application of
##' this formula (rather than \eqn{\omega} above) to the model-implied or
##' observed covariance matrix (again controlled by the `obs.var=` argument).
##'
##'
##' **Higher-Order Factors**:
##'
##' For higher-order constructs with latent indicators, only \eqn{\omega} is
##' available because \eqn{\alpha} was not derived from CFA parameters (although
##' it can be expressed in a particular restricted CFA specification).
##'
##' The reliability of a composite that represents a higher-order construct
##' requires partitioning the model-implied factor covariance matrix \eqn{\Phi}
##' in order to isolate the common-factor variance associated only with the
##' higher-order factor. Using a second-order factor model, the model-implied
##' covariance matrix of observed indicators \eqn{\hat{\Sigma}} can be
##' partitioned into 3 sources:
##' \enumerate{
##' \item the second-order common-factor (co)variance:
##' \eqn{\Lambda \bold{B} \Phi_2 \bold{B}^{\prime} \Lambda^{\prime}}
##' \item the residual variance of the first-order common factors (i.e., not
##' accounted for by the second-order factor):
##' \eqn{\Lambda \Psi_{u} \Lambda^{\prime}}
##' \item the measurement error of observed indicators: \eqn{\Theta}
##' }
##'
##' where \eqn{\Lambda} contains first-order factor loadings, \eqn{\bold{B}}
##' contains second-order factor loadings, \eqn{\Phi_2} is the model-implied
##' covariance matrix of the second-order factor(s), and \eqn{\Psi_{u}} is the
##' covariance matrix of first-order factor disturbances. In practice, we can
##' use the full \eqn{\bold{B}} matrix and full model-implied \eqn{\Phi} matrix
##' (i.e., including all latent factors) because the zeros in \eqn{\bold{B}}
##' will cancel out unwanted components of \eqn{\Phi}. Thus, we can calculate
##' the proportion of variance of a composite score that is attributable to the
##' second-order factor:
##'
##' \deqn{\omega=\frac{\bold{w}^{\prime} \Lambda \bold{B} \Phi \bold{B}^{\prime}
##' \Lambda^{\prime} \bold{w} }{ \bold{w}^{\prime} \hat{\Sigma} \bold{w}}, }
##'
##' where \eqn{\bold{w}}, \eqn{\hat{\Sigma}}, and \eqn{k} are defined as above.
##' **Note** that if a higher-order factor also has observed indicators, it is
##' necessary to model the observed indicators as single-indicator lower-order
##' constructs, so that all of the higher-order factor indicators are latent
##' (with loadings in the Beta matrix, not Lambda); otherwise, higher-order
##' factor variance in the observed indicator is not captured in the numerator.
##'
##'
##' **Bifactor or Multitrait--Multimethod (MTMM) Models**:
##'
##' These multidimensional models partition sources of common variance that are
##' due to the factor of interest (e.g., a trait) as well as non-target factors
##' (e.g., "method factors", such as item wording or type of respondent).
##' The latter can be considered as systematic (i.e., non-random) sources of
##' error, to be excluded from the numerator of a reliability coefficient,
##' yielding so-called "hierarchical omega" (\eqn{\omega_\textrm{H}}). On the
##' other hand, non-target variance that can be expected in repeated measurement
##' meets the classical test theory definition of reliability. Including method
##' factors in the numerator yields so-called "omega total"
##' (\eqn{\omega_\textrm{T}}), which is the default approach in `compRelSEM()`
##' because it is consistent with the classical test theory definition of
##' reliability. However, users can obtain \eqn{\omega_\textrm{H}} for a
##' composite by using the `true=` argument to specify any factor(s) to be
##' treated as representing true scores. The same approach can be taken to
##' obtain the proportion of a (sub)scale composite's variance due to method
##' factors (by listing those in `true=`), if that is of interest.
##'
##' **Categorical Indicators**:
##'
##' When all indicators (per composite) are ordinal, a CFA can be fitted that
##' includes a threshold model (sometimes called Item Factor Analysis: IFA),
##' which assumes a normally distributed latent response underlies each observed
##' ordinal response. Despite making this assumption, a composite of ordinal
##' items can only be calculated by assigning numerical values to the ordinal
##' categories, so that the pseudo-numerical variables can be summed into a
##' composite variable that is more approximately continuous than its items.
##'
##' Applying the formulas above to IFA parameters provides the
##' *hypothetical* reliability of a composite of latent responses: a composite
##' which cannot be calculated in practice. Nonetheless, this hypothetical
##' reliability can be interpreted as an estimate of what reliability *could* be
##' if a more approximately continuous response scale were used (e.g., with
##' sufficiently many response categories that the standardized solutions are
##' equivalent between a fitted IFA and a fitted CFA that treats the ordinal
##' responses as numeric; Chalmers, 2018). This can be requested by setting
##' `ord.scale=FALSE`, in which case \eqn{\hat\Sigma} in the formulas above
##' is a *polychoric* correlation matrix.
##' When `ord.scale=FALSE` and `tau.eq=TRUE`, this results in what Zumbo et al.
##' (2007) termed "ordinal \eqn{\alpha}" (see criticisms by Chalmers, 2018, and
##' and a rejoinder by Zumbo & Kroc, 2019).
##'
##' Alternatively, Green and Yang (2009, Eq. 21) derived a method to calculate
##' model-based reliability (\eqn{\omega}) from IFA parameters (i.e.,
##' incorporating the latent-response assumption) but that applies to the actual
##' (i.e., ordinal) observed response scale (the default: `ord.scale=TRUE`).
##' Lu et al. (2020) showed how to incorporate unequal weights into Green and
##' Yang's (2009) formula, so `W=` can be used to estimate the (maximal)
##' reliability of a weighted composite of ordinal variables.
##' However, combining `ord.scale=TRUE` with `tau.eq=TRUE` is not available.
##' For \eqn{\alpha} to be interpretable on the observed ordinal scale,
##' users must choose whether to (a) release the latent-response assumption, by
##' fitting a CFA without a threshold model, or (b) fit an IFA model with
##' constraints consistent with the assumption of (essential) tau-equivalence
##' (i.e., equal factor loadings).
##'
##' No method analogous to Green and Yang (2009, Eq. 21) has yet been proposed
##' to calculate reliability with a mixture of categorical and continuous
##' indicators, so any such composite is skipped with a warning.
##'
##'
##' **Multilevel Measurement Models**:
##'
##' How to define reliability coefficients for scales employed in nested designs
##' is an ongoing topic of methodological development, with some ongoing
##' controversies about best practice when the target of measurement is the
##' "cluster" or between-level (i.e., Level 2 in a 2-level design).
##' Geldhof et al. (2014) proposed applying the standard formulas above to each
##' level's CFA parameters and/or (model-implied) covariance matrix, whereas
##' Lai (2021) proposed different formulas that account for all sources of
##' variance in composites of observed variables.
##'
##' There is no controversy about how to define a within-level reliability,
##' coefficient, which can be interpreted as the reliability of a composite
##' calculated by first centering each indicator around its cluster mean, then
##' calculating the composite from the cluster-mean-centered items. Equivalently
##' (i.e., the same formula), this can be interpreted as the *hypothetical*
##' reliability of a composite of the items' latent Level-1 components. This
##' coefficient can be requested with [lavaan::model.syntax] (to pass to the
##' `W=` argument) that specifies a composite in a Level-1 "block", which not
##' have the same name as any composite in the Level-2 block. If users do not
##' use `W=` (i.e., calculate a reliability index per modeled common factor),
##' then this can be accomplished by using unique factor names across levels.
##'
##' This contrasts with reliability indices for between-level composites:
##' The reliability of a *hypothetical* composite of items' latent between-level
##' components (using formulas proposed by Geldhof et al., 2014) is **not**
##' equivalent to the coefficient for a composite of items' observed cluster
##' means, using generalizations of formulas proposed by Lai (2021):
##'
##' \deqn{ \omega^\textrm{B} =
##' \frac{\bold{w}^{\prime} \Lambda^\textrm{B} \Phi^\textrm{B} \Lambda^{\textrm{B}\prime} \bold{w}
##' }{ \bold{w}^{\prime} \hat{\Sigma}^\textrm{B} \bold{w} +
##' \frac{1}{\tilde{n}_\textrm{clus}} \left(
##' \bold{w}^{\prime} \hat{\Sigma}^\textrm{W} \bold{w} \right) }, }
##'
##' \deqn{ \alpha^\textrm{B} = \frac{2k}{k - 1}\left[
##' \frac{ \sum^{k}_{i=2} \sum^{i-1}_{j=1} \hat\sigma^\textrm{B}_{ij}
##' }{ \bold{1}^{\prime} \hat\Sigma^\textrm{B} \bold{1} +
##' \frac{1}{\tilde{n}_\textrm{clus}} \left(
##' \bold{1}^{\prime} \hat\Sigma^\textrm{W} \bold{1} \right) }
##' \right],}
##'
##' where \eqn{\tilde{n}_\textrm{clus}} is the harmonic-mean cluster size, and
##' superscripts B and W indicate between- and within-level parameters.
##' Obtaining these estimates of composite reliability requires fitting a
##' 2-level CFA that provides the same factor structure and factor names in
##' the models at both levels (following the advice of Jak et al., 2021), as
##' well as the same composite name in both levels/blocks of syntax passed to
##' `W=` (if used). Furthermore, the between-level composite name must be
##' passed to the `shared=` argument; otherwise, the same factor/composite name
##' across levels will yield Lai's (2021) coefficient for a configural construct
##' (see **Examples**):
##'
##' \deqn{ \omega^\textrm{2L} =
##' \frac{\bold{w}^{\prime} \left(
##' \Lambda^\textrm{W} \Phi^\textrm{W} \Lambda^{\textrm{W}\prime} +
##' \Lambda^\textrm{B} \Phi^\textrm{B} \Lambda^{\textrm{B}\prime}
##' \right) \bold{w}
##' }{ \bold{w}^{\prime} \hat\Sigma^\textrm{B} \bold{w} +
##' \bold{w}^{\prime} \hat\Sigma^\textrm{W} \bold{w} }, }
##'
##' \deqn{ \alpha^\textrm{2L} = \frac{2k}{k - 1}\left[
##' \frac{ \sum^{k}_{i=2} \sum^{i-1}_{j=1} \left( \hat\sigma^\textrm{W}_{ij} +
##' \hat\sigma^\textrm{B}_{ij} \right)
##' }{ \bold{1}^{\prime} \hat\Sigma^\textrm{B} \bold{1} +
##' \bold{1}^{\prime} \hat\Sigma^\textrm{W} \bold{1} }
##' \right],}
##'
##' This can be interpreted as the scale-reliability coefficient ignoring the
##' nested design, as both the common-factor variance of the Level-1 factor
##' *and* of its Level-2 cluster means are treated as true-score variance.
##'
##' **Note** that Lai's (2021) between-level reliability coefficients for a
##' `shared` construct quantify generalizability across both indicators and
##' raters (i.e., subjects rating their cluster's construct).
##' Lüdtke et al. (2011) refer to these as measurement error and sampling error,
##' respectively. From this perspective (and following from generalizability
##' theory), an IRR coefficient can also be calculated:
##'
##' \deqn{ \textrm{IRR} =
##' \frac{\bold{w}^{\prime} \left( \hat{\Sigma}^\textrm{B} \right) \bold{w}
##' }{ \bold{w}^{\prime} \hat\Sigma^\textrm{B} \bold{w} +
##' \bold{w}^{\prime} \hat\Sigma^\textrm{W} \bold{w} }, }
##'
##' which quantifies generalizability across rater/sampling-error only, and can
##' be returned for any `shared=` construct's composite by setting `add.IRR=TRUE`.
##'
##'
##'
##'
##' @importFrom lavaan lavInspect lavNames parTable
##' @importFrom methods getMethod
##'
##' @param object A [lavaan::lavaan-class] or [lavaan.mi::lavaan.mi-class] object,
##' expected to contain only exogenous common factors (i.e., a CFA model).
##' @param W Composite weights applied to observed variables prior to summing.
##' By default (`NULL`), unit-weights are applied to all indicators per factor
##' (as well as all modeled indicators when `return.total=TRUE`), which is
##' equivalent to specifying equal weights of *any* value to each indicator.
##' Weights can be a `character` string specifying any number of composites
##' using [lavaan::model.syntax()], in the form `COMPOSITE <~ weight*indicator`
##' (any indicator without a numeric `weight` is given a unit weight = 1).
##' See **Details** and **Examples** about complicated CFAs (e.g., multilevel,
##' higher-order, or bifactor).
##' @param return.total For multidimensional CFAs, this `logical` value
##' indicates whether to return a final index for the reliability of a
##' composite of all modeled indicators (labeled `.TOTAL.`). This is redundant
##' whenever there is already a common factor indicated by all items (e.g.,
##' the general factor in a bifactor model). This argument is ignored when
##' using the `W=` argument to specify composites (optionally with weights).
##' Setting a negative value (e.g., `-1`) returns **only** the `.TOTAL.`
##' composite reliability (i.e., excluding coefficients per factor).
##' @param obs.var `logical` indicating whether to compute reliability
##' using observed (co)variances to compute the denominator. Setting `FALSE`
##' triggers using model-implied (co)variances to compute the denominator.
##' @param tau.eq `logical` indicating whether to assume (essential)
##' tau-equivalence by calculating coefficient \eqn{\alpha} (on observed or
##' model-implied (co)variances, depending on `obs.var=`).
##' Triggers error if requested in combination with unequal weights in `W=`.
##' Setting `FALSE` (default) yields an "\eqn{\omega}"-type coefficient.
##' Optionally, a `character` vector of composite names can specify
##' calculating coefficient \eqn{\alpha} for a subset of all composites.
##' @param ord.scale `logical` relevant only for composites of discrete items.
##' Setting `TRUE` (default) applies Green and Yang's (2009, formula 21)
##' method to calculate reliability of the actual composite (i.e., on the
##' actual ordinal response scale). Setting `FALSE` yields coefficients that
##' are only interpretable on the continuous latent-response scale, which can
##' be interpreted as the upper bound of reliability if items were more
##' approximately continuous.
##' Ignored for factors with continuous indicators.
##' Reliability cannot currently be calculated for composites of both
##' discrete and continuous indicators.
##' @param config Deprecated `character` vector.
# naming any configural constructs
# in a multilevel CFA. For these constructs (and optional total composite),
# Lai's (2021) coefficients \eqn{\omega^\textrm{W}} and \eqn{\omega^\textrm{2L}}
# are returned (or corresponding \eqn{\alpha} coefficients when
# `tau.eq=TRUE`), rather than Geldhof et al.'s (2014) coefficients for
# hypothetical composites of latent components (although the same formula
# is used for \eqn{\omega^\textrm{W}} in either case). Note that the same name
# must be used for the factor component represented at each level of the
# model.
##' @param shared `character` vector of **composite names**, to be interpreted
##' as representing (perhaps multidimensional) shared construct(s).
##' Lai's (2021) coefficient \eqn{\omega^\textrm{B}} or \eqn{\alpha^\textrm{B}}
##' is calculated to quantify reliability relative to error associated with
##' both indicators (measurement error) and subjects (sampling error), like a
##' generalizability coefficient. For purely *scale* reliability (relative
##' to item/measurement error alone, i.e., Lai's \eqn{\omega^\textrm{2L}}),
##' omit the composite(s) from the `shared=` argument.
##' @param add.IRR `logical` indicating whether to calculate an additional
##' reliability coefficient for any composite listed in `shared=`. Given that
##' subjects can be considered as raters of their cluster's shared construct,
##' an interrater reliability (IRR) coefficient can quantify reliability
##' relative to rater/sampling error alone.
##' @param higher Deprecated, supplanted by using the `true=` argument.
# @param higher `character` vector naming any higher-order constructs in
# `object` for which composite reliability should be calculated.
# Ignored when `tau.eq=TRUE` because alpha is not based on a CFA model;
# instead, users must fit a CFA with tau-equivalence constraints.
# To obtain Lai's (2021) multilevel composite-reliability indices for a
# higher-order factor, do not use this argument; instead, specify the
# higher-order factor(s) using the `shared=` or `config=` argument
# (`compRelSEM` will automatically check whether it includes latent
# indicators and apply the appropriate formula).
##' @param true Optional `list` of `character` vectors, with list-element names
##' corresponding to composite names. Each composite can have a `character`
##' vector with names of any common factor(s) that should be considered the
##' source(s) of "true-score variance" in that composite. For any composite
##' with a specification in `true=`, the default is to consider all common
##' factors to contribute true-score variance to any items in the composite.
##' Specifying a composite in `true=` is only necessary to deviate from this
##' default, for example, to specify the "general" factor in a bifactor model,
##' in order to obtain "hierarchical omega" (\eqn{\omega_\textrm{H}}).
##' A shortcut for this is available when `W=NULL`, by specifying a single
##' `character` string (one of `"omegaH"`, `"omega.h"`, or `"omega_h"`)
##' instead of a `list`.
##' @param dropSingle When `W=NULL`, this `logical` indicates whether to exclude
##' single-indicator factors from the list of default composites.
##' Even when `TRUE` (default), single indicators are still included in
##' the `.TOTAL.` composite when `return.total = TRUE`.
##' @param omit.factors Deprecated, supplanted by using the `true=` argument.
##' @param omit.indicators Deprecated, supplanted by using the `W=` argument.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results (using [lavaan.mi::lavaan.mi-class]).
##' Can include any of `c("no.conv", "no.se", "no.npd")`, the first 2 of which
##' are the default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions. Users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##' @param return.df Deprecated `logical` argument, replaced by `simplify=`.
##' @param simplify `logical` indicating whether to return reliability
##' coefficients in a `numeric` vector (for single-group model) or `data.frame`
##' (one row per group, or per level in some cases).
##' Specifying a negative number (`simplify = -1L`) additionally removes the
##' informative headers printed to facilitate interpretation.
##'
##' @return
##' By default (`simplify=FALSE`) a `list` of `numeric` vectors (1 per
##' composite) is returned. In multigroup CFA, the vector contains a reliability
##' index for each group in which the composite can be computed.
##' Each composite's vector has a `attr(..., "header")` with information to
##' facilitate interpretation of that index:
##' \itemize{
##' \item{A list of variables in the composite, which determines the
##' composite's total variance (denominator of reliability)}
##' \item{Whether that total variance (denominator) is determined from
##' the restricted model (i.e., CFA parameters) or unrestricted
##' model (i.e., a freely estimated covariance matrix)}
##' \item{Whether the variables in the composite are (a transformation
##' of) observed variables, or whether they are *latent*
##' (components of) variables. The latter (e.g., latent responses
##' assumed to underlie observed ordinal indicators, or latent
##' level-specific components of variables in a multilevel CFA)
##' cannot be used to calculated an observed composite variable,
##' so the resulting coefficient should be cautiously interpreted
##' as a "hypothetical reliability" (Chalmers, 2018; Lai, 2021).}
##' \item{The latent variables that contribute common-factor variance
##' to the composite, which determine the composite's
##' "true-score" variance (numerator of reliability)}
##' \item{Which reliability formula was used: model-based reliability
##' (so-called "omega") or coefficient alpha (a model-free
##' lower-bound estimate of true reliability, equivalent to
##' a model-based reliability that assumes tau-equivalence)}
##' }
##' This header will be printed immediately above each composite's
##' reliability coefficient. When multiple reliability coefficients are
##' returned, **and** each vector in the list has the same length, then
##' setting `simplify=TRUE` will collect the list of *single*
##' coefficients into a vector, or the list of *multiple* coefficients
##' into a `data.frame`, and their headers will be concatenated to be
##' printed above the coefficients. Setting `simplify = -1L` (or any
##' negative number) will omit the informative headers.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' Uses hidden functions to implement Green & Yang's (2009) reliability for
##' categorical indicators, written by Sunthud Pornprasertmanit
##' (\email{psunthud@@gmail.com}) for the deprecated `reliability()` function.
##'
##' @seealso
##' [maximalRelia()] for the maximal reliability of weighted composite
##'
##' @references
##'
##' Bentler, P. M. (1968). Alpha-maximized factor analysis (alphamax): Its
##' relation to alpha and canonical factor analysis. *Psychometrika, 33*(3),
##' 335--345. \doi{10.1007/BF02289328}
##'
##' Chalmers, R. P. (2018). On misconceptions and the limited usefulness of
##' ordinal alpha. *Educational and Psychological Measurement, 78*(6),
##' 1056--1071. \doi{10.1177/0013164417727036}
##'
##' Cho, E. (2021) Neither Cronbach’s alpha nor McDonald’s omega: A commentary
##' on Sijtsma and Pfadt. *Psychometrika, 86*(4), 877--886.
##' \doi{10.1007/s11336-021-09801-1}
##'
##' Cronbach, L. J. (1951). Coefficient alpha and the internal structure of
##' tests. *Psychometrika, 16*(3), 297--334. \doi{10.1007/BF02310555}
##'
##' Geldhof, G. J., Preacher, K. J., & Zyphur, M. J. (2014). Reliability
##' estimation in a multilevel confirmatory factor analysis framework.
##' *Psychological Methods, 19*(1), 72--91. \doi{10.1037/a0032138}
##'
##' Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using
##' structural equation modeling: An alternative to coefficient alpha.
##' *Psychometrika, 74*(1), 155--167. \doi{10.1007/s11336-008-9099-3}
##'
##' Jak, S., Jorgensen, T. D., & Rosseel, Y. (2021). Evaluating cluster-level
##' factor models with `lavaan` and M*plus*. *Psych, 3*(2),
##' 134--152. \doi{10.3390/psych3020012}
##'
##' Lai, M. H. C. (2021). Composite reliability of multilevel data: It’s about
##' observed scores and construct meanings. *Psychological Methods, 26*(1),
##' 90--102. \doi{10.1037/met0000287}
##'
##' Lu, Z., Hong, M., & Kim, S. (2020). Formulas of multilevel reliabilities for
##' tests with ordered categorical responses.
##' In M. Wiberg, D. Molenaar, J. González, U.Böckenholt, & J.-S. Kim (Eds.),
##' *Quantitative psychology: The 85th annual meeting of the Psychometric Society, Virtual*
##' (pp. 103--112). Springer. \doi{10.1007/978-3-030-74772-5_10}
##'
##' Lüdtke, O., Marsh, H. W., Robitzsch, A., & Trautwein, U. (2011).
##' A 2 \eqn{\times} 2 taxonomy of multilevel latent contextual models:
##' Accuracy--bias trade-offs in full and partial error correction models.
##' *Psychological Methods, 16*(4), 444--467. \doi{10.1037/a0024376}
##'
##' McDonald, R. P. (1999). *Test theory: A unified treatment*. Mahwah, NJ:
##' Erlbaum.
##'
##' Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of
##' coefficients alpha and theta for Likert rating scales.
##' *Journal of Modern Applied Statistical Methods, 6*(1), 21--29.
##' \doi{10.22237/jmasm/1177992180}
##'
##' Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens’
##' scales of measurement do not help make it: A response to Chalmers.
##' *Educational and Psychological Measurement, 79*(6), 1184--1197.
##' \doi{10.1177/0013164419844305}
##'
##'
##' @examples
##' data(HolzingerSwineford1939)
##' HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
##' HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
##' names(HSbinary) <- c("y7","y8","y9")
##' HS <- cbind(HolzingerSwineford1939, HSbinary)
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ y7 + y8 + y9 '
##'
##' fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
##' fitg <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE,
##' group = "school")
##'
##' ## works for factors with exclusively continuous OR categorical indicators
##' compRelSEM(fit)
##' compRelSEM(fitg)
##'
##' ## reliability for composite of ALL indicators only available when they are
##' ## all continuous or all categorical. The example below calculates a
##' ## composite of continuous items from 2 factors (visual and textual)
##' ## using the custom-weights syntax (note the "<~" operator)
##' w.tot <- '
##' visual <~ x1 + x2 + x3
##' textual <~ x4 + x5 + x6
##' total <~ x1 + x2 + x3 + x4 + x5 + x6
##' '
##' compRelSEM(fit, W = w.tot)
##'
##'
##' ## ----------------------
##' ## Higher-order construct
##' ## ----------------------
##'
##' ## Reliability of a composite that represents a higher-order factor
##' mod.hi <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' general =~ visual + textual + speed '
##'
##' fit.hi <- cfa(mod.hi, data = HolzingerSwineford1939)
##' ## "general" is the factor representing "true scores", but it has no
##' ## observed indicators. Must use custom-weights syntax:
##' compRelSEM(fit.hi, W = 'g <~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9')
##'
##'
##' ## ----------------------
##' ## Hierarchical omega
##' ## and omega Total
##' ## ----------------------
##'
##' mod.bi <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' general =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
##' fit.bi <- cfa(mod.bi, data = HolzingerSwineford1939,
##' orthogonal = TRUE, std.lv = TRUE)
##' compRelSEM(fit.bi, return.total = -1) # omega_Total
##' compRelSEM(fit.bi, return.total = -1, # omega_Hierarchical
##' true = list(.TOTAL. = "general"))
##'
##'
##' ## ----------------------
##' ## Multilevel Constructs
##' ## ----------------------
##'
##' ## Same factor structure with metric invariance across levels (Jak et al., 2021)
##' model2 <- '
##' level: 1
##' f1 =~ y1 + L2*y2 + L3*y3
##' f2 =~ y4 + L5*y5 + L6*y6
##' level: 2
##' f1 =~ y1 + L2*y2 + L3*y3
##' f2 =~ y4 + L5*y5 + L6*y6
##' '
##' fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster")
##'
##' ## Lai's (2021, Eq. 13) omega index for a configural (Level-1) construct,
##' ## treating common-factor variance at both levels as "true" variance
##' compRelSEM(fit2)
##'
##' ## Lai's (2021, Eq. 17) omega index for a shared (Level-2) construct
##' ## (also its interrater reliability coefficient)
##' compRelSEM(fit2, shared = c("f1","f2"), add.IRR = TRUE)
##'
##' ## Geldhof et al.'s (2014) level-specific indices imply a different
##' ## composite (hypothetically) calculated per level. Thus, use
##' ## unique composite names per level.
##'
##' W2.Geldhof <- ' level: 1
##' F1w <~ y1 + y2 + y3
##' F2w <~ y4 + y5 + y6
##' level: 2
##' F1b <~ y1 + y2 + y3
##' F2b <~ y4 + y5 + y6
##' '
##' compRelSEM(fit2, W = W2.Geldhof)
##'
##'
##' @export
compRelSEM <- function(object, W = NULL,
return.total = FALSE,
obs.var = TRUE, tau.eq = FALSE, ord.scale = TRUE,
shared = character(0), config = character(0), # deprecated (only shared= needed)
add.IRR = FALSE, # only for shared constructs
higher = character(0), # deprecated (always used when found)
true = list(), # character(0) per composite
dropSingle = TRUE, # ignored when !is.null(W)
omit.factors = character(0), # deprecated (use true=)
omit.indicators = character(0), # deprecated (use W=)
omit.imps = c("no.conv","no.se"),
## simplify= replaces (deprecated) return.df=
simplify = FALSE, return.df = simplify) {
## warn about deprecated arguments following introduction of W= and true=
if (length(config)) {
warning('Argument config= is deprecated. Composites including both within- ',
'and between-level variance are assumed by default to represent ',
'configural constructs (unless listed in the shared= argument).')
}
if (length(higher)) {
warning('Argument higher= is deprecated. Higher-order constructs are ',
'automatically detected by nonzero elements in the Beta matrix, ',
'and are always assumed to represent true scores, whereas the ',
'residuals of its indicators (lower-order factors) are considered ',
'sources of error.')
}
if (length(omit.indicators)) {
warning('Argument omit.indicators= is deprecated. Specify custom composites ',
'with W= argument.')
}
if (length(omit.factors)) {
warning('Argument omit.factors= is deprecated. By default, all common ',
'factors are assumed to represent true-score variance. To treat ',
'a subset of factors as true-score variance (e.g., to calculate ',
'"omega_hierarchical"), use the true= argument.')
}
if (!is.null(match.call()$return.df)) {
message('Argument return.df= is deprecated, replaced by simplify= argument.')
if (return.df) simplify <- -1L
}
## numbers of blocks
ngroups <- lavInspect(object, "ngroups")
nLevels <- lavInspect(object, "nlevels")
nblocks <- ngroups*nLevels #FIXME: always true?
## extract parameter table
PT <- parTable(object)
## labels for groups
if (ngroups > 1L) {
group.label <- lavInspect(object, "group.label")
blk.g.lab <- if (!length(group.label)) paste0("g", 1:ngroups) else group.label
} else {
group.label <- blk.g.lab <- NULL
}
## labels for clusters
if (nLevels > 1L) {
#FIXME? lavInspect(object, "level.label") is always ==
# c("within", lavInspect(object, "cluster"))
# which can differ from unique(parTable(object)$level)
clus.label <- unique(PT$level)
clus.label <- clus.label[which(clus.label != "")]
clus.label <- clus.label[which(clus.label != 0)]
blk.clus.lab <- if (is.numeric(clus.label)) {
c("within", lavInspect(object, "cluster"))
} else clus.label
} else clus.label <- blk.clus.lab <- NULL
## labels for blocks
if (nblocks > 1L) {
block.label <- paste(rep(blk.g.lab, each = nLevels), blk.clus.lab,
sep = if (ngroups > 1L && nLevels > 1L) "_" else "")
} else block.label <- NULL
## Check (for) weights
if (is.null(W)) {
## construct default weights
wPT <- PT[PT$op == "=~", ] # isolate factor definitions
wPT$op <- "<~"
wPT$ustart <- 1L # default weights == 1
## add total composite?
if (return.total) {
## loop over blocks to add a total score
for (b in 1:nblocks) {
wPTb <- wPT[wPT$block == b, ]
ov.names <- lavNames(object, type = "ov.ind", block = b)
ind.idx <- unique(match(wPTb$rhs, table = ov.names))
totalPTb <- wPTb[ind.idx, ]
totalPTb$lhs <- ".TOTAL."
if (b == 1L) {
totalPT <- totalPTb
} else totalPT <- rbind(totalPT, totalPTb)
## end loop over blocks
}
if (return.total < 0L) {
## only the total composite
wPT <- totalPT
## add the total composite
} else wPT <- rbind(wPT, totalPT)
}
## replace default integers with labels for groups
if (length(group.label) && is.integer(wPT$group)) {
wPT$group[wPT$group > 0L] <- group.label[wPT$group]
}
## default composite names (each factor, and/or total)
allComps <- unique(wPT$lhs)
## Option to preserve old default behavior (partial numerator). Only
## available when we know composite names == factor names (W=NULL).
if (is.character(true)) {
if (tolower(true[1]) %in% paste0("omega", c("h",".h","_h")) ) {
## for each composite (factor), only that factor is true-score variance
true <- setNames(as.list(allComps), nm = allComps)
}
}
} else {
## user supplied W=
## turn off the dropSingle= argument
dropSingle <- FALSE
## Take 2: ONLY accept a lavaan script
stopifnot(is.character(W))
PTw <- lavaan::lavaanify(W,
ngroups = ngroups, #FIXME?
as.data.frame. = TRUE)
## Check for multilevel (if absent from W= syntax)
if (is.null(PTw$level) && nLevels > 1L) {
## repeat syntax per level (implies Lai's (2021) indices by default)
PTw <- lavaan::lavaanify(c('level: 1 \n', W, 'level: 2 \n', W),
ngroups = ngroups, #FIXME?
as.data.frame. = TRUE)
}
if (length(group.label)) {
## replace default integers with labels
PTw$group[PTw$group > 0L] <- group.label[PTw$group]
}
PTw$ustart[is.na(PTw$ustart)] <- 1L # replace missing weights with 1
wPT <- PTw[PTw$op == "<~", ] # isolate composite definitions
allComps <- unique(wPT$lhs) # extract composite names
}
## apply tau.eq= to all composites?
if (is.logical(tau.eq)) {
if (tau.eq) {
tau.eq <- allComps
} else tau.eq <- character(0)
} else if (!is.character(tau.eq)) {
stop('tau.eq= must be logical or a character vector of composite names')
}
## check for categorical
anyCategorical <- lavInspect(object, "categorical")
threshold <- if (anyCategorical) getThreshold(object, omit.imps = omit.imps) else NULL
latScales <- if (anyCategorical) getScales(object, omit.imps = omit.imps) else NULL
## extract common- and total-variance components
if (inherits(object, "lavaan")) {
## common-factor variance
PHI <- lavInspect(object, "cov.lv") # ignored if tau.eq
if (nblocks == 1L) PHI <- list(PHI)
names(PHI) <- block.label
## factor loadings
EST <- lavInspect(object, "est", drop.list.single.group = FALSE)
LAMBDA <- sapply(EST, "[[", i = "lambda", simplify = FALSE)
names(LAMBDA) <- block.label
## possibly higher-order loadings?
BETA <- sapply(EST, "[[", i = "beta", simplify = FALSE)
names(BETA) <- block.label
## total variance
SIGMA <- sapply(lavInspect(object, drop.list.single.group = FALSE,
what = ifelse(obs.var, "sampstat", "fitted")),
"[[", i = "cov", simplify = FALSE)
names(SIGMA) <- block.label
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
## common-factor variance
phiList <- object@phiList[useImps]
if (nblocks == 1L) for (i in 1:m) phiList[[i]] <- list(phiList[[i]])
PHI <- vector("list", nblocks)
names(PHI) <- block.label
for (b in 1:nblocks) {
PHI[[b]] <- Reduce("+", lapply(phiList, "[[", i = b) ) / m
}
## loadings (including higher-order in Beta)
if (nblocks == 1L) {
lamList <- lapply(object@coefList[useImps], "[[", i = "lambda")
LAMBDA <- list(Reduce("+", lamList) / length(lamList))
betList <- lapply(object@coefList[useImps], "[[", i = "beta")
if (length(betList)) {
BETA <- list(Reduce("+", betList) / length(betList))
} else BETA <- list(NULL)
} else {
LAMBDA <- BETA <- vector("list", nblocks)
names(LAMBDA) <- names(BETA) <- block.label
for (b in 1:nblocks) {
lamList <- lapply(object@coefList[useImps], function(i) i[[b]]$lambda)
LAMBDA[[b]] <- Reduce("+", lamList) / length(lamList)
betList <- lapply(object@coefList[useImps], function(i) i[[b]]$beta )
BETA[[b]] <- Reduce("+", betList) / length(betList)
}
}
## total variance
if (obs.var) {
## pool model-implied SIGMA from h1 model
SIGMA <- vector("list", nblocks)
names(SIGMA) <- block.label
## loop over blocks to pool saturated-model (observed) matrices
for (b in 1:nblocks) {
covList <- lapply(object@h1List[useImps], function(i) i$implied$cov[[b]])
SIGMA[[b]] <- Reduce("+", covList) / m
## The slot does not contain dimnames, so add them
rownames(SIGMA[[b]]) <- colnames(SIGMA[[b]]) <- lavNames(object, block = b)
}
} else {
## pool model-implied SIGMA from h0 model
if (nblocks == 1L) {
SIGMA <- getMethod("fitted", class(object))(object)["cov"] # retain list format
} else {
SIGMA <- sapply(getMethod("fitted", class(object))(object),
"[[", "cov", simplify = FALSE)
names(SIGMA) <- block.label
}
}
} # end lavaan vs. lavaan.mi conditional
## scale polychoric/polyserial to modeled LRV scale
if (anyCategorical) {
SDs <- sapply(latScales, function(x) diag(1 / x),
simplify = FALSE)
for (b in 1:nblocks) {
dimnames(SDs[[b]]) <- dimnames(SIGMA[[b]])
SIGMA[[b]] <- SDs[[b]] %*% SIGMA[[b]] %*% SDs[[b]]
}
}
## flag conditions to warn about (listing problematic composites)
warnLRV <- character(0)
rel <- list() # coefficient(s) per composite
## loop over composites
for (cc in allComps) {
## extract rows for this composite
cPT <- wPT[wPT$lhs == cc, ]
## loop over groups
for (g in 1:ngroups) {
## without labels, PT$group will be integers (even with 1 group)
g.idx <- ifelse(length(group.label), yes = group.label[g], no = g)
## group-specific rows
if (!is.null(cPT$group)) {
cPTg <- cPT[cPT$group == g.idx, ]
} else cPTg <- cPT # only 1 group
## check whether all weights are zero (so skip this composite)
allWts0 <- all(sapply(cPTg$ustart, function(target) {
isTRUE(all.equal(target, current = 0))
}))
if (allWts0) next
## Determine block indices (when multilevel)
compositeHas2Levels <- FALSE
## Either the user didn't specify levels in W= ...
if (nLevels > 1L && is.null(wPT$level)) compositeHas2Levels <- TRUE
## ... or they are in cPTg
if (length(unique(cPTg$level)) > 1L) compositeHas2Levels <- TRUE
## Same composite (factor?) name across levels?
isWithin <- FALSE
isBetween <- FALSE
if (compositeHas2Levels) {
## Then we need to distinguish levels in the same index
b.idx <- (g - 1L)*nLevels + 1
b.idx2 <- (g - 1L)*nLevels + 2
#FIXME if cross-classification or nLevels > 2 are implemented
## Still multilevel? Implies a level-specific name
} else if (nLevels > 1L) {
## which level is this?
isWithin <- as.logical(unique(cPTg$block) %% 2) # is it an odd block?
isBetween <- !isWithin
b.idx <- ifelse(isWithin,
yes = (g - 1L)*nLevels + 1, # within
no = (g - 1L)*nLevels + 2) # between
b.idx2 <- NULL
#FIXME? eventually possible to model partial clustering?
} else {
## SINGLE LEVEL model (block == group)
## but b.idx MUST be integer(s)
b.idx <- g # so NOT g.idx (which is character for MG-CFA)
b.idx2 <- NULL
}
## skip single-indicator construct?
if (dropSingle && nrow(cPTg) == 1L) next
## same logic applies across 2 levels:
if (isTRUE(unique(cPTg$level) > 1L)) {
if (dropSingle && nrow(cPTg) == 2L) next
}
## extract Lambda
LY <- LAMBDA[[b.idx]]
## checks names of modeled indicators against each composite below
allIndNames <- rownames(LY)
allFacNames <- colnames(LY)
## assign default weights == 0
wt <- setNames(rep(0, length(allIndNames)), nm = allIndNames)
## loop over indicators ...
for (i in allIndNames) {
## ... to check whether to assign any nonzero weights
if (isTRUE(cPTg$ustart[cPTg$rhs == i & cPTg$block == b.idx] != 0)) {
wt[i] <- cPTg$ustart[cPTg$rhs == i & cPTg$block == b.idx]
}
}
## save names of indicators with nonzero weights (for header/footer)
myIndNames <- names(wt[wt != 0])
if (!length(myIndNames)) next
## verify equal weights for alpha
non0wts <- wt[wt != 0]
if (cc %in% tau.eq && length(unique(non0wts)) > 1L) {
stop('Cannot calculate coefficient alpha for composite `', cc,
'` with unequal weights')
}
## check factor names in true=
if (length(true[[cc]])) {
## Isolate the factors considered "true" variance
myFacNames <- intersect(true[[cc]], allFacNames)
} else myFacNames <- allFacNames # defaults to all, but adjust below
## factor loadings for this composite:
if (is.null(BETA[[b.idx]])) {
theseRtrue <- apply(LY[myIndNames, myFacNames, drop = FALSE],
MARGIN = 2, FUN = function(x) !all(x == 0))
} else {
## Assume higher-order constructs represent true scores
#FIXME? check PT$op == "=~", compare to BETA predictors
L1 <- LY
L2 <- BETA[[b.idx]][ , myFacNames, drop = FALSE]
LY <- L1 %*% L2
theseRtrue <- apply(LY[myIndNames, myFacNames, drop = FALSE],
MARGIN = 2, FUN = function(x) !all(x == 0))
if (!any(theseRtrue)) {
## No higher-order factor(s) identified for this composite.
## Just use lower-order Lambda to identify true-score variance
LY <- L1
theseRtrue <- apply(LY[myIndNames, myFacNames, drop = FALSE],
MARGIN = 2, FUN = function(x) !all(x == 0))
}
}
## perhaps no Level-1 factors for shared constructs,
## so only total variance at within-level necessary
if ( !( any(theseRtrue) | cc %in% shared) ) {
stop('No common factors identified for composite `', cc, '`')
}
myFacNames <- myFacNames[theseRtrue]
L <- LY[, myFacNames, drop = FALSE]
## TRUE (co)variance among indicators in this composite:
Phi <- PHI[[b.idx]][myFacNames, myFacNames, drop = FALSE]
commonCov <- L %*% Phi %*% t(L)
## TOTAL (co)variance among indicators
totalCov <- SIGMA[[b.idx]]
## Second level too? Do it all again!
if (is.null(b.idx2)) {
myIndNames2 <- myFacNames2 <- NULL
commonCov2 <- totalCov2 <- NULL
} else {
## extract Level-2 Lambda
LY2 <- LAMBDA[[b.idx2]]
## checks names of modeled indicators against each composite below
allIndNames2 <- rownames(LY2)
allFacNames2 <- colnames(LY2)
## assign default weights == 0
wt2 <- setNames(rep(0, length(allIndNames2)), nm = allIndNames2)
## loop over indicators ...
for (i in allIndNames2) {
## ... to check whether to assign any nonzero weights
if (isTRUE( cPTg$ustart[cPTg$rhs == i & cPTg$block == b.idx2] != 0)) {
wt2[i] <- cPTg$ustart[cPTg$rhs == i & cPTg$block == b.idx2]
}
}
## save names of indicators with nonzero weights (for header/footer)
myIndNames2 <- names(wt2[wt2 != 0])
## verify equal weights for alpha
non0wts2 <- wt2[wt2 != 0]
if (cc %in% tau.eq && length(unique(non0wts2)) > 1L) {
stop('Cannot calculate coefficient alpha for composite `', cc,
'` with unequal weights')
}
## verify weights match across levels
for (i in names(wt)) {
L1wt <- wt[i]
L2wt <- wt2[i]
if (is.na(L2wt)) next # Level-1 variable not decomposed?
if (L1wt != L2wt) stop('Weights in W= do not match across levels for',
' indicator ', i, ' of composite `', cc, '`')
}
## check factor names in true=
if (length(true[[cc]])) {
## Isolate the factors considered "true" variance
myFacNames2 <- intersect(true[[cc]], allFacNames2)
} else myFacNames2 <- allFacNames2
## factor loadings for this composite:
if (is.null(BETA[[b.idx2]])) {
# L <- LY2[, myFacNames2, drop = FALSE]
theseRtrue <- apply(LY2[myIndNames2, myFacNames2, drop = FALSE],
MARGIN = 2, FUN = function(x) !all(x == 0))
} else {
## Assume higher-order constructs
#FIXME? check PT$op == "=~", compare to BETA predictors
L1 <- LY2
L2 <- BETA[[b.idx2]][ , myFacNames2, drop = FALSE]
LY2 <- L1 %*% L2
theseRtrue <- apply(LY2[myIndNames2, myFacNames2, drop = FALSE],
MARGIN = 2, FUN = function(x) !all(x == 0))
if (!any(theseRtrue)) {
## No higher-order factor(s) identified for this composite.
## Just use lower-order Lambda to identify true-score variance
LY2 <- L1
theseRtrue <- apply(LY2[myIndNames2, myFacNames2, drop = FALSE],
MARGIN = 2, FUN = function(x) !all(x == 0))
}
}
if (!any(theseRtrue)) {
stop('No common factors identified for composite `', cc, '`')
}
myFacNames2 <- myFacNames2[theseRtrue]
L <- LY2[, myFacNames2, drop = FALSE]
## TRUE (co)variance among indicators in this composite:
Phi <- PHI[[b.idx2]][myFacNames2, myFacNames2, drop = FALSE]
commonCov2 <- L %*% Phi %*% t(L)
## TOTAL (co)variance among indicators
totalCov2 <- SIGMA[[b.idx2]]
}
## distinguish between categorical & continuous indicators
nameArgs <- list(object = object)
if (nblocks > 1L) nameArgs$block <- c(b.idx, b.idx2)
## in case lavNames returns a list (multiple blocks),
## Reduce() with union() to save unique names
ordNames <- Reduce(union, do.call(lavNames, c(nameArgs, list(type = "ov.ord"))))
numNames <- Reduce(union, do.call(lavNames, c(nameArgs, list(type = "ov.num"))))
if (anyCategorical) {
#FIXME? Add second block when ML-SEM is implemented for categorical
## Are ALL these indicators categorical?
allCat <- all(myIndNames %in% ordNames)
## Are only SOME of these indicators categorical? (mixed indicators)
mix <- any(myIndNames %in% ordNames) && any(myIndNames %in% numNames)
} else {
allCat <- FALSE
mix <- FALSE
}
## can't (yet) mix observed and latent scales
if (mix) {
warning('Reliability cannot be computed for composite `', cc,
'` because it has both categorical and continuous indicators. ',
'A model can be fitted by treating ordinal indicators as continuous.')
next
}
## Calculate RELIABILITY for composite(s) of ordinal indicators
if (allCat && ord.scale) {
if (cc %in% tau.eq) {
stop('Setting ord.scale=TRUE and including composite `', cc,
'` in tau.eq= indicates you want to calculate coefficient ',
'alpha for a composite on the observed ordinal scale',
' (not hypothetical alpha on the latent scale).\n',
'There are 2 options, which are not equivalent:\n\n(1)',
' You can refit the model without specifying variables as ',
'ordered=, avoiding the latent-response assumption, so the model ',
'parameters are on the observed scale, making it unnecessary to ',
'set ord.scale=TRUE.\n\n(2)',
' You can fit a model that retains the latent-response assumption ',
'but imposes tau-equivalence (e.g., set all loadings = 1 and ',
'estimate all factor variances), making it unnecessary to ',
'include composite `', cc, '` in tau.eq='
)
}
## Green & Yang (2009)
relCat <- omegaCat(truevar = commonCov[myIndNames, myIndNames],
denom = totalCov[myIndNames, myIndNames],
#FIXME? Where will thresholds for L2-only variables be?
threshold = threshold[[b.idx]][myIndNames],
scales = latScales[[b.idx]][myIndNames],
wt = wt[myIndNames])
## the same composite name NEVER yields level-specific coefficients,
## so only assign using group index
rel[[cc]][g.idx] <- relCat
next
} else {
## all continuous or all LRV-scale?
if (allCat) warnLRV <- union(warnLRV, cc)
}
## else, calculate RELIABILITY for composite(s) of continuous indicators
## could be multilevel (not if categorical, calculated above)
isShared <- FALSE
isConfig <- FALSE
## calculate ALPHA?
if (cc %in% tau.eq) {
if (!is.null(b.idx2)) {
## calculate manually using Lai (2021, Eq. 23 or 24)
nI <- length(myIndNames2)
if (nI == 1L) {
stop('Coefficient alpha is undefined for a single indicator. ',
'Set tau.eq=FALSE (or specify a subset of composites ',
'excluding single-indcator factors) or set dropSingle=TRUE')
}
kw <- nI / (nI-1) # weight for alpha based on number of items
## numerator of shared or configural coefficient
onlyCov2 <- totalCov2
diag(onlyCov2) <- 0
## additional information depends on shared or configural
if (cc %in% shared) {
isShared <- TRUE
## (reciprocal of) harmonic-mean cluster size
N_bar <- mean(lavInspect(object, "cluster.size",
drop.list.single.group = FALSE)[[g.idx]])
Ns <- mean(1 / lavInspect(object, "cluster.size",
drop.list.single.group = FALSE)[[g.idx]])
## numerator only sums Level-2 covariances
numerator <- t(wt2) %*% onlyCov2 %*% wt2
denominator <- sum(Ns * (t(wt) %*% totalCov %*% wt),
t(wt2) %*% totalCov2 %*% wt2)
## ALPHA
rel[[cc]][g.idx] <- kw*numerator / denominator
if (add.IRR) {
numerator <- t(wt2) %*% totalCov2 %*% wt2 # same denominator
rel[[paste0("IRR_of_", cc)]][g.idx] <- numerator / denominator
## Add header
HEAD <- paste0('Composite `', cc, '` represents a shared construct, ',
'formed by summing not only over items but over ',
'(on average ', round(N_bar, 2), ') responses within ',
'each cluster (', dQuote(lavInspect(object, "cluster")),
'). Cluster members can therefore be interpreted as ',
'"raters" of the cluster-level shared construct.\n\n',
'The interrater reliability (IRR) of that construct is:')
attr( rel[[paste0("IRR_of_", cc)]], "header") <- HEAD
class(rel[[paste0("IRR_of_", cc)]]) <- c("lavaan.vector","numeric")
}
} else {
isConfig <- TRUE
## numerator sums covariances at both levels
onlyCov1 <- totalCov
diag(onlyCov1) <- 0
numerator <- sum(t(wt) %*% onlyCov1 %*% wt,
t(wt2) %*% onlyCov2 %*% wt2)
denominator <- sum(t(wt) %*% totalCov %*% wt,
t(wt2) %*% totalCov2 %*% wt2)
## ALPHA
rel[[cc]][g.idx] <- kw*numerator / denominator
}
} else {
## use built-in function for single-level model
rel[[cc]][g.idx] <- computeAlpha(totalCov, W = wt)
}
next
}
## else, calculate model-based RELIABILITY
denominator <- t(wt) %*% totalCov %*% wt
if (!is.null(b.idx2)) {
## 2-level model, what kind of construct?
if (cc %in% shared) {
isShared <- TRUE
## (reciprocal of) harmonic-mean cluster size
N_bar <- mean(lavInspect(object, "cluster.size",
drop.list.single.group = FALSE)[[g.idx]])
Ns <- mean(1 / lavInspect(object, "cluster.size",
drop.list.single.group = FALSE)[[g.idx]])
## only Level 2 is true-score variance (Level-1 is error)
numerator <- t(wt2) %*% commonCov2 %*% wt2
## Level-1 error is reduced by (harmonic-mean) cluster size
denominator <- sum(denominator * Ns, t(wt2) %*% totalCov2 %*% wt2)
} else {
isConfig <- TRUE
## Level 1 has true-score variance, but accumulates at Level 2
numerator <- sum( t(wt ) %*% commonCov %*% wt ,
t(wt2) %*% commonCov2 %*% wt2 )
denominator <- sum(denominator, t(wt2) %*% totalCov2 %*% wt2)
}
} else {
## single block
numerator <- t(wt) %*% commonCov %*% wt
}
## OMEGA
rel[[cc]][g.idx] <- as.numeric(numerator / denominator)
if (isShared && add.IRR) {
numerator <- sum( t(wt2) %*% totalCov2 %*% wt2)
denominator <- sum( t(wt ) %*% totalCov %*% wt ) * Ns + numerator
rel[[paste0("IRR_of_", cc)]][g.idx] <- numerator / denominator
## Add header
HEAD <- paste0('Composite `', cc, '` represents a shared construct, ',
'formed by summing not only over items but over ',
'(on average ', round(N_bar, 2), ') responses within ',
'each cluster (', dQuote(lavInspect(object, "cluster")),
'). Cluster members can therefore be interpreted as ',
'"raters" of the cluster-level shared construct.\n\n',
'The interrater reliability (IRR) of composite `', cc,
'` is:')
attr( rel[[paste0("IRR_of_", cc)]], "header") <- HEAD
class(rel[[paste0("IRR_of_", cc)]]) <- c("lavaan.vector","numeric")
}
## end loop over groups
}
## in case this composite was skipped
## (e.g., no observed indicators in any groups)
if (is.null(rel[[cc]])) next
class(rel[[cc]]) <- c("lavaan.vector","numeric")
## determine type of composite for header
compType <- ifelse(cc %in% warnLRV,
'latent responses underlying binary/ordinal variables:\n\t',
ifelse(isWithin,
'cluster-mean-centered observed (or latent Level-1 components of) 2-level variables:\n\t',
ifelse(isBetween,
'latent Level-2 components of 2-level variables:\n\t',
ifelse(isShared,
'(cluster means of) observed variables:\n\t',
'observed variables:\n\t'))))
## Check for indicators ONLY in Level-2 model
#TODO: enable this in lavaan: myIndNames2only <- lavNames(object, "ov.between")
# if (isBetween && length(myIndNames2only)) {
# myIndNames2from1 <- setdiff(union(myIndNames, myIndNames2), myIndNames2only)
# ## assemble 2 indicator lists, separated by further description
# indicatorList <- paste0(paste(myIndNames2from1, collapse = ', '),
# '\nand observed Level-2 variables:\n\t',
# paste(myIndNames2only, collapse = ', '))
# } else
indicatorList <- paste(union(myIndNames, myIndNames2), collapse = ', ')
## alpha or omega?
relType <- ifelse(cc %in% tau.eq && cc %in% warnLRV,
paste0('The latent polychoric correlation matrix was used ',
'to calculate a hypothetical coefficient alpha:'),
ifelse(cc %in% tau.eq && isWithin,
paste0('The (latent) Level-1 covariance matrix was used ',
'to calculate coefficient alpha:'),
ifelse(cc %in% tau.eq && isBetween,
paste0('The latent Level-2 covariance matrix was used ',
'to calculate a hypothetical coefficient alpha:'),
ifelse(cc %in% tau.eq && isShared,
paste0('Coefficient alpha was calculated using ',
'Lai (2021, Eq. 24):'),
ifelse(cc %in% tau.eq && isConfig,
paste0('Coefficient alpha was calculated using ',
'Lai (2021, Eq. 23):'),
ifelse(cc %in% tau.eq,
paste0('Coefficient alpha would be:'),
## else OMEGA
paste('The proportion attributable to "true" scores is its',
'model-based estimate of reliability ("omega"):')))))))
trueVar <- ifelse(cc %in% tau.eq, yes = '',
paste0('\nTrue-score variance is represented by',
ifelse(isShared | isBetween,
' (between-level components of) ',
ifelse(isWithin,
' (within-level components of) ',
' ')),
'common factor(s):\n\t',
paste(union(myFacNames, myFacNames2), collapse = ', ')))
HEAD <- paste0('Composite `', cc,'` is composed of ',
compType, indicatorList,
## alpha or omega? ## What's the denominator?
trueVar, '\nTotal variance of composite `', cc,
'`\ determined from the ',
ifelse(obs.var, 'un', ''), 'restricted model.\n',
## alpha or omega?
relType)
attr(rel[[cc]], "header") <- HEAD
if (isConfig) {
## message about scale reliability?
# attr(rel[[cc]], "footer") <-
}
## end loop over composites
}
## simplify structure at all? (NOTE: simplify < 0L reproduces old behavior)
## only if there are multiple composites
if (simplify) {
if (length(rel) == 1L) {
REL <- rel[[1L]]
## drop header?
if (simplify < 0L) attr(REL, "header") <- NULL
} else if (length(rel) > 1L) {
## Check there are just as many indices for each composite
nComps <- unique(sapply(rel, length))
if (length(nComps) > 1L) {
message('Not all composites yield the same number of indices, so the ',
'object (a list) cannot be simplified to a data.frame')
REL <- rel
## drop headers?
if (simplify < 0L) for (i in seq_along(REL)) {
attr(REL[[i]], "header") <- NULL
} else simplify <- FALSE
} else if (nComps == 1L) {
## Only 1 index per composite, so a single-group model.
## This automatically assigns list names to concatenated vector
REL <- do.call(c, rel) # drops header (can replace below)
class(REL) <- c("lavaan.vector","numeric")
} else {
## Multiple indices per composite, so a multiple-group model
## Check the indices have the same (group) names.
## NOTE: The same composite name NEVER yields level-specific coefficients
nameList <- lapply(rel, names)
sameNames <- all(sapply(2:length(nameList), function(i) {
isTRUE(all.equal(nameList[[1]], nameList[[i]]))
} ))
if (!sameNames) {
## can't simplify, return as a list
message('Not all composites have the same names for indices, so the ',
'object (a list) cannot be simplified to a data.frame')
REL <- rel
## drop headers?
if (simplify < 0L) for (i in seq_along(REL)) {
attr(REL[[i]], "header") <- NULL
} else simplify <- FALSE
}
## bind composites into rows (groups in columns)
REL <- as.data.frame(do.call(cbind, rel)) # drops header (can replace below)
class(REL) <- c("lavaan.data.frame","data.frame")
}
if (simplify > 0L) {
## concatenate(?) headers separated by this:
divL <- paste(c("\n\n", rep("-", 10)), collapse = "")
divR <- paste(c(rep("-", 10), "\n\n"), collapse = "")
allHeaders <- lapply(rel, attr, which = "header")
divH <- paste0(divL, names(rel), divR, allHeaders, "\n\n", collapse = "")
attr(REL, "header") <- paste0('Information about each composite is ',
'provided below, followed by reliability ',
'coefficients.', divH)
# feet <- which(sapply(rel, function(x) !is.null(attr(x, "footer"))))
#
# allFooters <- lapply(rel, attr, which = "footer")
# hasFeet <- sapply(allFooters, length) > 0L
# if (any(hasFeet)) {
# divF <- paste0(divL, names(rel[hasFeet]), divR,
# allFooters[hasFeet], "\n\n", collapse = "")
# attr(REL, "footer") <- paste0('Additional information is available for ',
# 'the following composite(s):', divF)
# }
}
}
## end if (simplify)
} else REL <- rel
return(REL)
}
## -------------
## reliability()
## (deprecated 10 May 2022)
## -------------
##' Composite Reliability using SEM
##'
##' Calculate composite reliability from estimated factor-model parameters
##'
##' The coefficient alpha (Cronbach, 1951) can be calculated by
##'
##' \deqn{ \alpha = \frac{k}{k - 1}\left[ 1 - \frac{\sum^{k}_{i = 1}
##' \sigma_{ii}}{\sum^{k}_{i = 1} \sigma_{ii} + 2\sum_{i < j} \sigma_{ij}}
##' \right],}
##'
##' where \eqn{k} is the number of items in a factor, \eqn{\sigma_{ii}} is the
##' item *i* observed variances, \eqn{\sigma_{ij}} is the observed
##' covariance of items *i* and *j*.
##'
##' Several coefficients for factor-analysis reliability have been termed
##' "omega", which Cho (2021) argues is a misleading misnomer and argues for
##' using \eqn{\rho} to represent them all, differentiated by descriptive
##' subscripts. In our package, we number \eqn{\omega} based on commonly
##' applied calculations. Bentler (1968) first introduced factor-analysis
##' reliability for a unidimensional factor model with congeneric indicators.
##' However, assuming there are no cross-loadings in a multidimensional CFA,
##' this reliability coefficient can be calculated for each factor in the model.
##'
##' \deqn{ \omega_1 =\frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
##' Var\left( \psi \right)}{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
##' Var\left( \psi \right) + \sum^{k}_{i = 1} \theta_{ii} + 2\sum_{i < j}
##' \theta_{ij} }, }
##'
##' where \eqn{\lambda_i} is the factor loading of item *i*, \eqn{\psi} is
##' the factor variance, \eqn{\theta_{ii}} is the variance of measurement errors
##' of item *i*, and \eqn{\theta_{ij}} is the covariance of measurement
##' errors from item *i* and *j*. McDonald (1999) later referred to
##' this *and other reliability coefficients* as "omega", which is a source
##' of confusion when reporting coefficients (Cho, 2021).
##'
##' The additional coefficients generalize the first formula by accounting for
##' multidimenisionality (possibly with cross-loadings) and correlated errors.
##' By setting `return.total=TRUE`, one can estimate reliability for a
##' single composite calculated using all indicators in the multidimensional
##' CFA (Bentler, 1972, 2009). `"omega2"` is calculated by
##'
##' \deqn{ \omega_2 = \frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
##' Var\left( \psi \right)}{\bold{1}^\prime \hat{\Sigma} \bold{1}}, }
##'
##' where \eqn{\hat{\Sigma}} is the model-implied covariance matrix, and
##' \eqn{\bold{1}} is the \eqn{k}-dimensional vector of 1. The first and the
##' second coefficients omega will have the same value per factor in models with
##' simple structure, but they differ when there are (e.g.) cross-loadings
##' or method factors. The first coefficient omega can be viewed as the
##' reliability controlling for the other factors (like \eqn{\eta^2_{partial}} in
##' ANOVA). The second coefficient omega can be viewed as the unconditional
##' reliability (like \eqn{\eta^2} in ANOVA).
##'
##' The `"omega3"` coefficient (McDonald, 1999), sometimes referred to as
##' hierarchical omega, can be calculated by
##'
##' \deqn{ \omega_3 =\frac{\left( \sum^{k}_{i = 1} \lambda_i \right)^{2}
##' Var\left( \psi \right)}{\bold{1}^\prime \Sigma \bold{1}}, }
##'
##' where \eqn{\Sigma} is the observed covariance matrix. If the model fits the
##' data well, \eqn{\omega_3} will be similar to \eqn{\omega_2}. Note that if
##' there is a directional effect in the model, all coefficients are calculated
##' from total factor variances: `lavInspect(object, "cov.lv")`.
##'
##' In conclusion, \eqn{\omega_1}, \eqn{\omega_2}, and \eqn{\omega_3} are
##' different in the denominator. The denominator of the first formula assumes
##' that a model is congeneric factor model where measurement errors are not
##' correlated. The second formula accounts for correlated measurement errors.
##' However, these two formulas assume that the model-implied covariance matrix
##' explains item relationships perfectly. The residuals are subject to sampling
##' error. The third formula use observed covariance matrix instead of
##' model-implied covariance matrix to calculate the observed total variance.
##' This formula is the most conservative method in calculating coefficient
##' omega.
##'
##' The average variance extracted (AVE) can be calculated by
##'
##' \deqn{ AVE = \frac{\bold{1}^\prime
##' \textrm{diag}\left(\Lambda\Psi\Lambda^\prime\right)\bold{1}}{\bold{1}^\prime
##' \textrm{diag}\left(\hat{\Sigma}\right) \bold{1}}, }
##'
##' Note that this formula is modified from Fornell & Larcker (1981) in the case
##' that factor variances are not 1. The proposed formula from Fornell & Larcker
##' (1981) assumes that the factor variances are 1. Note that AVE will not be
##' provided for factors consisting of items with dual loadings. AVE is the
##' property of items but not the property of factors. AVE is calculated with
##' polychoric correlations when ordinal indicators are used.
##'
##' Coefficient alpha is by definition applied by treating indicators as numeric
##' (see Chalmers, 2018), which is consistent with the `alpha` function in
##' the `psych` package. When indicators are ordinal, `reliability`
##' additionally applies the standard alpha calculation to the polychoric
##' correlation matrix to return Zumbo et al.'s (2007) "ordinal alpha".
##'
##' Coefficient omega for categorical items is calculated using Green and Yang's
##' (2009, formula 21) approach. Three types of coefficient omega indicate
##' different methods to calculate item total variances. The original formula
##' from Green and Yang is equivalent to \eqn{\omega_3} in this function.
##' Green and Yang did not propose a method for
##' calculating reliability with a mixture of categorical and continuous
##' indicators, and we are currently unaware of an appropriate method.
##' Therefore, when `reliability` detects both categorical and continuous
##' indicators of a factor, an error is returned. If the categorical indicators
##' load on a different factor(s) than continuous indicators, then reliability
##' will still be calculated separately for those factors, but
##' `return.total` must be `FALSE` (unless `omit.factors` is used
##' to isolate factors with indicators of the same type).
##'
##'
##' @importFrom lavaan lavInspect lavNames
##' @importFrom methods getMethod
##'
##' @param object A [lavaan::lavaan-class] or [lavaan.mi::lavaan.mi-class] object,
##' expected to contain only exogenous common factors (i.e., a CFA model).
##' @param what `character` vector naming any reliability indices to
##' calculate. All are returned by default. When indicators are ordinal,
##' both traditional `"alpha"` and Zumbo et al.'s (2007) so-called
##' "ordinal alpha" (`"alpha.ord"`) are returned, though the latter is
##' arguably of dubious value (Chalmers, 2018).
##' @param return.total `logical` indicating whether to return a final
##' column containing the reliability of a composite of all indicators (not
##' listed in `omit.indicators`) of factors not listed in
##' `omit.factors`. Ignored in 1-factor models, and should only be set
##' `TRUE` if all factors represent scale dimensions that could be
##' meaningfully collapsed to a single composite (scale sum or scale mean).
##' @param dropSingle `logical` indicating whether to exclude factors
##' defined by a single indicator from the returned results. If `TRUE`
##' (default), single indicators will still be included in the `total`
##' column when `return.total = TRUE`.
##' @param omit.factors `character` vector naming any common factors
##' modeled in `object` whose composite reliability is not of
##' interest. For example, higher-order or method factors. Note that
##' [reliabilityL2()] should be used to calculate composite
##' reliability of a higher-order factor.
##' @param omit.indicators `character` vector naming any observed variables
##' that should be ignored when calculating composite reliability. This can
##' be useful, for example, to estimate reliability when an indicator is
##' removed.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return Reliability values (coefficient alpha, coefficients omega, average
##' variance extracted) of each factor in each group. If there are multiple
##' factors, a `total` column can optionally be included.
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Yves Rosseel (Ghent University; \email{Yves.Rosseel@@UGent.be})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Bentler, P. M. (1972). A lower-bound method for the dimension-free
##' measurement of internal consistency. *Social Science Research, 1*(4),
##' 343--357. \doi{10.1016/0049-089X(72)90082-8}
##'
##' Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal
##' consistency reliability. *Psychometrika, 74*(1), 137--143.
##' \doi{10.1007/s11336-008-9100-1}
##'
##' Chalmers, R. P. (2018). On misconceptions and the limited usefulness of
##' ordinal alpha. *Educational and Psychological Measurement, 78*(6),
##' 1056--1071. \doi{10.1177/0013164417727036}
##'
##' Cho, E. (2021) Neither Cronbach’s alpha nor McDonald’s omega: A commentary
##' on Sijtsma and Pfadt. *Psychometrika, 86*(4), 877--886.
##' \doi{10.1007/s11336-021-09801-1}
##'
##' Cronbach, L. J. (1951). Coefficient alpha and the internal structure of
##' tests. *Psychometrika, 16*(3), 297--334. \doi{10.1007/BF02310555}
##'
##' Fornell, C., & Larcker, D. F. (1981). Evaluating structural equation models
##' with unobservable variables and measurement errors. *Journal of
##' Marketing Research, 18*(1), 39--50. \doi{10.2307/3151312}
##'
##' Green, S. B., & Yang, Y. (2009). Reliability of summed item scores using
##' structural equation modeling: An alternative to coefficient alpha.
##' *Psychometrika, 74*(1), 155--167. \doi{10.1007/s11336-008-9099-3}
##'
##' McDonald, R. P. (1999). *Test theory: A unified treatment*. Mahwah, NJ:
##' Erlbaum.
##'
##' Raykov, T. (2001). Estimation of congeneric scale reliability using
##' covariance structure analysis with nonlinear constraints *British
##' Journal of Mathematical and Statistical Psychology, 54*(2), 315--323.
##' \doi{10.1348/000711001159582}
##'
##' Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of
##' coefficients alpha and theta for Likert rating scales.
##' *Journal of Modern Applied Statistical Methods, 6*(1), 21--29.
##' \doi{10.22237/jmasm/1177992180}
##'
##' Zumbo, B. D., & Kroc, E. (2019). A measurement is a choice and Stevens’
##' scales of measurement do not help make it: A response to Chalmers.
##' *Educational and Psychological Measurement, 79*(6), 1184--1197.
##' \doi{10.1177/0013164419844305}
##'
##'
##' @examples
##'
##' data(HolzingerSwineford1939)
##' HS9 <- HolzingerSwineford1939[ , c("x7","x8","x9")]
##' HSbinary <- as.data.frame( lapply(HS9, cut, 2, labels=FALSE) )
##' names(HSbinary) <- c("y7","y8","y9")
##' HS <- cbind(HolzingerSwineford1939, HSbinary)
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ y7 + y8 + y9 '
##'
##' fit <- cfa(HS.model, data = HS, ordered = c("y7","y8","y9"), std.lv = TRUE)
##'
##' ## works for factors with exclusively continuous OR categorical indicators
##' reliability(fit)
##'
##' ## reliability for ALL indicators only available when they are
##' ## all continuous or all categorical
##' reliability(fit, omit.factors = "speed", return.total = TRUE)
##'
##'
##' ## loop over visual indicators to calculate alpha if one indicator is removed
##' for (i in paste0("x", 1:3)) {
##' cat("Drop x", i, ":\n")
##' print(reliability(fit, omit.factors = c("textual","speed"),
##' omit.indicators = i, what = "alpha"))
##' }
##'
##'
##' ## works for multigroup models and for multilevel models (and both)
##' data(Demo.twolevel)
##' ## assign clusters to arbitrary groups
##' Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "type1", "type2")
##' model2 <- ' group: type1
##' level: within
##' fac =~ y1 + L2*y2 + L3*y3
##' level: between
##' fac =~ y1 + L2*y2 + L3*y3
##'
##' group: type2
##' level: within
##' fac =~ y1 + L2*y2 + L3*y3
##' level: between
##' fac =~ y1 + L2*y2 + L3*y3
##' '
##' fit2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
##' reliability(fit2, what = c("alpha","omega3"))
##'
##' @name reliability-deprecated
##' @usage
##' reliability(object, what = c("alpha", "omega", "omega2", "omega3", "ave"),
##' return.total = FALSE, dropSingle = TRUE, omit.factors = character(0),
##' omit.indicators = character(0), omit.imps = c("no.conv", "no.se"))
##' @seealso [semTools-deprecated()]
##' @keywords internal
NULL
##' @rdname semTools-deprecated
##' @section Reliability:
##' The original `reliability` function was suboptimally designed.
##' For example, AVE was returned, which is not a reliability index. Also,
##' alpha and several omega-type coefficients were returned, including the
##' original formula that was inappropriate for models with complex structure.
##' Some features could be controlled by the user for one but not both types of
##' index For example, alpha for categorical indicators was returned on both
##' the observed and latent-response scales, but this was not an option for any
##' omega-type indices. The omegas differed in terms of whether the observed or
##' model-implied covariance matrix was used in the denominator, but alpha was
##' only computed using the observed matrix. These inconsistencies have been
##' resolved in the new [compRelSEM()] function, which returns only
##' one reliability index per composite, tailored to the user's requested
##' features, for which there is much more flexibility. The average variance
##' extracted is now available in a dedicated [AVE()] function.
##'
##' @export
reliability <- function(object,
what = c("alpha","omega","omega2","omega3","ave"),
return.total = FALSE, dropSingle = TRUE,
omit.factors = character(0),
omit.indicators = character(0),
omit.imps = c("no.conv","no.se")) {
.Deprecated(msg = c("\nThe reliability() function was deprecated in 2022 and ",
"will cease to be included in future versions of semTools",
". See help('semTools-deprecated) for details.",
"\n\nIt is replaced by the compRelSEM() function, which ",
"can estimate alpha and model-based reliability in an ",
"even wider variety of models and data types, with ",
"greater control in specifying the desired type of ",
"reliability coefficient (i.e., more explicitly choosing ",
"assumptions). \n\nThe average variance extracted should ",
"never have been included because it is not a reliability ",
"coefficient. It is now available from the AVE() function."))
ngroups <- lavInspect(object, "ngroups") #TODO: adapt to multiple levels
nLevels <- lavInspect(object, "nlevels")
nblocks <- ngroups*nLevels #FIXME: always true?
return.total <- rep(return.total, nblocks)
group.label <- if (ngroups > 1L) lavInspect(object, "group.label") else NULL
#FIXME? lavInspect(object, "level.labels")
clus.label <- if (nLevels > 1L) c("within", lavInspect(object, "cluster")) else NULL
if (nblocks > 1L) {
block.label <- paste(rep(group.label, each = nLevels), clus.label,
sep = if (ngroups > 1L && nLevels > 1L) "_" else "")
}
## check for categorical (determines what S will be)
anyCategorical <- lavInspect(object, "categorical")
if (anyCategorical && "alpha" %in% what) {
what <- c(what, "alpha.ord")
what <- unique(what) # in case it was already explicitly requested
}
## categorical-model parameters
threshold <- if (anyCategorical) getThreshold(object, omit.imps = omit.imps) else NULL
latScales <- if (anyCategorical) getScales(object, omit.imps = omit.imps) else NULL
## all other relevant parameters in GLIST format (not flat, need block-level list)
if (inherits(object, "lavaan")) {
param <- lavInspect(object, "est")
ve <- lavInspect(object, "cov.lv") # model-implied latent covariance matrix
S <- object@h1$implied$cov # observed sample covariance matrix (already a list)
if (anyCategorical && any(c("alpha","alpha.ord") %in% what)) {
rawData <- try(lavInspect(object, "data"), silent = TRUE)
if (inherits(rawData, "try-error"))
stop('Error in lavInspect(fit, "data"); what="alpha" unavailable for ',
'models fitted to summary statistics of categorial data.')
if (nblocks == 1L) rawData <- list(rawData)
S.as.con <- lapply(rawData, cov) # for actual "alpha", not "alpha.ord"
}
if (nblocks == 1L) {
param <- list(param)
ve <- list(ve)
}
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
param <- object@coefList[[ useImps[1] ]] # first admissible as template
coefList <- object@coefList[useImps]
phiList <- object@phiList[useImps]
if (anyCategorical) {
dataList <- object@DataList[useImps]
S.as.con <- vector("list", nblocks) # for group-list of pooled S
}
## add block-level list per imputation?
if (nblocks == 1L) {
param <- list(param)
for (i in 1:m) {
coefList[[i]] <- list(coefList[[i]])
phiList[[i]] <- list(phiList[[i]])
}
if (anyCategorical) { #FIXME: currently no categorical ML-SEMs
#dataList[[i]] <- list(dataList[[i]])
VV <- lavNames(object, type = "ov")
impCovList <- lapply(dataList, function(DD) {
dat <- do.call(cbind, sapply(DD[VV], as.numeric, simplify = FALSE))
cov(dat)
})
S.as.con[[1]] <- Reduce("+", impCovList) / length(impCovList)
}
} else if (anyCategorical) { #FIXME: currently no categorical ML-SEMs
## multigroup models need separate data matrices per group
G <- lavInspect(object, "group")
for (g in seq_along(group.label)) {
VV <- try(lavNames(object, type = "ov", group = group.label[g]),
silent = TRUE)
if (inherits(VV, "try-error")) {
VV <- lavNames(object, type = "ov", group = g)
}
impCovList <- lapply(dataList, function(DD) {
RR <- DD[,G] == group.label[g]
dat <- do.call(cbind, sapply(DD[RR, VV], as.numeric, simplify = FALSE))
cov(dat)
})
S.as.con[[g]] <- Reduce("+", impCovList) / length(impCovList)
}
}
S <- vector("list", nblocks) # pooled observed OR polychoric covariance matrix
ve <- vector("list", nblocks)
## loop over blocks
for (b in 1:nblocks) {
## param: loop over GLIST elements
for (mat in names(param[[b]])) {
matList <- lapply(coefList, function(i) i[[b]][[mat]])
param[[b]][[mat]] <- Reduce("+", matList) / length(matList)
} # mat
## pooled observed OR polychoric covariance matrix
covList <- lapply(object@h1List[useImps], function(i) i$implied$cov[[b]])
S[[b]] <- Reduce("+", covList) / m
## pooled model-implied latent covariance matrix
ve[[b]] <- Reduce("+", lapply(phiList, "[[", i = b) ) / m
} # b
}
if (nblocks == 1L) {
SigmaHat <- getMethod("fitted", class(object))(object)["cov"] # retain list format
} else {
SigmaHat <- sapply(getMethod("fitted", class(object))(object),
"[[", "cov", simplify = FALSE)
}
ly <- lapply(param, "[[", "lambda")
te <- lapply(param, "[[", "theta")
beta <- if ("beta" %in% names(param[[1]])) {
lapply(param, "[[", "beta")
} else NULL
result <- list()
warnTotal <- FALSE
warnHigher <- character(0) # collect list of potential higher-order factors
## loop over i blocks (groups/levels)
for (i in 1:nblocks) {
## extract factor and indicator names
allIndNames <- rownames(ly[[i]])
allFacNames <- colnames(ly[[i]])
myFacNames <- setdiff(allFacNames, omit.factors)
subLY <- ly[[i]][ , myFacNames, drop = FALSE] != 0
myIndNames <- rownames(subLY)[apply(subLY, MARGIN = 1L, FUN = any)]
## distinguish between categorical, continuous, and latent indicators
nameArgs <- list(object = object)
if (nblocks > 1L) nameArgs$block <- i
ordNames <- do.call(lavNames, c(nameArgs, list(type = "ov.ord")))
numNames <- do.call(lavNames, c(nameArgs, list(type = "ov.num")))
if (anyCategorical) {
## identify when the (sub)set of factors are all categorical
blockCat <- all(myIndNames %in% ordNames)
## identify when the (sub)set of factors have mixed indicators, so no total
mix <- any(myIndNames %in% ordNames) && any(myIndNames %in% numNames)
} else {
blockCat <- FALSE
mix <- FALSE
}
if (mix && return.total[i]) {
return.total[i] <- FALSE
warnTotal <- TRUE
}
## identify POSSIBLE higher-order factors (that affect other latent vars)
latInds <- do.call(lavNames, c(nameArgs, list(type = "lv.ind")))
higher <- if (length(latInds) == 0L) character(0) else {
allFacNames[apply(beta[[i]], MARGIN = 2, function(x) any(x != 0))]
}
## keep track of factor indices to skip
idx.drop <- numeric(0)
## relevant quantities
common <- (apply(ly[[i]], 2, sum)^2) * diag(ve[[i]])
truevar <- ly[[i]] %*% ve[[i]] %*% t(ly[[i]])
## vectors to store results for each factor
error <- rep(NA, length(common))
alpha <- rep(NA, length(common))
alpha.ord <- rep(NA, length(common))
total <- rep(NA, length(common))
omega1 <- omega2 <- omega3 <- rep(NA, length(common))
impliedTotal <- rep(NA, length(common))
avevar <- rep(NA, length(common))
warnOmega <- FALSE
## loop over j factors
for (j in 1:length(common)) {
## skip this factor?
if (allFacNames[j] %in% omit.factors) {
idx.drop <- c(idx.drop, j)
next
}
index <- setdiff(which(ly[[i]][,j] != 0), # nonzero loadings
which(allIndNames %in% omit.indicators))
jIndNames <- allIndNames[index]
## identify when this factor has mixed indicators, so no omegas
jMix <- any(jIndNames %in% ordNames) && any(jIndNames %in% numNames)
## check for ANY indicators (possibly skip purely higher-order factors)
if (length(index) == 0L) {
idx.drop <- c(idx.drop, j)
next
}
## check for single indicators
if (dropSingle && length(index) == 1L) {
idx.drop <- c(idx.drop, j)
next
}
## check for categorical (or mixed) indicators
jCat <- any(jIndNames %in% ordNames)
warnOmega <- jCat && !all(jIndNames %in% ordNames)
## check for latent indicators
if (allFacNames[j] %in% higher && !(allFacNames[j] %in% omit.factors)) {
warnHigher <- c(warnHigher, allFacNames[j])
}
sigma <- S[[i]][index, index, drop = FALSE]
faccontrib <- ly[[i]][,j, drop = FALSE] %*% ve[[i]][j,j, drop = FALSE] %*% t(ly[[i]][,j, drop = FALSE])
truefac <- diag(faccontrib[index, index, drop = FALSE])
trueitem <- diag(truevar[index, index, drop = FALSE])
erritem <- diag(te[[i]][index, index, drop = FALSE])
if (sum(abs(trueitem - truefac)) < 0.00001 & "ave" %in% what) {
avevar[j] <- sum(trueitem) / sum(trueitem + erritem)
}
if (jCat) {
if ("alpha" %in% what) {
alpha[j] <- computeAlpha(S.as.con[[i]][index, index, drop = FALSE])
}
if ("alpha.ord" %in% what) {
alpha.ord[j] <- computeAlpha(sigma)
}
if ("omega" %in% what) {
omega1[j] <- omegaCat(truevar = faccontrib[index, index, drop = FALSE],
threshold = threshold[[i]][jIndNames],
scales = latScales[[i]][index],
denom = faccontrib[index, index, drop = FALSE] + te[[i]][index, index, drop = FALSE])
}
if ("omega2" %in% what) {
omega2[j] <- omegaCat(truevar = faccontrib[index, index, drop = FALSE],
threshold = threshold[[i]][jIndNames],
scales = latScales[[i]][index],
denom = SigmaHat[[i]][index, index, drop = FALSE])
}
if ("omega3" %in% what) {
omega3[j] <- omegaCat(truevar = faccontrib[index, index, drop = FALSE],
threshold = threshold[[i]][jIndNames],
scales = latScales[[i]][index],
denom = sigma)
}
} else {
alpha[j] <- computeAlpha(sigma)
commonfac <- sum(faccontrib[index, index, drop = FALSE])
error[j] <- sum(te[[i]][index, index, drop = FALSE])
impliedTotal[j] <- sum(SigmaHat[[i]][index, index, drop = FALSE])
total[j] <- sum(sigma)
omega1[j] <- commonfac / (commonfac + error[j])
omega2[j] <- commonfac / impliedTotal[j]
omega3[j] <- commonfac / total[j]
}
## end loop over j factors
}
if (return.total[i] & length(myFacNames) > 1L) {
if (blockCat) {
if ("alpha" %in% what) {
alpha <- c(alpha, computeAlpha(S.as.con[[i]]))
}
if ("alpha.ord" %in% what) {
alpha.ord <- c(alpha.ord, total = computeAlpha(S[[i]]))
}
if ("omega" %in% what) {
omega1 <- c(omega1, total = omegaCat(truevar = truevar,
threshold = threshold[[i]],
scales = latScales[[i]],
denom = truevar + te[[i]]))
}
if ("omega2" %in% what) {
omega2 <- c(omega2, total = omegaCat(truevar = truevar,
threshold = threshold[[i]],
scales = latScales[[i]],
denom = SigmaHat[[i]]))
}
if ("omega2" %in% what) {
omega3 <- c(omega3, total = omegaCat(truevar = truevar,
threshold = threshold[[i]],
scales = latScales[[i]],
denom = S[[i]]))
}
} else {
alpha <- c(alpha, total = computeAlpha(S[[i]]))
omega1 <- c(omega1, total = sum(truevar) / (sum(truevar) + sum(te[[i]])))
omega2 <- c(omega2, total = sum(truevar) / (sum(SigmaHat[[i]])))
omega3 <- c(omega3, total = sum(truevar) / (sum(S[[i]])))
}
avevar <- c(avevar,
total = sum(diag(truevar)) / sum((diag(truevar) + diag(te[[i]]))))
}
if (all(is.na(alpha.ord))) alpha.ord <- NULL
result[[i]] <- rbind(alpha = if ("alpha" %in% what) alpha else NULL,
alpha.ord = if ("alpha.ord" %in% what) alpha.ord else NULL,
omega = if ("omega" %in% what) omega1 else NULL,
omega2 = if ("omega2" %in% what) omega2 else NULL,
omega3 = if ("omega3" %in% what) omega3 else NULL,
avevar = if ("ave" %in% what) avevar else NULL)
colnames(result[[i]])[1:length(allFacNames)] <- allFacNames
if (return.total[i] & length(myFacNames) > 1L) {
colnames(result[[i]])[ ncol(result[[i]]) ] <- "total"
}
if (length(idx.drop)) {
result[[i]] <- result[[i]][ , -idx.drop, drop = FALSE]
## reset indices for next block (could have different model/variables)
idx.drop <- numeric(0)
}
## end loop over blocks
}
warnCat <- sapply(result, function(x) any(c("alpha.ord","ave") %in% rownames(x)))
if (any(warnCat)) {
alphaMessage <- paste0('Zumbo et al.`s (2007) "ordinal alpha" is calculated',
' in addition to the standard alpha, which treats ',
'ordinal variables as numeric. See Chalmers (2018) ',
'for a critique of "alpha.ord" and the response by ',
'Zumbo & Kroc (2019).')
AVEmessage <- paste0('average variance extracted is calculated from ',
'polychoric (polyserial) not Pearson correlations.')
both <- "alpha.ord" %in% what & "ave" %in% what
connectMessage <- if (both) ' Likewise, ' else ' the '
catMessage <- paste0("For constructs with categorical indicators, ",
if ("alpha.ord" %in% what) alphaMessage else NULL,
if (both) ' Likewise, ' else NULL,
if ("ave" %in% what) AVEmessage else NULL)
if ("alpha.ord" %in% what || "ave" %in% what) message(catMessage, "\n")
}
if (length(warnHigher)) warning('Possible higher-order factors detected:\n',
paste(unique(warnHigher), sep = ", "))
if (warnTotal) {
message('Cannot return.total when model contains both continuous and ',
'binary/ordinal observed indicators. Use the ',
'omit.factors= argument to choose factors with only categorical ',
'indicators, if that is a composite of interest.\n')
}
if (warnOmega) {
message('Composite reliability (omega) cannot be computed for factors ',
'with mixed categorical and continuous indicators.')
}
## drop list structure?
if (nblocks == 1L) {
result <- result[[1]]
} else names(result) <- block.label
result
}
## ---------------
## reliabilityL2()
## (deprecated 10 May 2022)
## ---------------
##' Calculate the reliability values of a second-order factor
##'
##' Calculate the reliability values (coefficient omega) of a second-order
##' factor
##'
##' The first formula of the coefficient omega (in the
##' [reliability()]) will be mainly used in the calculation. The
##' model-implied covariance matrix of a second-order factor model can be
##' separated into three sources: the second-order common-factor variance,
##' the residual variance of the first-order common factors (i.e., not
##' accounted for by the second-order factor), and the measurement error of
##' observed indicators:
##'
##' \deqn{ \hat{\Sigma} = \Lambda \bold{B} \Phi_2 \bold{B}^{\prime}
##' \Lambda^{\prime} + \Lambda \Psi_{u} \Lambda^{\prime} + \Theta, }
##'
##' where \eqn{\hat{\Sigma}} is the model-implied covariance matrix,
##' \eqn{\Lambda} contains first-order factor loadings, \eqn{\bold{B}} contains
##' second-order factor loadings, \eqn{\Phi_2} is the covariance matrix of the
##' second-order factor(s), \eqn{\Psi_{u}} is the covariance matrix of residuals
##' from first-order factors, and \eqn{\Theta} is the covariance matrix of the
##' measurement errors from observed indicators. Thus, we can calculate the
##' proportion of variance of a composite score calculated from the observed
##' indicators (e.g., a total score or scale mean) that is attributable to the
##' second-order factor, i.e. coefficient omega at Level 1:
##'
##' \deqn{ \omega_{L1} = \frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi_2
##' \bold{B}^{\prime} \Lambda^{\prime} \bold{1}}{\bold{1}^{\prime} \Lambda
##' \bold{B} \Phi_2 \bold{B} ^{\prime} \Lambda^{\prime} \bold{1} +
##' \bold{1}^{\prime} \Lambda \Psi_{u} \Lambda^{\prime} \bold{1} +
##' \bold{1}^{\prime} \Theta \bold{1}}, }
##'
##' where \eqn{\bold{1}} is the *k*-dimensional vector of 1 and *k* is
##' the number of observed variables.
##'
##' The model-implied covariance matrix among first-order factors (\eqn{\Phi_1})
##' can be calculated as:
##'
##' \deqn{ \Phi_1 = \bold{B} \Phi_2 \bold{B}^{\prime} + \Psi_{u}, }
##'
##' Thus, the proportion of variance among first-order common factors that is
##' attributable to the second-order factor (i.e., coefficient omega at Level 2)
##' can be calculated as:
##'
##' \deqn{ \omega_{L2} = \frac{\bold{1}_F^{\prime} \bold{B} \Phi_2
##' \bold{B}^{\prime} \bold{1}_F}{\bold{1}_F^{\prime} \bold{B} \Phi_2
##' \bold{B}^{\prime} \bold{1}_F + \bold{1}_F^{\prime} \Psi_{u} \bold{1}_F}, }
##'
##' where \eqn{\bold{1}_F} is the *F*-dimensional vector of 1 and *F*
##' is the number of first-order factors. This Level-2 omega can be interpreted
##' as an estimate of the reliability of a hypothetical composite calculated
##' from error-free observable variables representing the first-order common
##' factors. This might only be meaningful as a thought experiment.
##'
##' An additional thought experiment is possible: If the observed indicators
##' contained only the second-order common-factor variance and unsystematic
##' measurement error, then there would be no first-order common factors because
##' their unique variances would be excluded from the observed measures. An
##' estimate of this hypothetical composite reliability can be calculated as the
##' partial coefficient omega at Level 1, or the proportion of observed
##' variance explained by the second-order factor after partialling out the
##' uniqueness from the first-order factors:
##'
##' \deqn{ \omega_{L1} = \frac{\bold{1}^{\prime} \Lambda \bold{B} \Phi_2
##' \bold{B}^{\prime} \Lambda^{\prime} \bold{1}}{\bold{1}^{\prime} \Lambda
##' \bold{B} \Phi_2 \bold{B}^{\prime} \Lambda^{\prime} \bold{1} +
##' \bold{1}^{\prime} \Theta \bold{1}}, }
##'
##' Note that if the second-order factor has a direct factor loading on some
##' observed variables, the observed variables will be counted as first-order
##' factors, which might not be desirable.
##'
##'
##' @importFrom lavaan lavInspect
##'
##' @param object A [lavaan::lavaan-class] or [lavaan.mi::lavaan.mi-class] object,
##' expected to contain a least one exogenous higher-order common factor.
##' @param secondFactor The name of a single second-order factor in the
##' model fitted in `object`. The function must be called multiple
##' times to estimate reliability for each higher-order factor.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return Reliability values at Levels 1 and 2 of the second-order factor, as
##' well as the partial reliability value at Level 1
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @examples
##'
##' HS.model3 <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' higher =~ visual + textual + speed'
##'
##' fit6 <- cfa(HS.model3, data = HolzingerSwineford1939)
##' reliability(fit6) # Should provide a warning for the endogenous variables
##' reliabilityL2(fit6, "higher")
##'
##' @name reliabilityL2-deprecated
##' @usage
##' reliabilityL2(object, secondFactor, omit.imps = c("no.conv", "no.se"))
##' @seealso [semTools-deprecated()]
##' @keywords internal
NULL
##' @rdname semTools-deprecated
##' @section Higher-Order Reliability:
##' Originally, composite reliability of a single higher-order factor was
##' estimated in a separate function: `reliabilityL2`. It is now available
##' for multiple higher-order factors in the same model, and from the same
##' [compRelSEM()] function that estimates reliability for first-order
##' factors, using the `higher=` argument to name higher-order factors in
##' the fitted model.
##'
##' @export
reliabilityL2 <- function(object, secondFactor,
omit.imps = c("no.conv","no.se")) {
.Deprecated(msg = c("\nThe reliabilityL2() function was deprecated in 2022 and ",
"will cease to be included in future versions of semTools",
". See help('semTools-deprecated) for details.",
"\n\nIt is replaced by the compRelSEM() function, which ",
"can estimate alpha and model-based reliability in an ",
"even wider variety of models and data types, with ",
"greater control in specifying the desired type of ",
"reliability coefficient (i.e., more explicitly choosing ",
"assumptions)."))
secondFactor <- as.character(secondFactor)[1] # only one at a time
ngroups <- lavInspect(object, "ngroups") #TODO: adapt to multiple levels
nLevels <- lavInspect(object, "nlevels")
nblocks <- ngroups*nLevels #FIXME: always true?
group.label <- if (ngroups > 1L) lavInspect(object, "group.label") else NULL
#FIXME? lavInspect(object, "level.labels")
clus.label <- if (nLevels > 1L) c("within", lavInspect(object, "cluster")) else NULL
if (nblocks > 1L) {
block.label <- paste(rep(group.label, each = nLevels), clus.label,
sep = if (ngroups > 1L && nLevels > 1L) "_" else "")
}
## parameters in GLIST format (not flat, need block-level list)
if (inherits(object, "lavaan")) {
param <- lavInspect(object, "est")
ve <- lavInspect(object, "cov.lv") # model-implied latent covariance matrix
S <- object@h1$implied$cov # observed sample covariance matrix (already a list)
if (nblocks == 1L) {
param <- list(param)
ve <- list(ve)
}
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
param <- object@coefList[[ useImps[1] ]] # first admissible as template
coefList <- object@coefList[useImps]
phiList <- object@phiList[useImps]
## add block-level list per imputation?
if (nblocks == 1L) {
param <- list(param)
for (i in 1:m) {
coefList[[i]] <- list(coefList[[i]])
phiList[[i]] <- list(phiList[[i]])
}
}
S <- vector("list", nblocks) # pooled observed covariance matrix
ve <- vector("list", nblocks)
## loop over blocks
for (b in 1:nblocks) {
## param: loop over GLIST elements
for (mat in names(param[[b]])) {
matList <- lapply(coefList, function(i) i[[b]][[mat]])
param[[b]][[mat]] <- Reduce("+", matList) / length(matList)
} # mat
## pooled observed covariance matrix
covList <- lapply(object@h1List[useImps], function(i) i$implied$cov[[b]])
S[[b]] <- Reduce("+", covList) / m
## pooled model-implied latent covariance matrix
ve[[b]] <- Reduce("+", lapply(phiList, "[[", i = b) ) / m
} # b
}
if (nblocks == 1L) {
SigmaHat <- getMethod("fitted", class(object))(object)["cov"] # retain list format
} else {
SigmaHat <- sapply(getMethod("fitted", class(object))(object),
"[[", "cov", simplify = FALSE)
}
ly <- lapply(param, "[[", "lambda")
te <- lapply(param, "[[", "theta")
ps <- lapply(param, "[[", "psi")
be <- lapply(param, "[[", "beta")
result <- list()
for (i in 1:nblocks) {
# Prepare for higher-order reliability
l2var <- ve[[i]][secondFactor, secondFactor, drop = FALSE]
l2load <- be[[1]][,secondFactor]
indexl2 <- which(l2load != 0)
commonl2 <- (sum(l2load)^2) * l2var
errorl2 <- sum(ps[[i]][indexl2, indexl2, drop = FALSE])
# Prepare for lower-order reliability
indexl1 <- which(apply(ly[[i]][,indexl2, drop = FALSE], 1, function(x) sum(x != 0)) > 0)
l1load <- ly[[i]][,indexl2] %*% as.matrix(be[[1]][indexl2, secondFactor, drop = FALSE])
commonl1 <- (sum(l1load)^2) * l2var
errorl1 <- sum(te[[i]][indexl1, indexl1, drop = FALSE])
uniquel1 <- 0
for (j in seq_along(indexl2)) {
uniquel1 <- uniquel1 + (sum(ly[[i]][,indexl2[j]])^2) * ps[[i]][indexl2[j], indexl2[j], drop = FALSE]
}
# Adjustment for direct loading from L2 to observed variables
if (any(ly[[i]][,secondFactor] != 0)) {
indexind <- which(ly[[i]][,secondFactor] != 0)
if (length(intersect(indexind, indexl1)) > 0)
stop("Direct and indirect loadings of higher-order factor to observed",
" variables are specified at the same time.")
commonl2 <- sum(c(ly[[i]][,secondFactor], l2load))^2 * l2var
errorl2 <- errorl2 + sum(te[[i]][indexind, indexind, drop = FALSE])
commonl1 <- sum(c(ly[[i]][,secondFactor], l1load))^2 * l2var
errorl1 <- errorl1 + sum(te[[i]][indexind, indexind, drop = FALSE])
}
# Calculate Reliability
omegaL1 <- commonl1 / (commonl1 + uniquel1 + errorl1)
omegaL2 <- commonl2 / (commonl2 + errorl2)
partialOmegaL1 <- commonl1 / (commonl1 + errorl1)
result[[i]] <- c(omegaL1 = omegaL1, omegaL2 = omegaL2, partialOmegaL1 = partialOmegaL1)
}
if (nblocks == 1L) {
result <- result[[1]]
} else names(result) <- block.label
result
}
## --------------
## maximalRelia()
## --------------
##' Calculate maximal reliability
##'
##' Calculate maximal reliability of a scale
##'
##' Given that a composite score (\eqn{W}) is a weighted sum of item scores:
##'
##' \deqn{ W = \bold{w}^\prime \bold{x} ,}
##'
##' where \eqn{\bold{x}} is a \eqn{k \times 1} vector of the scores of each
##' item, \eqn{\bold{w}} is a \eqn{k \times 1} weight vector of each item, and
##' \eqn{k} represents the number of items. Then, maximal reliability is
##' obtained by finding \eqn{\bold{w}} such that reliability attains its maximum
##' (Li, 1997; Raykov, 2012). Note that the reliability can be obtained by
##'
##' \deqn{ \rho = \frac{\bold{w}^\prime \bold{S}_T \bold{w}}{\bold{w}^\prime
##' \bold{S}_X \bold{w}}}
##'
##' where \eqn{\bold{S}_T} is the covariance matrix explained by true scores and
##' \eqn{\bold{S}_X} is the observed covariance matrix. Numerical method is used
##' to find \eqn{\bold{w}} in this function.
##'
##' For continuous items, \eqn{\bold{S}_T} can be calculated by
##'
##' \deqn{ \bold{S}_T = \Lambda \Psi \Lambda^\prime,}
##'
##' where \eqn{\Lambda} is the factor loading matrix and \eqn{\Psi} is the
##' covariance matrix among factors. \eqn{\bold{S}_X} is directly obtained by
##' covariance among items.
##'
##' For categorical items, Green and Yang's (2009) method is used for
##' calculating \eqn{\bold{S}_T} and \eqn{\bold{S}_X}. The element \eqn{i} and
##' \eqn{j} of \eqn{\bold{S}_T} can be calculated by
##'
##' \deqn{ \left[\bold{S}_T\right]_{ij} = \sum^{C_i - 1}_{c_i = 1} \sum^{C_j -
##' 1}_{c_j - 1} \Phi_2\left( \tau_{x_{c_i}}, \tau_{x_{c_j}}, \left[ \Lambda
##' \Psi \Lambda^\prime \right]_{ij} \right) - \sum^{C_i - 1}_{c_i = 1}
##' \Phi_1(\tau_{x_{c_i}}) \sum^{C_j - 1}_{c_j - 1} \Phi_1(\tau_{x_{c_j}}),}
##'
##' where \eqn{C_i} and \eqn{C_j} represents the number of thresholds in Items
##' \eqn{i} and \eqn{j}, \eqn{\tau_{x_{c_i}}} represents the threshold \eqn{c_i}
##' of Item \eqn{i}, \eqn{\tau_{x_{c_j}}} represents the threshold \eqn{c_i} of
##' Item \eqn{j}, \eqn{ \Phi_1(\tau_{x_{c_i}})} is the cumulative probability of
##' \eqn{\tau_{x_{c_i}}} given a univariate standard normal cumulative
##' distribution and \eqn{\Phi_2\left( \tau_{x_{c_i}}, \tau_{x_{c_j}}, \rho
##' \right)} is the joint cumulative probability of \eqn{\tau_{x_{c_i}}} and
##' \eqn{\tau_{x_{c_j}}} given a bivariate standard normal cumulative
##' distribution with a correlation of \eqn{\rho}
##'
##' Each element of \eqn{\bold{S}_X} can be calculated by
##'
##' \deqn{ \left[\bold{S}_T\right]_{ij} = \sum^{C_i - 1}_{c_i = 1} \sum^{C_j -
##' 1}_{c_j - 1} \Phi_2\left( \tau_{V_{c_i}}, \tau_{V_{c_j}}, \rho^*_{ij}
##' \right) - \sum^{C_i - 1}_{c_i = 1} \Phi_1(\tau_{V_{c_i}}) \sum^{C_j -
##' 1}_{c_j - 1} \Phi_1(\tau_{V_{c_j}}),}
##'
##' where \eqn{\rho^*_{ij}} is a polychoric correlation between Items \eqn{i}
##' and \eqn{j}.
##'
##'
##' @importFrom lavaan lavInspect lavNames
##'
##' @param object A [lavaan::lavaan-class] or [lavaan.mi::lavaan.mi-class] object,
##' expected to contain only exogenous common factors (i.e., a CFA model).
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return Maximal reliability values of each group. The maximal-reliability
##' weights are also provided. Users may extracted the weighted by the
##' `attr` function (see example below).
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso [reliability()] for reliability of an unweighted
##' composite score
##'
##' @references
##' Li, H. (1997). A unifying expression for the maximal reliability of a linear
##' composite. *Psychometrika, 62*(2), 245--249. \doi{10.1007/BF02295278}
##'
##' Raykov, T. (2012). Scale construction and development using structural
##' equation modeling. In R. H. Hoyle (Ed.), *Handbook of structural
##' equation modeling* (pp. 472--494). New York, NY: Guilford.
##'
##' @examples
##'
##' total <- 'f =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
##' fit <- cfa(total, data = HolzingerSwineford1939)
##' maximalRelia(fit)
##'
##' # Extract the weight
##' mr <- maximalRelia(fit)
##' attr(mr, "weight")
##'
##' @export
maximalRelia <- function(object, omit.imps = c("no.conv","no.se")) {
ngroups <- lavInspect(object, "ngroups") #TODO: adapt to multiple levels
nLevels <- lavInspect(object, "nlevels")
nblocks <- ngroups*nLevels #FIXME: always true?
group.label <- if (ngroups > 1L) lavInspect(object, "group.label") else NULL
#FIXME? lavInspect(object, "level.labels")
clus.label <- if (nLevels > 1L) c("within", lavInspect(object, "cluster")) else NULL
if (nblocks > 1L) {
block.label <- paste(rep(group.label, each = nLevels), clus.label,
sep = if (ngroups > 1L && nLevels > 1L) "_" else "")
}
## parameters in GLIST format (not flat, need block-level list)
if (inherits(object, "lavaan")) {
param <- lavInspect(object, "est")
ve <- lavInspect(object, "cov.lv") # model-implied latent covariance matrix
S <- object@h1$implied$cov # observed sample covariance matrix (already a list)
if (nblocks == 1L) {
param <- list(param)
ve <- list(ve)
}
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
param <- object@coefList[[ useImps[1] ]] # first admissible as template
coefList <- object@coefList[useImps]
phiList <- object@phiList[useImps]
## add block-level list per imputation?
if (nblocks == 1L) {
param <- list(param)
for (i in 1:m) {
coefList[[i]] <- list(coefList[[i]])
phiList[[i]] <- list(phiList[[i]])
}
}
S <- vector("list", nblocks) # pooled observed covariance matrix
ve <- vector("list", nblocks)
## loop over blocks
for (b in 1:nblocks) {
## param: loop over GLIST elements
for (mat in names(param[[b]])) {
matList <- lapply(coefList, function(i) i[[b]][[mat]])
param[[b]][[mat]] <- Reduce("+", matList) / length(matList)
} # mat
## pooled observed covariance matrix
covList <- lapply(object@h1List[useImps], function(i) i$implied$cov[[b]])
S[[b]] <- Reduce("+", covList) / m
## pooled model-implied latent covariance matrix
ve[[b]] <- Reduce("+", lapply(phiList, "[[", i = b) ) / m
} # b
}
if (nblocks == 1L) {
SigmaHat <- getMethod("fitted", class(object))(object)["cov"] # retain list format
} else {
SigmaHat <- sapply(getMethod("fitted", class(object))(object),
"[[", "cov", simplify = FALSE)
}
ly <- lapply(param, "[[", "lambda")
te <- lapply(param, "[[", "theta")
categorical <- lavInspect(object, "categorical")
threshold <- if (categorical) getThreshold(object, omit.imps = omit.imps) else NULL
result <- list()
for (i in 1:nblocks) {
truevar <- ly[[i]] %*% ve[[i]] %*% t(ly[[i]])
varnames <- colnames(truevar)
if (categorical) {
invstdvar <- 1 / sqrt(diag(SigmaHat[[i]]))
polyr <- diag(invstdvar) %*% truevar %*% diag(invstdvar)
nitem <- ncol(SigmaHat[[i]])
result[[i]] <- calcMaximalReliaCat(polyr, threshold[[i]], S[[i]], nitem, varnames)
} else {
result[[i]] <- calcMaximalRelia(truevar, S[[i]], varnames)
}
}
if (nblocks == 1L) {
result <- result[[1]]
} else names(result) <- block.label
result
}
## ----------------
## Hidden Functions
## ----------------
computeAlpha <- function(S, W = NULL) {
k <- nrow(S)
if (is.null(W)) {
## Traditional formula
ALPHA <- k/(k - 1) * (1 - sum(diag(S)) / sum(S))
} else {
stopifnot(length(W) == nrow(S))
#TODO? Develop theory for unequal weights.
# For now, force equal weights (when nonzero).
wt <- W != 0
k <- sum(wt)
#FIXME? Should k = sum(wt) even with unequal weights?
DIAG <- t(wt) %*% diag(diag(S)) %*% wt
ALL <- t(wt) %*% S %*% wt
ALPHA <- k/(k - 1) * (1 - DIAG / ALL)[1,1]
}
ALPHA
}
##' @importFrom stats cov2cor pnorm
omegaCat <- function(truevar, threshold, scales, denom, wt = 1) {
#TODO: How to incorporate varying distances between category weights
# (e.g., 0 = never, 0.5 = <1, 1.5 = 1-2, 4 = 3-5, 6 = >5)
#TODO: How to adapt for composites of continuous & categorical items?
## must be in standardized latent scale
R <- diag(scales) %*% truevar %*% diag(scales)
## denom could be model-implied polychoric correlation assuming diagonal theta,
## model-implied polychoric correlation accounting for error covariances,
## or "observed" polychoric correlation matrix.
## If parameterization="theta", standardize the polychoric coVARIANCE matrix
denom <- cov2cor(denom)
nitem <- ncol(denom)
if (length(wt) == 1L) {
wt <- rep(wt, nitem)
}
## initialize sums of cumulative probabilities
sumnum <- 0 # numerator
addden <- 0 # denominator
## loop over all pairs of items
for (j in 1:nitem) {
for (jp in 1:nitem) {
## initialize sums of cumulative probabilities *per item*
sumprobn2 <- 0
addprobn2 <- 0
## for each pair of items, loop over all their thresholds
t1 <- threshold[[j ]] * scales[j ] # on standardized latent scale
t2 <- threshold[[jp]] * scales[jp] #FIXME? subtract intercept (or marginal mean?)
for (c in 1:length(t1)) {
for (cp in 1:length(t2)) {
sumprobn2 <- sumprobn2 + p2(t1[c], t2[cp], R[j, jp])
addprobn2 <- addprobn2 + p2(t1[c], t2[cp], denom[j, jp])
}
}
sumprobn1 <- sum(pnorm(t1))
sumprobn1p <- sum(pnorm(t2))
## Add item weights (see Lu et al., 2020, doi:10.1037/met0000287)
sumnum <- sumnum + wt[j] * wt[jp] * (sumprobn2 - sumprobn1 * sumprobn1p)
addden <- addden + wt[j] * wt[jp] * (addprobn2 - sumprobn1 * sumprobn1p)
}
}
reliab <- sumnum / addden
reliab
}
p2 <- function(t1, t2, r) {
mnormt::pmnorm(c(t1, t2), c(0,0), matrix(c(1, r, r, 1), 2, 2))
}
# polycorLavaan <- function(object) {
# ngroups <- lavInspect(object, "ngroups")
# coef <- lavInspect(object, "est")
# targettaunames <- NULL
# if (ngroups == 1L) {
# targettaunames <- rownames(coef$tau)
# } else {
# targettaunames <- rownames(coef[[1]]$tau)
# }
# barpos <- sapply(strsplit(targettaunames, ""), function(x) which(x == "|"))
# varnames <- unique(apply(data.frame(targettaunames, barpos - 1), MARGIN = 1,
# FUN = function(x) substr(x[1], 1, x[2])))
# if (length(varnames))
# script <- ""
# for (i in 2:length(varnames)) {
# temp <- paste0(varnames[1:(i - 1)], collapse = " + ")
# temp <- paste0(varnames[i], "~~", temp, "\n")
# script <- paste(script, temp)
# }
# newobject <- refit(script, object)
# if (ngroups == 1L) {
# return(lavInspect(newobject, "est")$theta)
# }
# lapply(lavInspect(newobject, "est"), "[[", "theta")
# }
##' @importFrom lavaan lavInspect lavNames
getThreshold <- function(object, omit.imps = c("no.conv","no.se")) {
ngroups <- lavInspect(object, "ngroups") #TODO: add nlevels when capable
ordnames <- lavNames(object, "ov.ord")
if (inherits(object, "lavaan")) {
EST <- lavInspect(object, "est")
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
EST <- object@coefList[useImps]
}
if (ngroups == 1L) {
if (inherits(object, "lavaan")) {
thresholds <- EST$tau[,"threshold"]
} else if (inherits(object, "lavaan.mi")) {
tauList <- lapply(EST, function(x) x$tau[,"threshold"])
thresholds <- Reduce("+", tauList) / length(tauList)
}
result <- lapply(ordnames,
function(nn) thresholds[grepl(paste0(nn, "\\|"), names(thresholds))])
names(result) <- ordnames
## needs to be within a list when called above within block-loops
result <- list(result)
} else {
thresholds <- vector("list", ngroups)
for (g in 1:ngroups) {
if (inherits(object, "lavaan")) {
thresholds[[g]] <- EST[[g]]$tau[,"threshold"]
} else if (inherits(object, "lavaan.mi")) {
tauList <- lapply(EST, function(x) x[[g]]$tau[,"threshold"])
thresholds[[g]] <- Reduce("+", tauList) / length(tauList)
}
}
result <- list()
group.label <- lavInspect(object, "group.label")
for (g in 1:ngroups) {
result[[ group.label[g] ]] <- lapply(ordnames, function(nn) {
thresholds[[g]][ grepl(paste0(nn, "\\|"), names(thresholds[[g]])) ]
})
names(result[[ group.label[g] ]]) <- ordnames
}
}
return(result)
}
##' @importFrom lavaan lavInspect lavNames
getScales <- function(object, omit.imps = c("no.conv","no.se")) {
ngroups <- lavInspect(object, "ngroups") #TODO: add nlevels when capable
ordnames <- lavNames(object, "ov.ord") #TODO: use to allow mix of cat/con vars
if (inherits(object, "lavaan")) {
EST <- lavInspect(object, "est")
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
EST <- object@coefList[useImps]
}
if (ngroups == 1L) {
if (inherits(object, "lavaan")) {
result <- list(EST$delta[,"scales"])
} else if (inherits(object, "lavaan.mi")) {
scales <- lapply(EST, function(x) x$delta[,"scales"])
result <- list(Reduce("+", scales) / length(scales))
}
} else {
result <- vector("list", ngroups)
for (g in 1:ngroups) {
if (inherits(object, "lavaan")) {
result[[g]] <- EST[[g]]$delta[,"scales"]
} else if (inherits(object, "lavaan.mi")) {
scales <- lapply(EST, function(x) x[[g]]$delta[,"scales"])
result[[g]] <- Reduce("+", scales) / length(scales)
}
}
}
return(result)
}
invGeneralRelia <- function(w, truevar, totalvar) {
1 - (t(w) %*% truevar %*% w) / (t(w) %*% totalvar %*% w)
}
#' @importFrom stats pnorm
invGeneralReliaCat <- function(w, polyr, threshold, denom, nitem) {
# denom could be polychoric correlation, model-implied correlation, or model-implied without error correlation
upper <- matrix(NA, nitem, nitem)
lower <- matrix(NA, nitem, nitem)
for (j in 1:nitem) {
for (jp in 1:nitem) {
sumprobn2 <- 0
addprobn2 <- 0
t1 <- threshold[[j]]
t2 <- threshold[[jp]]
for (c in 1:length(t1)) {
for (cp in 1:length(t2)) {
sumprobn2 <- sumprobn2 + p2(t1[c], t2[cp], polyr[j, jp])
addprobn2 <- addprobn2 + p2(t1[c], t2[cp], denom[j, jp])
}
}
sumprobn1 <- sum(pnorm(t1))
sumprobn1p <- sum(pnorm(t2))
upper[j, jp] <- (sumprobn2 - sumprobn1 * sumprobn1p)
lower[j, jp] <- (addprobn2 - sumprobn1 * sumprobn1p)
}
}
1 - (t(w) %*% upper %*% w) / (t(w) %*% lower %*% w)
}
#' @importFrom stats nlminb
calcMaximalRelia <- function(truevar, totalvar, varnames) {
start <- rep(1, nrow(truevar))
out <- nlminb(start, invGeneralRelia, truevar = truevar, totalvar = totalvar)
if (out$convergence != 0) stop("The numerical method for finding the maximal",
" reliability did not converge.")
result <- 1 - out$objective
weight <- out$par / mean(out$par)
names(weight) <- varnames
attr(result, "weight") <- weight
result
}
#' @importFrom stats nlminb
calcMaximalReliaCat <- function(polyr, threshold, denom, nitem, varnames) {
start <- rep(1, nrow(polyr))
out <- nlminb(start, invGeneralReliaCat, polyr = polyr, threshold = threshold,
denom = denom, nitem = nitem)
if (out$convergence != 0) stop("The numerical method for finding the maximal",
" reliability did not converge.")
result <- 1 - out$objective
weight <- out$par / mean(out$par)
names(weight) <- varnames
attr(result, "weight") <- weight
result
}
semTools/R/clipboard.R 0000644 0001762 0000144 00000032564 15142325143 014354 0 ustar ligges users ### Sunthud Pornprasertmanit & Terrence D. Jorgensen
### Last updated: 12 March 2025
### Copy or save each aspect of the lavaan object into a clipboard or a file
##' Copy or save the result of `lavaan` or `FitDiff` objects into a
##' clipboard or a file
##'
##' Copy or save the result of `lavaan` or [FitDiff-class]
##' object into a clipboard or a file. From the clipboard, users may paste the
##' result into the Microsoft Excel or spreadsheet application to create a table
##' of the output.
##'
##'
##' @aliases clipboard saveFile
##'
##' @param object An object of class [lavaan::lavaan-class] or
##' [FitDiff-class].
##' @param what The attributes of the `lavaan` object to be copied in the
##' clipboard. `"summary"` is to copy the screen provided from the
##' `summary` function. `"epceqfit"` is to copy the result from the
##' [epcEquivFit()] function. Other attributes listed in the
##' `inspect` method in the [lavaan::lavaan-class] could also be
##' used, such as `"coef"`, `"se"`, `"fit"`, `"samp"`, and
##' so on. Ignored for [FitDiff-class]-class objects.
##' @param file A file name used for saving the result.
##' @param tableFormat If `TRUE`, save the result in the table format using
##' tabs for separation. Otherwise, save the result as the output screen
##' printed in the R console.
##' @param fit.measures `character` vector specifying names of fit measures
##' returned by [lavaan::fitMeasures()] to be copied/saved. Only
##' relevant if `object` is class [FitDiff-class].
##' @param writeArgs `list` of additional arguments to be passed to
##' [utils::write.table()]
##' @param \dots Additional arguments when passing a `lavaan` object to the
##' `summary` or [epcEquivFit()] function.
##'
##' @return The resulting output will be saved into a clipboard or a file. If
##' using the `clipboard` function, users may paste it in the other
##' applications.
##'
##' @author
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@gmail.com})
##'
##' @examples
##'
##' library(lavaan)
##' HW.model <- ' visual =~ x1 + c1*x2 + x3
##' textual =~ x4 + c1*x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' fit <- cfa(HW.model, data = HolzingerSwineford1939, group = "school")
##'
##' if(interactive()){
##' # Copy the summary of the lavaan object
##' clipboard(fit)
##'
##' # pass additional arguments to summary() method for class?lavaan
##' clipboard(fit, rsquare = TRUE, standardized = TRUE, fit.measures = TRUE)
##'
##' # Copy the EPC equivalence testing results from the epcEquivFit() function
##' clipboard(fit, "epceqfit")
##'
##' # Copy the parameter estimates
##' clipboard(fit, "coef")
##'
##' # Copy the standard errors
##' clipboard(fit, "se")
##'
##' # Copy the sample statistics
##' clipboard(fit, "samp")
##'
##' # Copy the fit measures
##' clipboard(fit, "fit")
##'
##' # Save the summary of the lavaan object
##' saveFile(fit, "out.txt")
##'
##' # Save the EPC equivalence testing results from the epcEquivFit() function
##' saveFile(fit, "out.txt", "epceqfit")
##'
##' # Save the parameter estimates
##' saveFile(fit, "out.txt", "coef")
##'
##' # Save the standard errors
##' saveFile(fit, "out.txt", "se")
##'
##' # Save the sample statistics
##' saveFile(fit, "out.txt", "samp")
##'
##' # Save the fit measures
##' saveFile(fit, "out.txt", "fit")
##' }
##'
##' @export
clipboard <- function(object, what = "summary", ...) {
if (.Platform$OS.type == "windows") {
saveFile(object, file = "clipboard-128", what = what, tableFormat = TRUE, ...)
cat("File saved in the clipboard; please paste it in any program you wish.\n")
} else {
## Mac OS?
if (system("pbcopy -help", ignore.stderr = TRUE) == 0) {
CON <- pipe("pbcopy", "w")
on.exit(close(CON))
saveFile(object, file = CON, what = what, tableFormat = TRUE, ...)
cat("File saved in the clipboard; please paste it in any program you wish. If you cannot paste it, it is okay because this function works for some computers, which I still have no explanation currently. Please consider using the 'saveFile' function instead.\n")
} else if (system("xclip -version", ignore.stderr = TRUE) == 0) {
## Linux OS?
CON <- pipe("xclip -i", "w")
on.exit(close(CON))
saveFile(object, file = CON, what = what, tableFormat = TRUE, ...)
cat("File saved in the xclip; please paste it in any program you wish. If you cannot paste it, it is okay because this function works for some computers, which I still have no explanation currently. Please consider using the 'saveFile' function instead.\n")
} else {
stop("For Mac users, the 'pbcopy' command in the shell file does not work. For linux users, this function depends on the 'xclip' application. Please install and run the xclip application before using this function in R (it does not guarantee to work though). Alternatively, use the 'saveFile' function to write the output into a file.")
}
}
}
##' @rdname clipboard
##' @export
saveFile <- function(object, file, what = "summary", tableFormat = FALSE,
fit.measures = "default", writeArgs = list(), ...) {
# Check whether the object is in the lavaan class
if (is(object, "lavaan")) {
saveFileLavaan(object, file, what = what, tableFormat = tableFormat,
writeArgs = writeArgs, ...)
} else if (is(object, "FitDiff")) {
saveFileFitDiff(object, file, what = what, tableFormat = tableFormat,
fit.measures = fit.measures, writeArgs = writeArgs)
} else {
stop("The object must be a class?lavaan object or the",
" output from the compareFit() function.")
}
}
## ----------------
## Hidden functions
## ----------------
##' @importFrom lavaan lavInspect
saveFileLavaan <- function(object, file, what = "summary", tableFormat = FALSE,
writeArgs = list(), ...) {
if (length(what) > 1) message("only the first `what' option is used")
# be case insensitive
what <- tolower(what[1])
writeArgs$file <- file
if (is.null(writeArgs$sep)) writeArgs$sep <- "\t"
if (is.null(writeArgs$quote)) writeArgs$quote <- FALSE
if (what == "summary") {
if (tableFormat) {
writeArgs <- copySummary(object, file = file, writeArgs = writeArgs, ...)
} else {
writeArgs$x <- paste(utils::capture.output(summary(object, ...)),
collapse = "\n")
}
} else if (what %in% c("epceqfit", "mifit")) { # "mifit" retained for backward compatibility
if (tableFormat) {
writeArgs$x <- epcEquivFit(object, ...)
if (is.null(writeArgs$row.names)) writeArgs$row.names <- FALSE
if (is.null(writeArgs$col.names)) writeArgs$col.names <- TRUE
} else {
writeArgs$x <- paste(utils::capture.output(epcEquivFit(object, ...)),
collapse = "\n")
}
} else {
target <- lavInspect(object, what=what)
if (tableFormat) {
if (is(target, "lavaan.data.frame") || is(target, "data.frame")) {
writeArgs$x <- target
if (is.null(writeArgs$row.names)) writeArgs$row.names <- FALSE
if (is.null(writeArgs$col.names)) writeArgs$col.names <- TRUE
} else if (is(target, "list")) {
if (is(target[[1]], "list")) {
target <- lapply(target, listToDataFrame)
target <- mapply(function(x, y) rbind(rep("", ncol(y)), c(x, rep("", ncol(y) - 1)), y),
names(target), target, SIMPLIFY = FALSE)
writeArgs$x <- do.call(rbind, target)
if (is.null(writeArgs$row.names)) writeArgs$row.names <- FALSE
if (is.null(writeArgs$col.names)) writeArgs$col.names <- FALSE
} else {
writeArgs$x <- listToDataFrame(target)
if (is.null(writeArgs$row.names)) writeArgs$row.names <- FALSE
if (is.null(writeArgs$col.names)) writeArgs$col.names <- FALSE
}
} else {
writeArgs$x <- target
if (is.null(writeArgs$row.names)) writeArgs$row.names <- TRUE
if (is.null(writeArgs$col.names)) writeArgs$col.names <- TRUE
}
} else {
writeArgs$x <- paste(utils::capture.output(target), collapse = "\n")
}
}
do.call("write.table", writeArgs)
}
## copySummary: copy the summary of the lavaan object into the clipboard and
## potentially be useful if users paste it into the Excel application
## object = lavaan object input
copySummary <- function(object, file, writeArgs = list(), ...) {
# Capture the output of the lavaan class
outputText <- utils::capture.output(lavaan::summary(object, ...))
# Split the text by two spaces
outputText <- strsplit(outputText, " ")
# Trim and delete the "" elements
outputText <- lapply(outputText, function(x) x[x != ""])
outputText <- lapply(outputText, trim)
outputText <- lapply(outputText, function(x) x[x != ""])
# Group the output into three sections: fit, parameter estimates, and r-squared
cut1 <- grep("Estimate", outputText)[1]
cut2 <- grep("R-Square", outputText)[1]
if (is.na(cut2)) {
## no R-squared output requested, so set2 == set3
cut2 <- length(outputText)
}
set1 <- outputText[1:(cut1 - 1)]
set2 <- outputText[cut1:(cut2 - 1)]
set3 <- outputText[cut2:length(outputText)]
# Assign the number of columns in the resulting data frame and check whether the output contains any labels
numcol <- 7
test <- set2[-grep("Estimate", set2)]
test <- test[sapply(test, length) >= 2]
if (any(sapply(test, function(x) is.na(suppressWarnings(as.numeric(x[2])))))) numcol <- numcol + 1
# A function to parse the fit-measures output
set1Parse <- function(x, numcol) {
if (length(x) == 0) {
return(rep("", numcol))
} else if (length(x) == 1) {
return(c(x, rep("", numcol - 1)))
} else if ((length(x) >= 2) & (length(x) <= numcol)) {
return(c(x[1], rep("", numcol - length(x)), x[2:length(x)]))
} else {
stop("Cannot parse text")
}
}
set1 <- t(sapply(set1, set1Parse, numcol))
# A function to parse the parameter-estimates output
set2Parse <- function(x, numcol) {
if (length(x) == 0) return(rep("", numcol))
if (any(grepl("Estimate", x))) return(c(rep("", numcol-length(x)), x))
if (length(x) == 1) {
return(c(x, rep("", numcol-1)))
} else {
group1 <- x[1]
group2 <- x[2:length(x)]
if (is.na(suppressWarnings(as.numeric(x[2])))) {
group1 <- x[1:2]
group2 <- x[3:length(x)]
} else if (numcol == 8) {
group1 <- c(group1, "")
}
if (length(group2) == 1) {
group2 <- c(group2, rep("", 6 - length(group2)))
} else if (length(group2) == 4) {
group2 <- c(group2, rep("", 6 - length(group2)))
} else {
group2 <- c(group2[1], rep("", 6 - length(group2)), group2[2:length(group2)])
}
return(c(group1, group2))
}
}
set2 <- t(sapply(set2, set2Parse, numcol))
# A function to parse the r-squared output
set3Parse <- function(x, numcol) {
if (length(x) == 0) {
return(rep("", numcol))
} else {
return(c(x, rep("", numcol - length(x))))
}
}
set3 <- t(sapply(set3, set3Parse, numcol))
# Copy the output into the clipboard
writeArgs$x <- rbind(set1, set2, set3)
writeArgs$file <- file
if (is.null(writeArgs$quote)) writeArgs$quote <- FALSE
if (is.null(writeArgs$sep)) writeArgs$sep <- "\t"
if (is.null(writeArgs$row.names)) writeArgs$row.names <- FALSE
if (is.null(writeArgs$col.names)) writeArgs$col.names <- FALSE
# do.call("write.table", writeArgs)
writeArgs
}
## trim function from the R.oo package
trim <- function(object) {
s <- sub("^[\t\n\f\r ]*", "", as.character(object));
s <- sub("[\t\n\f\r ]*$", "", s);
s;
}
## listToDataFrame: Change a list with multiple elements into a single data.frame
listToDataFrame <- function(object) {
name <- names(object)
# Count the maximum number of column (+1 is for the column for row name)
numcol <- max(sapply(object, function(x) ifelse(is(x, "lavaan.matrix") || is(x, "lavaan.matrix.symmetric") || is(x, "matrix") || is(x, "data.frame"), return(ncol(x)), return(1)))) + 1
# Change all objects in the list into a data.frame with the specified column
target <- lapply(object, niceDataFrame, numcol)
# Paste the name of each object into each data.frame
target <- mapply(function(x, y) rbind(rep("", ncol(y)), c(x, rep("", ncol(y) - 1)), y), name, target, SIMPLIFY=FALSE)
# Combine into a single data.frame
target <- do.call(rbind, target)
target[-1,]
}
## niceDataFrame: Change an object into a data.frame with a specified number of
## columns and the row and column names are included in the data.frame
niceDataFrame <- function(object, numcol) {
temp <- NULL
if (is(object, "lavaan.matrix.symmetric")) {
# save only the lower diagonal of the symmetric matrix
temp <- matrix("", nrow(object), ncol(object))
for (i in 1:nrow(object)) {
temp[i, 1:i] <- object[i, 1:i]
}
} else if (is(object, "data.frame") || is(object, "matrix") || is(object, "lavaan.matrix")) {
# copy the matrix
temp <- object
} else if (is(object, "vector") || is(object, "lavaan.vector")) {
# transform a vector into a matrix
object <- as.matrix(object)
temp <- object
} else {
stop("The 'niceDataFrame' function has a bug. Please contact the developer.")
}
# Transform into the result with a specified number of columns, excluding the row name
result <- matrix("", nrow(temp), numcol - 1)
# Parse the column names
result[,1:ncol(temp)] <- temp
firstRow <- colnames(object)
ifelse(is.null(firstRow), firstRow <- rep("", ncol(result)), firstRow <- c(firstRow, rep("", numcol - length(firstRow) - 1)))
# Parse the row names
result <- rbind(firstRow, result)
firstCol <- rownames(object)
ifelse(is.null(firstCol), firstCol <- rep("", nrow(result)), firstCol <- c("", firstCol))
result <- cbind(firstCol, result)
dimnames(result) <- NULL
result
}
semTools/R/partialInvariance.R 0000644 0001762 0000144 00000262173 15142322171 016050 0 ustar ligges users ### Sunthud Pornprasertmanit
### Last updated: 9 February 2026
##' Partial Measurement Invariance Testing Across Groups
##'
##' This test will provide partial invariance testing by (a) freeing a parameter
##' one-by-one from nested model and compare with the original nested model or
##' (b) fixing (or constraining) a parameter one-by-one from the parent model
##' and compare with the original parent model. This function only works with
##' congeneric models. The `partialInvariance` is used for continuous
##' variable. The `partialInvarianceCat` is used for categorical variables.
##'
##' There are four types of partial invariance testing:
##'
##' \itemize{
##' \item Partial weak invariance. The model named `fit.configural`
##' from the list of models is compared with the model named `fit.loadings`.
##' Each loading will be freed or fixed from the metric and configural
##' invariance models respectively. The modified models are compared with the
##' original model. Note that the objects in the list of models must have the
##' names of `"fit.configural"` and `"fit.loadings"`. Users may use "metric",
##' "weak", "loading", or "loadings" in the `type` argument. Note that, for
##' testing invariance on marker variables, other variables will be assigned as
##' marker variables automatically.
##'
##' \item Partial strong invariance. The model
##' named `fit.loadings` from the list of models is compared with the model
##' named either `fit.intercepts` or 'fit.thresholds'. Each intercept will be
##' freed or fixed from the scalar and metric invariance models respectively.
##' The modified models are compared with the original model. Note that the
##' objects in the list of models must have the names of "fit.loadings" and
##' either "fit.intercepts" or "fit.thresholds". Users may use "scalar",
##' "strong", "intercept", "intercepts", "threshold", or "thresholds" in the
##' `type` argument. Note that, for testing invariance on marker variables,
##' other variables will be assigned as marker variables automatically. Note
##' that if all variables are dichotomous, scalar invariance testing is not
##' available.
##'
##' \item Partial strict invariance. The model named either
##' 'fit.intercepts' or 'fit.thresholds' (or 'fit.loadings') from the list of
##' models is compared with the model named 'fit.residuals'. Each residual
##' variance will be freed or fixed from the strict and scalar (or metric)
##' invariance models respectively. The modified models are compared with the
##' original model. Note that the objects in the list of models must have the
##' names of "fit.residuals" and either "fit.intercepts", "fit.thresholds", or
##' "fit.loadings". Users may use "strict", "residual", "residuals", "error", or
##' "errors" in the `type` argument.
##'
##' \item Partial mean invariance. The
##' model named either 'fit.intercepts' or 'fit.thresholds' (or 'fit.residuals'
##' or 'fit.loadings') from the list of models is compared with the model named
##' 'fit.means'. Each factor mean will be freed or fixed from the means and
##' scalar (or strict or metric) invariance models respectively. The modified
##' models are compared with the original model. Note that the objects in the
##' list of models must have the names of "fit.means" and either
##' "fit.residuals", "fit.intercepts", "fit.thresholds", or "fit.loadings".
##' Users may use "means" or "mean" in the `type` argument. }
##'
##' Two types of comparisons are used in this function:
##' \enumerate{
##' \item `free`: The nested model is used as a template. Then, one
##' parameter indicating the differences between two models is free. The new
##' model is compared with the nested model. This process is repeated for all
##' differences between two models. The likelihood-ratio test and the difference
##' in CFI are provided.
##'
##' \item `fix`: The parent model is used as a template. Then, one parameter
##' indicating the differences between two models is fixed or constrained to be
##' equal to other parameters. The new model is then compared with the parent
##' model. This process is repeated for all differences between two models. The
##' likelihood-ratio test and the difference in CFI are provided.
##'
##' \item `wald`: This method is similar to the `fix` method. However,
##' instead of building a new model and compare them with likelihood-ratio test,
##' multivariate wald test is used to compare equality between parameter
##' estimates. See [lavaan::lavTestWald()] for further details. Note
##' that if any rows of the contrast cannot be summed to 0, the Wald test is not
##' provided, such as comparing two means where one of the means is fixed as 0.
##' This test statistic is not as accurate as likelihood-ratio test provided in
##' `fix`. I provide it here in case that likelihood-ratio test fails to
##' converge.
##' }
##'
##' Note that this function does not adjust for the inflated Type I error rate
##' from multiple tests. The degree of freedom of all tests would be the number
##' of groups minus 1.
##'
##' The details of standardized estimates and the effect size used for each
##' parameters are provided in the vignettes by running
##' `vignette("partialInvariance")`.
##'
##' @importFrom lavaan lavInspect parTable
##' @aliases partialInvariance partialInvarianceCat
##'
##' @param fit A list of models for invariance testing. Each model should be
##' assigned by appropriate names (see details).
##' @param type The types of invariance testing: "metric", "scalar", "strict",
##' or "means"
##' @param free A vector of variable names that are free across groups in
##' advance. If partial mean invariance is tested, this argument represents a
##' vector of factor names that are free across groups.
##' @param fix A vector of variable names that are constrained to be equal
##' across groups in advance. If partial mean invariance is tested, this
##' argument represents a vector of factor names that are fixed across groups.
##' @param refgroup The reference group used to make the effect size comparison
##' with the other groups.
##' @param poolvar If `TRUE`, the variances are pooled across group for
##' standardization. Otherwise, the variances of the reference group are used
##' for standardization.
##' @param p.adjust The method used to adjust p values. See
##' [stats::p.adjust()] for the options for adjusting p values. The
##' default is to not use any corrections.
##' @param fbound The z-scores of factor that is used to calculate the effect
##' size of the loading difference proposed by Millsap and Olivera-Aguilar
##' (2012).
##' @param return.fit Return the submodels fitted by this function
##' @param method The method used to calculate likelihood ratio test. See
##' [lavaan::lavTestLRT()] for available options
##'
##' @return A list of results are provided. The list will consists of at least
##' two elements:
##' \enumerate{
##' \item `estimates`: The results of parameter estimates including pooled
##' estimates (`poolest`), the estimates for each group, standardized
##' estimates for each group (`std`), the difference in standardized
##' values, and the effect size statistic (*q* for factor loading
##' difference and *h* for error variance difference). See the details of
##' this effect size statistic by running `vignette("partialInvariance")`.
##' In the `partialInvariance` function, the additional effect statistics
##' proposed by Millsap and Olivera-Aguilar (2012) are provided. For factor
##' loading, the additional outputs are the observed mean difference
##' (`diff_mean`), the mean difference if factor scores are low
##' (`low_fscore`), and the mean difference if factor scores are high
##' (`high_fscore`). The low factor score is calculated by (a) finding the
##' factor scores that its *z* score equals -`bound` (the default is
##' \eqn{-2}) from all groups and (b) picking the minimum value among the
##' factor scores. The high factor score is calculated by (a) finding the
##' factor scores that its *z* score equals `bound` (default = 2)
##' from all groups and (b) picking the maximum value among the factor scores.
##' For measurement intercepts, the additional outputs are the observed means
##' difference (`diff_mean`) and the proportion of the differences in the
##' intercepts over the observed means differences (`propdiff`). For error
##' variances, the additional outputs are the proportion of the difference in
##' error variances over the difference in observed variances (`propdiff`).
##'
##' \item `results`: Statistical tests as well as the change in CFI are
##' provided. \eqn{\chi^2} and *p* value are provided for all methods.
##'
##' \item `models`: The submodels used in the `free` and `fix`
##' methods, as well as the nested and parent models. The nested and parent
##' models will be changed from the original models if `free` or
##' `fit` arguments are specified.
##' }
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @references Millsap, R. E., & Olivera-Aguilar, M. (2012). Investigating
##' measurement invariance using confirmatory factor analysis. In R. H. Hoyle
##' (Ed.), *Handbook of structural equation modeling* (pp. 380--392). New
##' York, NY: Guilford.
##'
##' @examples
##'
##' ## Conduct weak invariance testing manually by using fixed-factor
##' ## method of scale identification
##'
##' library(lavaan)
##'
##' conf <- "
##' f1 =~ NA*x1 + x2 + x3
##' f2 =~ NA*x4 + x5 + x6
##' f1 ~~ c(1, 1)*f1
##' f2 ~~ c(1, 1)*f2
##' "
##'
##' weak <- "
##' f1 =~ NA*x1 + x2 + x3
##' f2 =~ NA*x4 + x5 + x6
##' f1 ~~ c(1, NA)*f1
##' f2 ~~ c(1, NA)*f2
##' "
##'
##' configural <- cfa(conf, data = HolzingerSwineford1939, std.lv = TRUE, group="school")
##' weak <- cfa(weak, data = HolzingerSwineford1939, group="school", group.equal="loadings")
##' models <- list(fit.configural = configural, fit.loadings = weak)
##' partialInvariance(models, "metric")
##'
# \donttest{
# partialInvariance(models, "metric", free = "x5") # "x5" is free across groups in advance
# partialInvariance(models, "metric", fix = "x4") # "x4" is fixed across groups in advance
#
# ## Use the result from the measurementInvariance function
# HW.model <- ' visual =~ x1 + x2 + x3
# textual =~ x4 + x5 + x6
# speed =~ x7 + x8 + x9 '
#
# models2 <- measurementInvariance(model = HW.model, data=HolzingerSwineford1939,
# group="school")
# partialInvariance(models2, "scalar")
#
# ## Conduct weak invariance testing manually by using fixed-factor
# ## method of scale identification for dichotomous variables
#
# f <- rnorm(1000, 0, 1)
# u1 <- 0.9*f + rnorm(1000, 1, sqrt(0.19))
# u2 <- 0.8*f + rnorm(1000, 1, sqrt(0.36))
# u3 <- 0.6*f + rnorm(1000, 1, sqrt(0.64))
# u4 <- 0.7*f + rnorm(1000, 1, sqrt(0.51))
# u1 <- as.numeric(cut(u1, breaks = c(-Inf, 0, Inf)))
# u2 <- as.numeric(cut(u2, breaks = c(-Inf, 0.5, Inf)))
# u3 <- as.numeric(cut(u3, breaks = c(-Inf, 0, Inf)))
# u4 <- as.numeric(cut(u4, breaks = c(-Inf, -0.5, Inf)))
# g <- rep(c(1, 2), 500)
# dat2 <- data.frame(u1, u2, u3, u4, g)
#
# configural2 <- "
# f1 =~ NA*u1 + u2 + u3 + u4
# u1 | c(t11, t11)*t1
# u2 | c(t21, t21)*t1
# u3 | c(t31, t31)*t1
# u4 | c(t41, t41)*t1
# f1 ~~ c(1, 1)*f1
# f1 ~ c(0, NA)*1
# u1 ~~ c(1, 1)*u1
# u2 ~~ c(1, NA)*u2
# u3 ~~ c(1, NA)*u3
# u4 ~~ c(1, NA)*u4
# "
#
# outConfigural2 <- cfa(configural2, data = dat2, group = "g",
# parameterization = "theta", estimator = "wlsmv",
# ordered = c("u1", "u2", "u3", "u4"))
#
# weak2 <- "
# f1 =~ NA*u1 + c(f11, f11)*u1 + c(f21, f21)*u2 + c(f31, f31)*u3 + c(f41, f41)*u4
# u1 | c(t11, t11)*t1
# u2 | c(t21, t21)*t1
# u3 | c(t31, t31)*t1
# u4 | c(t41, t41)*t1
# f1 ~~ c(1, NA)*f1
# f1 ~ c(0, NA)*1
# u1 ~~ c(1, 1)*u1
# u2 ~~ c(1, NA)*u2
# u3 ~~ c(1, NA)*u3
# u4 ~~ c(1, NA)*u4
# "
#
# outWeak2 <- cfa(weak2, data = dat2, group = "g", parameterization = "theta",
# estimator = "wlsmv", ordered = c("u1", "u2", "u3", "u4"))
# modelsCat <- list(fit.configural = outConfigural2, fit.loadings = outWeak2)
#
# partialInvarianceCat(modelsCat, type = "metric")
#
# partialInvarianceCat(modelsCat, type = "metric", free = "u2")
# partialInvarianceCat(modelsCat, type = "metric", fix = "u3")
#
# ## Use the result from the measurementInvarianceCat function
#
# model <- ' f1 =~ u1 + u2 + u3 + u4
# f2 =~ u5 + u6 + u7 + u8'
#
# modelsCat2 <- measurementInvarianceCat(model = model, data = datCat, group = "g",
# parameterization = "theta",
# estimator = "wlsmv", strict = TRUE)
#
# partialInvarianceCat(modelsCat2, type = "scalar")
# }
##'
##' @export
partialInvariance <- function(fit, type, free = NULL, fix = NULL, refgroup = 1,
poolvar = TRUE, p.adjust = "none", fbound = 2,
return.fit = FALSE, method = "satorra.bentler.2001") {
type <- tolower(type)
numType <- 0
fit1 <- fit0 <- NULL
# fit0 = Nested model, fit1 = Parent model
if(type %in% c("metric", "weak", "loading", "loadings")) {
numType <- 1
if(all(c("fit.configural", "fit.loadings") %in% names(fit))) {
fit1 <- fit$fit.configural
fit0 <- fit$fit.loadings
} else {
stop("The elements named 'fit.configural' and 'fit.loadings' are needed in the 'fit' argument")
}
} else if (type %in% c("scalar", "strong", "intercept", "intercepts", "threshold", "thresholds")) {
numType <- 2
if(all(c("fit.loadings", "fit.intercepts") %in% names(fit))) {
fit1 <- fit$fit.loadings
fit0 <- fit$fit.intercepts
} else {
stop("The elements named 'fit.loadings' and 'fit.intercepts' are needed in the 'fit' argument")
}
} else if (type %in% c("strict", "residual", "residuals", "error", "errors")) {
numType <- 3
if(all(c("fit.intercepts", "fit.residuals") %in% names(fit))) {
fit1 <- fit$fit.intercepts
fit0 <- fit$fit.residuals
} else {
stop("The elements named 'fit.intercepts' and 'fit.residuals' are needed in the 'fit' argument")
}
} else if (type %in% c("means", "mean")) {
numType <- 4
if("fit.means" %in% names(fit)) {
fit0 <- fit$fit.means
if("fit.residuals" %in% names(fit)) {
fit1 <- fit$fit.residuals
} else if ("fit.intercepts" %in% names(fit)) {
fit1 <- fit$fit.intercepts
} else {
stop("The elements named either 'fit.residuals' or 'fit.intercepts ' is needed in the 'fit' argument")
}
} else {
stop("The elements named 'fit.means' is needed in the 'fit' argument")
}
} else {
stop("Please specify the correct type of measurement invariance. See the help page.")
}
pt1 <- parTable(fit1)
pt0 <- parTable(fit0)
pt0$start <- pt0$est <- pt0$se <- NULL
pt1$start <- pt1$est <- pt1$se <- NULL
pt1$label[substr(pt1$label, 1, 1) == "." & substr(pt1$label, nchar(pt1$label), nchar(pt1$label)) == "."] <- ""
pt0$label[substr(pt0$label, 1, 1) == "." & substr(pt0$label, nchar(pt0$label), nchar(pt0$label)) == "."] <- ""
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
if(length(table(table(pt0$rhs[pt0$op == "=~"]))) != 1) stop("The model is not congeneric. This function does not support non-congeneric model.")
varfree <- varnames <- unique(pt0$rhs[pt0$op == "=~"])
facnames <- unique(pt0$lhs[(pt0$op == "=~") & (pt0$rhs %in% varnames)])
facrepresent <- table(pt0$lhs[(pt0$op == "=~") & (pt0$rhs %in% varnames)], pt0$rhs[(pt0$op == "=~") & (pt0$rhs %in% varnames)])
if(any(apply(facrepresent, 2, function(x) sum(x != 0)) > 1)) stop("The model is not congeneric. This function does not support non-congeneric model.")
facList <- list()
for(i in 1:nrow(facrepresent)) {
facList[[i]] <- colnames(facrepresent)[facrepresent[i,] > 0]
}
names(facList) <- rownames(facrepresent)
facList <- facList[match(names(facList), facnames)]
fixLoadingFac <- list()
for(i in seq_along(facList)) {
select <- pt1$lhs == names(facList)[i] & pt1$op == "=~" & pt1$rhs %in% facList[[i]] & pt1$group == 1 & pt1$free == 0 & (!is.na(pt1$ustart) & pt1$ustart > 0)
fixLoadingFac[[i]] <- pt1$rhs[select]
}
names(fixLoadingFac) <- names(facList)
fixIntceptFac <- list()
for(i in seq_along(facList)) {
select <- pt1$op == "~1" & pt1$rhs %in% facList[[i]] & pt1$group == 1 & pt1$free == 0
fixIntceptFac[[i]] <- pt1$rhs[select]
}
names(fixIntceptFac) <- names(facList)
ngroups <- max(pt0$group)
neach <- lavInspect(fit0, "nobs")
groupvar <- lavInspect(fit0, "group")
grouplab <- lavInspect(fit0, "group.label")
if(!is.numeric(refgroup)) refgroup <- which(refgroup == grouplab)
grouporder <- 1:ngroups
grouporder <- c(refgroup, setdiff(grouporder, refgroup))
grouplaborder <- grouplab[grouporder]
complab <- paste(grouplaborder[2:ngroups], "vs.", grouplaborder[1])
if(ngroups <= 1) stop("Well, the number of groups is 1. Measurement invariance across 'groups' cannot be done.")
if(numType == 4) {
if(!all(c(free, fix) %in% facnames)) stop("'free' and 'fix' arguments should consist of factor names because mean invariance is tested.")
} else {
if(!all(c(free, fix) %in% varnames)) stop("'free' and 'fix' arguments should consist of variable names.")
}
result <- fixCon <- freeCon <- NULL
estimates <- NULL
listFreeCon <- listFixCon <- list()
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
if(numType == 1) {
if(!is.null(free) | !is.null(fix)) {
if(!is.null(fix)) {
facinfix <- findFactor(fix, facList)
dup <- duplicated(facinfix)
for(i in seq_along(fix)) {
if(dup[i]) {
pt0 <- constrainParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups)
pt1 <- constrainParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups)
} else {
oldmarker <- fixLoadingFac[[facinfix[i]]]
if(length(oldmarker) > 0) {
oldmarkerval <- pt1$ustart[pt1$lhs == facinfix[i] & pt1$op == "=~" & pt1$rhs == oldmarker & pt1$group == 1]
if(oldmarker == fix[i]) {
pt0 <- fixParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
} else {
pt0 <- freeParTable(pt0, facinfix[i], "=~", oldmarker, 1:ngroups)
pt0 <- constrainParTable(pt0, facinfix[i], "=~", oldmarker, 1:ngroups)
pt1 <- freeParTable(pt1, facinfix[i], "=~", oldmarker, 1:ngroups)
pt0 <- fixParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
fixLoadingFac[[facinfix[i]]] <- fix[i]
}
} else {
pt0 <- constrainParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups)
pt1 <- constrainParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups)
}
}
}
}
if(!is.null(free)) {
facinfree <- findFactor(free, facList)
for(i in seq_along(free)) {
# Need to change marker variable if fixed
oldmarker <- fixLoadingFac[[facinfree[i]]]
if(length(oldmarker) > 0 && oldmarker == free[i]) {
oldmarkerval <- pt1$ustart[pt1$lhs == facinfix[i] & pt1$op == "=~" & pt1$rhs == oldmarker & pt1$group == 1]
candidatemarker <- setdiff(facList[[facinfree[i]]], free[i])[1]
pt0 <- freeParTable(pt0, facinfree[i], "=~", free[i], 1:ngroups)
pt1 <- freeParTable(pt1, facinfree[i], "=~", free[i], 1:ngroups)
pt0 <- fixParTable(pt0, facinfix[i], "=~", candidatemarker, 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, facinfix[i], "=~", candidatemarker, 1:ngroups, oldmarkerval)
fixLoadingFac[[facinfix[i]]] <- candidatemarker
} else {
pt0 <- freeParTable(pt0, facinfree[i], "=~", free[i], 1:ngroups)
pt1 <- freeParTable(pt1, facinfree[i], "=~", free[i], 1:ngroups)
}
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
obsmean <- sapply(lavInspect(fit0, "sampstat"), "[[", "mean") #FIXME: there might not be a mean structure
obsmean <- obsmean[,grouporder]
obsdiff <- obsmean[,2:ngroups, drop = FALSE] - matrix(obsmean[,1], nrow(obsmean), ngroups - 1)
obsdiff <- obsdiff[varfree, , drop = FALSE]
colnames(obsdiff) <- paste0("diff_mean:", complab)
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("load:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- esz <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
colnames(esz) <- paste0("q:", complab)
esdiff <- matrix(NA, length(varfree), ngroups - 1)
# Extract facmean, facsd, load, tau -> lowdiff, highdiff
lowdiff <- matrix(NA, length(varfree), ngroups - 1)
highdiff <- matrix(NA, length(varfree), ngroups - 1)
colnames(lowdiff) <- paste0("low_fscore:", complab)
colnames(highdiff) <- paste0("high_fscore:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$rhs %in% varfree) & (pt1$op == "=~") & (pt1$group == 1))
facinfix <- findFactor(fix, facList)
varinfixvar <- unlist(facList[facinfix])
varinfixvar <- setdiff(varinfixvar, setdiff(varinfixvar, varfree))
indexfixvar <- which((pt1$rhs %in% varinfixvar) & (pt1$op == "=~") & (pt1$group == 1))
varnonfixvar <- setdiff(varfree, varinfixvar)
indexnonfixvar <- setdiff(index, indexfixvar)
pos <- 1
for(i in seq_along(indexfixvar)) {
runnum <- indexfixvar[i]
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if(!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if(!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
loadVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 2:ncol(estimates)] <- loadVal
facVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~~", pt0$lhs[runnum], 1:ngroups)
totalVal <- sapply(lavaan::fitted.values(tempfit0), function(x, v) x$cov[v, v], v = pt0$rhs[runnum])
names(facVal) <- names(totalVal) <- grouplab
ifelse(poolvar, refFacVal <- poolVariance(facVal, neach), refFacVal <- facVal[refgroup])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdLoadVal <- loadVal * sqrt(refFacVal) / sqrt(refTotalVal)
stdestimates[pos,] <- stdLoadVal
stdLoadVal <- stdLoadVal[grouporder]
esstd[pos,] <- stdLoadVal[2:ngroups] - stdLoadVal[1]
if(any(abs(stdLoadVal) > 0.9999)) warning(paste("Standardized Loadings of", pt0$rhs[runnum], "in some groups are less than -1 or over 1. The standardized loadings used in Fisher z transformation are changed to -0.9999 or 0.9999."))
stdLoadVal[stdLoadVal > 0.9999] <- 0.9999
stdLoadVal[stdLoadVal < -0.9999] <- -0.9999
zLoadVal <- atanh(stdLoadVal)
esz[pos,] <- zLoadVal[2:ngroups] - zLoadVal[1]
facMean <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~1", "", 1:ngroups)
wlow <- min(facMean - fbound * sqrt(facVal))
whigh <- max(facMean + fbound * sqrt(facVal))
intVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$rhs[runnum], "~1", "", 1:ngroups)
loadVal <- loadVal[grouporder]
intVal <- intVal[grouporder]
loaddiff <- loadVal[2:ngroups] - loadVal[1]
intdiff <- intVal[2:ngroups] - intVal[1]
lowdiff[pos,] <- intdiff + wlow * loaddiff
highdiff[pos,] <- intdiff + whigh * loaddiff
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[pos,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
pos <- pos + 1
}
facinvarfree <- findFactor(varnonfixvar, facList)
for(i in seq_along(indexnonfixvar)) {
runnum <- indexnonfixvar[i]
# Need to change marker variable if fixed
oldmarker <- fixLoadingFac[[facinvarfree[i]]]
if(length(oldmarker) > 0 && oldmarker == varnonfixvar[i]) {
candidatemarker <- setdiff(facList[[facinvarfree[i]]], varnonfixvar[i])[1]
temp <- freeParTable(pt1, facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups)
temp <- fixParTable(temp, facinvarfree[i], "=~", candidatemarker, 1:ngroups, ustart = 1)
temp <- constrainParTable(temp, facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups)
newparent <- freeParTable(pt1, facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups)
newparent <- fixParTable(newparent, facinvarfree[i], "=~", candidatemarker, 1:ngroups, ustart = 1)
newparentresult <- try(newparentfit <- refit(newparent, fit1), silent = TRUE)
if(!is(newparentresult, "try-error")) {
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, newparentfit, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(newparentfit, tempfit))
}
waldCon[pos,] <- waldConstraint(newparentfit, newparent, waldMat, cbind(facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups))
}
} else {
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
waldCon[pos,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
listFixCon <- c(listFixCon, tryresult)
if(length(oldmarker) > 0 && oldmarker == varnonfixvar[i]) {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 2:ngroups)
} else {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
}
estimates[pos, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if(!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if(!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
loadVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 2:ncol(estimates)] <- loadVal
facVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~~", pt0$lhs[runnum], 1:ngroups)
totalVal <- sapply(lavaan::fitted.values(tempfit0), function(x, v) x$cov[v, v], v = pt0$rhs[runnum])
names(facVal) <- names(totalVal) <- grouplab
ifelse(poolvar, refFacVal <- poolVariance(facVal, neach), refFacVal <- facVal[refgroup])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdLoadVal <- loadVal * sqrt(refFacVal) / sqrt(refTotalVal)
stdestimates[pos,] <- stdLoadVal
stdLoadVal <- stdLoadVal[grouporder]
esstd[pos,] <- stdLoadVal[2:ngroups] - stdLoadVal[1]
if(any(abs(stdLoadVal) > 0.9999)) warning(paste("Standardized Loadings of", pt0$rhs[runnum], "in some groups are less than -1 or over 1. The standardized loadings used in Fisher z transformation are changed to -0.9999 or 0.9999."))
stdLoadVal[stdLoadVal > 0.9999] <- 0.9999
stdLoadVal[stdLoadVal < -0.9999] <- -0.9999
zLoadVal <- atanh(stdLoadVal)
esz[pos,] <- zLoadVal[2:ngroups] - zLoadVal[1]
facMean <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~1", "", 1:ngroups)
wlow <- min(facMean - fbound * sqrt(facVal))
whigh <- max(facMean + fbound * sqrt(facVal))
intVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$rhs[runnum], "~1", "", 1:ngroups)
loadVal <- loadVal[grouporder]
intVal <- intVal[grouporder]
loaddiff <- loadVal[2:ngroups] - loadVal[1]
intdiff <- intVal[2:ngroups] - intVal[1]
lowdiff[pos,] <- intdiff + wlow * loaddiff
highdiff[pos,] <- intdiff + whigh * loaddiff
}
listFreeCon <- c(listFreeCon, tryresult0)
pos <- pos + 1
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[c(indexfixvar, indexnonfixvar)]
estimates <- cbind(estimates, stdestimates, esstd, esz, obsdiff, lowdiff, highdiff)
result <- cbind(freeCon, fixCon, waldCon)
} else if (numType == 2) {
if(!is.null(free) | !is.null(fix)) {
if(!is.null(fix)) {
facinfix <- findFactor(fix, facList)
dup <- duplicated(facinfix)
for(i in seq_along(fix)) {
if(dup[i]) {
pt0 <- constrainParTable(pt0, fix[i], "~1", "", 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "~1", "", 1:ngroups)
} else {
oldmarker <- fixIntceptFac[[facinfix[i]]]
if(length(oldmarker) > 0) {
oldmarkerval <- pt1$ustart[pt1$lhs == fix[i] & pt1$op == "~1" & pt1$rhs == "" & pt1$group == 1]
if(oldmarker == fix[i]) {
pt0 <- fixParTable(pt0, fix[i], "~1", "", 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, fix[i], "~1", "", 1:ngroups, oldmarkerval)
} else {
pt0 <- freeParTable(pt0, oldmarker, "~1", "", 1:ngroups)
pt0 <- constrainParTable(pt0, oldmarker, "~1", "", 1:ngroups)
pt1 <- freeParTable(pt1, oldmarker, "~1", "", 1:ngroups)
pt0 <- fixParTable(pt0, fix[i], "~1", "", 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, fix[i], "~1", "", 1:ngroups, oldmarkerval)
fixIntceptFac[[facinfix[i]]] <- fix[i]
}
} else {
pt0 <- constrainParTable(pt0, fix[i], "~1", "", 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "~1", "", 1:ngroups)
}
}
}
}
if(!is.null(free)) {
facinfree <- findFactor(free, facList)
for(i in seq_along(free)) {
# Need to change marker variable if fixed
oldmarker <- fixIntceptFac[[facinfree[i]]]
if(length(oldmarker) > 0 && oldmarker == free[i]) {
oldmarkerval <- pt1$ustart[pt1$lhs == oldmarker & pt1$op == "~1" & pt1$rhs == "" & pt1$group == 1]
candidatemarker <- setdiff(facList[[facinfree[i]]], free[i])[1]
pt0 <- freeParTable(pt0, free[i], "~1", "", 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "~1", "", 1:ngroups)
pt0 <- fixParTable(pt0, candidatemarker, "~1", "", 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, candidatemarker, "~1", "", 1:ngroups, oldmarkerval)
fixIntceptFac[[facinfix[i]]] <- candidatemarker
} else {
pt0 <- freeParTable(pt0, free[i], "~1", "", 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "~1", "", 1:ngroups)
}
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
obsmean <- sapply(lavInspect(fit0, "sampstat"), "[[", "mean") #FIXME: there might not be a mean structure
obsmean <- obsmean[,grouporder]
obsdiff <- obsmean[,2:ngroups, drop = FALSE] - matrix(obsmean[,1], nrow(obsmean), ngroups - 1)
obsdiff <- obsdiff[varfree, , drop = FALSE]
colnames(obsdiff) <- paste0("diff_mean:", complab)
# Prop diff
propdiff <- matrix(NA, length(varfree), ngroups - 1)
colnames(propdiff) <- paste0("propdiff:", complab)
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("int:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$lhs %in% varfree) & (pt1$op == "~1") & (pt1$group == 1))
facinfix <- findFactor(fix, facList)
varinfixvar <- unlist(facList[facinfix])
varinfixvar <- setdiff(varinfixvar, setdiff(varinfixvar, varfree))
indexfixvar <- which((pt1$lhs %in% varinfixvar) & (pt1$op == "~1") & (pt1$group == 1))
varnonfixvar <- setdiff(varfree, varinfixvar)
indexnonfixvar <- setdiff(index, indexfixvar)
pos <- 1
for(i in seq_along(varinfixvar)) {
runnum <- indexfixvar[i]
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if(!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if(!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
intVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 2:ncol(estimates)] <- intVal
totalVal <- sapply(lavaan::fitted.values(tempfit0), function(x, v) x$cov[v, v], v = pt0$lhs[runnum])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdIntVal <- intVal / sqrt(refTotalVal)
stdestimates[pos,] <- stdIntVal
stdIntVal <- stdIntVal[grouporder]
esstd[pos,] <- stdIntVal[2:ngroups] - stdIntVal[1]
intVal <- intVal[grouporder]
propdiff[pos,] <- (intVal[2:ngroups] - intVal[1]) / obsdiff[pos,]
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[pos,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
pos <- pos + 1
}
facinvarfree <- findFactor(varfree, facList)
for(i in seq_along(varnonfixvar)) {
runnum <- indexnonfixvar[i]
# Need to change marker variable if fixed
oldmarker <- fixIntceptFac[[facinvarfree[i]]]
if(length(oldmarker) > 0 && oldmarker == varfree[i]) {
candidatemarker <- setdiff(facList[[facinvarfree[i]]], varfree[i])[1]
temp <- freeParTable(pt1, varfree[i], "~1", "", 1:ngroups)
temp <- constrainParTable(temp, varfree[i], "~1", "", 1:ngroups)
temp <- fixParTable(temp, candidatemarker, "~1", "", 1:ngroups)
newparent <- freeParTable(pt1, varfree[i], "~1", "", 1:ngroups)
newparent <- fixParTable(newparent, candidatemarker, "~1", "", 1:ngroups)
newparentresult <- try(newparentfit <- refit(newparent, fit1), silent = TRUE)
if(!is(newparentresult, "try-error")) {
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, newparentfit, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(newparentfit, tempfit))
}
waldCon[pos,] <- waldConstraint(newparentfit, newparent, waldMat, cbind(varfree[i], "~1", "", 1:ngroups))
}
} else {
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
waldCon[pos,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
listFixCon <- c(listFixCon, tryresult)
if(length(oldmarker) > 0 && oldmarker == varfree[i]) {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 2:ngroups)
} else {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
}
estimates[pos, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if(!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if(!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
intVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 2:ncol(estimates)] <- intVal
totalVal <- sapply(lavaan::fitted.values(tempfit0), function(x, v) x$cov[v, v], v = pt0$lhs[runnum])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdIntVal <- intVal / sqrt(refTotalVal)
stdestimates[pos,] <- stdIntVal
stdIntVal <- stdIntVal[grouporder]
esstd[pos,] <- stdIntVal[2:ngroups] - stdIntVal[1]
intVal <- intVal[grouporder]
propdiff[pos,] <- (intVal[2:ngroups] - intVal[1]) / obsdiff[pos,]
}
listFreeCon <- c(listFreeCon, tryresult0)
pos <- pos + 1
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[c(indexfixvar, indexnonfixvar)]
estimates <- cbind(estimates, stdestimates, esstd, obsdiff, propdiff)
result <- cbind(freeCon, fixCon, waldCon)
} else if (numType == 3) {
if(!is.null(free) | !is.null(fix)) {
if(!is.null(fix)) {
for(i in seq_along(fix)) {
pt0 <- constrainParTable(pt0, fix[i], "~~", fix[i], 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "~~", fix[i], 1:ngroups)
}
}
if(!is.null(free)) {
for(i in seq_along(free)) {
pt0 <- freeParTable(pt0, free[i], "~~", free[i], 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "~~", free[i], 1:ngroups)
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
# Prop diff
propdiff <- matrix(NA, length(varfree), ngroups - 1)
colnames(propdiff) <- paste0("propdiff:", complab)
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("errvar:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- esz <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
colnames(esz) <- paste0("h:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$lhs %in% varfree) & (pt1$op == "~~") & (pt1$lhs == pt1$rhs) & (pt1$group == 1))
for(i in seq_along(index)) {
runnum <- index[i]
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[i,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[i, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if(!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if(!is(compresult0, "try-error")) freeCon[i,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
errVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[i, 2:ncol(estimates)] <- errVal
totalVal <- sapply(lavaan::fitted.values(tempfit0), function(x, v) x$cov[v, v], v = pt0$rhs[runnum])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdErrVal <- errVal / sqrt(refTotalVal)
stdestimates[i,] <- stdErrVal
stdErrVal <- stdErrVal[grouporder]
esstd[i,] <- stdErrVal[2:ngroups] - stdErrVal[1]
if(any(abs(stdErrVal) > 0.9999)) warning(paste("The uniqueness of", pt0$rhs[runnum], "in some groups are over 1. The uniqueness used in arctan transformation are changed to 0.9999."))
stdErrVal[stdErrVal > 0.9999] <- 0.9999
zErrVal <- asin(sqrt(stdErrVal))
esz[i,] <- zErrVal[2:ngroups] - zErrVal[1]
errVal <- errVal[grouporder]
totalVal <- totalVal[grouporder]
errdiff <- errVal[2:ngroups] - errVal[1]
totaldiff <- totalVal[2:ngroups] - totalVal[1]
propdiff[i,] <- errdiff / totaldiff
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[i,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[index]
estimates <- cbind(estimates, stdestimates, esstd, esz, propdiff)
result <- cbind(freeCon, fixCon, waldCon)
} else if (numType == 4) {
varfree <- facnames
if(!is.null(free) | !is.null(fix)) {
if(!is.null(fix)) {
for(i in seq_along(fix)) {
pt0 <- constrainParTable(pt0, fix[i], "~1", "", 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "~1", "", 1:ngroups)
}
}
if(!is.null(free)) {
for(i in seq_along(free)) {
pt0 <- freeParTable(pt0, free[i], "~1", "", 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "~1", "", 1:ngroups)
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("mean:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$lhs %in% varfree) & (pt1$op == "~1") & (pt1$group == 1))
for(i in seq_along(index)) {
runnum <- index[i]
isfree <- pt1$free[runnum] != 0
if(isfree) {
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
} else {
temp <- fixParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 2:ngroups, ustart = pt1$ustart[runnum])
}
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[i,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
isfree0 <- pt0$free[runnum] != 0
if(isfree0) {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
} else {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 2:ngroups)
}
estimates[i, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if(!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if(!is(compresult0, "try-error")) freeCon[i,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
meanVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[i, 2:ncol(estimates)] <- meanVal
facVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~~", pt0$lhs[runnum], 1:ngroups)
ifelse(poolvar, refFacVal <- poolVariance(facVal, neach), refFacVal <- facVal[refgroup])
stdMeanVal <- meanVal / sqrt(refFacVal)
stdestimates[i,] <- stdMeanVal
stdMeanVal <- stdMeanVal[grouporder]
esstd[i,] <- stdMeanVal[2:ngroups] - stdMeanVal[1]
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[i,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[index]
estimates <- cbind(estimates, stdestimates, esstd)
result <- cbind(freeCon, fixCon, waldCon)
}
if(return.fit) {
return(invisible(list(estimates = estimates, results = result, models = list(free = listFreeCon, fix = listFixCon, nested = fit0, parent = fit1))))
} else {
return(list(estimates = estimates, results = result))
}
}
##' @importFrom lavaan lavInspect parTable
##' @rdname partialInvariance
##' @export
partialInvarianceCat <- function(fit, type, free = NULL, fix = NULL,
refgroup = 1, poolvar = TRUE,
p.adjust = "none", return.fit = FALSE,
method = "satorra.bentler.2001") {
type <- tolower(type)
numType <- 1
fit1 <- fit0 <- NULL
# fit0 = Nested model, fit1 = Parent model
if (type %in% c("metric", "weak", "loading", "loadings")) {
numType <- 1
if (all(c("fit.configural", "fit.loadings") %in% names(fit))) {
fit1 <- fit$fit.configural
fit0 <- fit$fit.loadings
} else {
stop("The elements named 'fit.configural' and 'fit.loadings' are needed",
" in the 'fit' argument")
}
} else if (type %in% c("scalar", "strong", "intercept", "intercepts",
"threshold", "thresholds")) {
numType <- 2
if (all(c("fit.loadings", "fit.thresholds") %in% names(fit))) {
fit1 <- fit$fit.loadings
fit0 <- fit$fit.thresholds
} else {
stop("The elements named 'fit.loadings' and 'fit.thresholds' are needed",
" in the 'fit' argument")
}
} else if (type %in% c("strict", "residual", "residuals", "error", "errors")) {
numType <- 3
if ("fit.residuals" %in% names(fit)) {
fit0 <- fit$fit.residuals
if ("fit.thresholds" %in% names(fit)) {
fit1 <- fit$fit.thresholds
} else if ("fit.loadings" %in% names(fit)) {
fit1 <- fit$fit.loadings
} else {
stop("The element named either 'fit.thresholds' or 'fit.loadings' is",
" needed in the 'fit' argument")
}
} else {
stop("The element named 'fit.residuals' is needed in the 'fit' argument")
}
} else if (type %in% c("means", "mean")) {
numType <- 4
if ("fit.means" %in% names(fit)) {
fit0 <- fit$fit.means
if("fit.residuals" %in% names(fit)) {
fit1 <- fit$fit.residuals
} else if ("fit.thresholds" %in% names(fit)) {
fit1 <- fit$fit.thresholds
} else if ("fit.loadings" %in% names(fit)) {
fit1 <- fit$fit.loadings
} else {
stop("The element named either 'fit.residuals', 'fit.thresholds',",
" or 'fit.loadings' is needed in the 'fit' argument")
}
} else {
stop("The element named 'fit.means' is needed in the 'fit' argument")
}
} else {
stop("Please specify the correct type of measurement invariance. See the help page.")
}
pt1 <- parTable(fit1)
pt0 <- parTable(fit0)
pt0$start <- pt0$est <- pt0$se <- NULL
pt1$start <- pt1$est <- pt1$se <- NULL
pt1$label[substr(pt1$label, 1, 1) == "." & substr(pt1$label, nchar(pt1$label),
nchar(pt1$label)) == "."] <- ""
pt0$label[substr(pt0$label, 1, 1) == "." & substr(pt0$label, nchar(pt0$label),
nchar(pt0$label)) == "."] <- ""
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
if (length(table(table(pt0$rhs[pt0$op == "=~"]))) != 1)
stop("The model is not congeneric. This function does not support non-congeneric model.")
varfree <- varnames <- unique(pt0$rhs[pt0$op == "=~"])
facnames <- unique(pt0$lhs[(pt0$op == "=~") & (pt0$rhs %in% varnames)])
facrepresent <- table(pt0$lhs[(pt0$op == "=~") & (pt0$rhs %in% varnames)],
pt0$rhs[(pt0$op == "=~") & (pt0$rhs %in% varnames)])
if (any(apply(facrepresent, 2, function(x) sum(x != 0)) > 1))
stop("The model is not congeneric. This function does not support non-congeneric model.")
facList <- list()
for (i in 1:nrow(facrepresent)) {
facList[[i]] <- colnames(facrepresent)[facrepresent[i,] > 0]
}
names(facList) <- rownames(facrepresent)
facList <- facList[match(names(facList), facnames)]
fixLoadingFac <- list()
for (i in seq_along(facList)) {
select <- pt1$lhs == names(facList)[i] & pt1$op == "=~" & pt1$rhs %in% facList[[i]] & pt1$group == 1 & pt1$free == 0 & (!is.na(pt1$ustart) & pt1$ustart > 0)
fixLoadingFac[[i]] <- pt1$rhs[select]
}
names(fixLoadingFac) <- names(facList)
# Find the number of thresholds
# Check whether the factor configuration is the same across gorups
conParTable <- lapply(pt1, "[", pt1$op == "==")
group1pt <- lapply(pt1, "[", pt1$group != 1)
numThreshold <- table(sapply(group1pt, "[", group1pt$op == "|")[,"lhs"])
plabelthres <- split(group1pt$plabel[group1pt$op == "|"], group1pt$lhs[group1pt$op == "|"])
numFixedThreshold <- sapply(lapply(plabelthres, function(vec) !is.na(match(vec, conParTable$lhs)) | !is.na(match(vec, conParTable$rhs))), sum)[names(numThreshold)]
#numFixedThreshold <- table(sapply(group1pt, "[", group1pt$op == "|" & group1pt$eq.id != 0)[,"lhs"])
fixIntceptFac <- list()
for (i in seq_along(facList)) {
tmp <- numFixedThreshold[facList[[i]]]
if (all(tmp > 1)) {
fixIntceptFac[[i]] <- integer(0)
} else {
fixIntceptFac[[i]] <- names(which.max(tmp))[1]
}
}
names(fixIntceptFac) <- names(facList)
ngroups <- max(pt0$group)
neach <- lavInspect(fit0, "nobs")
groupvar <- lavInspect(fit0, "group")
grouplab <- lavInspect(fit0, "group.label")
if (!is.numeric(refgroup)) refgroup <- which(refgroup == grouplab)
grouporder <- 1:ngroups
grouporder <- c(refgroup, setdiff(grouporder, refgroup))
grouplaborder <- grouplab[grouporder]
complab <- paste(grouplaborder[2:ngroups], "vs.", grouplaborder[1])
if (ngroups <= 1) stop("Well, the number of groups is 1. Measurement",
" invariance across 'groups' cannot be done.")
if (numType == 4) {
if (!all(c(free, fix) %in% facnames))
stop("'free' and 'fix' arguments should consist of factor names because",
" mean invariance is tested.")
} else {
if (!all(c(free, fix) %in% varnames))
stop("'free' and 'fix' arguments should consist of variable names.")
}
result <- fixCon <- freeCon <- NULL
estimates <- NULL
listFreeCon <- listFixCon <- list()
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
if (numType == 1) {
if (!is.null(free) | !is.null(fix)) {
if (!is.null(fix)) {
facinfix <- findFactor(fix, facList)
dup <- duplicated(facinfix)
for (i in seq_along(fix)) {
if (dup[i]) {
pt0 <- constrainParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups)
pt1 <- constrainParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups)
} else {
oldmarker <- fixLoadingFac[[facinfix[i]]]
if (length(oldmarker) > 0) {
oldmarkerval <- pt1$ustart[pt1$lhs == facinfix[i] & pt1$op == "=~" & pt1$rhs == oldmarker & pt1$group == 1]
if (oldmarker == fix[i]) {
pt0 <- fixParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
} else {
pt0 <- freeParTable(pt0, facinfix[i], "=~", oldmarker, 1:ngroups)
pt0 <- constrainParTable(pt0, facinfix[i], "=~", oldmarker, 1:ngroups)
pt1 <- freeParTable(pt1, facinfix[i], "=~", oldmarker, 1:ngroups)
pt0 <- fixParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups, oldmarkerval)
fixLoadingFac[[facinfix[i]]] <- fix[i]
}
} else {
pt0 <- constrainParTable(pt0, facinfix[i], "=~", fix[i], 1:ngroups)
pt1 <- constrainParTable(pt1, facinfix[i], "=~", fix[i], 1:ngroups)
}
}
}
}
if (!is.null(free)) {
facinfree <- findFactor(free, facList)
for (i in seq_along(free)) {
# Need to change marker variable if fixed
oldmarker <- fixLoadingFac[[facinfree[i]]]
if (length(oldmarker) > 0 && oldmarker == free[i]) {
oldmarkerval <- pt1$ustart[pt1$lhs == facinfix[i] & pt1$op == "=~" & pt1$rhs == oldmarker & pt1$group == 1]
candidatemarker <- setdiff(facList[[facinfree[i]]], free[i])[1]
pt0 <- freeParTable(pt0, facinfree[i], "=~", free[i], 1:ngroups)
pt1 <- freeParTable(pt1, facinfree[i], "=~", free[i], 1:ngroups)
pt0 <- fixParTable(pt0, facinfix[i], "=~", candidatemarker, 1:ngroups, oldmarkerval)
pt1 <- fixParTable(pt1, facinfix[i], "=~", candidatemarker, 1:ngroups, oldmarkerval)
fixLoadingFac[[facinfix[i]]] <- candidatemarker
} else {
pt0 <- freeParTable(pt0, facinfree[i], "=~", free[i], 1:ngroups)
pt1 <- freeParTable(pt1, facinfree[i], "=~", free[i], 1:ngroups)
}
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("load:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- esz <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
colnames(esz) <- paste0("q:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$rhs %in% varfree) & (pt1$op == "=~") & (pt1$group == 1))
facinfix <- findFactor(fix, facList)
varinfixvar <- unlist(facList[facinfix])
varinfixvar <- setdiff(varinfixvar, setdiff(varinfixvar, varfree))
indexfixvar <- which((pt1$rhs %in% varinfixvar) & (pt1$op == "=~") & (pt1$group == 1))
varnonfixvar <- setdiff(varfree, varinfixvar)
indexnonfixvar <- setdiff(index, indexfixvar)
pos <- 1
for (i in seq_along(indexfixvar)) {
runnum <- indexfixvar[i]
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if (!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if (!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
loadVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 2:ncol(estimates)] <- loadVal
facVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~~", pt0$lhs[runnum], 1:ngroups)
totalVal <- sapply(thetaImpliedTotalVar(tempfit0), function(x, v) x[v, v], v = pt0$rhs[runnum])
names(facVal) <- names(totalVal) <- grouplab
ifelse(poolvar, refFacVal <- poolVariance(facVal, neach), refFacVal <- facVal[refgroup])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdLoadVal <- loadVal * sqrt(refFacVal) / sqrt(refTotalVal)
stdestimates[pos,] <- stdLoadVal
stdLoadVal <- stdLoadVal[grouporder]
esstd[pos,] <- stdLoadVal[2:ngroups] - stdLoadVal[1]
if (any(abs(stdLoadVal) > 0.9999))
warning(paste("Standardized Loadings of", pt0$rhs[runnum],
"in some groups are less than -1 or over 1. The",
" standardized loadings used in Fisher z",
" transformation are changed to -0.9999 or 0.9999."))
stdLoadVal[stdLoadVal > 0.9999] <- 0.9999
stdLoadVal[stdLoadVal < -0.9999] <- -0.9999
zLoadVal <- atanh(stdLoadVal)
esz[pos,] <- zLoadVal[2:ngroups] - zLoadVal[1]
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[pos,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
pos <- pos + 1
}
facinvarfree <- findFactor(varnonfixvar, facList)
for (i in seq_along(indexnonfixvar)) {
runnum <- indexnonfixvar[i]
# Need to change marker variable if fixed
oldmarker <- fixLoadingFac[[facinvarfree[i]]]
if (length(oldmarker) > 0 && oldmarker == varnonfixvar[i]) {
candidatemarker <- setdiff(facList[[facinvarfree[i]]], varnonfixvar[i])[1]
temp <- freeParTable(pt1, facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups)
temp <- constrainParTable(temp, facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups)
temp <- fixParTable(temp, facinvarfree[i], "=~", candidatemarker, 1:ngroups)
newparent <- freeParTable(pt1, facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups)
newparent <- fixParTable(newparent, facinvarfree[i], "=~", candidatemarker, 1:ngroups)
newparentresult <- try(newparentfit <- refit(newparent, fit1), silent = TRUE)
if (!is(newparentresult, "try-error")) {
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, newparentfit, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(newparentfit, tempfit))
}
waldCon[pos,] <- waldConstraint(newparentfit, newparent, waldMat, cbind(facinvarfree[i], "=~", varnonfixvar[i], 1:ngroups))
}
} else {
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
waldCon[pos,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
listFixCon <- c(listFixCon, tryresult)
if (length(oldmarker) > 0 && oldmarker == varnonfixvar[i]) {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 2:ngroups)
} else {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
}
estimates[pos, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if (!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if (!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
loadVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, 2:ncol(estimates)] <- loadVal
facVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~~", pt0$lhs[runnum], 1:ngroups)
totalVal <- sapply(thetaImpliedTotalVar(tempfit0), function(x, v) x[v, v], v = pt0$rhs[runnum])
names(facVal) <- names(totalVal) <- grouplab
ifelse(poolvar, refFacVal <- poolVariance(facVal, neach), refFacVal <- facVal[refgroup])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdLoadVal <- loadVal * sqrt(refFacVal) / sqrt(refTotalVal)
stdestimates[pos,] <- stdLoadVal
stdLoadVal <- stdLoadVal[grouporder]
esstd[pos,] <- stdLoadVal[2:ngroups] - stdLoadVal[1]
if (any(abs(stdLoadVal) > 0.9999))
warning(paste("Standardized Loadings of", pt0$rhs[runnum],
"in some groups are less than -1 or over 1. The",
" standardized loadings used in Fisher z",
" transformation are changed to -0.9999 or 0.9999."))
stdLoadVal[stdLoadVal > 0.9999] <- 0.9999
stdLoadVal[stdLoadVal < -0.9999] <- -0.9999
zLoadVal <- atanh(stdLoadVal)
esz[pos,] <- zLoadVal[2:ngroups] - zLoadVal[1]
}
listFreeCon <- c(listFreeCon, tryresult0)
pos <- pos + 1
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[c(indexfixvar, indexnonfixvar)]
estimates <- cbind(estimates, stdestimates, esstd, esz)
result <- cbind(freeCon, fixCon, waldCon)
} else if (numType == 2) {
if (!is.null(free) | !is.null(fix)) {
if (!is.null(fix)) {
facinfix <- findFactor(fix, facList)
dup <- duplicated(facinfix)
for (i in seq_along(fix)) {
numfixthres <- numThreshold[fix[i]]
if (numfixthres > 1) {
if (dup[i]) {
for (s in 2:numfixthres) {
pt0 <- constrainParTable(pt0, fix[i], "|", paste0("t", s), 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "|", paste0("t", s), 1:ngroups)
}
} else {
oldmarker <- fixIntceptFac[[facinfix[i]]]
numoldthres <- numThreshold[oldmarker]
if (length(oldmarker) > 0) {
if (oldmarker == fix[i]) {
for (s in 2:numfixthres) {
pt0 <- constrainParTable(pt0, fix[i], "|", paste0("t", s), 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "|", paste0("t", s), 1:ngroups)
}
} else {
for (r in 2:numoldthres) {
pt1 <- freeParTable(pt1, oldmarker, "|", paste0("t", r), 1:ngroups)
}
for (s in 2:numfixthres) {
pt0 <- constrainParTable(pt0, fix[i], "|", paste0("t", s), 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "|", paste0("t", s), 1:ngroups)
}
fixIntceptFac[[facinfix[i]]] <- fix[i]
}
} else {
for (s in 2:numfixthres) {
pt0 <- constrainParTable(pt0, fix[i], "|", paste0("t", s), 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "|", paste0("t", s), 1:ngroups)
}
}
}
}
}
}
if (!is.null(free)) {
facinfree <- findFactor(free, facList)
for (i in seq_along(free)) {
numfreethres <- numThreshold[free[i]]
# Need to change marker variable if fixed
oldmarker <- fixIntceptFac[[facinfree[i]]]
numoldthres <- numThreshold[oldmarker]
if (length(oldmarker) > 0 && oldmarker == free[i]) {
candidatemarker <- setdiff(facList[[facinfree[i]]], free[i])
candidatemarker <- candidatemarker[numThreshold[candidatemarker] > 1][1]
numcandidatethres <- numThreshold[candidatemarker]
pt0 <- constrainParTable(pt0, candidatemarker, "|", "t2", 1:ngroups)
pt1 <- constrainParTable(pt1, candidatemarker, "|", "t2", 1:ngroups)
for (s in 2:numfixthres) {
pt0 <- freeParTable(pt0, free[i], "|", paste0("t", s), 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "|", paste0("t", s), 1:ngroups)
}
fixIntceptFac[[facinfix[i]]] <- candidatemarker
} else {
for (s in 2:numfixthres) {
pt0 <- freeParTable(pt0, free[i], "|", paste0("t", s), 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "|", paste0("t", s), 1:ngroups)
}
}
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
maxcolumns <- max(numThreshold[varfree]) - 1
tname <- paste0("t", 2:(maxcolumns + 1))
estimates <- matrix(NA, length(varfree), (ngroups * length(tname)) + length(tname))
stdestimates <- matrix(NA, length(varfree), ngroups * length(tname))
tnameandlab <- expand.grid(tname, grouplab)
colnames(estimates) <- c(paste0("pool:", tname), paste0(tnameandlab[,1], ":", tnameandlab[,2]))
colnames(stdestimates) <- paste0("std:", tnameandlab[,1], ":", tnameandlab[,2])
esstd <- matrix(NA, length(varfree), (ngroups - 1)* length(tname))
tnameandcomplab <- expand.grid(tname, complab)
colnames(esstd) <- paste0("diff_std:", tnameandcomplab[,1], ":", tnameandcomplab[,2])
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
facinfix <- findFactor(fix, facList)
varinfixvar <- unlist(facList[facinfix])
varinfixvar <- setdiff(varinfixvar, setdiff(varinfixvar, varfree))
varnonfixvar <- setdiff(varfree, varinfixvar)
pos <- 1
for (i in seq_along(varinfixvar)) {
temp <- pt1
for (s in 2:numThreshold[varinfixvar[i]]) {
runnum <- which((pt1$lhs == varfree[i]) & (pt1$op == "|") & (pt1$rhs == paste0("t", s)) & (pt1$group == 1))
temp <- constrainParTable(temp, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
}
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- pt0
for (s in 2:numThreshold[varinfixvar[i]]) {
runnum <- which((pt0$lhs == varfree[i]) & (pt0$op == "|") & (pt0$rhs == paste0("t", s)) & (pt0$group == 1))
temp0 <- freeParTable(temp0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, s - 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
}
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if (!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if (!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
for (s in 2:numThreshold[varinfixvar[i]]) {
runnum <- which((pt0$lhs == varfree[i]) & (pt0$op == "|") & (pt0$rhs == paste0("t", s)) & (pt0$group == 1))
thresVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, maxcolumns*(1:ngroups) + (s - 1)] <- thresVal
totalVal <- sapply(thetaImpliedTotalVar(tempfit0), function(x, v) x[v, v], v = pt0$lhs[runnum])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdIntVal <- thresVal / sqrt(refTotalVal)
stdestimates[pos, maxcolumns*(1:ngroups - 1) + (s - 1)] <- stdIntVal
stdIntVal <- stdIntVal[grouporder]
esstd[pos, maxcolumns*(1:length(complab) - 1) + (s - 1)] <- stdIntVal[2:ngroups] - stdIntVal[1]
}
}
listFreeCon <- c(listFreeCon, tryresult0)
args <- list(fit1, pt1, waldMat)
for (s in 2:numThreshold[varinfixvar[i]]) {
runnum <- which((pt1$lhs == varfree[i]) & (pt1$op == "|") & (pt1$rhs == paste0("t", s)) & (pt1$group == 1))
args <- c(args, list(cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)))
}
waldCon[pos,] <- do.call(waldConstraint, args)
pos <- pos + 1
}
facinvarfree <- findFactor(varnonfixvar, facList)
for (i in seq_along(varnonfixvar)) {
# Need to change marker variable if fixed
oldmarker <- fixIntceptFac[[facinvarfree[i]]]
if (length(oldmarker) > 0 && oldmarker == varfree[i]) {
candidatemarker <- setdiff(facList[[facinvarfree[i]]], varnonfixvar[i])
candidatemarker <- candidatemarker[numThreshold[candidatemarker] > 1][1]
numcandidatethres <- numThreshold[candidatemarker]
newparent <- constrainParTable(pt1, candidatemarker, "|", "t2", 1:ngroups)
for (s in 2:numcandidatethres) {
newparent <- freeParTable(newparent, varnonfixvar[i], "|", paste0("t", s), 1:ngroups)
}
temp <- newparent
for (s in 2:numThreshold[varnonfixvar[i]]) {
runnum <- which((newparent$lhs == varnonfixvar[i]) & (newparent$op == "|") & (newparent$rhs == paste0("t", s)) & (newparent$group == 1))
temp <- constrainParTable(temp, newparent$lhs[runnum], newparent$op[runnum], newparent$rhs[runnum], 1:ngroups)
}
newparentresult <- try(newparentfit <- refit(newparent, fit1), silent = TRUE)
if (!is(newparentresult, "try-error")) {
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, newparentfit, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(newparentfit, tempfit))
}
args <- list(newparentfit, newparent, waldMat)
for (s in 2:numThreshold[varnonfixvar[i]]) {
runnum <- which((newparent$lhs == varnonfixvar[i]) & (newparent$op == "|") & (newparent$rhs == paste0("t", s)) & (newparent$group == 1))
args <- c(args, list(cbind(newparent$lhs[runnum], newparent$op[runnum], newparent$rhs[runnum], 1:ngroups)))
}
waldCon[pos,] <- do.call(waldConstraint, args)
}
} else {
temp <- pt1
for (s in 2:numThreshold[varnonfixvar[i]]) {
runnum <- which((pt1$lhs == varfree[i]) & (pt1$op == "|") & (pt1$rhs == paste0("t", s)) & (pt1$group == 1))
temp <- constrainParTable(temp, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
}
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[pos,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
args <- list(fit1, pt1, waldMat)
for (s in 2:numThreshold[varnonfixvar[i]]) {
runnum <- which((pt1$lhs == varfree[i]) & (pt1$op == "|") & (pt1$rhs == paste0("t", s)) & (pt1$group == 1))
args <- c(args, list(cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)))
}
waldCon[pos,] <- do.call(waldConstraint, args)
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- pt0
for (s in 2:numThreshold[varnonfixvar[i]]) {
runnum <- which((pt0$lhs == varfree[i]) & (pt0$op == "|") & (pt0$rhs == paste0("t", s)) & (pt0$group == 1))
temp0 <- freeParTable(temp0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, s - 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
}
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if (!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if (!is(compresult0, "try-error")) freeCon[pos,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
for (s in 2:numThreshold[varnonfixvar[i]]) {
runnum <- which((pt0$lhs == varfree[i]) & (pt0$op == "|") & (pt0$rhs == paste0("t", s)) & (pt0$group == 1))
thresVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[pos, maxcolumns*(1:ngroups) + (s - 1)] <- thresVal
totalVal <- sapply(thetaImpliedTotalVar(tempfit0), function(x, v) x[v, v], v = pt0$lhs[runnum])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdIntVal <- thresVal / sqrt(refTotalVal)
stdestimates[pos, maxcolumns*(1:ngroups - 1) + (s - 1)] <- stdIntVal
stdIntVal <- stdIntVal[grouporder]
esstd[pos, maxcolumns*(1:length(complab) - 1) + (s - 1)] <- stdIntVal[2:ngroups] - stdIntVal[1]
}
}
listFreeCon <- c(listFreeCon, tryresult0)
pos <- pos + 1
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- paste0(c(varinfixvar, varnonfixvar), "|")
estimates <- cbind(estimates, stdestimates, esstd)
result <- cbind(freeCon, fixCon, waldCon)
} else if (numType == 3) {
if (!is.null(free) | !is.null(fix)) {
if (!is.null(fix)) {
for (i in seq_along(fix)) {
pt0 <- constrainParTable(pt0, fix[i], "~~", fix[i], 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "~~", fix[i], 1:ngroups)
}
}
if (!is.null(free)) {
for (i in seq_along(free)) {
pt0 <- freeParTable(pt0, free[i], "~~", free[i], 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "~~", free[i], 1:ngroups)
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("errvar:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- esz <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
colnames(esz) <- paste0("h:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$lhs %in% varfree) & (pt1$op == "~~") & (pt1$lhs == pt1$rhs) & (pt1$group == 1))
for (i in seq_along(index)) {
runnum <- index[i]
ustart <- getValue(pt1, beta, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1)
temp <- fixParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 2:ngroups, ustart)
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[i,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 2:ngroups)
estimates[i, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if (!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if (!is(compresult0, "try-error")) freeCon[i,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
errVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[i, 2:ncol(estimates)] <- errVal
totalVal <- sapply(thetaImpliedTotalVar(tempfit0), function(x, v) x[v, v], v = pt0$rhs[runnum])
ifelse(poolvar, refTotalVal <- poolVariance(totalVal, neach), refTotalVal <- totalVal[refgroup])
stdErrVal <- errVal / sqrt(refTotalVal)
stdestimates[i,] <- stdErrVal
stdErrVal <- stdErrVal[grouporder]
esstd[i,] <- stdErrVal[2:ngroups] - stdErrVal[1]
if (any(abs(stdErrVal) > 0.9999))
warning(paste("The uniqueness of", pt0$rhs[runnum],
"in some groups are over 1. The uniqueness used in",
" arctan transformation are changed to 0.9999."))
stdErrVal[stdErrVal > 0.9999] <- 0.9999
zErrVal <- asin(sqrt(stdErrVal))
esz[i,] <- zErrVal[2:ngroups] - zErrVal[1]
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[i,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[index]
estimates <- cbind(estimates, stdestimates, esstd, esz)
result <- cbind(freeCon, fixCon, waldCon)
} else if (numType == 4) {
varfree <- facnames
if (!is.null(free) | !is.null(fix)) {
if (!is.null(fix)) {
for (i in seq_along(fix)) {
pt0 <- constrainParTable(pt0, fix[i], "~1", "", 1:ngroups)
pt1 <- constrainParTable(pt1, fix[i], "~1", "", 1:ngroups)
}
}
if (!is.null(free)) {
for (i in seq_along(free)) {
pt0 <- freeParTable(pt0, free[i], "~1", "", 1:ngroups)
pt1 <- freeParTable(pt1, free[i], "~1", "", 1:ngroups)
}
}
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
fit0 <- refit(pt0, fit0)
fit1 <- refit(pt1, fit1)
beta <- lavaan::coef(fit1)
beta0 <- lavaan::coef(fit0)
waldMat <- matrix(0, ngroups - 1, length(beta))
varfree <- setdiff(varfree, c(free, fix))
}
estimates <- matrix(NA, length(varfree), ngroups + 1)
stdestimates <- matrix(NA, length(varfree), ngroups)
colnames(estimates) <- c("poolest", paste0("mean:", grouplab))
colnames(stdestimates) <- paste0("std:", grouplab)
esstd <- matrix(NA, length(varfree), ngroups - 1)
colnames(esstd) <- paste0("diff_std:", complab)
fixCon <- freeCon <- matrix(NA, length(varfree), 4)
waldCon <- matrix(NA, length(varfree), 3)
colnames(fixCon) <- c("fix.chi", "fix.df", "fix.p", "fix.cfi")
colnames(freeCon) <- c("free.chi", "free.df", "free.p", "free.cfi")
colnames(waldCon) <- c("wald.chi", "wald.df", "wald.p")
index <- which((pt1$lhs %in% varfree) & (pt1$op == "~1") & (pt1$group == 1))
for (i in seq_along(index)) {
runnum <- index[i]
isfree <- pt1$free[runnum] != 0
if (isfree) {
temp <- constrainParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups)
} else {
temp <- fixParTable(pt1, pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 2:ngroups, ustart = pt1$ustart[runnum])
}
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if (!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if (!is(compresult, "try-error")) fixCon[i,] <- c(unlist(modelcomp[2,5:7]), deltacfi(fit1, tempfit))
}
listFixCon <- c(listFixCon, tryresult)
isfree0 <- pt0$free[runnum] != 0
if (isfree0) {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
} else {
temp0 <- freeParTable(pt0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 2:ngroups)
}
estimates[i, 1] <- getValue(pt0, beta0, pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1)
tryresult0 <- try(tempfit0 <- refit(temp0, fit0), silent = TRUE)
if (!is(tryresult0, "try-error")) {
compresult0 <- try(modelcomp0 <- lavaan::lavTestLRT(tempfit0, fit0, method = method), silent = TRUE)
if (!is(compresult0, "try-error")) freeCon[i,] <- c(unlist(modelcomp0[2,5:7]), deltacfi(tempfit0, fit0))
meanVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], pt0$op[runnum], pt0$rhs[runnum], 1:ngroups)
estimates[i, 2:ncol(estimates)] <- meanVal
facVal <- getValue(temp0, lavaan::coef(tempfit0), pt0$lhs[runnum], "~~", pt0$lhs[runnum], 1:ngroups)
ifelse(poolvar, refFacVal <- poolVariance(facVal, neach), refFacVal <- facVal[refgroup])
stdMeanVal <- meanVal / sqrt(refFacVal)
stdestimates[i,] <- stdMeanVal
stdMeanVal <- stdMeanVal[grouporder]
esstd[i,] <- stdMeanVal[2:ngroups] - stdMeanVal[1]
}
listFreeCon <- c(listFreeCon, tryresult0)
waldCon[i,] <- waldConstraint(fit1, pt1, waldMat, cbind(pt1$lhs[runnum], pt1$op[runnum], pt1$rhs[runnum], 1:ngroups))
}
freeCon[,3] <- stats::p.adjust(freeCon[,3], p.adjust)
fixCon[,3] <- stats::p.adjust(fixCon[,3], p.adjust)
waldCon[,3] <- stats::p.adjust(waldCon[,3], p.adjust)
rownames(fixCon) <- names(listFixCon) <- rownames(freeCon) <- names(listFreeCon) <- rownames(waldCon) <- rownames(estimates) <- namept1[index]
estimates <- cbind(estimates, stdestimates, esstd)
result <- cbind(freeCon, fixCon, waldCon)
}
if (return.fit) {
return(invisible(list(estimates = estimates, results = result, models = list(free = listFreeCon, fix = listFixCon, nested = fit0, parent = fit1))))
} else {
return(list(estimates = estimates, results = result))
}
}
## ----------------
## Hidden Functions
## ----------------
findFactor <- function(var, facList) {
tempfac <- lapply(facList, intersect, var)
facinvar <- rep(names(tempfac), sapply(tempfac, length))
facinvar[match(unlist(tempfac), var)]
}
## Terry moved here from wald.R so that wald() could be removed (redundant with lavaan::lavTestWald)
## FIXME: Update WaldConstraint to rely on lavaan::lavTestWald instead
#' @importFrom stats pchisq
waldContrast <- function(object, contrast) {
beta <- lavaan::coef(object)
acov <- lavaan::vcov(object)
chisq <- t(contrast %*% beta) %*% solve(contrast %*% as.matrix(acov) %*% t(contrast)) %*% (contrast %*% beta)
df <- nrow(contrast)
p <- pchisq(chisq, df, lower.tail=FALSE)
c(chisq = chisq, df = df, p = p)
}
#' @importFrom lavaan parTable
waldConstraint <- function(fit, pt, mat, ...) {
dotdotdot <- list(...)
overallMat <- NULL
for(i in seq_along(dotdotdot)) {
target <- dotdotdot[[i]]
tempMat <- mat
element <- apply(target, 1, matchElement, parTable = pt)
freeIndex <- pt$free[element]
tempMat[,freeIndex[1]] <- -1
for(m in 2:length(freeIndex)) {
tempMat[m - 1, freeIndex[m]] <- 1
}
overallMat <- rbind(overallMat, tempMat)
}
result <- rep(NA, 3)
if(!any(apply(overallMat, 1, sum) != 0)) {
try(result <- waldContrast(fit, overallMat), silent = TRUE)
}
return(result)
}
poolVariance <- function(var, n) {
nm <- n - 1
sum(var * nm) / sum(nm)
}
deltacfi <- function(parent, nested) lavaan::fitmeasures(nested)["cfi"] - lavaan::fitmeasures(parent)["cfi"]
## For categorical.
## FIXME: Why is this even necessary?
## Did Sunthud not know implied Sigma is available?
#' @importFrom lavaan lavInspect
thetaImpliedTotalVar <- function(object) {
# param <- lavInspect(object, "est")
# ngroup <- lavInspect(object, "ngroups")
# name <- names(param)
# if(ngroup == 1) {
# ly <- param[name == "lambda"]
# } else {
# ly <- lapply(param, "[[", "lambda")
# }
# ps <- lavInspect(object, "cov.lv")
# if(ngroup == 1) ps <- list(ps)
# if(ngroup == 1) {
# te <- param[name == "theta"]
# } else {
# te <- lapply(param, "[[", "theta")
# }
# result <- list()
# for(i in 1:ngroup) {
# result[[i]] <- ly[[i]] %*% ps[[i]] %*% t(ly[[i]]) + te[[i]]
# }
# result
if (lavInspect(object, "ngroups") == 1L) return(list(lavInspect(object, "cov.ov")))
lavInspect(object, "cov.ov")
}
## MOVED FROM lonInvariance.R when it was removed from semTools 0.5-8 (9 Feb 2026)
# constrainParTable: Impose equality constraints in any set of elements in the parameter table
constrainParTable <- function(parTable, lhs, op, rhs, group) {
parTable$start <- parTable$est <- parTable$se <- NULL
target <- cbind(lhs, op, rhs, group)
element <- apply(target, 1, matchElement, parTable=parTable)
# id lhs op rhs user group free ustart exo label plabel start
for (i in 2:length(element)) {
len <- length(parTable$id)
newline <- list(lhs = parTable$plabel[element[1]], op = "==",
rhs = parTable$plabel[element[i]])
if (!any(parTable$lhs == newline$lhs & parTable$op == newline$op &
parTable$rhs == newline$rhs)) parTable <- patMerge(pt1 = parTable, pt2 = newline)
}
parTable
}
# matchElement: Find the number of row that have the specification in vec (lhs, op, rhs, group)
matchElement <- function(parTable, vec) {
if (is.null(parTable$group)) {
return(which((parTable$lhs == vec[1]) & (parTable$op == vec[2]) & (parTable$rhs == vec[3])))
} else {
return(which((parTable$lhs == vec[1]) & (parTable$op == vec[2]) & (parTable$rhs == vec[3]) & (parTable$group == vec[4])))
}
}
getValue <- function(parTable, est, lhs, op, rhs, group) {
target <- cbind(lhs, op, rhs, group)
element <- apply(target, 1, matchElement, parTable = parTable)
free <- parTable$free[element]
out <- parTable$ustart[element]
out[free != 0] <- est[free[free != 0]]
out
}
semTools/R/measEq.R 0000644 0001762 0000144 00000453013 15142343551 013627 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 9 February 2026
### lavaan model syntax-writing engine for new measEq() to replace
### measurementInvariance(), measurementInvarianceCat(), and longInvariance()
## ----------------------
## Model-Fitting Function
## ----------------------
measEq <- function(configural.model,
ID.fac = c("std.lv","auto.fix.first","effects.coding"),
ID.cat = c("Wu.Estabrook.2016","Mplus","Millsap.Tein.2004","LISREL"),
ID.thr = c(1L, 2L), # only for ID.cat == "Millsap.Tein.2004"
group = NULL, longFacNames = list(), longIndNames = list(),
#group.equal = "", long.equal = "",
group.partial = "", long.partial = "",
auto = "all", extra = NULL,
test.seq = c("thresholds","loadings","intercepts","means",
"lv.variances","residuals"), # optional resid/lv.autocov, residual/lv.covariances
#fixed.x = TRUE, strict = FALSE, quiet = FALSE,
warn = TRUE, debug = FALSE,
alpha = .05, fit.measures = "default", argsLRT = list(), ...) {
#TODO: check GLIST structure for multilevel (with single and multiple groups)
#TODO: compatibility with auxiliary(), runMI(), parcelAllocation(), permuteMeasEq()
#TODO: develop automated anchorSelection() and measEq.partial()
#TODO: if (inherits(configural.model, "lavaan.measEq")) {continue sequence}?
## This example might help: https://groups.google.com/d/msg/lavaan/LvALeUpJBDg/2zD1CoikAQAJ
#TODO: add argument to accept measEq.partial output, to continue sequence (or make and update() method?)
if (is.character(group.partial)) {
if ((group.partial == "")[1L] && length(group.partial) == 1L) {
group.partial <- data.frame(stringsAsFactors = FALSE, lhs = character(0),
op = character(0), rhs = character(0))
} else {
group.partial <- lavaan::lavParseModelString(group.partial,
as.data.frame. = TRUE,
warn = warn, debug = debug)
}
} #TODO: else {extract information from a measEq.partial object}
if (is.character(long.partial)) {
if ((long.partial == "")[1L] && length(long.partial) == 1L) {
long.partial <- data.frame(stringsAsFactors = FALSE, lhs = character(0),
op = character(0), rhs = character(0))
} else {
long.partial <- lavaan::lavParseModelString(long.partial,
as.data.frame. = TRUE,
warn = warn, debug = debug)
}
} #TODO: else {extract information from a measEq.partial object}
## pass arguments to measEq.syntax(), which performs checks
}
## -----------------
## Class and Methods
## -----------------
##' Class for Representing a Measurement-Equivalence Model
##'
##' This class of object stores information used to automatically generate
##' lavaan model syntax to represent user-specified levels of measurement
##' equivalence/invariance across groups and/or repeated measures. See
##' [measEq.syntax()] for details.
##'
##'
##' @name measEq.syntax-class
##' @aliases measEq.syntax-class show,measEq.syntax-method
##' summary,measEq.syntax-method as.character,measEq.syntax-method
##' update,measEq.syntax-method
##' @docType class
##'
##' @slot package `character` indicating the software package used to
##' represent the model. Currently, only `"lavaan"` is available, which
##' uses the LISREL representation (see [lavaan::lavOptions()]).
##' In the future, `"OpenMx"` may become available, using RAM
##' representation.
##' @slot model.type `character`. Currently, only "cfa" is available.
##' Future versions may allow for MIMIC / RFA models, where invariance can be
##' tested across levels of exogenous variables explicitly included as
##' predictors of indicators, controlling for their effects on (or correlation
##' with) the common factors.
##' @slot call The function call as returned by `match.call()`, with
##' some arguments updated if necessary for logical consistency.
##' @slot meanstructure `logical` indicating whether a mean structure is
##' included in the model.
##' @slot numeric `character` vector naming `numeric` manifest indicators.
##' @slot ordered `character` vector naming `ordered` indicators.
##' @slot parameterization `character`. See [lavaan::lavOptions()].
##' @slot specify `list` of parameter matrices, similar in form to the
##' output of `lavInspect(fit, "free")`. These matrices
##' are `logical`, indicating whether each parameter should be specified
##' in the model syntax.
##' @slot values `list` of parameter matrices, similar in form to the
##' output of `lavInspect(fit, "free")`. These matrices
##' are `numeric`, indicating whether each parameter should be freely
##' estimated (indicated by `NA`) or fixed to a particular value.
##' @slot labels `list` of parameter matrices, similar in form to the
##' output of `lavInspect(fit, "free")`. These matrices
##' contain `character` labels used to constrain parameters to equality.
##' @slot constraints `character` vector containing additional equality
##' constraints used to identify the model when `ID.fac = "fx"`.
##' @slot ngroups `integer` indicating the number of groups.
##'
##' @param x,object an object of class `measEq.syntax`
##' @param package `character` indicating the package for which the model
##' syntax should be generated. Currently, only `"lavaan"` and
##' `"mplus"` are supported.
##' @param params `character` vector indicating which type(s) of parameter
##' to print syntax for. Must match a type that can be passed to
##' `group.equal` or `long.equal`, but `"residual.covariances"`
##' and `"lv.covariances"` will be silently ignored. Instead, requesting
##' `"residuals"` or `"lv.variances"` will return covariances along
##' with variances. By default (`NULL`), all types are printed.
##' @param single `logical` indicating whether to concatenate lavaan
##' [lavaan::model.syntax()] into a single `character` string.
##' Setting `FALSE` will return a vector of strings, which may be
##' convenient (or even necessary to prevent an error) in
##' models with long variable names, many variables, or many groups.
##' @param groups.as.blocks `logical` indicating whether to write lavaan
##' [lavaan::model.syntax()] using vectors of labels and values
##' for multiple groups (the default: `FALSE`), or whether to write
##' a separate "block" of syntax per group. The block structure could allow
##' users to apply the generated multigroup syntax (after some editing) to
##' test invariance across levels in a multilevel SEM (see final example on
##' [measEq.syntax()] help page).
##' @param verbose `logical` indicating whether to print a summary to the
##' screen (default). If `FALSE`, only a pattern matrix is returned.
##' @param ... Additional arguments to the `call`, or arguments with
##' changed values.
##' @param evaluate If `TRUE`, evaluate the new `call`; otherwise,
##' return the new `call`.
##' @param change.syntax [lavaan::model.syntax()] specifying
##' labels or fixed/free values of parameters in `object`.
##' These provide some flexibility to customize
##' existing parameters without having to copy/paste the output of
##' `as.character(object)` into an R script. For example,
##' `group.partial` will free a parameter across all groups, but
##' `update` allows users to free the parameter in just one group
##' while maintaining equality constraints among other groups.
##'
##' @return
##' \item{summary}{`signature(object = "measEq.syntax", verbose = TRUE)`:
##' A `character` matrix indicating the pattern of `numeric`,
##' `ordered`, or latent indicators loading on common factors.
##' By default (`verbose = TRUE`), `summary` also prints descriptive
##' details about the model, including the numbers of indicators and factors,
##' and which parameters are constrained to equality.}
##' \item{show}{`signature(object = "measEq.syntax")`: Prints a message
##' about how to use the `object` for model fitting. Invisibly
##' returns the `object`.}
##' \item{update}{`signature(object = "measEq.syntax", ...,
##' evaluate = TRUE, change.syntax = NULL)`: Creates a new
##' `object` with updated arguments in `...`, or updated
##' parameter labels or fixed/free specifications in `object`.}
##' \item{as.character}{`signature(x = "measEq.syntax", package = "lavaan")`:
##' Converts the `measEq.syntax` object to model syntax that can be
##' copy/pasted or written to a syntax file to be edited before analysis,
##' or simply passed to [lavaan::lavaan()] to fit the model to
##' data. Generated M*plus* syntax could also be utilized using the
##' \pkg{MplusAuthomation} package.}
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @examples
##' ## See ?measEq.syntax help page for examples using lavaan
##'
## ## Here, we illustrate how measEq.syntax() objects can be used in
## ## tandem with MplusAutomation.
##
## \donttest{
## ## borrow example data from Mplus user guide
## myData <- read.table("https://www.statmodel.com/usersguide/chap5/ex5.16.dat")
## names(myData) <- c("u1","u2","u3","u4","u5","u6","x1","x2","x3","g")
## bin.mod <- '
## FU1 =~ u1 + u2 + u3
## FU2 =~ u4 + u5 + u6
## '
## ## pretend the 2 factors actually measure the same factor (FU) twice
## longFacNames <- list(FU = c("FU1","FU2"))
## syntax.scalar <- measEq.syntax(configural.model = bin.mod,
## data = myData, ordered = paste0("u", 1:6),
## parameterization = "theta",
## ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
## group = "g", longFacNames = longFacNames,
## group.equal = c("thresholds","loadings","intercepts"),
## long.equal = c("thresholds","loadings","intercepts"))
## if(requireNamespace("MplusAutomation")){
## library(MplusAutomation)
## mpInp <- mplusObject(rdata = myData, TITLE = "Scalar Invariance",
## VARIABLE = "GROUPING = g (1=g1 2=g2);",
## usevariables = c(paste0("u", 1:6), "g"),
## ANALYSIS = "ESTIMATOR = WLSMV;",
## ## model specification from measEq.syntax():
## MODEL = as.character(syntax.scalar, package = "mplus"))
## ## add details for Mplus script:
## mpInp <- update(mpInp, ANALYSIS = ~ . + "PARAMETERIZATION = THETA;",
## VARIABLE = ~ . + "CATEGORICAL = u1 u2 u3 u4 u5 u6;")
## ## fit model
## mpOut <- mplusModeler(mpInp, modelout = "scalar.inp", run = 1L)
## }
## }
#TODO: add configural and DIFFTEST example
##'
setClass("measEq.syntax", slots = c(package = "character", # lavaan, OpenMx in the future?
model.type = "character", # cfa, extend to mimic/rfa?
call = "call",
meanstructure = "logical",
numeric = "character",
ordered = "character",
parameterization = "character",
specify = "list",
values = "list",
labels = "list",
constraints = "character",
updates = "list", # 2 data.frames: labels and values
ngroups = "integer"))
##' @rdname measEq.syntax-class
##' @aliases as.character,measEq.syntax-method
##' @export
setMethod("as.character", "measEq.syntax", function(x, package = "lavaan",
params = NULL, single = TRUE,
groups.as.blocks = FALSE) {
package <- tolower(package)[1]
if (package == "mplus") {
LL <- x@specify[[1]]$lambda
nn <- c(rownames(LL), colnames(LL))
over8 <- nchar(nn) > 8L
if (any(over8)) warning('Mplus only allows variable names to have 8 ',
'characters. The following variable names in ',
'your model exceed 8 characters:\n',
paste(nn[over8], collapse = ", "), '\n',
'Consider shortening variable names before ',
'printing an Mplus MODEL statement.')
## print everything leading up to the MODEL command
script <- c("MODEL:\n")
if (length(x@ordered)) {
script <- c(paste0("!Make sure your VARIABLE command indicates the ",
"following variables as CATEGORICAL:\n!",
paste(x@ordered, collapse = ", "), '\n'), script)
}
if (x@ngroups > 1L) {
script <- c(script, "!This is the first group's MODEL.\n!Group 2's MODEL",
"!will be labeled as 'g2', and so on for any other groups.\n")
}
script <- c(script, write.mplus.syntax(object = x, group = 1L,
params = params))
if (x@ngroups > 1L) for (g in 2:x@ngroups) {
script <- c(script, paste0("\nMODEL g", g, ":\n"),
write.mplus.syntax(object = x, group = g, params = params))
}
return(paste(script, collapse = "\n")) # always return a single string
} else if (package == "lavaan") {
script <- character(0)
pmatList <- c("lambda","tau","nu","delta","theta","alpha","psi")
names(pmatList) <- c("loadings","thresholds","intercepts","scales",
"residuals","means","lv.variances")
## selected parameter types?
if (!is.null(params)) {
requested <- intersect(names(pmatList), params)
if (!length(requested)) stop('invalid choice: params = c("',
paste(params, collapse = '", "'), '")\n',
'Valid choices include: ',
paste(names(pmatList), collapse = ", "))
pmatList <- pmatList[requested]
}
if (groups.as.blocks) {
## loop over groups
for (gg in 1:x@ngroups) {
script <- c(script, paste("group:", gg, "\n", collapse = ""))
## loop over pmats
for (pm in pmatList) {
if (!pm %in% names(x@specify[[gg]])) next
if (pm == "lambda" && "beta" %in% names(x@specify[[gg]])) {
## add higher-order loadings to lambda matrix
specify <- list(rbind(x@specify[[gg]]$lambda, x@specify[[gg]]$beta))
value <- list(rbind(x@values[[gg]]$lambda, x@values[[gg]]$beta))
label <- list(rbind(x@labels[[gg]]$lambda, x@labels[[gg]]$beta))
} else {
specify <- list(x@specify[[gg]][[pm]])
value <- list(x@values[[gg]][[pm]])
label <- list(x@labels[[gg]][[pm]])
}
script <- c(script, write.lavaan.syntax(pmat = pm, specify = specify,
value = value, label = label))
} # end pm
} # end gg
} else {
## the usual multigroup lavaan syntax:
## loop over pmats, send all groups together
for (pm in pmatList) {
if (!pm %in% names(x@specify[[1]])) next
if (pm == "lambda" && "beta" %in% names(x@specify[[1]])) {
## add higher-order loadings to lambda matrix
specify.l <- lapply(x@specify, "[[", i = "lambda")
value.l <- lapply(x@values , "[[", i = "lambda")
label.l <- lapply(x@labels , "[[", i = "lambda")
specify.b <- lapply(x@specify, "[[", i = "beta")
value.b <- lapply(x@values , "[[", i = "beta")
label.b <- lapply(x@labels , "[[", i = "beta")
specify <- mapply(rbind, specify.l, specify.b, SIMPLIFY = FALSE)
value <- mapply(rbind, value.l , value.b , SIMPLIFY = FALSE)
label <- mapply(rbind, label.l , label.b , SIMPLIFY = FALSE)
} else {
specify <- lapply(x@specify, "[[", i = pm)
value <- lapply(x@values, "[[", i = pm)
label <- lapply(x@labels, "[[", i = pm)
}
script <- c(script, write.lavaan.syntax(pmat = pm, specify = specify,
value = value, label = label))
}
}
if (length(x@constraints)) script <- c(script,
"## MODEL CONSTRAINTS:\n",
x@constraints, "")
}
#TODO: else if (package == "openmx") # concatenate matrices for RAM specification
## convert GLIST objects to a character string
if (single) return(paste(script, collapse = "\n"))
script
})
##' @rdname measEq.syntax-class
##' @aliases show,measEq.syntax-method
##' @export
setMethod("show", "measEq.syntax", function(object) {
cat('This object contains information for specifying a CFA using lavaan',
'model syntax.\nTo print the syntax (to copy/paste it into an R script),',
'use the as.character() method:\n\n\tcat(as.character(object))\n\nTo fit',
'this model to data, save the syntax to an object and pass it to lavaan:',
'\n\n\tmodel <- as.character(object)\n\tfit <- lavaan(model, ...)',
'\n\nTo view some key features of the model use: \n\n\tsummary(object)')
invisible(object)
})
##' @rdname measEq.syntax-class
##' @aliases summary,measEq.syntax-method
##' @export
setMethod("summary", "measEq.syntax", function(object, verbose = TRUE) {
nG <- object@ngroups
nOrd <- length(object@ordered)
higher <- !is.null(object@specify[[1]]$beta)
## create pattern matrix
lambda <- object@specify[[1]]$lambda
lambda[!lambda] <- ""
for (RR in 1:nrow(lambda)) {
if (rownames(lambda)[RR] %in% object@ordered) {
lambda[RR, object@specify[[1]]$lambda[RR, ] ] <- "ord"
} else lambda[RR, object@specify[[1]]$lambda[RR, ] ] <- "num"
}
if (higher) {
beta <- object@specify[[1]]$beta
beta[!beta] <- ""
for (RR in 1:nrow(beta)) {
beta[RR, object@specify[[1]]$beta[RR, ] ] <- "lat"
}
rownames(beta) <- paste("**", rownames(beta), "**")
lambda <- rbind(lambda, beta)
}
if (!verbose) return(lambda)
## Basics: number of groups, factors, and indicators (higher order?); ID.fac
nHigher <- if (higher) sum(apply(object@specify[[1]]$beta, 2, any)) else 0L
if (object@call$ID.fac == "ul" && !object@meanstructure) {
ID.fac.text <- 'first indicator`s factor loading was fixed to 1.'
} else if (object@call$ID.fac == "ul" && object@meanstructure) {
ID.fac.text <- paste('first indicator`s intercept and factor loading were',
'fixed to 0 and 1, respectively.')
} else if (object@call$ID.fac == "uv" && !object@meanstructure) {
ID.fac.text <- paste('factor variances were fixed to 1, unless equality',
'constraints on factor loadings allow them to be freed.')
} else if (object@call$ID.fac == "uv" && object@meanstructure) {
ID.fac.text <- paste('factor means and variances were fixed to 0 and 1,',
'respectively, unless equality constraints on',
'measurement parameters allow them to be freed.')
} else if (object@call$ID.fac == "fx") {
ID.fac.text <- paste('factor loadings were constrained to average 1',
if (object@meanstructure) 'and intercepts were constrained to average 0',
'within each factor. In models with partial',
'invariance, only the factor loadings',
if (object@meanstructure) 'and intercepts',
'that were constrained to equality across ALL groups',
'and repeated measures (when applicable) are used to',
'identify the common-factor distribution.')
}
cat('This lavaan model syntax specifies a CFA with ',
nrow(object@specify[[1]]$lambda), ' manifest indicators ',
if (nOrd == 1L) {
paste0('(', nOrd, ' of which is ordinal) ')
} else if (nOrd > 1L) {
paste0('(', nOrd, ' of which are ordinal) ')
}, 'of ', ncol(object@specify[[1]]$lambda), ' common factor(s)',
if (nHigher == 1L) {
paste(',', nHigher, 'of which is a higher-order factor. ')
} else if (nHigher > 1L) {
paste(',', nHigher, 'of which are higher-order factors. ')
} else '.\n\n', 'To identify the ',
if (object@meanstructure) 'location and ',
'scale of each common factor, the ', ID.fac.text, "\n\n", sep = '')
## if (ordered) ID.cat and parameterization
if (nOrd) {
if (object@call$ID.cat == "wu") {
ID.cat.author <- 'recommended by Wu & Estabrook (2016). '
ID.cat.DOI <- 'https://doi.org/10.1007/s11336-016-9506-0 \n\n'
} else if (object@call$ID.cat == "millsap") {
ID.cat.author <- 'recommended by Millsap & Tein (2004). '
} else if (object@call$ID.cat == "mplus") {
ID.cat.author <- 'used by default in the Mplus (and lavaan) software. '
} else if (object@call$ID.cat == "lisrel") {
ID.cat.author <- 'used by default in the LISREL software. '
}
if (object@call$ID.cat != "wu") ID.cat.DOI <- 'https://doi.org/10.1207/S15327906MBR3903_4 \n\n'
cat('The location and scale of each latent item-response underlying ', nOrd,
' ordinal indicators were identified using the "', object@parameterization,
'" parameterization, and the identification constraints ',
ID.cat.author, 'For details, read:\n\n\t', ID.cat.DOI, sep = '')
}
## number of occassions per longitudinal construct
if (length(object@call$longFacNames)) {
cat('The following factors were measured on multiple occasions:\n')
for (f in names(object@call$longFacNames)) {
cat('\t"', f, '" was measured on ', length(object@call$longFacNames[[f]]),
' occasions\n', sep = '')
}
cat('\n')
}
## print pattern matrix
cat('Pattern matrix indicating num(eric), ord(ered), and lat(ent)',
'indicators per factor:\n\n')
print(lambda, quote = FALSE)
cat('\n')
## without any constraints, call it the configural model
no.group.equal <- length(object@call$group.equal) == 1L && object@call$group.equal == ""
no.long.equal <- length(object@call$long.equal) == 1L && object@call$long.equal == ""
if (no.group.equal && no.long.equal) {
cat('\nThis model hypothesizes only configural invariance.\n\n')
## return pattern matrix
return(invisible(lambda))
}
## otherwise, print the constraints & exceptions
## constrained parameters across groups (+ partial exceptions)
if (nG > 1L) {
if (no.group.equal) {
cat('No parameters were constrained to equality across groups.\n')
} else {
cat('The following types of parameter were constrained to',
'equality across groups:\n\n')
for (i in object@call$group.equal) {
group.partial <- object@call$group.partial
## first, check for exceptions
if (i == "loadings") {
man.ind <- group.partial$rhs %in% rownames(object@specify[[1]]$lambda)
group.partial <- group.partial[group.partial$op == "=~" & man.ind, ]
} else if (i == "regressions") {
lat.ind <- group.partial$rhs %in% colnames(object@specify[[1]]$lambda)
group.partial <- group.partial[group.partial$op == "=~" & lat.ind, ]
} else if (i == "thresholds") {
man.ind <- group.partial$lhs %in% rownames(object@specify[[1]]$lambda)
group.partial <- group.partial[group.partial$op == "|" & man.ind, ]
} else if (i == "residuals") {
man.ind <- group.partial$rhs %in% rownames(object@specify[[1]]$lambda)
same.ind <- group.partial$rhs == group.partial$lhs
group.partial <- group.partial[group.partial$op == "~~" & man.ind & same.ind, ]
} else if (i == "residual.covariances") {
man.ind <- group.partial$rhs %in% rownames(object@specify[[1]]$lambda)
same.ind <- group.partial$rhs == group.partial$lhs
group.partial <- group.partial[group.partial$op == "~~" & man.ind & !same.ind, ]
} else if (i == "lv.variances") {
lat <- group.partial$rhs %in% colnames(object@specify[[1]]$lambda)
same <- group.partial$rhs == group.partial$lhs
group.partial <- group.partial[group.partial$op == "~~" & lat & same, ]
} else if (i == "lv.covariances") {
lat <- group.partial$rhs %in% colnames(object@specify[[1]]$lambda)
same <- group.partial$rhs == group.partial$lhs
group.partial <- group.partial[group.partial$op == "~~" & lat & !same, ]
} else if (i == "intercepts") {
man.ind <- group.partial$lhs %in% rownames(object@specify[[1]]$lambda)
group.partial <- group.partial[group.partial$op == "~1" & man.ind, ]
} else if (i == "means") {
lat <- group.partial$lhs %in% colnames(object@specify[[1]]$lambda)
group.partial <- group.partial[group.partial$op == "~1" & lat, ]
}
## then print a message
cat('\t', i,
if (nrow(group.partial)) ', with the exception of:\n',
'\n', sep = '')
if (nrow(group.partial)) {
rownames(group.partial) <- paste(" row-",
rownames(group.partial), ": ",
sep = "")
print(group.partial)
cat('\n')
}
}
}
cat('\n')
}
## constrained parameters across repeated measures (+ partial exceptions)
if (length(object@call$longFacNames)) {
if (no.long.equal) {
cat('No parameters were constrained to equality across repeated measures:\n')
} else {
cat('The following types of parameter were constrained to equality',
'across repeated measures:\n\n')
for (i in object@call$long.equal) {
long.partial <- object@call$long.partial
## first, check for exceptions
if (i == "loadings") {
man.ind <- long.partial$rhs %in% names(object@call$longIndNames)
long.partial <- long.partial[long.partial$op == "=~" & man.ind, ]
} else if (i == "regressions") {
lat.ind <- long.partial$rhs %in% names(object@call$longFacNames)
long.partial <- long.partial[long.partial$op == "=~" & lat.ind, ]
} else if (i == "thresholds") {
man.ind <- long.partial$lhs %in% names(object@call$longIndNames)
long.partial <- long.partial[long.partial$op == "|" & man.ind, ]
} else if (i == "residuals") {
man.ind <- long.partial$rhs %in% names(object@call$longIndNames)
same.ind <- long.partial$rhs == long.partial$lhs
long.partial <- long.partial[long.partial$op == "~~" & man.ind & same.ind, ]
} else if (i == "lv.variances") {
lat <- long.partial$rhs %in% names(object@call$longFacNames)
same <- long.partial$rhs == long.partial$lhs
long.partial <- long.partial[long.partial$op == "~~" & lat & same, ]
} else if (i == "intercepts") {
man.ind <- long.partial$lhs %in% names(object@call$longIndNames)
long.partial <- long.partial[long.partial$op == "~1" & man.ind, ]
} else if (i == "means") {
lat <- long.partial$lhs %in% names(object@call$longFacNames)
long.partial <- long.partial[long.partial$op == "~1" & lat, ]
}
## then print a message
cat('\t', i,
if (nrow(long.partial)) ', with the exception of:\n',
'\n', sep = '')
if (nrow(long.partial)) {
rownames(long.partial) <- paste(" row-",
rownames(long.partial), ": ",
sep = "")
print(long.partial)
cat('\n')
}
}
}
cat('\n')
}
## return pattern matrix
invisible(lambda)
})
updateMeasEqSyntax <- function(object, ..., evaluate = TRUE,
change.syntax = NULL) {
# data.frame(stringsAsFactors = FALSE, extras = c(TRUE, FALSE),
# override = c(TRUE, TRUE, FALSE, FALSE),
# eval = c(TRUE, TRUE, TRUE, TRUE,
# FALSE, FALSE, FALSE, FALSE),
# TODO = c("extras; eval; transfer, augment, and apply @updates",
# "apply and augment @updates",
# "extras; eval; transfer and apply @updates", "return object",
# "nothing, can't add to call", "nothing, can't add to call",
# "extras, return call", "return call")) -> foo
# foo[order(foo$extras), ]
# extras override eval TODO
# 1 FALSE TRUE TRUE apply and augment @updates
# 2 FALSE FALSE TRUE return object *
# 3 FALSE TRUE FALSE nothing, can't add to call *
# 4 FALSE FALSE FALSE return call *
# 5 TRUE TRUE TRUE extras; eval; transfer, augment, and apply @updates
# 6 TRUE FALSE TRUE extras, eval, transfer @updates
# 7 TRUE TRUE FALSE nothing, can't add to call *
# 8 TRUE FALSE FALSE extras, return call *
#extras <- match.call(expand.dots = FALSE)$...
extras <- list(...)
custom <- !is.null(change.syntax)
## regardless of customization/evaluation, extras can be added to call first
if (length(extras)) {
## prep 5:8
call <- object@call
existing <- !is.na(match(names(extras), names(call)))
for (a in names(extras)[existing]) call[[a]] <- extras[[a]]
if (any(!existing)) {
call <- c(as.list(call), extras[!existing])
call <- as.call(call)
}
if (!evaluate) {
if (custom) warning('cannot apply "change.syntax" ',
'argument when evaluate=FALSE.')
## finish 7:8
return(call)
}
} else if (!evaluate) {
if (custom) warning('cannot apply "change.syntax" ',
'argument when evaluate=FALSE.')
## finish 3:4
return(object@call)
} else if (!custom) {
## finish 2
return(object)
}
# extras override eval TODO
# 1 FALSE TRUE TRUE apply and augment @updates
# 5 TRUE TRUE TRUE eval; transfer, augment, and apply @updates
# 6 TRUE FALSE TRUE eval; transfer and apply @updates
if (length(extras)) {
## prep 5:6
out <- eval(call, parent.frame())
if (nrow(object@updates$values)) out@updates$values <- object@updates$values
if (nrow(object@updates$labels)) out@updates$labels <- object@updates$labels
} else out <- object # "prep" 1
# extras override eval TODO
# 1 FALSE TRUE TRUE apply and augment @updates
# 5 TRUE TRUE TRUE augment, and apply @updates
# 6 TRUE FALSE TRUE apply @updates
## check before augmenting to prep 1 and 5
if (!is.null(change.syntax)) {
stopifnot(is.character(change.syntax))
## convert syntax to data.frame of updates to make
UPDATES <- char2update(object, change.syntax, return.object = FALSE)
out@updates$values <- rbind(out@updates$values, UPDATES$values)
out@updates$labels <- rbind(out@updates$labels, UPDATES$labels)
}
## nothing left to do but apply @updates
## loop over any values/labels (now stored) to finish 1 and 5:6
if (nrow(out@updates$values)) for (RR in 1:nrow(out@updates$values)) {
valueArgs <- c(list(object = out, slotName = "values"),
as.list(out@updates$values[RR, ]))
BB <- out@updates$values$group[RR]
matName <- out@updates$values$matName[RR]
out@values[[BB]][[matName]] <- do.call(override, valueArgs)
}
if (nrow(out@updates$labels)) for (RR in 1:nrow(out@updates$labels)) {
labelArgs <- c(list(object = out, slotName = "labels"),
as.list(out@updates$labels[RR, ]))
BB <- out@updates$labels$group[RR]
matName <- out@updates$labels$matName[RR]
out@labels[[BB]][[matName]] <- do.call(override, labelArgs)
}
## user-specified parameters to override MUST include:
## - group (eventually block), defaults to 1 (convenient for longitudinal CFA)
## - matName (e.g., lambda)
## - row and col (integer or character indices)
## - replacement (NA or numeric for values, character for labels)
out
}
##' @rdname measEq.syntax-class
##' @aliases update,measEq.syntax-method
##' @importFrom stats update
##' @export
setMethod("update", "measEq.syntax", updateMeasEqSyntax)
## -----------------------
## Syntax-Writing Function
## -----------------------
##' Syntax for measurement equivalence
##'
##' Automatically generates `lavaan` model syntax to specify a confirmatory
##' factor analysis (CFA) model with equality constraints imposed on
##' user-specified measurement (or structural) parameters. Optionally returns
##' the fitted model (if data are provided) representing some chosen level of
##' measurement equivalence/invariance across groups and/or repeated measures.
##'
##' This function is a pedagogical and analytical tool to generate model syntax
##' representing some level of measurement equivalence/invariance across any
##' combination of multiple groups and/or repeated measures. Support is provided
##' for confirmatory factor analysis (CFA) models with simple or complex
##' structure (i.e., cross-loadings and correlated residuals are allowed).
##' For any complexities that exceed the limits of automation, this function is
##' intended to still be useful by providing a means to generate syntax that
##' users can easily edit to accommodate their unique situations.
##'
##' Limited support is provided for bifactor models and higher-order constructs.
##' Because bifactor models have cross-loadings by definition, the option
##' `ID.fac = "effects.code"` is unavailable. `ID.fac = "UV"` is
##' recommended for bifactor models, but `ID.fac = "UL"` is available on
##' the condition that each factor has a unique first indicator in the
##' `configural.model`. In order to maintain generality, higher-order
##' factors may include a mix of manifest and latent indicators, but they must
##' therefore require `ID.fac = "UL"` to avoid complications with
##' differentiating lower-order vs. higher-order (or mixed-level) factors.
##' The keyword `"loadings"` in `group.equal` or `long.equal`
##' constrains factor loadings of all manifest indicators (including loadings on
##' higher-order factors that also have latent indicators), whereas the keyword
##' `"regressions"` constrains factor loadings of latent indicators. Users
##' can edit the model syntax manually to adjust constraints as necessary, or
##' clever use of the `group.partial` or `long.partial` arguments
##' could make it possible for users to still automated their model syntax.
##' The keyword `"intercepts"` constrains the intercepts of all manifest
##' indicators, and the keyword `"means"` constrains intercepts and means
##' of all latent common factors, regardless of whether they are latent
##' indicators of higher-order factors. To test equivalence of lower-order and
##' higher-order intercepts/means in separate steps, the user can either
##' manually edit their generated syntax or conscientiously exploit the
##' `group.partial` or `long.partial` arguments as necessary.
##'
##' **`ID.fac`:** If the `configural.model` fixes any (e.g.,
##' the first) factor loadings, the generated syntax object will retain those
##' fixed values. This allows the user to retain additional constraints that
##' might be necessary (e.g., if there are only 1 or 2 indicators). Some methods
##' must be used in conjunction with other settings:
##' \itemize{
##' \item `ID.cat = "Millsap"` requires `ID.fac = "UL"` and
##' `parameterization = "theta"`.
##' \item `ID.cat = "LISREL"` requires `parameterization = "theta"`.
##' \item `ID.fac = "effects.code"` is unavailable when there are any
##' cross-loadings.
##' }
##'
##' **`ID.cat`:** Wu & Estabrook (2016) recommended constraining
##' thresholds to equality first, and doing so should allow releasing any
##' identification constraints no longer needed. For each `ordered`
##' indicator, constraining one threshold to equality will allow the item's
##' intercepts to be estimated in all but the first group or repeated measure.
##' Constraining a second threshold (if applicable) will allow the item's
##' (residual) variance to be estimated in all but the first group or repeated
##' measure. For binary data, there is no independent test of threshold,
##' intercept, or residual-variance equality. Equivalence of thresholds must
##' also be assumed for three-category indicators. These guidelines provide the
##' least restrictive assumptions and tests, and are therefore the default.
##'
##' The default setting in M*plus* is similar to Wu & Estabrook (2016),
##' except that intercepts are always constrained to zero (so they are assumed
##' to be invariant without testing them). Millsap & Tein (2004) recommended
##' `parameterization = "theta"` and identified an item's residual variance
##' in all but the first group (or occasion; Liu et al., 2017) by constraining
##' its intercept to zero and one of its thresholds to equality. A second
##' threshold for the reference indicator (so `ID.fac = "UL"`) is used to
##' identify the common-factor means in all but the first group/occasion. The
##' LISREL software fixes the first threshold to zero and (if applicable) the
##' second threshold to 1, and assumes any remaining thresholds to be equal
##' across groups / repeated measures; thus, the intercepts are always
##' identified, and residual variances (`parameterization = "theta"`) are
##' identified except for binary data, when they are all fixed to one.
##'
##' **Repeated Measures:** If each repeatedly measured factor is measured
##' by the same indicators (specified in the same order in the
##' `configural.model`) on each occasion, without any cross-loadings, the
##' user can let `longIndNames` be automatically generated. Generic names
##' for the repeatedly measured indicators are created using the name of the
##' repeatedly measured factors (i.e., `names(longFacNames)`) and the
##' number of indicators. So the repeatedly measured first indicator
##' (`"ind"`) of a longitudinal construct called "factor" would be
##' generated as `"._factor_ind.1"`.
##'
##' The same types of parameter can be specified for `long.equal` as for
##' `group.equal` (see [lavaan::lavOptions()] for a list), except
##' for `"residual.covariances"` or `"lv.covariances"`. Instead, users
##' can constrain *auto*covariances using keywords `"resid.autocov"`
##' or `"lv.autocov"`. Note that `group.equal = "lv.covariances"` or
##' `group.equal = "residual.covariances"` will constrain any
##' autocovariances across groups, along with any other covariances the user
##' specified in the `configural.model`. Note also that autocovariances
##' cannot be specified as exceptions in `long.partial`, so anything more
##' complex than the `auto` argument automatically provides should instead
##' be manually specified in the `configural.model`.
##'
##' When users set `orthogonal=TRUE` in the `configural.model` (e.g.,
##' in bifactor models of repeatedly measured constructs), autocovariances of
##' each repeatedly measured factor will still be freely estimated in the
##' generated syntax.
##'
##' **Missing Data:** If users wish to utilize the [auxiliary()]
##' function to automatically include auxiliary variables in conjunction with
##' `missing = "FIML"`, they should first generate the hypothesized-model
##' syntax, then submit that syntax as the model to `auxiliary()`.
##' If users utilized [lavaan.mi::lavaan.mi()] to fit their `configural.model`
##' to multiply imputed data, that model can also be passed to the
##' `configural.model` argument, and if `return.fit = TRUE`, the
##' generated model will be fitted to the multiple imputations.
##'
##' @importFrom lavaan lavInspect lavNames parTable cfa
##'
##' @param configural.model A model with no measurement-invariance constraints
##' (i.e., representing only configural invariance), unless required for model
##' identification. `configural.model` can be either:
##' \itemize{
##' \item [lavaan::model.syntax()] or a [lavaan::parTable()] specifying the
##' configural model. Using this option, the user can also provide
##' either raw `data` or summary statistics via `sample.cov`
##' and (optionally) `sample.mean`. See argument descriptions in
##' [lavaan::lavaan()]. In order to include thresholds in
##' the generated syntax, either users must provide raw `data`,
##' or the `configural.model` syntax must specify all thresholds
##' (see first example). If raw `data` are not provided, the
##' number of blocks (groups, levels, or combination) must be
##' indicated using an arbitrary `sample.nobs` argument (e.g.,
##' 3 groups could be specified using `sample.nobs=rep(1, 3)`).
##' \item a fitted [lavaan::lavaan-class] model (e.g., as returned by
##' [lavaan::cfa()]) estimating the configural model
##' }
##' Note that the specified or fitted model must not contain any latent
##' structural parameters (i.e., it must be a CFA model), unless they are
##' higher-order constructs with latent indicators (i.e., a second-order CFA).
##'
##' @param ... Additional arguments (e.g., `data`, `ordered`, or
##' `parameterization`) passed to the [lavaan::lavaan()]
##' function. See also [lavaan::lavOptions()].
##'
##' @param ID.fac `character`. The method for identifying common-factor
##' variances and (if `meanstructure = TRUE`) means. Three methods are
##' available, which go by different names in the literature:
##' \itemize{
##' \item Standardize the common factor (mean = 0, *SD* = 1) by
##' specifying any of: `"std.lv"`, `"unit.variance"`,
##' `"UV"`, `"fixed.factor"`,
##' `"fixed-factor"`
##' \item Choose a reference indicator by specifying any of:
##' `"auto.fix.first"`, `"unit.loading"`, `"UL"`,
##' `"marker"`, `"ref"`, `"ref.indicator"`,
##' `"reference.indicator"`, `"reference-indicator"`,
##' `"marker.variable"`, `"marker-variable"`
##' \item Apply effects-code constraints to loadings and intercepts by
##' specifying any of: `"FX"`, `"EC"`, `"effects"`,
##' `"effects.coding"`, `"effects-coding"`,
##' `"effects.code"`, `"effects-code"`
##' }
##' See Kloessner & Klopp (2019) for details about all three methods.
##'
##' @param ID.cat `character`. The method for identifying (residual)
##' variances and intercepts of latent item-responses underlying any
##' `ordered` indicators. Four methods are available:
##' \itemize{
##' \item To follow Wu & Estabrook's (2016) guidelines (default), specify
##' any of: `"Wu.Estabrook.2016"`, `"Wu.2016"`,
##' `"Wu.Estabrook"`, `"Wu"`, `"Wu2016"`. For
##' consistency, specify `ID.fac = "std.lv"`.
##' \item To use the default settings of M*plus* and `lavaan`,
##' specify any of: `"default"`, `"Mplus"`, `"Muthen"`.
##' Details provided in Millsap & Tein (2004).
##' \item To use the constraints recommended by Millsap & Tein (2004; see
##' also Liu et al., 2017, for the longitudinal case)
##' specify any of: `"millsap"`, `"millsap.2004"`,
##' `"millsap.tein.2004"`. For consistency, specify
##' `ID.fac = "marker"` and `parameterization = "theta"`.
##' \item To use the default settings of LISREL, specify `"LISREL"`
##' or `"Joreskog"`. Details provided in Millsap & Tein (2004).
##' For consistency, specify `parameterization = "theta"`.
##' }
##' See **Details** and **References** for more information.
##'
##' @param ID.thr `integer`. Only relevant when
##' `ID.cat = "Millsap.Tein.2004"`. Used to indicate which thresholds
##' should be constrained for identification. The first integer indicates the
##' threshold used for all indicators, the second integer indicates the
##' additional threshold constrained for a reference indicator (ignored if
##' binary).
##'
##' @param group optional `character` indicating the name of a grouping
##' variable. See [lavaan::cfa()].
##'
##' @param group.equal optional `character` vector indicating type(s) of
##' parameter to equate across groups. Ignored if `is.null(group)`.
##' See [lavaan::lavOptions()].
##'
##' @param group.partial optional `character` vector or a parameter table
##' indicating exceptions to `group.equal` (see
##' [lavaan::lavOptions()]). Any variables not appearing in the
##' `configural.model` will be ignored, and any parameter constraints
##' needed for identification (e.g., two thresholds per indicator when
##' `ID.cat = "Millsap"`) will be removed.
##'
##' @param longFacNames optional named `list` of `character` vectors,
##' each indicating multiple factors in the model that are actually the same
##' construct measured repeatedly. See **Details** and **Examples**.
##'
##' @param longIndNames optional named `list` of `character` vectors,
##' each indicating multiple indicators in the model that are actually the
##' same indicator measured repeatedly. See **Details** and
##' **Examples**.
##'
##' @param long.equal optional `character` vector indicating type(s) of
##' parameter to equate across repeated measures. Ignored if no factors are
##' indicated as repeatedly measured in `longFacNames`.
##'
##' @param long.partial optional `character` vector or a parameter table
##' indicating exceptions to `long.equal`. Any longitudinal variable
##' names not appearing in `names(longFacNames)` or
##' `names(longIndNames)` will be ignored, and any parameter constraints
##' needed for identification will be removed.
##'
##' @param auto Used to automatically included autocorrelated measurement errors
##' among repeatedly measured indicators in `longIndNames`. Specify a
##' single `integer` to set the maximum order (e.g., `auto = 1L`
##' indicates that an indicator's unique factors should only be correlated
##' between adjacently measured occasions). `auto = TRUE` or `"all"`
##' will specify residual covariances among all possible lags per repeatedly
##' measured indicator in `longIndNames`.
##'
##' @param warn,debug `logical`. Passed to [lavaan::lavaan()]
##' and [lavaan::lavParseModelString()].
##' See [lavaan::lavOptions()].
##'
##' @param return.fit `logical` indicating whether the generated syntax
##' should be fitted to the provided `data` (or summary statistics, if
##' provided via `sample.cov`). If `configural.model` is a fitted
##' lavaan model, the generated syntax will be fitted using the `update`
##' method (see [lavaan::lavaan-class]), and \dots will be passed to
##' [lavaan::lavaan()]. If neither data nor a fitted lavaan model
##' were provided, this must be `FALSE`. If `TRUE`, the generated
##' `measEq.syntax` object will be included in the `lavaan` object's
##' `@@external` slot, accessible by `fit@@external$measEq.syntax`.
##'
##' @return By default, an object of class [measEq.syntax-class].
##' If `return.fit = TRUE`, a fitted [lavaan::lavaan()]
##' model, with the `measEq.syntax` object stored in the
##' `@@external` slot, accessible by `fit@@external$measEq.syntax`.
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @seealso [compareFit()]
##'
##' @references
##' Kloessner, S., & Klopp, E. (2019). Explaining constraint interaction: How
##' to interpret estimated model parameters under alternative scaling methods.
##' *Structural Equation Modeling, 26*(1), 143--155.
##' \doi{10.1080/10705511.2018.1517356}
##'
##' Liu, Y., Millsap, R. E., West, S. G., Tein, J.-Y., Tanaka, R., & Grimm,
##' K. J. (2017). Testing measurement invariance in longitudinal data with
##' ordered-categorical measures. *Psychological Methods, 22*(3),
##' 486--506. \doi{10.1037/met0000075}
##'
##' Millsap, R. E., & Tein, J.-Y. (2004). Assessing factorial invariance in
##' ordered-categorical measures. *Multivariate Behavioral Research, 39*(3),
##' 479--515. \doi{10.1207/S15327906MBR3903_4}
##'
##' Wu, H., & Estabrook, R. (2016). Identification of confirmatory factor
##' analysis models of different levels of invariance for ordered categorical
##' outcomes. *Psychometrika, 81*(4), 1014--1045.
##' \doi{10.1007/s11336-016-9506-0}
##'
##' @examples
##' mod.cat <- ' FU1 =~ u1 + u2 + u3 + u4
##' FU2 =~ u5 + u6 + u7 + u8 '
##' ## the 2 factors are actually the same factor (FU) measured twice
##' longFacNames <- list(FU = c("FU1","FU2"))
##'
##' ## CONFIGURAL model: no constraints across groups or repeated measures
##' syntax.config <- measEq.syntax(configural.model = mod.cat,
##' # NOTE: data provides info about numbers of
##' # groups and thresholds
##' data = datCat,
##' ordered = paste0("u", 1:8),
##' parameterization = "theta",
##' ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
##' group = "g", longFacNames = longFacNames)
##' ## print lavaan syntax to the Console
##' cat(as.character(syntax.config))
##' ## print a summary of model features
##' summary(syntax.config)
##'
##' ## THRESHOLD invariance:
##' ## only necessary to specify thresholds if you have no data
##' mod.th <- '
##' u1 | t1 + t2 + t3 + t4
##' u2 | t1 + t2 + t3 + t4
##' u3 | t1 + t2 + t3 + t4
##' u4 | t1 + t2 + t3 + t4
##' u5 | t1 + t2 + t3 + t4
##' u6 | t1 + t2 + t3 + t4
##' u7 | t1 + t2 + t3 + t4
##' u8 | t1 + t2 + t3 + t4
##' '
##' syntax.thresh <- measEq.syntax(configural.model = c(mod.cat, mod.th),
##' # NOTE: data not provided, so syntax must
##' # include thresholds, and number of
##' # groups == 2 is indicated by:
##' sample.nobs = c(1, 1),
##' parameterization = "theta",
##' ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
##' group = "g", group.equal = "thresholds",
##' longFacNames = longFacNames,
##' long.equal = "thresholds")
##' ## notice that constraining 4 thresholds allows intercepts and residual
##' ## variances to be freely estimated in all but the first group & occasion
##' cat(as.character(syntax.thresh))
##' ## print a summary of model features
##' summary(syntax.thresh)
##'
##'
##' ## Fit a model to the data either in a subsequent step (recommended):
##' mod.config <- as.character(syntax.config)
##' fit.config <- cfa(mod.config, data = datCat, group = "g",
##' ordered = paste0("u", 1:8), parameterization = "theta")
##' ## or in a single step (not generally recommended):
##' fit.thresh <- measEq.syntax(configural.model = mod.cat, data = datCat,
##' ordered = paste0("u", 1:8),
##' parameterization = "theta",
##' ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
##' group = "g", group.equal = "thresholds",
##' longFacNames = longFacNames,
##' long.equal = "thresholds", return.fit = TRUE)
##' ## compare their fit to test threshold invariance
##' anova(fit.config, fit.thresh)
##'
##'
##' ## --------------------------------------------------------
##' ## RECOMMENDED PRACTICE: fit one invariance model at a time
##' ## --------------------------------------------------------
##'
##' ## - A downside of setting return.fit=TRUE is that if the model has trouble
##' ## converging, you don't have the opportunity to investigate the syntax,
##' ## or even to know whether an error resulted from the syntax-generator or
##' ## from lavaan itself.
##' ## - A downside of automatically fitting an entire set of invariance models
##' ## (like the old measurementInvariance() function did) is that you might
##' ## end up testing models that shouldn't even be fitted because less
##' ## restrictive models already fail (e.g., don't test full scalar
##' ## invariance if metric invariance fails! Establish partial metric
##' ## invariance first, then test equivalent of intercepts ONLY among the
##' ## indicators that have invariate loadings.)
##'
##' ## The recommended sequence is to (1) generate and save each syntax object,
##' ## (2) print it to the screen to verify you are fitting the model you expect
##' ## to (and potentially learn which identification constraints should be
##' ## released when equality constraints are imposed), and (3) fit that model
##' ## to the data, as you would if you had written the syntax yourself.
##'
##' ## Continuing from the examples above, after establishing invariance of
##' ## thresholds, we proceed to test equivalence of loadings and intercepts
##' ## (metric and scalar invariance, respectively)
##' ## simultaneously across groups and repeated measures.
##'
##' \donttest{
##'
##' ## metric invariance
##' syntax.metric <- measEq.syntax(configural.model = mod.cat, data = datCat,
##' ordered = paste0("u", 1:8),
##' parameterization = "theta",
##' ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
##' group = "g", longFacNames = longFacNames,
##' group.equal = c("thresholds","loadings"),
##' long.equal = c("thresholds","loadings"))
##' summary(syntax.metric) # summarize model features
##' mod.metric <- as.character(syntax.metric) # save as text
##' cat(mod.metric) # print/view lavaan syntax
##' ## fit model to data
##' fit.metric <- cfa(mod.metric, data = datCat, group = "g",
##' ordered = paste0("u", 1:8), parameterization = "theta")
##' ## test equivalence of loadings, given equivalence of thresholds
##' anova(fit.thresh, fit.metric)
##'
##' ## scalar invariance
##' syntax.scalar <- measEq.syntax(configural.model = mod.cat, data = datCat,
##' ordered = paste0("u", 1:8),
##' parameterization = "theta",
##' ID.fac = "std.lv", ID.cat = "Wu.Estabrook.2016",
##' group = "g", longFacNames = longFacNames,
##' group.equal = c("thresholds","loadings",
##' "intercepts"),
##' long.equal = c("thresholds","loadings",
##' "intercepts"))
##' summary(syntax.scalar) # summarize model features
##' mod.scalar <- as.character(syntax.scalar) # save as text
##' cat(mod.scalar) # print/view lavaan syntax
##' ## fit model to data
##' fit.scalar <- cfa(mod.scalar, data = datCat, group = "g",
##' ordered = paste0("u", 1:8), parameterization = "theta")
##' ## test equivalence of intercepts, given equal thresholds & loadings
##' anova(fit.metric, fit.scalar)
##'
##'
##' ## For a single table with all results, you can pass the models to
##' ## summarize to the compareFit() function
##' Comparisons <- compareFit(fit.config, fit.thresh, fit.metric, fit.scalar)
##' summary(Comparisons)
##'
##'
##' ## ------------------------------------------------------
##' ## NOT RECOMMENDED: fit several invariance models at once
##' ## ------------------------------------------------------
##' test.seq <- c("thresholds","loadings","intercepts","means","residuals")
##' meq.list <- list()
##' for (i in 0:length(test.seq)) {
##' if (i == 0L) {
##' meq.label <- "configural"
##' group.equal <- ""
##' long.equal <- ""
##' } else {
##' meq.label <- test.seq[i]
##' group.equal <- test.seq[1:i]
##' long.equal <- test.seq[1:i]
##' }
##' meq.list[[meq.label]] <- measEq.syntax(configural.model = mod.cat,
##' data = datCat,
##' ordered = paste0("u", 1:8),
##' parameterization = "theta",
##' ID.fac = "std.lv",
##' ID.cat = "Wu.Estabrook.2016",
##' group = "g",
##' group.equal = group.equal,
##' longFacNames = longFacNames,
##' long.equal = long.equal,
##' return.fit = TRUE)
##' }
##'
##' evalMeasEq <- compareFit(meq.list)
##' summary(evalMeasEq)
##'
##'
##' ## -----------------
##' ## Binary indicators
##' ## -----------------
##'
##' ## borrow example data from Mplus user guide
##' myData <- read.table("https://www.statmodel.com/usersguide/chap5/ex5.16.dat")
##' names(myData) <- c("u1","u2","u3","u4","u5","u6","x1","x2","x3","g")
##' bin.mod <- '
##' FU1 =~ u1 + u2 + u3
##' FU2 =~ u4 + u5 + u6
##' '
##' ## Must SIMULTANEOUSLY constrain thresholds, loadings, and intercepts
##' test.seq <- list(strong = c("thresholds","loadings","intercepts"),
##' means = "means",
##' strict = "residuals")
##' meq.list <- list()
##' for (i in 0:length(test.seq)) {
##' if (i == 0L) {
##' meq.label <- "configural"
##' group.equal <- ""
##' long.equal <- ""
##' } else {
##' meq.label <- names(test.seq)[i]
##' group.equal <- unlist(test.seq[1:i])
##' # long.equal <- unlist(test.seq[1:i])
##' }
##' meq.list[[meq.label]] <- measEq.syntax(configural.model = bin.mod,
##' data = myData,
##' ordered = paste0("u", 1:6),
##' parameterization = "theta",
##' ID.fac = "std.lv",
##' ID.cat = "Wu.Estabrook.2016",
##' group = "g",
##' group.equal = group.equal,
##' #longFacNames = longFacNames,
##' #long.equal = long.equal,
##' return.fit = TRUE)
##' }
##'
##' evalMeasEq <- compareFit(meq.list)
##' summary(evalMeasEq)
##'
##'
#TODO: add ternary example? or note to start with EQ thresholds?
##'
##' ## ---------------------
##' ## Multilevel Invariance
##' ## ---------------------
##'
##' ## To test invariance across levels in a MLSEM, specify syntax as though
##' ## you are fitting to 2 groups instead of 2 levels.
##'
##' mlsem <- ' f1 =~ y1 + y2 + y3
##' f2 =~ y4 + y5 + y6 '
##' ## metric invariance
##' syntax.metric <- measEq.syntax(configural.model = mlsem, meanstructure = TRUE,
##' ID.fac = "std.lv", sample.nobs = c(1, 1),
##' group = "cluster", group.equal = "loadings")
##' ## by definition, Level-1 means must be zero, so fix them
##' syntax.metric <- update(syntax.metric,
##' change.syntax = paste0("y", 1:6, " ~ c(0, NA)*1"))
##' ## save as a character string
##' mod.metric <- as.character(syntax.metric, groups.as.blocks = TRUE)
##' ## convert from multigroup to multilevel
##' mod.metric <- gsub(pattern = "group:", replacement = "level:",
##' x = mod.metric, fixed = TRUE)
##' ## fit model to data
##' fit.metric <- lavaan(mod.metric, data = Demo.twolevel, cluster = "cluster")
##' summary(fit.metric)
##' }
##' @export
measEq.syntax <- function(configural.model, ..., ID.fac = "std.lv",
ID.cat = "Wu.Estabrook.2016", ID.thr = c(1L, 2L),
group = NULL, group.equal = "", group.partial = "",
longFacNames = list(), longIndNames = list(),
long.equal = "", long.partial = "", auto = "all",
warn = TRUE, debug = FALSE, return.fit = FALSE) {
mc <- match.call(expand.dots = TRUE)
## evaluate promises that might change before being evaluated
## (e.g., for-loops or Monte Carlo studies)
mc$ID.fac <- eval(ID.fac)
mc$ID.cat <- eval(ID.cat)
mc$ID.thr <- eval(ID.thr)
mc$group <- eval(group)
mc$group.equal <- eval(group.equal)
mc$group.partial <- eval(group.partial)
mc$longFacNames <- eval(longFacNames)
mc$longIndNames <- eval(longIndNames)
mc$long.equal <- eval(long.equal)
mc$long.partial <- eval(long.partial)
mc$auto <- eval(auto)
## -------------------------------
## Preliminary checks on arguments
## -------------------------------
## check identification arguments
ID.fac <- tolower(as.character(ID.fac)[1])
if (ID.fac %in% c("std.lv","unit.variance","uv",
"fixed.factor","fixed-factor")) {
ID.fac <- "uv"
mc$ID.fac <- "uv"
} else if (ID.fac %in% c("auto.fix.first","unit.loading","ul","marker","ref",
"marker.variable","marker-variable","ref.indicator",
"reference.indicator","reference-indicator")) {
ID.fac <- "ul"
mc$ID.fac <- "ul"
} else if (ID.fac %in% c("fx","ec","effects","effects.coding",
"effects-coding","effects.code","effects-code")) {
ID.fac <- "fx"
mc$ID.fac <- "fx"
} else stop('Invalid choice for argument: ID.fac = "', ID.fac, '"')
ID.cat <- tolower(as.character(ID.cat)[1])
if (ID.cat %in% c("wu.estabrook.2016","wu.2016","wu.estabrook","wu","wu2016")) {
ID.cat <- "wu"
mc$ID.cat <- "wu"
} else if (ID.cat %in% c("millsap","millsap.2004","millsap.tein.2004")) {
ID.cat <- "millsap"
mc$ID.cat <- "millsap"
} else if (ID.cat %in% c("default","mplus","muthen")) {
ID.cat <- "mplus"
mc$ID.cat <- "mplus"
} else if (ID.cat %in% c("joreskog","lisrel")) {
ID.cat <- "lisrel"
mc$ID.cat <- "lisrel"
} else stop('Invalid choice for argument: ID.cat = "', ID.cat, '"')
## pass arguments to lavaan
dots <- list(...)
dots$debug <- debug
dots$warn <- warn
dots$group <- group
## check lavaan arguments
if (!is.null(dots$model)) stop('A model should be specified only with the ',
'"configural.model=" argument, not "model=".')
if (is.null(dots$meanstructure)) {
constrMeanStr <- c("intercepts","means") %in% c(group.equal, long.equal)
if (is.null(dots$data) && is.null(dots$sample.mean) &&
is.null(dots$sample.th) && !any(constrMeanStr)) {
dots$meanstructure <- FALSE
mc$meanstructure <- FALSE
} else {
dots$meanstructure <- TRUE
mc$meanstructure <- TRUE
}
}
## lavaan template from configural model
if (inherits(configural.model, c("lavaan","lavaanList"))) {
lavTemplate <- configural.model
## check that first loading is not constrained unless ID.fac == "ul"
if (ID.fac != "ul" && lavInspect(lavTemplate, "options")$auto.fix.first) {
stop('The "configural.model" argument is a lavaan model fitted using ',
'auto.fix.first=TRUE (or std.lv=FALSE), which conflicts with the ',
'requested "ID.fac" method. To generate syntax using the fixed-',
'factor or effects-coding method of identification, set std.lv=TRUE',
' to prevent initial loadings from being fixed to 1 in the syntax.')
}
## check that if (!meanstructure), not set TRUE in call
if (!is.null(mc$meanstructure)) {
if (!lavInspect(lavTemplate, "options")$meanstructure && mc$meanstructure)
stop('Request for meanstructure=TRUE requires configural.model to be ',
'fitted with meanstructure=TRUE')
} else mc$meanstructure <- lavInspect(lavTemplate, "options")$meanstructure # just in case
} else {
lavArgs <- dots
if (ID.fac != "ul") lavArgs$std.lv <- TRUE
lavArgs$model <- configural.model # let lavaan() do its own checks
lavArgs$do.fit <- FALSE
lavTemplate <- do.call("cfa", lavArgs, envir = getNamespace("lavaan"))
mc$meanstructure <- lavInspect(lavTemplate, "options")$meanstructure # just in case
mc$configural.model <- lavTemplate
}
## warn about regression parameters
if (any(parTable(lavTemplate)$op == "~"))
warning('Regression operator (~) detected. measEq.syntax() was designed ',
'only for multigroup CFA models. Regression operator (~) could be ',
'used to define a higher-order factor (although the =~ operator ',
'is easier), but structural regressions should not be specified.')
## prevent inconsistency
if (lavInspect(lavTemplate, "categorical") &&
ID.cat %in% c("wu","mplus") &&
ID.fac != "uv") warning('For factors measured only by categorical ',
'indicators, constraints on intercepts are ',
'insufficient to identify latent means when the ',
'intercepts are already fixed to zero in order ',
'to identify latent item scales. To prevent',
'underidentified models, it is recommended to ',
'instead set ID.fac = "std.lv".')
## convert *.partial strings to parTables
if (is.character(group.partial)) {
if ((group.partial == "")[1L] && length(group.partial) == 1L) {
group.partial <- data.frame(stringsAsFactors = FALSE, lhs = character(0),
op = character(0), rhs = character(0))
} else {
group.partial <- lavaan::lavParseModelString(group.partial,
as.data.frame. = TRUE,
warn = warn, debug = debug)
}
} #TODO: else {extract information from a measEq.partial object}
if (is.character(long.partial)) {
if ((long.partial == "")[1L] && length(long.partial) == 1L) {
long.partial <- data.frame(stringsAsFactors = FALSE, lhs = character(0),
op = character(0), rhs = character(0))
} else {
long.partial <- lavaan::lavParseModelString(long.partial,
as.data.frame. = TRUE,
warn = warn, debug = debug)
}
} #TODO: else {extract information from a measEq.partial object}
## only relevant when there are longitudinal factors
if (length(longFacNames) > 0L) {
if (!is.atomic(auto)) stop("'auto' must be a non-negative integer or the character 'all'.")
if (is.logical(auto)) { if (auto) auto <- "all" else auto <- 0L}
if (is.factor(auto)) auto <- as.character(auto)
if (is.character(auto) && auto != "all")
stop("'auto' must be a non-negative integer or the character 'all'.")
if (is.numeric(auto)) {
auto <- as.integer(auto[1]) # only the first integer
if (auto < 1L) auto <- NULL
}
mc$auto <- auto
}
## extract options and other information
if (is.null(mc$parameterization)) {
parameterization <- lavInspect(lavTemplate, "options")$parameterization
} else {
parameterization <- try(eval(mc$parameterization), silent = TRUE)
if (inherits(parameterization, "try-error")) {
parameterization <- try(eval.parent(mc$parameterization, 1), silent = TRUE)
}
}
if (is.null(mc$meanstructure)) {
meanstructure <- lavInspect(lavTemplate, "options")$meanstructure
} else meanstructure <- mc$meanstructure
nG <- lavInspect(lavTemplate, "ngroups")
## names of ordinal indicators, number of thresholds for each
allOrdNames <- lavNames(lavTemplate, type = "ov.ord")
if (length(allOrdNames)) {
#TODO: add nThr= argument (named numeric vector?) so data= not required
nThr <- table(sapply(strsplit(lavNames(lavTemplate, "th"),
split = "|", fixed = TRUE),
"[", i = 1))
} else nThr <- numeric(0)
if (length(allOrdNames) && ID.cat == "millsap") {
## Check for ID.thr
if (is.numeric(ID.thr)) {
if (length(ID.thr) == 1L) ID.thr <- rep(ID.thr, 2)
ID.thr <- sapply(allOrdNames, function(x) ID.thr[1:2], simplify = FALSE)
} else if (is.list(ID.thr)) {
if (length((setdiff(allOrdNames, names(ID.thr)))))
stop('If the same thresholds will not be used for all ordered indicators,',
' then "ID.thr" must specify 2 integers per ordered indicator in ',
'a named list (using names of ordered indicators).')
}
## check identification methods
ID.fac <- "ul"
if (parameterization != "theta") stop('If ID.cat == "millsap", you must ',
'use parameterization = "theta"')
}
if (length(allOrdNames) && ID.cat == "lisrel") {
if (parameterization != "theta") stop('If ID.cat == "lisrel", you must ',
'use parameterization = "theta"')
## thresholds must be constrained to equality
if (!"thresholds" %in% group.equal) group.equal <- c("thresholds", group.equal)
if (!"thresholds" %in% long.equal) long.equal <- c("thresholds", long.equal)
## so remove any thresholds from *.partial
partial.thr <- group.partial$op == "|"
if (any(partial.thr)) group.partial <- group.partial[!partial.thr, ]
partial.thr <- long.partial$op == "|"
if (any(partial.thr)) long.partial <- long.partial[!partial.thr, ]
}
if (length(allOrdNames) && ID.cat %in% c("millsap","mplus")) {
## scalar invariance implies equal intercepts, even though they are
## fixed to zero anyway. This will correctly trigger freeing latent mean(s)
if ("loadings" %in% group.equal && "thresholds" %in% group.equal &&
!"intercepts" %in% group.equal) group.equal <- c("intercepts", group.equal)
if ("loadings" %in% long.equal && "thresholds" %in% long.equal &&
!"intercepts" %in% long.equal) long.equal <- c("intercepts", long.equal)
}
if (!meanstructure) {
## make sure *.equal includes no mean-structure parameters
eq.means <- which(group.equal %in% c("means","intercepts"))
if (length(eq.means)) group.equal <- group.equal[-eq.means]
eq.means <- which(long.equal %in% c("means","intercepts"))
if (length(eq.means)) long.equal <- long.equal[-eq.means]
## make sure *.partial includes no mean-structure parameters
partial.means <- group.partial$op == "~1"
if (any(partial.means)) group.partial <- group.partial[!partial.means, ]
partial.means <- long.partial$op == "~1"
if (any(partial.means)) long.partial <- long.partial[!partial.means, ]
}
mc$group.partial <- group.partial[c("lhs","op","rhs")] #FIXME: any more? "block" for multilevel?
mc$long.partial <- long.partial[c("lhs","op","rhs")]
## check logic of constraints
if (length(allOrdNames) && parameterization == "delta") {
if ("residuals" %in% long.equal) {
stop('Residual variances cannot be tested for invariance ',
'across repeated measures when parameterization = "delta". \n',
'Please set parameterization = "theta". \n')
}
if ("residuals" %in% group.equal) {
stop('Residual variances cannot be tested for invariance ',
'across groups when parameterization = "delta". \n',
'Please set parameterization = "theta". \n')
}
}
if (warn) {
if (any(c("lv.variances","lv.autocov") %in% long.equal) && !"loadings" %in% long.equal)
warning('Latent (co)variances are not comparable over repeated measures ',
'if their respective factor loadings are not equal ',
'over repeated measures.')
if (any(c("lv.variances","lv.covariances") %in% group.equal) && !"loadings" %in% group.equal)
warning('Latent (co)variances are not comparable across groups ',
'if their respective factor loadings are not equal across groups.')
if ("intercepts" %in% long.equal && !"loadings" %in% long.equal)
warning('Indicator intercepts are not comparable over repeated measures ',
'if their respective factor loadings are not equal ',
'over repeated measures.')
if ("intercepts" %in% group.equal && !"loadings" %in% group.equal)
warning('Indicator intercepts are not comparable over across groups ',
'if their respective factor loadings are not equal across groups.')
if ("means" %in% long.equal && !all(c("loadings","intercepts") %in% long.equal))
warning('Latent means are not comparable over repeated measures if their ',
'respective factor loadings and intercepts are not equal ',
'over repeated measures.')
if ("means" %in% group.equal && !all(c("loadings","intercepts") %in% group.equal))
warning('Latent means are not comparable across groups if their ',
'respective factor loadings and intercepts are not equal ',
'across groups.')
if ("resid.autocov" %in% long.equal && !"residuals" %in% long.equal)
warning('Residual auto-covariances might not be comparable over repeated ',
'measures if their respective residual variances are not equal ',
'over repeated measures.')
if ("residual.covariances" %in% group.equal && !"residuals" %in% group.equal)
warning('Residual covariances might not be comparable across groups if ',
'their respective residual variances are not equal across groups.')
if ("lv.autocov" %in% long.equal && !"lv.variances" %in% long.equal)
warning('Latent auto-covariances might not be comparable over repeated ',
'measures if their respective latent variances are not equal ',
'over repeated measures.')
if ("lv.covariances" %in% group.equal && !"lv.variances" %in% group.equal)
warning('Latent covariances might not be comparable across groups if ',
'their respective latent variances are not equal across groups.')
}
## ------------------
## Parameter Matrices
## ------------------
## Parameter matrices used for labels, fixed/free values, and whether to specify
GLIST.free <- lavInspect(lavTemplate, "free")
if (nG == 1L) GLIST.free <- list(`1` = GLIST.free)
## only save relevant matrices to specify
pmats <- intersect(c("tau","lambda","beta",
if (meanstructure) "nu" else NULL ,
"theta",
if (meanstructure) "alpha" else NULL ,
"psi",
if (length(allOrdNames) && parameterization == "delta") "delta" else NULL),
names(GLIST.free[[1]]))
if ("beta" %in% pmats && ID.fac != "ul") {
ID.fac <- "ul" #FIXME: could use effects-coding with relative ease?
mc$ID.fac <- ID.fac
message('Higher-order factors detected. ID.fac set to "ul".')
}
## matrices with estimates depends on class of model
if (inherits(lavTemplate, "lavaan")) {
GLIST.est <- lavInspect(lavTemplate, "est")
if (nG == 1L) GLIST.est <- list(`1` = GLIST.est)
} else if (inherits(lavTemplate, "lavaanList")) {
nn <- names(lavTemplate@Model@GLIST) #FIXME: will @Model continue to exist?
GLIST.est <- list()
for (g in 1:nG) {
GLIST.est[[g]] <- list()
for (p in pmats) {
GLIST.est[[g]][[p]] <- lavTemplate@Model@GLIST[[ which(nn == p)[g] ]]
## add dimnames to matrices
dimnames(GLIST.est[[g]][[p]]) <- dimnames(GLIST.free[[g]][[p]])
}
}
}
for (g in 1:nG) {
GLIST.est[[g]] <- GLIST.est[[g]][pmats]
GLIST.free[[g]] <- GLIST.free[[g]][pmats]
if (g > 1L) {
## make sure all groups have the same observed & latent variables
same.obs <- all(rownames(GLIST.free[[g]]$lambda) == rownames(GLIST.free[[1]]$lambda))
same.lat <- all(colnames(GLIST.free[[g]]$lambda) == colnames(GLIST.free[[1]]$lambda))
if (!same.obs) stop('Models contain different observed variables across ',
'groups/blocks. Configural invariance impossible.')
if (!same.lat) stop('Models contain different latent variables across ',
'groups/blocks. Configural invariance impossible.')
}
}
## FIXME: check for others? (e.g., test invariance across multiple levels?)
## In general, specify if GLIST.free > 0 | (GLIST.free == 0 & GLIST.est != 0)
## - tau : specify all
## - lambda: specify any nonzero in free + fixed-nonzero (e.g., auto.fix.first)
## - beta : treat as second-order lambda
## - nu : specify all
## - theta : specify diagonal (unless delta?) + any nonzero off-diagonal
## - delta : specify ONLY if parameterization == "delta"
## - alpha : specify all
## - psi : specify all
GLIST.specify <- sapply(names(GLIST.free), function(g) list())
for (g in 1:nG) {
for (p in pmats) {
## THRESHOLDS
if (p == "tau") {
GLIST.specify[[g]]$tau <- GLIST.free[[g]]$tau == 0
GLIST.specify[[g]]$tau[ , 1] <- TRUE
next
}
## LOADINGS
if (p == "lambda") {
free.loading <- GLIST.free[[g]]$lambda > 0L
fixed.nonzero.loading <- GLIST.free[[g]]$lambda == 0L & GLIST.est[[g]]$lambda != 0
GLIST.specify[[g]]$lambda <- free.loading | fixed.nonzero.loading
next
}
## SECOND-ORDER LOADINGS
if (p == "beta") {
free.loading <- GLIST.free[[g]]$beta > 0L
fixed.nonzero.loading <- GLIST.free[[g]]$beta == 0L & GLIST.est[[g]]$beta != 0
GLIST.specify[[g]]$beta <- free.loading | fixed.nonzero.loading
next
}
## INTERCEPTS
if (p == "nu") {
GLIST.specify[[g]]$nu <- GLIST.free[[g]]$nu == 0
GLIST.specify[[g]]$nu[ , 1] <- TRUE
next
}
## LATENT MEANS
if (p == "alpha") {
GLIST.specify[[g]]$alpha <- GLIST.free[[g]]$alpha == 0
GLIST.specify[[g]]$alpha[ , 1] <- TRUE
next
}
## LATENT (CO)VARIANCES
if (p == "psi") {
GLIST.specify[[g]]$psi <- matrix(TRUE, nrow = nrow(GLIST.free[[g]]$psi),
ncol = ncol(GLIST.free[[g]]$psi),
dimnames = dimnames(GLIST.free[[g]]$psi))
## only specify lower triangle
GLIST.specify[[g]]$psi[upper.tri(GLIST.specify[[g]]$psi)] <- FALSE
next
}
## RESIDUAL (CO)VARIANCES
if (p == "theta") {
free.var <- GLIST.free[[g]]$theta > 0L
fixed.nonzero.var <- GLIST.free[[g]]$theta == 0L & GLIST.est[[g]]$theta != 0
GLIST.specify[[g]]$theta <- free.var | fixed.nonzero.var
## can't specify for ordinal indicators using delta parameterization
if (parameterization == "delta")
diag(GLIST.specify[[g]]$theta)[allOrdNames] <- FALSE
## only specify lower triangle
GLIST.specify[[g]]$theta[upper.tri(GLIST.specify[[g]]$theta)] <- FALSE
next
}
## SCALING FACTORS (delta parameters for latent item-responses)
if (p == "delta") {
GLIST.specify[[g]]$delta <- GLIST.free[[g]]$delta == 1
GLIST.specify[[g]]$delta[ , 1] <- parameterization == "delta"
}
## end loops
}
}
## check for any cross-loadings
#TODO: special check for bifactor models possible? Find factors whose indicators all cross-load...
anyXload <- FALSE
for (g in 1:nG) {
if (any(apply(GLIST.specify[[g]]$lambda, 1, sum) > 1)) anyXload <- TRUE
}
## can the effects-coding identification method be used?
if (ID.fac == "fx" && anyXload) {
stop('Effects-coding method of factor identification ',
'("ID.fac") unavailable in models with cross-loadings.')
}
## Warn about constraining intercepts but not means
freeMeans <- ("intercepts" %in% group.equal && !("means" %in% group.equal)) ||
("intercepts" %in% long.equal && !("means" %in% long.equal) )
if (ID.fac == "uv" && anyXload && freeMeans) {
warning('A factor\'s mean cannot be freed unless it has at least one ',
'indicator without a cross-loading whose intercept is constrained ',
'to equality. Use cat(as.character()) to check whether the syntax ',
'returned by measEq.syntax() must be manually adapted to free the ',
'necessary latent means.')
}
## If it is estimated in the user's configural model, free it (NA).
## If it is specified as fixed but != 0, retain fixed value.
GLIST.values <- sapply(names(GLIST.free), function(g) list())
for (g in 1:nG) {
GLIST.values[[g]] <- mapply(function(est, free) {
est[free > 0L] <- NA
est
}, SIMPLIFY = FALSE, est = GLIST.est[[g]], free = GLIST.free[[g]])
## constrain first loadings to 1 and first indicators to 0?
if (ID.fac == "ul") {
## matrix to store whether each indicator is a reference indicator
lambda.first <- matrix(FALSE, nrow = nrow(GLIST.values[[g]]$lambda),
ncol = ncol(GLIST.values[[g]]$lambda),
dimnames = dimnames(GLIST.values[[g]]$lambda))
if ("beta" %in% pmats) {
beta.first <- matrix(FALSE, nrow = nrow(GLIST.values[[g]]$beta),
ncol = ncol(GLIST.values[[g]]$beta),
dimnames = dimnames(GLIST.values[[g]]$beta))
}
## loop over factors to constrain loadings to 1
for (f in colnames(GLIST.values[[g]]$lambda)) {
## if any loading(s) is(are) fixed to 1 already, no changes needed
ones.lambda <- which(GLIST.values[[g]]$lambda[ , f] == 1L)
if ("beta" %in% pmats) ones.beta <- which(GLIST.values[[g]]$beta[ , f] == 1L)
any1.lambda <- length(ones.lambda) > 0L
any1.beta <- if ("beta" %in% pmats) length(ones.beta) > 0L else FALSE
if (!any1.lambda && !any1.beta) {
## If not already indicated, find the first indicator and fix it to 1.
## Prioritize latent indicators to be first (in case observed has cross-loading)
if ("beta" %in% pmats) {
indicators <- names(which(GLIST.specify[[g]]$beta[ , f]))
if (length(indicators)) {
first.indicator <- indicators[1]
} else first.indicator <- NULL
} else first.indicator <- NULL
if (length(first.indicator)) {
## only true if ("beta" %in% pmats)
GLIST.values[[g]]$beta[first.indicator, f] <- 1L
beta.first[first.indicator, f] <- TRUE
} else {
## no latent indicators, so look in lambda
indicators <- names(which(GLIST.specify[[g]]$lambda[ , f]))
first.indicator <- indicators[1] #FIXME: no chance of NA by now, right?
GLIST.values[[g]]$lambda[first.indicator, f] <- 1L
lambda.first[first.indicator, f] <- TRUE
}
## otherwise, use first fixed == 1 indicator as the marker variable
} else if (any1.beta) {
beta.first[ones.beta[1], f] <- TRUE
} else if (any1.lambda) {
lambda.first[ones.lambda[1], f] <- TRUE
}
}
## loop over indicators to constrain intercepts to zero
if (meanstructure) {
## manifest indicators
for (i in rownames(GLIST.specify[[g]]$lambda)) {
## for the first indicator of a construct, constrain to zero
first.indicator <- lambda.first[i, ]
if (sum(first.indicator) > 1L)
stop('The intercept of indicator "', i, '" can only be fixed to zero ',
'in order to identify one latent mean, but it is specified as ',
'the first indicator of the following factors:\n\t',
paste(names(which(first.indicator)), collapse = ", "), '\n',
'Please respecify the model so that each factor has a unique ',
'first indicator to use as a reference indicator.')
if (any(first.indicator)) GLIST.values[[g]]$nu[i, 1] <- 0
}
## latent indicators of higher-order constructs
if ("beta" %in% pmats) for (i in rownames(GLIST.specify[[g]]$beta)) {
## for the first indicator of a construct, constrain to zero
first.indicator <- beta.first[i, ]
if (sum(first.indicator) > 1L)
stop('The intercept of indicator "', i, '" can only be fixed to zero ',
'in order to identify one factor mean, but it is specified as ',
'the first indicator of the following factors:\n\t',
paste(names(which(first.indicator)), collapse = ", "), '\n',
'Please respecify the model so that each factor has a unique ',
'first indicator to use as a reference indicator.')
if (any(first.indicator)) {
GLIST.values[[g]]$alpha[i, 1] <- 0
} else GLIST.values[[g]]$alpha[i, 1] <- NA
}
}
}
}
## Make labels
GLIST.labels <- sapply(names(GLIST.free), function(g) list())
for (g in 1:nG) {
for (p in pmats) {
if (p == "tau") {
## THRESHOLDS
GLIST.labels[[g]]$tau <- cbind(gsub(x = rownames(GLIST.free[[g]]$tau),
pattern = "|t", replacement = ".thr",
fixed = TRUE))
dimnames(GLIST.labels[[g]]$tau) <- dimnames(GLIST.free[[g]]$tau)
} else {
## ANY OTHER PARAMETERS
GLIST.labels[[g]][[p]] <- matrix("", nrow = nrow(GLIST.free[[g]][[p]]),
ncol = ncol(GLIST.free[[g]][[p]]),
dimnames = dimnames(GLIST.free[[g]][[p]]))
for (RR in rownames(GLIST.free[[g]][[p]])) {
for (CC in colnames(GLIST.free[[g]][[p]])) {
GLIST.labels[[g]][[p]][RR, CC] <- getLabel(GLIST.labels[[g]],
parMat = p,
RR = RR, CC = CC)
}
}
}
## end loops
}
## no labels for scaling factors (cannot equate, not a measuremet parameter)
GLIST.labels[[g]]$delta <- NULL
}
## ------------------------------------
## Preliminary checks on model and data
## ------------------------------------
## check longitudinal factor names
if (!is.list(longFacNames)) stop('"longFacNames" must be a list of character vectors.')
## check that no longitudinal factors are only at 1 occasion
if (length(longFacNames)) longFacNames <- longFacNames[sapply(longFacNames, length) > 1L]
## also check longIndNames, and each non-NULL element
if (!is.list(longIndNames)) stop('"longIndNames" must be a list of character vectors.')
if (length(longIndNames)) {
longIndList <- sapply(longIndNames, is.character)
if (!all(longIndList)) stop('"longIndNames" must be a list of character vectors.')
## No problem if any(length == 1L). It just won't be constrained.
}
## names of factors in syntax
allFacNames <- lapply(GLIST.free, function(x) colnames(x$lambda))
## collapse names of longitudinal factors (plus non-longitudinal factors)
# reducedFacNames <- c(names(longFacNames), setdiff(unlist(allFacNames),
# unlist(longFacNames)))
## check for longitudinal indicator names, automatically generate if empty
make.longIndNames <- length(longIndNames) == 0L
for (f in names(longFacNames)) {
## time-specific factor names
fs <- longFacNames[[f]]
nT <- length(fs) # number of occasions
## get indicators of each
indNames <- sapply(fs, function(ff) {
names(which(GLIST.specify[[1]]$lambda[ , ff]))
}, simplify = FALSE)
if (make.longIndNames) {
# check for same number of indicators, match across factors
nInd <- length(indNames[[1]])
if (!all(sapply(indNames, length) == nInd))
stop('The number of indicators for longitudinal factor "', f,
'" differs across measurement occasions. Please use the ',
'"longIndNames" argument to specify which longitudinal indicators',
' are the same indicator on different occasions of measurement.')
if (nInd > 0L) for (i in 1:nInd) {
longIndNames[[paste0("._", f, "_.ind.", i)]] <- sapply(indNames, "[",
i = i,
USE.NAMES = FALSE)
}
} else {
## add unique indicators per factor (omitted from user-specified matches) ## NO LONGER NECESSARY
# for (i in fs) {
# extraIndicators <- setdiff(indNames[[i]], unlist(longIndNames[[f]]))
# longIndNames[[f]][extraIndicators] <- extraIndicators
# }
}
}
## check none have cross-loadings
longIndTable <- table(unlist(longIndNames))
if (any(longIndTable > 1L))
stop('Some longitudinal indicators define more than one factor:\n ',
paste(names(longIndTable[longIndTable > 1L]), collapse = ", "), "\n ",
'The "longIndNames=" argument must be explicitly declared.')
## check equivalence of data type (ordinal vs. continuous) across time
longOrdNames <- sapply(longIndNames, "%in%", table = allOrdNames, simplify = FALSE)
someNotAll <- sapply(longOrdNames, function(i) any(i) & !all(i))
if (any(someNotAll)) {
stop('At least one longitudinal indicator is declared as "ordered" on',
' at least one, but not every, occasion: \n ',
paste(names(which(someNotAll)), collapse = ", "))
}
## check number of thresholds/categories is equivalent across time
allOrd <- sapply(longOrdNames, all)
if (length(allOrd)) for (i in which(allOrd)) {
checkThr <- nThr[ longIndNames[[ names(allOrd)[i] ]] ]
if (!all(checkThr == checkThr[1]))
stop('These "ordered" longitudinal indicators do not have the same ',
'number of thresholds (endorsed categories) on every occasion: \n',
paste(names(checkThr), collapse = ", "),
"\nConsider collapsing rarely endorsed categories.")
}
## create a backward-key for finding long(Fac/Ind)Names from variable names
longFacKey <- rep(names(longFacNames), times = sapply(longFacNames, length))
names(longFacKey) <- unlist(longFacNames)
longIndKey <- rep(names(longIndNames), times = sapply(longIndNames, length))
names(longIndKey) <- unlist(longIndNames)
mc$longFacNames <- longFacNames
mc$longIndNames <- longIndNames
## -----------------
## Apply constraints
## -----------------
## THRESHOLDS (+ intercept & variance ID constraints for allOrdNames)
## longitudinal constraints (one group at a time, but same across groups)
for (g in 1:nG) {
## loop over ordinal indicators
for (i in allOrdNames) {
## when other variables are this same indicator?
longInds <- names(longIndKey)[ which(longIndKey == longIndKey[i]) ]
if (length(longInds) == 0L) next
## keep track of how many thresholds for the i_th indicator have been
## constrained, in case identification constraints can be released
nEqThr <- 0L
## loop over thresholds of the i_th ordinal indicator
for (th in 1:(nThr[i])) {
## (ADD) constraints across repeated measures?
equate.long <- "thresholds" %in% long.equal
## check whether not to equate because it is in long.partial
partial.th <- long.partial$op == "|" & long.partial$rhs == paste0("t", th)
if (equate.long && any(partial.th)) {
partial.inds <- longIndNames[ long.partial$lhs[which(partial.th)] ]
equate.long <- !i %in% unlist(partial.inds)
}
## check whether to equate for identification (overrides *.partial)
if (ID.cat == "millsap") {
## always equate the first (or only, if binary)
if (th == ID.thr[[i]][1]) {
equate.long <- TRUE
## remove this from long.partial, if necessary
rm.th <- which(long.partial$lhs == longIndKey[i] & partial.th)
if (length(rm.th)) long.partial <- long.partial[-rm.th, ]
}
## for the first indicator of a construct, equate the second
fs <- which(GLIST.specify[[g]]$lambda[i, ])
first.indicator <- sapply(fs, function(f) {
lams <- GLIST.specify[[g]]$lambda[ , f]
lam.eq.1 <- which(GLIST.values[[g]]$lambda[ , f] == 1)
if (length(lam.eq.1)) return(names(lams[ lam.eq.1[1] ]) == i)
names(which(lams))[1] == i
})
if (th == ID.thr[[i]][2] && any(first.indicator)) {
equate.long <- TRUE
if (length(fs) > 1L && warn)
warning('Millsap & Tein`s (2004) identification constraints might ',
'not be optimal when the reference indicator ("', i,
'") has a cross-loading (on factors "',
paste0(names(fs), collapse = '", "'), '")')
## remove this from long.partial, if necessary
rm.th <- which(long.partial$lhs == longIndKey[i] & partial.th)
if (length(rm.th)) long.partial <- long.partial[-rm.th, ]
}
}
## apply longitudinal constraint?
if (equate.long) {
## iterate count of constrained thresholds
nEqThr <- nEqThr + 1L
## apply longitudinal constraint
this.th <- paste0(i, "|t", th)
first.th <- paste0(longInds[1], "|t", th)
GLIST.labels[[g]]$tau[this.th, 1] <- GLIST.labels[[g]]$tau[first.th, 1]
}
} ## end loop over thresholds
## check whether enough thresholds were equated to free
## IDENTIFICATION CONSTRAINTS on intercepts & residuals
equate.int <- "intercepts" %in% long.equal &&
!any(long.partial$lhs == longIndKey[i] & long.partial$op == "~1")
equate.resid <- "residuals" %in% long.equal &&
!any(long.partial$lhs == longIndKey[i] &
long.partial$rhs == longIndKey[i] &
long.partial$op == "~~") #FIXME: leave resid==0 for reference indicators
if (i == longInds[1]) {
if (ID.cat == "lisrel") {
## always estimate intercepts, and variances unless binary
GLIST.values[[g]]$nu[i, 1] <- NA
diag(GLIST.values[[g]]$theta)[i] <- if (nThr[i] == 1L) 1 else NA
} else {
## always set reference occasion's intercepts to 0 and variances to 1
GLIST.values[[g]]$nu[i, 1] <- 0
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- 1
} else {
GLIST.values[[g]]$delta[i, 1] <- 1
}
}
} else if (ID.cat == "wu") {
## priority to freeing intercepts
if (nEqThr == 0L || equate.int) {
GLIST.values[[g]]$nu[i, 1] <- 0
} else GLIST.values[[g]]$nu[i, 1] <- NA
if (nEqThr == 0L || (nEqThr < 2L && !equate.int) || equate.resid) {
## keep (residual) variances fixed
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- 1
} else {
GLIST.values[[g]]$delta[i, 1] <- 1
}
} else {
## free (residual) variances
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- NA
} else {
GLIST.values[[g]]$delta[i, 1] <- NA
}
}
} else if (ID.cat %in% c("mplus","millsap")) {
## never free intercepts, only variances
if (nEqThr == 0L || equate.resid) {
## keep (residual) variances fixed
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- 1
} else {
GLIST.values[[g]]$delta[i, 1] <- 1
}
} else {
## free (residual) variances
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- NA
} else {
GLIST.values[[g]]$delta[i, 1] <- NA
}
}
} else if (ID.cat == "lisrel") {
## always estimate intercepts, and variances unless binary
GLIST.values[[g]]$nu[i, 1] <- NA
diag(GLIST.values[[g]]$theta)[i] <- if (nThr[i] == 1L) 1 else NA
}
}
}
## group constraints
if (nG == 1L && ID.cat == "lisrel") {
## Single-group model for repeated measures:
## Longitudinal loop above only places LISREL equality constraints on
## thresholds. Here, still neeed to fix the first 2 == {0, 1}.
## loop over ordinal indicators
for (i in allOrdNames) {
## loop over thresholds of the i_th ordinal indicator
for (th in 1:(nThr[i])) {
## always fix the first (or only, if binary) to zero
if (th == 1L) GLIST.values[[g]]$tau[paste0(i, "|t", th), 1] <- 0
## always fix the second to one
if (th == 2L) GLIST.values[[g]]$tau[paste0(i, "|t", th), 1] <- 1
## estimate any others
if (th > 2L) GLIST.values[[g]]$tau[paste0(i, "|t", th), 1] <- NA
} ## end loop over thresholds
} ## end loop over ordinal indicators
} else if (nG > 1L) for (g in 1:nG) {
## loop over ordinal indicators
for (i in allOrdNames) {
## keep track of how many thresholds for the i_th indicator have
## constrained, in case identification constraints can be released
nEqThr <- 0L
## loop over thresholds of the i_th ordinal indicator
for (th in 1:(nThr[i])) {
## (REMOVE) constraints across groups?
equate.group <- "thresholds" %in% group.equal
## check whether not to equate because it is in group.partial
partial.th <- group.partial$lhs == i & group.partial$op == "|" &
group.partial$rhs == paste0("t", th)
if (equate.group) equate.group <- !any(partial.th)
## check whether to equate for identification (overrides *.partial)
if (ID.cat == "millsap") {
## always equate the first (or only, if binary)
if (th == ID.thr[[i]][1]) {
equate.group <- TRUE
## remove this from group.partial, if necessary
rm.th <- which(partial.th)
if (length(rm.th)) group.partial <- group.partial[-rm.th, ]
}
## for the first indicator of a construct, equate the second
fs <- which(GLIST.specify[[g]]$lambda[i, ])
first.indicator <- sapply(fs, function(f) {
lams <- GLIST.specify[[g]]$lambda[ , f]
lam.eq.1 <- which(GLIST.values[[g]]$lambda[ , f] == 1)
if (length(lam.eq.1)) return(names(lams[ lam.eq.1[1] ]) == i)
names(which(lams))[1] == i
})
if (th == ID.thr[[i]][2] && any(first.indicator)) {
equate.group <- TRUE
if (length(fs) > 1L && warn)
warning('Millsap & Tein`s (2004) identification constraints might ',
'not be optimal when the reference indicator ("', i,
'") has a cross-loading (on factors "',
paste0(names(fs), collapse = '", "'), '")')
## remove this from group.partial, if necessary
rm.th <- which(partial.th)
if (length(rm.th)) group.partial <- group.partial[-rm.th, ]
}
} else if (ID.cat == "lisrel") {
## always fix the first (or only, if binary) to zero
if (th == 1L) GLIST.values[[g]]$tau[paste0(i, "|t", th), 1] <- 0
## always fix the second to one
if (th == 2L) GLIST.values[[g]]$tau[paste0(i, "|t", th), 1] <- 1
## estimate any others
if (th > 2L) GLIST.values[[g]]$tau[paste0(i, "|t", th), 1] <- NA
}
## apply group-specific label, unless constrained
if (!equate.group) {
## row in GLIST
RR <- paste0(i, "|t", th)
GLIST.labels[[g]]$tau[RR, 1] <- paste0(GLIST.labels[[g]]$tau[RR, 1], ".g", g)
} else nEqThr <- nEqThr + 1L # iterate count of constrained thresholds
} ## end loop over thresholds
## check whether enough thresholds were equated to free
## IDENTIFICATION CONSTRAINTS on intercepts & residuals.
## Note: Group 1 constraints already set in longitudinal loop, ONLY if
## there are repeated measures identified by longInds.
## Section below only RELEASES constraints.
## DON'T OVERWRITE FREED CONSTRAINTS AFTER TIME 1.
equate.int <- "intercepts" %in% group.equal &&
!any(group.partial$lhs == i & group.partial$op == "~1")
equate.resid <- "residuals" %in% group.equal &&
!any(group.partial$lhs == i & group.partial$rhs == i & group.partial$op == "~~")
if (g > 1L && ID.cat == "wu") {
## priority to freeing intercepts
#FIXME: binary indicators, latent mean arbitrarily freed, nesting problems
if (nEqThr >= 1L && !equate.int) GLIST.values[[g]]$nu[i, 1] <- NA
if ((nEqThr >= 2L || (nEqThr >= 1L && equate.int)) && !equate.resid) {
## free (residual) variances
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- NA
} else {
GLIST.values[[g]]$delta[i, 1] <- NA
}
}
} else if (g > 1L && ID.cat %in% c("mplus","millsap")) {
## never free intercepts, only variances
if (nEqThr >= 1L && !equate.resid) {
## free (residual) variances
if (parameterization == "theta") {
diag(GLIST.values[[g]]$theta)[i] <- NA
} else {
GLIST.values[[g]]$delta[i, 1] <- NA
}
}
} else if (ID.cat == "lisrel") {
## always estimate intercepts, and variances unless binary
GLIST.values[[g]]$nu[i, 1] <- NA
diag(GLIST.values[[g]]$theta)[i] <- if (nThr[i] == 1L) 1 else NA
}
}
}
## LATENT MEANS
## longitudinal constraints (one group at a time, but same across groups)
if (meanstructure) for (g in 1:nG) {
## fix or free factor means?
if (ID.fac == "uv") {
GLIST.values[[g]]$alpha[ , 1] <- 0 # free below, if any loading is constrained
## freed when any loading is constrained to equality
} else if ("beta" %in% pmats) {
## latent indicators of any higher-order factors already set to 0 or NA
## in GLIST.values loop above
} else GLIST.values[[g]]$alpha[ , 1] <- NA
## loop over factors
for (f in rownames(GLIST.labels[[g]]$alpha)) {
## which other variables are this same factor?
longFacs <- names(longFacKey)[ which(longFacKey == longFacKey[f]) ]
if (length(longFacs) == 0L) {
## not a longitudinal factor, set first group's mean to 0 for Millsap
if (ID.cat == "millsap" && g == 1L) GLIST.values[[g]]$alpha[f, 1] <- 0
next
}
## first time a factor is measured, set first group's mean to 0 for Millsap
if (ID.cat == "millsap" && g == 1L && longFacs[1] == f) {
GLIST.values[[g]]$alpha[f, 1] <- 0
}
## assign labels
equate.means <- "means" %in% long.equal &&
!any(long.partial$lhs == longFacKey[f] & long.partial$op == "~1")
if (equate.means) {
GLIST.labels[[g]]$alpha[f, 1] <- GLIST.labels[[g]]$alpha[longFacs[1], 1]
}
}
}
## group constraints
if (meanstructure && nG > 1L) for (g in 1:nG) {
## loop over factors
for (f in rownames(GLIST.labels[[g]]$alpha)) {
## assign labels
equate.means <- "means" %in% group.equal &&
!any(group.partial$lhs == f & group.partial$op == "~1")
if (!equate.means) {
GLIST.labels[[g]]$alpha[f, 1] <- paste0(GLIST.labels[[g]]$alpha[f, 1], ".g", g)
}
}
}
## LATENT VARIANCES
## longitudinal constraints (one group at a time, but same across groups)
for (g in 1:nG) {
## fix or free factor variances?
if (ID.fac == "uv") {
diag(GLIST.values[[g]]$psi) <- 1 # free below, if any loading is constrained
## freed when any loading is constrained to equality
} else diag(GLIST.values[[g]]$psi) <- NA
## loop over factors
for (f in colnames(GLIST.labels[[g]]$lambda)) {
## which other variables are this same factor?
longFacs <- names(longFacKey)[ which(longFacKey == longFacKey[f]) ]
if (length(longFacs) == 0L) next
## assign labels
equate.var <- "lv.variances" %in% long.equal &&
!any(long.partial$lhs == longFacKey[f] &
long.partial$op == "~~" &
long.partial$rhs == longFacKey[f])
if (equate.var) {
GLIST.labels[[g]]$psi[f, f] <- GLIST.labels[[g]]$psi[longFacs[1], longFacs[1]]
}
}
}
## group constraints
if (nG > 1L) for (g in 1:nG) {
## loop over factors
for (f in colnames(GLIST.labels[[g]]$lambda)) {
## assign labels
equate.var <- "lv.variances" %in% group.equal &&
!any(group.partial$lhs == f &
group.partial$op == "~~" &
group.partial$rhs == f)
if (!equate.var) {
GLIST.labels[[g]]$psi[f, f] <- paste0(GLIST.labels[[g]]$psi[f, f], ".g", g)
}
}
}
## LOADINGS
## longitudinal constraints (one group at a time, but same across groups)
for (g in 1:nG) {
## loop over factors
for (f in colnames(GLIST.labels[[g]]$lambda)) {
## which other factors are this same factor?
longFacs <- names(longFacKey)[ which(longFacKey == longFacKey[f]) ]
if (length(longFacs) == 0L) next
## loop over any manifest indicators within each factor
for (i in names(which(GLIST.specify[[g]]$lambda[ , f])) ) {
## which other variables are this same indicator?
longInds <- names(longIndKey)[ which(longIndKey == longIndKey[i]) ]
if (length(longInds) == 0L) next
## assign labels
equate.load <- "loadings" %in% long.equal &&
!any(long.partial$lhs == longFacKey[f] &
long.partial$op == "=~" &
long.partial$rhs == longIndKey[i])
if (equate.load) {
GLIST.labels[[g]]$lambda[i, f] <- GLIST.labels[[g]]$lambda[longInds[1], longFacs[1]]
## free factor variance(s) after Time 1
if (ID.fac == "uv" && f %in% longFacs[-1]) diag(GLIST.values[[g]]$psi)[f] <- NA
}
}
## loop over any latent indicators within each factor
if ("beta" %in% pmats) for (i in names(which(GLIST.specify[[g]]$beta[ , f])) ) {
## which other factors are this same factor?
longInds <- names(longFacKey)[ which(longFacKey == longFacKey[i]) ]
if (length(longInds) == 0L) next
## assign labels
equate.load <- "regressions" %in% long.equal &&
!any(long.partial$lhs == longFacKey[f] &
long.partial$op == "=~" &
long.partial$rhs == longFacKey[i])
if (equate.load) {
GLIST.labels[[g]]$beta[i, f] <- GLIST.labels[[g]]$beta[longInds[1], longFacs[1]]
}
}
}
}
## group constraints
if (nG > 1L) for (g in 1:nG) {
## loop over factors
for (f in colnames(GLIST.labels[[g]]$lambda)) {
## loop over any manifest indicators within each factor
for (i in names(which(GLIST.specify[[g]]$lambda[ , f])) ) {
## assign labels
equate.load <- "loadings" %in% group.equal &&
!any(group.partial$lhs == f &
group.partial$op == "=~" &
group.partial$rhs == i)
if (!equate.load) {
GLIST.labels[[g]]$lambda[i, f] <- paste0(GLIST.labels[[g]]$lambda[i, f],
".g", g)
} else if (ID.fac == "uv" && g > 1L) {
## free factor variance(s) in group(s) other than the first
diag(GLIST.values[[g]]$psi)[f] <- NA
}
}
## loop over any latent indicators within each factor
if ("beta" %in% pmats) for (i in names(which(GLIST.specify[[g]]$beta[ , f])) ) {
## assign labels
equate.load <- "regressions" %in% group.equal &&
!any(group.partial$lhs == f &
group.partial$op == "=~" &
group.partial$rhs == i)
if (!equate.load) {
GLIST.labels[[g]]$beta[i, f] <- paste0(GLIST.labels[[g]]$beta[i, f],
".g", g)
}
}
}
}
## INTERCEPTS
## longitudinal constraints (one group at a time, but same across groups)
if (meanstructure) for (g in 1:nG) {
## loop over indicators
for (i in lavNames(lavTemplate, "ov.ind", group = g)) {
## when other variables are this same indicator?
longInds <- names(longIndKey)[ which(longIndKey == longIndKey[i]) ]
if (length(longInds) == 0L) next
## assign labels
equate.int <- "intercepts" %in% long.equal &&
!any(long.partial$lhs == longIndKey[i] & long.partial$op == "~1")
if (equate.int) {
GLIST.labels[[g]]$nu[i, 1] <- GLIST.labels[[g]]$nu[longInds[1], 1]
## free factor mean(s) after Time 1 only if an indicator without a
## cross-loading has an equality-constrained intercept
if (ID.fac == "uv") {
## factors this indicator measures
fs <- colnames(GLIST.specify[[g]]$lambda)[ GLIST.specify[[g]]$lambda[i,] ]
only.measures.1 <- length(fs) == 1L
## name(s) of longitudinal factor(s)
LFN <- longFacKey[fs]
not.time.1 <- fs[1] %in% names(which(longFacKey == LFN))[-1]
if (only.measures.1 && not.time.1) GLIST.values[[g]]$alpha[fs, 1] <- NA
}
}
}
}
## group constraints
if (meanstructure && nG > 1L) for (g in 1:nG) {
## loop over indicators
for (i in lavNames(lavTemplate, "ov.ind", group = g)) {
## assign labels
equate.int <- "intercepts" %in% group.equal &&
!any(group.partial$lhs == i & group.partial$op == "~1")
if (!equate.int) {
GLIST.labels[[g]]$nu[i, 1] <- paste0(GLIST.labels[[g]]$nu[i, 1], ".g", g)
} else if (ID.fac == "uv") {
## factors this indicator measures
fs <- colnames(GLIST.specify[[g]]$lambda)[ GLIST.specify[[g]]$lambda[i,] ]
only.measures.1 <- length(fs) == 1L
## free factor mean(s) other than group 1 only if an indicator without a
## cross-loading has an equality-constrained intercept
if (only.measures.1 && g > 1L) GLIST.values[[g]]$alpha[fs, 1] <- NA
}
}
}
## RESIDUAL VARIANCES
## longitudinal constraints (one group at a time, but same across groups)
for (g in 1:nG) {
## loop over indicators
for (i in lavNames(lavTemplate, "ov.ind", group = g)) {
## when other variables are this same indicator?
longInds <- names(longIndKey)[ which(longIndKey == longIndKey[i]) ]
if (length(longInds) == 0L) next
## assign labels
equate.resid <- "residuals" %in% long.equal &&
!any(long.partial$lhs == longIndKey[i] &
long.partial$rhs == longIndKey[i] &
long.partial$op == "~~")
if (equate.resid) {
diag(GLIST.labels[[g]]$theta)[i] <- diag(GLIST.labels[[g]]$theta)[ longInds[1] ]
}
}
}
## group constraints
if (nG > 1L) for (g in 1:nG) {
## loop over indicators
for (i in lavNames(lavTemplate, "ov.ind", group = g)) {
## assign labels
equate.resid <- "residuals" %in% group.equal &&
!any(group.partial$lhs == i & group.partial$rhs == i & group.partial$op == "~~")
if (!equate.resid) {
diag(GLIST.labels[[g]]$theta)[i] <- paste0(diag(GLIST.labels[[g]]$theta)[i],
".g", g)
}
}
}
## RESIDUAL AUTO-COVARIANCES: longitudinal constraints only
if (length(longIndNames) && !is.null(auto)) for (g in 1:nG) {
## loop over longitudinal indicators
for (i in names(longIndNames)) {
nn <- longIndNames[[i]]
nT <- length(nn) # number repeated measures of indicator i
auto.i <- suppressWarnings(as.integer(auto))[1] # nT can vary over i
if (auto == "all" | is.na(auto.i)) auto.i <- nT - 1L # max lag
if (auto.i >= nT | auto.i < 0L ) auto.i <- nT - 1L # max lag
## for each lag...
for (lag in 1:auto.i) {
for (tt in 1:(nT - lag)) {
## sort indices to ensure the lower.tri is always specified, in case
## order of longIndNames does not match order in syntax/theta
nn.idx <- c(which(rownames(GLIST.specify[[g]]$theta) == nn[tt]),
which(rownames(GLIST.specify[[g]]$theta) == nn[tt + lag]))
idx1 <- nn.idx[ which.max(nn.idx) ] # row index
idx2 <- nn.idx[ which.min(nn.idx) ] # column index
## specify and set free
GLIST.specify[[g]]$theta[idx1, idx2] <- TRUE
GLIST.values[[g]]$theta[ idx1, idx2] <- NA
## constrain to equality across repeated measures?
if ("resid.autocov" %in% long.equal && tt > 1L) {
o.idx <- c(which(rownames(GLIST.specify[[g]]$theta) == nn[1]),
which(rownames(GLIST.specify[[g]]$theta) == nn[1 + lag]))
o1 <- o.idx[ which.max(o.idx) ] # row index
o2 <- o.idx[ which.min(o.idx) ] # column index
first.label <- GLIST.labels[[g]]$theta[o1, o2]
GLIST.labels[[g]]$theta[idx1, idx2] <- first.label
}
}
}
}
}
## group constraints on any RESIDUAL COVARIANCES
if (nG > 1) for (g in 1:nG) {
## add group-specific labels to any off-diagonal GLIST.specify?
freeTheta <- which(GLIST.specify[[g]]$theta, arr.ind = TRUE)
offDiag <- freeTheta[ , "row"] > freeTheta[ , "col"]
if (sum(offDiag) == 0) break # nothing to do
## loop over elements that require action
free.offDiag <- freeTheta[offDiag, , drop = FALSE]
for (RR in 1:nrow(free.offDiag)) {
i <- free.offDiag[RR, "row"]
j <- free.offDiag[RR, "col"]
## check group.partial in both directions
partial.ij <- any(group.partial$lhs == i & group.partial$rhs == j & group.partial$op == "~~")
partial.ji <- any(group.partial$lhs == j & group.partial$rhs == i & group.partial$op == "~~")
equate.rescov <- "residual.covariances" %in% group.equal && !any(partial.ij | partial.ji)
## assign group-specific labels?
if (!equate.rescov) {
GLIST.labels[[g]]$theta[i, j] <- paste0(GLIST.labels[[g]]$theta[i, j],
".g", g)
}
}
}
## LATENT AUTO-COVARIANCES: longitudinal constraints only
if (length(longFacNames)) for (g in 1:nG) {
## loop over longitudinal indicators
for (i in names(longFacNames)) {
nn <- longFacNames[[i]]
nT <- length(nn) # number repeated measures of indicator i
## for each lag...
for (lag in 1:(nT - 1) ) {
for (tt in 1:(nT - lag) ) {
## specify and set free (overwrite possible "orthogonal=TRUE")
GLIST.specify[[g]]$psi[ nn[tt + lag], nn[tt] ] <- TRUE
GLIST.values[[g]]$psi[ nn[tt + lag], nn[tt] ] <- NA
## constrain to equality across repeated measures?
if ("lv.autocov" %in% long.equal && tt > 1L) {
first.label <- GLIST.labels[[g]]$psi[ nn[1 + lag], nn[1] ]
GLIST.labels[[g]]$psi[ nn[tt + lag], nn[tt] ] <- first.label
}
}
}
}
}
## group constraints on any LATENT COVARIANCES
if (nG > 1) for (g in 1:nG) {
## add group-specific labels to any off-diagonal GLIST.specify?
freePsi <- which(GLIST.specify[[g]]$psi, arr.ind = TRUE)
offDiag <- freePsi[ , "row"] > freePsi[ , "col"]
if (sum(offDiag) == 0) break # nothing to do
## loop over elements that require action
free.offDiag <- freePsi[offDiag, , drop = FALSE]
for (RR in 1:nrow(free.offDiag)) {
i <- free.offDiag[RR, "row"]
j <- free.offDiag[RR, "col"]
## check group.partial in both directions
partial.ij <- any(group.partial$lhs == i & group.partial$rhs == j & group.partial$op == "~~")
partial.ji <- any(group.partial$lhs == j & group.partial$rhs == i & group.partial$op == "~~")
equate.latcov <- "lv.covariances" %in% group.equal && !any(partial.ij | partial.ji)
## assign group-specific labels?
if (!equate.latcov) {
GLIST.labels[[g]]$psi[i, j] <- paste0(GLIST.labels[[g]]$psi[i, j], ".g", g)
}
}
}
## assemble parameter labels for effects-code identification constraints
fxList <- character(0)
if (ID.fac == "fx") {
listLabels.L <- list()
if (meanstructure) listLabels.I <- list()
for (g in 1:nG) {
## loadings labels
listLabels.L[[g]] <- sapply(colnames(GLIST.labels[[g]]$lambda), function(f) {
GLIST.labels[[g]]$lambda[GLIST.specify[[g]]$lambda[ , f], f]
}, simplify = FALSE)
## intercept labels
if (meanstructure) {
listLabels.I[[g]] <- sapply(colnames(GLIST.labels[[g]]$lambda), function(f) {
GLIST.labels[[g]]$nu[GLIST.specify[[g]]$lambda[ , f], 1]
}, simplify = FALSE)
#TODO: threshold labels
}
}
## names of factors measured in each group
gFacNames <- lapply(listLabels.L, names)
## loop over common-factor names
for (f in unique(unlist(allFacNames))) {
## in which groups is this factor measured?
groups.with.f <- which(sapply(gFacNames, function(gn) f %in% gn))
## get the labels used for indicators in each group
allLabels.L <- lapply(listLabels.L[groups.with.f], "[[", i = f)
if (meanstructure) allLabels.I <- lapply(listLabels.I[groups.with.f],
"[[", i = f)
## one group, one time --> no checks necessary
if (length(groups.with.f) == 1L && !f %in% names(longFacKey)) {
fxList <- c(fxList, make.FX.constraint(allLabels.L[[1]], "loadings"))
if (meanstructure) {
fxList <- c(fxList, make.FX.constraint(allLabels.I[[1]], "intercepts"))
}
}
## one group, multiple times
if (length(groups.with.f) == 1L && f %in% names(longFacKey)) {
## this factor's name on all occasions
LFN <- names(which(longFacKey == longFacKey[f]))
## count constraints on loadings across time
allConstrained <- which(table(unlist(listLabels.L[[1]][LFN])) == length(LFN))
if (length(allConstrained)) {
if (f == LFN[1]) {
fxList <- c(fxList, make.FX.constraint(names(allConstrained), "loadings"))
}
} else {
## no constraints, each factor gets its own
fxList <- c(fxList, make.FX.constraint(allLabels.L[[1]], "loadings"))
}
## count constraints on intercepts across time
if (meanstructure) {
allConstrained <- which(table(unlist(listLabels.I[[1]][LFN])) == length(LFN))
if (length(allConstrained)) {
if (f == LFN[1]) {
fxList <- c(fxList, make.FX.constraint(names(allConstrained), "intercepts"))
}
} else {
## no constraints, each factor gets its own
fxList <- c(fxList, make.FX.constraint(allLabels.I[[1]], "intercepts"))
}
}
}
## multiple groups, one time
if (length(groups.with.f) > 1L && !f %in% names(longFacKey)) {
## count constraints on loadings across groups
allConstrained <- which(table(unlist(allLabels.L)) == length(groups.with.f))
if (length(allConstrained)) {
fxList <- c(fxList, make.FX.constraint(names(allConstrained), "loadings"))
} else {
## no constraints, each group gets its own
for (g in groups.with.f) {
fxList <- c(fxList, make.FX.constraint(allLabels.L[[g]], "loadings"))
}
}
## count constraints on intercepts across groups
if (meanstructure) {
allConstrained <- which(table(unlist(allLabels.I)) == length(groups.with.f))
if (length(allConstrained)) {
fxList <- c(fxList, make.FX.constraint(names(allConstrained), "intercepts"))
} else {
## no constraints, each group gets its own
for (g in groups.with.f) {
fxList <- c(fxList, make.FX.constraint(allLabels.I[[g]], "loadings"))
}
}
}
}
## multiple groups, multiple times: Constrain across any/all dimensions?
if (length(groups.with.f) > 1L && f %in% names(longFacKey)) {
## This factor's name on all occasions
LFN <- names(which(longFacKey == longFacKey[f]))
## Number of dimensions (number of groups times number of occasions).
## Assumes each occasion was measured in each group.
nGT <- length(LFN)*length(groups.with.f)
## count constraints on loadings across both dimensions
all.GL.Labels.L <- lapply(LFN, function(ff) {
lapply(listLabels.L[groups.with.f], "[[", i = ff)
})
all.GL.Constrained.L <- which(table(unlist(all.GL.Labels.L)) == nGT)
if (length(all.GL.Constrained.L)) {
if (f == LFN[1]) {
fxList <- c(fxList, make.FX.constraint(names(all.GL.Constrained.L), "loadings"))
}
} else {
if (f == LFN[1])
warning('No indicators of longitudinal factor "', longFacKey[f],
'" have loadings constrained across all groups and all ',
'occasions, so the automatically generated syntax applies ',
'effects-code identification constraints separately for each',
' occasion and group. If at least 1 loading is constrained ',
'across either groups or occasions, the user should save the',
' syntax to manually reduce the number of identification ',
'constraints by applying them only to loadings constrained ',
'to equality across groups or occasions.') #TODO: update() method
for (g in groups.with.f) {
fxList <- c(fxList, make.FX.constraint(allLabels.L[[g]], "loadings"))
}
}
## count constraints on intercepts across both dimensions
if (meanstructure) {
all.GL.Labels.I <- lapply(LFN, function(ff) {
lapply(listLabels.I[groups.with.f], "[[", i = ff)
})
all.GL.Constrained.I <- which(table(unlist(all.GL.Labels.I)) == nGT)
if (length(all.GL.Constrained.I)) {
if (f == LFN[1]) {
fxList <- c(fxList, make.FX.constraint(names(all.GL.Constrained.I), "intercepts"))
}
} else {
if (f == LFN[1])
warning('No indicators of longitudinal factor "', longFacKey[f],
'" have intercepts constrained across all groups and all ',
'occasions, so the automatically generated syntax applies ',
'effects-code identification constraints separately for each',
' occasion and group. If at least 1 loading is constrained ',
'across either groups or occasions, the user should save the',
' syntax to manually reduce the number of identification ',
'constraints by applying them only to intercepts constrained ',
'to equality across groups or occasions.') #TODO: update() method
for (g in groups.with.f) {
fxList <- c(fxList, make.FX.constraint(allLabels.I[[g]], "intercepts"))
}
}
}
}
} # end loop over common factors
#TODO: Implement effects-coding constraints for thresholds?
# For each latent item-response, mean(thresholds) == 0, which
# identifies intercepts, resolving the problem of effects-coding with
# categorical indicators!
# (i.e., constraining intercepts that == 0 to average 0 is redundant)
}
## -------------
## Return object
## -------------
out <- new("measEq.syntax", package = "lavaan", model.type = "cfa", call = mc,
meanstructure = meanstructure,
numeric = lavNames(lavTemplate, "ov.num"),
ordered = lavNames(lavTemplate, "ov.ord"),
parameterization = parameterization,
specify = GLIST.specify,
values = GLIST.values,
labels = GLIST.labels,
constraints = fxList,
updates = list(values = data.frame(NULL),
labels = data.frame(NULL)),
ngroups = nG)
if (return.fit) {
if (inherits(configural.model, "lavaan")) {
fit <- try(lavaan::update(configural.model,
model = as.character(out), ...),
silent = TRUE)
} else if (inherits(configural.model, "lavaanList")) {
configural.model@call$model <- as.character(out)
configural.model@call$do.fit <- TRUE
fit <- try(eval(configural.model@call, parent.frame()), silent = TRUE)
} else {
lavArgs$model <- as.character(out)
lavArgs$do.fit <- TRUE
fit <- try(do.call("cfa", lavArgs, envir = getNamespace("lavaan")),
silent = TRUE)
}
## check whether the model could be fit
if (inherits(fit, "try-error")) {
warning('The generated model syntax was not successfully fit to the ',
'data, and generated the following error message(s): \n\n',
fit[1:length(fit)], "\n",
"The measEq.syntax object was returned instead.")
} else {
fit@external$measEq.syntax <- out # save the syntax to the lavaan(.mi) object
out <- fit # return the fitted lavaan(.mi) object
}
}
out
}
## ----------------
## Hidden Functions
## ----------------
## function to label a parameter by its location in a parameter matrix
getLabel <- function(GLIST, parMat, RR, CC = 1L) {
dn <- dimnames(GLIST[[parMat]])
out <- paste(parMat, which(dn[[1]] == RR), sep = ".")
if (!parMat %in% c("alpha","nu")) out <- paste(out, which(dn[[2]] == CC),
sep = "_")
out
}
## function to assemble a model constraint for effects-code identification
make.FX.constraint <- function(parLabels, param) {
nCon <- length(parLabels)
conVal <- if (param == "loadings") nCon else 0 #TODO: algorithm for thresholds
out <- paste0(parLabels[1], " == ", conVal)
if (nCon > 1) out <- paste(c(out, parLabels[-1]), collapse = " - ")
out
}
## function to generate a character vector of lines of syntax (for as.character)
write.lavaan.syntax <- function(pmat, specify, value, label) {
nG <- length(specify)
## LOADINGS
if (pmat == "lambda") {
params <- "## LOADINGS:\n"
for (fac in colnames(specify[[1]])) {
for (ind in rownames(specify[[1]])) {
if (!specify[[1]][ind, fac]) next
if (nG > 1L) {
params <- c(params,
paste0(fac, " =~ c(",
paste(sapply(value, "[", i = ind, j = fac),
collapse = ", "),
")*", ind, " + c(",
paste(sapply(label, "[", i = ind, j = fac),
collapse = ", "),
")*", ind))
} else {
params <- c(params,
paste0(fac, " =~ ", value[[1]][ind, fac], "*", ind,
" + ", label[[1]][ind, fac], "*", ind))
}
}
}
return(c(params, ""))
}
## THRESHOLDS
if (pmat == "tau") {
params <- sapply(rownames(specify[[1]]), function(th) {
th.names <- strsplit(th, split = "|", fixed = TRUE)[[1]]
if (nG > 1L) {
param <- paste0(th.names[1], " | c(",
paste(sapply(value, "[", i = th, j = 1),
collapse = ", "),
")*", th.names[2], " + c(",
paste(sapply(label, "[", i = th, j = 1),
collapse = ", "),
")*", th.names[2])
} else {
param <- paste0(th.names[1], " | ", value[[1]][th, 1], "*", th.names[2],
" + ", label[[1]][th, 1], "*", th.names[2])
}
param
})
return(c("## THRESHOLDS:\n", params, ""))
}
## INTERCEPTS or LATENT MEANS
if (pmat %in% c("nu","alpha")) {
## specify all, so no need to check
params <- sapply(rownames(specify[[1]]), function(x) {
if (nG > 1L) {
param <- paste0(x, " ~ c(",
paste(sapply(value, "[", i = x, j = 1),
collapse = ", "),
")*1 + c(",
paste(sapply(label, "[", i = x, j = 1),
collapse = ", "),
")*1")
} else {
param <- paste0(x, " ~ ", value[[1]][x, 1], "*1 + ",
label[[1]][x, 1], "*1")
}
param
})
if (pmat == "nu") params <- c("## INTERCEPTS:\n", params)
if (pmat == "alpha") params <- c("## LATENT MEANS/INTERCEPTS:\n", params)
return(c(params, ""))
}
## SCALING FACTORS (delta)
if (pmat == "delta") {
## specify any?
spec.delta <- which(specify[[1]][ , 1])
if (length(spec.delta) == 0L) return(NULL)
## if so...
params <- sapply(names(spec.delta), function(x) {
if (nG > 1L) {
param <- paste0(x, " ~*~ c(",
paste(sapply(value, "[", i = x, j = 1),
collapse = ", "),
")*", x)
} else {
param <- paste0(x, " ~*~ ", value[[1]][x, 1], "*", x)
}
param
})
return(c("## SCALING FACTORS:\n", params, ""))
}
## LATENT or RESIDUAL (CO)VARIANCES
if (pmat %in% c("theta","psi")) {
## do diagonal first, then check for off-diagonal
spec.vars <- which(diag(specify[[1]]))
if (pmat == "psi") {
params <- "## COMMON-FACTOR VARIANCES:\n"
} else if (pmat == "theta" && length(spec.vars)) {
params <- "## UNIQUE-FACTOR VARIANCES:\n"
} else params <- character(0)
## variances
if (length(spec.vars)) {
params <- c(params,
sapply(names(spec.vars), function(x) {
if (nG > 1L) {
param <- paste0(x, " ~~ c(",
paste(sapply(value, function(j) diag(j)[x]),
collapse = ", "),
")*", x, " + c(",
paste(sapply(label, function(j) diag(j)[x]),
collapse = ", "),
")*", x)
} else {
param <- paste0(x, " ~~ ", diag(value[[1]])[x], "*", x,
" + ", diag(label[[1]])[x], "*", x)
}
param
}))
}
## covariances
if (any(specify[[1]][lower.tri(specify[[1]], diag = FALSE)])) {
if (pmat == "psi") params <- c(params, "\n## COMMON-FACTOR COVARIANCES:\n")
if (pmat == "theta") params <- c(params, "\n## UNIQUE-FACTOR COVARIANCES:\n")
}
nn <- rownames(specify[[1]])
if (length(nn) > 1L) for (CC in 1:(length(nn) - 1)) {
for (RR in (CC + 1):length(nn)) {
if (!specify[[1]][RR, CC]) next
if (nG > 1L) {
params <- c(params,
paste0(nn[CC], " ~~ c(",
paste(sapply(value, "[", i = RR, j = CC),
collapse = ", "),
")*", nn[RR], " + c(",
paste(sapply(label, "[", i = RR, j = CC),
collapse = ", "),
")*", nn[RR]))
} else {
params <- c(params,
paste0(nn[CC], " ~~ ", value[[1]][RR, CC], "*", nn[RR],
" + ", label[[1]][RR, CC], "*", nn[RR]))
}
}
}
return(c(params, ""))
}
## out of options, should never get this far
invisible(NULL)
}
#TODO: adapt routine to write Mplus MODEL statements and OpenMx RAM commands
write.mplus.syntax <- function(object, group = 1, params = NULL) {
out <- character()
pmatList <- intersect(c("lambda","tau","nu", object@parameterization,
"alpha","psi"), names(object@specify[[group]]))
names(pmatList) <- c("loadings","thresholds","intercepts",
ifelse(object@parameterization == "delta",
"scales", "residuals"),"means","lv.variances")
## selected parameter types?
if (!is.null(params)) {
requested <- intersect(names(pmatList), params)
if (!length(requested)) stop('invalid choice: params = c("',
paste(params, collapse = '", "'), '")\n',
'Valid choices include: ',
paste(names(pmatList), collapse = ", "))
pmatList <- pmatList[requested]
}
## concatenate all latent-variable definitions
if ("beta" %in% names(object@specify[[group]])) {
specify.lambda <- rbind(object@specify[[group]]$lambda,
object@specify[[group]]$beta)
values.lambda <- rbind(object@values[[group]]$lambda,
object@values[[group]]$beta)
labels.lambda <- rbind(object@labels[[group]]$lambda,
object@labels[[group]]$beta)
} else {
specify.lambda <- object@specify[[group]]$lambda
values.lambda <- object@values[[group]]$lambda
labels.lambda <- object@labels[[group]]$lambda
}
## check for @ordered, define latent item-response factors,
if (length(object@ordered)) {
out <- c(out, "! Define LATENT ITEM-RESPONSES as factors",
paste0("LIR", 1:length(object@ordered), " BY ", object@ordered,
"@1; LIR", 1:length(object@ordered), "@0;"))
for (i in seq_along(object@ordered)) {
## update rownames in Lambda
#FIXME: update names in concatenated Lambda instead?
idx <- which(rownames(object@specify[[group]]$lambda) == object@ordered[i])
rownames(specify.lambda)[idx] <- paste0("LIR", i)
rownames(values.lambda)[ idx] <- paste0("LIR", i)
rownames(labels.lambda)[ idx] <- paste0("LIR", i)
## update rownames in Nu
idx <- which(rownames(object@specify[[group]]$nu) == object@ordered[i])
rownames(object@specify[[group]]$nu)[idx] <- paste0("LIR", i)
rownames(object@values[[group]]$nu)[idx] <- paste0("LIR", i)
rownames(object@labels[[group]]$nu)[idx] <- paste0("LIR", i)
}
}
out <- c(out, "! FACTOR LOADINGS")
## shorten labels
labels.lambda <- gsub(pattern = "lambda.", replacement = "L",
x = labels.lambda, fixed = TRUE)
labels.lambda <- gsub(pattern = ".g", replacement = "_",
x = labels.lambda, fixed = TRUE)
## loop over factors
for (fac in colnames(specify.lambda)) {
out <- c(out, paste(fac, "BY"))
ind <- names(which(specify.lambda[ , fac]))
lastInd <- rev(ind)[1]
for (i in ind) {
val <- values.lambda[i, fac]
out <- c(out, paste0(" ", i,
if (is.na(val)) "*" else paste0("@", val),
" (", labels.lambda[i, fac],
")", if (i == lastInd) ";" else ""))
}
}
if ("tau" %in% pmatList) {
out <- c(out, "! THRESHOLDS")
## find unique names to shorten labels
allThrNames <- unique(do.call(c, lapply(object@labels, "[[", i = "tau")))
## loop over ordinal indicators, specify set on a single line
for (i in object@ordered) {
iThr <- grep(paste0(i, "\\|"), rownames(object@labels[[group]]$tau))
specify <- object@specify[[group]]$tau[iThr, 1] #NOTE: These are now vectors
values <- object@values[[ group]]$tau[iThr, 1]
labels <- object@labels[[ group]]$tau[iThr, 1]
## identify unique parameter number among thresholds (for short labels)
idx <- integer()
for (lab in labels) idx <- c(idx, which(allThrNames == lab))
out <- c(out,
paste0("[", i, "$", 1:length(iThr),
ifelse(is.na(values), "", paste("@", values)),
"] (T", idx, ");", collapse = " "))
}
}
## INDICATOR-LEVEL PARAMETERS
hasInts <- object@meanstructure
hasResid <- length(object@numeric) || object@parameterization == "theta"
hasScales <- length(object@ordered) && object@parameterization == "delta"
## assemble comment for this section
if (sum(hasInts, hasResid, hasScales) == 3L) {
out <- c(out, "! INDICATOR INTERCEPTS, RESIDUAL VARIANCES, & SCALING FACTORS")
} else {
element.names <- c("INTERCEPTS","RESIDUAL VARIANCES","SCALING FACTORS")
element.tests <- c(hasInts, hasResid, hasScales)
out <- c(out, paste0("! INDICATOR ", paste(element.names[element.tests],
collapse = " and ")))
}
i.nu <- character()
i.var <- character()
## Loop over indicators
for (i in 1:nrow(object@specify[[group]]$lambda)) {
LIR <- rownames(specify.lambda)[i] # LIR names
RR <- rownames(object@specify[[group]]$lambda)[i]
if (object@meanstructure) {
## INTERCEPTS
N.val <- object@values[[group]]$nu[LIR, 1]
## shorten labels
N.lab <- gsub(pattern = "nu.", replacement = "N",
x = object@labels[[group]]$nu[LIR, 1], fixed = TRUE)
N.lab <- gsub(pattern = ".g", replacement = "_", x = N.lab, fixed = TRUE)
i.nu <- c(i.nu, paste0("[", LIR, ifelse(is.na(N.val), yes = "*",
no = paste0("@", N.val)),
"] (", N.lab, "); "))
}
if (RR %in% object@ordered && object@parameterization == "delta") {
## SCALING FACTORS
E.val <- object@values[[group]]$delta[RR, 1]
E.lab <- ""
i.var <- c(i.var, paste0("{", RR, ifelse(is.na(E.val), yes = "*",
no = paste0("@", E.val)), "};"))
} else {
## RESIDUAL VARIANCES
E.val <- object@values[[group]]$theta[RR, RR]
## shorten labels
E.lab <- gsub(pattern = "theta.", replacement = "E",
x = object@labels[[group]]$theta[RR, RR], fixed = TRUE)
E.lab <- gsub(pattern = ".g", replacement = "_", x = E.lab, fixed = TRUE)
i.var <- c(i.var, paste0(RR, ifelse(is.na(E.val), yes = "*",
no = paste0("@", E.val)),
" (", E.lab, ");"))
}
}
out <- c(out, paste(i.nu, i.var))
E.specify <- object@specify[[group]]$theta
LT <- E.specify & lower.tri(E.specify, diag = FALSE)
if (any(LT)) {
out <- c(out, "! RESIDUAL COVARIANCES")
E.values <- object@values[[group]]$theta
## shorten labels
E.labels <- gsub(pattern = "theta.", replacement = "E",
x = object@labels[[group]]$theta, fixed = TRUE)
E.labels <- gsub(pattern = ".g", replacement = "_",
x = E.labels, fixed = TRUE)
for (CC in 1:(ncol(LT) - 1)) {
if (!any(LT[ , CC])) next
if (sum(LT[ , CC]) == 1L) {
RR <- which(LT[ , CC])
out <- c(out,
paste0(colnames(LT)[CC], " WITH ", rownames(LT)[RR],
ifelse(is.na(E.values[RR, CC]), yes = "",
no = paste("@", E.values[RR, CC])),
" (", E.labels[RR, CC], ");"))
next
}
## else, there are multiple covariates with LT[CC]
out <- c(out, paste(colnames(LT)[CC], "WITH"))
ind <- names(which(LT[ , CC]))
lastInd <- rev(ind)[1]
for (RR in ind) {
val <- E.values[RR, CC]
out <- c(out, paste0(" ", RR,
if (is.na(val)) "" else paste0("@", val),
" (", E.labels[RR, CC],
")", if (RR == lastInd) ";" else ""))
}
}
}
## FACTOR-LEVEL PARAMETERS
out <- c(out, paste("! FACTOR",
if (object@meanstructure) "INTERCEPTS &" else NULL,
"(RESIDUAL) VARIANCES"))
i.alpha <- character()
i.psi <- character()
## Loop over factors
for (i in rownames(object@specify[[group]]$psi)) {
if (object@meanstructure) {
## INTERCEPTS
A.val <- object@values[[group]]$alpha[i, 1]
## shorten labels
A.lab <- gsub(pattern = "alpha.", replacement = "A",
x = object@labels[[group]]$alpha[i, 1], fixed = TRUE)
A.lab <- gsub(pattern = ".g", replacement = "_", x = A.lab, fixed = TRUE)
i.alpha <- c(i.alpha, paste0("[", i, ifelse(is.na(A.val), yes = "*",
no = paste0("@", A.val)),
"] (", A.lab, "); "))
}
## RESIDUAL VARIANCES
P.val <- object@values[[group]]$psi[i, i]
## shorten labels
P.lab <- gsub(pattern = "psi.", replacement = "P",
x = object@labels[[group]]$psi[i, i], fixed = TRUE)
P.lab <- gsub(pattern = ".g", replacement = "_", x = P.lab, fixed = TRUE)
i.psi <- c(i.psi, paste0(i, ifelse(is.na(P.val), yes = "",
no = paste0("@", P.val)),
" (", P.lab, ");"))
}
out <- c(out, paste(i.alpha, i.psi))
P.specify <- object@specify[[group]]$psi
LT <- P.specify & lower.tri(P.specify, diag = FALSE)
if (any(LT)) {
out <- c(out, "! FACTOR COVARIANCES")
P.values <- object@values[[group]]$psi
## shorten labels
P.labels <- gsub(pattern = "psi.", replacement = "P",
x = object@labels[[group]]$psi, fixed = TRUE)
P.labels <- gsub(pattern = ".g", replacement = "_",
x = P.labels, fixed = TRUE)
for (CC in 1:(ncol(LT) - 1)) {
if (!any(LT[ , CC])) next
if (sum(LT[ , CC]) == 1L) {
RR <- which(LT[ , CC])
out <- c(out,
paste0(colnames(LT)[CC], " WITH ", rownames(LT)[RR],
ifelse(is.na(P.values[RR, CC]), yes = "",
no = paste("@", P.values[RR, CC])),
" (", P.labels[RR, CC], ");"))
next
}
## else, there are multiple covariates with LT[CC]
out <- c(out, paste(colnames(LT)[CC], "WITH"))
ind <- names(which(LT[ , CC]))
lastInd <- rev(ind)[1]
for (RR in ind) {
val <- P.values[RR, CC]
out <- c(out, paste0(" ", RR,
if (is.na(val)) "" else paste0("@", val),
" (", P.labels[RR, CC],
")", if (RR == lastInd) ";" else ""))
}
}
}
## MODEL CONSTRAINTs
if (length(object@constraints) && group == object@ngroups) {
con <- object@constraints
con <- gsub("lambda.", "L", con)
con <- gsub("theta.", "E", con)
con <- gsub("psi.", "P", con)
if (length(object@ordered)) for (th in object@labels[[group]]$tau[ , 1]) {
con <- gsub(th, paste0("T", which(allThrNames == th)), con)
}
if (object@meanstructure) {
con <- gsub("nu.", "N", con)
con <- gsub("alpha.", "A", con)
}
con <- gsub(".g", "_", con)
con <- gsub("==", "=", con)
out <- c(out, "\nMODEL CONSTRAINT:", paste0(con, ";"))
}
#TODO: gsub = for ==, add ";", anything else? object@constraints, "")
paste(out, collapse = "\n")
}
# write.OpenMx.syntax <- function(pmat, specify, value, label) {}
## function to allow users to customize syntax with update(),
## so they don't necessarily have to copy/paste a script to adapt it.
override <- function(object, slotName = "values", group = 1L,
matName, row, col, replacement) {
stopifnot(inherits(object, "measEq.syntax"))
MM <- methods::slot(object, slotName)[[group]] # only "values" or "labels"
## check indices
if (is.character(row)) {
if (! row %in% rownames(MM[[matName]]))
stop("'", row, "' not found in rownames(",
deparse(substitute(object)), "@", slotName, "[[", group, "]]$",
matName, ")")
} else if (is.numeric(row)) {
if (! as.integer(row) %in% 1:nrow(MM[[matName]]))
stop(as.integer(row), "' is outside the number of nrow(",
deparse(substitute(object)), "@", slotName, "[[", group, "]]$",
matName, ")")
} else stop('row argument must be numeric/character indices')
## repeat for col
if (matName %in% c("nu","alpha","delta","tau")) col <- 1L else {
if (is.character(col)) {
if (! col %in% colnames(MM[[matName]]))
stop("'", col, "' not found in colnames(",
deparse(substitute(object)), "@", slotName, "[[", group, "]]$",
matName, ")")
} else if (is.numeric(col)) {
if (! as.integer(col) %in% 1:ncol(MM[[matName]]))
stop(as.integer(col), "' is outside the number of ncol(",
deparse(substitute(object)), "@", slotName, "[[", group, "]]$",
matName, ")")
} else stop('col argument must be numeric/character indices')
}
newM <- MM[[matName]]
newM[row, col] <- replacement
if (matName %in% c("theta","psi")) newM[col, row] <- replacement
newM
}
## function to assemble values/labels to update
char2update <- function(object, model, return.object = TRUE) {
stopifnot(inherits(object, "measEq.syntax"))
stopifnot(inherits(model, "character"))
PT <- lavaan::lavParseModelString(model, as.data.frame. = TRUE)
indNames <- lapply(object@values, function(x) rownames(x$lambda)) # per block
facNames <- lapply(object@values, function(x) colnames(x$lambda))
values <- PT$fixed
labels <- PT$label
## check for multigroup specification of values/labels
if (any(grepl(pattern = ";", x = values))) {
values <- strsplit(values, split = ";")
nValues <- sapply(values, length)
} else nValues <- rep(1L, length(values))
if (any(grepl(pattern = ";", x = labels))) {
labels <- strsplit(labels, split = ";")
nLabels <- sapply(labels, length)
} else nLabels <- rep(1L, length(labels))
nBlocks <- length(facNames)
values.DF <- data.frame(NULL)
labels.DF <- data.frame(NULL)
for (RR in 1:nrow(PT)) {
## check whether numbers match
if (nValues[RR] > 1L && nValues[RR] != nBlocks) {
stop('Number of fixed/free values (', nValues[RR],
') specified for parameter "', PT$lhs[RR], PT$op[RR], PT$rhs[RR],
'" does not match the number of groups (', nBlocks, ')')
}
if (nLabels[RR] > 1L && nLabels[RR] != nBlocks) {
stop('Number of labels (', nLabels[RR],
') specified for parameter "', PT$lhs[RR], PT$op[RR], PT$rhs[RR],
'" does not match the number of groups (', nBlocks, ')')
}
## loop over blocks (currently only groups)
for (BB in 1:nBlocks) {
## make template for values and labels, depending on parameter matrix
## INTERCEPTS
if (PT$op[RR] == "~1" && PT$lhs[RR] %in% indNames[[BB]]) {
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "nu",
row = PT$lhs[RR], col = "intercept")
## LATENT MEANS
} else if (PT$op[RR] == "~1" && PT$lhs[RR] %in% facNames[[BB]]) {
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "alpha",
row = PT$lhs[RR], col = "intercept")
## LOADINGS
} else if (PT$op[RR] == "=~" && PT$rhs[RR] %in% indNames[[BB]]) {
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "lambda",
row = PT$rhs[RR], col = PT$lhs[RR])
## SECOND-ORDER LOADINGS
} else if (PT$op[RR] == "=~" && PT$rhs[RR] %in% facNames[[BB]]) {
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "beta",
row = PT$rhs[RR], col = PT$lhs[RR])
## LATENT (CO)VARIANCES
} else if (PT$op[RR] == "~~" && PT$rhs[RR] %in% facNames[[BB]]) {
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "psi",
# symmetry handled in override
row = PT$rhs[RR], col = PT$lhs[RR])
## RESIDUAL (CO)VARIANCES
} else if (PT$op[RR] == "~~" && PT$rhs[RR] %in% indNames[[BB]]) {
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "theta",
# symmetry handled in override
row = PT$rhs[RR], col = PT$lhs[RR])
## THRESHOLDS
} else if (PT$op[RR] == "|") {
if (!length(object@ordered)) {
warning('Thresholds ignored when no indicators are declared as ordered')
}
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "tau",
row = paste0(PT$lhs[RR], "|", PT$rhs[RR]),
col = "threshold")
## SCALING FACTORS (delta parameters for latent item-responses)
} else if (PT$op[RR] == "~*~") {
if (!length(object@ordered)) {
warning('Thresholds ignored when no indicators are declared as ordered')
}
if (object@parameterization == "theta") {
warning('Latent-response scales (specified with the "~*~" operator) ',
'ignored when parameterization = "theta"')
}
if (PT$lhs[RR] != PT$rhs[RR]) {
warning('Latent-response scales (specified with the "~*~" operator) ',
'ignored when left- and right-hand side do not match (',
PT$lhs[RR], '~*~', PT$rhs[RR], ')')
next
}
DF <- data.frame(stringsAsFactors = FALSE, group = BB, matName = "delta",
row = PT$lhs[RR], col = "scales")
}
#FIXME? anything that does not match is simply ignored (no error messages)
## change labels?
if (BB > 1L && nLabels[RR] == 1L) {
if (labels[[RR]] != "") {
labels.DF <- rbind(labels.DF, cbind(DF, stringsAsFactors = FALSE,
replacement = labels[[RR]]))
}
} else if (labels[[RR]][BB] != "") {
labels.DF <- rbind(labels.DF, cbind(DF, stringsAsFactors = FALSE,
replacement = labels[[RR]][BB]))
}
## change fixed/free values?
if (BB > 1L && nValues[RR] == 1L) {
if (values[[RR]] != "") {
values.DF <- rbind(values.DF, cbind(DF, stringsAsFactors = FALSE,
replacement = values[[RR]]))
}
} else if (values[[RR]][BB] != "") {
values.DF <- rbind(values.DF, cbind(DF, stringsAsFactors = FALSE,
replacement = values[[RR]][BB]))
}
} # end loop over blocks
} # end loop over parameters
## make sure values are stored as numeric, not character
if (nrow(values.DF)) {
suppressWarnings(values.DF$replacement <- as.numeric(values.DF$replacement))
}
if (return.object) {
object@updates$values <- rbind(object@updates$values, values.DF)
object@updates$labels <- rbind(object@updates$labels, labels.DF)
return(object)
}
## else return the list of data.frames with updates to make
list(values = values.DF, labels = labels.DF)
}
semTools/R/loadingFromAlpha.R 0000644 0001762 0000144 00000001325 14006342740 015613 0 ustar ligges users ### Sunthud Pornprasertmanit
### Last updated: 3 April 2017
#' Find standardized factor loading from coefficient alpha
#'
#' Find standardized factor loading from coefficient alpha assuming that all
#' items have equal loadings.
#'
#' @param alpha A desired coefficient alpha value.
#' @param ni A desired number of items.
#' @return \item{result}{The standardized factor loadings that make desired
#' coefficient alpha with specified number of items.}
#' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
#' @examples
#'
#' loadingFromAlpha(0.8, 4)
#'
#' @export
loadingFromAlpha <- function(alpha, ni) {
denominator <- ni - ((ni - 1) * alpha)
result <- sqrt(alpha/denominator)
return(result)
}
semTools/R/quark.R 0000644 0001762 0000144 00000046260 14764270116 013546 0 ustar ligges users ### Steven R. Chesnut, Danny Squire, Terrence D. Jorgensen
### Last updated: 12 March 2025
##' Quark
##'
##' The `quark` function provides researchers with the ability to calculate
##' and include component scores calculated by taking into account the variance
##' in the original dataset and all of the interaction and polynomial effects of
##' the data in the dataset.
##'
##' The `quark` function calculates these component scores by first filling
##' in the data via means of multiple imputation methods and then expanding the
##' dataset by aggregating the non-overlapping interaction effects between
##' variables by calculating the mean of the interactions and polynomial
##' effects. The multiple imputation methods include one of iterative sampling
##' and group mean substitution and multiple imputation using a polytomous
##' regression algorithm (mice). During the expansion process, the dataset is
##' expanded to three times its normal size (in width). The first third of the
##' dataset contains all of the original data post imputation, the second third
##' contains the means of the polynomial effects (squares and cubes), and the
##' final third contains the means of the non-overlapping interaction effects. A
##' full principal componenent analysis is conducted and the individual
##' components are retained. The subsequent [combinequark()] function
##' provides researchers the control in determining how many components to
##' extract and retain. The function returns the dataset as submitted (with
##' missing values) and the component scores as requested for a more accurate
##' multiple imputation in subsequent steps.
##'
##' @param data The data frame is a required component for `quark`. In
##' order for `quark` to process a data frame, it must not contain any
##' factors or text-based variables. All variables must be in numeric format.
##' Identifiers and dates can be left in the data; however, they will need to be
##' identified under the `id` argument.
##' @param id Identifiers and dates within the dataset will need to be
##' acknowledged as `quark` cannot process these. By acknowledging the
##' identifiers and dates as a vector of column numbers or variable names,
##' `quark` will remove them from the data temporarily to complete its main
##' processes. Among many potential issues of not acknowledging identifiers and
##' dates are issues involved with imputation, product and polynomial effects,
##' and principal component analysis.
##' @param order Order is an optional argument provided by quark that can be
##' used when the imputation procedures in mice fail. Under some circumstances,
##' mice cannot calculate missing values due to issues with extreme missingness.
##' Should an error present itself stating a failure due to not having any
##' columns selected, set the argument `order = 2` in order to reorder the
##' imputation method procedure. Otherwise, use the default `order = 1`.
##' @param silent If `FALSE`, the details of the `quark` process are
##' printed.
##' @param \dots additional arguments to pass to [mice::mice()].
##'
##' @return The output value from using the quark function is a list. It will
##' return a list with 7 components.
##' \item{ID Columns}{Is a vector of the identifier columns entered when
##' running quark.}
##' \item{ID Variables}{Is a subset of the dataset that contains the identifiers
##' as acknowledged when running quark.}
##' \item{Used Data}{Is a matrix / dataframe of the data provided by user as
##' the basis for quark to process.}
##' \item{Imputed Data}{Is a matrix / dataframe of the data after the multiple
##' method imputation process.}
##' \item{Big Matrix}{Is the expanded product and polynomial matrix.}
##' \item{Principal Components}{Is the entire dataframe of principal components
##' for the dataset. This dataset will have the same number of rows of the big
##' matrix, but will have 1 less column (as is the case with principal
##' component analyses).}
##' \item{Percent Variance Explained}{Is a vector of the percent variance
##' explained with each column of principal components.}
##'
##' @author Steven R. Chesnut (University of Southern Mississippi;
##' \email{Steven.Chesnut@@usm.edu})
##'
##' Danny Squire (Texas Tech University)
##'
##' Terrence D. Jorgensen (University of Amsterdam)
##'
##' The PCA code is copied and modified from the `FactoMineR` package.
##'
##' @seealso [combinequark()]
##'
##' @references Howard, W. J., Rhemtulla, M., & Little, T. D. (2015). Using
##' Principal Components as Auxiliary Variables in Missing Data Estimation.
##' *Multivariate Behavioral Research, 50*(3), 285--299.
##' \doi{10.1080/00273171.2014.999267}
##'
##' @examples
##'
##' set.seed(123321)
##'
##' dat <- HolzingerSwineford1939[,7:15]
##' misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
##' dat[misspat] <- NA
##' dat <- cbind(HolzingerSwineford1939[,1:3], dat)
##' \donttest{
##' quark.list <- quark(data = dat, id = c(1, 2))
##'
##' final.data <- combinequark(quark = quark.list, percent = 80)
##'
##' ## Example to rerun quark after imputation failure:
##' quark.list <- quark(data = dat, id = c(1, 2), order = 2)
##' }
##'
##' @export
quark <- function(data, id, order = 1, silent = FALSE, ...){
if(!is.data.frame(data) && !is.matrix(data)) {
stop("Inappropriate data file provided.")
}
if(!silent) cat("Data Check Passed.\n")
if(is.character(id)) id <- match(id, colnames(data))
for(i in 1:length(id)){
if(id[i] > ncol(data) || id[i] < 1){
stop("At least one of the IDs is out of bounds.")
}
}
if(!silent) cat("ID Check Passed.\n")
if(!(order %in% 1:2)) stop("Currently, the order argument can take either 1 or 2.")
final.collect <- list()
final.collect$ID_Columns <- id
final.collect$ID_Vars <- data[,id]
final.collect$Used_Data <- data[,-c(id)]
##FIXME 26-June-2018: Terrence had to add a logical check for whether mice
## is installed, otherwise won't pass CRAN checks.
checkMice <- requireNamespace("mice", quietly = TRUE)
if (!checkMice) {
message('The quark function requires the "mice" package to be installed.')
return(invisible(NULL))
}
final.collect$Imputed_Data <- imputequark(data = final.collect$Used_Data,
order = order, silent = silent, ...)
final.collect$Big_Data_Matrix <- bigquark(data = final.collect$Imputed_Data,
silent = silent)
cmp <- compquark(data = final.collect$Big_Data_Matrix, silent = silent)
final.collect$Prin_Components <- cmp[[1]]
final.collect$Prin_Components_Prcnt <- cmp[[2]]
return(final.collect)
}
##' Combine the results from the quark function
##'
##' This function builds upon the [quark()] function to provide a
##' final dataset comprised of the original dataset provided to
##' [quark()] and enough principal components to be able to account
##' for a certain level of variance in the data.
##'
##'
##' @param quark Provide the [quark()] object that was returned. It
##' should be a list of objects. Make sure to include it in its entirety.
##' @param percent Provide a percentage of variance that you would like to have
##' explained. That many components (columns) will be extracted and kept with
##' the output dataset. Enter this variable as a number WITHOUT a percentage
##' sign.
##'
##' @return The output of this function is the original dataset used in quark
##' combined with enough principal component scores to be able to account for
##' the amount of variance that was requested.
##'
##' @author Steven R. Chesnut (University of Southern Mississippi
##' \email{Steven.Chesnut@@usm.edu})
##'
##' @seealso [quark()]
##'
##' @examples
##'
##' set.seed(123321)
##' dat <- HolzingerSwineford1939[,7:15]
##' misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
##' dat[misspat] <- NA
##' dat <- cbind(HolzingerSwineford1939[,1:3], dat)
##'
##' quark.list <- quark(data = dat, id = c(1, 2))
##'
##' final.data <- combinequark(quark = quark.list, percent = 80)
##'
##' @export
combinequark <- function(quark, percent) {
data <- cbind(quark$ID_Vars, quark$Used_Data)
pct <- quark$Prin_Components_Prcnt
comp <- quark$Prin_Components
for (i in 1:length(pct)) {
if(pct[i] >= percent) {
num <- i
break
}
}
return(cbind(data, comp[,1:num]))
}
## ----------------
## Hidden Functions
## ----------------
imputequark <- function(data, order, silent = FALSE, ...){
if (order == 1){
data <- aImp(data = data, silent = silent, ...)
data <- gImp(data = data, silent = silent)
} else if(order == 2) {
data <- gImp(data = data, silent = silent)
if (length(which(is.na(data > 0)))) {
data <- aImp(data = data, silent = silent, ...)
}
}
return(data)
}
#' @importFrom stats cor
gImp <- function(data, silent = FALSE) {
imputed_data <- data
num_adds <- vector(length = ncol(data)) # number of columns combined into one for averaging.
data.cor <- cor(data, use = "pairwise", method = "pearson")
class(data.cor) <- c("lavaan.matrix.symmetric","matrix")
if (!silent) print(data.cor)
#populate multiple matrices that can then be utilized to determine if one column should enhance another based upon
#the correlations they share...
if (!silent) cat("Imputing Column... \n")
for (a in 1:ncol(data)) {
temp_mat <- matrix(ncol = ncol(data), nrow = nrow(data))
list <- unique(sort(data[,a]))
if (length(list) > 1 && length(list) <= 10) {
for (b in 1:nrow(data)) {
for (c in 1:length(list)) {
if (data[b, a] == list[c] && !is.na(data[b,a])) {
temp_mat[b,] <- round(colMeans(subset(data, data[ , a] == list[c]), na.rm = TRUE), digits = 1)
} else if (is.na(data[b,a])) {
for (p in 1:ncol(data)) temp_mat[b,p] <- data[b,p]
}
}
}
# Here I need to determine if the other columns are correlated enough with
# the reference to ensure accuracy of predictions
temp_cor <- data.cor[,a]
# if (countNA(temp_cor)==0) {
for (i in 1:length(temp_cor)) {
if (i != a) {
if (abs(temp_cor[i]) >= .5 && !is.na(temp_cor[i])) { # Using a moderate effect size, column a, will inform other columns.
for (x in 1:nrow(imputed_data)){
imputed_data[x,i] <- sum(imputed_data[x,i], temp_mat[x,a], na.rm = TRUE)
}
num_adds[i] <- num_adds[i] + 1
}
}
}
#}
if (!silent) cat("\t", colnames(data)[a])
}
}
if (!silent) cat("\n")
imputed_data <- cleanMat(m1 = data, m2 = imputed_data, impact = num_adds)
imputed_data <- fixData(imputed_data)
return(imputed_data)
}
cleanMat <- function(m1, m2, impact) {
#Impact is the number of influences on each column...
#We need to clean up and then try to determine what final values should be...
#Go through each of the cells...
new_mat <- m2
for (a in 1:ncol(m1)) {
for (b in 1:nrow(m1)) {
if (!is.na(m1[b,a])) {
new_mat[b,a] <- m1[b,a]
} else if (is.na(m1[b,a])) {
new_mat[b,a] <- new_mat[b,a] / impact[a]
}
}
}
return(new_mat)
}
fixData <- function(data) {
for (a in 1:ncol(data)) {
for (b in 1:nrow(data)) {
data[b,a] <- round(data[b,a], digits = 1)
}
}
return(data)
}
aImp <- function(data, silent = FALSE, ...) {
miceArgs <- list(...)
miceArgs$data <- data
miceArgs$maxit <- 1
miceArgs$m <- 1
miceArgs$printFlag <- !silent
if (!("package:mice" %in% search())) attachNamespace("mice")
if (!silent) cat("Starting Algorithm Imputation...\n")
impData <- mice::complete(do.call("mice", miceArgs))
if (!silent) cat("Ending Algorithm Imputation...\n")
return(impData)
}
bigquark <- function(data, silent = FALSE) {
if (!silent) cat("Calculating Polynomial Effects.\n")
poly <- ((data^2)+(data^3))/2
if (!silent) cat("Creating Matrix for Interaction Effects.\n")
prod <- matrix(ncol=(ncol(data)-1),nrow=nrow(data))
if (!silent) cat("Calculating Interaction Effects...0%..")
for (i in 1:nrow(data)) {
if (!silent) printpct(percent = i/nrow(data))
for (j in 1:(ncol(data)-1)) {
prod[i,j] <- mean(as.numeric(data[i,j])*as.numeric(data[i,(j+1):ncol(data)]))
}
}
cat("\n")
data <- cbind(data,poly,prod)
return(data)
}
compquark <- function(data, silent = FALSE) {
if (!silent) cat("Calculating values for the PCA\n")
pcam <- pcaquark(data, ncp = ncol(data))
cmp <- list()
cmp$pca <- pcam$ind$coord
cmp$var <- pcam$eig[,3]
colnames(cmp$pca) <- c(paste0("AuxVar",1:ncol(cmp$pca)))
return(cmp)
}
printpct <- function(percent) {
if (round(percent, digits = 10) == 0) cat("0%..")
if (round(percent, digits = 10) == .10) cat("10%..")
if (round(percent, digits = 10) == .20) cat("20%..")
if (round(percent, digits = 10) == .30) cat("30%..")
if (round(percent, digits = 10) == .40) cat("40%..")
if (round(percent, digits = 10) == .50) cat("50%..")
if (round(percent, digits = 10) == .60) cat("60%..")
if (round(percent, digits = 10) == .70) cat("70%..")
if (round(percent, digits = 10) == .80) cat("80%..")
if (round(percent, digits = 10) == .90) cat("90%..")
if (round(percent, digits = 10) == 1) cat("100%..")
}
## This function is modified from the FactoMinoR package.
pcaquark <- function(X, ncp = 5) {
moy.p <- function(V, poids) res <- sum(V * poids)/sum(poids)
ec <- function(V, poids) res <- sqrt(sum(V^2 * poids)/sum(poids))
X <- as.data.frame(X)
if (any(is.na(X))) {
warnings("Missing values are imputed by the mean of the variable: you should use the imputePCA function of the missMDA package")
X[is.na(X)] <- matrix(apply(X,2,mean,na.rm=TRUE),ncol=ncol(X),nrow=nrow(X),byrow=TRUE)[is.na(X)]
}
if (is.null(rownames(X))) rownames(X) <- 1:nrow(X)
if (is.null(colnames(X))) colnames(X) <- paste("V", 1:ncol(X), sep = "")
colnames(X)[colnames(X) == ""] <- paste("V", 1:sum(colnames(X)==""),sep="")
rownames(X)[is.null(rownames(X))] <- paste("row",1:sum(rownames(X)==""),sep="")
Xtot <- X
if (any(!sapply(X, is.numeric))) {
auxi <- NULL
for (j in 1:ncol(X)) if (!is.numeric(X[, j])) auxi <- c(auxi, colnames(X)[j])
stop(paste("\nThe following variables are not quantitative: ", auxi))
}
ncp <- min(ncp, nrow(X) - 1, ncol(X))
row.w <- rep(1, nrow(X))
row.w.init <- row.w
row.w <- row.w/sum(row.w)
col.w <- rep(1, ncol(X))
centre <- apply(X, 2, moy.p, row.w)
X <- as.matrix(sweep(as.matrix(X), 2, centre, FUN = "-"))
ecart.type <- apply(X, 2, ec, row.w)
ecart.type[ecart.type <= 1e-16] <- 1
X <- sweep(as.matrix(X), 2, ecart.type, FUN = "/")
dist2.ind <- apply(sweep(X,2,sqrt(col.w),FUN="*")^2,1,sum)
dist2.var <- apply(sweep(X,1,sqrt(row.w),FUN="*")^2,2,sum)
tmp <- svd.triplet.quark(X, row.w = row.w, col.w = col.w, ncp = ncp)
eig <- tmp$vs^2
vp <- as.data.frame(matrix(NA, length(eig), 3))
rownames(vp) <- paste("comp", 1:length(eig))
colnames(vp) <- c("eigenvalue","percentage of variance",
"cumulative percentage of variance")
vp[, "eigenvalue"] <- eig
vp[, "percentage of variance"] <- (eig/sum(eig)) * 100
vp[, "cumulative percentage of variance"] <- cumsum(vp[, "percentage of variance"])
V <- tmp$V
U <- tmp$U
eig <- eig[1:ncp]
coord.ind <- sweep(as.matrix(U), 2, sqrt(eig), FUN = "*")
coord.var <- sweep(as.matrix(V), 2, sqrt(eig), FUN = "*")
contrib.var <- sweep(as.matrix(coord.var^2), 2, eig, "/")
contrib.var <- sweep(as.matrix(contrib.var), 1, col.w, "*")
dist2 <- dist2.var
cor.var <- sweep(as.matrix(coord.var), 1, sqrt(dist2), FUN = "/")
cos2.var <- cor.var^2
rownames(coord.var) <- rownames(cos2.var) <- rownames(cor.var) <- rownames(contrib.var) <- colnames(X)
colnames(coord.var) <- colnames(cos2.var) <- colnames(cor.var) <- colnames(contrib.var) <- paste("Dim", c(1:ncol(V)), sep = ".")
res.var <- list(coord = coord.var[, 1:ncp], cor = cor.var[, 1:ncp],
cos2 = cos2.var[, 1:ncp], contrib = contrib.var[, 1:ncp] * 100)
dist2 <- dist2.ind
cos2.ind <- sweep(as.matrix(coord.ind^2), 1, dist2, FUN = "/")
contrib.ind <- sweep(as.matrix(coord.ind^2), 1, row.w/sum(row.w), FUN = "*")
contrib.ind <- sweep(as.matrix(contrib.ind), 2, eig, FUN = "/")
rownames(coord.ind) <- rownames(cos2.ind) <- rownames(contrib.ind) <- names(dist2) <- rownames(X)
colnames(coord.ind) <- colnames(cos2.ind) <- colnames(contrib.ind) <- paste("Dim", c(1:ncol(U)), sep = ".")
res.ind <- list(coord = coord.ind[, 1:ncp], cos2 = cos2.ind[, 1:ncp],
contrib = contrib.ind[, 1:ncp] * 100, dist = sqrt(dist2))
res <- list(eig = vp, var = res.var, ind = res.ind, svd = tmp)
class(res) <- c("PCA", "list")
return(res)
}
## This function is modified from the FactoMinoR package.
svd.triplet.quark <- function (X, row.w = NULL, col.w = NULL, ncp = Inf) {
tryCatch.W.E <- function(expr) { ## function proposed by Maechlmr
W <- NULL
w.handler <- function(w) { # warning handler
W <<- w
invokeRestart("muffleWarning")
}
list(value = withCallingHandlers(tryCatch(expr, error = function(e) e),
warning = w.handler), warning = W)
}
ncp <- min(ncp,nrow(X)-1,ncol(X))
row.w <- row.w / sum(row.w)
X <- sweep(X, 2, sqrt(col.w), FUN = "*")
X <- sweep(X, 1, sqrt(row.w), FUN = "*")
if (ncol(X) < nrow(X)) {
svd.usuelle <- tryCatch.W.E(svd(X, nu = ncp, nv = ncp))$val
if (names(svd.usuelle)[[1]] == "message") {
svd.usuelle <- tryCatch.W.E(svd(t(X), nu = ncp, nv = ncp))$val
if (names(svd.usuelle)[[1]] == "d") {
aux <- svd.usuelle$u
svd.usuelle$u <- svd.usuelle$v
svd.usuelle$v <- aux
} else {
bb <- eigen(t(X) %*% X, symmetric = TRUE)
svd.usuelle <- vector(mode = "list", length = 3)
svd.usuelle$d[svd.usuelle$d < 0] <- 0
svd.usuelle$d <- sqrt(svd.usuelle$d)
svd.usuelle$v <- bb$vec[,1:ncp]
svd.usuelle$u <- sweep(X %*% svd.usuelle$v, 2, svd.usuelle$d[1:ncp], FUN = "/")
}
}
U <- svd.usuelle$u
V <- svd.usuelle$v
if (ncp > 1) {
mult <- sign(apply(V, 2, sum))
mult[mult == 0] <- 1
U <- sweep(U, 2, mult, FUN = "*")
V <- sweep(V, 2, mult, FUN = "*")
}
U <- sweep(as.matrix(U), 1, sqrt(row.w), FUN = "/")
V <- sweep(as.matrix(V), 1, sqrt(col.w), FUN = "/")
} else {
svd.usuelle <- tryCatch.W.E(svd(t(X), nu = ncp, nv = ncp))$val
if (names(svd.usuelle)[[1]] == "message") {
svd.usuelle <- tryCatch.W.E(svd(X, nu = ncp, nv = ncp))$val
if (names(svd.usuelle)[[1]] == "d") {
aux <- svd.usuelle$u
svd.usuelle$u <- svd.usuelle$v
svd.usuelle$v <- aux
} else {
bb <- eigen(X%*%t(X),symmetric=TRUE)
svd.usuelle <- vector(mode = "list", length = 3)
svd.usuelle$d[svd.usuelle$d < 0] <- 0
svd.usuelle$d <- sqrt(svd.usuelle$d)
svd.usuelle$v <- bb$vec[,1:ncp]
svd.usuelle$u <- sweep(t(X) %*% svd.usuelle$v, 2, svd.usuelle$d[1:ncp], FUN = "/")
}
}
U <- svd.usuelle$v
V <- svd.usuelle$u
mult <- sign(apply(V, 2, sum))
mult[mult == 0] <- 1
V <- sweep(V, 2, mult, FUN = "*")
U <- sweep(U, 2, mult, FUN = "*")
U <- sweep(U, 1, sqrt(row.w), FUN = "/")
V <- sweep(V, 1, sqrt(col.w), FUN = "/")
}
vs <- svd.usuelle$d[1:min(ncol(X), nrow(X) - 1)]
num <- which(vs[1:ncp] < 1e-15)
if (length(num)==1) {
U[,num] <- U[,num] * vs[num]
V[,num] <- V[,num] * vs[num]
}
if (length(num) > 1) {
U[,num] <- sweep(U[,num], 2, vs[num], FUN = "*")
V[,num] <- sweep(V[,num], 2, vs[num], FUN = "*")
}
res <- list(vs = vs, U = U, V = V)
return(res)
}
semTools/R/TSML.R 0000644 0001762 0000144 00000067025 14753125033 013177 0 ustar ligges users ## Terrence D. Jorgensen
### Last updated: 12 February 2025
### semTools function to implement 2-stage ML
## -----------------
## Class and Methods
## -----------------
##' Class for the Results of 2-Stage Maximum Likelihood (TSML) Estimation for
##' Missing Data
##'
##' This class contains the results of 2-Stage Maximum Likelihood (TSML)
##' estimation for missing data. The `summary`, `anova`, `vcov`
##' methods return corrected *SE*s and test statistics. Other methods are
##' simply wrappers around the corresponding [lavaan::lavaan-class]
##' methods.
##'
##'
##' @name twostage-class
##' @aliases twostage-class show,twostage-method summary,twostage-method
##' anova,twostage-method vcov,twostage-method coef,twostage-method
##' fitted.values,twostage-method fitted,twostage-method
##' residuals,twostage-method resid,twostage-method nobs,twostage-method
##' @docType class
##'
##' @slot saturated A fitted [lavaan::lavaan-class] object containing the
##' saturated model results
##' @slot target A fitted [lavaan::lavaan-class] object containing the
##' target/hypothesized model results
##' @slot baseline A fitted [lavaan::lavaan-class] object containing the
##' baseline/null model results
##' @slot auxNames A character string (potentially of `length == 0`) of any
##' auxiliary variable names, if used
##'
##' @param object An object of class `twostage`.
##' @param ... arguments passed to [lavaan::parameterEstimates()].
##' @param h1 An object of class `twostage` in which `object` is
##' nested, so that their difference in fit can be tested using
##' `anova` (see **Value** section for details).
##' @param baseline `logical` indicating whether to return results for the
##' baseline model, rather than the default target (hypothesized) model.
##' @param type The meaning of this argument varies depending on which method it
##' it used for. Find detailed descriptions in the **Value** section
##' under `coef`, `nobs`, and `residuals`.
##' @param model `character` naming the slot for which to return the
##' model-implied sample moments (see `fitted.values` description.)
##' @param labels `logical` indicating whether the model-implied sample
##' moments should have (row/column) labels.
##'
##' @return
##' \item{show}{`signature(object = "twostage"):` The `show` function
##' is used to display the results of the `anova` method, as well as the
##' header of the (uncorrected) target model results.}
##' \item{summary}{`signature(object = "twostage", ...):` The summary
##' function prints the same information from the `show` method, but also
##' provides (and returns) the output of
##' `parameterEstimates(object@target, ...)` with corrected
##' *SE*s, test statistics, and confidence intervals. Additional
##' arguments can be passed to [lavaan::parameterEstimates()],
##' including `fmi = TRUE` to provide an estimate of the fraction of
##' missing information.}
##' \item{anova}{`signature(object = "twostage", h1 = NULL, baseline = FALSE):`
##' The `anova` function returns the residual-based \eqn{\chi^2} test
##' statistic result, as well as the scaled \eqn{\chi^2} test statistic result,
##' for the model in the `target` slot, or for the model in the
##' `baseline` slot if `baseline = TRUE`. The user can also provide
##' a single additional `twostage` object to the `h1` argument, in
##' which case `anova` returns residual-based and scaled
##' (\eqn{\Delta})\eqn{\chi^2} test results, under the assumption that the
##' models are nested. The models will be automatically sorted according their
##' degrees of freedom.}
##' \item{nobs}{`signature(object = "twostage",
##' type = c("ntotal", "ngroups", "n.per.group", "norig", "patterns", "coverage")):`
##' The `nobs` function will return the total sample sized used in the
##' analysis by default. Also available are the number of groups or the sample
##' size per group, the original sample size (if any rows were deleted because
##' all variables were missing), the missing data patterns, and the matrix of
##' coverage (diagonal is the proportion of sample observed on each variable,
##' and off-diagonal is the proportion observed for both of each pair of
##' variables).}
##' \item{coef}{`signature(object = "twostage", type = c("free", "user")):`
##' This is simply a wrapper around the corresponding
##' [lavaan::lavaan-class] method, providing point estimates from the
##' `target` slot.}
##' \item{vcov}{`signature(object = "twostage", baseline = FALSE):` Returns
##' the asymptotic covariance matrix of the estimated parameters (corrected for
##' additional uncertainty due to missing data) for the model in the
##' `target` slot, or for the model in the `baseline` slot if
##' `baseline = TRUE`.}
##' \item{fitted.values, fitted}{`signature(object = "twostage",
##' model = c("target", "saturated", "baseline")):` This is simply a wrapper
##' around the corresponding [lavaan::lavaan-class] method, providing
##' model-implied sample moments from the slot specified in the `model`
##' argument.}
##' \item{residuals, resid}{`signature(object = "twostage", type = c("raw",
##' "cor", "normalized", "standardized")):` This is simply a wrapper around the
##' corresponding [lavaan::lavaan-class] method, providing residuals of
##' the specified `type` from the `target` slot.}
##'
##' @section Objects from the Class: Objects can be created via the
##' [twostage()] function.
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @seealso [twostage()]
##'
##' @examples
##'
##' # See the example from the twostage function
##'
setClass("twostage",
slots = c(saturated = "lavaan", target = "lavaan", baseline = "lavaan",
auxNames = "character"))
##' @rdname twostage-class
##' @aliases show,twostage-method
##' @export
setMethod("show", "twostage", function(object) {
## show chi-squared test results
cat("Chi-squared test(s) results, ADJUSTED for missing data:\n\n")
getMethod("anova", "twostage")(object)
cat("\n\nChi-squared test results, UNADJUSTED for missing data:\n\n")
show(object@target)
invisible(object)
})
##' @rdname twostage-class
##' @aliases summary,twostage-method
##' @importFrom stats pnorm qnorm
##' @importFrom lavaan parTable
##' @export
setMethod("summary", "twostage", function(object, ...) {
## show chi-squared test results AND estimates
getMethod("show", "twostage")(object)
cat("\n\nParameter Estimates, with SEs (and tests/CIs) ADJUSTED for missing data:\n\n")
dots <- list(...)
if (!"fmi" %in% names(dots)) dots$fmi <- FALSE
if (!"ci" %in% names(dots)) dots$ci <- TRUE
if (!"level" %in% names(dots)) dots$level <- .95
PT <- parTable(object@target)
PT <- PT[PT$group > 0, ]
PE <- do.call(lavaan::parameterEstimates, c(dots, object = object@target))
SEs <- sqrt(diag(getMethod("vcov", "twostage")(object)))
PE$se[PT$free > 0] <- SEs[PT$free]
PE$z[PT$free > 0] <- PE$est[PT$free > 0] / PE$se[PT$free > 0]
PE$pvalue[PT$free > 0] <- pnorm(abs(PE$z[PT$free > 0]), lower.tail = FALSE)*2
if (dots$ci) {
crit <- qnorm(1 - (1 - dots$level) / 2)
PE$ci.lower[PT$free > 0] <- PE$est[PT$free > 0] - crit * PE$se[PT$free > 0]
PE$ci.upper[PT$free > 0] <- PE$est[PT$free > 0] + crit * PE$se[PT$free > 0]
}
if (dots$fmi) {
compVar <- diag(lavaan::vcov(object@target))[PT$free] # uncorrected Stage-2 SE estimates
# compFit <- lavaan::update(object@target, sample.nobs = lavaan::nobs(object@target),
# sample.cov = lavInspect(object@target, "cov.ov"),
# sample.mean = lavInspect(object@target, "mean.ov"))
# compVar <- diag(lavaan::vcov(compFit))[PT$free]
missVar <- SEs^2
PE$fmi[PT$free > 0] <- 1 - compVar / missVar
}
PE
})
## (hidden) function utilized by vcov and anova methods
##' @importFrom lavaan lavInspect parTable
twostageMatrices <- function(object, baseline) {
SLOT <- if (baseline) "baseline" else "target"
## extract parameter table to isolate estimates by group
PTsat <- parTable(object@saturated)
nG <- max(PTsat$group)
isMG <- nG > 1L
## model derivatives
delta <- lavInspect(slot(object, SLOT), "delta")
if (!isMG) delta <- list(delta)
for (g in 1:nG) {
covparams <- grep(pattern = "~~", x = rownames(delta[[g]]))
meanparams <- grep(pattern = "~1", x = rownames(delta[[g]]))
delta[[g]] <- delta[[g]][c(covparams, meanparams), ]
}
## stack groups' deltas into 1 matrix
delta <- do.call(rbind, delta)
## extract estimated moments from saturated model, and number of moments
satSigma <- lavInspect(object@saturated, "cov.ov")
satMu <- lavInspect(object@saturated, "mean.ov")
if (!isMG) {
satSigma <- list(satSigma)
satMu <- list(satMu)
}
if (length(object@auxNames)) {
an <- object@auxNames
tn <- lavaan::lavNames(slot(object, SLOT))
for (g in 1:nG) {
satSigma[[g]] <- satSigma[[g]][tn, tn]
satMu[[g]] <- satMu[[g]][tn]
}
}
p <- length(satMu[[1]])
pStar <- p*(p + 1) / 2
## extract model-implied moments
muHat <- lavInspect(slot(object, SLOT), "mean.ov")
sigmaHat <- lavInspect(slot(object, SLOT), "cov.ov")
if (!isMG) {
sigmaHat <- list(sigmaHat)
muHat <- list(muHat)
}
shinv <- list()
for (g in 1:nG) {
muHat[[g]] <- muHat[[g]][names(satMu[[g]])]
sigmaHat[[g]] <- sigmaHat[[g]][rownames(satSigma[[g]]), colnames(satSigma[[g]])]
shinv[[g]] <- solve(sigmaHat[[g]])
}
## assemble complete-data information matrix
H <- list()
for (g in 1:nG) H[[g]] <- matrix(0, (pStar + p), (pStar + p))
# if (lavInspect(slot(object, SLOT), "options")$estimator == "expected") {
if (TRUE) {
## complete-data expected information (Savalei, 2010, Eq. 6; Savalei & Bentler, 2009, Eq. 7)
for (g in 1:nG) {
H[[g]][1:pStar, 1:pStar] <- .5*lavaan::lav_matrix_duplication_pre_post(shinv[[g]] %x% shinv[[g]])
H[[g]][(pStar + 1):(pStar + p), (pStar + 1):(pStar + p)] <- shinv[[g]]
}
} else {
## information == "observed" (Savalei, 2010, Eq. 5)
dMu <- list()
for (g in 1:nG) {
dMu[[g]] <- satMu[[g]] - muHat[[g]]
H[[g]][1:pStar, 1:pStar] <- lavaan::lav_matrix_duplication_pre_post(shinv[[g]] %x% (shinv[[g]] %*% (satSigma[[g]] + dMu[[g]] %*% t(dMu[[g]])) %*% shinv[[g]] - .5*shinv[[g]]))
H[[g]][(pStar + 1):(pStar + p), 1:pStar] <- lavaan::lav_matrix_duplication_post(shinv[[g]] %x% (t(dMu[[g]]) %*% shinv[[g]]))
H[[g]][1:pStar, (pStar + 1):(pStar + p)] <- t(H[[g]][(pStar + 1):(pStar + p), 1:pStar])
H[[g]][(pStar + 1):(pStar + p), (pStar + 1):(pStar + p)] <- shinv[[g]]
}
}
## combine into 1 block-diagonal matrix
H <- do.call(lavaan::lav_matrix_bdiag, H)
## asymptotic information and covariance matrices of target model
satACOV <- lavaan::vcov(object@saturated) #FIXME: inverting this is NOT equivalent to
satInfo <- solve(satACOV * lavaan::nobs(object@saturated)) # weighing by first-order info
## all(round(acov*N, 8) == round(solve(info), 8))
## all(round(acov, 8) == round(solve(info)/N, 8))
if (length(object@auxNames)) {
dimTar <- !(PTsat$lhs %in% an | PTsat$rhs %in% an)
dimAux <- PTsat$lhs %in% an | PTsat$rhs %in% an
infoTar <- satInfo[dimTar, dimTar]
infoAux <- satInfo[dimAux, dimAux]
infoAT <- satInfo[dimAux, dimTar]
satInfo <- infoTar - t(infoAT) %*% solve(infoAux) %*% infoAT
satACOV <- solve(satInfo) / lavaan::nobs(object@saturated)
}
list(delta = delta, H = H, satACOV = satACOV, satInfo = satInfo)
}
## (hidden?) function utilized by anova method to test 1 or 2 models
##' @importFrom stats pchisq
##' @importFrom lavaan lavInspect
twostageLRT <- function(object, baseline, print = FALSE) {
SLOT <- if (baseline) "baseline" else "target"
## calculate model derivatives and complete-data information matrix
MATS <- twostageMatrices(object, baseline)
## residual-based statistic (Savalei & Bentler, 2009, eq. 8)
N <- lavaan::nobs(slot(object, SLOT))
nG <- lavInspect(slot(object, SLOT), "ngroups")
res <- lavaan::residuals(slot(object, SLOT))
if (nG == 1L) res <- list(res)
etilde <- do.call(c, lapply(res, function(x) c(lavaan::lav_matrix_vech(x$cov), x$mean)))
ID <- MATS$satInfo %*% MATS$delta
T.res <- N*t(etilde) %*% (MATS$satInfo - ID %*% MASS::ginv(t(MATS$delta) %*% ID) %*% t(ID)) %*% etilde # FIXME: why not solve()?
DF <- lavInspect(slot(object, SLOT), "fit")[["df"]]
pval.res <- pchisq(T.res, df = DF, lower.tail = FALSE)
residual <- c(chisq = T.res, df = DF, pvalue = pval.res)
class(residual) <- c("lavaan.vector","numeric")
## scaled test statistic (Savalei & Bentler, 2009, eq. 9)
meat <- MATS$H %*% MATS$delta
bread <- MASS::ginv(t(MATS$delta) %*% meat) # FIXME: why not solve()?
cc <- DF / sum(diag(N*MATS$satACOV %*% (MATS$H - meat %*% bread %*% t(meat))))
chisq <- lavaan::lavTest(slot(object, SLOT))$stat
T.scaled <- cc * chisq
pval.scaled <- pchisq(T.scaled, df = DF, lower.tail = FALSE)
scaled <- c(chisq.naive = chisq, scaling.factor = 1 / cc,
chisq.scaled = T.scaled, df = DF, pvalue = pval.scaled)
class(scaled) <- c("lavaan.vector","numeric")
## return both statistics
if (print) {
if (lavInspect(object@saturated, "options")$se == "standard") {
cat("Browne (1984) residual-based test statistic:\n\n")
print(residual)
}
cat("\n\nSatorra-Bentler (2001) scaled test statistic:\n\n")
print(scaled)
}
invisible(list(residual = residual, scaled = scaled))
}
##' @rdname twostage-class
##' @aliases anova,twostage-method
##' @importFrom lavaan lavInspect
##' @export
setMethod("anova", "twostage", function(object, h1 = NULL, baseline = FALSE) {
if (is.null(h1)) {
return(twostageLRT(object, baseline, print = TRUE))
}
H0 <- twostageLRT(object, baseline = FALSE)
H1 <- twostageLRT(h1, baseline = FALSE)
DF0 <- H0$residual[["df"]]
DF1 <- H1$residual[["df"]]
if (DF0 == DF1) stop("Models have the same degrees of freedom.")
if (min(c(DF0, DF1)) == 0L) return(twostageLRT(object, baseline, print = TRUE))
parent <- which.min(c(DF0, DF1))
if (parent == 1L) {
parent <- H0
H0 <- H1
H1 <- parent
DF0 <- H0$residual[["df"]]
DF1 <- H1$residual[["df"]]
}
DF <- DF0 - DF1
## residual-based statistic
T.res <- H0$residual[["chisq"]] - H1$residual[["chisq"]]
residual <- c(chisq = T.res, df = DF,
pvalue = pchisq(T.res, df = DF, lower.tail = FALSE))
class(residual) <- c("lavaan.vector","numeric")
## scaled test statistic
chisq.naive <- H0$scaled[["chisq.naive"]] - H1$scaled[["chisq.naive"]]
cc <- (DF0*H0$scaled[["scaling.factor"]] - DF1*H1$scaled[["scaling.factor"]]) / DF
if (cc < 0) {
warning("Scaling factor is negative, so it was set to missing.")
cc <- NA
}
scaled <- c(chisq.naive = chisq.naive, scaling.factor = cc,
chisq.scaled = chisq.naive / cc, DF = DF,
pvalue = pchisq(chisq.naive / cc, df = DF, lower.tail = FALSE))
class(scaled) <- c("lavaan.vector","numeric")
## return both statistics
if (lavInspect(object@saturated, "options")$se == "standard") {
cat("Difference test for Browne (1984) residual-based statistics:\n\n")
print(residual)
}
cat("\n\nSatorra-Bentler (2001) scaled difference test:\n\n")
print(scaled)
invisible(list(residual = residual, scaled = scaled))
})
##' @rdname twostage-class
##' @aliases nobs,twostage-method
##' @importFrom lavaan lavInspect
##' @export
setMethod("nobs", "twostage",
function(object, type = c("ntotal","ngroups","n.per.group","norig",
"patterns","coverage")) {
type <- type[1]
if (type == "n.per.group") type <- "nobs"
lavInspect(object@saturated, what = type)
})
##' @rdname twostage-class
##' @aliases coef,twostage-method
##' @export
setMethod("coef", "twostage", function(object, type = c("free","user")) {
type <- type[1]
lavaan::coef(object@target, type = type)
})
##' @rdname twostage-class
##' @aliases vcov,twostage-method
##' @export
setMethod("vcov", "twostage", function(object, baseline = FALSE) {
SLOT <- if (baseline) "baseline" else "target"
## calculate model derivatives and complete-data information matrix
MATS <- twostageMatrices(object, baseline)
meat <- MATS$H %*% MATS$delta
bread <- MASS::ginv(t(MATS$delta) %*% meat) # FIXME: why not solve()?
out <- bread %*% t(meat) %*% MATS$satACOV %*% meat %*% bread
class(out) <- c("lavaan.matrix.symmetric","matrix")
if (baseline) {
rownames(out) <- names(getMethod("coef", "lavaan")(object@baseline))
} else {
rownames(out) <- names(getMethod("coef", "twostage")(object))
}
colnames(out) <- rownames(out)
out
})
##' @rdname twostage-class
##' @aliases fitted.values,twostage-method
##' @export
setMethod("fitted.values", "twostage",
function(object, model = c("target","saturated","baseline"),
type = "moments", labels = TRUE) {
model <- model[1]
lavaan::fitted.values(slot(object, model), type = type, labels = labels)
})
##' @rdname twostage-class
##' @aliases fitted,twostage-method
##' @export
setMethod("fitted", "twostage",
function(object, model = c("target","saturated","baseline"),
type = "moments", labels = TRUE) {
model <- model[1]
lavaan::fitted.values(slot(object, model), type = type, labels = labels)
})
##' @rdname twostage-class
##' @aliases residuals,twostage-method
##' @export
setMethod("residuals", "twostage",
function(object, type = c("raw","cor","normalized","standardized")) {
type <- type[1]
lavaan::residuals(object@target, type = type)
})
##' @rdname twostage-class
##' @aliases resid,twostage-method
##' @export
setMethod("resid", "twostage",
function(object, type = c("raw","cor","normalized","standardized")) {
type <- type[1]
lavaan::residuals(object@target, type = type)
})
# fitS <- cfa(model = model, data = dat1, missing = "fiml", se = "standard")
# fitR <- cfa(model = model, data = dat1, missing = "fiml", se = "robust.huber.white")
# all(lavInspect(fitS, "information") == lavInspect(fitR, "information"))
# all(vcov(fitS) == vcov(fitR))
## ---------------------
## Constructor Functions
## ---------------------
##' Fit a lavaan model using 2-Stage Maximum Likelihood (TSML) estimation for
##' missing data.
##'
##' This function automates 2-Stage Maximum Likelihood (TSML) estimation,
##' optionally with auxiliary variables. Step 1 involves fitting a saturated
##' model to the partially observed data set (to variables in the hypothesized
##' model as well as auxiliary variables related to missingness). Step 2
##' involves fitting the hypothesized model to the model-implied means and
##' covariance matrix (also called the "EM" means and covariance matrix) as if
##' they were complete data. Step 3 involves correcting the Step-2 standard
##' errors (*SE*s) and chi-squared statistic to account for additional
##' uncertainty due to missing data (using information from Step 1; see
##' References section for sources with formulas).
##'
##' All variables (including auxiliary variables) are treated as endogenous
##' varaibles in the Step-1 saturated model (`fixed.x = FALSE`), so data
##' are assumed continuous, although not necessarily multivariate normal
##' (dummy-coded auxiliary variables may be included in Step 1, but categorical
##' endogenous variables in the Step-2 hypothesized model are not allowed). To
##' avoid assuming multivariate normality, request `se =
##' "robust.huber.white"`. CAUTION: In addition to setting `fixed.x =
##' FALSE` and `conditional.x = FALSE` in [lavaan::lavaan()],
##' this function will automatically set `meanstructure = TRUE`,
##' `estimator = "ML"`, `missing = "fiml"`, and `test =
##' "standard"`. [lavaan::lavaan()]'s `se` option can only be
##' set to `"standard"` to assume multivariate normality or to
##' `"robust.huber.white"` to relax that assumption.
##'
##'
##' @aliases twostage cfa.2stage sem.2stage growth.2stage lavaan.2stage
##' @importFrom lavaan lavInspect
##'
##' @param \dots Arguments passed to the [lavaan::lavaan()] function
##' specified in the `fun` argument. See also
##' [lavaan::lavOptions()]. At a minimum, the user must supply the
##' first two named arguments to [lavaan::lavaan()] (i.e.,
##' `model` and `data`).
##' @param aux An optional character vector naming auxiliary variable(s) in
##' `data`
##' @param fun The character string naming the lavaan function used to fit the
##' Step-2 hypothesized model (`"cfa"`, `"sem"`, `"growth"`, or
##' `"lavaan"`).
##' @param baseline.model An optional character string, specifying the lavaan
##' [lavaan::model.syntax()] for a user-specified baseline model.
##' Interested users can use the fitted baseline model to calculate incremental
##' fit indices (e.g., CFI and TLI) using the corrected chi-squared values (see
##' the `anova` method in [twostage-class]). If `NULL`,
##' the default "independence model" (i.e., freely estimated means and
##' variances, but all covariances constrained to zero) will be specified
##' internally.
##'
##' @return The [twostage-class] object contains 3 fitted lavaan
##' models (saturated, target/hypothesized, and baseline) as well as the names
##' of auxiliary variables. None of the individual models provide the correct
##' model results (except the point estimates in the target model are unbiased).
##' Use the methods in [twostage-class] to extract corrected
##' *SE*s and test statistics.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso [twostage-class]
##'
##' @references
##' Savalei, V., & Bentler, P. M. (2009). A two-stage approach to missing data:
##' Theory and application to auxiliary variables.
##' *Structural Equation Modeling, 16*(3), 477--497.
##' \doi{10.1080/10705510903008238}
##'
##' Savalei, V., & Falk, C. F. (2014). Robust two-stage approach outperforms
##' robust full information maximum likelihood with incomplete nonnormal data.
##' *Structural Equation Modeling, 21*(2), 280--302.
##' \doi{10.1080/10705511.2014.882692}
##'
##' @examples
##'
##' ## impose missing data for example
##' HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
##' "ageyr","agemo","school")]
##' set.seed(12345)
##' HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
##' age <- HSMiss$ageyr + HSMiss$agemo/12
##' HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
##'
##' ## specify CFA model from lavaan's ?cfa help page
##' HS.model <- '
##' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' '
##'
##' ## use ageyr and agemo as auxiliary variables
##' out <- cfa.2stage(model = HS.model, data = HSMiss, aux = c("ageyr","agemo"))
##'
##' ## two versions of a corrected chi-squared test results are shown
##' out
##' ## see Savalei & Bentler (2009) and Savalei & Falk (2014) for details
##'
##' ## the summary additionally provides the parameter estimates with corrected
##' ## standard errors, test statistics, and confidence intervals, along with
##' ## any other options that can be passed to parameterEstimates()
##' summary(out, standardized = TRUE)
##'
##'
##'
##' ## use parameter labels to fit a more constrained model
##' modc <- '
##' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + a*x8 + a*x9
##' '
##' outc <- cfa.2stage(model = modc, data = HSMiss, aux = c("ageyr","agemo"))
##'
##'
##' ## use the anova() method to test this constraint
##' anova(out, outc)
##' ## like for a single model, two corrected statistics are provided
##'
##' @export
twostage <- function(..., aux, fun, baseline.model = NULL) {
if (all(aux == "")) aux <- NULL
dots <- list(...)
if (is.null(dots$model)) stop("lavaan model syntax argument must be named 'model'.")
lavaanifyArgs <- dots[intersect(names(dots), names(formals(lavaan::lavaanify)))]
args4lavaan <- c(names(formals(lavaan::lavaan)), names(lavaan::lavOptions()))
funArgs <- dots[intersect(names(dots), args4lavaan)]
## set some non-optional lavaan arguments
funArgs$meanstructure <- TRUE
funArgs$conditional.x <- FALSE
funArgs$fixed.x <- FALSE
funArgs$missing <- "fiml"
funArgs$estimator <- "ML"
funArgs$test <- "standard"
if (is.null(funArgs$information)) funArgs$information <- "observed"
if (funArgs$information[1] == "expected") {
message("If data are MAR, only the observed information matrix is consistent.")
if (!is.null(aux)) {
funArgs$information <- "observed"
message(c("Using auxiliary variables implies assuming that data are MAR. ",
"The lavaan argument 'information' was set to 'observed'."))
}
if (!is.null(funArgs$se)) if(funArgs$se != "standard") {
funArgs$information <- "observed"
message(c("The lavaan argument 'information' was set to 'observed' ",
"because adjusting SEs for non-normality requires it."))
}
}
funArgs$NACOV <- NULL
funArgs$do.fit <- NULL
## STAGE 1:
## fit saturated model
if (!is.null(funArgs$group))
lavaanifyArgs$ngroups <- length(table(funArgs$data[ , funArgs$group]))
targetNames <- lavaan::lavNames(do.call(lavaan::lavaanify, lavaanifyArgs))
varnames <- c(targetNames, aux)
covstruc <- outer(varnames, varnames, function(x, y) paste(x, "~~", y))
satArgs <- funArgs
satArgs$constraints <- NULL
satArgs$group.equal <- ""
satArgs$model <- c(covstruc[lower.tri(covstruc, diag = TRUE)],
paste(varnames, "~ 1"))
satFit <- do.call(lavaan::lavaan, satArgs)
## check for robust estimators
opts <- lavInspect(satFit, "options")
if (!opts$se %in% c("standard","robust.huber.white"))
stop(c("Two-Stage estimation requires either se = 'standard' for ",
"multivariate normal data or se = 'robust.huber.white' to ",
"correct for non-normality."))
## STAGE 2:
## fit target model to saturated estimates
targetArgs <- funArgs
targetArgs$data <- NULL
targetArgs$sample.cov <- lavInspect(satFit, "cov.ov")
targetArgs$sample.mean <- lavInspect(satFit, "mean.ov")
targetArgs$sample.nobs <- lavInspect(satFit, "nobs")
targetArgs$se <- "standard"
targetArgs$sample.cov.rescale <- FALSE
targetFit <- do.call(fun, targetArgs)
## STAGE 0:
## fit baseline model (for incremental fit indices)
baseArgs <- targetArgs
if (is.null(baseline.model)) {
basecov <- outer(targetNames, targetNames, function(x, y) paste0(x, " ~~ 0*", y))
diag(basecov) <- paste(targetNames, "~~", targetNames)
baseArgs$model <- c(basecov[lower.tri(basecov, diag = TRUE)],
paste(targetNames, "~ 1"))
} else baseArgs$model <- baseline.model
baseArgs$se <- "standard"
baseFit <- do.call(lavaan::lavaan, baseArgs)
if (length(setdiff(lavaan::lavNames(baseFit), targetNames)))
warning("The baseline model includes variables excluded from the target model.")
if (length(setdiff(targetNames, lavaan::lavNames(baseFit))))
warning("The target model includes variables excluded from the baseline model.")
## return both models
out <- new("twostage", saturated = satFit, target = targetFit,
baseline = baseFit, auxNames = as.character(aux))
out
}
##' @rdname twostage
##' @export
lavaan.2stage <- function(..., aux = NULL, baseline.model = NULL) {
twostage(..., aux = aux, fun = "lavaan", baseline.model = baseline.model)
}
##' @rdname twostage
##' @export
cfa.2stage <- function(..., aux = NULL, baseline.model = NULL) {
twostage(..., aux = aux, fun = "cfa", baseline.model = baseline.model)
}
##' @rdname twostage
##' @export
sem.2stage <- function(..., aux = NULL, baseline.model = NULL) {
twostage(..., aux = aux, fun = "sem", baseline.model = baseline.model)
}
##' @rdname twostage
##' @export
growth.2stage <- function(..., aux = NULL, baseline.model = NULL) {
twostage(..., aux = aux, fun = "growth", baseline.model = baseline.model)
}
semTools/R/ordMoments.R 0000644 0001762 0000144 00000034655 14751370431 014554 0 ustar ligges users ### Terrence D. Jorgensen & Andrew R. Johnson
### Last updated: 20 January 2025
### function to derive ordinal-scale moments implied by LRV-scale moments
##' Calculate Population Moments for Ordinal Data Treated as Numeric
##'
##' This function calculates ordinal-scale moments implied by LRV-scale moments
##'
##' Binary and ordinal data are frequently accommodated in SEM by incorporating
##' a threshold model that links each observed categorical response variable to
##' a corresponding latent response variable that is typically assumed to be
##' normally distributed (Kamata & Bauer, 2008; Wirth & Edwards, 2007).
##' This function can be useful for real-data analysis or for designing
##' Monte Carlo simulations, as described by Jorgensen and Johnson (2022).
##'
##' @importFrom stats dnorm setNames
##' @importFrom lavaan lavInspect
##' @importFrom pbivnorm pbivnorm
##'
##' @param Sigma Population covariance [matrix()], with variable names
##' saved in the [dimnames()] attribute.
##' @param Mu Optional `numeric` vector of population means. If missing,
##' all means will be set to zero.
##' @param thresholds Either a single `numeric` vector of population
##' thresholds used to discretize each normally distributed variable, or a
##' named `list` of each discretized variable's vector of thresholds.
##' The discretized variables may be a subset of all variables in `Sigma`
##' if the remaining variables are intended to be observed rather than latent
##' normally distributed variables.
##' @param cWts Optional (default when missing is to use 0 for the lowest
##' category, followed by successive integers for each higher category).
##' Either a single `numeric` vector of category weights (if they are
##' identical across all variables) or a named `list` of each
##' discretized variable's vector of category weights.
##'
##' @return A `list` including the LRV-scale population moments (means,
##' covariance matrix, correlation matrix, and thresholds), the category
##' weights, a `data.frame` of implied univariate moments (means,
##' *SD*s, skewness, and excess kurtosis (i.e., in excess of 3, which is
##' the kurtosis of the normal distribution) for discretized data treated as
##' `numeric`, and the implied covariance and correlation matrix of
##' discretized data treated as `numeric`.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' Andrew R. Johnson (Curtin University; \email{andrew.johnson@@curtin.edu.au})
##'
##' @references
##'
##' Jorgensen, T. D., & Johnson, A. R. (2022). How to derive expected values of
##' structural equation model parameters when treating discrete data as
##' continuous. *Structural Equation Modeling, 29*(4), 639--650.
##' \doi{10.1080/10705511.2021.1988609}
##'
##' Kamata, A., & Bauer, D. J. (2008). A note on the relation between factor
##' analytic and item response theory models.
##' *Structural Equation Modeling, 15*(1), 136--153.
##' \doi{10.1080/10705510701758406}
##'
##' Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current
##' approaches and future directions. *Psychological Methods, 12*(1),
##' 58--79. \doi{10.1037/1082-989X.12.1.58}
##'
##' @examples
##'
##' ## SCENARIO 1: DIRECTLY SPECIFY POPULATION PARAMETERS
##'
##' ## specify population model in LISREL matrices
##' Nu <- rep(0, 4)
##' Alpha <- c(1, -0.5)
##' Lambda <- matrix(c(1, 1, 0, 0, 0, 0, 1, 1), nrow = 4, ncol = 2,
##' dimnames = list(paste0("y", 1:4), paste0("eta", 1:2)))
##' Psi <- diag(c(1, .75))
##' Theta <- diag(4)
##' Beta <- matrix(c(0, .5, 0, 0), nrow = 2, ncol = 2)
##'
##' ## calculate model-implied population means and covariance matrix
##' ## of latent response variables (LRVs)
##' IB <- solve(diag(2) - Beta) # to save time and space
##' Mu_LRV <- Nu + Lambda %*% IB %*% Alpha
##' Sigma_LRV <- Lambda %*% IB %*% Psi %*% t(IB) %*% t(Lambda) + Theta
##'
##' ## Specify (unstandardized) thresholds to discretize normally distributed data
##' ## generated from Mu_LRV and Sigma_LRV, based on marginal probabilities
##' PiList <- list(y1 = c(.25, .5, .25),
##' y2 = c(.17, .33, .33, .17),
##' y3 = c(.1, .2, .4, .2, .1),
##' ## make final variable highly asymmetric
##' y4 = c(.33, .25, .17, .12, .08, .05))
##' sapply(PiList, sum) # all sum to 100%
##' CumProbs <- sapply(PiList, cumsum)
##' ## unstandardized thresholds
##' TauList <- mapply(qnorm, p = lapply(CumProbs, function(x) x[-length(x)]),
##' m = Mu_LRV, sd = sqrt(diag(Sigma_LRV)))
##' for (i in 1:4) names(TauList[[i]]) <- paste0(names(TauList)[i], "|t",
##' 1:length(TauList[[i]]))
##'
##' ## assign numeric weights to each category (optional, see default)
##' NumCodes <- list(y1 = c(-0.5, 0, 0.5), y2 = 0:3, y3 = 1:5, y4 = 1:6)
##'
##'
##' ## Calculate Population Moments for Numerically Coded Ordinal Variables
##' lrv2ord(Sigma = Sigma_LRV, Mu = Mu_LRV, thresholds = TauList, cWts = NumCodes)
##'
##'
##' ## SCENARIO 2: USE ESTIMATED PARAMETERS AS POPULATION
##'
##' data(datCat) # already stored as c("ordered","factor")
##' fit <- cfa(' f =~ 1*u1 + 1*u2 + 1*u3 + 1*u4 ', data = datCat)
##' lrv2ord(Sigma = fit, thresholds = fit) # use same fit for both
##' ## or use estimated thresholds with specified parameters, but note that
##' ## lrv2ord() will only extract standardized thresholds
##' dimnames(Sigma_LRV) <- list(paste0("u", 1:4), paste0("u", 1:4))
##' lrv2ord(Sigma = cov2cor(Sigma_LRV), thresholds = fit)
##'
##' @export
lrv2ord <- function(Sigma, Mu, thresholds, cWts) {
if (inherits(Sigma, "lavaan")) {
if (lavInspect(Sigma, "ngroups") > 1L || lavInspect(Sigma, "nlevels") > 1L) {
stop('Sigma= only accepts single-group/level lavaan models')
}
fitSigma <- Sigma
Sigma <- lavInspect(fitSigma, "cov.ov")
} else stopifnot(is.matrix(Sigma))
vn <- rownames(Sigma) # variable names
SDs <- sqrt(diag(Sigma))
if (missing(Mu)) {
Mu <- rep(0, nrow(Sigma))
} else if (inherits(Mu, "lavaan")) {
if (lavInspect(Mu, "ngroups") > 1L || lavInspect(Mu, "nlevels") > 1L) {
stop('Mu= only accepts single-group/level lavaan models')
}
fitMu <- Mu
Mu <- lavInspect(fitMu, "mean.ov")
}
names(Mu) <- names(SDs) <- vn
## If a single vector of thresholds is passed, broadcast to a list
if (inherits(thresholds, "lavaan")) {
if (lavInspect(thresholds, "ngroups") > 1L || lavInspect(thresholds, "nlevels") > 1L) {
stop('thresholds= only accepts single-group/level lavaan models')
}
## check whether diag(Sigma) == 1
isSTD <- sapply(SDs, function(x) {
isTRUE(all.equal(x, current = 1, tolerance = .001))
})
if (!all(isSTD)) warning('standardized thresholds= extracted from a ',
'lavaan object, but Sigma= is not a ',
'correlation matrix.')
fitThr <- thresholds
thresholds <- lavInspect(fitThr, "th") # STANDARDIZED thresholds
thresh <- lapply(unique(lavInspect(fitThr, "th.idx")), function(x) {
thresholds[lavInspect(fitThr, "th.idx") == x]
})
names(thresh) <- sapply(thresh, function(x) {
strsplit(names(x)[1], "|t", fixed = TRUE)[[1]][1]
})
} else if (is.atomic(thresholds)) {
thresh <- sapply(vn, function(x) {thresholds}, simplify = FALSE)
} else {
stopifnot(is.list(thresholds)) # must be a list
stopifnot(length(thresholds) <= nrow(Sigma)) # no more than 1 per variable
stopifnot(all(names(thresholds) %in% vn)) # names must match
thresh <- thresholds
}
cn <- names(thresh)
stopifnot(length(cn) > 0L)
## If no category weights are passed, default to 0:nCat
if (missing(cWts)) {
cWts <- sapply(thresh, function(x) { 0:length(x) }, simplify = FALSE)
} else if (is.atomic(cWts)) {
## If a single vector of category weights is passed, broadcast to a list
#FIXME: assumes same number of thresholds across variables
cWts <- sapply(cn, function(x) { cWts }, simplify = FALSE)
} else {
stopifnot(is.list(cWts)) # must be a list
stopifnot(length(cWts) <= nrow(Sigma)) # no more than 1 per variable
stopifnot(all(names(cWts) %in% vn)) # names must match
stopifnot(all(cn %in% names(cWts))) # names must match
cWts <- cWts[cn] # discard any others
}
stopifnot(all((sapply(thresh, length) + 1L) == sapply(cWts, length)))
## Calculate marginal probabilities implied by thresholds on moments
get_marg_probs <- function(threshs, m, sd) {
thr <- c(-Inf, threshs, Inf)
sapply(2:length(thr), function(k) {
pnorm(thr[k], m, sd) - pnorm(thr[k-1], m, sd)
})
}
marginal_probs <- mapply(get_marg_probs, SIMPLIFY = FALSE, threshs = thresh,
m = Mu[cn], sd = SDs[cn])
## Marginal means
Mu_ord <- Mu
Mu_ord[cn] <- mapply(function(p, w) {
stopifnot(length(p) == length(w))
sum(p * w)
}, p = marginal_probs, w = cWts)
## marginal variances (fill in covariances below)
Sigma_ord <- Sigma
if (length(cn) == 1) {
## drop=FALSE is not a solution (yields different error)
Sigma_ord[cn,cn] <- mapply(function(p, w, mu) {
stopifnot(length(p) == length(w))
sum(p * (w - mu)^2)
}, p = marginal_probs, w = cWts, mu = Mu_ord[cn])
} else {
diag(Sigma_ord[cn,cn]) <- mapply(function(p, w, mu) {
stopifnot(length(p) == length(w))
sum(p * (w - mu)^2)
}, p = marginal_probs, w = cWts, mu = Mu_ord[cn])
}
## marginal (standardized) skew
skew_ord <- setNames(rep(0, nrow(Sigma)), nm = vn)
skew_ord[cn] <- mapply(function(p, w, mu) {
stopifnot(length(p) == length(w))
numerator <- sum(p * (w - mu)^3)
Moment2 <- sum(p * (w - mu)^2)
denominator <- sqrt(Moment2)^3
numerator / denominator
}, p = marginal_probs, w = cWts, mu = Mu_ord[cn])
## marginal (standardized, excess) kurtosis
kurt_ord <- setNames(rep(0, nrow(Sigma)), nm = vn)
kurt_ord[cn] <- mapply(function(p, w, mu) {
stopifnot(length(p) == length(w))
numerator <- sum(p * (w - mu)^4)
Moment2 <- sum(p * (w - mu)^2)
denominator <- sqrt(Moment2)^4
numerator / denominator
}, p = marginal_probs, w = cWts, mu = Mu_ord[cn]) - 3 # excess kurtosis
## all marginal descriptives
(margMoments <- data.frame(Mean = Mu_ord, SD = sqrt(diag(Sigma_ord)),
Skew = skew_ord, Kurtosis3 = kurt_ord,
row.names = vn))
class(margMoments) <- c("lavaan.data.frame","data.frame") # for printing
## save old copies to return with new
out <- list(Mu_LRV = Mu, Sigma_LRV = Sigma, R_LRV = stats::cov2cor(Sigma),
Thresholds = thresh, Category_weights = cWts, Uni_ord = margMoments)
class(out$Mu_LRV) <- c("lavaan.vector","numeric")
class(out$Sigma_LRV) <- c("lavaan.matrix.symmetric","matrix")
class(out$R_LRV) <- c("lavaan.matrix.symmetric","matrix")
out$Thresholds <- lapply(out$Thresholds, "class<-",
c("lavaan.vector","numeric"))
out$Category_weights <- lapply(out$Category_weights, "class<-",
c("lavaan.vector","numeric"))
## need bivariate moments?
if (length(vn) == 1L) return(out)
## function to apply to any pair of indicators (i and j) in Sigma
getOrdCov <- function(i, j) {
## to use apply(), i= can be 2 values indicating the [row, column]
if (length(i) > 1L) {
if (!missing(j)) warning("j ignored when i has multiple values")
if (length(i) > 2L) stop("i had ", length(i), " values. Only the first 2 were used.")
j <- i[2]
i <- i[1]
}
## if i/j are numeric, get names
if (is.numeric(i)) i <- vn[i]
if (is.numeric(j)) j <- vn[j]
## make sure thresholds are standardized
# i.thr <-
# j.thr <-
## template for matrices of joint probabilities and cross-products
JointProbs <- CP <- matrix(0, nrow = length(cWts[[i]]),
ncol = length(cWts[[j]]))
i.thr <- c(-1e5, (thresh[[i]] - Mu[i]) / SDs[i], 1e5)
j.thr <- c(-1e5, (thresh[[j]] - Mu[j]) / SDs[j], 1e5)
tCombos <- cbind(expand.grid(i = i.thr, j = j.thr),
expand.grid(cat1 = c(0, seq_along(cWts[[i]])),
cat2 = c(0, seq_along(cWts[[j]]))))
tCombos$cp <- pbivnorm(x = tCombos$i, y = tCombos$j, rho = out$R_LRV[i,j])
## loop over rows & columns
for (RR in seq_along(cWts[[i]])) for (CC in seq_along(cWts[[j]])) {
## calculate joint probabilities
idx1 <- which(tCombos$cat1 == RR & tCombos$cat2 == CC )
idx2 <- which(tCombos$cat1 == RR - 1 & tCombos$cat2 == CC )
idx3 <- which(tCombos$cat1 == RR & tCombos$cat2 == CC - 1)
idx4 <- which(tCombos$cat1 == RR - 1 & tCombos$cat2 == CC - 1)
JointProbs[RR,CC] <- tCombos$cp[idx1] - tCombos$cp[idx2] - tCombos$cp[idx3] + tCombos$cp[idx4]
## calculate cross-products
CP[RR,CC] <- (cWts[[i]][RR] - Mu_ord[i]) * (cWts[[j]][CC] - Mu_ord[j])
}
sum(JointProbs * CP) # return covariance
}
## check whether all variables are being discretized
stayCon <- setdiff(vn, cn)
if (length(stayCon) == 0) {
## all are polychoric
(ij <- which(lower.tri(Sigma_ord), arr.ind = TRUE))
Sigma_ord[ij] <- mapply(getOrdCov, i = ij[,1], j = ij[,2])
Sigma_ord[ ij[,2:1] ] <- Sigma_ord[ij] # copy lower to upper triangle
} else {
## pair by pair, choose polychoric or polyserial
for (i in vn[-length(vn)]) for (j in vn[(which(vn == i)+1):length(vn)]) {
if (i %in% stayCon && j %in% stayCon) next
if (j %in% cn && j %in% cn) {
## both discretized, calculate polychoric
Sigma_ord[i,j] <- Sigma_ord[j,i] <- getOrdCov(i, j)
next
}
## else, calculate polyserial
if (i %in% stayCon) {
CON <- i
CAT <- j
} else {
CAT <- i
CON <- j
}
DENS <- mapply(function(tau, interval, m = 0, sd = 1) {
dnorm(tau, mean = m, sd = sd) * interval
}, tau = thresh[[CAT]], interval = diff(cWts[[CAT]]),
m = Mu[CAT], sd = SDs[CAT])
## Note: polyserial correlation divides by sqrt(diag(Sigma_ord)[CAT]),
## but that cancels out when scaling by both SDs to get covariance
Sigma_ord[CON, CAT] <- Sigma_ord[CAT, CON] <-
out$R_LRV[CON, CAT] * sum(DENS) * sqrt(diag(out$Sigma_LRV)[CON])
}
}
class(Sigma_ord) <- c("lavaan.matrix.symmetric","matrix")
if (nrow(Sigma_ord) > 1L) {
R_ord <- cov2cor(Sigma_ord)
class(R_ord) <- c("lavaan.matrix.symmetric","matrix")
} else R_ord <- NULL
c(out, list(Sigma_ord = Sigma_ord, R_ord = R_ord))
}
semTools/R/gorica.R 0000644 0001762 0000144 00000010466 14753073531 013666 0 ustar ligges users ### Leonard Vanbrabant (Roxygen edits by Terrence D. Jorgensen)
### Last updated: 12 February 2025
##' Wrapper for `goric.lavaan()` from the `restriktor` package
##'
##' The `goricaSEM()` function is an interface to [restriktor::goric.lavaan()],
##' allowing users to perform generalized order-restricted information criterion
##' approximation (GORICA) analysis specifically for structural equation
##' models fitted using the \pkg{lavaan} package.
##'
##' @details
##' This function is designed as a wrapper for the [restriktor::goric.lavaan()]
##' function. It calculates GORICA values and weights, which can be used to
##' compare models or hypotheses under inequality constraints.
##'
##' The `hypotheses=` argument allows users to specify constraints in text-based
##' syntax or matrix notation. For text-based syntax, constraints are specified
##' as a string (e.g., `"a1 > a2"`). For matrix notation, a named list with
##' `$constraints`, `$rhs`, and `$neq` elements can be provided.
##'
##' The `comparison=` argument determines whether the specified hypothesis is
##' compared against its `"complement"`, the `"unconstrained"` model, or
##' neither (`"none"`).
##'
##' @param object A [lavaan::lavaan-class] object.
##' @param hypotheses A named `list` of hypotheses to test. See **Details** for
##' information on how to specify hypotheses.
##' @param comparison A `character` string specifying the type of comparison.
##' Options are `"unconstrained"`, `"complement"`, or `"none"`.
##' Default behavior depends on the number of hypotheses.
##' @param type A `character` string indicating the type of analysis, either
##' `"gorica"` (default) or `"goricac"`.
##' @param standardized `logical` indicating whether standardized estimates are
##' used in the analysis. Defaults to `FALSE`.
##' @param debug `logical` indicating whether to print debugging information.
##' Defaults to \code{FALSE}.
##' @param ... Additional arguments passed to [restriktor::goric.lavaan()].
##'
##' @return
##' A `list` containing the results of the \code{goric.lavaan} function,
##' including:
##' \itemize{
##' \item The log-likelihood.
##' \item Penalty term.
##' \item GORIC(A) values and weights.
##' \item Relative GORIC(A) weights.
##' }
##'
##' @references
##' Kuiper, R. M., Hoijtink, H., & Silvapulle, M. J. (2011). An Akaike-type
##' information criterion for model selection under inequality constraints.
##' \emph{Biometrika, 98}(2), 495--501. \doi{10.1093/biomet/asr002}
##'
##' Vanbrabant, L., Van Loey, N., & Kuiper, R. M. (2020). Evaluating a
##' theory-based hypothesis against its complement using an AIC-type
##' information criterion with an application to facial burn injury.
##' \emph{Psychological Methods, 25}(2), 129--142. \doi{10.1037/met0000238}
##'
##' @seealso [restriktor::goric.lavaan()]
##'
##' @author Leonard Vanbrabant and Rebecca Kuiper
##'
##' @examples
##'
##' ## Example: Perform GORICA analysis on a lavaan model
##' library(lavaan)
##' library(restriktor)
##'
##' ## Define the SEM model
##' model <- '
##' ind60 =~ x1 + x2 + x3
##' dem60 =~ y1 + a1*y2 + b1*y3 + c1*y4
##' dem65 =~ y5 + a2*y6 + b2*y7 + c2*y8
##' dem60 ~ ind60
##' dem65 ~ ind60 + dem60
##' y1 ~~ y5
##' y2 ~~ y4 + y6
##' y3 ~~ y7
##' y4 ~~ y8
##' y6 ~~ y8
##' '
##'
##' ## Fit the model
##' data(PoliticalDemocracy)
##' fit <- sem(model, data = PoliticalDemocracy)
##'
##' ## Define hypotheses
##' myHypothesis <- 'a1 > a2, b1 > b2, c1 > c2'
##'
##' ## Perform GORICA analysis
##' result <- goricaSEM(fit, hypotheses = list(H1 = myHypothesis),
##' standardized = FALSE, comparison = "complement",
##' type = "gorica")
##'
##' ## Print result
##' print(result)
##'
##' @export
goricaSEM <- function(object, ..., hypotheses = NULL,
comparison = NULL,
type = "gorica",
standardized = FALSE,
debug = FALSE) {
## Check if the original function is available
if (!requireNamespace("restriktor", quietly = TRUE)) {
stop("The 'restriktor' package is required but not installed.")
}
## Call the original function
restriktor::goric.lavaan(
object = object,
hypotheses = hypotheses,
comparison = comparison,
type = type,
standardized = standardized,
debug = debug,
...
)
}
semTools/R/zzz.R 0000644 0001762 0000144 00000003404 14634011255 013242 0 ustar ligges users .onAttach <- function(libname, pkgname) {
version <- read.dcf(file = system.file("DESCRIPTION", package = pkgname), fields = "Version")
packageStartupMessage(" ")
packageStartupMessage("###############################################################################")
packageStartupMessage("This is ", paste(pkgname, version))
packageStartupMessage("All users of R (or SEM) are invited to submit functions or ideas for functions.")
packageStartupMessage("###############################################################################")
## if lavaan.mi is already loaded, warn users to use :: or switch order
if ("lavaan.mi" %in% names(utils::sessionInfo()$otherPkgs)) {
packageStartupMessage(" ")
packageStartupMessage("The lavaan.mi package was already attached when semTools was attached.")
packageStartupMessage("To access lavaan.mi functions masked by semTools, use the double-colon:")
packageStartupMessage(" ")
packageStartupMessage("\t lavaan.mi::cfa.mi() or lavaan.mi::lavTestLRT.mi()\n")
packageStartupMessage("It is preferable to first attach semTools, then attach lavaan.mi, but")
packageStartupMessage("you can change the path order by detaching and reloading lavaan.mi,")
packageStartupMessage("using this syntax:\n")
packageStartupMessage('\t detach("package:lavaan.mi", unload = TRUE)')
packageStartupMessage('\t library(lavaan.mi) \n')
packageStartupMessage("Then the deprecated semTools::lavaan.mi functionality will be masked.")
}
}
.onLoad <- function(libname, pkgname) {
## "register" emmeans functionality
emInstalled <- try(loadNamespace("emmeans"), silent = TRUE)
if (!inherits(emInstalled, "try-error")) {
emmeans::.emm_register("lavaan", pkgname)
}
}
semTools/R/tukeySEM.R 0000644 0001762 0000144 00000005055 14632016456 014125 0 ustar ligges users ### Alexander M. Schoemann
### Last updated: 9 March 2018
#' Tukey's WSD post-hoc test of means for unequal variance and sample size
#'
#' This function computes Tukey's WSD post hoc test of means when variances and
#' sample sizes are not equal across groups. It can be used as a post hoc test
#' when comparing latent means in multiple group SEM.
#'
#' After conducting an omnibus test of means across three of more groups,
#' researchers often wish to know which sets of means differ at a particular
#' Type I error rate. Tukey's WSD test holds the error rate stable across
#' multiple comparisons of means. This function implements an adaptation of
#' Tukey's WSD test from Maxwell & Delaney (2004), that allows variances and
#' sample sizes to differ across groups.
#'
#'
#' @importFrom stats ptukey
#'
#' @param m1 Mean of group 1.
#' @param m2 Mean of group 2.
#' @param var1 Variance of group 1.
#' @param var2 Variance of group 2.
#' @param n1 Sample size of group 1.
#' @param n2 Sample size of group 2.
#' @param ng Total number of groups to be compared (i.e., the number of groups
#' compared in the omnibus test).
#' @return A vector with three elements:
#' \enumerate{
#' \item `q`: The *q* statistic
#' \item `df`: The degrees of freedom for the *q* statistic
#' \item `p`: A *p* value based on the *q* statistic, *df*,
#' and the total number of groups to be compared
#' }
#' @author Alexander M. Schoemann (East Carolina University;
#' \email{schoemanna@@ecu.edu})
#' @references Maxwell, S. E., & Delaney, H. D. (2004). *Designing
#' experiments and analyzing data: A model comparison perspective* (2nd ed.).
#' Mahwah, NJ: Lawrence Erlbaum Associates.
#' @examples
#'
#' ## For a case where three groups have been compared:
#' ## Group 1: mean = 3.91, var = 0.46, n = 246
#' ## Group 2: mean = 3.96, var = 0.62, n = 465
#' ## Group 3: mean = 2.94, var = 1.07, n = 64
#'
#' ## compare group 1 and group 2
#' tukeySEM(3.91, 3.96, 0.46, 0.62, 246, 425, 3)
#'
#' ## compare group 1 and group 3
#' tukeySEM(3.91, 2.94, 0.46, 1.07, 246, 64, 3)
#'
#' ## compare group 2 and group 3
#' tukeySEM(3.96, 2.94, 0.62, 1.07, 465, 64, 3)
#'
#' @export
tukeySEM <- function(m1, m2, var1, var2, n1, n2, ng) {
qNum <- abs(m1 - m2)
qDenom <- sqrt(((var1/n1) + (var2/n2))/2)
Tukeyq <- qNum / qDenom
Tukeydf <- ((var1/n1) + (var2/n2))^2 /
(((var1/n1)^2 / (n1 - 1)) + ((var2/n2)^2 / (n2 - 1)))
c(q = Tukeyq, df = Tukeydf, p = 1 - ptukey(Tukeyq, ng, Tukeydf))
}
##Example from Schoemann (2013)
##Bio vs. policial science on evo misconceptions
#tukeySEM(3.91, 3.96,.46, .62, 246, 425,3)
semTools/R/miPowerFit.R 0000644 0001762 0000144 00000154575 15142657341 014521 0 ustar ligges users ### Sunthud Pornprasertmanit; with contributions by Terrence D. Jorgensen
### Last updated: 3 February 2026
#' EPC Equivalence Fit Evaluation Using Modification Indices
#'
#' Evaluates model fit from an equivalence-testing perspective by
#' aggregating local EPC-based diagnostics into a global, fit-style
#' assessment. The procedure combines modification indices (MI),
#' expected parameter changes (EPC), statistical power, and confidence
#' intervals relative to a smallest effect size of interest (SESOI).
#'
#' Two complementary local decision rules are implemented:
#'
#' \strong{Method 1 (Power-based; Saris, Satorra, & van der Veld, 2009).}
#' Modification indices, statistical power, and EPC magnitude are jointly
#' evaluated (the J-rule) to classify fixed parameters as misspecified,
#' not misspecified, or inconclusive.
#'
#' \strong{Method 2 (CI-based equivalence testing).}
#' Confidence intervals of EPCs are compared against a trivial
#' misspecification region defined by the SESOI to determine whether
#' fixed parameters are substantially misspecified, trivially misspecified,
#' underpowered, or inconclusive.
#'
#' The resulting local classifications are returned in a single data
#' frame and can be summarized to yield a global equivalence-style fit
#' evaluation.
#'
#' @param lavaanObj A fitted \code{lavaan} object used to evaluate model fit.
#' @param stdLoad Standardized factor loading defining the SESOI for
#' loading misspecifications. Default is 0.4.
#' @param cor Default standardized correlation defining the SESOI for
#' covariance misspecifications. Used for both latent and residual
#' covariances unless overridden.
#' @param corLatent Standardized latent factor correlation defining the
#' SESOI for latent covariance misspecifications. If \code{NULL},
#' defaults to \code{cor}.
#' @param corResidual Standardized residual correlation defining the
#' SESOI for residual covariance misspecifications. If \code{NULL},
#' defaults to \code{cor}.
#' @param stdBeta Standardized regression coefficient defining the SESOI
#' for structural misspecifications. Default is 0.1.
#' @param stdIntcept Standardized intercept (Cohen's \emph{d}) defining
#' the SESOI for intercept misspecifications. Default is 0.2.
#' @param stdSesoi Optional vector of standardized SESOI values. If
#' provided, overrides operator-specific SESOI definitions.
#' @param sesoi Optional vector of unstandardized SESOI values. If
#' provided, overrides \code{stdSesoi} and all operator-specific SESOI
#' arguments.
#' @param cilevel Confidence level for EPC confidence intervals used in
#' CI-based equivalence testing.
#' @param \dots Additional arguments passed to
#' \code{\link[lavaan]{modificationIndices}}.
#'
#' @details
#' This function provides a local-to-global equivalence-based alternative
#' to traditional exact-fit evaluation. It is designed to assess whether
#' fixed parameters are substantively misspecified relative to a SESOI,
#' rather than whether a model fits exactly.
#'
#' Models with categorical indicators or unsupported constraints may
#' not be fully supported.
#'
#' @return A data frame with one row per fixed parameter, containing:
#' \enumerate{
#' \item Parameter identifiers: \code{lhs}, \code{op}, \code{rhs}, and \code{group}.
#' \item Modification index (\code{mi}) and expected parameter change estimates
#' (\code{epc}).
#' \item Unstandardized and standardized smallest effect size of interest values
#' (\code{sesoi}, \code{std.sesoi}).
#' \item Power-based decision (\code{decision.pow}) and related diagnostics, including
#' whether the modification index is statistically significant
#' (\code{significant.mi}) and whether the misfit at the SESOI has power greater
#' than 0.80 (\code{high.power}). Decision labels are:
#' M = Substantially Misspecified,
#' I = Inconclusive,
#' NM = Trivially Misspecified,
#' EPC:M = Substantially Misspecified based on EPC information,
#' EPC:NM = Trivially Misspecified based on EPC information.
#' \item EPC-related statistics, including the standard error of the EPC
#' (\code{se.epc}), confidence interval bounds for the EPC
#' (\code{lower.epc}, \code{upper.epc}), and confidence interval bounds for the
#' standardized EPC (\code{lower.std.epc}, \code{upper.std.epc}).
#' \item Confidence-interval–based equivalence decision (\code{decision.ci}), with
#' labels:
#' M = Substantially Misspecified (EPC exceeds the SESOI),
#' I = Inconclusive,
#' NM = Trivially Misspecified,
#' U = Underpowered (CI too wide to evaluate equivalence relative to the SESOI).
#' }
#'
#' @references
#' Saris, W. E., Satorra, A., & van der Veld, W. M. (2009).
#' Testing structural equation models or detection of misspecifications?
#' \emph{Structural Equation Modeling, 16}(4), 561--582.
#'
#' @seealso \code{\link{epcEquivCheck}}
#'
#' @importFrom stats qchisq pchisq qnorm
#' @importFrom lavaan modificationIndices lavNames fitMeasures
#'
#' @aliases epcEquivFit miPowerFit
#'
#' @examples
#'
#' library(lavaan)
#'
#' one.model <- ' onefactor =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
#' fit <- cfa(one.model, data = HolzingerSwineford1939)
#' out <- epcEquivFit(fit)
#' out
#' summary(out)
#'
#' @export
epcEquivFit <- function(lavaanObj,
stdLoad = 0.4,
cor = 0.1,
corLatent = NULL,
corResidual = NULL,
stdBeta = 0.1,
stdIntcept = 0.2,
stdSesoi = NULL,
sesoi = NULL,
cilevel = 0.90, ...) {
dots <- list(...)
df_model <- lavaan::fitMeasures(lavaanObj, "df")
if (is.na(df_model) || df_model <= 0) {
stop(
"epcEquivFit() requires a model with positive degrees of freedom (df > 0).\n",
"The supplied lavaan model has df = ", df_model, ".\n",
"EPC-based equivalence testing is not defined for just-identified or ",
"under-identified models.",
call. = FALSE
)
}
# deprecated aliases
if (!is.null(dots$intcept)) {
warning(
"'intcept' is deprecated; please use 'stdIntcept' instead.",
call. = FALSE
)
stdIntcept <- dots$intcept
}
if (!is.null(dots$stdDelta)) {
warning(
"'stdDelta' is deprecated; please use 'stdSesoi' instead.",
call. = FALSE
)
stdSesoi <- dots$stdDelta
}
if (!is.null(dots$delta)) {
warning(
"'delta' is deprecated; please use 'sesoi' instead.",
call. = FALSE
)
sesoi <- dots$delta
}
lv_names <- lavaan::lavNames(lavaanObj, type="lv")
mi <- lavaan::modificationIndices(lavaanObj)
mi <- mi[mi$op != "==",]
sigma <- mi[,"epc"] / sqrt(mi[,"mi"])
sigma[!is.finite(sigma)] <- NA_real_
if (is.null(corLatent)) corLatent <- cor
if (is.null(corResidual)) corResidual <- cor
if (is.null(sesoi)) {
if (is.null(stdSesoi))
stdSesoi <- getTrivialEpc(mi, lv_names=lv_names, stdLoad = stdLoad,
corLatent = corLatent, corResidual = corResidual,
stdBeta = stdBeta, stdIntcept = stdIntcept)
if (length(stdSesoi) == 1) stdSesoi <- rep(stdSesoi, nrow(mi))
sesoi <- unstandardizeEpc(mi, stdSesoi, lavInspectTotalVar(lavaanObj), lavInspectResidualVar(lavaanObj))
}
if (length(sesoi) == 1) sesoi <- rep(sesoi, nrow(mi))
ncp <- (sesoi / sigma)^2
alpha <- 0.05
desiredPow <- 0.80
cutoff <- stats::qchisq(1 - alpha, df = 1)
pow <- 1 - stats::pchisq(cutoff, df = 1, ncp = ncp)
sigMI <- mi[,"mi"] > cutoff
highPow <- pow > desiredPow
group <- rep(1, nrow(mi))
if ("group" %in% colnames(mi)) group <- mi[ , "group"]
decision <- mapply(decisionMIPow, sigMI = sigMI, highPow = highPow,
epc = mi[ , "epc"], trivialEpc = sesoi)
if (is.null(stdSesoi)) stdSesoi <- standardizeEpc(mi, lavInspectTotalVar(lavaanObj),
lavInspectResidualVar(lavaanObj),
sesoi = sesoi)
result <- cbind(mi[ , 1:3], group, as.numeric(mi[ , "mi"]), mi[ , "epc"],
sesoi, mi[ , "sepc.all"],
stdSesoi, sigMI, highPow, decision)
# New method
crit <- abs(stats::qnorm((1 - cilevel)/2))
seepc <- abs(result[,6]) / sqrt(abs(result[,5]))
lowerepc <- result[,6] - crit * seepc
upperepc <- result[,6] + crit * seepc
stdlowerepc <- standardizeEpc(mi, lavInspectTotalVar(lavaanObj),
lavInspectResidualVar(lavaanObj), sesoi = lowerepc)
stdupperepc <- standardizeEpc(mi, lavInspectTotalVar(lavaanObj),
lavInspectResidualVar(lavaanObj), sesoi = upperepc)
isVar <- mi[,"op"] == "~~" & mi[,"lhs"] == mi[,"rhs"]
decisionci <- mapply(decisionCIEpc, targetval = as.numeric(stdSesoi),
lower = stdlowerepc, upper = stdupperepc,
positiveonly = isVar)
result <- cbind(result, pow, seepc, lowerepc, upperepc, stdlowerepc,
stdupperepc, decisionci)
result <- result[!is.na(decision), ]
colnames(result) <- c("lhs","op","rhs","group","mi","epc","sesoi",
"std.epc","std.sesoi","significant.mi",
"high.power","decision.pow","pow","se.epc","lower.epc",
"upper.epc","lower.std.epc","upper.std.epc","decision.ci")
class(result) <- c("epcequivfit.data.frame","lavaan.data.frame","data.frame")
# backward compatibility alias
result$target.epc <- result$sesoi
result$std.target.epc <- result$std.sesoi
return(result)
}
#FIXME: Change to .Defunct after a few version updates
miPowerFit <- function(...) {
.Deprecated("epcEquivFit",
msg = "miPowerFit() has been replaced by epcEquivFit().")
epcEquivFit(...)
}
#' EPC Equivalence Feasibility Check for Standardized Parameters
#'
#' Performs an EPC-based feasibility check to assess whether a set of
#' standardized population parameters defines a valid population
#' covariance matrix and whether trivially misspecified parameters
#' remain within a user-defined smallest effect size of interest (SESOI).
#' Feasibility is evaluated by constructing implied population models
#' under targeted parameter perturbations and examining EPC behavior
#' using \code{\link{epcEquivFit}}.
#'
#' This function focuses on standardized parameters and supports
#' recursive SEMs with continuous indicators only.
#'
#' @param lavaanObj A fitted \code{lavaan} object representing the target model.
#' @param minRelEffect A scalar in (0, 1) specifying the minimum relative
#' magnitude of the standardized perturbation to be evaluated. The
#' default value of 0.75 indicates that perturbations equal to 75\% of
#' the SESOI are treated as trivial. If EPCs exceed the SESOI under
#' such perturbations, EPC equivalence testing is not recommended.
#' @param stdLoad Standardized factor loading used to define the SESOI
#' for loading misspecifications.
#' @param cor Standardized correlation used as a default SESOI for
#' covariance misspecifications. This value is used for both latent
#' and residual covariances unless overridden by
#' \code{corLatent} or \code{corResidual}.
#' @param corLatent Standardized latent factor correlation used to
#' define the SESOI for latent covariance misspecifications. If
#' \code{NULL}, defaults to \code{cor}.
#' @param corResidual Standardized residual correlation used to define
#' the SESOI for indicator residual covariance misspecifications. If
#' \code{NULL}, defaults to \code{cor}.
#' @param stdBeta Standardized regression coefficient used to define
#' the SESOI for structural misspecifications.
#'
#' @details
#' The procedure first checks whether the standardized parameters imply
#' a positive definite population covariance matrix. It then evaluates
#' EPC behavior under both positive and negative trivial
#' misspecifications by repeatedly constructing implied population
#' covariance matrices with perturbed parameters
#' (\code{minRelEffect} \eqn{\times} SESOI), refitting the model, and
#' re-evaluating EPCs.
#'
#' Models with categorical indicators, formative indicators, or
#' multiple-group structures are not supported.
#'
#' @return An object of class \code{"epcEquivCheckStd"} containing:
#' \itemize{
#' \item \code{feasible}: Logical indicator of whether a valid
#' standardized population model exists.
#' \item \code{any_M}: Logical indicator of whether any EPC exceeded
#' the SESOI under the evaluated misspecifications.
#' \item \code{recommendation}: Character string summarizing feasibility
#' (e.g., \code{"RECOMMENDED"}, \code{"NOT RECOMMENDED"}).
#' \item \code{M_table}: Data frame summarizing EPCs exceeding the SESOI,
#' if any.
#' \item \code{testeffect}: Data frame reporting the smallest tested
#' standardized perturbations in each direction.
#' }
#'
#' @importFrom lavaan lavaan
#'
#' @seealso \code{\link{epcEquivFit}}
#'
#' @examples
#'
#' library(lavaan)
#'
#' one.model <- ' onefactor =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
#' fit <- cfa(one.model, data = HolzingerSwineford1939)
#' \donttest{
#' epcEquivCheck(fit)
#' }
#'
#' @export
epcEquivCheck <- function(lavaanObj,
minRelEffect = 0.75,
stdLoad = 0.4,
cor = 0.1,
corLatent = NULL,
corResidual = NULL,
stdBeta = 0.1) {
tol <- 1e-10
.new_epcEquivCheckStd_infeasible <- function(reason) {
out <- list(
feasible = FALSE,
any_M = NA,
recommendation = "NOT APPLICABLE",
reason = reason,
M_table = NULL,
testeffect = NULL
)
class(out) <- "epcEquivCheckStd"
out
}
if (minRelEffect <= 0 || minRelEffect >= 1) {
stop("minRelEffect must be between 0 and 1.")
}
if (is.null(corLatent)) corLatent <- cor
if (is.null(corResidual)) corResidual <- cor
# Candidate misspecifications and EPC-based feasibility outputs
miout <- epcEquivFit(lavaanObj, stdLoad = stdLoad, corLatent = corLatent,
corResidual = corResidual, stdBeta = stdBeta)
# Scope checks: features not supported in this standardized-parameter feasibility check
if (any(miout$op == "|")) {
stop("Models with categorical indicators are not supported.")
}
if (any(miout$op == "~1")) {
stop("Models with a mean structure are not supported.")
}
if (any(miout$op == "<~")) {
stop("Models with formative indicators are not supported.")
}
if (lavaan::lavInspect(lavaanObj, "ngroups") > 1) {
stop("Multiple-group models are not supported yet.")
}
# Extract standardized parameter matrices
std <- lavaan::lavInspect(lavaanObj, "std.all")
lambda <- std$lambda
corpsi <- cov2cor_safe(std$psi)
cortheta <- cov2cor_safe(std$theta)
stdbeta <- std$beta
if (is.null(stdbeta)) stdbeta <- matrix(0, nrow(corpsi), ncol(corpsi),
dimnames = dimnames(corpsi))
# ---- Existence check: do standardized parameters define ANY PD population Sigma? ----
residVarPsi0 <- findFactorResidualVar(
beta = stdbeta,
corPsi = corpsi,
totalVarPsi = rep(1, nrow(stdbeta))
)
if (any(!is.finite(residVarPsi0)) || any(residVarPsi0 < -tol)) {
return(.new_epcEquivCheckStd_infeasible("std_params_not_generative"))
}
residVarPsi0[abs(residVarPsi0) < tol] <- 0
Phi0 <- findFactorTotalCov(
beta = stdbeta,
corPsi = corpsi,
errorVarPsi = residVarPsi0
)
Sigma_y0 <- lambda %*% Phi0 %*% t(lambda)
residVarTheta0 <- findIndResidualVar(
lambda = lambda,
totalFactorCov = Phi0,
totalVarTheta = rep(1, nrow(lambda))
)
if (any(!is.finite(residVarTheta0)) || any(residVarTheta0 < -tol)) {
return(.new_epcEquivCheckStd_infeasible("std_params_not_generative"))
}
residVarTheta0[abs(residVarTheta0) < tol] <- 0
Theta0 <- cor2cov_safe(cortheta, sqrt(residVarTheta0))
Sigma0 <- Sigma_y0 + Theta0
if (!isPD(Sigma0)) {
return(.new_epcEquivCheckStd_infeasible("std_params_not_generative"))
}
# ---- Search feasibility for positive perturbations ----
result <- matrix(NA, nrow(miout), nrow(miout))
sepc_mat <- matrix(NA, nrow(miout), nrow(miout))
testeffect <- rep(NA_real_, nrow(miout))
kseq <- seq(minRelEffect, 0.1, length.out = 10) # decreasing misspecification
for (i in 1:nrow(miout)) {
row <- miout[i,]
found <- FALSE
k_found <- NA_real_
for (k in kseq) {
# Start from base standardized parameter matrices each time
tlambda <- lambda
tcorpsi <- corpsi
tcortheta <- cortheta
tstdbeta <- stdbeta
tempRelEffect <- NA_real_
# Apply a single targeted perturbation (scaled by k)
if (row$op == "=~") {
tlambda[row$rhs, row$lhs] <- stdLoad * k
tempRelEffect <- stdLoad
} else if (row$op == "~~") {
# latent (psi) vs observed residual (theta) determined by membership
if (row$lhs %in% rownames(corpsi)) {
tcorpsi[row$lhs, row$rhs] <- corLatent * k
tcorpsi[row$rhs, row$lhs] <- corLatent * k
tempRelEffect <- corLatent
} else if (row$lhs %in% rownames(lambda)) {
tcortheta[row$lhs, row$rhs] <- corResidual * k
tcortheta[row$rhs, row$lhs] <- corResidual * k
tempRelEffect <- corResidual
}
} else if (row$op == "~") {
tstdbeta[row$lhs, row$rhs] <- stdBeta * k
tempRelEffect <- stdBeta
}
# Derive implied factor residual variances needed for total factor covariance
residVarPsi <- findFactorResidualVar(
beta = tstdbeta,
corPsi = tcorpsi,
totalVarPsi = rep(1, nrow(tstdbeta))
)
if (any(!is.finite(residVarPsi)) || any(residVarPsi < -tol)) next
residVarPsi[abs(residVarPsi) < tol] <- 0
Phi <- tryCatch(
findFactorTotalCov(
beta = tstdbeta,
corPsi = tcorpsi,
errorVarPsi = residVarPsi
),
error = function(e) NULL
)
if (is.null(Phi) || any(!is.finite(Phi))) next
# Implied indicator covariance (excluding residuals)
Sigma_y_noTheta <- tlambda %*% Phi %*% t(tlambda)
# Derive indicator residual variances under standardized total variances (=1)
residVarTheta <- findIndResidualVar(
lambda = tlambda,
totalFactorCov = Phi,
totalVarTheta = rep(1, nrow(tlambda))
)
if (any(!is.finite(residVarTheta)) || any(residVarTheta < -tol)) next
residVarTheta[abs(residVarTheta) < tol] <- 0
Theta <- cor2cov_safe(tcortheta, sqrt(residVarTheta))
Sigma_implied <- Sigma_y_noTheta + Theta
# Only proceed if Sigma is PD and the model is estimable under that population
if (isPD(Sigma_implied)) {
tempout <- tryCatch(
suppressWarnings(lavaan::lavaan(lavaanObj,
sample.cov = Sigma_implied,
sample.nobs = 1000000L,
std.lv = TRUE)),
error = function(e) NULL
)
if (is.null(tempout)) next
if (lavCheckAdmissibleFit(tempout)) {
tempEqTest <- epcEquivFit(tempout,
stdLoad = stdLoad,
corLatent = corLatent,
corResidual = corResidual,
stdBeta = stdBeta)
result[i, ] <- tempEqTest[, "decision.ci"]
sepc_mat[i, ] <- tempEqTest[,"std.epc"]
found <- TRUE
k_found <- k
}
}
if (found) break
}
testeffect[i] <- k_found * tempRelEffect
}
# ---- Search feasibility for negative perturbations ----
result2 <- matrix(NA, nrow(miout), nrow(miout))
sepc_mat2 <- matrix(NA, nrow(miout), nrow(miout))
testeffect2 <- rep(NA_real_, nrow(miout))
kseq2 <- seq(minRelEffect, 0.1, length.out = 10) # decreasing misspecification
for (i in 1:nrow(miout)) {
row <- miout[i,]
found2 <- FALSE
k_found2 <- NA_real_
for (k2 in kseq2) {
tlambda2 <- lambda
tcorpsi2 <- corpsi
tcortheta2 <- cortheta
tstdbeta2 <- stdbeta
tempRelEffect2 <- NA_real_
if (row$op == "=~") {
tlambda2[row$rhs, row$lhs] <- -stdLoad * k2
tempRelEffect2 <- -stdLoad
} else if (row$op == "~~") {
if (row$lhs %in% rownames(corpsi)) {
tcorpsi2[row$lhs, row$rhs] <- -corLatent * k2
tcorpsi2[row$rhs, row$lhs] <- -corLatent * k2
tempRelEffect2 <- -corLatent
} else if (row$lhs %in% rownames(lambda)) {
tcortheta2[row$lhs, row$rhs] <- -corResidual * k2
tcortheta2[row$rhs, row$lhs] <- -corResidual * k2
tempRelEffect2 <- -corResidual
}
} else if (row$op == "~") {
tstdbeta2[row$lhs, row$rhs] <- -stdBeta * k2
tempRelEffect2 <- -stdBeta
}
residVarPsi2 <- findFactorResidualVar(
beta = tstdbeta2,
corPsi = tcorpsi2,
totalVarPsi = rep(1, nrow(tstdbeta2))
)
if (any(!is.finite(residVarPsi2)) || any(residVarPsi2 < -tol)) next
residVarPsi2[abs(residVarPsi2) < tol] <- 0
Phi2 <- tryCatch(
findFactorTotalCov(
beta = tstdbeta2,
corPsi = tcorpsi2,
errorVarPsi = residVarPsi2
),
error = function(e) NULL
)
if (is.null(Phi2) || any(!is.finite(Phi2))) next
Sigma_y_noTheta2 <- tlambda2 %*% Phi2 %*% t(tlambda2)
residVarTheta2 <- findIndResidualVar(
lambda = tlambda2,
totalFactorCov = Phi2,
totalVarTheta = rep(1, nrow(tlambda2))
)
if (any(!is.finite(residVarTheta2)) || any(residVarTheta2 < -tol)) next
residVarTheta2[abs(residVarTheta2) < tol] <- 0
Theta2 <- cor2cov_safe(tcortheta2, sqrt(residVarTheta2))
Sigma_implied2 <- Sigma_y_noTheta2 + Theta2
if (isPD(Sigma_implied2)) {
tempout2 <- tryCatch(
suppressWarnings(lavaan(lavaanObj,
sample.cov = Sigma_implied2,
sample.nobs = 1000000L,
std.lv = TRUE)),
error = function(e) NULL
)
if (is.null(tempout2)) next
if (lavCheckAdmissibleFit(tempout2)) {
tempEqTest2 <- epcEquivFit(tempout2,
stdLoad = stdLoad,
corLatent = corLatent,
corResidual = corResidual,
stdBeta = stdBeta)
result2[i, ] <- tempEqTest2[, "decision.ci"]
sepc_mat2[i, ] <- tempEqTest2[,"std.epc"]
found2 <- TRUE
k_found2 <- k2
}
}
if (found2) break
}
testeffect2[i] <- tempRelEffect2 * k_found2
}
resultall <- cbind(result, result2)
M_pos <- extract_M_table(
result_mat = result,
miout = miout,
sepc_mat = sepc_mat,
direction = "positive"
)
M_neg <- extract_M_table(
result_mat = result2,
miout = miout,
sepc_mat = sepc_mat2,
direction = "negative"
)
M_all <- rbind(M_pos, M_neg)
T_all <- data.frame(
lhs = miout$lhs,
op = miout$op,
rhs = miout$rhs,
effect_positive = testeffect,
effect_negative = testeffect2,
stringsAsFactors = FALSE
)
feasible <- TRUE
any_M <- any(resultall == "M", na.rm = TRUE)
recommendation <- if (!feasible) {
"NOT APPLICABLE"
} else if (any_M) {
"NOT RECOMMENDED"
} else {
"RECOMMENDED"
}
out <- list(
feasible = feasible,
any_M = any_M,
recommendation = recommendation,
M_table = M_all,
testeffect = T_all
)
class(out) <- "epcEquivCheckStd"
return(out)
}
## ----------------
## Hidden Functions
## ----------------
# lavInspectTotalVar()
# ------------------------------------------------------------------
# Internal lavaan helper used by epcEquivFit() and EPC-related utilities.
# Extracts total variances of observed and latent variables from a
# fitted lavaan object using the implied covariance matrix. The
# function supports single-group and multi-group models and returns
# group-specific variance vectors for use in EPC standardization and
# variance-based transformations. No EPC estimation or statistical
# inference is performed.
##' @importFrom lavaan lavInspect
lavInspectTotalVar <- function(lavaanObj) {
result <- list()
nGroups <- lavInspect(lavaanObj, "ngroups")
cov.all <- lavInspect(lavaanObj, "cov.all")
if (nGroups == 1) cov.all <- list(cov.all)
for (i in 1:nGroups) {
temp <- diag(cov.all[[i]])
names(temp) <- rownames(cov.all[[i]])
result[[i]] <- temp
}
return(result)
}
# lavInspectResidualVar()
# ------------------------------------------------------------------
# Internal lavaan helper used by epcEquivFit() and EPC-related utilities.
# Extracts estimated residual variances for indicators (theta) and
# latent variables (psi) from a fitted lavaan object, handling both
# single-group and multi-group models.
##' @importFrom lavaan lavInspect
lavInspectResidualVar <- function(lavaanObj) {
result <- list()
nGroups <- lavInspect(lavaanObj, "ngroups")
est <- lavInspect(lavaanObj, "est")
if (nGroups == 1) est <- list(est)
for (i in 1:nGroups) {
temppsi <- diag(est[[i]]$psi)
names(temppsi) <- rownames(est[[i]]$psi)
templambda <- est[[i]]$lambda
if(ncol(templambda) < nrow(templambda)) {
temptheta <- diag(est[[i]]$theta)
names(temptheta) <- rownames(est[[i]]$theta)
temp <- c(temptheta, temppsi)
} else {
temp <- temppsi
}
result[[i]] <- temp
}
return(result)
}
# getTrivialEpc()
# ------------------------------------------------------------------
# Internal utility used by epcEquivFit() and EPC equivalence diagnostics.
# Assigns operator-specific smallest effect size of interest (SESOI)
# values to each fixed parameter based on its type (e.g., factor
# loadings, structural regressions, latent covariances, residual
# covariances, intercepts). The resulting vector defines the magnitude
# of trivial misspecification used for EPC evaluation.
getTrivialEpc <- function(
mi,
lv_names,
stdLoad = 0.4,
corLatent = 0.1,
corResidual = 0.1,
stdBeta = 0.1,
stdIntcept = 0.2
) {
result <- numeric(nrow(mi))
for (i in seq_len(nrow(mi))) {
op <- mi[i, "op"]
lhs <- mi[i, "lhs"]
rhs <- mi[i, "rhs"]
if (op == "=~") {
result[i] <- stdLoad
} else if (op == "~~") {
if (lhs %in% lv_names && rhs %in% lv_names) {
result[i] <- corLatent
} else {
result[i] <- corResidual
}
} else if (op == "~1") {
result[i] <- stdIntcept
} else if (op == "~") {
result[i] <- stdBeta
} else {
result[i] <- NA_real_
}
}
result
}
# unstandardizeEpc()
# ------------------------------------------------------------------
# Internal utility used by epcEquivFit() and related EPC diagnostics.
# Converts standardized effect-size thresholds (SESOI) back to the
# unstandardized EPC scale using total and residual variances of the
# involved variables. The transformation is operator-specific
# (e.g., loadings, regressions, covariances, intercepts) and provides
# unstandardized quantities required for EPC evaluation.
unstandardizeEpc <- function(mi, sesoi, totalVar, residualVar) {
name <- names(totalVar[[1]])
lhsPos <- match(mi[,"lhs"], name)
rhsPos <- match(mi[,"rhs"], name)
group <- rep(1, nrow(mi))
if("group" %in% colnames(mi)) group <- mi[,"group"]
getVar <- function(pos, group) totalVar[[group]][pos]
getVarRes <- function(pos, group) residualVar[[group]][pos]
lhsVar <- mapply(getVar, pos=lhsPos, group=group)
rhsVar <- mapply(getVar, pos=rhsPos, group=group)
lhsVarRes <- mapply(getVarRes, pos=lhsPos, group=group)
rhsVarRes <- mapply(getVarRes, pos=rhsPos, group=group)
FUN <- function(op, lhsVar, rhsVar, lhsVarRes, rhsVarRes, sesoi) {
if(op == "|") return(NA)
lhsSD <- sqrt(lhsVar)
rhsSD <- sqrt(rhsVar)
lhsSDRes <- sqrt(lhsVarRes)
rhsSDRes <- sqrt(rhsVarRes)
if(!is.numeric(sesoi)) sesoi <- as.numeric(sesoi)
if(op == "=~") {
return((rhsSD * sesoi) / lhsSD)
} else if (op == "~~") {
return(lhsSDRes * sesoi * rhsSDRes)
} else if (op == "~1") {
return(lhsSD * sesoi)
} else if (op == "~") {
return((lhsSD * sesoi) / rhsSD)
} else {
return(NA)
}
}
sesoi <- mapply(FUN, op=mi[,"op"], lhsVar=lhsVar, rhsVar=rhsVar,
lhsVarRes=lhsVarRes, rhsVarRes=rhsVarRes, sesoi=sesoi)
return(sesoi)
}
# standardizeEpc()
# ------------------------------------------------------------------
# Internal utility used by epcEquivFit() and related EPC diagnostics.
# Transforms unstandardized EPCs or SESOI values into standardized
# effect-size metrics based on total and residual variances of the
# involved variables. The standardization is operator-specific
# (e.g., loadings, regressions, covariances, intercepts) and produces
# standardized quantities suitable for comparison against standardized
# SESOI thresholds.
standardizeEpc <- function(mi, totalVar, residualVar, sesoi = NULL) {
if(is.null(sesoi)) sesoi <- mi[,"epc"]
name <- names(totalVar[[1]])
lhsPos <- match(mi[,"lhs"], name)
rhsPos <- match(mi[,"rhs"], name)
group <- rep(1, nrow(mi))
if("group" %in% colnames(mi)) group <- mi[,"group"]
getVar <- function(pos, group) totalVar[[group]][pos]
getVarRes <- function(pos, group) residualVar[[group]][pos]
lhsVar <- mapply(getVar, pos=lhsPos, group=group)
rhsVar <- mapply(getVar, pos=rhsPos, group=group)
lhsVarRes <- mapply(getVarRes, pos=lhsPos, group=group)
rhsVarRes <- mapply(getVarRes, pos=rhsPos, group=group)
FUN <- function(op, lhsVar, rhsVar, lhsVarRes, rhsVarRes, sesoi) {
lhsSD <- sqrt(lhsVar)
rhsSD <- sqrt(rhsVar)
lhsSDRes <- sqrt(lhsVarRes)
rhsSDRes <- sqrt(rhsVarRes)
if(!is.numeric(sesoi)) sesoi <- as.numeric(sesoi)
if(op == "=~") {
#stdload = beta * sdlatent / sdindicator = beta * lhs / rhs
return((sesoi / rhsSD) * lhsSD)
} else if (op == "~~") {
#r = cov / (sd1 * sd2)
return(sesoi / (lhsSDRes * rhsSDRes))
} else if (op == "~1") {
#d = meanDiff/sd
return(sesoi / lhsSD)
} else if (op == "~") {
#beta = b * sdX / sdY = b * rhs / lhs
return((sesoi / lhsSD) * rhsSD)
} else {
return(NA)
}
}
stdSesoi <- mapply(FUN, op=mi[,"op"], lhsVar=lhsVar, rhsVar=rhsVar,
lhsVarRes=lhsVarRes, rhsVarRes=rhsVarRes, sesoi=sesoi)
return(stdSesoi)
}
# decisionMIPow()
# ------------------------------------------------------------------
# Core decision rule for power-based (SSV-style) EPC classification.
# Given the significance of the modification index, an indicator of
# high statistical power, the EPC value, and a smallest effect size
# of interest (SESOI), this function classifies the parameter into
# categories reflecting severity and substantive relevance
# (e.g., M, NM, I, EPC:M, EPC:NM).
decisionMIPow <- function(sigMI, highPow, epc, trivialEpc) {
if(is.na(sigMI) | is.na(highPow)) return(NA)
if(sigMI & highPow) {
if(abs(epc) > abs(trivialEpc)) {
return("EPC:M")
} else {
return("EPC:NM")
}
} else if (sigMI & !highPow) {
return("M")
} else if (!sigMI & highPow) {
return("NM")
} else if (!sigMI & !highPow) {
return("I")
} else {
return(NA)
}
}
# decisionCIEpc()
# ------------------------------------------------------------------
# Core decision rule for CI-based EPC classification.
# Given an EPC confidence interval and a smallest effect size of
# interest (SESOI), this function classifies the parameter as
# Substantial (M), Not Misspecified (NM), Inconclusive (I), or Underpowered (U).
decisionCIEpc <- function(targetval, lower, upper, positiveonly = FALSE) {
if(is.na(lower) | is.na(upper)) return(NA)
if(positiveonly) {
ciwidth <- upper - lower
trivialwidth <- targetval
if (ciwidth > trivialwidth) {
return("U")
} else if (lower > targetval) {
return("M")
} else if (upper < targetval) {
return("NM")
} else {
return("I")
}
} else {
negtargetval <- -targetval
ciwidth <- upper - lower
trivialwidth <- 2*targetval
if(ciwidth > trivialwidth) {
return("U")
} else if(lower > targetval | upper < negtargetval) {
return("M")
} else if (upper < targetval & negtargetval < lower) {
return("NM")
} else {
return("I")
}
}
}
# isPD()
# ------------------------------------------------------------------
# Internal numerical utility used by epcEquivCheck() and related helpers.
# Checks whether a symmetric matrix is positive definite by verifying
# that all eigenvalues exceed a small tolerance. This function is used
# to screen implied population covariance matrices for admissibility
# before refitting models under perturbed parameters.
isPD <- function(M, tol = 1e-8) {
if (is.list(M)) {
return(all(vapply(M, isPD, logical(1), tol = tol)))
}
ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
all(ev > tol)
}
# cov2cor_safe()
# ------------------------------------------------------------------
# Internal numerical utility used by epcEquivCheck() and related helpers.
# Converts a covariance matrix to a correlation matrix while safely
# handling zero or non-positive variances. The function computes
# correlations only for valid sub-blocks, enforces unit diagonals,
# and suppresses numerical artifacts to maintain stability in
# downstream standardized-parameter computations.
cov2cor_safe <- function(S) {
S <- as.matrix(S)
p <- nrow(S)
if (p != ncol(S)) stop("Input must be a square matrix.")
v <- diag(S)
# initialize correlation matrix
R <- matrix(0, p, p)
dimnames(R) <- dimnames(S)
# indices with positive variance
pos <- which(v > 0)
# normal cov2cor where possible
if (length(pos) > 0) {
sd <- sqrt(v[pos])
R[pos, pos] <- S[pos, pos] / (sd %o% sd)
}
# enforce symmetry and unit diagonal
R[!is.finite(R)] <- 0
diag(R) <- 1
R
}
# cor2cov_safe()
# ------------------------------------------------------------------
# Internal numerical utility used by epcEquivCheck() and related helpers.
# Converts a correlation matrix to a covariance matrix given a vector
# of standard deviations, while safely handling zero, negative, or
# non-finite standard deviations. The function operates only on valid
# sub-blocks, enforces symmetry, and removes numerical artifacts to
# ensure stability in subsequent covariance-based computations.
cor2cov_safe <- function(R, sd) {
R <- as.matrix(R)
p <- nrow(R)
if (p != ncol(R)) stop("R must be square.")
if (length(sd) != p) stop("Length of sd must match dimension of R.")
# initialize covariance matrix
S <- matrix(0, p, p)
dimnames(S) <- dimnames(R)
# indices with positive sd
pos <- which(sd > 0 & is.finite(sd))
if (length(pos) > 0) {
D <- diag(sd[pos], length(pos))
S[pos, pos] <- D %*% R[pos, pos, drop = FALSE] %*% D
}
# enforce symmetry and clean numerical noise
S[!is.finite(S)] <- 0
S
}
# extract_M_table()
# ------------------------------------------------------------------
# Internal helper used by epcEquivCheck().
# Extracts and formats combinations of perturbed and resulting fixed
# parameters that yield Substantial (M) EPC decisions. The function maps
# row–column indices from a decision matrix back to the corresponding
# parameter labels in the \code{epcEquivFit} output and returns a tidy summary
# table for reporting purposes only.
extract_M_table <- function(result_mat, miout, sepc_mat, direction) {
idx <- which(result_mat == "M", arr.ind = TRUE)
if (nrow(idx) == 0) return(NULL)
data.frame(
perturbed_lhs = miout$lhs[idx[, "row"]],
perturbed_op = miout$op[idx[, "row"]],
perturbed_rhs = miout$rhs[idx[, "row"]],
resulting_lhs = miout$lhs[idx[, "col"]],
resulting_op = miout$op[idx[, "col"]],
resulting_rhs = miout$rhs[idx[, "col"]],
resulting_sepc = sepc_mat[cbind(idx[, "row"], idx[, "col"])],
direction = direction,
stringsAsFactors = FALSE
)
}
# print.epcEquivCheckStd()
# ------------------------------------------------------------------
# Internal print method for epcEquivCheckStd objects.
# Formats and displays feasibility and recommendation results from
# standardized-parameter EPC equivalence checks.
#' @export
print.epcEquivCheckStd <- function(x, ...) {
cat("EPC Equivalence Feasibility (Standardized Parameters)\n")
cat("----------------------------------------------------\n")
cat("Feasible standardized population:", x$feasible, "\n")
cat("Any EPC exceeding SESOI:", x$any_M, "\n")
cat("Recommendation:", x$recommendation, "\n\n")
if (x$recommendation == "NOT RECOMMENDED") {
cat("Non-equivalent EPCs detected (summary):\n")
print(x$M_table)
} else if (x$recommendation == "RECOMMENDED") {
cat("No EPC exceeded the SESOI under tested misspecifications.\n")
} else {
cat("Standardized parameters do not define a valid population model.\n")
}
invisible(x)
}
# summary.epcequivfit.data.frame()
# ------------------------------------------------------------------
# Internal summary constructor used by epcEquivFit().
# Aggregates and ranks fixed-parameter EPC results from a data-frame
# representation into a structured summaryEpcEquivFit object for
# downstream printing and global decision reporting. This function
# performs data processing only and does not conduct EPC estimation
# or statistical inference.
#' @method summary epcequivfit.data.frame
#' @rdname epcEquivFit
#' @param object An object returned by \code{\link{epcEquivFit}}.
#' @param top Number of top-ranked EPCs to display.
#' @param ssv Logical; whether to include power-based diagnostics.
#' @export
summary.epcequivfit.data.frame <- function(object, ..., top = 5, ssv = FALSE) {
miout <- object
stopifnot(is.data.frame(miout))
num_cols <- c(
"std.epc", "std.sesoi",
"lower.std.epc", "upper.std.epc",
"se.epc", "pow"
)
for (nm in num_cols) {
if (!is.null(miout[[nm]])) {
miout[[nm]] <- as.numeric(as.character(miout[[nm]]))
}
}
miout$std.sesoi[miout$std.sesoi == 0] <- NA
miout$se.epc[miout$se.epc == 0] <- NA
# ---- Derived quantities ----
miout$severity <- abs(miout$std.epc / miout$std.sesoi)
miout$ci_gap <- with(miout, pmin(
abs(lower.std.epc - std.sesoi),
abs(upper.std.epc - std.sesoi),
abs(lower.std.epc + std.sesoi),
abs(upper.std.epc + std.sesoi),
na.rm = TRUE
))
# ---- EPC equivalence testing (CI-based; primary) ----
epc_equiv <- list(
n_M = sum(miout$decision.ci == "M", na.rm = TRUE),
n_I = sum(miout$decision.ci == "I", na.rm = TRUE),
n_U = sum(miout$decision.ci == "U", na.rm = TRUE),
n_NM = sum(miout$decision.ci == "NM", na.rm = TRUE),
any_M = any(miout$decision.ci == "M", na.rm = TRUE)
)
# Substantially Misspecified EPCs
top_M <- miout[miout$decision.ci == "M", ]
top_M <- top_M[is.finite(top_M$severity), ]
top_M <- top_M[order(top_M$severity, decreasing = TRUE), ]
if (nrow(top_M) > top) top_M <- top_M[1:top, ]
# CI-inconclusive EPCs (exclude NM)
top_I_ci <- miout[miout$decision.ci == "I", ]
top_I_ci <- top_I_ci[is.finite(top_I_ci$ci_gap), ]
top_I_ci <- top_I_ci[order(top_I_ci$ci_gap, decreasing = TRUE), ]
if (nrow(top_I_ci) > top) top_I_ci <- top_I_ci[1:top, ]
# ---- SSV / power-based diagnostics (secondary) ----
ssv_out <- list(
n_M = sum(miout$decision.pow %in% c("M", "EPC:M"), na.rm = TRUE),
n_I = sum(miout$decision.pow == "I", na.rm = TRUE),
n_NM = sum(miout$decision.pow %in% c("NM", "EPC:NM"), na.rm = TRUE),
any_M = any(miout$decision.pow %in% c("M", "EPC:M"), na.rm = TRUE)
)
# Substantially Misspecified SSV
top_M_pow <- miout[miout$decision.pow == "M", ]
top_M_pow <- top_M_pow[is.finite(top_M_pow$severity), ]
top_M_pow <- top_M_pow[order(top_M_pow$severity, decreasing = TRUE), ]
if (nrow(top_M_pow) > top) top_M_pow <- top_M_pow[1:top, ]
# Underpowered EPCs (exclude NM)
top_I_pow <- miout[miout$decision.pow == "I", ]
top_I_pow <- top_I_pow[is.finite(top_I_pow$pow), ]
top_I_pow <- top_I_pow[order(top_I_pow$pow), ] # lowest power first
if (nrow(top_I_pow) > top) top_I_pow <- top_I_pow[1:top, ]
out <- list(
epc_equivalence = epc_equiv,
ssv = ssv_out,
top_non_equiv = top_M,
top_inconclusive_ci = top_I_ci,
top_non_pow = top_M_pow,
top_inconclusive_pow = top_I_pow,
show_ssv = ssv
)
class(out) <- "summaryEpcEquivFit"
out
}
# global_localfit_decision()
# ------------------------------------------------------------------
# Internal decision utility used by print.summaryEpcEquivFit().
# Aggregates local fixed-parameter classifications (e.g., Substantial,
# Inconclusive, Underpowered, Not Misspecified) into a single global
# decision using a conservative priority rule.
global_localfit_decision <- function(n_M, n_I, n_U, n_NM) {
if (n_U > 0) {
return("UNDERPOWERED")
}
if (n_M > 0) {
return("SUBSTANTIAL MISSPECIFICATION")
}
if (n_I > 0) {
return("INCONCLUSIVE")
}
if (n_NM > 0) {
return("EQUIVALENT")
}
return("NO PARAMETERS EVALUATED")
}
# print.summaryEpcEquivFit()
# ------------------------------------------------------------------
# Internal print method for summaryEpcEquivFit objects.
# Formats and displays results from global EPC equivalence testing
# and the Saris, Satorra, and van der Veld (2009) power-based method.
# This function summarizes aggregate fixed-parameter diagnostics
# for display purposes only and is not part of the analytical
# implementation of epcEquivFit().
#' @export
print.summaryEpcEquivFit <- function(x, ...) {
cat("Global EPC Evaluation Summary\n")
cat("--------------------------------\n\n")
## ---- EPC equivalence testing (primary) ----
cat("[1. EPC Equivalence Testing: CI-based]\n")
cat("Substantially Misspecified (M):", x$epc_equivalence$n_M, "\n")
cat("Inconclusive (I):", x$epc_equivalence$n_I, "\n")
cat("CI-Underpowered (U):", x$epc_equivalence$n_U, "\n")
cat("Trivial / Not Misspecified (NM):", x$epc_equivalence$n_NM, "\n\n")
global_epc <- global_localfit_decision(
n_M = x$epc_equivalence$n_M,
n_I = x$epc_equivalence$n_I,
n_U = x$epc_equivalence$n_U,
n_NM = x$epc_equivalence$n_NM
)
cat("Global EPC Equivalence Decision:", global_epc, "\n\n")
if (x$epc_equivalence$n_M > 0) {
cat("1.1 Top Substantially Misspecified EPCs (ranked by |std.epc / SESOI|):\n")
print(
x$top_non_equiv[
, c("lhs","op","rhs","std.epc","std.sesoi","severity"),
drop = FALSE
]
)
cat("\n")
}
if (x$epc_equivalence$n_I > 0) {
cat("1.2 Top CI-inconclusive EPCs\n")
cat("(ranked by distance to equivalence bounds; larger = needs narrower CI):\n")
print(
x$top_inconclusive_ci[
, c("lhs","op","rhs","lower.std.epc","upper.std.epc","ci_gap"),
drop = FALSE
]
)
cat("\n")
}
if(isTRUE(x$show_ssv)) {
## ---- SSV / power-based diagnostics (secondary) ----
cat("[2. Saris, Satorra, Van der Veld (2009) / Power-based Diagnostics]\n")
cat("Substantially Misspecified (M, EPC:M):", x$ssv$n_M, "\n")
cat("Inconclusive (I):", x$ssv$n_I, "\n")
cat("Trivial / Not Misspecified (NM, EPC:NM):", x$ssv$n_NM, "\n\n")
global_ssv <- global_localfit_decision(
n_M = x$ssv$n_M,
n_I = x$ssv$n_I,
n_U = 0,
n_NM = x$ssv$n_NM
)
cat("Global SSV / Power-based Decision:", global_ssv, "\n\n")
if (x$ssv$n_M > 0) {
cat("2.1 Top Substantially Misspecified EPCs (ranked by |std.epc / SESOI|):\n")
print(
x$top_non_pow[
, c("lhs","op","rhs","std.epc","std.sesoi","severity"),
drop = FALSE
]
)
cat("\n")
}
if (x$ssv$n_I > 0) {
cat("2.2 Top inconclusive EPCs\n")
cat("(ranked by lowest approximate power):\n")
print(
x$top_inconclusive_pow[
, c("lhs","op","rhs","pow","se.epc"),
drop = FALSE
]
)
}
}
invisible(x)
}
# augmentBetaWithGamma()
# ------------------------------------------------------------------
# Copied from simsem. Internal utility used by epcEquivCheck().
# Augments the factor regression coefficient matrix (beta) with
# regression paths from exogenous observed variables (gamma) so that
# endogenous and exogenous predictors can be handled within a single
# block-recursive system.
# NOTE: This function is prepared for future support of exogenous covariates
augmentBetaWithGamma <- function(beta, gamma) {
nf <- nrow(beta)
nz <- ncol(gamma)
result <- matrix(0, nf + nz, nf + nz)
result[(nz + 1):(nz + nf), (nz + 1):(nz + nf)] <- beta
result[(nz + 1):(nz + nf), 1:nz] <- gamma
result
}
# augmentPsiWithExogenousCov()
# ------------------------------------------------------------------
# Copied from simsem. Internal utility used by epcEquivCheck().
# Augments the factor residual covariance matrix (psi) with the
# covariance matrix of exogenous observed variables (sigmaxx),
# producing a joint covariance structure compatible with the
# augmented regression system.
# NOTE: This function is prepared for future support of exogenous covariates
augmentPsiWithExogenousCov <- function(psi, sigmaxx) {
nf <- nrow(psi)
nz <- ncol(sigmaxx)
result <- matrix(0, nf + nz, nf + nz)
result[(nz + 1):(nz + nf), (nz + 1):(nz + nf)] <- psi
result[1:nz, 1:nz] <- sigmaxx
result
}
# findRecursiveSet()
# ------------------------------------------------------------------
# Copied from simsem. Internal helper for epcEquivCheck().
# Identifies a recursive (block-triangular) ordering of variables
# implied by the regression coefficient matrix (beta). This ordering
# is required to compute analytical residual variances in recursive
# SEMs.
findRecursiveSet <- function(beta) {
if (any(diag(beta) != 0))
stop("Diagonal elements of beta must be zero.")
result <- list()
ni <- nrow(beta)
fix.variable <- rep(FALSE, ni)
ni.sofar <- 0
i <- 1
while (ni.sofar < ni) {
temp <- findRowZero(beta, fix.variable)
if (is.null(temp))
stop("The matrix is not recursive.")
fix.variable[temp] <- TRUE
result[[i]] <- temp
i <- i + 1
ni.sofar <- ni.sofar + length(temp)
}
return(result)
}
# findRowZero()
# ------------------------------------------------------------------
# Internal helper for recursive SEM utilities (copied/adapted from simsem).
# Identifies rows of a square coefficient matrix whose unfixed entries
# are all zero. This is used when constructing a recursive (block-triangular)
# ordering of variables implied by a regression matrix (e.g., beta).
#
# Rows already marked as fixed are ignored, and the function returns the
# indices of rows that contain only zeros among the remaining unfixed columns.
# If no such rows exist, NULL is returned.
#
# This function is used by findRecursiveSet() to iteratively determine
# which variables can be treated as exogenous at each recursion step.
findRowZero <- function(square.matrix, is.row.fixed = FALSE) {
ni <- nrow(square.matrix)
if (length(is.row.fixed) == 1) {
if (is.row.fixed == FALSE)
is.row.fixed <- rep(FALSE, ni)
}
result <- NULL
desired.zero <- sum(!is.row.fixed)
for (i in 1:ni) {
if (is.row.fixed[i] == FALSE) {
temp <- sum(square.matrix[i, !is.row.fixed] == 0, na.rm = TRUE)
if (temp == desired.zero)
result <- c(result, i)
}
}
return(result)
}
# findIndResidualVar()
# ------------------------------------------------------------------
# Copied from simsem. Internal utility used by epcEquivCheck().
# Computes indicator residual variances given factor loadings, total
# factor covariance, and total indicator variances. The resulting
# implied residual variances are used when constructing EPC values
# under trivial misspecification based on a fitted lavaan object.
findIndResidualVar <- function(lambda, totalFactorCov, totalVarTheta = NULL,
kappa = NULL, covcov = NULL) {
ni <- nrow(lambda)
if (is.null(totalVarTheta)) totalVarTheta <- rep(1, ni)
factor.part <- lambda %*% totalFactorCov %*% t(lambda)
error.var <- totalVarTheta - pmax(diag(factor.part), 0)
if (!is.null(kappa)) error.var <- error.var - diag(kappa %*% covcov %*% t(kappa))
error.var[(error.var < 0) & (totalVarTheta == 0)] <- 0
return(as.vector(error.var))
}
# findFactorTotalCov()
# ------------------------------------------------------------------
# Copied from simsem. Internal utility used by epcEquivCheck().
# Computes the total covariance matrix of latent factors implied by
# a recursive regression structure and factor residual covariances.
# The resulting implied factor covariance matrix is used as an
# intermediate quantity for evaluating EPCs under trivial
# misspecification based on a fitted lavaan object.
findFactorTotalCov <- function(beta, psi = NULL, corPsi = NULL,
totalVarPsi = NULL, errorVarPsi = NULL,
gamma = NULL, covcov = NULL) {
if (is.null(psi)) {
if (is.null(errorVarPsi))
errorVarPsi <- findFactorResidualVar(beta, corPsi, totalVarPsi)
psi <- suppressWarnings(cor2cov_safe(as.matrix(corPsi), sqrt(errorVarPsi)))
}
iden <- diag(nrow(beta))
inv <- solve(iden - beta)
facTotalCov <- inv %*% psi %*% t(inv)
if (!is.null(gamma)) {
facTotalCov <- facTotalCov + (inv %*% gamma %*% covcov %*% t(gamma) %*% t(inv))
}
return(facTotalCov)
}
# findFactorResidualVar()
# ------------------------------------------------------------------
# Copied from simsem. Core analytical routine used by epcEquivCheck().
# Computes factor residual variances implied by a recursive SEM given
# total variances, factor correlations, and regression coefficients.
# These analytically derived residual variances provide the basis
# for evaluating EPCs under trivial misspecification based on a
# fitted lavaan object.
findFactorResidualVar <- function(beta, corPsi, totalVarPsi = NULL,
gamma = NULL, covcov = NULL) {
stopifnot(nrow(beta) == ncol(beta))
stopifnot(nrow(corPsi) == ncol(corPsi))
stopifnot(nrow(beta) == nrow(corPsi))
if (!is.null(gamma)) {
if (is.null(totalVarPsi)) totalVarPsi <- rep(1, nrow(beta))
beta <- augmentBetaWithGamma(beta, gamma)
corPsi <- augmentPsiWithExogenousCov(corPsi, cov2cor(covcov))
totalVarPsi <- c(diag(covcov), totalVarPsi)
}
if (sum(diag(corPsi)) == 0) diag(corPsi) <- 1
ni <- nrow(beta)
set <- findRecursiveSet(beta)
errorVar <- rep(1, ni)
if (is.null(totalVarPsi)) totalVarPsi <- rep(1, ni)
errorVar[set[[1]]] <- totalVarPsi[set[[1]]]
iv <- NULL
ivCor <- corPsi[set[[1]], set[[1]]]
startVar <- totalVarPsi[set[[1]]]
ivCov <- suppressWarnings(cor2cov_safe(as.matrix(ivCor), sqrt(startVar)))
for (i in seq_len(length(set) - 1)) {
iv <- c(iv, set[[i]])
dv <- set[[i + 1]]
tempBeta <- matrix(beta[dv, iv], nrow = length(dv), ncol = length(iv))
var.reg <- (tempBeta %*% ivCov %*% t(tempBeta))
tempPsi <- corPsi[dv, dv]
tempPsiSd <- rep(0, length(dv))
for (j in 1:length(dv)) {
errorVar[dv[j]] <- totalVarPsi[dv[j]] - var.reg[j, j]
if (is.na(errorVar[dv[j]]) || errorVar[dv[j]] < 0) {
tempPsiSd[j] <- NA_real_
} else {
tempPsiSd[j] <- sqrt(errorVar[dv[j]])
}
}
if (i < (length(set) - 1)) {
tempPsi <- suppressWarnings(cor2cov_safe(as.matrix(tempPsi), as.matrix(tempPsiSd))) #### FLAG CHANGES
real.tempPsi <- matrix(0, length(iv) + length(dv), length(iv) + length(dv))
real.tempPsi[1:length(iv), 1:length(iv)] <- ivCov
real.tempPsi[(length(iv) + 1):(length(iv) + length(dv)), (length(iv) +
1):(length(iv) + length(dv))] <- tempPsi
agg <- c(iv, dv)
blank.path <- matrix(0, nrow = length(iv), ncol = length(agg))
temp.path2 <- beta[dv, agg]
temp.path2 <- rbind(blank.path, temp.path2)
ID <- matrix(0, length(agg), length(agg))
diag(ID) <- 1
ivCov <- solve(ID - temp.path2) %*% real.tempPsi %*% t(solve(ID - temp.path2))
}
}
if (!is.null(gamma)) {
errorVar <- errorVar[(nrow(covcov) + 1):length(errorVar)]
}
return(as.vector(errorVar))
}
# lavCheckAdmissibleFit()
# ------------------------------------------------------------------
# Internal admissibility and numerical-sanity check for fitted lavaan
# objects. This function verifies that a fitted model is suitable for
# downstream analytical use (e.g., EPC evaluation, population-based
# refitting) by screening for common estimation pathologies.
#
# The checks include:
# - model convergence
# - positive definiteness of observed, implied, residual, and latent
# covariance matrices
# - strictly positive residual and latent variances
# - positive definiteness of the parameter vcov matrix (if available)
# - positive definiteness of the Hessian matrix (if available)
#
# The function is intentionally conservative: failure of any check
# returns FALSE. It is not intended as a model-fit diagnostic, but as a
# numerical admissibility gate for internal algorithms.
lavCheckAdmissibleFit <- function(fit, tol = 1e-8) {
cov.ov <- lavaan::lavInspect(fit, "cov.ov") # sample cov / polychoric
sigma <- lavaan::lavInspect(fit, "sigma.hat") # model-implied cov
theta <- lavaan::lavInspect(fit, "theta") # residual covariances
psi <- lavaan::lavInspect(fit, "cov.lv") # latent covariances
vc <- lavaan::lavInspect(fit, "vcov") # Variance-Covariance Matrix
# hessian may not exist (e.g., se = "none")
hess <- try(lavaan::lavInspect(fit, "hessian"), silent = TRUE)
hess_pd <- if (inherits(hess, "try-error") || is.null(hess)) NA else isPD(hess, tol)
vccheck <- TRUE
if(!is.null(vc)) {
if(all(round(vc,4) == 0)) {
vccheck <- TRUE
} else {
vccheck <- isPD(vc, tol)
}
}
all(
converged = lavaan::lavInspect(fit, "converged"),
cov.ov.pd = isPD(cov.ov, tol),
sigma.pd = isPD(sigma, tol),
theta.pd = isPD(theta, tol),
psi.pd = isPD(psi, tol),
vc = vccheck,
theta.var.pos = diag_pos(theta, tol), # no negative residual variances
psi.var.pos = diag_pos(psi, tol), # no negative latent variances
hessian.pd = hess_pd
)
}
# diag_pos()
# ------------------------------------------------------------------
# Internal numerical utility for model-admissibility checks.
# Tests whether all diagonal elements of a matrix are strictly positive
# up to a numerical tolerance. This is used to screen for negative or
# zero variances in residual (theta) and latent (psi) covariance
# matrices returned by lavaan.
#
# If a list of matrices is supplied (e.g., multigroup models), the
# check is applied recursively to each element.
diag_pos <- function(M, tol = 1e-8) {
if (is.list(M)) {
return(all(vapply(M, diag_pos, logical(1), tol = tol)))
}
all(diag(M) > tol)
}
semTools/R/plausibleValues.R 0000644 0001762 0000144 00000065225 15142336665 015570 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 9 February 2026
### function to draw plausible values of factor scores from lavPredict
# library(blavaan)
# bfit <- bcfa(HS.model, data=HolzingerSwineford1939, save.lvs = TRUE,
# bcontrol=list(method="rjparallel"), group = "school",
# #target = "stan", control=list(cores = 4, seed = 123),
# burnin = 4000, sample = 30, n.chains = 2)
# bFS <- do.call(rbind, blavInspect(bfit, "lvs"))
# do.call()
## -------------
## Main function
## -------------
##' Plausible-Values Imputation of Factor Scores Estimated from a lavaan Model
##'
##' Draw plausible values of factor scores estimated from a fitted
##' [lavaan::lavaan()] model, then treat them as multiple imputations
##' of missing data using [lavaan.mi::lavaan.mi()].
##'
##'
##' Because latent variables are unobserved, they can be considered as missing
##' data, which can be imputed using Monte Carlo methods. This may be of
##' interest to researchers with sample sizes too small to fit their complex
##' structural models. Fitting a factor model as a first step,
##' [lavaan::lavPredict()] provides factor-score estimates, which can
##' be treated as observed values in a path analysis (Step 2). However, the
##' resulting standard errors and test statistics could not be trusted because
##' the Step-2 analysis would not take into account the uncertainty about the
##' estimated factor scores. Using the asymptotic sampling covariance matrix
##' of the factor scores provided by [lavaan::lavPredict()],
##' `plausibleValues` draws a set of `nDraws` imputations from the
##' sampling distribution of each factor score, returning a list of data sets
##' that can be treated like multiple imputations of incomplete data. If the
##' data were already imputed to handle missing data, `plausibleValues`
##' also accepts an object of class [lavaan.mi::lavaan.mi-class], and will
##' draw `nDraws` plausible values from each imputation. Step 2 would
##' then take into account uncertainty about both missing values and factor
##' scores. Bayesian methods can also be used to generate factor scores, as
##' available with the \pkg{blavaan} package, in which case plausible
##' values are simply saved parameters from the posterior distribution. See
##' Asparouhov and Muthen (2010) for further technical details and references.
##'
##' Each returned `data.frame` includes a `case.idx` column that
##' indicates the corresponding rows in the data set to which the model was
##' originally fitted (unless the user requests only Level-2 variables). This
##' can be used to merge the plausible values with the original observed data,
##' but users should note that including any new variables in a Step-2 model
##' might not accurately account for their relationship(s) with factor scores
##' because they were not accounted for in the Step-1 model from which factor
##' scores were estimated.
##'
##' If `object` is a multilevel `lavaan` model, users can request
##' plausible values for latent variables at particular levels of analysis by
##' setting the [lavaan::lavPredict()] argument `level=1` or
##' `level=2`. If the `level` argument is not passed via \dots,
##' then both levels are returned in a single merged data set per draw. For
##' multilevel models, each returned `data.frame` also includes a column
##' indicating to which cluster each row belongs (unless the user requests only
##' Level-2 variables).
##'
##'
##' @importFrom lavaan lavInspect lavPredict
##'
##' @param object A fitted model of class [lavaan::lavaan-class],
##' [blavaan::blavaan-class], or [lavaan.mi::lavaan.mi-class]
##' @param nDraws `integer` specifying the number of draws, analogous to
##' the number of imputed data sets. If `object` is of class
##' [lavaan.mi::lavaan.mi-class], this will be the number of draws taken
##' *per imputation*. If `object` is of class
##' [blavaan::blavaan-class], `nDraws` cannot exceed
##' `blavInspect(object, "niter") * blavInspect(bfitc, "n.chains")`
##' (number of MCMC samples from the posterior). The drawn samples will be
##' evenly spaced (after permutation for `target="stan"`), using
##' [ceiling()] to resolve decimals.
##' @param seed `integer` passed to [set.seed()].
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations when `object` is of class [lavaan.mi::lavaan.mi-class].
##' Can include any of `c("no.conv", "no.se", "no.npd")`.
##' @param ... Optional arguments to pass to [lavaan::lavPredict()].
##' `assemble` will be ignored because multiple groups are always
##' assembled into a single `data.frame` per draw. `type` will be
##' ignored because it is set internally to `type="lv"`.
##'
##' @return A `list` of length `nDraws`, each of which is a
##' `data.frame` containing plausible values, which can be treated as
##' a `list` of imputed data sets to be passed to the `lavaan.mi` package
##' (see **Examples**). If `object` is of class
##' [lavaan.mi::lavaan.mi-class], the `list` will be of length
##' `nDraws*m`, where `m` is the number of imputations.
##'
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Asparouhov, T. & Muthen, B. O. (2010). *Plausible values for latent
##' variables using M*plus. Technical Report. Retrieved from
##' www.statmodel.com/download/Plausible.pdf
##'
##' @seealso [lavaan.mi::lavaan.mi()], [lavaan.mi::lavaan.mi-class]
##'
##' @examples
##'
##' ## example from ?cfa and ?lavPredict help pages
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
##' fs1 <- plausibleValues(fit1, nDraws = 3,
##' ## lavPredict() can add only the modeled data
##' append.data = TRUE)
##' lapply(fs1, head)
##'
##' \donttest{
##' ## To merge factor scores to original data.frame (not just modeled data)
##' fs1 <- plausibleValues(fit1, nDraws = 3)
##' idx <- lavInspect(fit1, "case.idx") # row index for each case
##' if (is.list(idx)) idx <- do.call(c, idx) # for multigroup models
##' data(HolzingerSwineford1939) # copy data to workspace
##' HolzingerSwineford1939$case.idx <- idx # add row index as variable
##' ## loop over draws to merge original data with factor scores
##' for (i in seq_along(fs1)) {
##' fs1[[i]] <- merge(fs1[[i]], HolzingerSwineford1939, by = "case.idx")
##' }
##' lapply(fs1, head)
##'
##'
##' ## multiple-group analysis, in 2 steps
##' step1 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
##' group.equal = c("loadings","intercepts"))
##' PV.list <- plausibleValues(step1)
##'
##' ## subsequent path analysis
##' path.model <- ' visual ~ c(t1, t2)*textual + c(s1, s2)*speed '
##' if(requireNamespace("lavaan.mi")){
##' library(lavaan.mi)
##' step2 <- sem.mi(path.model, data = PV.list, group = "school")
##' ## test equivalence of both slopes across groups
##' lavTestWald.mi(step2, constraints = 't1 == t2 ; s1 == s2')
##' }
##'
##'
##' ## multilevel example from ?Demo.twolevel help page
##' model <- '
##' level: 1
##' fw =~ y1 + y2 + y3
##' fw ~ x1 + x2 + x3
##' level: 2
##' fb =~ y1 + y2 + y3
##' fb ~ w1 + w2
##' '
##' msem <- sem(model, data = Demo.twolevel, cluster = "cluster")
##' mlPVs <- plausibleValues(msem, nDraws = 3) # both levels by default
##' lapply(mlPVs, head, n = 10)
##' ## only Level 1
##' mlPV1 <- plausibleValues(msem, nDraws = 3, level = 1)
##' lapply(mlPV1, head)
##' ## only Level 2
##' mlPV2 <- plausibleValues(msem, nDraws = 3, level = 2)
##' lapply(mlPV2, head)
##'
##'
##'
##' ## example with 20 multiple imputations of missing data:
##' nPVs <- 5
##' nImps <- 20
##'
##' if (requireNamespace("lavaan.mi")) {
##' data(HS20imps, package = "lavaan.mi")
##'
##' ## specify CFA model from lavaan's ?cfa help page
##' HS.model <- '
##' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' '
##' out2 <- cfa.mi(HS.model, data = HS20imps)
##' PVs <- plausibleValues(out2, nDraws = nPVs)
##'
##' idx <- out2@@Data@@case.idx # can't use lavInspect() on lavaan.mi
##' ## empty list to hold expanded imputations
##' impPVs <- list()
##' for (m in 1:nImps) {
##' HS20imps[[m]]["case.idx"] <- idx
##' for (i in 1:nPVs) {
##' impPVs[[ nPVs*(m - 1) + i ]] <- merge(HS20imps[[m]],
##' PVs[[ nPVs*(m - 1) + i ]],
##' by = "case.idx")
##' }
##' }
##' lapply(impPVs, head)
##' }
##'
##' }
##'
##' @export
plausibleValues <- function(object, nDraws = 20L, seed = 12345,
omit.imps = c("no.conv","no.se"), ...) {
if (inherits(object, "lavaan")) {
## generate vector of seeds
set.seed(seed)
seeds <- sample(100000:9999999, size = nDraws, replace = FALSE)
PV <- lapply(seeds, plaus.lavaan, object = object, ...)
} else if (inherits(object, "lavaan.mi")) {
## generate vector of seeds
set.seed(seed)
seeds <- sample(100000:9999999, size = nDraws, replace = FALSE)
PV <- plaus.mi(object, seeds = seeds, omit.imps = omit.imps, ...)
} else if (inherits(object, "blavaan")) {
PV <- plaus.blavaan(object, nDraws = nDraws, seed = seed, ...)
#TODO: pass nDraws to sample() iterations?
} else stop("object's class not valid: ", class(object))
PV
}
## ----------------
## Hidden functions
## ----------------
## draw 1 set of plausible values from a lavaan object
##' @importFrom lavaan lavInspect lavPredict lavNames
plaus.lavaan <- function(seed = 1, object, ...) {
stopifnot(inherits(object, "lavaan"))
if (lavInspect(object, "categorical")) {
stop("Plausible values not available (yet) for categorical data")
}
if (lavInspect(object, "options")$missing %in% c("ml", "ml.x")) {
#TODO: verify this:
stop("Plausible values not available (yet) for missing data + fiml.\n",
" Multiple imputations can be used via lavaan.mi()")
}
#FIXME? https://github.com/yrosseel/lavaan/issues/156
set.seed(seed)
cluster <- lavInspect(object, "cluster")
group <- lavInspect(object, "group")
group.label <- lavInspect(object, "group.label")
nG <- lavInspect(object, "ngroups")
nL <- lavInspect(object, "nlevels")
l.names <- o.names <- list()
if (nG == 1L && nL == 1L) {
## single block
l.names <- list(lavNames(object, "lv"))
o.names <- list(lavNames(object, "ov"))
} else if (nG == 1L && nL > 1L) {
## multilevel
for (BB in 1:nL) {
l.names <- c(l.names, list(lavNames(object, "lv", block = BB)))
o.names <- c(o.names, list(lavNames(object, "ov", block = BB)))
}
} else if (nG > 1L && nL == 1L) {
## multigroup
for (BB in 1:nG) {
l.names <- c(l.names, list(lavNames(object, "lv", block = BB)))
o.names <- c(o.names, list(lavNames(object, "ov", block = BB)))
}
} else {
## multilevel + multigroup
for (BB in 1:(nG*nL)) { #FIXME: lavInspect(object, "nblocks")
l.names <- c(l.names, list(lavNames(object, "lv", block = BB)))
o.names <- c(o.names, list(lavNames(object, "ov", block = BB)))
}
}
## extract factor scores + covariance matrix
fsArgs <- list(...)
fsArgs$type <- "lv"
fsArgs$assemble <- FALSE # assemble after drawing
append.data <- fsArgs$append.data
if (is.null(append.data)) append.data <- FALSE # default in lavPredict()
only.L2 <- fsArgs$level == 2L
if (length(only.L2) == 0L) only.L2 <- FALSE
if (only.L2) fsArgs$append.data <- append.data <- FALSE #FIXME: how will Yves handle lavPredict(fit, append=T, level=2)?
bothLevels <- nL > 1L && is.null(fsArgs$level)
fsArgs$object <- object
fsArgs$acov <- "standard" #FIXME: update if other options become available
FS <- do.call(lavPredict, fsArgs) #FIXME: breaks when multigroup MLSEM: https://github.com/yrosseel/lavaan/issues/157
## also draw Level 2, if multilevel and no specific level requested
if (bothLevels) {
fsArgs$level <- 2L
fsArgs$append.data <- FALSE #FIXME: how will Yves handle lavPredict(fit, append=T, level=2)?
FS2 <- do.call(lavPredict, fsArgs)
}
## draw plausible values, if factor scores exist
if (nG == 1L) {
if (ncol(FS) == 0L) {
PV <- FS
} else {
ACOV <- attr(FS, "acov")[[1]]
v.idx <- if (only.L2) 2L else 1L
PV <- apply(FS[ , l.names[[v.idx]], drop = FALSE], 1, function(mu) {
mnormt::rmnorm(n = 1, mean = mu, varcov = ACOV)
})
if (is.null(dim(PV))) {
PV <- as.matrix(PV)
colnames(PV) <- l.names[[v.idx]]
} else PV <- t(PV)
if (append.data) {
PV <- cbind(FS[ , o.names[[v.idx]], drop = FALSE], PV)
}
}
## add Level 2 if multilevel and no specific level requested
if (bothLevels) {
if (ncol(FS2) == 0L) {
PV2 <- FS2
} else {
ACOV2 <- attr(FS2, "acov")[[1]]
#FIXME: how will Yves handle lavPredict(fit, append=T, level=2)?
PV2 <- apply(FS2, 1, function(mu) {
out <- mnormt::rmnorm(n = 1, mean = mu, varcov = ACOV2)
})
if (is.null(dim(PV2))) {
PV2 <- as.matrix(PV2)
colnames(PV2) <- l.names[[2]]
} else PV2 <- t(PV2)
}
}
} else {
ACOV <- list()
PV <- list()
for (gg in 1:nG) {
if (ncol(FS[[gg]]) == 0L) {
PV[[gg]] <- FS[[gg]]
} else {
ACOV[[gg]] <- attr(FS, "acov")[[gg]]
v.idx <- if (only.L2) (2L + (gg - 1L)*nL) else (1L + (gg - 1L)*nL)
PV[[gg]] <- apply(FS[[gg]][ , l.names[[v.idx]], drop = FALSE], 1, function(mu) {
mnormt::rmnorm(n = 1, mean = mu, varcov = ACOV[[gg]])
})
if (is.null(dim(PV[[gg]]))) {
PV[[gg]] <- as.matrix(PV[[gg]])
colnames(PV[[gg]]) <- l.names[[v.idx]]
} else PV[[gg]] <- t(PV[[gg]])
}
if (append.data) {
PV[[gg]] <- cbind(FS[[gg]][ , o.names[[v.idx]], drop = FALSE], PV[[gg]])
}
}
## add Level 2 if multilevel and no specific level requested
if (bothLevels) {
ACOV2 <- list()
PV2 <- list()
for (gg in 1:nG) {
if (ncol(FS2[[gg]]) == 0L) {
PV2[[gg]] <- FS2[[gg]]
} else {
ACOV2[[gg]] <- attr(FS2, "acov")[[gg]]
#FIXME: how will Yves handle lavPredict(fit, append=T, level=2)?
PV2[[gg]] <- apply(FS2[[gg]], 1, function(mu) {
mnormt::rmnorm(n = 1, mean = mu, varcov = ACOV2[[gg]])
})
if (is.null(dim(PV2[[gg]]))) {
PV2[[gg]] <- as.matrix(PV2[[gg]])
colnames(PV2[[gg]]) <- colnames(FS2[[gg]])
} else PV2[[gg]] <- t(PV2[[gg]])
}
}
}
}
## save as data.frame
if (nG > 1L) {
temp <- lapply(1:nG, function(gg) {
dd <- data.frame(PV[[gg]])
## add groups if multiple
dd[ , group] <- group.label[gg]
dd <- dd[ , c(group, setdiff(names(dd), group)), drop = FALSE ]
## attach row indices from original data for optional merging
if (only.L2) {
dd[ , cluster] <- lavInspect(object, "cluster.id")[[gg]]
dd <- dd[ , c(cluster, setdiff(names(dd), cluster)), drop = FALSE ]
} else {
dd <- cbind(case.idx = lavInspect(object, "case.idx")[[gg]], dd)
}
## attach cluster IDs, if multilevel and no level requested
if (bothLevels) {
dd[ , cluster] <- lavInspect(object, "cluster.label")[[gg]]
d2 <- data.frame(PV2[[gg]])
d2[ , group] <- group.label[gg]
d2[ , cluster] <- lavInspect(object, "cluster.id")[[gg]]
dd <- merge(dd, d2, by = c(group, cluster), all = TRUE)
}
dd
})
PV <- do.call(rbind, temp)
} else {
PV <- data.frame(PV)
## attach row indices from original data for optional merging
if (only.L2) {
PV[ , cluster] <- lavInspect(object, "cluster.id")
PV <- PV[ , c(cluster, setdiff(names(PV), cluster)), drop = FALSE ]
} else {
PV <- cbind(case.idx = lavInspect(object, "case.idx"), PV)
}
## attach cluster IDs, if multilevel and no level requested
if (bothLevels) {
PV[ , cluster] <- lavInspect(object, "cluster.label")
PV2 <- data.frame(PV2)
PV2[ , cluster] <- lavInspect(object, "cluster.id")
PV <- merge(PV, PV2, by = cluster, all = TRUE)
}
}
PV
}
## draw plausible values from a lavaan.mi object
##' @importFrom lavaan lavInspect lavPredict
plaus.mi <- function(object, seeds = 1:5, omit.imps = c("no.conv","no.se"), ...) {
stopifnot(inherits(object, "lavaan.mi"))
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(object@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(object@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(object@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(object@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(object@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
m <- sum(useImps)
useImps <- which(useImps)
## check if model has converged
if (m == 0L) stop("No models converged. Score tests unavailable.")
oldCall <- object@lavListCall
if (!is.null(oldCall$parallel)) {
if (oldCall$parallel == "snow") {
oldCall$parallel <- "no"
oldCall$ncpus <- 1L
message("Unable to pass lavaan::lavPredict() arguments ",
"when parallel='snow'. Switching to parallel='no'.",
" Unless using Windows, parallel='multicore' should work.")
}
}
## call lavaanList() again to run lavTestScore() on each imputation
oldCall$dataList <- object@DataList[useImps]
oldCall$FUN <- function(obj) lapply(seeds, plaus.lavaan, object = obj, ...)
FIT <- eval(as.call(oldCall))
## check if there are any results
noFS <- sapply(FIT@funList, is.null)
if (all(noFS)) stop("No success drawing plausible values for any imputations.")
do.call(c, FIT@funList) # concatenate lists
}
## draw 1 set of plausible values from a blavaan object
##' @importFrom lavaan lavNames lavInspect
plaus.blavaan <- function(object, nDraws = 20L, seed = 12345, ...) {
stopifnot(inherits(object, "blavaan"))
if (!"package:blavaan" %in% search()) attachNamespace("blavaan")
# cluster <- lavInspect(object, "cluster")
group <- lavInspect(object, "group")
group.label <- lavInspect(object, "group.label")
nG <- lavInspect(object, "ngroups")
# nL <- lavInspect(object, "nlevels")
case.idx <- lavInspect(object, "case.idx")
## plausible values of what? (could be latent item responses)
dots <- list(...)
if (is.null(dots$type)) dots$type <- "lv" # default to factor scores
dots$object <- object
## stack factor scores from each chain (one row per PV)
FS <- do.call(blavaan::blavPredict, dots)
#NOTE: might be latent responses
## only save nDraws from posterior
if (nDraws >= length(FS)) {
## why would anyone want this many? Or sample so few during estimation?
message('nDraws cannot exceed number of iterations in `object=`. \nSet to ',
'nDraws = blavInspect(object, "niter") * blavInspect(bfitc, "n.chains")')
nDraws <- length(FS)
}
set.seed(seed)
idx.sample <- ceiling(1:nDraws * length(FS)/nDraws)
#FIXME: if Ed accepts pull request, format will be the same as c("yhat","ypred")
if (dots$type == "lv" && utils::compareVersion(utils::packageDescription('blavaan')$Version, '0.4-2.949') >= 0L) {
## column names contain indices to store PVs in matrix
eta.idx <- colnames(FS)
## N and latent variable names, to know dimensions of PV
N <- lavInspect(object, "ntotal")
etas <- lavNames(object, "lv") #FIXME: assumes same model in both groups
PV <- list()
## loop over nDraws rows, assign columns to eta matrix, save in PV list
set.seed(seed)
idx.sample <- ceiling(1:nDraws * length(FS)/nDraws)
for (i in idx.sample) {
eta <- matrix(NA, nrow = N, ncol = length(etas), dimnames = list(NULL, etas))
for (j in eta.idx) eval(parse(text = paste(j, "<-", FS[i, j]) ))
PV[[i]] <- data.frame(eta)
## add case indices, and groups (if applicable)
if (nG == 1L) PV[[i]]$case.idx <- case.idx else {
PV[[i]]$case.idx <- do.call(c, case.idx)
PV[[i]][ , group] <- rep(group.label, times = lavInspect(object, "nobs"))
}
}
} else {
## latent responses, already a list
PV <- list()
for (i in idx.sample) {
## convert matrix to data.frame
PV[[i]] <- data.frame(FS[[i]])
## add case indices, and groups (if applicable)
if (nG == 1L) PV[[i]]$case.idx <- case.idx else {
PV[[i]]$case.idx <- do.call(c, case.idx)
PV[[i]][ , group] <- rep(group.label, times = lavInspect(object, "nobs"))
}
}
} #else stop('Not implemented for blavPredict(object, type="', dots$type,'")')
PV
}
## ------
## Checks
## ------
# HS.model <- ' visual =~ x1 + x2 + x3
# textual =~ x4 + x5 + x6
# speed =~ x7 + x8 + x9 '
#
# fit1 <- cfa(HS.model, data = HolzingerSwineford1939)
# fs1 <- plausibleValues(fit1, nDraws = 3, append.data = T)
# lapply(fs1, head)
#
#
#
# step1 <- cfa(HS.model, data = HolzingerSwineford1939, group = "school",
# group.equal = c("loadings","intercepts"))
# PV.list <- plausibleValues(step1, append.data = T)
# lapply(PV.list[1:3], head)
#
#
# model <- '
# level: 1
# fw =~ y1 + y2 + y3
# fw ~ x1 + x2 + x3
# level: 2
# fb =~ y1 + y2 + y3
# fb ~ w1 + w2
# '
# msem <- sem(model, data = Demo.twolevel, cluster = "cluster")
# mlPVs <- plausibleValues(msem, nDraws = 3, append.data = T) # both levels by default
# lapply(mlPVs, head, n = 10)
# ## only Level 1
# mlPV1 <- plausibleValues(msem, nDraws = 3, level = 1, append.data = T)
# lapply(mlPV1, head)
# ## only Level 2
# mlPV2 <- plausibleValues(msem, nDraws = 3, level = 2, append.data = T)
# lapply(mlPV2, head)
#
#
#
# data(Demo.twolevel)
# Demo.twolevel$g <- ifelse(Demo.twolevel$cluster %% 2L, "foo", "bar") # arbitrary groups
# table(Demo.twolevel$g)
# model2 <- ' group: foo
# level: within
# fw =~ y1 + L2*y2 + L3*y3
# fw ~ x1 + x2 + x3
# level: between
# fb =~ y1 + L2*y2 + L3*y3
# fb ~ w1 + w2
#
# group: bar
#
# level: within
# fw =~ y1 + L2*y2 + L3*y3
# fw ~ x1 + x2 + x3
# level: between
# fb =~ y1 + L2*y2 + L3*y3
# fb ~ w1 + w2
# '
# msem2 <- sem(model2, data = Demo.twolevel, cluster = "cluster", group = "g")
# ml2PVs <- plausibleValues(msem2, nDraws = 3, append.data = T) # both levels by default
# lapply(ml2PVs, head, n = 10)
# ## only Level 1
# ml2PV1 <- plausibleValues(msem2, nDraws = 3, level = 1, append.data = T)
# lapply(ml2PV1, head)
# ## only Level 2
# ml2PV2 <- plausibleValues(msem2, nDraws = 3, level = 2, append.data = T)
# lapply(ml2PV2, head)
## ordered-categorical data
# data(datCat)
#
# modc <- ' ## Set thresholds equal across groups
# ## thresholds at Time 1
# u1 | c(tau1.1, tau1.1)*t1 + c(tau1.2, tau1.2)*t2 + c(tau1.3, tau1.3)*t3 + c(tau1.4, tau1.4)*t4
# u2 | c(tau2.1, tau2.1)*t1 + c(tau2.2, tau2.2)*t2 + c(tau2.3, tau2.3)*t3 + c(tau2.4, tau2.4)*t4
# u3 | c(tau3.1, tau3.1)*t1 + c(tau3.2, tau3.2)*t2 + c(tau3.3, tau3.3)*t3 + c(tau3.4, tau3.4)*t4
# u4 | c(tau4.1, tau4.1)*t1 + c(tau4.2, tau4.2)*t2 + c(tau4.3, tau4.3)*t3 + c(tau4.4, tau4.4)*t4
# ## thresholds at Time 2 equal to Time 1
# u5 | c(tau1.1, tau1.1)*t1 + c(tau1.2, tau1.2)*t2 + c(tau1.3, tau1.3)*t3 + c(tau1.4, tau1.4)*t4
# u6 | c(tau2.1, tau2.1)*t1 + c(tau2.2, tau2.2)*t2 + c(tau2.3, tau2.3)*t3 + c(tau2.4, tau2.4)*t4
# u7 | c(tau3.1, tau3.1)*t1 + c(tau3.2, tau3.2)*t2 + c(tau3.3, tau3.3)*t3 + c(tau3.4, tau3.4)*t4
# u8 | c(tau4.1, tau4.1)*t1 + c(tau4.2, tau4.2)*t2 + c(tau4.3, tau4.3)*t3 + c(tau4.4, tau4.4)*t4
# ## define latent responses as single-indicator factors (resid. var = 0)
# y1 =~ 1*u1 ; u1 ~~ c(0, 0)*u1
# y2 =~ 1*u2 ; u2 ~~ c(0, 0)*u2
# y3 =~ 1*u3 ; u3 ~~ c(0, 0)*u3
# y4 =~ 1*u4 ; u4 ~~ c(0, 0)*u4
# y5 =~ 1*u5 ; u5 ~~ c(0, 0)*u5
# y6 =~ 1*u6 ; u6 ~~ c(0, 0)*u6
# y7 =~ 1*u7 ; u7 ~~ c(0, 0)*u7
# y8 =~ 1*u8 ; u8 ~~ c(0, 0)*u8
# ## only fix mean=0 in first group/occasion
# y1 + y2 + y3 + y4 ~ c( 0, NA)*1
# y5 + y6 + y7 + y8 ~ c(NA, NA)*1
# ## only fix variance=1 in first groop/occasion
# y1 ~~ c( 1, NA)*y1
# y2 ~~ c( 1, NA)*y2
# y3 ~~ c( 1, NA)*y3
# y4 ~~ c( 1, NA)*y4
# y5 ~~ c(NA, NA)*y5
# y6 ~~ c(NA, NA)*y6
# y7 ~~ c(NA, NA)*y7
# y8 ~~ c(NA, NA)*y8
# ## estimate all covariances
# y1 ~~ y2 + y3 + y4 + y5 + y6 + y7 + y8
# y2 ~~ y3 + y4 + y5 + y6 + y7 + y8
# y3 ~~ y4 + y5 + y6 + y7 + y8
# y4 ~~ y5 + y6 + y7 + y8
# y5 ~~ y6 + y7 + y8
# y6 ~~ y7 + y8
# y7 ~~ y8
# '
# ## fit in lavaan for sanity check
# fitc <- lavaan(modc, data = datCat, group = "g", parameterization = "theta")
# summary(fitc)
#
# ## impose 5% MCAR
# set.seed(123)
# for (i in 1:8) datCat[sample(1:nrow(datCat), size = .05*nrow(datCat)), i] <- NA
#
# ## try with pairwise deletion
# fitcm <- lavaan(modc, data = datCat, group = "g", parameterization = "theta",
# missing = "pairwise")
# summary(fitcm)
#
# ## try blavaan
# data(datCat) # doesn't yet work with missing ordinal data
# bfitc <- blavaan(modc, data = datCat, group = "g", ordered = TRUE, # why is this needed when they are already ordered?
# n.chains = 2, burnin = 500, sample = 101, seed = 123,
# bcontrol = list(cores = 2),
# save.lvs = TRUE)
# summary(bfitc)
# LIRs <- blavPredict(bfitc, type = "ypred")
# yhats <- blavPredict(bfitc, type = "yhat") # same when resid. var = 0?
# fscores <- blavPredict(bfitc, type = "lv") # same as LIRs?
#
#
# length(LIRs) # list: 1 N*p matrix per chain
# length(yhats) # list: 1 N*p matrix per chain
# length(fscores) # matrix: 1 row per chain, 1 column per [N, fs]
# ## all are basically interchangeable
# ch <- 2
# cor(cbind(yhats = as.numeric(yhats[[ch]]),
# LIRs = as.numeric(LIRs[[ch]]),
# fscores = fscores[ch,]))
# ## compare means (by group)
# aggregate(yhats[[ch]], by = datCat["g"], FUN = mean)
# aggregate( LIRs[[ch]], by = datCat["g"], FUN = mean)
# aggregate(matrix(fscores[ch,], ncol = 8), by = datCat["g"], FUN = mean)
# ## compare to lavaan
# do.call(rbind, sapply(lavInspect(fitc, "est"),
# function(i) i$alpha[,1], simplify = FALSE))
#
#
#
# ## now a CFA (so there are latent item responses and common factors)
# mod <- ' FU1 =~ u1 + u2 + u3 + u4
# FU2 =~ u5 + u6 + u7 + u8 '
# fit <- cfa(mod, data = datCat, std.lv = TRUE)
# bfit <- bcfa(mod, data = datCat, std.lv = TRUE, ordered = TRUE,
# n.chains = 2, burnin = 100, sample = 101, seed = 123,
# save.lvs = TRUE)
# summary(bfit)
# fscores <- blavPredict(bfit, type = "lv")
# LIRs <- blavPredict(bfit, type = "ypred")
# FS <- plausibleValues(bfitc)
# LIRs <- plausibleValues(bfitc, type = "ypred")
semTools/R/htmt.R 0000644 0001762 0000144 00000014531 14632150323 013362 0 ustar ligges users ### Ylenio Longo & Terrence D. Jorgensen
### Last updated: 31 January 2023
##' Assessing Discriminant Validity using Heterotrait--Monotrait Ratio
##'
##' This function assesses discriminant validity through the
##' heterotrait-monotrait ratio (HTMT) of the correlations (Henseler, Ringlet &
##' Sarstedt, 2015). Specifically, it assesses the arithmetic (Henseler et al.,
##' ) or geometric (Roemer et al., 2021) mean correlation
##' among indicators across constructs (i.e. heterotrait--heteromethod
##' correlations) relative to the geometric-mean correlation among indicators
##' within the same construct (i.e. monotrait--heteromethod correlations).
##' The resulting HTMT(2) values are interpreted as estimates of inter-construct
##' correlations. Absolute values of the correlations are recommended to
##' calculate the HTMT matrix, and are required to calculate HTMT2. Correlations
##' are estimated using the [lavaan::lavCor()] function.
##'
##'
##' @importFrom stats cov2cor
##'
##' @param model lavaan [lavaan::model.syntax()] of a confirmatory factor
##' analysis model where at least two factors are required for indicators
##' measuring the same construct.
##' @param data A `data.frame` or data `matrix`
##' @param sample.cov A covariance or correlation matrix can be used, instead of
##' `data=`, to estimate the HTMT values.
##' @param missing If `"listwise"`, cases with missing values are removed listwise
##' from the data frame. If `"direct"` or `"ml"` or `"fiml"` and the estimator is
##' maximum likelihood, an EM algorithm is used to estimate the unrestricted
##' covariance matrix (and mean vector). If `"pairwise"`, pairwise deletion is
##' used. If `"default"`, the value is set depending on the estimator and the
##' mimic option (see details in [lavaan::lavCor()]).
##' @param ordered Character vector. Only used if object is a `data.frame`.
##' Treat these variables as ordered (ordinal) variables. Importantly, all
##' other variables will be treated as numeric (unless `is.ordered` in
##' `data=`). See also [lavaan::lavCor()].
##' @param absolute `logical` indicating whether HTMT values should be
##' estimated based on absolute correlations (default is `TRUE`). This
##' is recommended for HTMT but required for HTMT2 (so silently ignored).
##' @param htmt2 `logical` indicating whether to use the geometric mean
##' (default, appropriate for congeneric indicators) or arithmetic mean
##' (which assumes tau-equivalence).
##'
##' @return A matrix showing HTMT(2) values (i.e., discriminant validity)
##' between each pair of factors.
##'
##' @author
##' Ylenio Longo (University of Nottingham; \email{yleniolongo@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for
##' assessing discriminant validity in variance-based structural equation
##' modeling. *Journal of the Academy of Marketing Science, 43*(1),
##' 115--135. \doi{10.1007/s11747-014-0403-8}
##'
##' Roemer, E., Schuberth, F., & Henseler, J. (2021). HTMT2---An improved
##' criterion for assessing discriminant validity in structural equation
##' modeling. *Industrial Management & Data Systems, 121*(21), 2637--2650.
##' \doi{10.1108/IMDS-02-2021-0082}
##'
##' Voorhees, C. M., Brady, M. K., Calantone, R., & Ramirez, E. (2016).
##' Discriminant validity testing in marketing: An analysis, causes for
##' concern, and proposed remedies.
##' *Journal of the Academy of Marketing Science, 44*(1), 119--134.
##' \doi{10.1007/s11747-015-0455-4}
##'
##' @examples
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' dat <- HolzingerSwineford1939[, paste0("x", 1:9)]
##' htmt(HS.model, dat)
##'
##' ## save covariance matrix
##' HS.cov <- cov(HolzingerSwineford1939[, paste0("x", 1:9)])
##' ## HTMT using arithmetic mean
##' htmt(HS.model, sample.cov = HS.cov, htmt2 = FALSE)
##'
##' @export
htmt <- function(model, data = NULL, sample.cov = NULL, missing = "listwise",
ordered = NULL, absolute = TRUE, htmt2 = TRUE) {
model <- lavaan::lavaanify(model)
model <- model[model$op %in% "=~", ]
factors <- unique(model$lhs)
nf <- length(factors)
var <- list()
for (i in 1:nf) {
var[[i]] <- model$rhs[which(model$lhs %in% factors[i])]
}
varnames <- c(unlist(var))
if(!is.null(data)) { # if data
if(any(! varnames %in% colnames(data))) {
absent.vars <- which(! varnames %in% colnames(data))
stop("Missing observed variables in the dataset: ",
paste(varnames[absent.vars], collapse = " "))
}
data <- data[ , c(varnames)]
R <- lavaan::lavCor(data, missing = missing, ordered = ordered)
rownames(R) <- names(data)
colnames(R) <- names(data)
} else {
if (any(! varnames %in% colnames(sample.cov))) {
absent.vars <- which(! varnames %in% colnames(sample.cov))
stop("Missing observed variables in the covariance or correlation matrix: ",
paste(varnames[absent.vars], collapse = " "))
}
diagR <- diag(sample.cov)
if (max(diagR) != 1 & min(diagR) != 1) { #if covariance matrix
R <- cov2cor(sample.cov[varnames, varnames])
} else { # if correlation matrix
R <- sample.cov[varnames, varnames]
}
}
if (absolute || htmt2) {
R <- abs(R)
}
diag(R) <- NA
m.cor.w <- list()
for (i in 1:nf) {
if (htmt2) {
m.cor.w[[i]] <- exp(mean(log(R[ var[[i]], var[[i]] ]), na.rm = TRUE))
} else m.cor.w[[i]] <- mean(R[ var[[i]], var[[i]] ], na.rm = TRUE)
}
m.cor.w <- as.numeric(m.cor.w)
comb <- expand.grid(1:nf, 1:nf)
g <- list()
for (i in 1:nrow(comb)) {
g[[i]] <- sqrt(m.cor.w[comb[i, 2]] * m.cor.w[comb[i, 1]])
}
g <- as.numeric(g)
paste(comb[, 2], comb[, 1])
m.cor.a <- list()
for (i in 1:nrow(comb)) {
if (htmt2) {
m.cor.a[[i]] <- exp(mean(log(R[ var[[comb[i, 2]]],
var[[comb[i, 1]]] ]),
na.rm = TRUE))
} else m.cor.a[[i]] <- mean(R[ var[[ comb[i,2] ]],
var[[ comb[i,1] ]] ], na.rm = TRUE)
}
m.cor.a <- as.numeric(m.cor.a)
outhtmt <- m.cor.a / g
res <- matrix(outhtmt, nrow = nf, ncol = nf, dimnames = list(factors))
colnames(res) <- factors
class(res) <- c("lavaan.matrix.symmetric", "matrix")
res
}
semTools/R/semTools.R 0000644 0001762 0000144 00000005171 15142325143 014214 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 10 June 2024
### package documentation, along with convenience documentation (e.g., imports)
##' semTools: Useful Tools for Structural Equation Modeling
##'
##' The \pkg{semTools} package provides many miscellaneous functions that are
##' useful for statistical analysis involving SEM in R. Many functions extend
##' the funtionality of the \pkg{lavaan} package. Some sets of functions in
##' \pkg{semTools} correspond to the same theme. We call such a collection of
##' functions a *suite*. Our suites include:
##' \itemize{
##' \item{Model Fit Evaluation:
##' [moreFitIndices()],
##' [nullRMSEA()],
##' [singleParamTest()],
##' [epcEquivFit()], and
##' [chisqSmallN()]}
##' \item{Measurement Invariance:
##' [measEq.syntax()],
##' [partialInvariance()],
##' [partialInvarianceCat()], and
##' [permuteMeasEq()]}
##' \item{Power Analysis:
##' [SSpower()],
##' [findRMSEApower()],
##' [plotRMSEApower()],
##' [plotRMSEAdist()],
##' [findRMSEAsamplesize()],
##' [findRMSEApowernested()],
##' [plotRMSEApowernested()], and
##' [findRMSEAsamplesizenested()]}
##' \item{Missing Data Analysis:
## [runMI()], # DEPRECATE
##' [auxiliary()],
##' [twostage()],
##' [fmi()],
##' [bsBootMiss()],
##' [quark()], and
##' [combinequark()]}
##' \item{Latent Interactions:
##' [indProd()],
##' [orthogonalize()],
##' [probe2WayMC()],
##' [probe3WayMC()],
##' [probe2WayRC()],
##' [probe3WayRC()], and
##' [plotProbe()]}
##' \item{Exploratory Factor Analysis (EFA):
##' [efa.ekc()]}
##' \item{Reliability Estimation:
##' [compRelSEM()] and
##' [maximalRelia()]
##' (see also [AVE()])}
##' \item{Parceling:
##' [parcelAllocation()],
##' [PAVranking()], and
##' [poolMAlloc()]}
##' \item{Non-Normality:
##' [skew()],
##' [kurtosis()],
##' [mardiaSkew()],
##' [mardiaKurtosis()], and
##' [mvrnonnorm()]}
##' }
##' All users of R (or SEM) are invited to submit functions or ideas for
##' functions by contacting the maintainer, Terrence Jorgensen
##' (\email{TJorgensen314@gmail.com}). Contributors are encouraged to use
##' `Roxygen` comments to document their contributed code, which is
##' consistent with the rest of \pkg{semTools}. Read the vignette from the
##' \pkg{roxygen2} package for details:
##' `vignette("rd", package = "roxygen2")`
##'
##' @name semTools
##' @aliases semTools-package
NULL
##' @importFrom methods setClass setMethod getMethod show is new slot as hasArg
NULL
##' @importFrom graphics hist plot par abline lines legend
NULL
##' @importFrom stats nobs residuals resid fitted fitted.values coef anova vcov
NULL
semTools/R/mvrnonnorm.R 0000644 0001762 0000144 00000014566 14632143377 014644 0 ustar ligges users ### Yves Rosseel, Sunthud Pornprasertmanit, & Terrence D. Jorgensen
### Last updated: 10 June 2024
##' Generate Non-normal Data using Vale and Maurelli (1983) method
##'
##' Generate Non-normal Data using Vale and Maurelli (1983) method. The function
##' is designed to be as similar as the popular `mvrnorm` function in the
##' `MASS` package. The codes are copied from `mvrnorm` function in
##' the `MASS` package for argument checking and `lavaan` package for
##' data generation using Vale and Maurelli (1983) method.
##'
##'
##' @importFrom stats cov2cor
##'
##' @param n Sample size
##' @param mu A mean vector. If elements are named, those will be used as
##' variable names in the returned data matrix.
##' @param Sigma A positive-definite symmetric matrix specifying the covariance
##' matrix of the variables. If rows or columns are named (and `mu` is
##' unnamed), those will be used as variable names in the returned data matrix.
##' @param skewness A vector of skewness of the variables
##' @param kurtosis A vector of excessive kurtosis of the variables
##' @param empirical deprecated, ignored.
##' @return A data matrix
##' @author The original function is the [lavaan::simulateData()]
##' function written by Yves Rosseel in the `lavaan` package. The function
##' is adjusted for a convenient usage by Sunthud Pornprasertmanit
##' (\email{psunthud@@gmail.com}). Terrence D. Jorgensen added the feature to
##' retain variable names from `mu` or `Sigma`.
##'
##' @references Vale, C. D. & Maurelli, V. A. (1983). Simulating multivariate
##' nonormal distributions. *Psychometrika, 48*(3), 465--471.
##' \doi{10.1007/BF02293687}
##'
##' @examples
##'
##' set.seed(123)
##' mvrnonnorm(20, c(1, 2), matrix(c(10, 2, 2, 5), 2, 2),
##' skewness = c(5, 2), kurtosis = c(3, 3))
##' ## again, with variable names specified in mu
##' set.seed(123)
##' mvrnonnorm(20, c(a = 1, b = 2), matrix(c(10, 2, 2, 5), 2, 2),
##' skewness = c(5, 2), kurtosis = c(3, 3))
##'
##' @export
mvrnonnorm <- function(n, mu, Sigma, skewness = NULL,
kurtosis = NULL, empirical = FALSE) {
## number of variables
p <- length(mu)
if (!all(dim(Sigma) == c(p, p))) stop("incompatible arguments")
## save variable names, if they exist
varnames <- names(mu)
if (is.null(varnames)) varnames <- rownames(Sigma)
if (is.null(varnames)) varnames <- colnames(Sigma)
## check for NPD
eS <- eigen(Sigma, symmetric = TRUE)
ev <- eS$values
if (!all(ev >= -1e-06 * abs(ev[1L])))
stop("'Sigma' is not positive definite")
## simulate X <- NULL
if (is.null(skewness) && is.null(kurtosis)) {
X <- mnormt::rmnorm(n = n, mean = mu, varcov = Sigma)
} else {
if (is.null(skewness)) skewness <- rep(0, p)
if (is.null(kurtosis)) kurtosis <- rep(0, p)
Z <- ValeMaurelli1983copied(n = n, COR = cov2cor(Sigma),
skewness = skewness, kurtosis = kurtosis)
TMP <- scale(Z, center = FALSE, scale = 1/sqrt(diag(Sigma)))[ , , drop = FALSE]
X <- sweep(TMP, MARGIN = 2, STATS = mu, FUN = "+")
}
colnames(X) <- varnames
X
}
## ----------------
## Hidden Functions
## ----------------
## Copied from lavaan package
##' @importFrom stats nlminb
ValeMaurelli1983copied <- function(n = 100L, COR, skewness, kurtosis,
debug = FALSE) {
fleishman1978_abcd <- function(skewness, kurtosis) {
system.function <- function(x, skewness, kurtosis) {
b.=x[1L]; c.=x[2L]; d.=x[3L]
eq1 <- b.^2 + 6*b.*d. + 2*c.^2 + 15*d.^2 - 1
eq2 <- 2*c.*(b.^2 + 24*b.*d. + 105*d.^2 + 2) - skewness
eq3 <- 24*(b.*d. + c.^2*(1 + b.^2 + 28*b.*d.) +
d.^2*(12 + 48*b.*d. + 141*c.^2 + 225*d.^2)) - kurtosis
eq <- c(eq1,eq2,eq3)
sum(eq^2) ## SS
}
out <- nlminb(start = c(1, 0, 0), objective = system.function,
scale = 10, control = list(trace = 0),
skewness = skewness, kurtosis = kurtosis)
if(out$convergence != 0) warning("no convergence")
b. <- out$par[1L]; c. <- out$par[2L]; d. <- out$par[3L]; a. <- -c.
c(a.,b.,c.,d.)
}
getICOV <- function(b1, c1, d1, b2, c2, d2, R) {
objectiveFunction <- function(x, b1, c1, d1, b2, c2, d2, R) {
rho=x[1L]
eq <- rho*(b1*b2 + 3*b1*d2 + 3*d1*b2 + 9*d1*d2) +
rho^2*(2*c1*c2) + rho^3*(6*d1*d2) - R
eq^2
}
#gradientFunction <- function(x, bcd1, bcd2, R) {
#
#}
out <- nlminb(start=R, objective=objectiveFunction,
scale=10, control=list(trace=0),
b1=b1, c1=c1, d1=d1, b2=b2, c2=c2, d2=d2, R=R)
if(out$convergence != 0) warning("no convergence")
rho <- out$par[1L]
rho
}
# number of variables
nvar <- ncol(COR)
# check skewness
if(is.null(skewness)) {
SK <- rep(0, nvar)
} else if(length(skewness) == nvar) {
SK <- skewness
} else if(length(skewness) == 1L) {
SK <- rep(skewness, nvar)
} else {
stop("skewness has wrong length")
}
if(is.null(kurtosis)) {
KU <- rep(0, nvar)
} else if(length(kurtosis) == nvar) {
KU <- kurtosis
} else if(length(kurtosis) == 1L) {
KU <- rep(kurtosis, nvar)
} else {
stop("kurtosis has wrong length")
}
# create Fleishman table
FTable <- matrix(0, nvar, 4L)
for(i in 1:nvar) {
FTable[i,] <- fleishman1978_abcd(skewness=SK[i], kurtosis=KU[i])
}
# compute intermediate correlations between all pairs
ICOR <- diag(nvar)
for(j in 1:(nvar-1L)) {
for(i in (j+1):nvar) {
if(COR[i,j] == 0) next
ICOR[i,j] <- ICOR[j,i] <-
getICOV(FTable[i,2], FTable[i,3], FTable[i,4],
FTable[j,2], FTable[j,3], FTable[j,4], R=COR[i,j])
}
}
if(debug) {
cat("\nOriginal correlations (for Vale-Maurelli):\n")
print(COR)
cat("\nIntermediate correlations (for Vale-Maurelli):\n")
print(ICOR)
cat("\nEigen values ICOR:\n")
print( eigen(ICOR)$values )
}
# generate Z
X <- Z <- mnormt::rmnorm(n = n, mean = rep(0,nvar), varcov = ICOR)
# transform Z using Fleischman constants
for(i in 1:nvar) {
X[,i] <- FTable[i,1L] + FTable[i,2L]*Z[,i] + FTable[i,3L]*Z[,i]^2 +
FTable[i,4L]*Z[,i]^3
}
X
}
semTools/R/imposeStart.R 0000644 0001762 0000144 00000010446 14632016456 014731 0 ustar ligges users ### Sunthud Pornprasertmanit
### Last updated: 2 April 2017
#' Specify starting values from a lavaan output
#'
#' This function will save the parameter estimates of a lavaan output and
#' impose those parameter estimates as starting values for another analysis
#' model. The free parameters with the same names or the same labels across two
#' models will be imposed the new starting values. This function may help to
#' increase the chance of convergence in a complex model (e.g.,
#' multitrait-multimethod model or complex longitudinal invariance model).
#'
#'
#' @param out The `lavaan` output that users wish to use the parameter
#' estimates as staring values for an analysis model
#' @param expr The original code that users use to run a lavaan model
#' @param silent Logical to print the parameter table with new starting values
#' @return A fitted lavaan model
#' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
#' @examples
#'
#' ## The following example show that the longitudinal weak invariance model
#' ## using effect coding was not convergent with three time points but convergent
#' ## with two time points. Thus, the parameter estimates from the model with
#' ## two time points are used as starting values of the three time points.
#' ## The model with new starting values is convergent properly.
#'
#' weak2time <- '
#' # Loadings
#' f1t1 =~ LOAD1*y1t1 + LOAD2*y2t1 + LOAD3*y3t1
#' f1t2 =~ LOAD1*y1t2 + LOAD2*y2t2 + LOAD3*y3t2
#'
#' # Factor Variances
#' f1t1 ~~ f1t1
#' f1t2 ~~ f1t2
#'
#' # Factor Covariances
#' f1t1 ~~ f1t2
#'
#' # Error Variances
#' y1t1 ~~ y1t1
#' y2t1 ~~ y2t1
#' y3t1 ~~ y3t1
#' y1t2 ~~ y1t2
#' y2t2 ~~ y2t2
#' y3t2 ~~ y3t2
#'
#' # Error Covariances
#' y1t1 ~~ y1t2
#' y2t1 ~~ y2t2
#' y3t1 ~~ y3t2
#'
#' # Factor Means
#' f1t1 ~ NA*1
#' f1t2 ~ NA*1
#'
#' # Measurement Intercepts
#' y1t1 ~ INT1*1
#' y2t1 ~ INT2*1
#' y3t1 ~ INT3*1
#' y1t2 ~ INT4*1
#' y2t2 ~ INT5*1
#' y3t2 ~ INT6*1
#'
#' # Constraints for Effect-coding Identification
#' LOAD1 == 3 - LOAD2 - LOAD3
#' INT1 == 0 - INT2 - INT3
#' INT4 == 0 - INT5 - INT6
#' '
#' model2time <- lavaan(weak2time, data = exLong)
#'
#' weak3time <- '
#' # Loadings
#' f1t1 =~ LOAD1*y1t1 + LOAD2*y2t1 + LOAD3*y3t1
#' f1t2 =~ LOAD1*y1t2 + LOAD2*y2t2 + LOAD3*y3t2
#' f1t3 =~ LOAD1*y1t3 + LOAD2*y2t3 + LOAD3*y3t3
#'
#' # Factor Variances
#' f1t1 ~~ f1t1
#' f1t2 ~~ f1t2
#' f1t3 ~~ f1t3
#'
#' # Factor Covariances
#' f1t1 ~~ f1t2 + f1t3
#' f1t2 ~~ f1t3
#'
#' # Error Variances
#' y1t1 ~~ y1t1
#' y2t1 ~~ y2t1
#' y3t1 ~~ y3t1
#' y1t2 ~~ y1t2
#' y2t2 ~~ y2t2
#' y3t2 ~~ y3t2
#' y1t3 ~~ y1t3
#' y2t3 ~~ y2t3
#' y3t3 ~~ y3t3
#'
#' # Error Covariances
#' y1t1 ~~ y1t2
#' y2t1 ~~ y2t2
#' y3t1 ~~ y3t2
#' y1t1 ~~ y1t3
#' y2t1 ~~ y2t3
#' y3t1 ~~ y3t3
#' y1t2 ~~ y1t3
#' y2t2 ~~ y2t3
#' y3t2 ~~ y3t3
#'
#' # Factor Means
#' f1t1 ~ NA*1
#' f1t2 ~ NA*1
#' f1t3 ~ NA*1
#'
#' # Measurement Intercepts
#' y1t1 ~ INT1*1
#' y2t1 ~ INT2*1
#' y3t1 ~ INT3*1
#' y1t2 ~ INT4*1
#' y2t2 ~ INT5*1
#' y3t2 ~ INT6*1
#' y1t3 ~ INT7*1
#' y2t3 ~ INT8*1
#' y3t3 ~ INT9*1
#'
#' # Constraints for Effect-coding Identification
#' LOAD1 == 3 - LOAD2 - LOAD3
#' INT1 == 0 - INT2 - INT3
#' INT4 == 0 - INT5 - INT6
#' INT7 == 0 - INT8 - INT9
#' '
#' ### The following command does not provide convergent result
#' # model3time <- lavaan(weak3time, data = exLong)
#'
#' ### Use starting values from the model with two time points
#' model3time <- imposeStart(model2time, lavaan(weak3time, data = exLong))
#' summary(model3time)
#'
#' @export
imposeStart <- function(out, expr, silent = TRUE) {
if(!is(out, "lavaan")) stop("The first argument of the function must be a lavaan output.")
template2 <- template <- substitute(expr)
template2$do.fit <- FALSE
model <- eval(expr = template2, enclos = parent.frame())
ptmodel <- parTable(model)
coefmodel <- lavaan::coef(model)
coefout <- lavaan::coef(out)
start <- coefout[match(names(coefmodel), names(coefout))]
ptmodel$start[ptmodel$free != 0] <- start[ptmodel$free[ptmodel$free != 0]]
ptmodel$est <- NULL
ptmodel$se <- NULL
if(!silent) {
cat("########## Model with imposed starting values #########\n")
print(ptmodel)
}
if("model" %in% names(template)) {
template$model <- ptmodel
} else {
template[[2]] <- ptmodel
}
eval(expr = template, enclos = parent.frame())
}
semTools/R/auxiliary.R 0000644 0001762 0000144 00000027406 14753073437 014440 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 12 February 2025
##' Implement Saturated Correlates with FIML
##'
##' Automatically add auxiliary variables to a lavaan model when using full
##' information maximum likelihood (FIML) to handle missing data
##'
##' These functions are wrappers around the corresponding lavaan functions.
##' You can use them the same way you use [lavaan::lavaan()], but you
##' *must* pass your full `data.frame` to the `data` argument.
##' Because the saturated-correlates approaches (Enders, 2008) treats exogenous
##' variables as random, `fixed.x` must be set to `FALSE`. Because FIML
##' requires continuous data (although nonnormality corrections can still be
##' requested), no variables in the model nor auxiliary variables specified in
##' `aux` can be declared as `ordered`.
##'
##' @aliases auxiliary lavaan.auxiliary cfa.auxiliary sem.auxiliary growth.auxiliary
##' @importFrom lavaan lavInspect parTable lavNames
##' @importFrom stats cov quantile
##'
##' @param model The analysis model can be specified with 1 of 2 objects:
##' \enumerate{
##' \item lavaan [lavaan::model.syntax()] specifying a hypothesized
##' model *without* mention of auxiliary variables in `aux`
##' \item a parameter table, as returned by [lavaan::parTable()],
##' specifying the target model *without* auxiliary variables.
##' This option requires these columns (and silently ignores all others):
##' `c("lhs","op","rhs","user","group","free","label","plabel","start")`
##' }
##' @param data `data.frame` that includes auxiliary variables as well as
##' any observed variables in the `model`
##' @param aux `character`. Names of auxiliary variables to add to `model`
##' @param fun `character`. Name of a specific lavaan function used to fit
##' `model` to `data` (i.e., `"lavaan"`, `"cfa"`,
##' `"sem"`, or `"growth"`). Only required for `auxiliary`.
##' @param ... Additional arguments to pass to `fun=`.
##' @param envir Passed to [do.call()].
##' @param return.syntax `logical` indicating whether to return a
##' `character` string of [lavaan::model.syntax()] that can be
##' added to a target `model=` that is also a `character` string.
##' This can be advantageous, for example, to use add saturated correlates to
##' a \pkg{blavaan} model.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Enders, C. K. (2008). A note on the use of missing auxiliary variables in
##' full information maximum likelihood-based structural equation models.
##' *Structural Equation Modeling, 15*(3), 434--448.
##' \doi{10.1080/10705510802154307}
##'
##' @return a fitted [lavaan::lavaan-class] object. Additional
##' information is stored as a `list` in the `@@external` slot:
##' \itemize{
##' \item `baseline.model`. a fitted [lavaan::lavaan-class]
##' object. Results of fitting an appropriate independence model for
##' the calculation of incremental fit indices (e.g., CFI, TLI) in
##' which the auxiliary variables remain saturated, so only the target
##' variables are constrained to be orthogonal. See Examples for how
##' to send this baseline model to [lavaan::fitMeasures()].
##' \item `aux`. The character vector of auxiliary variable names.
##' \item `baseline.syntax`. A character vector generated within the
##' `auxiliary` function, specifying the `baseline.model`
##' syntax.
##' }
##'
##' @examples
##' dat1 <- lavaan::HolzingerSwineford1939
##' set.seed(12345)
##' dat1$z <- rnorm(nrow(dat1))
##' dat1$x5 <- ifelse(dat1$z < quantile(dat1$z, .3), NA, dat1$x5)
##' dat1$x9 <- ifelse(dat1$z > quantile(dat1$z, .8), NA, dat1$x9)
##'
##' targetModel <- "
##' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' "
##'
##' ## works just like cfa(), but with an extra "aux" argument
##' fitaux1 <- cfa.auxiliary(targetModel, data = dat1, aux = "z",
##' missing = "fiml", estimator = "mlr")
##'
##' ## with multiple auxiliary variables and multiple groups
##' fitaux2 <- cfa.auxiliary(targetModel, data = dat1, aux = c("z","ageyr","grade"),
##' group = "school", group.equal = "loadings")
##'
##' ## calculate correct incremental fit indices (e.g., CFI, TLI)
##' fitMeasures(fitaux2, fit.measures = c("cfi","tli"))
##' ## NOTE: lavaan will use the internally stored baseline model, which
##' ## is the independence model plus saturated auxiliary parameters
##' lavInspect(fitaux2@external$baseline.model, "free")
##'
##'
##' @export
auxiliary <- function(model, data, aux, fun, ...,
envir = getNamespace("lavaan"), return.syntax = FALSE) {
lavArgs <- list(...)
lavArgs$data <- substitute(data)
lavArgs$fixed.x <- FALSE
if (fun %in% c("lavaan","cfa","sem","growth")) {
#TODO: allow fun = "efa"?
lavArgs$missing <- "fiml"
lavArgs$meanstructure <- TRUE
}
lavArgs$ordered <- NULL
if (missing(aux))
stop("Please provide a character vector with names of auxiliary variables")
if (missing(data))
stop("Please provide a data.frame that includes modeled and auxiliary variables")
if (fun %in% c("lavaan","cfa","sem","growth") && !all(sapply(data[aux], is.numeric)))
stop("missing = 'FIML' is unavailable for categorical data")
PTcols <- c("lhs","op","rhs","user","block","group","free","label","plabel","start")
#TODO: add any? if fun %in% c("blavaan","bcfa","bsem","bgrowth")
## check parameter table, or create one from syntax
if (is.list(model)) {
if (any(model$exo == 1))
stop("All exogenous variables (covariates) must be treated as endogenous",
" by the 'auxiliary' function. Please set 'fixed.x = FALSE'")
if (!is.null(lavArgs$group.equal))
warning("The 'group.equal' argument is ignored when 'model' is a parameter table.")
if (is.null(model$start)) {
startArgs <- lavArgs
startArgs$model <- model
startArgs$do.fit <- FALSE
model$start <- parTable(do.call(fun, startArgs, envir = envir))$start
}
missingCols <- setdiff(PTcols, names(model))
if (length(missingCols)) stop("If the 'model' argument is a parameter table",
" it must also include these columns: \n",
paste(missingCols, collapse = ", "))
PT <- as.data.frame(model, stringsAsFactors = FALSE)[PTcols]
} else if (is.character(model)) {
ptArgs <- lavArgs
ptArgs$model <- model
ptArgs$do.fit <- FALSE #FIXME: does this work with blavaan?
PT <- parTable(do.call(fun, ptArgs, envir = envir))[PTcols]
} else stop("The 'model' argument must be a character vector of",
" lavaan syntax or a parameter table")
## separately store rows with constraints or user-defined parameters
conRows <- PT$op %in% c("==","<",">",":=")
if (any(conRows)) {
CON <- PT[ conRows, ]
PT <- PT[ !conRows, ]
} else CON <- data.frame(NULL)
## variable names
varnames <- lavNames(PT, type = "ov")
if (length(intersect(varnames, aux))) stop('modeled variable declared as auxiliary')
## specify a saturated model for auxiliaries
covstruc <- outer(aux, aux, function(x, y) paste(x, "~~", y))
satMod <- c(covstruc[lower.tri(covstruc, diag = TRUE)], paste(aux, "~ 1"), # among auxiliaries
outer(aux, varnames, function(x, y) paste(x, "~~", y))) # between aux and targets
if (return.syntax) return(satMod)
satPT <- lavaan::lavaanify(satMod, ngroups = max(PT$group))[c("lhs","op","rhs",
"user","block","group")]
## after omitting duplicates, check number of added parameters, add columns
mergedPT <- lavaan::lav_partable_merge(PT, satPT, remove.duplicated = TRUE, warn = FALSE)
nAuxPar <- nrow(mergedPT) - nrow(PT)
newRows <- 1L:nAuxPar + nrow(PT)
#TODO: mergedPT$user[newRows] <- 2L (list as constraints to omit printing?) or new code (9L)?
mergedPT$free[newRows] <- 1L:nAuxPar + max(PT$free)
mergedPT$plabel[newRows] <- paste0(".p", 1L:nAuxPar + nrow(PT), ".")
## calculate sample moments as starting values (recycle over groups)
# if (is.null(lavArgs$group)) {
# auxCov <- cov(data[aux], use = "pairwise.complete.obs")
# auxM <- colMeans(data[aux], na.rm = TRUE)
# auxTarget <- cov(data[c(aux, varnames)],
# use = "pairwise.complete.obs")[aux, varnames]
# ## match order of parameters in syntax above
# mergedPT$start[newRows] <- c(auxCov[lower.tri(auxCov, diag = TRUE)],
# auxM, as.numeric(auxTarget))
# } else {
# auxCovs <- list()
# auxMs <- list()
# auxTargets <- list()
# startVals <- numeric(0)
# for (g in unique(data[ , lavArgs$group])) {
# auxCovs[[g]] <- cov(data[data[ , lavArgs$group] == g, aux],
# use = "pairwise.complete.obs")
# auxMs[[g]] <- colMeans(data[data[ , lavArgs$group] == g, aux], na.rm = TRUE)
# auxTargets[[g]] <- cov(data[data[ , lavArgs$group] == g, c(aux, varnames)],
# use = "pairwise.complete.obs")[aux, varnames]
# startVals <- c(startVals, auxCovs[[g]][lower.tri(auxCovs[[g]], diag = TRUE)],
# auxMs[[g]], as.numeric(auxTargets[[g]]))
# }
# ## match order of parameters in syntax above
# mergedPT$start[newRows] <- startVals
# }
lavArgs$model <- lavaan::lav_partable_complete(rbind(mergedPT, CON)) #FIXME for blavaan
result <- do.call(fun, lavArgs, envir = envir)
if (fun %in% c("lavaan","cfa","sem","growth")) {
## specify, fit, and attach an appropriate independence model
baseArgs <- list()
baseArgs$model <- lavaan::lav_partable_complete(satPT)
baseArgs$data <- data
baseArgs$group <- lavArgs$group
baseArgs$group.label <- lavArgs$group.label
baseArgs$missing <- "fiml"
baseArgs$cluster <- lavArgs$cluster
baseArgs$sample.cov.rescale <- lavArgs$sample.cov.rescale
baseArgs$estimator <- lavArgs$estimator
baseArgs$information <- lavArgs$information
baseArgs$se <- lavArgs$se
baseArgs$test <- lavArgs$test
baseArgs$bootstrap <- lavArgs$bootstrap
baseArgs$control <- lavArgs$control
baseArgs$optim.method <- lavArgs$optim.method
result@external$baseline.model <- do.call("lavaan", baseArgs, envir = envir)
result@external$aux <- aux
result@external$baseline.syntax <- satMod
} # else something similar for blavaan? Better to fit manually for blavFitIndices
result
}
##' @rdname auxiliary
##' @aliases lavaan.auxiliary
##' @export
lavaan.auxiliary <- function(model, data, aux, ...,
envir = getNamespace("lavaan")) {
mc <- match.call(expand.dots = TRUE)
mc$fun <- "lavaan"
mc[[1L]] <- quote(semTools::auxiliary)
eval(mc, parent.frame())
}
##' @rdname auxiliary
##' @aliases cfa.auxiliary
##' @export
cfa.auxiliary <- function(model, data, aux, ...,
envir = getNamespace("lavaan")) {
mc <- match.call(expand.dots = TRUE)
mc$fun <- "cfa"
mc[[1L]] <- quote(semTools::auxiliary)
eval(mc, parent.frame())
}
##' @rdname auxiliary
##' @aliases sem.auxiliary
##' @export
sem.auxiliary <- function(model, data, aux, ...,
envir = getNamespace("lavaan")) {
mc <- match.call(expand.dots = TRUE)
mc$fun <- "sem"
mc[[1L]] <- quote(semTools::auxiliary)
eval(mc, parent.frame())
}
##' @rdname auxiliary
##' @aliases growth.auxiliary
##' @export
growth.auxiliary <- function(model, data, aux, ...,
envir = getNamespace("lavaan")) {
mc <- match.call(expand.dots = TRUE)
mc$fun <- "growth"
mc[[1L]] <- quote(semTools::auxiliary)
eval(mc, parent.frame())
}
semTools/R/kd.R 0000644 0001762 0000144 00000010101 14632016456 013001 0 ustar ligges users ### Edgar Merkle
### Last updated: 10 January 2021
### Kaiser-Dickman (1962) algorithm for generating sample data
### based on the input covmat, which is a covariance matrix.
##' Generate data via the Kaiser-Dickman (1962) algorithm.
##'
##' Given a covariance matrix and sample size, generate raw data that correspond
##' to the covariance matrix. Data can be generated to match the covariance
##' matrix exactly, or to be a sample from the population covariance matrix.
##'
##' By default, R's `cov()` function divides by `n`-1. The data
##' generated by this algorithm result in a covariance matrix that matches
##' `covmat`, but you must divide by `n` instead of `n`-1.
##'
##'
##' @importFrom stats cov2cor rnorm
##'
##' @param covmat a symmetric, positive definite covariance matrix
##' @param n the sample size for the data that will be generated
##' @param type type of data generation. `exact` generates data that
##' exactly correspond to `covmat`. `sample` treats `covmat` as
##' a poulation covariance matrix, generating a sample of size `n`.
##'
##' @return `kd` returns a data matrix of dimension `n` by
##' `nrow(covmat)`.
##'
##' @author Ed Merkle (University of Missouri; \email{merklee@@missouri.edu})
##'
##' @references Kaiser, H. F. and Dickman, K. (1962). Sample and population
##' score matrices and sample correlation matrices from an arbitrary population
##' correlation matrix. *Psychometrika, 27*(2), 179--182.
##' \doi{10.1007/BF02289635}
##'
##' @examples
##'
##' #### First Example
##'
##' ## Get data
##' dat <- HolzingerSwineford1939[ , 7:15]
##' hs.n <- nrow(dat)
##'
##' ## Covariance matrix divided by n
##' hscov <- ((hs.n-1)/hs.n) * cov(dat)
##'
##' ## Generate new, raw data corresponding to hscov
##' newdat <- kd(hscov, hs.n)
##'
##' ## Difference between new covariance matrix and hscov is minimal
##' newcov <- (hs.n-1)/hs.n * cov(newdat)
##' summary(as.numeric(hscov - newcov))
##'
##' ## Generate sample data, treating hscov as population matrix
##' newdat2 <- kd(hscov, hs.n, type = "sample")
##'
##' #### Another example
##'
##' ## Define a covariance matrix
##' covmat <- matrix(0, 3, 3)
##' diag(covmat) <- 1.5
##' covmat[2:3,1] <- c(1.3, 1.7)
##' covmat[3,2] <- 2.1
##' covmat <- covmat + t(covmat)
##'
##' ## Generate data of size 300 that have this covariance matrix
##' rawdat <- kd(covmat, 300)
##'
##' ## Covariances are exact if we compute sample covariance matrix by
##' ## dividing by n (vs by n - 1)
##' summary(as.numeric((299/300)*cov(rawdat) - covmat))
##'
##' ## Generate data of size 300 where covmat is the population covariance matrix
##' rawdat2 <- kd(covmat, 300)
##'
##' @export
kd <- function(covmat, n, type=c("exact","sample")) {
type <- match.arg(type)
## Check to ensure that covmat is a valid covariance matrix.
if (nrow(covmat) != ncol(covmat)) stop("non-square matrix supplied")
symmetric <- isSymmetric.matrix(covmat)
if (!symmetric) stop("non-symmetric matrix supplied")
pd <- all(eigen(covmat, only.values = TRUE)$values > 0)
if (!pd) stop("covariance matrix is not positive definite")
p <- nrow(covmat)
## Algorithm works on a correlation matrix
mv.vars <- matrix(0, nrow(covmat), nrow(covmat))
diag(mv.vars) <- sqrt(diag(covmat))
cormat <- cov2cor(covmat)
## Generate standard normal data and mean center each variable
Xscore <- matrix(rnorm(p*n), p, n)
Xsub0 <- t(apply(Xscore, 1, scale, scale = FALSE))
## Correlation matrix factored via Cholesky decomposition:
Fcomp <- t(chol(cormat))
## Equation 2 from K&D:
Zhat <- Fcomp %*% Xscore
## Equation 3 from K&D:
Xsub0.prod <- Xsub0 %*% t(Xsub0)
## Get singular value decomp of Xsub0.prod
Xsub0.svd <- svd(Xsub0.prod)
M.sqrt <- matrix(0, p, p)
diag(M.sqrt) <- 1 / sqrt(Xsub0.svd$d)
## Equation 5 from K&D:
Z <- Fcomp %*% M.sqrt %*% t(Xsub0.svd$u) %*% Xsub0
Z <- Z * sqrt(n)
dat <- Z
if (type == "sample") { dat <- Zhat }
## Scale data to correspond to covmat
dat <- t(dat) %*% mv.vars
## convert to data.frame, use any existing names from covmat
dat <- data.frame(dat)
if(!is.null(colnames(covmat))) names(dat) <- colnames(covmat)
dat
}
semTools/R/probeInteraction.R 0000644 0001762 0000144 00000245461 14753073565 015745 0 ustar ligges users ### Sunthud Pornprasertmanit & Terrence D. Jorgensen
### Last updated: 12 February 2025
## --------
## 2-way MC
## --------
##' Probing two-way interaction on the no-centered or mean-centered latent
##' interaction
##'
##' Probing interaction for simple intercept and simple slope for the
##' no-centered or mean-centered latent two-way interaction
##'
##' Before using this function, researchers need to make the products of the
##' indicators between the first-order factors using mean centering (Marsh, Wen,
##' & Hau, 2004). Note that the double-mean centering may not be appropriate for
##' probing interaction if researchers are interested in simple intercepts. The
##' mean or double-mean centering can be done by the [indProd()]
##' function. The indicator products can be made for all possible combination or
##' matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
##' with the regression with latent interaction will be used to fit all original
##' indicators and the product terms. See the example for how to fit the product
##' term below. Once the lavaan result is obtained, this function will be used
##' to probe the interaction.
##'
##' Let that the latent interaction model regressing the dependent variable
##' (\eqn{Y}) on the independent variable (\eqn{X}) and the moderator (\eqn{Z})
##' be \deqn{ Y = b_0 + b_1X + b_2Z + b_3XZ + r, } where \eqn{b_0} is the
##' estimated intercept or the expected value of \eqn{Y} when both \eqn{X} and
##' \eqn{Z} are 0, \eqn{b_1} is the effect of \eqn{X} when \eqn{Z} is 0,
##' \eqn{b_2} is the effect of \eqn{Z} when \eqn{X} is 0, \eqn{b_3} is the
##' interaction effect between \eqn{X} and \eqn{Z}, and \eqn{r} is the residual
##' term.
##'
##' To probe a two-way interaction, the simple intercept of the independent
##' variable at each value of the moderator (Aiken & West, 1991; Cohen, Cohen,
##' West, & Aiken, 2003; Preacher, Curran, & Bauer, 2006) can be obtained by
##' \deqn{ b_{0|X = 0, Z} = b_0 + b_2 Z. }
##'
##' The simple slope of the independent varaible at each value of the moderator
##' can be obtained by
##' \deqn{ b_{X|Z} = b_1 + b_3 Z. }
##'
##' The variance of the simple intercept formula is
##' \deqn{ Var\left(b_{0|X = 0, Z}\right) =
##' Var\left(b_0\right) + 2Z \times Cov\left(b_0, b_2\right) +
##' Z^2 \times Var\left(b_2\right) },
##' where \eqn{Var} denotes the variance of a parameter
##' estimate and \eqn{Cov} denotes the covariance of two parameter estimates.
##'
##' The variance of the simple slope formula is
##' \deqn{ Var\left(b_{X|Z}\right) = Var\left(b_1\right) + 2Z \times
##' Cov\left(b_1, b_3\right) + Z^2 \times Var\left(b_3\right) }
##'
##' Wald *z* statistic is used for test statistic (even for objects of
##' class [lavaan.mi::lavaan.mi-class]).
##'
##'
##' @importFrom lavaan lavInspect parTable
##' @importFrom stats pnorm
##' @importFrom methods getMethod
##'
##' @param fit A fitted [lavaan::lavaan-class] or
##' [lavaan.mi::lavaan.mi-class] object with a latent 2-way interaction.
##' @param nameX `character` vector of all 3 factor names used as the
##' predictors. The lower-order factors must be listed first, and the final
##' name must be the latent interaction factor.
##' @param nameY The name of factor that is used as the dependent variable.
##' @param modVar The name of factor that is used as a moderator. The effect of
##' the other independent factor will be probed at each value of the
##' moderator variable listed in `valProbe`.
##' @param valProbe The values of the moderator that will be used to probe the
##' effect of the focal predictor.
##' @param group In multigroup models, the label of the group for which the
##' results will be returned. Must correspond to one of
##' `lavInspect(fit, "group.label")`, or an integer
##' corresponding to which of those group labels.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Ignored unless `fit` is of
##' class [lavaan.mi::lavaan.mi-class]. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return A list with two elements:
##' \enumerate{
##' \item `SimpleIntercept`: The simple intercepts given each value of the
##' moderator.
##' \item `SimpleSlope`: The simple slopes given each value of the moderator.
##' }
##' In each element, the first column represents the values of the moderator
##' specified in the `valProbe` argument. The second column is the simple
##' intercept or simple slope. The third column is the *SE* of the simple
##' intercept or simple slope. The fourth column is the Wald (*z*)
##' statistic, and the fifth column is the associated *p* value testing
##' the null hypothesis that each simple intercept or slope is 0.
##'
##' @author
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso \itemize{
##' \item [indProd()] For creating the indicator products with no
##' centering, mean centering, double-mean centering, or residual centering.
##' \item [probe3WayMC()] For probing the three-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe2WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [probe3WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [plotProbe()] Plot the simple intercepts and slopes of the
##' latent interaction.
##' }
##'
##' @references
##'
##' Tutorial:
##'
##' Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
##' latent variable interactions using the `semTools` package.
##' *Psych, 3*(3), 322--335. \doi{10.3390/psych3030024}
##'
##' Background literature:
##'
##' Aiken, L. S., & West, S. G. (1991). *Multiple regression: Testing
##' and interpreting interactions*. Newbury Park, CA: Sage.
##'
##' Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). *Applied
##' multiple regression/correlation analysis for the behavioral sciences*
##' (3rd ed.). New York, NY: Routledge.
##'
##' Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of
##' latent interactions: Evaluation of alternative estimation strategies and
##' indicator construction. *Psychological Methods, 9*(3), 275--300.
##' \doi{10.1037/1082-989X.9.3.275}
##'
##' Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools
##' for probing interactions in multiple linear regression, multilevel modeling,
##' and latent curve analysis. *Journal of Educational and Behavioral
##' Statistics, 31*(4), 437--448. \doi{10.3102/10769986031004437}
##'
##' @examples
##'
##' dat2wayMC <- indProd(dat2way, 1:3, 4:6) # double mean centered by default
##'
##' model1 <- "
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f3 =~ x7 + x8 + x9
##' f3 ~ f1 + f2 + f12
##' f12 ~~ 0*f1 + 0*f2 # not necessary, but implied by double mean centering
##' "
##'
##' fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
##' summary(fitMC2way)
##'
##' probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"), nameY = "f3",
##' modVar = "f2", valProbe = c(-1, 0, 1))
##'
##'
##' ## can probe multigroup models, one group at a time
##' dat2wayMC$g <- 1:2
##'
##' model2 <- "
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f3 =~ x7 + x8 + x9
##' f3 ~ c(b1.g1, b1.g2)*f1 + c(b2.g1, b2.g2)*f2 + c(b12.g1, b12.g2)*f12
##' f12 ~~ 0*f1 + 0*f2
##' "
##' fit2 <- sem(model2, data = dat2wayMC, group = "g")
##' probe2WayMC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
##' modVar = "f2", valProbe = c(-1, 0, 1)) # group = 1 by default
##' probe2WayMC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
##' modVar = "f2", valProbe = c(-1, 0, 1), group = 2)
##'
##' @export
probe2WayMC <- function(fit, nameX, nameY, modVar, valProbe, group = 1L,
omit.imps = c("no.conv","no.se")) {
## TDJ: verify class
if (inherits(fit, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(fit@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(fit@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(fit@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(fit@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(fit@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
## custom removal by imputation number
rm.imps <- omit.imps[ which(omit.imps %in% 1:length(useImps)) ]
if (length(rm.imps)) useImps[as.numeric(rm.imps)] <- FALSE
## whatever is left
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
} else if (!inherits(fit, "lavaan")) {
stop('"fit" must inherit from lavaan or lavaan.mi class', call. = FALSE)
}
meanstruc <- lavInspect(fit, "options")$meanstructure
# Check whether modVar is correct
if (is.character(modVar)) modVar <- match(modVar, nameX)
if (is.na(modVar) || !(modVar %in% 1:2))
stop("The moderator name is not in the name of independent factors or not 1 or 2.")
## TDJ: If multigroup, check group %in% group.label
nG <- lavInspect(fit, "ngroups")
if (nG > 1L) {
group.label <- lavInspect(fit, "group.label")
## assign numeric to character
if (is.numeric(group)) {
if (group %in% 1:nG) {
group <- group.label[group]
} else group <- as.character(group)
} else group <- as.character(group)
## check that character is a group
if (!as.character(group) %in% group.label)
stop('"group" must be a character string naming a group of interest, or ',
'an integer corresponding to a group in lavInspect(fit, "group.label")')
group.number <- which(group.label == group)
} else group.number <- 1L
## Get the parameter estimates for that group
if (nG > 1L) {
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")[[group]]
if (!meanstruc) {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
GLIST <- fit@coefList[useImps]
est$beta <- Reduce("+", lapply(GLIST, function(i) i[[group]]$beta)) / m
if (meanstruc) {
est$alpha <- Reduce("+", lapply(GLIST, function(i) i[[group]]$alpha)) / m
} else {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
}
} else {
## single-group model
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")
if (!meanstruc) {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
est$beta <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "beta")) / m
if (meanstruc) {
est$alpha <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "alpha")) / m
} else {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
}
}
## extract relevant slopes
betaNC <- matrix(est$beta[nameY, nameX], ncol = 1,
dimnames = list(nameX, nameY))
if (modVar == 1) betaNC <- betaNC[c(2, 1, 3)]
## Extract all sampling (co)variances
if (inherits(fit, "lavaan")) {
varEst <- lavaan::vcov(fit)
} else if (inherits(fit, "lavaan.mi")) {
varEst <- getMethod(f = "vcov", signature = "lavaan.mi",
where = getNamespace("lavaan.mi"))(fit, omit.imps = omit.imps)
}
## for indexing the sampling (co)variances
PT <- parTable(fit)
## index slopes in varEst
slope.idx <- which(PT$lhs == nameY & PT$op == "~" & PT$rhs %in% nameX & PT$group == group.number)
## check for user-defined labels; otherwise, set defaults
slope.label <- PT$label[slope.idx]
if (any(slope.label == "")) for (i in which(slope.label == "")) {
slope.label[i] <- paste0(nameY, "~", nameX[i],
ifelse(nG > 1L && group.number > 1L, no = "",
yes = paste0(".g", group.number)))
}
if (modVar == 1) slope.label <- slope.label[c(3, 2, 4)]
## index intercept in varEst
if (meanstruc) {
int.idx <- PT$label[PT$lhs == nameY & PT$op == "~1" & PT$group == group.number]
if (int.idx == "") {
## no custom label, use default
int.idx <- paste0(nameY, "~1")
if (nG > 1L && group.number > 1L) {
int.idx <- paste0(int.idx, ".g", group.number)
}
}
## check whether it is actually estimated (appears in $alpha and varEst)
estimateIntcept <- int.idx %in% rownames(varEst)
} else {
estimateIntcept <- FALSE
## use default label
int.idx <- paste0(nameY, "~1")
if (nG > 1L && group.number > 1L) {
int.idx <- paste0(int.idx, ".g", group.number)
}
}
## if intercept is not estimated, add a row/column with zero sampling variance
if (!estimateIntcept) {
dn <- c(rownames(varEst), int.idx)
varEst <- rbind(cbind(varEst, 0), 0)
dimnames(varEst) <- list(dn, dn)
}
## save all indices in a vector
targetcol <- c(int.idx, slope.label)
## collect relevant estimates and variances to probe the interaction
usedBeta <- rbind(est$alpha[nameY,], betaNC) # point estimates
usedVar <- varEst[targetcol, targetcol] # sampling (co)variances
## function to calculate p values for Wald z tests
pvalue <- function(x) (1 - pnorm(abs(x))) * 2
## Calculate simple intercepts
simpleIntcept <- usedBeta[1] + usedBeta[3] * valProbe
varIntcept <- usedVar[1, 1] + 2 * valProbe * usedVar[1, 3] + (valProbe^2) * usedVar[3, 3]
zIntcept <- simpleIntcept / sqrt(varIntcept)
pIntcept <- pvalue(zIntcept)
resultIntcept <- data.frame(valProbe, simpleIntcept, sqrt(varIntcept), zIntcept, pIntcept)
colnames(resultIntcept) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultIntcept) <- c("lavaan.data.frame","data.frame")
## Calculate simple slopes
simpleSlope <- usedBeta[2] + usedBeta[4] * valProbe
varSlope <- usedVar[2, 2] + 2 * valProbe * usedVar[2, 4] + (valProbe^2) * usedVar[4, 4]
zSlope <- simpleSlope / sqrt(varSlope)
pSlope <- pvalue(zSlope)
resultSlope <- data.frame(valProbe, simpleSlope, sqrt(varSlope), zSlope, pSlope)
colnames(resultSlope) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultSlope) <- c("lavaan.data.frame","data.frame")
list(SimpleIntcept = resultIntcept, SimpleSlope = resultSlope)
}
## --------
## 2-way RC
## --------
##' Probing two-way interaction on the residual-centered latent interaction
##'
##' Probing interaction for simple intercept and simple slope for the
##' residual-centered latent two-way interaction (Geldhof et al., 2013)
##'
##' Before using this function, researchers need to make the products of the
##' indicators between the first-order factors and residualize the products by
##' the original indicators (Lance, 1988; Little, Bovaird, & Widaman, 2006). The
##' process can be automated by the [indProd()] function. Note that
##' the indicator products can be made for all possible combination or
##' matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
##' with the regression with latent interaction will be used to fit all original
##' indicators and the product terms. To use this function the model must be fit
##' with a mean structure. See the example for how to fit the product term
##' below. Once the lavaan result is obtained, this function will be used to
##' probe the interaction.
##'
##' The probing process on residual-centered latent interaction is based on
##' transforming the residual-centered result into the no-centered result. See
##' Geldhof et al. (2013) for further details. Note that this approach is based
##' on a strong assumption that the first-order latent variables are normally
##' distributed. The probing process is applied after the no-centered result
##' (parameter estimates and their covariance matrix among parameter estimates)
##' has been computed. See the [probe2WayMC()] for further details.
##'
##'
##' @importFrom lavaan lavInspect parTable
##' @importFrom stats pnorm
##' @importFrom methods getMethod
##'
##' @param fit A fitted [lavaan::lavaan-class] or
##' [lavaan.mi::lavaan.mi-class] object with a latent 2-way interaction.
##' @param nameX `character` vector of all 3 factor names used as the
##' predictors. The lower-order factors must be listed first, and the final
##' name must be the latent interaction factor.
##' @param nameY The name of factor that is used as the dependent variable.
##' @param modVar The name of factor that is used as a moderator. The effect of
##' the other independent factor will be probed at each value of the
##' moderator variable listed in `valProbe`.
##' @param valProbe The values of the moderator that will be used to probe the
##' effect of the focal predictor.
##' @param group In multigroup models, the label of the group for which the
##' results will be returned. Must correspond to one of
##' `lavInspect(fit, "group.label")`, or an integer
##' corresponding to which of those group labels.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Ignored unless `fit` is of
##' class [lavaan.mi::lavaan.mi-class]. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return A list with two elements:
##' \enumerate{
##' \item `SimpleIntercept`: The simple intercepts given each value of the
##' moderator.
##' \item `SimpleSlope`: The simple slopes given each value of the moderator.
##' }
##' In each element, the first column represents the values of the moderators
##' specified in the `valProbe` argument. The second column is the simple
##' intercept or simple slope. The third column is the standard error of the
##' simple intercept or slope. The fourth column is the Wald (*z*)
##' statistic, and the fifth column is the associated *p* value testing
##' the null hypothesis that each simple intercept or slope is 0.
##'
##' @author
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso \itemize{
##' \item [indProd()] For creating the indicator products with no
##' centering, mean centering, double-mean centering, or residual centering.
##' \item [probe2WayMC()] For probing the two-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe3WayMC()] For probing the three-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe3WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [plotProbe()] Plot the simple intercepts and slopes of the
##' latent interaction.
##' }
##' @references
##'
##' Tutorial:
##'
##' Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
##' latent variable interactions using the `semTools` package.
##' *Psych, 3*(3), 322--335. \doi{10.3390/psych3030024}
##'
##' Background literature:
##'
##' Lance, C. E. (1988). Residual centering, exploratory and confirmatory
##' moderator analysis, and decomposition of effects in path models containing
##' interactions. *Applied Psychological Measurement, 12*(2), 163--175.
##' \doi{10.1177/014662168801200205}
##'
##' Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of
##' orthogonalizing powered and product terms: Implications for modeling
##' interactions. *Structural Equation Modeling, 13*(4), 497--519.
##' \doi{10.1207/s15328007sem1304_1}
##'
##' Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of
##' latent interactions: Evaluation of alternative estimation strategies and
##' indicator construction. *Psychological Methods, 9*(3), 275--300.
##' \doi{10.1037/1082-989X.9.3.275}
##'
##' Geldhof, G. J., Pornprasertmanit, S., Schoemann, A. M., & Little, T. D.
##' (2013). Orthogonalizing through residual centering: Extended applications
##' and caveats. *Educational and Psychological Measurement, 73*(1), 27--46.
##' \doi{10.1177/0013164412445473}
##'
##' @examples
##'
##' dat2wayRC <- orthogonalize(dat2way, 1:3, 4:6)
##'
##' model1 <- "
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f3 =~ x7 + x8 + x9
##' f3 ~ f1 + f2 + f12
##' f12 ~~ 0*f1 + 0*f2
##' x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
##' f1 + f2 + f12 + f3 ~ NA*1
##' "
##'
##' fitRC2way <- sem(model1, data = dat2wayRC, meanstructure = TRUE)
##' summary(fitRC2way)
##'
##' probe2WayRC(fitRC2way, nameX = c("f1", "f2", "f12"), nameY = "f3",
##' modVar = "f2", valProbe = c(-1, 0, 1))
##'
##'
##' ## can probe multigroup models, one group at a time
##' dat2wayRC$g <- 1:2
##'
##' model2 <- "
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f3 =~ x7 + x8 + x9
##' f3 ~ c(b1.g1, b1.g2)*f1 + c(b2.g1, b2.g2)*f2 + c(b12.g1, b12.g2)*f12
##' f12 ~~ 0*f1 + 0*f2
##' x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
##' f1 + f2 + f12 ~ NA*1
##' f3 ~ NA*1 + c(b0.g1, b0.g2)*1
##' "
##' fit2 <- sem(model2, data = dat2wayRC, group = "g")
##' probe2WayRC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
##' modVar = "f2", valProbe = c(-1, 0, 1)) # group = 1 by default
##' probe2WayRC(fit2, nameX = c("f1", "f2", "f12"), nameY = "f3",
##' modVar = "f2", valProbe = c(-1, 0, 1), group = 2)
##'
##' @export
probe2WayRC <- function(fit, nameX, nameY, modVar, valProbe, group = 1L,
omit.imps = c("no.conv","no.se")) {
## TDJ: verify class
if (inherits(fit, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(fit@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(fit@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(fit@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(fit@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(fit@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
## custom removal by imputation number
rm.imps <- omit.imps[ which(omit.imps %in% 1:length(useImps)) ]
if (length(rm.imps)) useImps[as.numeric(rm.imps)] <- FALSE
## whatever is left
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
} else if (!inherits(fit, "lavaan")) {
stop('"fit" must inherit from lavaan or lavaan.mi class', call. = FALSE)
}
if (!lavInspect(fit, "options")$meanstructure)
stop('This function requires the model to be fit with a mean structure.',
call. = FALSE)
# Check whether modVar is correct
if (is.character(modVar)) modVar <- match(modVar, nameX)
if (is.na(modVar) || !(modVar %in% 1:2))
stop("The moderator name is not in the name of independent factors or not 1 or 2.")
## TDJ: If multigroup, check group %in% group.label
nG <- lavInspect(fit, "ngroups")
if (nG > 1L) {
group.label <- lavInspect(fit, "group.label")
## assign numeric to character
if (is.numeric(group)) {
if (group %in% 1:nG) {
group <- group.label[group]
} else group <- as.character(group)
} else group <- as.character(group)
## check that character is a group
if (!as.character(group) %in% group.label)
stop('"group" must be a character string naming a group of interest, or ',
'an integer corresponding to a group in lavInspect(fit, "group.label")')
group.number <- which(group.label == group)
} else group.number <- 1L
## Get the parameter estimates for that group
if (nG > 1L) {
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")[[group]]
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
GLIST <- fit@coefList[useImps]
est$beta <- Reduce("+", lapply(GLIST, function(i) i[[group]]$beta)) / m
est$alpha <- Reduce("+", lapply(GLIST, function(i) i[[group]]$alpha)) / m
est$psi <- Reduce("+", lapply(GLIST, function(i) i[[group]]$psi)) / m
}
} else {
## single-group model
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
est$beta <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "beta")) / m
est$alpha <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "alpha")) / m
est$psi <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "psi")) / m
}
}
# Find the mean and covariance matrix of independent factors
varX <- est$psi[ nameX, nameX]
meanX <- est$alpha[nameX, , drop = FALSE]
# Find the intercept, regression coefficients, and residual variance of residual-centered regression
intceptRC <- est$alpha[nameY,]
resVarRC <- est$psi[ nameY, nameY]
betaRC <- t(est$beta[ nameY, nameX, drop = FALSE])
# Find the number of observations
numobs <- lavInspect(fit, "nobs")[group.number]
# Compute SSRC
meanXwith1 <- rbind(1, meanX)
varXwith0 <- cbind(0, rbind(0, varX))
SSRC <- numobs * (varXwith0 + (meanXwith1 %*% t(meanXwith1)))
# Compute Mean(Y) and Var(Y)
betaRCWithIntcept <- rbind(intceptRC, betaRC)
meanY <- t(meanXwith1) %*% betaRCWithIntcept
varY <- (t(betaRCWithIntcept) %*% SSRC %*% betaRCWithIntcept)/numobs - meanY^2 + resVarRC
# Compute Cov(Y, X)
covY <- as.matrix((varX %*% betaRC)[1:2,])
# Compute E(XZ)
meanX[3] <- meanX[1] * meanX[2] + varX[1, 2]
# Compute Var(XZ)
varX[3, 3] <- meanX[1]^2 * varX[2, 2] + meanX[2]^2 * varX[1, 1] + 2 * meanX[1] * meanX[2] * varX[1, 2] + varX[1, 1] * varX[2, 2] + varX[1, 2]^2
# Compute Cov(X, XZ), Cov(Z, XZ)
varX[1, 3] <- varX[3, 1] <- meanX[1] * varX[1, 2] + meanX[2] * varX[1, 1]
varX[2, 3] <- varX[3, 2] <- meanX[1] * varX[2, 2] + meanX[2] * varX[1, 2]
# Compute Cov(Y, XZ) and regression coefficients of no-centering
betaNC <- solve(varX[1:2,1:2], covY - rbind(varX[1,3] * betaRC[3,1], varX[2, 3] * betaRC[3,1]))
betaNC <- rbind(betaNC, betaRC[3, 1])
covY <- rbind(covY, (varX %*% betaNC)[3, 1])
# Aggregate the non-centering sufficient statistics (Just show how to do but not necessary)
fullCov <- rbind(cbind(varX, covY), c(covY, varY))
fullMean <- rbind(meanX, meanY)
# Compute the intercept of no-centering
intceptNC <- meanY - t(betaNC) %*% meanX
# Compute SSNC
betaNCWithIntcept <- rbind(intceptNC, betaNC)
meanXwith1 <- rbind(1, meanX)
varXwith0 <- rbind(0, cbind(0, varX))
SSNC <- numobs * (varXwith0 + (meanXwith1 %*% t(meanXwith1)))
# Compute residual variance on non-centering
resVarNC <- varY - (t(betaNCWithIntcept) %*% SSNC %*% betaNCWithIntcept)/numobs + meanY^2
## function to calculate p values for Wald z tests
pvalue <- function(x) (1 - pnorm(abs(x))) * 2
## Extract all sampling (co)variances
if (inherits(fit, "lavaan")) {
varEst <- lavaan::vcov(fit)
} else if (inherits(fit, "lavaan.mi")) {
varEst <- getMethod(f = "vcov", signature = "lavaan.mi",
where = getNamespace("lavaan.mi"))(fit, omit.imps = omit.imps)
}
## for indexing sampling (co)variances
PT <- parTable(fit)
int.idx <- PT$label[PT$lhs == nameY & PT$op == "~1" & PT$group == group.number]
if (int.idx == "") {
## no custom label, use default
int.idx <- paste0(nameY, "~1")
if (nG > 1L && group.number > 1L) {
int.idx <- paste0(int.idx, ".g", group.number)
}
}
## check intercept is actually estimated (thus, has sampling variance)
estimateIntcept <- int.idx %in% rownames(varEst)
## if intercept is not estimated, add a row/column with zero sampling variance
if (!estimateIntcept) {
dn <- c(rownames(varEst), int.idx)
varEst <- rbind(cbind(varEst, 0), 0)
dimnames(varEst) <- list(dn, dn)
}
## index slopes in varEst
slope.idx <- which(PT$lhs == nameY & PT$op == "~" & PT$rhs %in% nameX & PT$group == group.number)
## check for user-defined labels; otherwise, set defaults
slope.label <- PT$label[slope.idx]
if (any(slope.label == "")) for (i in which(slope.label == "")) {
slope.label[i] <- paste0(nameY, "~", nameX[i],
ifelse(nG > 1L && group.number > 1L, no = "",
yes = paste0(".g", group.number)))
}
## save all indices in a vector
targetcol <- c(int.idx, slope.label)
varEstSlopeRC <- varEst[targetcol, targetcol]
## collect relevant estimates and variances to probe the interaction
usedVar <- as.numeric(resVarNC/resVarRC) * (varEstSlopeRC %*% SSRC %*% solve(SSNC))
usedBeta <- betaNCWithIntcept
# Change the order of usedVar and usedBeta if the moderator variable is listed first
if (modVar == 1) {
usedVar <- usedVar[c(1, 3, 2, 4), c(1, 3, 2, 4)]
usedBeta <- usedBeta[c(1, 3, 2, 4)]
}
## Calculate simple intercepts
simpleIntcept <- usedBeta[1] + usedBeta[3] * valProbe
varIntcept <- usedVar[1, 1] + 2 * valProbe * usedVar[1, 3] + (valProbe^2) * usedVar[3, 3]
zIntcept <- simpleIntcept/sqrt(varIntcept)
pIntcept <- pvalue(zIntcept)
resultIntcept <- data.frame(valProbe, simpleIntcept, sqrt(varIntcept), zIntcept, pIntcept)
colnames(resultIntcept) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultIntcept) <- c("lavaan.data.frame","data.frame")
## Calculate simple slopes
simpleSlope <- usedBeta[2] + usedBeta[4] * valProbe
varSlope <- usedVar[2, 2] + 2 * valProbe * usedVar[2, 4] + (valProbe^2) * usedVar[4, 4]
zSlope <- simpleSlope/sqrt(varSlope)
pSlope <- pvalue(zSlope)
resultSlope <- data.frame(valProbe, simpleSlope, sqrt(varSlope), zSlope, pSlope)
colnames(resultSlope) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultSlope) <- c("lavaan.data.frame","data.frame")
list(SimpleIntcept = resultIntcept, SimpleSlope = resultSlope)
}
## --------
## 3-way MC
## --------
##' Probing three-way interaction on the no-centered or mean-centered latent
##' interaction
##'
##' Probing interaction for simple intercept and simple slope for the
##' no-centered or mean-centered latent two-way interaction
##'
##' Before using this function, researchers need to make the products of the
##' indicators between the first-order factors using mean centering (Marsh, Wen,
##' & Hau, 2004). Note that the double-mean centering may not be appropriate for
##' probing interaction if researchers are interested in simple intercepts. The
##' mean or double-mean centering can be done by the [indProd()]
##' function. The indicator products can be made for all possible combination or
##' matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
##' with the regression with latent interaction will be used to fit all original
##' indicators and the product terms. See the example for how to fit the product
##' term below. Once the lavaan result is obtained, this function will be used
##' to probe the interaction.
##'
##' Let that the latent interaction model regressing the dependent variable
##' (\eqn{Y}) on the independent variable (\eqn{X}) and two moderators (\eqn{Z}
##' and \eqn{W}) be \deqn{ Y = b_0 + b_1X + b_2Z + b_3W + b_4XZ + b_5XW + b_6ZW
##' + b_7XZW + r, } where \eqn{b_0} is the estimated intercept or the expected
##' value of \eqn{Y} when \eqn{X}, \eqn{Z}, and \eqn{W} are 0, \eqn{b_1} is the
##' effect of \eqn{X} when \eqn{Z} and \eqn{W} are 0, \eqn{b_2} is the effect of
##' \eqn{Z} when \eqn{X} and \eqn{W} is 0, \eqn{b_3} is the effect of \eqn{W}
##' when \eqn{X} and \eqn{Z} are 0, \eqn{b_4} is the interaction effect between
##' \eqn{X} and \eqn{Z} when \eqn{W} is 0, \eqn{b_5} is the interaction effect
##' between \eqn{X} and \eqn{W} when \eqn{Z} is 0, \eqn{b_6} is the interaction
##' effect between \eqn{Z} and \eqn{W} when \eqn{X} is 0, \eqn{b_7} is the
##' three-way interaction effect between \eqn{X}, \eqn{Z}, and \eqn{W}, and
##' \eqn{r} is the residual term.
##'
##' To probe a three-way interaction, the simple intercept of the independent
##' variable at the specific values of the moderators (Aiken & West, 1991) can
##' be obtained by
##' \deqn{ b_{0|X = 0, Z, W} = b_0 + b_2Z + b_3W + b_6ZW. }
##' The simple slope of the independent variable at the specific values of the
##' moderators can be obtained by
##' \deqn{ b_{X|Z, W} = b_1 + b_3Z + b_4W + b_7ZW.}
##' The variance of the simple intercept formula is
##' \deqn{ Var\left(b_{0|X = 0, Z, W}\right) =
##' Var\left(b_0\right) + Z^2Var\left(b_2\right) + W^2Var\left(b_3\right) +
##' Z^2W^2Var\left(b_6\right)}
##' \deqn{+ 2ZCov\left(b_0, b_2\right) + 2WCov\left(b_0, b_3\right) +
##' 2ZWCov\left(b_0, b_6\right) + 2ZWCov\left(b_2, b_3\right) +
##' 2Z^2WCov\left(b_2, b_6\right) + 2ZW^2Cov\left(b_3, b_6\right), }
##' where \eqn{Var} denotes the variance of a parameter estimate and \eqn{Cov}
##' denotes the covariance of two parameter estimates.
##'
##' The variance of the simple slope formula is
##' \deqn{ Var\left(b_{X|Z, W}\right) =
##' Var\left(b_1\right) + Z^2Var\left(b_4\right) + W^2Var\left(b_5\right)
##' + Z^2W^2Var\left(b_7\right) }
##' \deqn{+ 2ZCov\left(b_1, b_4\right) + 2WCov\left(b_1, b_5\right) +
##' 2ZWCov\left(b_1, b_7\right) + 2ZWCov\left(b_4, b_5\right) +
##' 2Z^2WCov\left(b_4, b_7\right) + 2ZW^2Cov\left(b_5, b_7\right). }
##'
##' Wald *z* statistics are calculated (even for objects of class
##' [lavaan.mi::lavaan.mi-class]) to test null hypotheses that simple
##' intercepts or slopes are 0.
##'
##'
##' @importFrom lavaan lavInspect parTable
##' @importFrom stats pnorm
##' @importFrom methods getMethod
##'
##' @param fit A fitted [lavaan::lavaan-class] or
##' [lavaan.mi::lavaan.mi-class] object with a latent 2-way interaction.
##' @param nameX `character` vector of all 7 factor names used as the
##' predictors. The 3 lower-order factors must be listed first, followed by
##' the 3 second-order factors (specifically, the 4th element must be the
##' interaction between the factors listed first and second, the 5th element
##' must be the interaction between the factors listed first and third, and
##' the 6th element must be the interaction between the factors listed second
##' and third). The final name will be the factor representing the 3-way
##' interaction.
##' @param nameY The name of factor that is used as the dependent variable.
##' @param modVar The name of two factors that are used as the moderators. The
##' effect of the independent factor will be probed at each combination of
##' the moderator variables' chosen values.
##' @param valProbe1 The values of the first moderator that will be used to
##' probe the effect of the independent factor.
##' @param valProbe2 The values of the second moderator that will be used to
##' probe the effect of the independent factor.
##' @param group In multigroup models, the label of the group for which the
##' results will be returned. Must correspond to one of
##' `lavInspect(fit, "group.label")`.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Ignored unless `fit` is of
##' class [lavaan.mi::lavaan.mi-class]. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return A list with two elements:
##' \enumerate{
##' \item `SimpleIntercept`: The model-implied intercepts given each
##' combination of moderator values.
##' \item `SimpleSlope`: The model-implied slopes given each combination
##' of moderator values.
##' }
##' In each element, the first column represents values of the first moderator
##' specified in the `valProbe1` argument. The second column represents
##' values of the second moderator specified in the `valProbe2` argument.
##' The third column is the simple intercept or simple slope. The fourth column
##' is the standard error of the simple intercept or simple slope. The fifth
##' column is the Wald (*z*) statistic, and the sixth column is its
##' associated *p* value to test the null hypothesis that each simple
##' intercept or simple slope equals 0.
##'
##' @author
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso \itemize{
##' \item [indProd()] For creating the indicator products with no
##' centering, mean centering, double-mean centering, or residual centering.
##' \item [probe2WayMC()] For probing the two-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe2WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [probe3WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [plotProbe()] Plot the simple intercepts and slopes of the
##' latent interaction.
##' }
##'
##' @references
##' Tutorial:
##'
##' Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
##' latent variable interactions using the `semTools` package.
##' *Psych, 3*(3), 322--335. \doi{10.3390/psych3030024}
##'
##' Background literature:
##'
##' Aiken, L. S., & West, S. G. (1991). *Multiple regression: Testing
##' and interpreting interactions*. Newbury Park, CA: Sage.
##'
##' Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of
##' latent interactions: Evaluation of alternative estimation strategies and
##' indicator construction. *Psychological Methods, 9*(3), 275--300.
##' \doi{10.1037/1082-989X.9.3.275}
##'
##' @examples
##'
##' dat3wayMC <- indProd(dat3way, 1:3, 4:6, 7:9)
##'
##' model3 <- " ## define latent variables
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f3 =~ x7 + x8 + x9
##' ## 2-way interactions
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f13 =~ x1.x7 + x2.x8 + x3.x9
##' f23 =~ x4.x7 + x5.x8 + x6.x9
##' ## 3-way interaction
##' f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
##' ## outcome variable
##' f4 =~ x10 + x11 + x12
##'
##' ## latent regression model
##' f4 ~ b1*f1 + b2*f2 + b3*f3 + b12*f12 + b13*f13 + b23*f23 + b123*f123
##'
##' ## orthogonal terms among predictors
##' ## (not necessary, but implied by double mean centering)
##' f1 ~~ 0*f12 + 0*f13 + 0*f123
##' f2 ~~ 0*f12 + 0*f23 + 0*f123
##' f3 ~~ 0*f13 + 0*f23 + 0*f123
##' f12 + f13 + f23 ~~ 0*f123
##' "
##'
##' fitMC3way <- sem(model3, data = dat3wayMC, meanstructure = TRUE)
##' summary(fitMC3way)
##'
##' probe3WayMC(fitMC3way, nameX = c("f1" ,"f2" ,"f3",
##' "f12","f13","f23", # this order matters!
##' "f123"), # 3-way interaction
##' nameY = "f4", modVar = c("f1", "f2"),
##' valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
##'
##' @export
probe3WayMC <- function(fit, nameX, nameY, modVar, valProbe1, valProbe2,
group = 1L, omit.imps = c("no.conv","no.se")) {
## TDJ: verify class
if (inherits(fit, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(fit@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(fit@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(fit@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(fit@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(fit@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
## custom removal by imputation number
rm.imps <- omit.imps[ which(omit.imps %in% 1:length(useImps)) ]
if (length(rm.imps)) useImps[as.numeric(rm.imps)] <- FALSE
## whatever is left
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
} else if (!inherits(fit, "lavaan")) {
stop('"fit" must inherit from lavaan or lavaan.mi class', call. = FALSE)
}
meanstruc <- lavInspect(fit, "options")$meanstructure
# Check whether modVar is correct
if (is.character(modVar)) modVar <- match(modVar, nameX)
if ((NA %in% modVar) || !(do.call("&", as.list(modVar %in% 1:3))))
stop("The moderator name is not in the list of independent factors and is not 1, 2 or 3.")
## TDJ: If multigroup, check group %in% group.label
nG <- lavInspect(fit, "ngroups")
if (nG > 1L) {
group.label <- lavInspect(fit, "group.label")
## assign numeric to character
if (is.numeric(group)) {
if (group %in% 1:nG) {
group <- group.label[group]
} else group <- as.character(group)
} else group <- as.character(group)
## check that character is a group
if (!as.character(group) %in% group.label)
stop('"group" must be a character string naming a group of interest, or ',
'an integer corresponding to a group in lavInspect(fit, "group.label")')
group.number <- which(group.label == group)
} else group.number <- 1L
## Get the parameter estimates for that group
if (nG > 1L) {
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")[[group]]
if (!meanstruc) {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
GLIST <- fit@coefList[useImps]
est$beta <- Reduce("+", lapply(GLIST, function(i) i[[group]]$beta)) / m
if (meanstruc) {
est$alpha <- Reduce("+", lapply(GLIST, function(i) i[[group]]$alpha)) / m
} else {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
}
} else {
## single-group model
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")
if (!meanstruc) {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
est$beta <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "beta")) / m
if (meanstruc) {
est$alpha <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "alpha")) / m
} else {
est$alpha <- matrix(0, 1, 1, dimnames = list(nameY, "intercept"))
}
}
}
## extract relevant slopes
betaNC <- matrix(est$beta[nameY, nameX], ncol = 1,
dimnames = list(nameX, nameY))
## Extract all sampling (co)variances
if (inherits(fit, "lavaan")) {
varEst <- lavaan::vcov(fit)
} else if (inherits(fit, "lavaan.mi")) {
varEst <- getMethod(f = "vcov", signature = "lavaan.mi",
where = getNamespace("lavaan.mi"))(fit, omit.imps = omit.imps)
}
## for indexing the sampling (co)variances
PT <- parTable(fit)
## index slopes in varEst
slope.idx <- which(PT$lhs == nameY & PT$op == "~" & PT$rhs %in% nameX & PT$group == group.number)
## check for user-defined labels; otherwise, set defaults
slope.label <- PT$label[slope.idx]
if (any(slope.label == "")) for (i in which(slope.label == "")) {
slope.label[i] <- paste0(nameY, "~", nameX[i],
ifelse(nG > 1L && group.number > 1L, no = "",
yes = paste0(".g", group.number)))
}
## index intercept in varEst
if (meanstruc) {
int.idx <- PT$label[PT$lhs == nameY & PT$op == "~1" & PT$group == group.number]
if (int.idx == "") {
## no custom label, use default
int.idx <- paste0(nameY, "~1")
if (nG > 1L && group.number > 1L) {
int.idx <- paste0(int.idx, ".g", group.number)
}
}
## check whether it is actually estimated (appears in $alpha and varEst)
estimateIntcept <- int.idx %in% rownames(varEst)
} else {
estimateIntcept <- FALSE
## use default label
int.idx <- paste0(nameY, "~1")
if (nG > 1L && group.number > 1L) {
int.idx <- paste0(int.idx, ".g", group.number)
}
}
## if intercept is not estimated, add a row/column with zero sampling variance
if (!estimateIntcept) {
dn <- c(rownames(varEst), int.idx)
varEst <- rbind(cbind(varEst, 0), 0)
dimnames(varEst) <- list(dn, dn)
}
## save all indices in a vector
targetcol <- c(int.idx, slope.label)
## collect relevant estimates and variances to probe the interaction
usedBeta <- rbind(est$alpha[nameY,], betaNC) # point estimates
usedVar <- varEst[targetcol, targetcol] # sampling (co)variances
## In case a moderator is listed first, find the order to rearrange
ord <- c(setdiff(1:3, modVar), modVar)
ord <- c(ord, 7 - rev(ord))
usedBeta <- usedBeta[c(1, ord+1, 8)]
usedVar <- usedVar[ c(1, ord+1, 8), c(1, ord+1, 8)]
## set all combinations of moderator values
val <- expand.grid(valProbe1, valProbe2)
## function to calculate p values for Wald z tests
pvalue <- function(x) (1 - pnorm(abs(x))) * 2
## Calculate simple intercepts
simpleIntcept <- usedBeta[1] + usedBeta[3] * val[,1] + usedBeta[4] * val[,2] + usedBeta[7] * val[,1] * val[,2]
varIntcept <- usedVar[1, 1] + val[,1]^2 * usedVar[3, 3] + val[,2]^2 * usedVar[4, 4] + val[,1]^2 * val[,2]^2 * usedVar[7, 7] + 2 * val[,1] * usedVar[1, 3] + 2 * val[,2] * usedVar[1, 4] + 2 * val[,1] * val[,2] * usedVar[1, 7] + 2 * val[,1] * val[,2] * usedVar[3, 4] + 2 * val[,1]^2 * val[,2] * usedVar[3, 7] + 2* val[,1] * val[,2]^2 * usedVar[4, 7]
zIntcept <- simpleIntcept / sqrt(varIntcept)
pIntcept <- pvalue(zIntcept)
resultIntcept <- cbind(val, simpleIntcept, sqrt(varIntcept), zIntcept, pIntcept)
colnames(resultIntcept) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultIntcept) <- c("lavaan.data.frame","data.frame")
## Calculate simple slopes
simpleSlope <- usedBeta[2] + usedBeta[5] * val[,1] + usedBeta[6] * val[,2] + usedBeta[8] * val[,1] * val[,2]
varSlope <- usedVar[2, 2] + val[,1]^2 * usedVar[5, 5] + val[,2]^2 * usedVar[6, 6] + val[,1]^2 * val[,2]^2 * usedVar[8, 8] + 2 * val[,1] * usedVar[2, 5] + 2 * val[,2] * usedVar[2, 6] + 2 * val[,1] * val[,2] * usedVar[2, 8] + 2 * val[,1] * val[,2] * usedVar[5, 6] + 2 * val[,1]^2 * val[,2] * usedVar[5, 8] + 2 * val[,1] * val[,2]^2 * usedVar[6, 8]
zSlope <- simpleSlope / sqrt(varSlope)
pSlope <- pvalue(zSlope)
resultSlope <- cbind(val, simpleSlope, sqrt(varSlope), zSlope, pSlope)
colnames(resultSlope) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultSlope) <- c("lavaan.data.frame","data.frame")
list(SimpleIntcept = resultIntcept, SimpleSlope = resultSlope)
}
## --------
## 3-way RC
## --------
##' Probing three-way interaction on the residual-centered latent interaction
##'
##' Probing interaction for simple intercept and simple slope for the
##' residual-centered latent three-way interaction (Geldhof et al., 2013)
##'
##' Before using this function, researchers need to make the products of the
##' indicators between the first-order factors and residualize the products by
##' the original indicators (Lance, 1988; Little, Bovaird, & Widaman, 2006). The
##' process can be automated by the [indProd()] function. Note that
##' the indicator products can be made for all possible combination or
##' matched-pair approach (Marsh et al., 2004). Next, the hypothesized model
##' with the regression with latent interaction will be used to fit all original
##' indicators and the product terms (Geldhof et al., 2013). To use this
##' function the model must be fit with a mean structure. See the example for
##' how to fit the product term below. Once the lavaan result is obtained, this
##' function will be used to probe the interaction.
##'
##' The probing process on residual-centered latent interaction is based on
##' transforming the residual-centered result into the no-centered result. See
##' Geldhof et al. (2013) for further details. Note that this approach based on
##' a strong assumption that the first-order latent variables are normally
##' distributed. The probing process is applied after the no-centered result
##' (parameter estimates and their covariance matrix among parameter estimates)
##' has been computed. See the [probe3WayMC()] for further details.
##'
##'
##' @importFrom lavaan lavInspect parTable
##' @importFrom stats pnorm
##' @importFrom methods getMethod
##'
##' @param fit A fitted [lavaan::lavaan-class] or
##' [lavaan.mi::lavaan.mi-class] object with a latent 2-way interaction.
##' @param nameX `character` vector of all 7 factor names used as the
##' predictors. The 3 lower-order factors must be listed first, followed by
##' the 3 second-order factors (specifically, the 4th element must be the
##' interaction between the factors listed first and second, the 5th element
##' must be the interaction between the factors listed first and third, and
##' the 6th element must be the interaction between the factors listed second
##' and third). The final name will be the factor representing the 3-way
##' interaction.
##' @param nameY The name of factor that is used as the dependent variable.
##' @param modVar The name of two factors that are used as the moderators. The
##' effect of the independent factor on each combination of the moderator
##' variable values will be probed.
##' @param valProbe1 The values of the first moderator that will be used to
##' probe the effect of the independent factor.
##' @param valProbe2 The values of the second moderator that will be used to
##' probe the effect of the independent factor.
##' @param group In multigroup models, the label of the group for which the
##' results will be returned. Must correspond to one of
##' `lavInspect(fit, "group.label")`.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Ignored unless `fit` is of
##' class [lavaan.mi::lavaan.mi-class]. Can include any of
##' `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##' default setting, which excludes any imputations that did not
##' converge or for which standard errors could not be computed. The
##' last option (`"no.npd"`) would exclude any imputations which
##' yielded a nonpositive definite covariance matrix for observed or
##' latent variables, which would include any "improper solutions" such
##' as Heywood cases. NPD solutions are not excluded by default because
##' they are likely to occur due to sampling error, especially in small
##' samples. However, gross model misspecification could also cause
##' NPD solutions, users can compare pooled results with and without
##' this setting as a sensitivity analysis to see whether some
##' imputations warrant further investigation.
##'
##' @return A list with two elements:
##' \enumerate{
##' \item `SimpleIntercept`: The model-implied intercepts given each
##' combination of moderator values.
##' \item `SimpleSlope`: The model-implied slopes given each combination
##' of moderator values.
##' }
##' In each element, the first column represents values of the first moderator
##' specified in the `valProbe1` argument. The second column represents
##' values of the second moderator specified in the `valProbe2` argument.
##' The third column is the simple intercept or simple slope. The fourth column
##' is the *SE* of the simple intercept or simple slope. The fifth column
##' is the Wald (*z*) statistic, and the sixth column is its associated
##' *p* value to test the null hypothesis that each simple intercept or
##' simple slope equals 0.
##'
##' @author
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso \itemize{
##' \item [indProd()] For creating the indicator products with no
##' centering, mean centering, double-mean centering, or residual centering.
##' \item [probe2WayMC()] For probing the two-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe3WayMC()] For probing the three-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe2WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [plotProbe()] Plot the simple intercepts and slopes of the
##' latent interaction.
##' }
##'
##' @references
##' Tutorial:
##'
##' Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
##' latent variable interactions using the `semTools` package.
##' *Psych, 3*(3), 322--335. \doi{10.3390/psych3030024}
##'
##' Background literature:
##'
##' Geldhof, G. J., Pornprasertmanit, S., Schoemann, A., & Little,
##' T. D. (2013). Orthogonalizing through residual centering: Extended
##' applications and caveats. *Educational and Psychological Measurement,
##' 73*(1), 27--46. \doi{10.1177/0013164412445473}
##'
##' Lance, C. E. (1988). Residual centering, exploratory and confirmatory
##' moderator analysis, and decomposition of effects in path models containing
##' interactions. *Applied Psychological Measurement, 12*(2), 163--175.
##' \doi{10.1177/014662168801200205}
##'
##' Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of
##' orthogonalizing powered and product terms: Implications for modeling
##' interactions. *Structural Equation Modeling, 13*(4), 497--519.
##' \doi{10.1207/s15328007sem1304_1}
##'
##' Marsh, H. W., Wen, Z., & Hau, K. T. (2004). Structural equation models of
##' latent interactions: Evaluation of alternative estimation strategies and
##' indicator construction. *Psychological Methods, 9*(3), 275--300.
##' \doi{10.1037/1082-989X.9.3.275}
##'
##' Pornprasertmanit, S., Schoemann, A. M., Geldhof, G. J., & Little, T. D.
##' (submitted). *Probing latent interaction estimated with a residual
##' centering approach.*
##'
##' @examples
##'
##' dat3wayRC <- orthogonalize(dat3way, 1:3, 4:6, 7:9)
##'
##' model3 <- " ## define latent variables
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f3 =~ x7 + x8 + x9
##' ## 2-way interactions
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f13 =~ x1.x7 + x2.x8 + x3.x9
##' f23 =~ x4.x7 + x5.x8 + x6.x9
##' ## 3-way interaction
##' f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
##' ## outcome variable
##' f4 =~ x10 + x11 + x12
##'
##' ## latent regression model
##' f4 ~ b1*f1 + b2*f2 + b3*f3 + b12*f12 + b13*f13 + b23*f23 + b123*f123
##'
##' ## orthogonal terms among predictors
##' f1 ~~ 0*f12 + 0*f13 + 0*f123
##' f2 ~~ 0*f12 + 0*f23 + 0*f123
##' f3 ~~ 0*f13 + 0*f23 + 0*f123
##' f12 + f13 + f23 ~~ 0*f123
##'
##' ## identify latent means
##' x1 + x4 + x7 + x1.x4 + x1.x7 + x4.x7 + x1.x4.x7 + x10 ~ 0*1
##' f1 + f2 + f3 + f12 + f13 + f23 + f123 + f4 ~ NA*1
##' "
##'
##' fitRC3way <- sem(model3, data = dat3wayRC, meanstructure = TRUE)
##' summary(fitRC3way)
##'
##' probe3WayMC(fitRC3way, nameX = c("f1" ,"f2" ,"f3",
##' "f12","f13","f23", # this order matters!
##' "f123"), # 3-way interaction
##' nameY = "f4", modVar = c("f1", "f2"),
##' valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
##'
##' @export
probe3WayRC <- function(fit, nameX, nameY, modVar, valProbe1, valProbe2,
group = 1L, omit.imps = c("no.conv","no.se")) {
## TDJ: verify class
if (inherits(fit, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
useImps <- rep(TRUE, length(fit@DataList))
if ("no.conv" %in% omit.imps) useImps <- sapply(fit@convergence, "[[", i = "converged")
if ("no.se" %in% omit.imps) useImps <- useImps & sapply(fit@convergence, "[[", i = "SE")
if ("no.npd" %in% omit.imps) {
Heywood.lv <- sapply(fit@convergence, "[[", i = "Heywood.lv")
Heywood.ov <- sapply(fit@convergence, "[[", i = "Heywood.ov")
useImps <- useImps & !(Heywood.lv | Heywood.ov)
}
## custom removal by imputation number
rm.imps <- omit.imps[ which(omit.imps %in% 1:length(useImps)) ]
if (length(rm.imps)) useImps[as.numeric(rm.imps)] <- FALSE
## whatever is left
m <- sum(useImps)
if (m == 0L) stop('No imputations meet "omit.imps" criteria.')
useImps <- which(useImps)
} else if (!inherits(fit, "lavaan")) {
stop('"fit" must inherit from lavaan or lavaan.mi class', call. = FALSE)
}
if (!lavInspect(fit, "options")$meanstructure)
stop('This function requires the model to be fit with a mean structure.',
call. = FALSE)
# Check whether modVar is correct
if (is.character(modVar)) modVar <- match(modVar, nameX)
if ((NA %in% modVar) || !(do.call("&", as.list(modVar %in% 1:3))))
stop("The moderator name is not in the list of independent factors and is ",
"not 1, 2 or 3.") # JG: Changed error
## TDJ: If multigroup, check group %in% group.label
nG <- lavInspect(fit, "ngroups")
if (nG > 1L) {
group.label <- lavInspect(fit, "group.label")
## assign numeric to character
if (is.numeric(group)) {
if (group %in% 1:nG) {
group <- group.label[group]
} else group <- as.character(group)
} else group <- as.character(group)
## check that character is a group
if (!as.character(group) %in% group.label)
stop('"group" must be a character string naming a group of interest, or ',
'an integer corresponding to a group in lavInspect(fit, "group.label")')
group.number <- which(group.label == group)
} else group.number <- 1L
## Get the parameter estimates for that group
if (nG > 1L) {
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")[[group]]
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
GLIST <- fit@coefList[useImps]
est$beta <- Reduce("+", lapply(GLIST, function(i) i[[group]]$beta)) / m
est$alpha <- Reduce("+", lapply(GLIST, function(i) i[[group]]$alpha)) / m
est$psi <- Reduce("+", lapply(GLIST, function(i) i[[group]]$psi)) / m
}
} else {
## single-group model
if (inherits(fit, "lavaan")) {
est <- lavInspect(fit, "est")
} else if (inherits(fit, "lavaan.mi")) {
est <- list()
est$beta <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "beta")) / m
est$alpha <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "alpha")) / m
est$psi <- Reduce("+", lapply(fit@coefList[useImps], "[[", i = "psi")) / m
}
}
## Find the mean and covariance matrix of predictors
varX <- est$psi[ nameX, nameX]
meanX <-est$alpha[nameX, , drop = FALSE]
# Find the intercept, regression coefficients, and residual variance of residual-centered regression
intceptRC <- est$alpha[nameY, , drop = FALSE]
resVarRC <- est$psi[ nameY, nameY]
if (resVarRC < 0) warning("The residual variance is negative. The model did not converge on a proper solution.")
betaRC <- t(est$beta[nameY, nameX, drop = FALSE])
# Find the number of observations
numobs <- lavInspect(fit, "nobs")[group.number]
# Compute SSRC
meanXwith1 <- rbind(1, meanX)
varXwith0 <- cbind(0, rbind(0, varX))
SSRC <- numobs * (varXwith0 + (meanXwith1 %*% t(meanXwith1)))
# Compute Mean(Y) and Var(Y)
betaRCWithIntcept <- rbind(intceptRC, betaRC)
meanY <- t(meanXwith1) %*% betaRCWithIntcept
varY <- (t(betaRCWithIntcept) %*% SSRC %*% betaRCWithIntcept)/numobs - meanY^2 + resVarRC
# Compute Cov(Y, X)
covY <- as.matrix((varX %*% betaRC)[1:3,])
# Compute E(XZ), E(XW), E(ZW), E(XZW)
meanX[4] <- expect2NormProd(meanX[c(1,2)], varX[c(1,2), c(1,2)])
meanX[5] <- expect2NormProd(meanX[c(1,3)], varX[c(1,3), c(1,3)])
meanX[6] <- expect2NormProd(meanX[c(2,3)], varX[c(2,3), c(2,3)])
meanX[7] <- expect3NormProd(meanX[1:3], varX[1:3, 1:3])
# Compute Var(XZ), Var(XW), Var(ZW), Var(XZW)
varX[4, 4] <- var2NormProd(meanX[c(1,2)], varX[c(1,2), c(1,2)])
varX[5, 5] <- var2NormProd(meanX[c(1,3)], varX[c(1,3), c(1,3)])
varX[6, 6] <- var2NormProd(meanX[c(2,3)], varX[c(2,3), c(2,3)])
varX[7, 7] <- var3NormProd(meanX[1:3], varX[1:3, 1:3])
# Compute All covariances
varX[4, 1] <- varX[1, 4] <- expect3NormProd(meanX[c(1, 2, 1)], varX[c(1, 2, 1),c(1, 2, 1)]) - expect2NormProd(meanX[c(1,2)], varX[c(1,2),c(1,2)]) * meanX[1]
varX[5, 1] <- varX[1, 5] <- expect3NormProd(meanX[c(1, 3, 1)], varX[c(1, 3, 1),c(1, 3, 1)]) - expect2NormProd(meanX[c(1,3)], varX[c(1,3),c(1,3)]) * meanX[1]
varX[6, 1] <- varX[1, 6] <- expect3NormProd(meanX[c(2, 3, 1)], varX[c(2, 3, 1),c(2, 3, 1)]) - expect2NormProd(meanX[c(2,3)], varX[c(2,3),c(2,3)]) * meanX[1]
varX[7, 1] <- varX[1, 7] <- expect4NormProd(meanX[c(1,2,3,1)], varX[c(1,2,3,1),c(1,2,3,1)]) - expect3NormProd(meanX[c(1,2,3)], varX[c(1,2,3),c(1,2,3)]) * meanX[1]
varX[4, 2] <- varX[2, 4] <- expect3NormProd(meanX[c(1, 2, 2)], varX[c(1, 2, 2),c(1, 2, 2)]) - expect2NormProd(meanX[c(1,2)], varX[c(1,2),c(1,2)]) * meanX[2]
varX[5, 2] <- varX[2, 5] <- expect3NormProd(meanX[c(1, 3, 2)], varX[c(1, 3, 2),c(1, 3, 2)]) - expect2NormProd(meanX[c(1,3)], varX[c(1,3),c(1,3)]) * meanX[2]
varX[6, 2] <- varX[2, 6] <- expect3NormProd(meanX[c(2, 3, 2)], varX[c(2, 3, 2),c(2, 3, 2)]) - expect2NormProd(meanX[c(2,3)], varX[c(2,3),c(2,3)]) * meanX[2]
varX[7, 2] <- varX[2, 7] <- expect4NormProd(meanX[c(1,2,3,2)], varX[c(1,2,3,2),c(1,2,3,2)]) - expect3NormProd(meanX[c(1,2,3)], varX[c(1,2,3),c(1,2,3)]) * meanX[2]
varX[4, 3] <- varX[3, 4] <- expect3NormProd(meanX[c(1, 2, 3)], varX[c(1, 2, 3),c(1, 2, 3)]) - expect2NormProd(meanX[c(1,2)], varX[c(1,2),c(1,2)]) * meanX[3]
varX[5, 3] <- varX[3, 5] <- expect3NormProd(meanX[c(1, 3, 3)], varX[c(1, 3, 3),c(1, 3, 3)]) - expect2NormProd(meanX[c(1,3)], varX[c(1,3),c(1,3)]) * meanX[3]
varX[6, 3] <- varX[3, 6] <- expect3NormProd(meanX[c(2, 3, 3)], varX[c(2, 3, 3),c(2, 3, 3)]) - expect2NormProd(meanX[c(2,3)], varX[c(2,3),c(2,3)]) * meanX[3]
varX[7, 3] <- varX[3, 7] <- expect4NormProd(meanX[c(1,2,3,3)], varX[c(1,2,3,3),c(1,2,3,3)]) - expect3NormProd(meanX[c(1,2,3)], varX[c(1,2,3),c(1,2,3)]) * meanX[3]
varX[5, 4] <- varX[4, 5] <- expect4NormProd(meanX[c(1,3,1,2)], varX[c(1,3,1,2),c(1,3,1,2)]) - expect2NormProd(meanX[c(1,3)], varX[c(1,3),c(1,3)]) * expect2NormProd(meanX[c(1,2)], varX[c(1,2),c(1,2)])
varX[6, 4] <- varX[4, 6] <- expect4NormProd(meanX[c(2,3,1,2)], varX[c(2,3,1,2),c(2,3,1,2)]) - expect2NormProd(meanX[c(2,3)], varX[c(2,3),c(2,3)]) * expect2NormProd(meanX[c(1,2)], varX[c(1,2),c(1,2)])
varX[7, 4] <- varX[4, 7] <- expect5NormProd(meanX[c(1,2,3,1,2)], varX[c(1,2,3,1,2),c(1,2,3,1,2)]) - expect3NormProd(meanX[c(1, 2, 3)], varX[c(1, 2, 3),c(1, 2, 3)]) * expect2NormProd(meanX[c(1,2)], varX[c(1,2),c(1,2)])
varX[6, 5] <- varX[5, 6] <- expect4NormProd(meanX[c(2,3,1,3)], varX[c(2,3,1,3),c(2,3,1,3)]) - expect2NormProd(meanX[c(2,3)], varX[c(2,3),c(2,3)]) * expect2NormProd(meanX[c(1,3)], varX[c(1,3),c(1,3)])
varX[7, 5] <- varX[5, 7] <- expect5NormProd(meanX[c(1,2,3,1,3)], varX[c(1,2,3,1,3),c(1,2,3,1,3)]) - expect3NormProd(meanX[c(1, 2, 3)], varX[c(1, 2, 3),c(1, 2, 3)]) * expect2NormProd(meanX[c(1,3)], varX[c(1,3),c(1,3)])
varX[7, 6] <- varX[6, 7] <- expect5NormProd(meanX[c(1,2,3,2,3)], varX[c(1,2,3,2,3),c(1,2,3,2,3)]) - expect3NormProd(meanX[c(1, 2, 3)], varX[c(1, 2, 3),c(1, 2, 3)]) * expect2NormProd(meanX[c(2,3)], varX[c(2,3),c(2,3)])
# Find the meanX and varX without XZW
meanXReducedWith1 <- rbind(1, as.matrix(meanX[1:6]))
varXReducedWith0 <- cbind(0, rbind(0, varX[1:6, 1:6]))
SSMCReduced <- numobs * (varXReducedWith0 + (meanXReducedWith1 %*% t(meanXReducedWith1)))
# Find product of main and two-way onto three-way
covXZWwith0 <- rbind(0, as.matrix(varX[7, 1:6]))
meanXZWwith1 <- meanX[7] * meanXReducedWith1
SSXZW <- numobs * (covXZWwith0 + meanXZWwith1) # should the mean vector be squared (postmultiplied by its transpose)?
# Compute a vector and b4, b5, b6
a <- solve(SSMCReduced) %*% as.matrix(SSXZW)
betaTemp <- betaRC[4:6] - (as.numeric(betaRC[7]) * a[5:7])
betaTemp <- c(betaTemp, betaRC[7])
# Compute Cov(Y, XZ) and regression coefficients of no-centering
betaNC <- solve(varX[1:3,1:3], as.matrix(covY) - (t(varX[4:7, 1:3]) %*% as.matrix(betaTemp)))
betaNC <- rbind(as.matrix(betaNC), as.matrix(betaTemp))
covY <- rbind(covY, as.matrix((varX %*% betaNC)[4:7, 1]))
# Aggregate the non-centering sufficient statistics (Just show how to do but not necessary)
fullCov <- rbind(cbind(varX, covY), c(covY, varY))
fullMean <- rbind(meanX, meanY)
# Compute the intercept of no-centering
intceptNC <- meanY - t(betaNC) %*% meanX
# Compute SSNC
betaNCWithIntcept <- rbind(intceptNC, betaNC)
meanXwith1 <- rbind(1, meanX) #JG: redundant
varXwith0 <- rbind(0, cbind(0, varX)) #JG: redundant
SSNC <- numobs * (varXwith0 + (meanXwith1 %*% t(meanXwith1)))
# Compute residual variance on non-centering
resVarNC <- varY - (t(betaNCWithIntcept) %*% SSNC %*% betaNCWithIntcept)/numobs + meanY^2
## Extract all sampling (co)variances
if (inherits(fit, "lavaan")) {
varEst <- lavaan::vcov(fit)
} else if (inherits(fit, "lavaan.mi")) {
varEst <- getMethod(f = "vcov", signature = "lavaan.mi",
where = getNamespace("lavaan.mi"))(fit, omit.imps = omit.imps)
}
## for indexing sampling (co)variances
PT <- parTable(fit)
int.idx <- PT$label[PT$lhs == nameY & PT$op == "~1" & PT$group == group.number]
if (int.idx == "") {
## no custom label, use default
int.idx <- paste0(nameY, "~1")
if (nG > 1L && group.number > 1L) {
int.idx <- paste0(int.idx, ".g", group.number)
}
}
## check intercept is actually estimated (thus, has sampling variance)
estimateIntcept <- int.idx %in% rownames(varEst)
## if intercept is not estimated, add a row/column with zero sampling variance
if (!estimateIntcept) {
dn <- c(rownames(varEst), int.idx)
varEst <- rbind(cbind(varEst, 0), 0)
dimnames(varEst) <- list(dn, dn)
}
## index slopes in varEst
slope.idx <- which(PT$lhs == nameY & PT$op == "~" & PT$rhs %in% nameX & PT$group == group.number)
## check for user-defined labels; otherwise, set defaults
slope.label <- PT$label[slope.idx]
if (any(slope.label == "")) for (i in which(slope.label == "")) {
slope.label[i] <- paste0(nameY, "~", nameX[i],
ifelse(nG > 1L && group.number > 1L, no = "",
yes = paste0(".g", group.number)))
}
## save all indices in a vector
targetcol <- c(int.idx, slope.label)
varEstSlopeRC <- varEst[targetcol, targetcol]
## collect relevant estimates and variances to probe the interaction
usedVar <- as.numeric(resVarNC/resVarRC) * (varEstSlopeRC %*% SSRC %*% solve(SSNC))
usedBeta <- betaNCWithIntcept
## In case a moderator is listed first, find the order to rearrange
ord <- c(setdiff(1:3, modVar), modVar)
ord <- c(ord, 7 - rev(ord))
usedBeta <- usedBeta[c(1, ord+1, 8)]
usedVar <- usedVar[ c(1, ord+1, 8), c(1, ord+1, 8)]
## set all combinations of moderator values
val <- expand.grid(valProbe1, valProbe2)
## function to calculate p values for Wald z tests
pvalue <- function(x) (1 - pnorm(abs(x))) * 2
## Calculate simple intercepts
simpleIntcept <- usedBeta[1] + usedBeta[3] * val[,1] + usedBeta[4] * val[,2] + usedBeta[7] * val[,1] * val[,2]
varIntcept <- usedVar[1, 1] + val[,1]^2 * usedVar[3, 3] + val[,2]^2 * usedVar[4, 4] + val[,1]^2 * val[,2]^2 * usedVar[7, 7] + 2 * val[,1] * usedVar[1, 3] + 2 * val[,2] * usedVar[1, 4] + 2 * val[,1] * val[,2] * usedVar[1, 7] + 2 * val[,1] * val[,2] * usedVar[3, 4] + 2 * val[,1]^2 * val[,2] * usedVar[3, 7] + 2* val[,1] * val[,2]^2 * usedVar[4, 7]
zIntcept <- simpleIntcept / sqrt(varIntcept)
pIntcept <- pvalue(zIntcept)
resultIntcept <- cbind(val, simpleIntcept, sqrt(varIntcept), zIntcept, pIntcept)
colnames(resultIntcept) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultIntcept) <- c("lavaan.data.frame","data.frame")
## Calculate simple slopes
simpleSlope <- usedBeta[2] + usedBeta[5] * val[,1] + usedBeta[6] * val[,2] + usedBeta[8] * val[,1] * val[,2]
varSlope <- usedVar[2, 2] + val[,1]^2 * usedVar[5, 5] + val[,2]^2 * usedVar[6, 6] + val[,1]^2 * val[,2]^2 * usedVar[8, 8] + 2 * val[,1] * usedVar[2, 5] + 2 * val[,2] * usedVar[2, 6] + 2 * val[,1] * val[,2] * usedVar[2, 8] + 2 * val[,1] * val[,2] * usedVar[5, 6] + 2 * val[,1]^2 * val[,2] * usedVar[5, 8] + 2 * val[,1] * val[,2]^2 * usedVar[6, 8]
zSlope <- simpleSlope / sqrt(varSlope)
pSlope <- pvalue(zSlope)
resultSlope <- cbind(val, simpleSlope, sqrt(varSlope), zSlope, pSlope)
colnames(resultSlope) <- c(nameX[modVar], "est", "se", "z", "pvalue")
class(resultSlope) <- c("lavaan.data.frame","data.frame")
list(SimpleIntcept = resultIntcept, SimpleSlope = resultSlope)
}
## -----------------
## Plotting Function
## -----------------
##' Plot a latent interaction
##'
##' This function will plot the line graphs representing the simple effect of
##' the independent variable given the values of the moderator. For multigroup
##' models, it will only generate a plot for 1 group, as specified in the
##' function used to obtain the first argument.
##'
##' @note
##' If the `object` does not contain simple intercepts (i.e., if the
##' `object$SimpleIntcept` element is `NULL`), then all simple
##' intercepts are arbitrarily set to zero in order to plot the simple slopes.
##' This may not be consistent with the fitted model, but was (up until version
##' 0.5-7) the default behavior when the y-intercept was fixed to 0. In this case,
##' although the relative steepness of simple slopes can still meaningfully be
##' compared, the relative vertical positions of lines at any point along the
##' *x*-axis should not be interpreted.
##'
##' @param object A `list`, typically the result of probing a latent 2-way
##' or 3-way interaction obtained from the [probe2WayMC()],
##' [probe2WayRC()], [probe3WayMC()], or
##' [probe3WayRC()] functions.
##' @param xlim The vector of two numbers: the minimum and maximum values of the
##' independent variable
##' @param xlab The label of the x-axis
##' @param ylab The label of the y-axis
##' @param legend `logical`. If `TRUE` (default), a legend is printed.
##' @param legendArgs `list` of arguments passed to [legend()]
##' function if `legend=TRUE`.
##' @param \dots Any additional argument for the [plot()] function
##'
##' @return None. This function will plot the simple main effect only.
##'
##' @author
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso \itemize{
##' \item [indProd()] For creating the indicator products with no
##' centering, mean centering, double-mean centering, or residual centering.
##' \item [probe2WayMC()] For probing the two-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe3WayMC()] For probing the three-way latent interaction
##' when the results are obtained from mean-centering, or double-mean centering
##' \item [probe2WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' \item [probe3WayRC()] For probing the two-way latent interaction
##' when the results are obtained from residual-centering approach.
##' }
##'
##' @references
##'
##' Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and interpreting
##' latent variable interactions using the `semTools` package.
##' *Psych, 3*(3), 322--335. \doi{10.3390/psych3030024}
##'
##' @examples
##'
##' library(lavaan)
##'
##' dat2wayMC <- indProd(dat2way, 1:3, 4:6)
##'
##' model1 <- "
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f3 =~ x7 + x8 + x9
##' f3 ~ f1 + f2 + f12
##' f12 ~~ 0*f1
##' f12 ~~ 0*f2
##' x1 ~ 0*1
##' x4 ~ 0*1
##' x1.x4 ~ 0*1
##' x7 ~ 0*1
##' f1 ~ NA*1
##' f2 ~ NA*1
##' f12 ~ NA*1
##' f3 ~ NA*1
##' "
##'
##' fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
##' result2wayMC <- probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"),
##' nameY = "f3", modVar = "f2", valProbe = c(-1, 0, 1))
##' plotProbe(result2wayMC, xlim = c(-2, 2))
##'
##'
##' dat3wayMC <- indProd(dat3way, 1:3, 4:6, 7:9)
##'
##' model3 <- "
##' f1 =~ x1 + x2 + x3
##' f2 =~ x4 + x5 + x6
##' f3 =~ x7 + x8 + x9
##' f12 =~ x1.x4 + x2.x5 + x3.x6
##' f13 =~ x1.x7 + x2.x8 + x3.x9
##' f23 =~ x4.x7 + x5.x8 + x6.x9
##' f123 =~ x1.x4.x7 + x2.x5.x8 + x3.x6.x9
##' f4 =~ x10 + x11 + x12
##' f4 ~ f1 + f2 + f3 + f12 + f13 + f23 + f123
##' f1 ~~ 0*f12
##' f1 ~~ 0*f13
##' f1 ~~ 0*f123
##' f2 ~~ 0*f12
##' f2 ~~ 0*f23
##' f2 ~~ 0*f123
##' f3 ~~ 0*f13
##' f3 ~~ 0*f23
##' f3 ~~ 0*f123
##' f12 ~~ 0*f123
##' f13 ~~ 0*f123
##' f23 ~~ 0*f123
##' x1 ~ 0*1
##' x4 ~ 0*1
##' x7 ~ 0*1
##' x10 ~ 0*1
##' x1.x4 ~ 0*1
##' x1.x7 ~ 0*1
##' x4.x7 ~ 0*1
##' x1.x4.x7 ~ 0*1
##' f1 ~ NA*1
##' f2 ~ NA*1
##' f3 ~ NA*1
##' f12 ~ NA*1
##' f13 ~ NA*1
##' f23 ~ NA*1
##' f123 ~ NA*1
##' f4 ~ NA*1
##' "
##'
##' fitMC3way <- sem(model3, data = dat3wayMC, std.lv = FALSE,
##' meanstructure = TRUE)
##' result3wayMC <- probe3WayMC(fitMC3way, nameX = c("f1", "f2", "f3", "f12",
##' "f13", "f23", "f123"),
##' nameY = "f4", modVar = c("f1", "f2"),
##' valProbe1 = c(-1, 0, 1), valProbe2 = c(-1, 0, 1))
##' plotProbe(result3wayMC, xlim = c(-2, 2))
##'
##' @export
plotProbe <- function(object, xlim, xlab = "Indepedent Variable",
ylab = "Dependent Variable", legend = TRUE,
legendArgs = list(), ...) {
if (length(xlim) != 2) stop("The x-limit should be specified as a numeric",
" vector with the length of 2.")
# Extract simple slope
slope <- object$SimpleSlope
# Check whether the object is the two-way or three-way interaction result
numInt <- 2
if (ncol(slope) == 6) numInt <- 3
estSlope <- slope[, ncol(slope) - 3]
# Get whether the simple slope is significant. If so, the resulting lines will be
# shown as red. If not, the line will be black.
estSlopeSig <- (slope[, ncol(slope)] < 0.05) + 1
# Extract simple intercept. If the simple intercept is not provided, the intercept
# will be fixed as 0.
estIntercept <- NULL
if (is.null(object$SimpleIntcept)) {
warning("Simple intercepts unavailable from object. All lines therefore ",
"arbitrarily intersect at the origin. Relative vertical positions ",
"of lines at any point on the x-axis are therefore arbitrary and ",
"should not be interpreted. Only the relative steepness of simple ",
"slopes can meaningfully be compared.")
} else estIntercept <- object$SimpleIntcept[, ncol(slope) - 3]
if (numInt == 2) {
if (is.null(legendArgs$title)) legendArgs$title <- colnames(slope)[1]
if (is.null(legendArgs$legend)) legendArgs$legend <- slope[, 1]
plotSingleProbe(estSlope, estIntercept, xlim = xlim, xlab = xlab, ylab = ylab,
colLine = estSlopeSig, legend = legend,
legendArgs = legendArgs, ...)
} else if (numInt == 3) {
# Three-way interaction; separate lines for the first moderator, separate graphs
# for the second moderator
mod2 <- unique(slope[, 2])
mod1 <- unique(slope[, 1])
# Use multiple graphs in a figure
if (length(mod2) == 2) {
obj <- par(mfrow = c(1, 2))
} else if (length(mod2) == 3) {
obj <- par(mfrow = c(1, 3))
} else if (length(mod2) > 3) {
obj <- par(mfrow = c(2, ceiling(length(mod2)/2)))
} else if (length(mod2) == 1) {
# Intentionally leaving as blank
} else stop("Some errors occur")
for (i in 1:length(mod2)) {
select <- slope[, 2] == mod2[i]
if (is.null(legendArgs$title)) legendArgs$title <- colnames(slope)[1]
if (is.null(legendArgs$legend)) legendArgs$legend <- mod1
plotSingleProbe(estSlope[select], estIntercept[select], xlim = xlim,
xlab = xlab, ylab = ylab, colLine = estSlopeSig[select],
main = paste(colnames(slope)[2], "=", mod2[i]),
legend = legend, legendArgs = legendArgs, ...)
}
if (length(mod2) > 1)
par(obj)
} else {
stop("Please make sure that the object argument is obtained from",
" 'probe2wayMC', 'probe2wayRC', 'probe3wayMC', or 'probe3wayRC'.")
}
invisible(NULL)
}
## ----------------
## Hidden Functions
## ----------------
## Find the expected value of the product of two normal variates
## m = the mean of each normal variate
## s = the covariance matrix of all variates
expect2NormProd <- function(m, s) return(prod(m) + s[1, 2])
## Find the expected value of the product of three normal variates
## m = the mean of each normal variate
## s = the covariance matrix of all variates
expect3NormProd <- function(m, s) {
return(prod(m) + m[3] * s[1, 2] + m[2] * s[1, 3] + m[1] * s[2, 3])
}
## Find the expected value of the product of four normal variates
## m = the mean of each normal variate
## s = the covariance matrix of all variates
expect4NormProd <- function(m, s) {
first <- prod(m)
com <- utils::combn(1:4, 2)
forSecond <- function(draw, meanval, covval, index) {
draw2 <- setdiff(index, draw)
prod(meanval[draw2]) * covval[draw[1], draw[2]]
}
second <- sum(apply(com, 2, forSecond, meanval=m, covval=s, index=1:4))
com2 <- com[,1:3] #select only first three terms containing the first element only
forThird <- function(draw, covval, index) {
draw2 <- setdiff(index, draw)
covval[draw[1], draw[2]] * covval[draw2[1], draw2[2]]
}
third <- sum(apply(com2, 2, forThird, covval=s, index=1:4))
return(first + second + third)
}
## Find the expected value of the product of five normal variates
## m = the mean of each normal variate
## s = the covariance matrix of all variates
expect5NormProd <- function(m, s) {
first <- prod(m)
com <- utils::combn(1:5, 2)
forSecond <- function(draw, meanval, covval, index) {
draw2 <- setdiff(index, draw)
prod(meanval[draw2]) * covval[draw[1], draw[2]]
}
second <- sum(apply(com, 2, forSecond, meanval=m, covval=s, index=1:5))
com2 <- utils::combn(1:5, 4)
forThirdOuter <- function(index, m, s, indexall) {
targetMean <- m[setdiff(indexall, index)]
cominner <- utils::combn(index, 2)[,1:3] #select only first three terms containing the first element only
forThirdInner <- function(draw, covval, index) {
draw2 <- setdiff(index, draw)
covval[draw[1], draw[2]] * covval[draw2[1], draw2[2]]
}
thirdInner <- targetMean * sum(apply(cominner, 2, forThirdInner, covval=s, index=index))
return(thirdInner)
}
third <- sum(apply(com2, 2, forThirdOuter, m=m, s=s, indexall=1:5))
return(first + second + third)
}
## Find the variance of the product of two normal variates
## m = the mean of each normal variate
## s = the covariance matrix of all variates
var2NormProd <- function(m, s) {
first <- m[2]^2 * s[1, 1] + m[1]^2 * s[2, 2]
second <- 2 * m[1] * m[2] * s[1, 2]
third <- s[1, 1] * s[2, 2]
fourth <- s[1, 2]^2
return(first + second + third + fourth)
}
## Find the variance of the product of three normal variates
## m = the mean of each normal variate
## s = the covariance matrix of all variates
var3NormProd <- function(m, s) {
com <- utils::combn(1:3, 2)
forFirst <- function(draw, meanval, covval, index) {
# draw = 2, 3; draw2 = 1
draw2 <- setdiff(index, draw)
term1 <- meanval[draw[1]]^2 * meanval[draw[2]]^2 * covval[draw2, draw2]
term2 <- 2 * meanval[draw2]^2 * meanval[draw[1]] * meanval[draw[2]] * covval[draw[1], draw[2]]
term3 <- (meanval[draw2]^2 * covval[draw[1], draw[1]] * covval[draw[2], draw[2]]) + (meanval[draw2]^2 * covval[draw[1], draw[2]]^2)
term4 <- 4 * meanval[draw[1]] * meanval[draw[2]] * covval[draw2, draw2] * covval[draw[1], draw[2]]
term5 <- 6 * meanval[draw[1]] * meanval[draw[2]] * covval[draw2, draw[1]] * covval[draw2, draw[2]]
term1 + term2 + term3 + term4 + term5
}
first <- sum(apply(com, 2, forFirst, meanval=m, covval=s, index=1:3))
second <- prod(diag(s))
third <- 2 * s[3, 3] * s[1, 2]^2 + 2 * s[2, 2] * s[1, 3]^2 + 2 * s[1, 1] * s[2, 3]^2
fourth <- 8 * s[1, 2] * s[1, 3] * s[2, 3]
return(first + second + third + fourth)
}
## plotSingleProbe : plot the probing interaction result specific for only one moderator
## estSlope = slope of each line
## estIntercept = intercept of each line
## xlim = the minimum and maximum values of the independent variable (x-axis)
## xlab = the label for the independent variable
## ylab = the lable for the dependent variable
## main = the title of the graph
## colLine = the color of each line
## legend = whether to print a legend
## legendArgs = arguments to pass to legend() function
plotSingleProbe <- function(estSlope, estIntercept = NULL, xlim,
xlab = "Indepedent Variable",
ylab = "Dependent Variable", main = NULL,
colLine = "black", legend = TRUE,
legendArgs = list(), ...) {
if (is.null(estIntercept)) estIntercept <- rep(0, length(estSlope))
if (length(colLine) == 1) colLine <- rep(colLine, length(estSlope))
lower <- estIntercept + (xlim[1] * estSlope)
upper <- estIntercept + (xlim[2] * estSlope)
ylim <- c(min(c(lower, upper)), max(c(lower, upper)))
plot(cbind(xlim, ylim), xlim = xlim, ylim = ylim, type = "n",
xlab = xlab, ylab = ylab, main = main, ...)
for (i in 1:length(estSlope)) {
lines(cbind(xlim, c(lower[i], upper[i])),
col = colLine[i], lwd = 1.5, lty = i)
}
if (legend) {
positionX <- 0.25
if (all(estSlope > 0)) positionX <- 0.01
if (all(estSlope < 0)) positionX <- 0.50
if (is.null(legendArgs$x)) legendArgs$x <- positionX * (xlim[2] - xlim[1]) + xlim[1]
if (is.null(legendArgs$y)) legendArgs$y <- 0.99 * (ylim[2] - ylim[1]) + ylim[1]
if (is.null(legendArgs$col)) legendArgs$col <- colLine
if (is.null(legendArgs$lty)) legendArgs$lty <- 1:length(estSlope)
do.call(graphics::legend, legendArgs)
}
invisible(NULL)
}
semTools/R/powerAnalysisNested.R 0000644 0001762 0000144 00000020234 14632143377 016421 0 ustar ligges users ### Sunthud Pornprasertmanit, Bell Clinton, Pavel Panko
### Last updated: 10 January 2021
##' Find power given a sample size in nested model comparison
##'
##' Find the sample size that the power in rejection the samples from the
##' alternative pair of RMSEA is just over the specified power.
##'
##'
##' @importFrom stats qchisq pchisq
##'
##' @param rmsea0A The \eqn{H_0} baseline RMSEA
##' @param rmsea0B The \eqn{H_0} alternative RMSEA (trivial misfit)
##' @param rmsea1A The \eqn{H_1} baseline RMSEA
##' @param rmsea1B The \eqn{H_1} alternative RMSEA (target misfit to be rejected)
##' @param dfA degree of freedom of the more-restricted model
##' @param dfB degree of freedom of the less-restricted model
##' @param n Sample size
##' @param alpha The alpha level
##' @param group The number of group in calculating RMSEA
##'
##' @author Bell Clinton
##'
##' Pavel Panko (Texas Tech University; \email{pavel.panko@@ttu.edu})
##'
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso \itemize{
##' \item [plotRMSEApowernested()] to plot the statistical power for
##' nested model comparison based on population RMSEA given the sample size
##' \item [findRMSEAsamplesizenested()] to find the minium sample
##' size for a given statistical power in nested model comparison based on
##' population RMSEA
##' }
##'
##' @references
##' MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing
##' differences between nested covariance structure models: Power analysis and
##' null hypotheses. *Psychological Methods, 11*(1), 19--35.
##' \doi{10.1037/1082-989X.11.1.19}
##'
##' @examples
##'
##' findRMSEApowernested(rmsea0A = 0.06, rmsea0B = 0.05, rmsea1A = 0.08,
##' rmsea1B = 0.05, dfA = 22, dfB = 20, n = 200,
##' alpha = 0.05, group = 1)
##'
##' @export
findRMSEApowernested <- function(rmsea0A = NULL, rmsea0B = NULL, rmsea1A, rmsea1B = NULL, dfA, dfB, n, alpha = 0.05, group = 1) {
if(is.null(rmsea0A)) rmsea0A <- 0
if(is.null(rmsea0B)) rmsea0B <- 0
if(is.null(rmsea1B)) rmsea1B <- rmsea0B
if(dfA <= dfB) stop("The degree of freedom of the more-restricted model (dfA) should be greater than the degree of freedom of the less-restricted model (dfB)")
if(rmsea0A < rmsea0B) stop("In the null-hypothesis models, the RMSEA of the more-restricted model (rmsea0A) should have a higher than or equal to the RMSEA of the less-restricted model (rmsea0B).")
if(rmsea1A < rmsea1B) stop("In the alternative-hypothesis models, the RMSEA of the more-restricted model (rmsea1A) should have a higher than or equal to the RMSEA of the less-restricted model (rmsea1B).")
ddiff <- dfA-dfB
f0a <- (dfA*rmsea0A^2)/group
f0b <- (dfB*rmsea0B^2)/group
f1a <- (dfA*rmsea1A^2)/group
f1b <- (dfB*rmsea1B^2)/group
ncp0 <- (n-1)*(f0a-f0b)
ncp1 <- (n-1)*(f1a-f1b)
cval <- qchisq(1-alpha,ddiff,ncp0)
Power <- 1-pchisq(cval,ddiff,ncp1)
Power
}
##' Find sample size given a power in nested model comparison
##'
##' Find the sample size that the power in rejection the samples from the
##' alternative pair of RMSEA is just over the specified power.
##'
##'
##' @param rmsea0A The \eqn{H_0} baseline RMSEA
##' @param rmsea0B The \eqn{H_0} alternative RMSEA (trivial misfit)
##' @param rmsea1A The \eqn{H_1} baseline RMSEA
##' @param rmsea1B The \eqn{H_1} alternative RMSEA (target misfit to be rejected)
##' @param dfA degree of freedom of the more-restricted model.
##' @param dfB degree of freedom of the less-restricted model.
##' @param power The desired statistical power.
##' @param alpha The alpha level.
##' @param group The number of group in calculating RMSEA.
##'
##' @author Bell Clinton
##'
##' Pavel Panko (Texas Tech University; \email{pavel.panko@@ttu.edu})
##'
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso \itemize{
##' \item [plotRMSEApowernested()] to plot the statistical power for
##' nested model comparison based on population RMSEA given the sample size
##' \item [findRMSEApowernested()] to find the power for a given
##' sample size in nested model comparison based on population RMSEA
##' }
##'
##' @references
##' MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing
##' differences between nested covariance structure models: Power analysis and
##' null hypotheses. *Psychological Methods, 11*(1), 19--35.
##' \doi{10.1037/1082-989X.11.1.19}
##'
##' @examples
##'
##' findRMSEAsamplesizenested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
##' rmsea1B = 0.05, dfA = 22, dfB = 20, power = 0.80,
##' alpha = .05, group = 1)
##'
##' @export
findRMSEAsamplesizenested <- function(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, power = 0.80,
alpha = .05, group = 1) {
if(is.null(rmsea0A)) rmsea0A <- 0
if(is.null(rmsea0B)) rmsea0B <- 0
if(is.null(rmsea1B)) rmsea1B <- rmsea0B
if(dfA <= dfB) stop("The degree of freedom of the more-restricted model (dfA) should be greater than the degree of freedom of the less-restricted model (dfB)")
if(rmsea0A < rmsea0B) stop("In the null-hypothesis models, the RMSEA of the more-restricted model (rmsea0A) should have a higher than or equal to the RMSEA of the less-restricted model (rmsea0B).")
if(rmsea1A < rmsea1B) stop("In the alternative-hypothesis models, the RMSEA of the more-restricted model (rmsea1A) should have a higher than or equal to the RMSEA of the less-restricted model (rmsea1B).")
n <- 5:500
pow <- findRMSEApowernested(rmsea0A, rmsea0B, rmsea1A, rmsea1B, dfA, dfB, n, alpha, group = group)
if(all(pow > power)) {
return("Sample Size <= 5")
} else if (all(power > pow)) {
repeat {
n <- n + 500
pow <- findRMSEApowernested(rmsea0A, rmsea0B, rmsea1A, rmsea1B, dfA, dfB, n, alpha, group = group)
if(any(pow > power)) {
index <- which(pow > power)[1]
return(n[index]/group)
}
}
} else {
index <- which(pow > power)[1]
return(n[index]/group)
}
}
##' Plot power of nested model RMSEA
##'
##' Plot power of nested model RMSEA over a range of possible sample sizes.
##'
##'
##' @param rmsea0A The \eqn{H_0} baseline RMSEA
##' @param rmsea0B The \eqn{H_0} alternative RMSEA (trivial misfit)
##' @param rmsea1A The \eqn{H_1} baseline RMSEA
##' @param rmsea1B The \eqn{H_1} alternative RMSEA (target misfit to be rejected)
##' @param dfA degree of freedom of the more-restricted model
##' @param dfB degree of freedom of the less-restricted model
##' @param nlow Lower bound of sample size
##' @param nhigh Upper bound of sample size
##' @param steps Step size
##' @param alpha The alpha level
##' @param group The number of group in calculating RMSEA
##' @param \dots The additional arguments for the plot function.
##'
##' @author Bell Clinton
##'
##' Pavel Panko (Texas Tech University; \email{pavel.panko@@ttu.edu})
##'
##' Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso \itemize{
##' \item [findRMSEApowernested()] to find the power for a given
##' sample size in nested model comparison based on population RMSEA
##' \item [findRMSEAsamplesizenested()] to find the minium sample
##' size for a given statistical power in nested model comparison based on
##' population RMSEA
##' }
##'
##' @references
##' MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing
##' differences between nested covariance structure models: Power analysis and
##' null hypotheses. *Psychological Methods, 11*(1), 19--35.
##' \doi{10.1037/1082-989X.11.1.19}
##'
##' @examples
##'
##' plotRMSEApowernested(rmsea0A = 0, rmsea0B = 0, rmsea1A = 0.06,
##' rmsea1B = 0.05, dfA = 22, dfB = 20, nlow = 50,
##' nhigh = 500, steps = 1, alpha = .05, group = 1)
##'
##' @export
plotRMSEApowernested <- function(rmsea0A = NULL, rmsea0B = NULL, rmsea1A,
rmsea1B = NULL, dfA, dfB, nlow, nhigh,
steps = 1, alpha = .05, group = 1, ...){
nseq <- seq(nlow,nhigh, by=steps)
pow1 <- findRMSEApowernested(rmsea0A = rmsea0A, rmsea0B = rmsea0B, rmsea1A = rmsea1A, rmsea1B = rmsea1B, dfA = dfA, dfB = dfB, n = nseq, alpha = alpha, group = group)
plot(nseq, pow1, xlab="Sample Size", ylab="Power", main="Compute Power for Nested RMSEA", type="l", ...)
}
semTools/R/fmi.R 0000644 0001762 0000144 00000030523 14764331327 013173 0 ustar ligges users ### Mauricio Garnier Villarreal & Terrence D. Jorgensen
### Last updated: 12 March 2025
### This function estimates the Fraction of Missing Information for means and
### (co)variances of each variable in a partially observed data set or from
### a list of multiple imputed data sets
##' Fraction of Missing Information.
##'
##' This function estimates the Fraction of Missing Information (FMI) for
##' summary statistics of each variable, using either an incomplete data set or
##' a list of imputed data sets.
##'
##' The function estimates a saturated model with [lavaan::lavaan()] for a
##' single incomplete data set using FIML, or with [lavaan.mi::lavaan.mi()]
##' for a list of imputed data sets. If method = `"saturated"`, FMI will be
##' estiamted for all summary statistics, which could take a lot of time with
##' big data sets. If method = `"null"`, FMI will only be estimated for
##' univariate statistics (e.g., means, variances, thresholds). The saturated
##' model gives more reliable estimates, so it could also help to request a
##' subset of variables from a large data set.
##'
##'
##' @importFrom lavaan lavListInspect lavInspect
##'
##' @param data Either a single `data.frame` with incomplete observations,
##' or a `list` of imputed data sets.
##' @param method character. If `"saturated"` or `"sat"` (default),
##' the model used to estimate FMI is a freely estimated covariance matrix and
##' mean vector for numeric variables, and/or polychoric correlations and
##' thresholds for ordered categorical variables, for each group (if
##' applicable). If `"null"`, only means and variances are estimated for
##' numeric variables, and/or thresholds for ordered categorical variables
##' (i.e., covariances and/or polychoric/polyserial correlations are
##' constrained to zero). See **Details** for more information.
##' @param group `character`. The optional name of a grouping variable, to
##' request FMI in each group.
##' @param ords Optional `character` vector naming ordered-categorical
##' variables, if they are not already stored as class `ordered` in `data`.
##' @param varnames Optional `character` vector of variable names, to calculate
##' FMI for a subset of variables in `data`. By default, all numeric and
##' `ordered=` variables will be included, unless `data=` is a single
##' incomplete `data.frame`, in which case only numeric variables can be
##' used with FIML estimation. Other variable types will be removed.
##' @param exclude Optional `character` vector naming variables to exclude from
##' the analysis.
##' @param return.fit logical. If `TRUE`, the fitted [lavaan::lavaan-class] or
##' [lavaan.mi::lavaan.mi-class] model is returned, so FMI can be found from
##' `summary(..., fmi=TRUE)`.
##'
##' @return `fmi()` returns a list with at least 2 of the following:
##'
##' \item{Covariances}{A list of symmetric matrices: (1) the estimated/pooled
##' covariance matrix, or a list of group-specific matrices (if applicable)
##' and (2) a matrix of FMI, or a list of group-specific matrices (if
##' applicable). Only available if `method = "saturated"`. When
##' `method="cor"`, this element is replaced by `Correlations`.}
##' \item{Variances}{The estimated/pooled variance for each numeric variable.
##' Only available if `method = "null"` (otherwise, it is on the diagonal
##' of Covariances).}
##' \item{Means}{The estimated/pooled mean for each numeric variable.}
##' \item{Thresholds}{The estimated/pooled threshold(s) for each
##' ordered-categorical variable.}
##'
##' @author
##' Mauricio Garnier Villarreal (Vrije Universiteit Amsterdam; \email{m.garniervillarreal@@vu.nl})
##'
##' Terrence Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Rubin, D. B. (1987). *Multiple imputation for nonresponse in surveys*.
##' New York, NY: Wiley.
##'
##' Savalei, V. & Rhemtulla, M. (2012). On obtaining estimates of the fraction
##' of missing information from full information maximum likelihood.
##' *Structural Equation Modeling, 19*(3), 477--494.
##' \doi{10.1080/10705511.2012.687669}
##'
##' Wagner, J. (2010). The fraction of missing information as a tool for
##' monitoring the quality of survey data. *Public Opinion Quarterly,
##' 74*(2), 223--243. \doi{10.1093/poq/nfq007}
##'
##' @examples
##'
##' HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
##' "ageyr","agemo","school")]
##' set.seed(12345)
##' HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
##' age <- HSMiss$ageyr + HSMiss$agemo/12
##' HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
##'
##' ## calculate FMI (using FIML, provide partially observed data set)
##' (out1 <- fmi(HSMiss, exclude = "school"))
##' (out2 <- fmi(HSMiss, exclude = "school", method = "null"))
##' (out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
##' (out4 <- fmi(HSMiss, method = "cor", group = "school")) # correlations by group
##'
##' ## significance tests in lavaan(.mi) object
##' out5 <- fmi(HSMiss, method = "cor", return.fit = TRUE)
##' summary(out5) # factor loading == SD, covariance = correlation
##'
##' if(requireNamespace("lavaan.mi")){
##' ## ordered-categorical data
##' data(binHS5imps, package = "lavaan.mi")
##'
##' ## calculate FMI, using list of imputed data sets
##' fmi(binHS5imps, group = "school")
##' }
##'
##' @export
fmi <- function(data, method = "saturated", group = NULL, ords = NULL,
varnames = NULL, exclude = NULL, return.fit = FALSE) {
fiml <- is.data.frame(data)
## check for single data set or list of imputed data sets
data1 <- if (fiml) data else data[[1]]
## select user-specified variables
vars <- if (is.character(varnames)) varnames else colnames(data1)
## remove grouping variable and user-specified exclusions, if applicable
vars <- setdiff(vars, c(group, exclude))
## check classes
ordvars <- vars[sapply(data1[vars], is.ordered)]
if (!is.null(ords)) ordvars <- c(ordvars, ords)
numvars <- vars[sapply(data1[vars], is.numeric)]
vars <- union(numvars, ordvars)
numvars <- setdiff(vars, ordvars)
if (fiml) {
#TODO: enable estimator = "PML"
# pass missing= option as another fmi() argument?
if (length(ordvars)) message(c("By providing a single data set, only the ",
"FIML option is available to calculate FMI,",
" which requires continuous variables. The ",
"following variables were removed: ",
paste(ordvars, collapse = ", ")))
if (!length(numvars)) stop("No numeric variables were provided.")
vars <- numvars
}
## construct model
if (method == "saturated" | method == "sat") {
covstruc <- outer(vars, vars, function(x, y) paste(x, "~~", y))
diag(covstruc)[which(vars %in% ordvars)] <- ""
model <- covstruc[lower.tri(covstruc, diag = TRUE)]
} else if (method == "null") {
covstruc <- outer(vars, vars, function(x, y) paste(x, "~~", y))
diag(covstruc)[which(vars %in% ordvars)] <- ""
model <- diag(covstruc)
} else if (method == "cor") {
# phantoms <- paste0(".", vars, ".")
model <- c(paste0(".", vars, ". =~ ", ifelse(vars %in% ordvars, "1*", "NA*"),
vars), # loadings = SDs (fixed to 1 when ordinal)
paste0(vars, " ~~ 0*", vars)) # "residual" variances = 0
} else stop('Invalid method= argument: "', method, '"')
if (length(numvars)) model <- c(model, paste(numvars, "~ 1"))
## fit model
if (fiml) {
if (method == "cor") {
fit <- lavaan::cfa(model, data = data, missing = "fiml", group = group,
std.lv = TRUE)
} else {
fit <- lavaan::lavaan(model, data = data, missing = "fiml", group = group)
}
if (return.fit) return(fit)
comb.results <- lavaan::parameterEstimates(fit, fmi = TRUE, zstat = FALSE,
pvalue = FALSE, ci = FALSE)
nG <- lavInspect(fit, "ngroups")
if (nG == 1L) comb.results$group <- 1L
group.label <- lavInspect(fit, "group.label")
} else {
## list of imputations
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
if (method == "cor") {
fit <- lavaan.mi::cfa.mi(model, data = data, group = group,
ordered = ordvars, std.lv = TRUE)
} else {
fit <- lavaan.mi::lavaan.mi(model, data = data, group = group,
ordered = ordvars, auto.th = TRUE)
}
if (return.fit) return(fit)
comb.results <- lavaan.mi::parameterEstimates.mi(fit, fmi = TRUE,
zstat = FALSE,
pvalue = FALSE, ci = FALSE)
nG <- lavListInspect(fit, "ngroups")
if (nG == 1L) comb.results$group <- 1L
group.label <- lavListInspect(fit, "group.label")
#FIXME: also return RIV? Or make it an argument (FMI or RIV)
comb.results$riv <- NULL
}
## Variances from null model, if applicable
if (method == "null") {
if (length(numvars)) {
Variances <- comb.results[comb.results$lhs == comb.results$rhs,
c("lhs","group","est","fmi")]
colnames(Variances)[c(1, 3)] <- c("variable","coef")
if (nG > 1L) Variances$group <- group.label[Variances$group]
class(Variances) <- c("lavaan.data.frame","data.frame")
attr(Variances, "header") <- paste("Null-model estimates may not be as",
"accurate as saturated-model estimates.")
## start list of results
results <- list(Variances = Variances)
} else results <- list()
} else {
## covariances from saturated model, including polychorics (if applicable)
if (fiml) {
if (nG == 1L) {
covmat <- lavInspect(fit, "est")[[ifelse(method == "cor", "psi", "theta")]]
covmat <- list(covmat)
} else {
covmat <- sapply(lavInspect(fit, "est"), "[[",
i = ifelse(method == "cor", "psi", "theta"),
simplify = FALSE)
}
} else {
useImps <- sapply(fit@convergence, "[[", "converged")
m <- sum(useImps)
if (nG == 1L) {
CovList <- lapply(fit@coefList[useImps],
function(x) x[[ifelse(method == "cor", "psi", "theta")]])
covmat <- list(Reduce("+", CovList) / m)
} else {
covmat <- list()
for (i in group.label) {
groupList <- lapply(fit@coefList[useImps],"[[", i)
CovList <- lapply(groupList, function(x) x[[ifelse(method == "cor", "psi", "theta")]])
covmat[[i]] <- Reduce("+", CovList) / m
}
}
}
fmimat <- covmat
covars <- comb.results[comb.results$op == "~~", c("lhs","rhs","group","est","fmi")]
for (i in 1:nG) {
theseCovars <- covars[covars$group == i,]
if (method == "cor") {
phantomRows <- !(theseCovars$lhs %in% vars)
theseCovars <- theseCovars[phantomRows,]
}
fmimat[[i]][as.matrix(theseCovars[, 1:2])] <- theseCovars$fmi
fmimat[[i]][as.matrix(theseCovars[, 2:1])] <- theseCovars$fmi
if (method == "cor") {
## reset variable names
dimnames(fmimat[[i]]) <- dimnames(covmat[[i]]) <- list(vars, vars)
}
}
if (nG == 1L) {
Covariances <- list(coef = covmat[[1]], fmi = fmimat[[1]])
} else Covariances <- list(coef = covmat, fmi = fmimat)
## start list of results
results <- setNames(list(Covariances),
nm = ifelse(method == "cor", "Correlations", "Covariances"))
}
## Means, if applicable
if (length(numvars)) {
results$Means <- comb.results[comb.results$op == "~1" & comb.results$lhs %in% numvars,
c("lhs","group","est","fmi")]
colnames(results$Means)[c(1, 3)] <- c("variable","coef")
if (nG > 1L) results$Means$group <- group.label[results$Means$group]
class(results$Means) <- c("lavaan.data.frame","data.frame")
}
## Thresholds, if applicable
if (length(ordvars)) {
results$Thresholds <- comb.results[comb.results$op == "|",
c("lhs","rhs","group","est","fmi")]
colnames(results$Thresholds)[c(1, 2, 4)] <- c("variable","threshold","coef")
if (nG > 1L) results$Thresholds$group <- group.label[results$Thresholds$group]
class(results$Thresholds) <- c("lavaan.data.frame","data.frame")
}
results
}
semTools/R/monteCarloCI.R 0000644 0001762 0000144 00000036737 15142343762 014751 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 9 February 2026
## from https://www.da.ugent.be/cvs/pages/en/Presentations/Presentation%20Yves%20Rosseel.pdf
# dd <- read.table("https://www.statmodel.com/examples/shortform/4cat%20m.dat",
# col.names = c("intention", "intervention", "ciguse", "w"))
# myData <- do.call(rbind, lapply(1:nrow(dd), function(RR) {
# data.frame(rep(1, dd$w[RR]) %*% as.matrix(dd[RR, 1:3]))
# }))
# model <- '
# ciguse ~ c*intervention + b*intention
# intention ~ a*intervention
# # label threshold for ciguse
# ciguse | b0*t1
# # biased SEs
# naive.indirect := a*b
# naive.direct := c
# # correct
# probit11 := (-b0+c+b*a)/sqrt(b^2+1)
# probit10 := (-b0+c )/sqrt(b^2+1)
# probit00 := (-b0 )/sqrt(b^2+1)
# indirect := pnorm(probit11) - pnorm(probit10)
# direct := pnorm(probit10) - pnorm(probit00)
# OR.indirect := (pnorm(probit11)/(1-pnorm(probit11))) / (pnorm(probit10)/(1-pnorm(probit10)))
# OR.direct := (pnorm(probit10)/(1-pnorm(probit10))) / (pnorm(probit00)/(1-pnorm(probit00)))
# '
# fit <- sem(model, data = myData, ordered = c("ciguse","intention"))
# summary(fit, ci = TRUE)
##' Monte Carlo Confidence Intervals to Test Functions of Parameter Estimates
##'
##' Robust confidence intervals for functions of parameter estimates,
##' based on empirical sampling distributions of estimated model parameters.
##'
##' This function implements the Monte Carlo method of obtaining an empirical
##' sampling distribution of estimated model parameters, as described by
##' MacKinnon et al. (2004) for testing indirect effects in mediation models.
##' This is essentially a parametric bootstrap method, which (re)samples
##' parameters (rather than raw data) from a multivariate-normal distribution
##' with mean vector equal to estimates in `coef()` and covariance matrix
##' equal to the asymptotic covariance matrix `vcov()` of estimated parameters.
##'
##' The easiest way to use the function is to fit a SEM to data with
##' [lavaan::lavaan()], using the `:=` operator in the
##' [lavaan::model.syntax()] to specify user-defined parameters.
##' All information is then available in the resulting
##' [lavaan::lavaan-class] object. Alternatively (especially when using
##' external SEM software to fit the model), the expression(s) can be explicitly
##' passed to the function, along with the vector of estimated model parameters
##' and their associated asymptotic sampling covariance matrix (ACOV).
##' For further information on the Monte Carlo method, see MacKinnon et al.
##' (2004) and Preacher & Selig (2012).
##'
##' The asymptotic covariance matrix can be obtained easily from many popular
##' SEM software packages.
##' \itemize{
##' \item{LISREL: Including the EC option on the OU line will print the ACM
##' to a seperate file. The file contains the lower triangular elements of
##' the ACM in free format and scientific notation.}
##' \item{M*plus*: Include the command TECH3; in the OUTPUT section.
##' The ACM will be printed in the output.}
##' \item{`lavaan`: Use the [vcov()] method on the fitted [lavaan::lavaan-class]
##' object to return the ACM.}
##' }
##'
##'
##' @importFrom stats quantile
##' @importFrom methods getMethod
##' @importFrom lavaan parTable lavInspect
##'
##' @param object A object of class [lavaan::lavaan-class] in which
##' functions of parameters have already been defined using the `:=`
##' operator in `lavaan`'s [lavaan::model.syntax()]. When
##' `NULL`, users must specify `expr`, `coefs`, and `ACM`.
##' @param expr Optional `character` vector specifying functions of model
##' parameters (e.g., an indirect effect). Ideally, the vector should have
##' names, which is necessary if any user-defined parameters refer to other
##' user-defined parameters defined earlier in the vector (order matters!).
##' All parameters appearing in the vector must be provided in `coefs`,
##' or defined (as functions of `coefs`) earlier in `expr`. If
##' `length(expr) > 1L`, `nRep` samples will be drawn
##' simultaneously from a single multivariate distribution; thus,
##' `ACM` must include all parameters in `coefs`.
##' @param coefs `numeric` vector of parameter estimates used in
##' `expr`. Ignored when `object` is used.
##' @param ACM Symmetric `matrix` representing the asymptotic sampling
##' covariance matrix (ACOV) of the parameter estimates in `coefs`.
##' Ignored when `object` is used. Information on how to obtain the ACOV
##' in popular SEM software is described in **Details**.
##' @param nRep `integer`. The number of samples to draw, to obtain an
##' empirical sampling distribution of model parameters. Many thousand are
##' recommended to minimize Monte Carlo error of the estimated CIs.
##' @param standardized `logical` indicating whether to obtain CIs for the
##' fully standardized (`"std.all"`) estimates, using their asymptotic
##' sampling covariance matrix.
##' @param fast `logical` indicating whether to use a fast algorithm that
##' assumes all functions of parameters (in `object` or `expr`) use
##' standard operations. Set to `FALSE` if using (e.g.) [c()]
##' to concatenate parameters in the definition, which would have unintended
##' consequences when vectorizing functions in `expr` across sampled
##' parameters.
##' @param level `numeric` confidence level, between 0--1
##' @param na.rm `logical` passed to [stats::quantile()]
##' @param append.samples `logical` indicating whether to return the
##' simulated empirical sampling distribution of parameters (in `coefs`)
##' and functions (in `expr`) in a `list` with the results. This
##' could be useful to calculate more precise highest-density intervals (see
##' examples).
##' @param plot `logical` indicating whether to plot the empirical sampling
##' distribution of each function in `expr`
##' @param ask whether to prompt user before printing each plot
##' @param \dots arguments passed to [graphics::hist()] when
##' `plot = TRUE`.
##'
##' @return A `lavaan.data.frame` (to use lavaan's `print` method)
##' with point estimates and confidence limits of each requested function of
##' parameters in `expr` is returned. If `append.samples = TRUE`,
##' output will be a `list` with the same `$Results` along with a
##' second `data.frame` with the `$Samples` (in rows) of each
##' parameter (in columns), and an additional column for each requested
##' function of those parameters.
##'
##' @author Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits
##' for the indirect effect: Distribution of the product and resampling methods.
##' *Multivariate Behavioral Research, 39*(1) 99--128.
##' \doi{10.1207/s15327906mbr3901_4}
##'
##' Preacher, K. J., & Selig, J. P. (2010, July). Monte Carlo method
##' for assessing multilevel mediation: An interactive tool for creating
##' confidence intervals for indirect effects in 1-1-1 multilevel models.
##' Computer software available from .
##'
##' Preacher, K. J., & Selig, J. P. (2012). Advantages of Monte Carlo confidence
##' intervals for indirect effects. *Communication Methods and Measures, 6*(2),
##' 77--98. \doi{10.1080/19312458.2012.679848}
##'
##' Selig, J. P., & Preacher, K. J. (2008, June). Monte Carlo method for
##' assessing mediation: An interactive tool for creating confidence intervals
##' for indirect effects. Computer software available from
##' .
##'
##' @aliases monteCarloCI monteCarloMed
##'
##' @examples
##'
##' ## From the mediation tutorial:
##' ## https://lavaan.ugent.be/tutorial/mediation.html
##'
##' set.seed(1234)
##' X <- rnorm(100)
##' M <- 0.5*X + rnorm(100)
##' Y <- 0.7*M + rnorm(100)
##' dat <- data.frame(X = X, Y = Y, M = M)
##'
##' mod <- ' # direct effect
##' Y ~ c*X
##' # mediator
##' M ~ a*X
##' Y ~ b*M
##' # indirect effect (a*b)
##' ind := a*b
##' # total effect
##' total := ind + c
##' '
##' fit <- sem(mod, data = dat)
##' summary(fit, ci = TRUE) # print delta-method CIs
##'
##' ## Automatically extract information from lavaan object
##' set.seed(1234)
##' monteCarloCI(fit) # CIs more robust than delta method in smaller samples
##'
##' ## delta method for standardized solution
##' standardizedSolution(fit)
##' ## compare to Monte Carlo CIs:
##' set.seed(1234)
##' monteCarloCI(fit, standardized = TRUE)
##'
##' \donttest{
##' ## save samples to calculate more precise intervals:
##' set.seed(1234)
##' foo <- monteCarloCI(fit, append.samples = TRUE)
##' # library(HDInterval) # not a dependency; must be installed
##' # hdi(foo$Samples)
##' }
##' ## Parameters can also be obtained from an external analysis
##' myParams <- c("a","b","c")
##' (coefs <- coef(fit)[myParams]) # names must match those in the "expression"
##' ## Asymptotic covariance matrix from an external analysis
##' (AsyCovMat <- vcov(fit)[myParams, myParams])
##' ## Compute CI, include a plot
##' set.seed(1234)
##' monteCarloCI(expr = c(ind = 'a*b', total = 'ind + c',
##' ## other arbitrary functions are also possible
##' meaningless = 'sqrt(a)^b / log(abs(c))'),
##' coefs = coefs, ACM = AsyCovMat,
##' plot = TRUE, ask = TRUE) # print a plot for each
##'
##' @export
monteCarloCI <- function(object = NULL, expr, coefs, ACM, nRep = 2e4,
standardized = FALSE, fast = TRUE, level = 0.95,
na.rm = TRUE, append.samples = FALSE, plot = FALSE,
ask = getOption("device.ask.default"), ...) {
if (inherits(object, c("lavaan","lavaan.mi"))) {
## extract user-defined parameters from parTable (order of user-defined
PT <- parTable(object) # parameters must be correct for model to be fitted)
## create expression vector
expr <- PT$rhs[PT$op == ":="]
names(expr) <- PT$lhs[PT$op == ":="]
if (length(expr) == 0L) stop('No user-defined parameters found.')
}
## provide names if there are none
if (is.null(names(expr))) names(expr) <- expr
## Get names and the number of unique variables in the expression
funcVars <- unique(do.call("c", lapply(paste("~", expr), function(x) {
all.vars(stats::as.formula(x))
})))
## isolate names of model parameters (not user-defined), which get sampled
if (inherits(object, "lavaan")) {
if (standardized) {
STD <- lavaan::standardizedSolution(object)
coefRows <- !(STD$op %in% c(":=","==","<",">","<=",">="))
coefs <- STD$est.std[coefRows]
names(coefs) <- lavaan::lav_partable_labels(STD[coefRows, ])
} else coefs <- lavaan::coef(object)
} else if (inherits(object, "lavaan.mi")) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
if (standardized) {
## only available after this lavaan version:
if ( utils::packageDescription("lavaan", fields = "Version") < "0.6-19" ||
(utils::packageDescription("lavaan", fields = "Version") > "0.6-19" &&
utils::packageDescription("lavaan", fields = "Version") < "0.6-19.2148") ) {
stop("standardized=TRUE for a lavaan.mi-class object requires lavaan ",
"version >= 0.6-19 from CRAN, or development version >= ",
"0.6-19.2148 from GitHub")
}
STD <- lavaan.mi::standardizedSolution.mi(object)
coefRows <- !(STD$op %in% c(":=","==","<",">","<=",">="))
coefs <- STD$est.std[coefRows]
names(coefs) <- lavaan::lav_partable_labels(STD[coefRows, ])
} else coefs <- getMethod(f = "coef", signature = "lavaan.mi",
where = getNamespace("lavaan.mi"))(object)
}
sampVars <- intersect(names(coefs), funcVars)
## If a lavaan(.mi) object is provided, extract coefs and ACM
if (inherits(object, "lavaan")) {
coefs <- coefs[sampVars]
if (standardized) {
ACM <- lavInspect(object, "vcov.std.all")[sampVars, sampVars]
} else {
ACM <- lavaan::vcov(object)[sampVars, sampVars]
}
} else if (inherits(object, "lavaan.mi")) {
coefs <- coefs[sampVars]
if (standardized) {
ACM <- lavaan.mi::standardizedSolution.mi(object, return.vcov = TRUE,
type = "std.all")[sampVars, sampVars]
} else {
ACM <- getMethod(f = "vcov", signature = "lavaan.mi",
where = getNamespace("lavaan.mi"))(object)[sampVars, sampVars]
}
}
## Apply the expression(s) to POINT ESTIMATES
estList <- as.list(coefs)
for (i in seq_along(expr)) {
estList[names(expr[i])] <- eval(parse(text = expr[i]), envir = estList)
}
EST <- data.frame(est = do.call("c", estList[names(expr)]))
## old, buggy code (see issue #142)
# estList <- within(as.list(coefs), expr = {
# for (i in seq_along(expr)) assign(names(expr[i]), eval(parse(text = expr[i])))
# })[names(expr)]
# EST <- data.frame(est = do.call("c", estList))
rownames(EST) <- names(expr)
if (standardized && inherits(object, "lavaan")) colnames(EST) <- "est.std"
## Matrix of sampled values
# dat <-
samples <- data.frame(mnormt::rmnorm(n = nRep, mean = coefs, varcov = ACM))
colnames(samples) <- names(coefs)
## Apply the expression(s) to VECTORS of ESTIMATES
if (fast) {
for (i in seq_along(expr)) {
samples[names(expr[i])] <- eval(parse(text = expr[i]), envir = samples)
}
## old, buggy code (see issue #142)
# samples <- within(dat, expr = {
# for (i in seq_along(expr)) assign(names(expr[i]), eval(parse(text = expr[i])))
# })[c(sampVars, names(expr))]
} else {
## SLOWER: only necessary if expr creates objects using (e.g.) c(), which
## would concatenate parameters ACROSS samples as well as WITHIN
datList <- lapply(1:nRep, function(Rep) {
samples[Rep,] # dat[Rep,]
})
samples <- do.call(rbind, lapply(datList, function(Rep) {
for (i in seq_along(expr)) {
Rep[names(expr[i])] <- eval(parse(text = expr[i]), envir = Rep)
}
## old, buggy code (see issue #142)
# within(Rep, expr = {
# for (i in seq_along(expr)) assign(names(expr[i]), eval(parse(text = expr[i])))
# })
Rep
})) # [c(sampVars, names(expr))]
}
## Get the CI(s)
halfAlpha <- (1-level)/2
Limits <- lapply(samples[names(expr)], quantile, na.rm = na.rm,
probs = c(halfAlpha, 1 - halfAlpha))
CIs <- data.frame(do.call("rbind", Limits))
rownames(CIs) <- names(expr)
colnames(CIs) <- c("ci.lower","ci.upper")
## Switch for outputting a plot
if (plot) {
if (length(expr) > 1L && ask) {
opar <- grDevices::devAskNewPage()
grDevices::devAskNewPage(ask = TRUE)
}
for (i in seq_along(expr)) {
histArgs <- list(...)
histArgs$x <- samples[[ names(expr)[i] ]]
if (is.null(histArgs$breaks)) histArgs$breaks <- "FD"
if (is.null(histArgs$xlab)) histArgs$xlab <- paste0(level*100, '% Confidence Interval')
if (is.null(histArgs$main)) histArgs$main <- paste('Distribution of', names(expr)[i])
do.call("hist", histArgs)
abline(v = EST[i,1], lwd = 3)
abline(v = CIs[i,1:2], lwd = 2, lty = "dashed")
}
if (length(expr) > 1L && ask) grDevices::devAskNewPage(ask = opar)
}
## Always return point and interval estimates
out <- cbind(EST, CIs)
class(out) <- c("lavaan.data.frame","data.frame")
## also simulated values? (e.g., to calculate highest-density intervals)
if (append.samples) return(list(Results = out, Samples = samples))
out
}
#FIXME: Remove after a few version updates
monteCarloMed <- function(...) {
.Defunct("monteCarloCI",
msg = "monteCarloMed() has been replaced by monteCarloCI()")
}
semTools/R/powerAnalysisRMSEA.R 0000644 0001762 0000144 00000037651 15142344033 016046 0 ustar ligges users ### Sunthud Pornprasertmanit, Alexander M. Schoemann, Kristopher J. Preacher, Donna Coffman
### Last updated: 9 February 2026
##' Plot power curves for RMSEA
##'
##' Plots power of RMSEA over a range of sample sizes
##'
##' This function creates plot of power for RMSEA against a range of sample
##' sizes. The plot places sample size on the horizontal axis and power on the
##' vertical axis. The user should indicate the lower and upper values for
##' sample size and the sample size between each estimate ("step size") We
##' strongly urge the user to read the sources below (see References) before
##' proceeding. A web version of this function is available at:
##' . This function is also
##' implemented in the web application "power4SEM":
##'
##'
##'
##' @importFrom stats qchisq pchisq
##'
##' @param rmsea0 Null RMSEA
##' @param rmseaA Alternative RMSEA
##' @param df Model degrees of freedom
##' @param nlow Lower sample size
##' @param nhigh Upper sample size
##' @param steps Increase in sample size for each iteration. Smaller values of
##' steps will lead to more precise plots. However, smaller step sizes means a
##' longer run time.
##' @param alpha Alpha level used in power calculations
##' @param group The number of group that is used to calculate RMSEA.
##' @param \dots The additional arguments for the plot function.
##'
##' @return Plot of power for RMSEA against a range of sample sizes
##'
##' @author
##' Alexander M. Schoemann (East Carolina University; \email{schoemanna@@ecu.edu})
##'
##' Kristopher J. Preacher (Vanderbilt University; \email{kris.preacher@@vanderbilt.edu})
##'
##' Donna L. Coffman (Pennsylvania State University; \email{dlc30@@psu.edu})
##'
##' @seealso \itemize{
##' \item [plotRMSEAdist()] to visualize the RMSEA distributions
##' \item [findRMSEApower()] to find the statistical power based on
##' population RMSEA given a sample size
##' \item [findRMSEAsamplesize()] to find the minium sample size for
##' a given statistical power based on population RMSEA
##' }
##'
##' @references
##' MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing
##' differences between nested covariance structure models: Power analysis and
##' null hypotheses. *Psychological Methods, 11*(1), 19--35.
##' \doi{10.1037/1082-989X.11.1.19}
##'
##' MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis
##' and determination of sample size for covariance structure modeling.
##' *Psychological Methods, 1*(2), 130--149. \doi{10.1037/1082-989X.1.2.130}
##'
##' MacCallum, R. C., Lee, T., & Browne, M. W. (2010). The issue of isopower in
##' power analysis for tests of structural equation models. *Structural
##' Equation Modeling, 17*(1), 23--41. \doi{10.1080/10705510903438906}
##'
##' Preacher, K. J., Cai, L., & MacCallum, R. C. (2007). Alternatives to
##' traditional model comparison strategies for covariance structure models. In
##' T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), *Modeling contextual
##' effects in longitudinal studies* (pp. 33--62). Mahwah, NJ: Lawrence Erlbaum
##' Associates.
##'
##' Steiger, J. H. (1998). A note on multiple sample extensions of the RMSEA fit
##' index. *Structural Equation Modeling, 5*(4), 411--419.
##' \doi{10.1080/10705519809540115}
##'
##' Steiger, J. H., & Lind, J. C. (1980, June). *Statistically based tests
##' for the number of factors.* Paper presented at the annual meeting of the
##' Psychometric Society, Iowa City, IA.
##'
##' Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L.
##' (2021). Analytical power calculations for structural equation modeling:
##' A tutorial and Shiny app. *Behavior Research Methods, 53*, 1385--1406.
##' \doi{10.3758/s13428-020-01479-0}
##'
##' @examples
##'
##' plotRMSEApower(rmsea0 = .025, rmseaA = .075, df = 23,
##' nlow = 100, nhigh = 500, steps = 10)
##'
##' @export
plotRMSEApower <- function(rmsea0, rmseaA, df, nlow, nhigh, steps = 1,
alpha = .05, group = 1, ...) {
pow1 <- 0
nseq <- seq(nlow,nhigh, by=steps)
for(i in nseq){
ncp0 <- ((i-1)*df*rmsea0^2)/group
ncpa <- ((i-1)*df*rmseaA^2)/group
#Compute power
if(rmsea0 < rmseaA) {
cval <- qchisq(alpha,df,ncp=ncp0,lower.tail=FALSE)
pow <- pchisq(cval,df,ncp=ncpa,lower.tail=FALSE)
}
if(rmsea0 > rmseaA) {
cval <- qchisq(1-alpha, df, ncp=ncp0, lower.tail=FALSE)
pow <- 1-pchisq(cval,df,ncp=ncpa,lower.tail=FALSE)
}
pow1<-c(pow1, pow)
}
pow1 <- pow1[-1]
plot(nseq,pow1,xlab="Sample Size",ylab="Power",main="Compute Power for RMSEA",type="l", ...)
}
##' Plot the sampling distributions of RMSEA
##'
##' Plots the sampling distributions of RMSEA based on the noncentral chi-square
##' distributions
##'
##' This function creates overlappling plots of the sampling distribution of
##' RMSEA based on noncentral \eqn{\chi^2} distribution (MacCallum, Browne, &
##' Suguwara, 1996). First, the noncentrality parameter (\eqn{\lambda}) is
##' calculated from RMSEA (Steiger, 1998; Dudgeon, 2004) by \deqn{\lambda = (N -
##' 1)d\varepsilon^2 / K,} where \eqn{N} is sample size, \eqn{d} is the model
##' degree of freedom, \eqn{K} is the number of group, and \eqn{\varepsilon} is
##' the population RMSEA. Next, the noncentral \eqn{\chi^2} distribution with a
##' specified *df* and noncentrality parameter is plotted. Thus,
##' the x-axis represents the sample \eqn{\chi^2} value. The sample \eqn{\chi^2}
##' value can be transformed to the sample RMSEA scale (\eqn{\hat{\varepsilon}})
##' by \deqn{\hat{\varepsilon} = \sqrt{K}\sqrt{\frac{\chi^2 - d}{(N - 1)d}},}
##' where \eqn{\chi^2} is the \eqn{\chi^2} value obtained from the noncentral
##' \eqn{\chi^2} distribution.
##'
##'
##' @importFrom stats qchisq
##'
##' @param rmsea The vector of RMSEA values to be plotted
##' @param n Sample size of a dataset
##' @param df Model degrees of freedom
##' @param ptile The percentile rank of the distribution of the first RMSEA that
##' users wish to plot a vertical line in the resulting graph
##' @param caption The name vector of each element of `rmsea`
##' @param rmseaScale If `TRUE`, the RMSEA scale is used in the x-axis. If
##' `FALSE`, the chi-square scale is used in the x-axis.
##' @param group The number of group that is used to calculate RMSEA.
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso \itemize{
##' \item [plotRMSEApower()] to plot the statistical power
##' based on population RMSEA given the sample size
##' \item [findRMSEApower()] to find the statistical power based on
##' population RMSEA given a sample size
##' \item [findRMSEAsamplesize()] to find the minium sample size for
##' a given statistical power based on population RMSEA
##' }
##'
##' @references
##' Dudgeon, P. (2004). A note on extending Steiger's (1998)
##' multiple sample RMSEA adjustment to other noncentrality parameter-based
##' statistic. *Structural Equation Modeling, 11*(3), 305--319.
##' \doi{10.1207/s15328007sem1103_1}
##'
##' MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis
##' and determination of sample size for covariance structure modeling.
##' *Psychological Methods, 1*(2), 130--149. \doi{10.1037/1082-989X.1.2.130}
##'
##' Steiger, J. H. (1998). A note on multiple sample extensions of the RMSEA fit
##' index. *Structural Equation Modeling, 5*(4), 411--419.
##' \doi{10.1080/10705519809540115}
##'
##' @examples
##'
##' plotRMSEAdist(c(.05, .08), n = 200, df = 20, ptile = .95, rmseaScale = TRUE)
##' plotRMSEAdist(c(.05, .01), n = 200, df = 20, ptile = .05, rmseaScale = FALSE)
##'
##' @export
plotRMSEAdist <- function(rmsea, n, df, ptile = NULL, caption = NULL,
rmseaScale = TRUE, group = 1) {
graph <- cbind(rmsea, df)
ncp <- apply(graph, MARGIN = 1,
FUN = function(x, n, group) ((n - 1) * x[2] * (x[1]^2))/group,
n = n, group = group)
graph <- cbind(graph, ncp)
dens <- lapply(as.list(data.frame(t(graph))), function(x) findDensity("chisq", df = x[2], ncp=x[3]))
if (rmseaScale) dens <- lapply(dens, function(x, df, n, group) { x[,1] <- (x[,1] - df)/(n-1); x[(x[,1] < 0),1] <- 0; x[,1] <- sqrt(group) * sqrt(x[,1]/df); return(x) }, df=df, n=n, group=group)
cutoff <- NULL
if (!is.null(ptile)) {
cutoff <- qchisq(ptile, df = graph[1, 2], ncp = graph[1, 3])
if (rmseaScale) cutoff <- sqrt(group) * sqrt((cutoff - df)/(df * (n - 1)))
}
if (is.null(caption)) caption <- sapply(graph[,1],
function(x) paste("Population RMSEA = ",
format(x, digits = 3),
sep = ""))
plotOverlapDensity(dens, cutoff, caption, ylab = "Density",
xlab = ifelse(rmseaScale, "RMSEA", "Chi-Square"))
equal0 <- sapply(dens, function(x) x[,1] == 0)
if (any(equal0)) warning("The density at RMSEA = 0 cannot be trusted",
" because the plots are truncated.")
}
##' Find the statistical power based on population RMSEA
##'
##' Find the proportion of the samples from the sampling distribution of RMSEA
##' in the alternative hypothesis rejected by the cutoff dervied from the
##' sampling distribution of RMSEA in the null hypothesis. This function can be
##' applied for both test of close fit and test of not-close fit (MacCallum,
##' Browne, & Suguwara, 1996)
##'
##' This function find the proportion of sampling distribution derived from the
##' alternative RMSEA that is in the critical region derived from the sampling
##' distribution of the null RMSEA. If `rmseaA` is greater than
##' `rmsea0`, the test of close fit is used and the critical region is in
##' the right hand side of the null sampling distribution. On the other hand, if
##' `rmseaA` is less than `rmsea0`, the test of not-close fit is used
##' and the critical region is in the left hand side of the null sampling
##' distribution (MacCallum, Browne, & Suguwara, 1996).
##'
##' There is also a Shiny app called "power4SEM" that provides a graphical user
##' interface for this functionality (Jak et al., in press). It can be accessed
##' at .
##'
##'
##' @importFrom stats qchisq pchisq
##'
##' @param rmsea0 Null RMSEA
##' @param rmseaA Alternative RMSEA
##' @param df Model degrees of freedom
##' @param n Sample size of a dataset
##' @param alpha Alpha level used in power calculations
##' @param group The number of group that is used to calculate RMSEA.
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @seealso \itemize{
##' \item [plotRMSEApower()] to plot the statistical power based on
##' population RMSEA given the sample size
##' \item [plotRMSEAdist()] to visualize the RMSEA distributions
##' \item [findRMSEAsamplesize()] to find the minium sample size for
##' a given statistical power based on population RMSEA
##' }
##'
##' @references
##' MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis
##' and determination of sample size for covariance structure modeling.
##' *Psychological Methods, 1*(2), 130--149. \doi{10.1037/1082-989X.1.2.130}
##'
##' Jak, S., Jorgensen, T. D., Verdam, M. G., Oort, F. J., & Elffers, L.
##' (2021). Analytical power calculations for structural equation modeling:
##' A tutorial and Shiny app. *Behavior Research Methods, 53*, 1385--1406.
##' \doi{10.3758/s13428-020-01479-0}
##'
##' @examples
##'
##' findRMSEApower(rmsea0 = .05, rmseaA = .08, df = 20, n = 200)
##'
##' @export
findRMSEApower <- function(rmsea0, rmseaA, df, n, alpha = .05, group = 1) {
ncp0 <- ((n-1)*df*rmsea0^2)/group
ncpa <- ((n-1)*df*rmseaA^2)/group
if (rmsea0 power)) {
return("Sample Size <= 5")
} else if (all(power > pow)) {
repeat {
n <- n + 500
pow <- findRMSEApower(rmsea0, rmseaA, df, n, alpha, group=group)
if(any(pow > power)) {
index <- which(pow > power)[1]
return(n[index]/group)
}
}
} else {
index <- which(pow > power)[1]
return(n[index]/group)
}
}
## ----------------
## Hidden Functions
## ----------------
## findDensity
## Find the x and y coordinate of a distribution in order to plot a density of a distribution
## dist: target distribution in text, such as "chisq"
## ...: Additional argument of the distribution
## Return the data frame with x and y coordinates for plotting density
findDensity <- function(dist, ...) {
FUN <- list()
FUN[[1]] <- get(paste("q", dist, sep=""))
FUN[[2]] <- c(0.001, 0.999)
FUN <- c(FUN, ...)
bound <- eval(as.call(FUN))
val <- seq(bound[1], bound[2], length.out=1000)
FUN[[1]] <- get(paste("d", dist, sep=""))
FUN[[2]] <- val
height <- eval(as.call(FUN))
return(cbind(val, height))
}
##Example Code
##findDensity("chisq", df=10)
## plotOverlapDensity
## Plot the overlapping distributions using density
## dat: A list of data frame where each data frame has the x coordinate as the variable 1 and y coordinate as the variable 2
## vline: Vertical line in the graph
## caption: The name of each density line
## ...: Additional argument of the plot function
plotOverlapDensity <- function(dat, vline = NULL, caption = NULL, ...) {
if (!is.list(dat)) {
temp <- list()
temp[[1]] <- dat
dat <- temp
}
stack <- do.call(rbind, dat)
lim <- apply(stack, 2, function(x) c(min(x), max(x)))
plot(stack, xlim = lim[,1], ylim = lim[,2], type = "n", ...)
for (i in 1:length(dat)) lines(dat[[i]], col = i, lwd = 1.5)
for (i in 1:length(vline)) abline(v = vline[i], lwd = 1.5)
if (!is.null(caption))
legend(0.50 * (lim[2,1] - lim[1,1]) + lim[1,1], 0.99 * (lim[2,2] - lim[1,2]) + lim[1,2], caption, col=1:length(dat), lty=1)
}
semTools/R/EmpKaiser.R 0000644 0001762 0000144 00000013653 14632144122 014272 0 ustar ligges users ### Ylenio Longo
### Last updated: 10 January 2021
##' Empirical Kaiser criterion
##'
##' Identify the number of factors to extract based on the Empirical Kaiser
##' Criterion (EKC). The analysis can be run on a `data.frame` or data
##' `matrix` (`data`), or on a correlation or covariance matrix
##' (`sample.cov`) and the sample size (`sample.nobs`). A
##' `data.frame` is returned with two columns: the eigenvalues from your
##' data or covariance matrix and the reference eigenvalues. The number of
##' factors suggested by the Empirical Kaiser Criterion (i.e. the sample
##' eigenvalues greater than the reference eigenvalues), and the number of
##' factors suggested by the original Kaiser Criterion
##' (i.e. sample eigenvalues > 1) is printed above the output.
##'
##'
##' @importFrom stats cov cov2cor
##'
##' @param data A `data.frame` or data `matrix` containing columns of
##' variables to be factor-analyzed.
##' @param sample.cov A covariance or correlation matrix can be used, instead of
##' `data`, to estimate the eigenvalues.
##' @param sample.nobs Number of observations (i.e. sample size) if
##' `is.null(data)` and `sample.cov=` is used.
##' @param missing If `"listwise"`, incomplete cases are removed listwise from
##' the `data.frame`. If `"direct"` or `"ml"` or `"fiml"` and the `estimator=`
##' is maximum likelihood, an EM algorithm is used to estimate an unrestricted
##' covariance matrix (and mean vector). If `"pairwise"`, pairwise deletion is
##' used. If `"default"``, the value is set depending on the estimator and the
##' mimic option (see [lavaan::lavCor()] for details).
##' @param ordered `character` vector. Only used if object is a `data.frame`.
##' Treat these variables as `ordered=` (ordinal) variables. Importantly, all
##' other variables will be treated as numeric (unless `is.ordered == TRUE` in
##' `data`). (see also [lavCor][lavaan::lavCor])
##' @param plot logical. Whether to print a scree plot comparing the sample
##' eigenvalues with the reference eigenvalues.
##' @return A `data.frame` showing the sample and reference eigenvalues.
##'
##' The number of factors suggested by the Empirical Kaiser Criterion (i.e. the
##' sample eigenvalues greater than the reference eigenvalues) is returned as an
##' attribute (see **Examples**).
##'
##' The number of factors suggested by the original Kaiser Criterion (i.e.
##' sample eigenvalues > 1) is also printed as a header to the `data.frame`
##'
##' @author Ylenio Longo (University of Nottingham;
##' \email{yleniolongo@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' @references Braeken, J., & van Assen, M. A. L. M. (2017). An empirical
##' Kaiser criterion. *Psychological Methods, 22*(3), 450--466.
##' \doi{10.1037/met0000074}
##'
##' @examples
##'
##' ## Simulate data with 3 factors
##' model <- '
##' f1 =~ .3*x1 + .5*x2 + .4*x3
##' f2 =~ .3*x4 + .5*x5 + .4*x6
##' f3 =~ .3*x7 + .5*x8 + .4*x9
##' '
##' dat <- simulateData(model, seed = 123)
##' ## save summary statistics
##' myCovMat <- cov(dat)
##' myCorMat <- cor(dat)
##' N <- nrow(dat)
##'
##' ## Run the EKC function
##' (out <- efa.ekc(dat))
##'
##' ## To extract the recommended number of factors using the EKC:
##' attr(out, "nfactors")
##'
##' ## If you do not have raw data, you can use summary statistics
##' (x1 <- efa.ekc(sample.cov = myCovMat, sample.nobs = N, plot = FALSE))
##' (x2 <- efa.ekc(sample.cov = myCorMat, sample.nobs = N, plot = FALSE))
##'
##' @export
efa.ekc <- function(data = NULL, sample.cov = NULL, sample.nobs = NULL,
missing = "default", ordered = NULL, plot = TRUE) {
## if data
if (!is.null(data)) {
data <- as.data.frame(data)
R <- lavaan::lavCor(data, missing = missing, ordered = ordered) #correlations
j <- dim(data)[2] #number of variables
n <- dim(data)[1] #sample size
} else {
## if covariance matrix
if (max(diag(sample.cov)) != 1 & min(diag(sample.cov)) != 1) {
R <- cov2cor(sample.cov)
j <- dim(R)[2] #number of variables
n <- sample.nobs #sample size
} else {
## if correlation matrix
R <- sample.cov
j <- dim(R)[2] #number of variables
n <- sample.nobs #sample size
}
}
g <- j/n #gamma: var / sample
l <- (1 + sqrt(g))^2 #1st reference eigenvalue
e <- eigen(R)$values #eigenvalues
v <- cumsum(e) #Define cumulatively summed eigenvalue vector
v1 <- v[1:j - 1] #omit last element
v2 <- c(0, v1) #put a zero upfront
w <- sort(1:j, decreasing = TRUE) #eigenvalue order vector
ref <- (((j - v2)/w) * l) #EKC reference eigenvalues
# results
Eigenvalues <- data.frame(Sample = e, Ref = ref) #sample and reference eigenvalues
rownames(Eigenvalues) <- 1:j
class(Eigenvalues) <- c("lavaan.data.frame","data.frame")
## add no. factors to extract as attribute, using each criterion
nfactors_EKC <- which(!(Eigenvalues[, 1] > Eigenvalues[, 2]))[1] - 1 # EKC
nfactors_KC <- which(!(Eigenvalues[, 1] > 1))[1] - 1 # Kaiser Criterion
attr(Eigenvalues, "header") <- paste(" Empirical Kaiser Criterion suggests",
nfactors_EKC, "factors.\n",
"Traditional Kaiser Criterion suggests",
nfactors_KC, "factors.")
attr(Eigenvalues, "nfactors") <- nfactors_EKC
if (plot) {
plot(Eigenvalues[, 1], type = "b", pch = 20, cex = 0.9, col = "black",
main = "Empirical Kaiser Criterion\nScree Plot", ylab = "Eigenvalues",
ylim = c(min(Eigenvalues), max(ceiling(Eigenvalues))),
xlab = "Factor Number", xlim = c(1, j))
lines(Eigenvalues[, 2], lty = "dashed", col = "blue")
legend("topright", c(" Data", " Empirical\n Reference", " Kaiser Criterion"),
col = c("black","blue","gray"), bty = "n",
pch = c(20, NA, NA), lty = c("solid","dashed","solid"), merge = TRUE)
abline(h = 1, col = "gray") # Kaiser Criterion
}
return(Eigenvalues)
}
semTools/R/poolMAlloc.R 0000644 0001762 0000144 00000136236 14764270100 014460 0 ustar ligges users ### Authors:
### Jason D. Rights (Vanderbilt University; jason.d.rights@vanderbilt.edu)
### - based on research from/with Sonya Sterba
### - adapted from OLD parcelAllocation() by Corbin Quick and Alexander Schoemann
### - additional "indices" argument added by Terrence D. Jorgensen
### Last updated: 12 March 2025
##' Combine sampling variability with parcel-allocation variability by
##' pooling results across M parcel-allocations
##'
##' @description
##' This function employs an iterative algorithm to pick the number of random
##' item-to-parcel allocations needed to meet user-defined stability criteria
##' for a fitted structural equation model (SEM) (see **Details** below for
##' more information). Pooled point and standard-error estimates from this SEM
##' can be outputted at this final selected number of allocations (however, it
##' is more efficient to save the allocations and treat them as multiple
##' imputations using [lavaan.mi::lavaan.mi()]; see **See Also** for links with
##' examples). Additionally, new indices (see Sterba & Rights, 2016) are
##' outputted for assessing the relative contributions of parcel-allocation
##' variability vs. sampling variability in each estimate. At each iteration,
##' this function generates a given number of random item-to-parcel allocations,
##' fits a SEM to each allocation, pools estimates across allocations from that
##' iteration, and then assesses whether stopping criteria are met. If stopping
##' criteria are not met, the algorithm increments the number of allocations
##' used (generating all new allocations).
##'
##' @details
##'
##' This function implements an algorithm for choosing the number of allocations
##' (*M*; described in Sterba & Rights, 2016), pools point and
##' standard-error estimates across these *M* allocations, and produces
##' indices for assessing the relative contributions of parcel-allocation
##' variability vs. sampling variability in each estimate.
##'
##' To obtain pooled test statistics for model fit or model comparison, the
##' `list` or parcel allocations can be passed to [lavaan.mi::lavaan.mi()]
##' (find **Examples** on the help pages for [parcelAllocation()]
##' and [PAVranking()]).
##'
##' This function randomly generates a given number (`nAllocStart`) of
##' item-to-parcel allocations, fits a SEM to each allocation, and then
##' increments the number of allocations used (by `nAllocAdd`) until the
##' pooled point and standard-error estimates fulfill stopping criteria
##' (`stopProp` and `stopValue`, defined above). A summary of results
##' from the model that was fit to the *M* allocations are returned.
##'
##' Additionally, this function outputs the proportion of allocations with
##' solutions that converged (using a maximum likelihood estimator) as well as
##' the proportion of allocations with solutions that were converged and proper.
##' The converged and proper solutions among the final *M* allocations are
##' used in computing pooled results.
##'
##' Additionally, after each iteration of the algorithm, information useful in
##' monitoring the algorithm is outputted. The number of allocations used at
##' that iteration, the proportion of pooled parameter estimates meeting
##' stopping criteria at the previous iteration, the proportion of pooled
##' standard errors meeting stopping criteria at the previous iteration, and the
##' runtime of that iteration are outputted. When stopping criteria are
##' satisfied, the full set of results are outputted.
##'
##' For further details on the benefits of the random allocation of items to
##' parcels, see Sterba (2011) and Sterba & MacCallum (2010).
##'
##' @importFrom stats sd pnorm pt qt runif pchisq
##' @importFrom lavaan lavInspect
##'
##' @param nPerPar A list in which each element is a vector, corresponding to
##' each factor, indicating sizes of parcels. If variables are left out of
##' parceling, they should not be accounted for here (i.e., there should not
##' be parcels of size "1").
##' @param facPlc A list of vectors, each corresponding to a factor, specifying
##' the item indicators of that factor (whether included in parceling or not).
##' Either variable names or column numbers. Variables not listed will not be
##' modeled or included in output datasets.
##' @param nAllocStart The number of random allocations of items to parcels to
##' generate in the first iteration of the algorithm.
##' @param nAllocAdd The number of allocations to add with each iteration of the
##' algorithm. Note that if only one iteration is desired, `nAllocAdd` can
##' be set to \eqn{0} and results will be output for `nAllocStart`
##' allocations only.
##' @param syntax lavaan syntax that defines the model.
##' @param dataset Item-level dataset
##' @param parceloutput Optional `character`. Path (folder/directory) where
##' *M* (the final selected number of allocations) parceled data sets will
##' be outputted from the iteration where the algorithm met stopping criteria.
##' Note for Windows users: file path must be specified using forward slashes
##' (`/`), not backslashes (`\\`). See [base::path.expand()]
##' for details. If `NULL` (default), nothing is saved to disk.
##' @param stopProp Value used in defining stopping criteria of the algorithm
##' (\eqn{\delta_a} in Sterba & Rights, 2016). This is the minimum proportion
##' of change (in any pooled parameter or pooled standard error estimate
##' listed in `selectParam`) that is allowable from one iteration of the
##' algorithm to the next. That is, change in pooled estimates and pooled
##' standard errors from one iteration to the next must all be less than
##' (`stopProp`) \eqn{\times} (value from former iteration). Note that
##' `stopValue` can override this criterion (see below). Also note that values
##' less than .01 are unlikely to lead to more substantively meaningful
##' precision. Also note that if only `stopValue` is a desired criterion,
##' `stopProp` can be set to 0.
##' @param stopValue Value used in defining stopping criteria of the algorithm
##' (\eqn{\delta_b} in Sterba & Rights, 2016). `stopValue` is a minimum
##' allowable amount of absolute change (in any pooled parameter or pooled
##' standard error estimate listed in `selectParam`) from one iteration of
##' the algorithm to the next. For a given pooled estimate or pooled standard
##' error, `stopValue` is only invoked as a stopping criteria when the
##' minimum change required by `stopProp` is less than `stopValue`.
##' Note that values less than .01 are unlikely to lead to more substantively
##' meaningful precision. Also note that if only `stopProp` is a desired
##' criterion, `stopValue` can be set to 0.
##' @param selectParam (Optional) A list of the pooled parameters to be used in
##' defining stopping criteria (i.e., `stopProp` and `stopValue`).
##' These parameters should appear in the order they are listed in the lavaan
##' syntax. By default, all pooled parameters are used. Note that
##' `selectParam` should only contain freely-estimated parameters. In one
##' example from Sterba & Rights (2016) `selectParam` included all free
##' parameters except item intercepts and in another example `selectParam`
##' included only structural parameters.
##' @param indices Optional `character` vector indicating the names of
##' available [lavaan::fitMeasures()] to be included in the output.
##' The first and second elements should be a chi-squared test statistic and
##' its associated degrees of freedom, both of which will be added if missing.
##' If `"default"`, the indices will be `c("chisq", "df", "cfi", "tli",
##' "rmsea","srmr")`. If a robust test statistic is requested (see
##' [lavaan::lavOptions()]), `c("chisq","df")` will be replaced
##' by `c("chisq.scaled","df.scaled")`. For the output to include both the
##' naive and robust test statistics, `indices` should include both, but
##' put the scaled test statistics first, as in `indices =
##' c("chisq.scaled", "df.scaled", "chisq", "df")`
##' @param double (Optional) If set to `TRUE`, requires stopping criteria
##' (`stopProp` and `stopValue`) to be met for all parameters (in
##' `selectParam`) for two consecutive iterations of the algorithm. By
##' default, this is set to `FALSE`, meaning stopping criteria need only be
##' met at one iteration of the algorithm.
##' @param names (Optional) A character vector containing the names of parceled
##' variables.
##' @param leaveout (Optional) A vector of variables to be left out of
##' randomized parceling. Either variable names or column numbers are allowed.
##' @param useTotalAlloc (Optional) If set to `TRUE`, function will output
##' a separate set of results that uses all allocations created by the
##' algorithm, rather than *M* allocations (see "Allocations needed for
##' stability" below). This distinction is further discussed in Sterba and
##' Rights (2016).
##' @param checkConv (Optional) If set to TRUE, function will output pooled
##' estimates and standard errors from 10 iterations post-convergence.
##' @param \dots Additional arguments to be passed to
##' [lavaan::lavaan()]. See also [lavaan::lavOptions()]
##'
##' @return
##'
##' \item{Estimates}{A table containing pooled results across *M*
##' allocations at the iteration where stopping criteria were met. Columns
##' correspond to individual parameter name, pooled estimate, pooled standard
##' error, *p* value for a *z* test of the parameter, normal-theory \eqn{95\%}
##' CI, *p* value for a *t* test of the parameter (using \eqn{df} described in
##' Sterba & Rights, 2016), and *t*-based \eqn{95\%} CI for the parameter.}
##' \item{Fit}{A table containing results related to model fit from the *M*
##' allocations at the iteration where stopping criteria were met. Columns
##' correspond to fit index names, the mean of each index across allocations,
##' the *SD* of each fit index across allocations, the minimum, maximum and
##' range of each fit index across allocations, and the percent of the *M*
##' allocations where the chi-square test of absolute fit was significant.}
##' \item{Proportions}{A table containing the proportion of the final *M*
##' allocations that (a) met the optimizer convergence criteria) and
##' (b) converged to proper solutions. Note that pooled estimates, pooled
##' standard errors, and other results are computed using only the converged,
##' proper allocations.}
##' \item{Stability}{The number of allocations (*M*) needed for stability, at
##' which point the algorithm's stopping criteria (defined above) were met.}
##' \item{Uncertainty}{Indices used to quantify uncertainty in estimates due to
##' sample vs. allocation variability. A table containing individual parameter
##' names, an estimate of the proportion of total variance of a pooled
##' parameter estimate that is attributable to parcel-allocation variability
##' (PPAV), and an estimate of the ratio of the between-allocation variance of
##' a pooled parameter estimate to the within-allocation variance (RPAV).
##' See Sterba & Rights (2016) for more detail.}
##' \item{Time}{The total runtime of the function, in minutes. Note that the
##' total runtime will be greater when the specified model encounters
##' convergence problems for some allocations, as is the case with the
##' [simParcel()] dataset used below.}
##'
##'
##' @author
##' Jason D. Rights (Vanderbilt University; \email{jason.d.rights@@vanderbilt.edu})
##'
##' The author would also like to credit Corbin Quick and Alexander Schoemann
##' for providing the original [parcelAllocation()] function (prior to its
##' revision by Terrence D. Jorgensen) on which this function is based.
##'
##' @seealso
##' [lavaan.mi::lavaan.mi()] for treating allocations as multiple imputations
##' to pool results across allocations. See **Examples** on help pages for
##' [parcelAllocation()] (when fitting a single model) and [PAVranking()]
##' (when comparing 2 models).
##'
##' @references
##'
##' Sterba, S. K. (2011). Implications of parcel-allocation
##' variability for comparing fit of item-solutions and parcel-solutions.
##' *Structural Equation Modeling, 18*(4), 554--577.
##' \doi{10.1080/10705511.2011.607073}
##'
##' Sterba, S. K., & MacCallum, R. C. (2010). Variability in parameter estimates
##' and model fit across random allocations of items to parcels.
##' *Multivariate Behavioral Research, 45*(2), 322--358.
##' \doi{10.1080/00273171003680302}
##'
##' Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation
##' variability in practice: Combining sources of uncertainty and choosing the
##' number of allocations. *Multivariate Behavioral Research, 51*(2--3),
##' 296--313. \doi{10.1080/00273171.2016.1144502}
##'
##' Sterba, S. K., & Rights, J. D. (2017). Effects of parceling on model
##' selection: Parcel-allocation variability in model ranking.
##' *Psychological Methods, 22*(1), 47--68. \doi{10.1037/met0000067}
##'
##' @examples
##'
##' \donttest{
##' ## lavaan syntax: A 2 Correlated
##' ## factor CFA model to be fit to parceled data
##'
##' parmodel <- '
##' f1 =~ NA*p1f1 + p2f1 + p3f1
##' f2 =~ NA*p1f2 + p2f2 + p3f2
##' p1f1 ~ 1
##' p2f1 ~ 1
##' p3f1 ~ 1
##' p1f2 ~ 1
##' p2f2 ~ 1
##' p3f2 ~ 1
##' p1f1 ~~ p1f1
##' p2f1 ~~ p2f1
##' p3f1 ~~ p3f1
##' p1f2 ~~ p1f2
##' p2f2 ~~ p2f2
##' p3f2 ~~ p3f2
##' f1 ~~ 1*f1
##' f2 ~~ 1*f2
##' f1 ~~ f2
##' '
##'
##' ## specify items for each factor
##' f1name <- colnames(simParcel)[1:9]
##' f2name <- colnames(simParcel)[10:18]
##'
##' ## run function
##' poolMAlloc(nPerPar = list(c(3,3,3), c(3,3,3)),
##' facPlc = list(f1name, f2name), nAllocStart = 10, nAllocAdd = 10,
##' syntax = parmodel, dataset = simParcel, stopProp = .03,
##' stopValue = .03, selectParam = c(1:6, 13:18, 21),
##' names = list("p1f1","p2f1","p3f1","p1f2","p2f2","p3f2"),
##' double = FALSE, useTotalAlloc = FALSE)
##' }
##'
##' ## See examples on ?parcelAllocation and ?PAVranking for how to obtain
##' ## pooled test statistics and other pooled lavaan output.
##' ## Details provided in Sterba & Rights (2016).
##'
##' @export
poolMAlloc <- function(nPerPar, facPlc, nAllocStart, nAllocAdd = 0,
parceloutput = NULL, syntax, dataset, stopProp, stopValue,
selectParam = NULL, indices = "default", double = FALSE,
checkConv = FALSE, names = "default", leaveout = 0,
useTotalAlloc = FALSE, ...) {
message('Note that more options for pooled results are available using the ',
'lavaan.mi package (see Examples on ?parcelAllocation and ?PAVranking)')
if (!is.null(parceloutput)) {
if (!dir.exists(parceloutput)) stop('invalid directory:\n',
paste(parceloutput), "\n\n")
}
StartTimeFull <- proc.time()
if (is.character(dataset)) dataset <- utils::read.csv(dataset)
if (indices[1] == "default") indices <- c("chisq", "df", "cfi", "tli", "rmsea","srmr")
## make sure chi-squared and df are the first and second elements
requestedChiSq <- grep(pattern = "chisq", indices, value = TRUE)
if (length(requestedChiSq) == 0L) {
indices <- unique(c("chisq", indices))
} else {
indices <- unique(c(requestedChiSq[1], indices))
}
requestedDF <- grep(pattern = "df", indices, value = TRUE)
if (length(requestedDF) == 0L) {
indices <- unique(c(indices[1], "df", indices[-1]))
} else {
indices <- unique(c(indices[1], requestedDF[1], indices[-1]))
}
isProperSolution <- function(object) {
lavpartable <- object@ParTable
lavfit <- object@Fit
lavdata <- object@Data
lavmodel <- object@Model
var.idx <- which(lavpartable$op == "~~" & lavpartable$lhs == lavpartable$rhs)
if (length(var.idx) > 0L && any(lavfit@est[var.idx] < 0)) return(FALSE)
if (length(lavaan::lavaanNames(lavpartable, type = "lv.regular")) > 0L) {
ETA <- list(lavInspect(object, "cov.lv"))
for (g in 1:lavdata@ngroups) {
eigvals <- eigen(ETA[[g]], symmetric = TRUE, only.values = TRUE)$values
if (any(eigvals < -1 * .Machine$double.eps^(3/4))) return(FALSE)
}
}
THETA <- list(lavInspect(object, "theta"))
for (g in 1:lavdata@ngroups) {
num.idx <- lavmodel@num.idx[[g]]
if (length(num.idx) > 0L) {
eigvals <- eigen(THETA[[g]][unlist(num.idx),
unlist(num.idx), drop = FALSE], symmetric = TRUE,
only.values = TRUE)$values
if (any(eigvals < -1 * .Machine$double.eps^(3/4))) return(FALSE)
}
}
TRUE
}
nloop <- 0
nAllocStarttemp <- nAllocStart
options(max.print = 1e+06)
BreakCounter <- NA
repeat {
StartTime <- proc.time()
nloop <- nloop + 1
if (double == TRUE & is.na(BreakCounter) == FALSE)
BreakCounter <- BreakCounter + 1
if (checkConv == TRUE & is.na(BreakCounter) == FALSE)
BreakCounter <- BreakCounter + 1
if (nloop > 1) {
if (is.na(BreakCounter) == TRUE) {
Parmn_revFinal <- Parmn_rev[[nloop - 1]]
nConvergedOutput <- nConverged
nConvergedProperOutput <- nConvergedProper
PooledSEwithinvarFinal <- PooledSEwithinvar
PooledSEbetweenvarFinal <- PooledSEbetweenvar
PooledSEFinal <- PooledSE
FitsumOutput <- Fitsum
nAllocOutput <- nAllocStart - nAllocAdd
AllocationsOutput <- Allocations
#ParamFinal <- Param # defined, but never used
}
ParamPooledSE_temp <- ParamPooledSE
ParamTest_temp <- ParamTest
PooledSE_temp <- PooledSE
ParamPoolSEdiffmin <- abs(ParamPooledSE_temp * stopProp)
ParamPoolSEdiffmin[ParamPoolSEdiffmin < stopValue] <- stopValue
ParamDiffMin <- abs(ParamTest * stopProp)
ParamDiffMin[ParamDiffMin < stopValue] <- stopValue
PooledSEmin <- abs(PooledSE * stopProp)
PooledSEmin[PooledSEmin < stopValue] <- stopValue
}
dataset <- as.matrix(dataset)
if (nAllocStart < 2) stop("Minimum of two allocations required.")
if (is.list(facPlc)) {
if (is.numeric(facPlc[[1]][1]) == FALSE) {
facPlcb <- facPlc
Namesv <- colnames(dataset)
for (i in 1:length(facPlc)) {
for (j in 1:length(facPlc[[i]])) {
facPlcb[[i]][j] <- match(facPlc[[i]][j],
Namesv)
}
facPlcb[[i]] <- as.numeric(facPlcb[[i]])
}
facPlc <- facPlcb
}
facPlc2 <- rep(0, ncol(dataset))
for (i in 1:length(facPlc)) {
for (j in 1:length(facPlc[[i]])) {
facPlc2[facPlc[[i]][j]] <- i
}
}
facPlc <- facPlc2
}
if (leaveout != 0) {
if (is.numeric(leaveout) == FALSE) {
leaveoutb <- rep(0, length(leaveout))
Namesv <- colnames(dataset)
for (i in 1:length(leaveout)) {
leaveoutb[i] <- match(leaveout[i], Namesv)
}
leaveout <- as.numeric(leaveoutb)
}
k1 <- 0.001
for (i in 1:length(leaveout)) {
facPlc[leaveout[i]] <- facPlc[leaveout[i]] + k1
k1 <- k1 + 0.001
}
}
if (0 %in% facPlc == TRUE) {
Zfreq <- sum(facPlc == 0)
for (i in 1:Zfreq) {
Zplc <- match(0, facPlc)
dataset <- dataset[, -Zplc]
facPlc <- facPlc[-Zplc]
}
}
if (is.list(nPerPar)) {
nPerPar2 <- c()
for (i in 1:length(nPerPar)) {
Onesp <- sum(facPlc > i & facPlc < i + 1)
nPerPar2 <- c(nPerPar2, nPerPar[i], rep(1, Onesp), recursive = TRUE)
}
nPerPar <- nPerPar2
}
Npp <- c()
for (i in 1:length(nPerPar)) {
Npp <- c(Npp, rep(i, nPerPar[i]))
}
Locate <- sort(round(facPlc))
Maxv <- max(Locate) - 1
if (length(Locate) != length(Npp)) {
stop("** ERROR! ** Parcels incorrectly specified. Check input!")
}
if (Maxv > 0) {
for (i in 1:Maxv) {
Mat <- match(i + 1, Locate)
if (Npp[Mat] == Npp[Mat - 1]) {
stop("** ERROR! ** Parcels incorrectly specified. Check input!")
}
}
}
Onevec <- facPlc - round(facPlc)
NleaveA <- length(Onevec) - sum(Onevec == 0)
NleaveP <- sum(nPerPar == 1)
if (NleaveA < NleaveP) {
warning("** WARNING! ** Single-variable parcels have been requested.",
" Check input!")
}
if (NleaveA > NleaveP)
warning("** WARNING! ** More non-parceled variables have been requested",
" than provided for in parcel vector. Check input!")
if (length(names) > 1) {
if (length(names) != length(nPerPar)) {
warning("** WARNING! ** Number of parcel names provided not equal to",
" number of parcels requested. Check input!")
}
}
Data <- c(1:ncol(dataset))
# Nfactors <- max(facPlc) # defined but never used
Nindicators <- length(Data)
Npar <- length(nPerPar)
Rmize <- runif(Nindicators, 1, Nindicators)
Data <- rbind(facPlc, Rmize, Data)
Results <- matrix(numeric(0), nAllocStart, Nindicators)
Pin <- nPerPar[1]
for (i in 2:length(nPerPar)) {
Pin <- c(Pin, nPerPar[i] + Pin[i - 1])
}
for (i in 1:nAllocStart) {
Data[2, ] <- runif(Nindicators, 1, Nindicators)
Data <- Data[, order(Data[2, ])]
Data <- Data[, order(Data[1, ])]
Results[i, ] <- Data[3, ]
}
Alpha <- rbind(Results[1, ], dataset)
Allocations <- list()
for (i in 1:nAllocStart) {
Ineff <- rep(NA, ncol(Results))
Ineff2 <- c(1:ncol(Results))
for (inefficient in 1:ncol(Results)) {
Ineff[Results[i, inefficient]] <- Ineff2[inefficient]
}
Alpha[1, ] <- Ineff
Beta <- Alpha[, order(Alpha[1, ])]
Temp <- matrix(NA, nrow(dataset), Npar)
TempAA <- if (length(1:Pin[1]) > 1) {
Beta[2:nrow(Beta), 1:Pin[1]]
} else cbind(Beta[2:nrow(Beta), 1:Pin[1]], Beta[2:nrow(Beta), 1:Pin[1]])
Temp[, 1] <- rowMeans(TempAA, na.rm = TRUE)
for (al in 2:Npar) {
Plc <- Pin[al - 1] + 1
TempBB <- if (length(Plc:Pin[al]) > 1) {
Beta[2:nrow(Beta), Plc:Pin[al]]
} else cbind(Beta[2:nrow(Beta), Plc:Pin[al]],
Beta[2:nrow(Beta), Plc:Pin[al]])
Temp[, al] <- rowMeans(TempBB, na.rm = TRUE)
}
if (length(names) > 1) {
colnames(Temp) <- names
}
Allocations[[i]] <- Temp
}
Param <- list()
Fitind <- list()
Converged <- list()
ProperSolution <- list()
ConvergedProper <- list()
for (i in 1:(nAllocStart)) {
data_parcel <- as.data.frame(Allocations[[i]], row.names = NULL, optional = FALSE)
fit <- lavaan::sem(syntax, data = data_parcel, ...)
## if a robust estimator was requested, update fit indices accordingly
requestedTest <- lavInspect(fit, "options")$test
if (any(requestedTest %in% c("satorra.bentler","yuan.bentler",
"yuan.bentler.mplus","scaled.shifted",
"mean.var.adjusted","satterthwaite"))) {
indices[1:2] <- c("chisq.scaled","df.scaled")
} else indices[1:2] <- c("chisq","df")
## check convergence and solution
if (lavInspect(fit, "converged") == TRUE) {
Converged[[i]] <- 1
} else Converged[[i]] <- 0
Param[[i]] <- lavaan::parameterEstimates(fit)[,
c("lhs", "op", "rhs", "est", "se", "z", "pvalue",
"ci.lower", "ci.upper")]
if (isProperSolution(fit) == TRUE & Converged[[i]] == 1) {
ProperSolution[[i]] <- 1
} else ProperSolution[[i]] <- 0
if (any(is.na(Param[[i]][, 5] == TRUE)))
ProperSolution[[i]] <- 0
if (Converged[[i]] == 1 & ProperSolution[[i]] == 1) {
ConvergedProper[[i]] <- 1
} else ConvergedProper[[i]] <- 0
if (ConvergedProper[[i]] == 0)
Param[[i]][, 4:9] <- matrix(data = NA, nrow(Param[[i]]), 6)
if (ConvergedProper[[i]] == 1) {
Fitind[[i]] <- lavaan::fitMeasures(fit, indices)
if (!all(indices %in% names(Fitind[[i]]))) {
invalidIndices <- setdiff(indices, names(Fitind[[i]]))
Fitind[[i]][invalidIndices] <- NA
}
} else Fitind[[i]] <- rep(NA, length(indices))
}
nConverged <- Reduce("+", Converged)
nProperSolution <- Reduce("+", ProperSolution)
nConvergedProper <- Reduce("+", ConvergedProper)
if (nConvergedProper == 0) stop("All allocations failed to converge and/or",
" yielded improper solutions for a given loop.")
Parmn <- Param[[1]]
if (is.null(selectParam))
selectParam <- 1:nrow(Parmn)
ParSE <- matrix(NA, nrow(Parmn), nAllocStart)
ParSEmn <- Parmn[, 5]
Parsd <- matrix(NA, nrow(Parmn), nAllocStart)
Fitmn <- Fitind[[1]]
Fitsd <- matrix(NA, length(Fitmn), nAllocStart)
Sigp <- matrix(NA, nrow(Parmn), nAllocStart)
Fitind <- data.frame(Fitind)
ParamSEsquared <- list()
for (i in 1:nAllocStart) {
ParamSEsquared[[i]] <- cbind(Param[[i]][, 5], Param[[i]][, 5])
if (any(is.na(ParamSEsquared[[i]]) == TRUE)) ParamSEsquared[[i]] <- 0
ParamSEsquared[[i]] <- apply(as.data.frame(ParamSEsquared[[i]]), 1, prod)
Parsd[, i] <- Param[[i]][, 4]
ParSE[, i] <- Param[[i]][, 5]
Sigp[, ncol(Sigp) - i + 1] <- Param[[i]][, 7]
Fitsd[, i] <- Fitind[[i]]
}
Sigp <- Sigp + 0.45
Sigp <- apply(Sigp, c(1, 2), round)
Sigp <- 1 - as.vector(rowMeans(Sigp, na.rm = TRUE))
Parsum <- cbind(apply(Parsd, 1, mean, na.rm = TRUE),
apply(Parsd, 1, sd, na.rm = TRUE),
apply(Parsd, 1, max, na.rm = TRUE),
apply(Parsd, 1, min, na.rm = TRUE),
apply(Parsd, 1, max, na.rm = TRUE) - apply(Parsd, 1, min, na.rm = TRUE),
Sigp)
colnames(Parsum) <- c("Avg Est.", "S.D.", "MAX",
"MIN", "Range", "% Sig")
ParSEmn <- Parmn[, 1:3]
ParSEfn <- cbind(ParSEmn, apply(ParSE, 1, mean, na.rm = TRUE),
apply(ParSE, 1, sd, na.rm = TRUE),
apply(ParSE, 1, max, na.rm = TRUE),
apply(ParSE, 1, min, na.rm = TRUE),
apply(ParSE, 1, max, na.rm = TRUE) - apply(ParSE, 1, min, na.rm = TRUE))
colnames(ParSEfn) <- c("lhs", "op", "rhs", "Avg SE",
"S.D.", "MAX", "MIN", "Range")
Fitsum <- cbind(apply(Fitsd, 1, mean, na.rm = TRUE),
apply(Fitsd, 1, sd, na.rm = TRUE),
apply(Fitsd, 1, max, na.rm = TRUE),
apply(Fitsd, 1, min, na.rm = TRUE),
apply(Fitsd, 1, max, na.rm = TRUE) - apply(Fitsd, 1, min, na.rm = TRUE))
rownames(Fitsum) <- indices
Parmn[, 4:ncol(Parmn)] <- Parmn[, 4:ncol(Parmn)]/nConvergedProper
Parmn <- Parmn[, 1:3]
Parmn <- cbind(Parmn, Parsum)
Fitmn <- Fitmn/nConvergedProper
pChisq <- list()
sigChisq <- list()
for (i in 1:nAllocStart) {
pChisq[[i]] <- (1 - pchisq(Fitsd[1, i], Fitsd[2, i]))
if (is.na(pChisq[[i]]) == FALSE & pChisq[[i]] < 0.05) {
sigChisq[[i]] <- 1
}
else sigChisq[[i]] <- 0
}
PerSigChisq <- (Reduce("+", sigChisq))/nConvergedProper * 100
PerSigChisq <- round(PerSigChisq, 4)
PerSigChisqCol <- c(PerSigChisq, # however many indices != chisq(.scaled)
rep("n/a", sum(!grepl(pattern = "chisq", x = indices))))
Fitsum <- data.frame(Fitsum, PerSigChisqCol, stringsAsFactors = FALSE)
colnames(Fitsum) <- c("Avg Ind", "S.D.", "MAX", "MIN",
"Range", "% Sig")
PooledSEwithinvar <- Reduce("+", ParamSEsquared)/nConvergedProper
PooledSEbetweenvar <- Parmn[, 5]^2
PooledSE <- sqrt(PooledSEwithinvar + PooledSEbetweenvar + PooledSEbetweenvar/nConvergedProper)
ParamPooledSE <- c(Parmn[, 4], PooledSE)
ParamTest <- Parmn[, 4]
if (nloop > 1) {
ParamPoolSEdiff <- abs(ParamPooledSE_temp - ParamPooledSE)
Paramdiff <- abs(ParamTest_temp - ParamTest)
PooledSEdiff <- abs(PooledSE - PooledSE_temp)
ParamPoolSEdifftest <- ParamPoolSEdiff - ParamPoolSEdiffmin
ParamPoolSEdifftest[ParamPoolSEdifftest <= 0] <- 0
ParamPoolSEdifftest[ParamPoolSEdifftest > 0] <- 1
Paramdifftest <- Paramdiff - ParamDiffMin
Paramdifftest[Paramdifftest <= 0] <- 0
Paramdifftest[Paramdifftest > 0] <- 1
PooledSEdifftest <- PooledSEdiff - PooledSEmin
PooledSEdifftest[PooledSEdifftest <= 0] <- 0
PooledSEdifftest[PooledSEdifftest > 0] <- 1
if (nloop == 2) {
ParamPoolSEdifftesttable <- cbind(ParamPoolSEdifftest)
Paramdifftesttable <- cbind(Paramdifftest)
PooledSEdifftesttable <- cbind(PooledSEdifftest)
}
if (nloop > 2) {
ParamPoolSEdifftesttable <- cbind(ParamPoolSEdifftesttable,
ParamPoolSEdifftest)
Paramdifftesttable <- cbind(Paramdifftesttable,
Paramdifftest)
PooledSEdifftesttable <- cbind(PooledSEdifftesttable,
PooledSEdifftest)
}
PropStopParam <- 1 - (Reduce("+", Paramdifftesttable[selectParam,
nloop - 1])/length(selectParam))
PropStopPooled <- 1 - (Reduce("+", PooledSEdifftesttable[selectParam,
nloop - 1])/length(selectParam))
PropStopParamPooled <- 1 - (Reduce("+", ParamPoolSEdifftesttable[c(selectParam, selectParam + nrow(Parmn)), nloop - 1]) /
(2 * length(selectParam)))
if (checkConv == TRUE & is.na(BreakCounter) == TRUE) {
print(nAllocStart)
print("Proportion of pooled estimates meeting stop criteria:")
print(PropStopParam)
print("Proportion of pooled SE meeting stop criteria:")
print(PropStopPooled)
}
if (checkConv == FALSE) {
print(nAllocStart)
print("Proportion of pooled estimates meeting stop criteria:")
print(PropStopParam)
print("Proportion of pooled SE meeting stop criteria:")
print(PropStopPooled)
}
}
nAllocStart <- nAllocStart + nAllocAdd
StopTime <- proc.time() - StartTime
print("Runtime:")
print(StopTime)
Parmn_rev <- list()
Parmn_rev[[nloop]] <- cbind(Parmn[, 1:4], PooledSE)
Parmn_rev[[nloop]][, 4:5] <- sapply(Parmn_rev[[nloop]][,4:5], as.numeric)
colnames(Parmn_rev[[nloop]]) <- c("lhs", "op", "rhs","Estimate", "Pooled SE")
if (nloop == 1) {
Param_revTemp <- cbind(Parmn[, 1:3], Parmn_rev[[nloop]][,4])
Param_revTemp[, 4] <- as.numeric(Param_revTemp[,4])
Param_revTotal <- cbind(Param_revTemp)
PooledSE_revTemp <- cbind(Parmn[, 1:3], Parmn_rev[[nloop]][,5])
PooledSE_revTemp[, 4] <- as.numeric(PooledSE_revTemp[,4])
PooledSE_revTotal <- cbind(PooledSE_revTemp)
}
if (nloop > 1) {
Param_revTemp <- cbind(Parmn_rev[[nloop]][, 4])
Param_revTemp <- as.numeric(Param_revTemp)
Param_revTotal <- cbind(Param_revTotal, Param_revTemp)
PooledSE_revTemp <- cbind(Parmn_rev[[nloop]][,
5])
PooledSE_revTemp <- as.numeric(PooledSE_revTemp)
PooledSE_revTotal <- cbind(PooledSE_revTotal,
PooledSE_revTemp)
}
if (nloop == 1) {
ParamTotal <- Param
FitindTotal <- Fitind
AllocationsTotal <- Allocations
nAllocTotal <- nAllocStart - nAllocAdd
nConvergedTotal <- nConverged
nProperSolutionTotal <- nProperSolution
nConvergedProperTotal <- nConvergedProper
}
if (nloop > 1) {
ParamTotal <- c(ParamTotal, Param)
FitindTotal <- c(FitindTotal, Fitind)
AllocationsTotal <- c(AllocationsTotal, Allocations)
nAllocTotal <- nAllocTotal + nAllocStart - nAllocAdd
nConvergedTotal <- nConverged + nConvergedTotal
nProperSolution <- nProperSolution + nProperSolutionTotal
nConvergedProperTotal <- nConvergedProper + nConvergedProperTotal
}
if (nloop > 1 & double == TRUE & is.na(BreakCounter) == FALSE & BreakCounter == 2) {
if (Reduce("+", ParamPoolSEdifftesttable[c(selectParam,
selectParam + nrow(Parmn_rev[[nloop]])), nloop - 1]) == 0)
break
}
if (nloop > 1 & double == TRUE) {
if (Reduce("+", ParamPoolSEdifftesttable[c(selectParam,
selectParam + nrow(Parmn_rev[[nloop]])), nloop - 1]) == 0) {
BreakCounter <- 1
}
else BreakCounter <- NA
}
if (nloop > 1 & checkConv == TRUE & is.na(BreakCounter) == TRUE) {
if (Reduce("+", ParamPoolSEdifftesttable[c(selectParam,
selectParam + nrow(Parmn_rev[[nloop]])), nloop - 1]) == 0)
BreakCounter <- 0
}
if (nloop > 1 & double == FALSE & checkConv == FALSE) {
if (Reduce("+", ParamPoolSEdifftesttable[c(selectParam,
selectParam + nrow(Parmn_rev[[nloop]])), nloop - 1]) == 0)
break
}
if (nAllocAdd == 0)
break
if (checkConv == TRUE & is.na(BreakCounter) == FALSE & BreakCounter == 9)
break
}
if (nAllocAdd == 0) {
Parmn_revFinal <- Parmn_rev[[nloop]]
nConvergedOutput <- nConverged
nConvergedProperOutput <- nConvergedProper
PooledSEwithinvarFinal <- PooledSEwithinvar
PooledSEbetweenvarFinal <- PooledSEbetweenvar
PooledSEFinal <- PooledSE
FitsumOutput <- Fitsum
nAllocOutput <- nAllocStart - nAllocAdd
AllocationsOutput <- Allocations
}
if (!is.null(parceloutput)) {
replist <- matrix(NA, nAllocOutput, 1)
for (i in 1:(nAllocOutput)) {
colnames(AllocationsOutput[[i]]) <- names
utils::write.table(AllocationsOutput[[i]],
file = paste(parceloutput, "/parcelruns", i, ".dat", sep = ""),
row.names = FALSE, col.names = TRUE)
replist[i, 1] <- paste("parcelruns", i, ".dat", sep = "")
}
utils:: write.table(replist, paste(parceloutput, "/parcelrunsreplist.dat",
sep = ""), quote = FALSE, row.names = FALSE,
col.names = FALSE)
}
if (useTotalAlloc == TRUE) {
ParmnTotal <- ParamTotal[[1]]
ParSETotal <- matrix(NA, nrow(ParmnTotal), nAllocTotal)
ParSEmnTotal <- ParmnTotal[, 5]
ParsdTotal <- matrix(NA, nrow(ParmnTotal), nAllocTotal)
FitmnTotal <- FitindTotal[[1]]
FitsdTotal <- matrix(NA, length(FitmnTotal), nAllocTotal)
SigpTotal <- matrix(NA, nrow(ParmnTotal), nAllocTotal)
FitindTotal <- data.frame(FitindTotal)
ParamSEsquaredTotal <- list()
for (i in 1:nAllocTotal) {
ParamSEsquaredTotal[[i]] <- cbind(ParamTotal[[i]][,5], ParamTotal[[i]][, 5])
if (any(is.na(ParamSEsquaredTotal[[i]]) == TRUE))
ParamSEsquaredTotal[[i]] <- 0
ParamSEsquaredTotal[[i]] <- apply(as.data.frame(ParamSEsquaredTotal[[i]]),1, prod)
ParsdTotal[, i] <- ParamTotal[[i]][, 4]
ParSETotal[, i] <- ParamTotal[[i]][, 5]
SigpTotal[, ncol(Sigp) - i + 1] <- ParamTotal[[i]][,7]
FitsdTotal[, i] <- FitindTotal[[i]]
}
SigpTotal <- SigpTotal + 0.45
SigpTotal <- apply(SigpTotal, c(1, 2), round)
SigpTotal <- 1 - as.vector(rowMeans(SigpTotal, na.rm = TRUE))
ParsumTotal <- cbind(apply(ParsdTotal, 1, mean, na.rm = TRUE),
apply(ParsdTotal, 1, sd, na.rm = TRUE),
apply(ParsdTotal, 1, max, na.rm = TRUE),
apply(ParsdTotal, 1, min, na.rm = TRUE),
apply(ParsdTotal, 1, max, na.rm = TRUE) - apply(ParsdTotal, 1, min, na.rm = TRUE),
SigpTotal)
colnames(ParsumTotal) <- c("Avg Est.", "S.D.", "MAX", "MIN", "Range", "% Sig")
ParSEmnTotal <- ParmnTotal[, 1:3]
ParSEfnTotal <- cbind(ParSEmnTotal,
apply(ParSETotal, 1, mean, na.rm = TRUE),
apply(ParSETotal, 1, sd, na.rm = TRUE),
apply(ParSETotal, 1, max, na.rm = TRUE),
apply(ParSETotal, 1, min, na.rm = TRUE),
apply(ParSETotal, 1, max, na.rm = TRUE) - apply(ParSETotal, 1, min, na.rm = TRUE))
colnames(ParSEfnTotal) <- c("lhs", "op", "rhs", "Avg SE",
"S.D.", "MAX", "MIN", "Range")
FitsumTotal <- cbind(apply(FitsdTotal, 1, mean, na.rm = TRUE),
apply(FitsdTotal, 1, sd, na.rm = TRUE),
apply(FitsdTotal, 1, max, na.rm = TRUE),
apply(FitsdTotal, 1, min, na.rm = TRUE),
apply(FitsdTotal, 1, max, na.rm = TRUE) - apply(FitsdTotal, 1, min, na.rm = TRUE))
rownames(FitsumTotal) <- indices
ParmnTotal[, 4:ncol(ParmnTotal)] <- ParmnTotal[,4:ncol(Parmn)]/nConvergedProperTotal
ParmnTotal <- ParmnTotal[, 1:3]
ParmnTotal <- cbind(ParmnTotal, ParsumTotal)
FitmnTotal <- FitmnTotal/nConvergedProperTotal
pChisqTotal <- list()
sigChisqTotal <- list()
for (i in 1:nAllocTotal) {
pChisqTotal[[i]] <- (1 - pchisq(FitsdTotal[1,i], FitsdTotal[2, i]))
if (is.na(pChisqTotal[[i]]) == FALSE & pChisqTotal[[i]] < 0.05) {
sigChisqTotal[[i]] <- 1
} else sigChisqTotal[[i]] <- 0
}
PerSigChisqTotal <- (Reduce("+", sigChisqTotal))/nConvergedProperTotal * 100
PerSigChisqTotal <- round(PerSigChisqTotal, 4)
PerSigChisqColTotal <- c(PerSigChisqTotal, "n/a", "n/a", "n/a", "n/a")
FitsumTotal <- data.frame(FitsumTotal, PerSigChisqColTotal, stringsAsFactors = FALSE)
colnames(FitsumTotal) <- c("Avg Ind", "S.D.", "MAX", "MIN", "Range", "% Sig")
PooledSEwithinvarTotal <- Reduce("+", ParamSEsquaredTotal)/nConvergedProperTotal
PooledSEbetweenvarTotal <- ParmnTotal[, 5]^2
PooledSETotal <- sqrt(PooledSEwithinvarTotal + PooledSEbetweenvarTotal +
PooledSEbetweenvarTotal/nConvergedProperTotal)
ParamPooledSETotal <- c(ParmnTotal[, 4], PooledSETotal)
ParamTestTotal <- ParmnTotal[, 4]
Parmn_revTotal <- cbind(ParmnTotal[, 1:4], PooledSETotal)
Parmn_revTotal[, 4:5] <- sapply(Parmn_revTotal[,4:5], as.numeric)
colnames(Parmn_revTotal) <- c("lhs", "op", "rhs",
"Estimate", "Pooled SE")
df_tTotal <- (nConvergedProperTotal - 1) *
(1 + (nConvergedProperTotal * PooledSEwithinvarTotal)/(nConvergedProperTotal *
PooledSEbetweenvarTotal + PooledSEbetweenvarTotal))^2
crit_tTotal <- abs(qt(0.05/2, df_tTotal))
pval_zTotal <- 2 * (1 - pnorm(abs(Parmn_revTotal[, 4]/PooledSETotal)))
pval_tTotal <- 2 * (1 - pt(abs(Parmn_revTotal[, 4]/PooledSETotal),
df = df_tTotal))
CI95_Lower_zTotal <- Parmn_revTotal[, 4] - 1.959963985 * PooledSETotal
CI95_Upper_zTotal <- Parmn_revTotal[, 4] + 1.959963985 * PooledSETotal
CI95_Lower_tTotal <- Parmn_revTotal[, 4] - crit_tTotal * PooledSETotal
CI95_Upper_tTotal <- Parmn_revTotal[, 4] + crit_tTotal * PooledSETotal
Parmn_revTotal <- cbind(Parmn_revTotal, pval_zTotal,
CI95_Lower_zTotal, CI95_Upper_zTotal, pval_tTotal,
CI95_Lower_tTotal, CI95_Upper_tTotal)
colnames(Parmn_revTotal) <- c("lhs", "op", "rhs",
"Pooled Est", "Pooled SE", "pval_z", "CI95_LB_z",
"CI95_UB_z", "pval_t", "CI95_LB_t", "CI95_UB_t")
for (i in 1:nrow(Parmn_revTotal)) {
if (Parmn_revTotal[i, 5] == 0)
Parmn_revTotal[i, 6:11] <- NA
}
RPAVTotal <- (PooledSEbetweenvarTotal + (PooledSEbetweenvarTotal/(nConvergedProperTotal)))/PooledSEwithinvarTotal
PPAVTotal <- (((nConvergedProperTotal + 1)/(nConvergedProperTotal)) *
PooledSEbetweenvarTotal)/(PooledSEwithinvarTotal +
(((nConvergedProperTotal + 1)/(nConvergedProperTotal)) * PooledSEbetweenvarTotal))
PAVtableTotal <- cbind(ParmnTotal[1:3], RPAVTotal, PPAVTotal)
Parmn_revTotal[, 4:11] <- apply(Parmn_revTotal[, 4:11], 2, round, digits = 4)
FitsumTotal[, 1:5] <- apply(FitsumTotal[, 1:5], 2, round, digits = 4)
PAVtableTotal[, 4:5] <- apply(PAVtableTotal[, 4:5], 2, round, digits = 4)
FitsumTotal[2, 2:5] <- c("n/a", "n/a", "n/a", "n/a")
ConvergedProperSumTotal <- rbind((nConvergedTotal)/(nAllocTotal),
(nConvergedProperTotal)/(nAllocTotal))
rownames(ConvergedProperSumTotal) <- c("Converged", "Converged and Proper")
colnames(ConvergedProperSumTotal) <- "Proportion of Allocations"
}
if (nAllocAdd != 0) {
if (nloop == 2) {
PropParamMet <- matrix(data = 1, nrow(Parmn), 1)
PropPooledSEMet <- matrix(data = 1, nrow(Parmn), 1)
}
if (nloop != 2) {
PropParamMet <- (1 - apply(Paramdifftesttable[, 1:nloop - 1], 1, mean)) * 100
PropPooledSEMet <- (1 - apply(PooledSEdifftesttable[,1:nloop - 1], 1, mean)) * 100
}
FirstParamMet <- apply(Paramdifftesttable == 0, 1, which.max)
FirstPooledSEMet <- apply(PooledSEdifftesttable == 0, 1, which.max)
}
if (nAllocAdd == 0) {
PropParamMet <- matrix(data = NA, nrow(Parmn), 1)
PropPooledSEMet <- matrix(data = NA, nrow(Parmn), 1)
FirstParamMet <- matrix(data = NA, nrow(Parmn), 1)
FirstPooledSEMet <- matrix(data = NA, nrow(Parmn), 1)
}
PerLoops <- cbind(Parmn[, 1:3], PropParamMet, PropPooledSEMet)
colnames(PerLoops) <- c("lhs", "op", "rhs", "Param Criteria Met",
"PooledSE Criteria Met")
FirstLoops <- cbind(Parmn[, 1:3], FirstParamMet, FirstPooledSEMet)
colnames(FirstLoops) <- c("lhs", "op", "rhs", "Param Criteria Met",
"PooledSE Criteria Met")
NumbAllocations <- cbind(Parmn[, 1:3],
(FirstParamMet - 1) * nAllocAdd + nAllocStarttemp,
(FirstPooledSEMet - 1) * nAllocAdd + nAllocStarttemp)
colnames(NumbAllocations) <- c("lhs", "op", "rhs", "Param Criteria Met",
"PooledSE Criteria Met")
if (nAllocAdd != 0) {
for (i in 1:nrow(Parmn)) {
if ((i %in% selectParam) == FALSE)
PerLoops[i, 4:5] <- NA
if ((i %in% selectParam) == FALSE)
FirstLoops[i, 4:5] <- NA
if ((i %in% selectParam) == FALSE)
NumbAllocations[i, 4:5] <- NA
}
}
df_t <- (nConvergedProperOutput - 1) *
(1 + (nConvergedProperOutput * PooledSEwithinvarFinal) /
(nConvergedProperOutput * PooledSEbetweenvarFinal + PooledSEbetweenvarFinal))^2
crit_t <- abs(qt(0.05/2, df_t))
pval_z <- 2 * (1 - pnorm(abs(Parmn_revFinal[, 4]/PooledSEFinal)))
pval_t <- 2 * (1 - pt(abs(Parmn_revFinal[, 4]/PooledSEFinal),
df = df_t))
CI95_Lower_z <- Parmn_revFinal[, 4] - 1.959963985 * PooledSEFinal
CI95_Upper_z <- Parmn_revFinal[, 4] + 1.959963985 * PooledSEFinal
CI95_Lower_t <- Parmn_revFinal[, 4] - crit_t * PooledSEFinal
CI95_Upper_t <- Parmn_revFinal[, 4] + crit_t * PooledSEFinal
Parmn_revFinal <- cbind(Parmn_revFinal, pval_z, CI95_Lower_z,
CI95_Upper_z, pval_t, CI95_Lower_t, CI95_Upper_t)
colnames(Parmn_revFinal) <- c("lhs", "op", "rhs", "Pooled Est",
"Pooled SE", "pval_z", "CI95_LB_z", "CI95_UB_z",
"pval_t", "CI95_LB_t", "CI95_UB_t")
for (i in 1:nrow(Parmn_revFinal)) {
if (Parmn_revFinal[i, 5] == 0 | is.na(Parmn_revFinal[i, 5]) == TRUE)
Parmn_revFinal[i, 6:11] <- NA
}
RPAV <- (PooledSEbetweenvarFinal + (PooledSEbetweenvarFinal/(nConvergedProperOutput)))/PooledSEwithinvarFinal
PPAV <- (((nConvergedProperOutput + 1)/(nConvergedProperOutput)) *
PooledSEbetweenvarFinal)/(PooledSEwithinvarFinal +
(((nConvergedProperOutput + 1)/(nConvergedProperOutput)) *
PooledSEbetweenvarFinal))
PAVtable <- cbind(Parmn[1:3], RPAV, PPAV)
colnames(Param_revTotal) <- c("lhs", "op", "rhs", c(1:nloop))
colnames(PooledSE_revTotal) <- c("lhs", "op", "rhs",
c(1:nloop))
Param_revTotal[, 4:(nloop + 3)] <- sapply(Param_revTotal[,
4:(nloop + 3)], as.numeric)
PooledSE_revTotal[, 4:(nloop + 3)] <- sapply(PooledSE_revTotal[,
4:(nloop + 3)], as.numeric)
Parmn_revFinal[, 4:11] <- apply(Parmn_revFinal[, 4:11],
2, round, digits = 4)
FitsumOutput[, 1:5] <- apply(FitsumOutput[, 1:5], 2,
round, digits = 4)
if (nAllocAdd != 0)
Param_revTotal[, 4:(nloop + 3)] <- apply(Param_revTotal[,
4:(nloop + 3)], 2, round, digits = 8)
if (nAllocAdd == 0)
Param_revTotal[, 4] <- round(Param_revTotal[, 4],
8)
if (nAllocAdd != 0)
PooledSE_revTotal[, 4:(nloop + 3)] <- apply(PooledSE_revTotal[,
4:(nloop + 3)], 2, round, digits = 8)
if (nAllocAdd == 0)
PooledSE_revTotal[, 4] <- round(PooledSE_revTotal[,
4], 8)
PAVtable[, 4:5] <- apply(PAVtable[, 4:5], 2, round, digits = 4)
FitsumOutput[2, 2:5] <- c("n/a", "n/a", "n/a", "n/a")
ConvergedProperSum <- rbind((nConvergedOutput)/(nAllocOutput),
(nConvergedProperOutput)/(nAllocOutput))
rownames(ConvergedProperSum) <- c("Converged", "Converged and Proper")
colnames(ConvergedProperSum) <- "Proportion of Allocations"
StopTimeFull <- proc.time() - StartTimeFull
if (useTotalAlloc == FALSE) {
Output_mod <- list(Parmn_revFinal, FitsumOutput,
ConvergedProperSum, nAllocOutput, PAVtable, StopTimeFull[[3]]/60)
names(Output_mod) <- c("Estimates", "Fit",
"Proportion of Converged and Proper Allocations",
"Allocations needed for stability (M)",
"Indices to quantify uncertainty in estimates due to sampling vs. allocation variability",
"Total runtime (minutes)")
}
if (useTotalAlloc == TRUE) {
Output_mod <- list(Parmn_revFinal, FitsumOutput,
ConvergedProperSum, nAllocOutput, PAVtable, Parmn_revTotal,
FitsumTotal, ConvergedProperSumTotal, nAllocTotal,
PAVtableTotal, StopTimeFull[[3]]/60)
names(Output_mod) <- c("Estimates (using M allocations)", "Fit (using M allocations)",
"Proportion of Converged and Proper Allocations (using M allocations)",
"Allocations needed for stability (M)",
"Indices to quantify uncertainty in estimates due to sampling vs. allocation variability (using M allocations)",
"Estimates (using all allocations)", "Fit (using all allocations)",
"Proportion of Converged and Proper Allocations (using all allocations)",
"Total Allocations used by algorithm",
"Indices to quantify uncertainty in estimates due to sampling vs. allocation variability (using all allocations)",
"Total runtime (minutes)")
}
if (exists("invalidIndices")) {
if (length(invalidIndices)) message('\n\nInvalid fit indices requested: ',
paste(invalidIndices, collapse = ", "),
"\n\n")
}
return(Output_mod)
}
semTools/R/data.R 0000644 0001762 0000144 00000011222 14632143376 013323 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 4 April 2017
### document example data sets
#' Simulated Dataset to Demonstrate Two-way Latent Interaction
#'
#' A simulated data set with 2 independent factors and 1 dependent factor where
#' each factor has three indicators
#'
#'
#' @format A `data.frame` with 500 observations of 9 variables.
#' \describe{
#' \item{x1}{The first indicator of the first independent factor}
#' \item{x2}{The second indicator of the first independent factor}
#' \item{x3}{The third indicator of the first independent factor}
#' \item{x4}{The first indicator of the second independent factor}
#' \item{x5}{The second indicator of the second independent factor}
#' \item{x6}{The third indicator of the second independent factor}
#' \item{x7}{The first indicator of the dependent factor}
#' \item{x8}{The second indicator of the dependent factor}
#' \item{x9}{The third indicator of the dependent factor}
#' }
#' @source Data were generated by the [MASS::mvrnorm()] function in
#' the `MASS` package.
#' @examples head(dat2way)
"dat2way"
#' Simulated Dataset to Demonstrate Three-way Latent Interaction
#'
#' A simulated data set with 3 independent factors and 1 dependent factor where
#' each factor has three indicators
#'
#'
#' @format A `data.frame` with 500 observations of 12 variables.
#' \describe{
#' \item{x1}{The first indicator of the first independent factor}
#' \item{x2}{The second indicator of the first independent factor}
#' \item{x3}{The third indicator of the first independent factor}
#' \item{x4}{The first indicator of the second independent factor}
#' \item{x5}{The second indicator of the second independent factor}
#' \item{x6}{The third indicator of the second independent factor}
#' \item{x7}{The first indicator of the third independent factor}
#' \item{x8}{The second indicator of the third independent factor}
#' \item{x9}{The third indicator of the third independent factor}
#' \item{x10}{The first indicator of the dependent factor}
#' \item{x11}{The second indicator of the dependent factor}
#' \item{x12}{The third indicator of the dependent factor}
#' }
#' @source Data were generated by the [MASS::mvrnorm()] function in
#' the `MASS` package.
#' @examples head(dat3way)
"dat3way"
#' Simulated Data set to Demonstrate Categorical Measurement Invariance
#'
#' A simulated data set with 2 factors with 4 indicators each separated into
#' two groups
#'
#'
#' @format A `data.frame` with 200 observations of 9 variables.
#' \describe{
#' \item{g}{Sex of respondents}
#' \item{u1}{Indicator 1}
#' \item{u2}{Indicator 2}
#' \item{u3}{Indicator 3}
#' \item{u4}{Indicator 4}
#' \item{u5}{Indicator 5}
#' \item{u6}{Indicator 6}
#' \item{u7}{Indicator 7}
#' \item{u8}{Indicator 8}
#' }
#' @source Data were generated using the `lavaan` package.
#' @examples head(datCat)
"datCat"
#' Simulated Data set to Demonstrate Longitudinal Measurement Invariance
#'
#' A simulated data set with 1 factors with 3 indicators in three timepoints
#'
#'
#' @format A `data.frame` with 200 observations of 10 variables.
#' \describe{
#' \item{sex}{Sex of respondents}
#' \item{y1t1}{Indicator 1 in Time 1}
#' \item{y2t1}{Indicator 2 in Time 1}
#' \item{y3t1}{Indicator 3 in Time 1}
#' \item{y1t2}{Indicator 1 in Time 2}
#' \item{y2t2}{Indicator 2 in Time 2}
#' \item{y3t2}{Indicator 3 in Time 2}
#' \item{y1t3}{Indicator 1 in Time 3}
#' \item{y2t3}{Indicator 2 in Time 3}
#' \item{y3t3}{Indicator 3 in Time 3}
#' }
#' @source Data were generated using the `simsem` package.
#' @examples head(exLong)
"exLong"
#' Simulated Data set to Demonstrate Random Allocations of Parcels
#'
#' A simulated data set with 2 factors with 9 indicators for each factor
#'
#'
#' @format A `data.frame` with 800 observations of 18 variables.
#' \describe{
#' \item{f1item1}{Item 1 loading on factor 1}
#' \item{f1item2}{Item 2 loading on factor 1}
#' \item{f1item3}{Item 3 loading on factor 1}
#' \item{f1item4}{Item 4 loading on factor 1}
#' \item{f1item5}{Item 5 loading on factor 1}
#' \item{f1item6}{Item 6 loading on factor 1}
#' \item{f1item7}{Item 7 loading on factor 1}
#' \item{f1item8}{Item 8 loading on factor 1}
#' \item{f1item9}{Item 9 loading on factor 1}
#' \item{f2item1}{Item 1 loading on factor 2}
#' \item{f2item2}{Item 2 loading on factor 2}
#' \item{f2item3}{Item 3 loading on factor 2}
#' \item{f2item4}{Item 4 loading on factor 2}
#' \item{f2item5}{Item 5 loading on factor 2}
#' \item{f2item6}{Item 6 loading on factor 2}
#' \item{f2item7}{Item 7 loading on factor 2}
#' \item{f2item8}{Item 8 loading on factor 2}
#' \item{f2item9}{Item 9 loading on factor 2}
#' }
#' @source Data were generated using the `simsem` package.
#' @examples head(simParcel)
"simParcel"
semTools/R/indProd.R 0000644 0001762 0000144 00000033307 15142336665 014023 0 ustar ligges users ### Sunthud Pornprasertmanit and Alexander M. Schoemann
### Last updated: 9 February 2026
### prepare product indicators for 2-way and 3-way interactions in SEM
##' Make products of indicators using no centering, mean centering, double-mean
##' centering, or residual centering
##'
##' The `indProd` function will make products of indicators using no
##' centering, mean centering, double-mean centering, or residual centering. The
##' `orthogonalize` function is the shortcut of the `indProd` function
##' to make the residual-centered indicators products.
##'
##'
##' @aliases indProd orthogonalize
##' @importFrom stats lm
##'
##' @param data The desired data to be transformed.
##' @param var1 Names or indices of the variables loaded on the first factor
##' @param var2 Names or indices of the variables loaded on the second factor
##' @param var3 Names or indices of the variables loaded on the third factor
##' (for three-way interaction)
##' @param match Specify `TRUE` to use match-paired approach (Marsh, Wen, &
##' Hau, 2004). If `FALSE`, the resulting products are all possible
##' products.
##' @param meanC Specify `TRUE` for mean centering the main effect
##' indicator before making the products
##' @param residualC Specify `TRUE` for residual centering the products by
##' the main effect indicators (Little, Bovaird, & Widaman, 2006).
##' @param doubleMC Specify `TRUE` for centering the resulting products
##' (Lin et. al., 2010)
##' @param namesProd The names of resulting products
##' @return The original data attached with the products.
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com}) Alexander
##' Schoemann (East Carolina University; \email{schoemanna@@ecu.edu})
##' @seealso \itemize{ \item [probe2WayMC()] For probing the two-way
##' latent interaction when the results are obtained from mean-centering, or
##' double-mean centering. \item [probe3WayMC()] For probing the
##' three-way latent interaction when the results are obtained from
##' mean-centering, or double-mean centering. \item [probe2WayRC()]
##' For probing the two-way latent interaction when the results are obtained
##' from residual-centering approach. \item [probe3WayRC()] For
##' probing the two-way latent interaction when the results are obtained from
##' residual-centering approach. \item [plotProbe()] Plot the simple
##' intercepts and slopes of the latent interaction. }
##' @references Marsh, H. W., Wen, Z. & Hau, K. T. (2004). Structural equation
##' models of latent interactions: Evaluation of alternative estimation
##' strategies and indicator construction. *Psychological Methods, 9*(3),
##' 275--300. \doi{10.1037/1082-989X.9.3.275}
##'
##' Lin, G. C., Wen, Z., Marsh, H. W., & Lin, H. S. (2010). Structural equation
##' models of latent interactions: Clarification of orthogonalizing and
##' double-mean-centering strategies. *Structural Equation Modeling, 17*(3),
##' 374--391. \doi{10.1080/10705511.2010.488999}
##'
##' Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of
##' orthogonalizing powered and product terms: Implications for modeling
##' interactions among latent variables. *Structural Equation Modeling,
##' 13*(4), 497--519. \doi{10.1207/s15328007sem1304_1}
##' @examples
##'
##' ## Mean centering / two-way interaction / match-paired
##' dat <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6)
##'
##' ## Residual centering / two-way interaction / match-paired
##' dat2 <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6, match = FALSE,
##' meanC = FALSE, residualC = TRUE, doubleMC = FALSE)
##'
##' ## Double-mean centering / two-way interaction / match-paired
##' dat3 <- indProd(attitude[ , -1], var1 = 1:3, var2 = 4:6, match = FALSE,
##' meanC = TRUE, residualC = FALSE, doubleMC = TRUE)
##'
##' ## Mean centering / three-way interaction / match-paired
##' dat4 <- indProd(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6)
##'
##' ## Residual centering / three-way interaction / match-paired
##' dat5 <- orthogonalize(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6,
##' match = FALSE)
##'
##' ## Double-mean centering / three-way interaction / match-paired
##' dat6 <- indProd(attitude[ , -1], var1 = 1:2, var2 = 3:4, var3 = 5:6,
##' match = FALSE, meanC = TRUE, residualC = TRUE,
##' doubleMC = TRUE)
##'
##'
##' ## To add product-indicators to multiple-imputed data sets
##' \donttest{
##' if (requireNamespace("lavaan.mi")) {
##' data(HS20imps, package = "lavaan.mi")
##'
##' ## apply indProd() to the list of data.frames
##' imps2 <- lapply(HS20imps, indProd,
##' var1 = c("x1","x2","x3"), var2 = c("x4","x5","x6"))
##' ## verify:
##' lapply(imps2, head)
##' }
##' }
##'
##' @export
indProd <- function(data, var1, var2, var3 = NULL, match = TRUE, meanC = TRUE,
residualC = FALSE, doubleMC = TRUE, namesProd = NULL) {
# Get all variable names
if (all(is.numeric(var1)))
var1 <- colnames(data)[var1]
if (all(is.numeric(var2)))
var2 <- colnames(data)[var2]
if (!is.null(var3) && all(is.numeric(var3))) var3 <- colnames(data)[var3]
dat1 <- data[, var1]
dat2 <- data[, var2]
dat3 <- NULL
if (!is.null(var3)) dat3 <- data[, var3]
# Mean centering on the original indicators
if (meanC) {
dat1 <- scale(dat1, scale = FALSE)
dat2 <- scale(dat2, scale = FALSE)
if (!is.null(dat3)) dat3 <- scale(dat3, scale = FALSE)
}
if (match) {
# Check whether the number of variables are equal across variable sets
if (length(var1) != length(var2))
stop("If the match-paired approach is used, the number of",
" variables in all sets must be equal.")
if (!is.null(var3) && (length(var1) != length(var3)))
stop("If the match-paired approach is used, the number of",
" variables in all three sets must be equal.")
datProd <- NULL
if (is.null(var3)) {
# Two-way interaction
datProd <- dat1 * dat2
if (residualC) {
notmissing <- which(!apply(datProd, 1, function(x) any(is.na(x))))
colnames(datProd) <- paste("interactionProduct", 1:ncol(datProd), sep = "")
# Write the expression for linear model and residualize the products
temp <- data.frame(datProd, dat1, dat2)
express <- paste("cbind(", paste(colnames(datProd), collapse = ", "),
") ~ ", paste(c(colnames(dat1), colnames(dat2)),
collapse = " + "), sep = "")
datProd[notmissing,] <- lm(express, data = temp)$residuals
}
} else {
# Three-way interaction
datProd2way <- cbind(dat1 * dat2, dat1 * dat3, dat2 * dat3)
datProd3way <- dat1 * dat2 * dat3
if (residualC) {
notmissing2way <- which(!apply(datProd2way, 1, function(x) any(is.na(x))))
colnames(datProd2way) <- paste("interaction2Product", 1:ncol(datProd2way), sep = "")
# Write the expression for linear model and residualize the two-way products
temp2 <- data.frame(datProd2way, dat1, dat2, dat3)
express2 <- paste("cbind(", paste(colnames(datProd2way), collapse = ", "),
") ~ ", paste(c(colnames(dat1), colnames(dat2),
colnames(dat3)), collapse = " + "), sep = "")
datProd2way[notmissing2way,] <- lm(express2, data = temp2)$residuals
# Making all possible products to residualize the 3-way interaction
datProd2wayFull <- matrix(0, nrow(data), 1)
for (i in 1:length(var1)) datProd2wayFull <- data.frame(datProd2wayFull, matrix(rep(dat1[, i], length(var2)), ncol = length(var2)) * dat2)
for (i in 1:length(var1)) datProd2wayFull <- data.frame(datProd2wayFull, matrix(rep(dat1[, i], length(var3)), ncol = length(var3)) * dat3)
for (i in 1:length(var2)) datProd2wayFull <- data.frame(datProd2wayFull, matrix(rep(dat2[, i], length(var3)), ncol = length(var3)) * dat3)
datProd2wayFull <- datProd2wayFull[, -1]
colnames(datProd2wayFull) <- paste("interaction2Product", 1:ncol(datProd2wayFull), sep = "")
notmissing3way <- which(!apply(datProd3way, 1, function(x) any(is.na(x))))
colnames(datProd3way) <- paste("interaction3Product", 1:ncol(datProd3way), sep = "")
# Write the expression for linear model and residualize the three-way products
temp3 <- data.frame(datProd3way, dat1, dat2, dat3, datProd2wayFull)
express3 <- paste("cbind(", paste(colnames(datProd3way), collapse = ", "),
") ~ ", paste(c(colnames(dat1), colnames(dat2), colnames(dat3),
colnames(datProd2wayFull)), collapse = " + "), sep = "")
datProd3way[notmissing3way,] <- lm(express3, data = temp3)$residuals
}
datProd <- cbind(datProd2way, datProd3way)
}
## Mean-centering the final product
if (doubleMC) datProd <- scale(datProd, scale = FALSE)
## Rename the obtained product terms
if (is.null(namesProd)) {
if (is.null(var3)) {
colnames(datProd) <- paste(var1, var2, sep = ".")
} else {
colnames(datProd) <- c(paste(var1, var2, sep = "."),
paste(var1, var3, sep = "."),
paste(var2, var3, sep = "."),
paste(var1, var2, var3, sep = "."))
}
} else {
colnames(datProd) <- namesProd
}
} else {
datProd <- NULL
if (is.null(var3)) {
# Create all possible combinations of the products of indicators
datProd <- matrix(0, nrow(data), 1)
for (i in 1:length(var1)) datProd <- data.frame(datProd, matrix(rep(dat1[, i], length(var2)), ncol = length(var2)) * dat2)
datProd <- datProd[, -1]
if (residualC) {
notmissing <- which(!apply(datProd, 1, function(x) any(is.na(x))))
colnames(datProd) <- paste("interactionProduct", 1:ncol(datProd), sep = "")
# Write the expression for linear model and residualize the two-way products
temp <- data.frame(datProd, dat1, dat2)
express <- paste("cbind(", paste(colnames(datProd), collapse = ", "),
") ~ ", paste(c(colnames(dat1), colnames(dat2)),
collapse = " + "), sep = "")
datProd[notmissing,] <- lm(express, data = temp)$residuals
}
} else {
# Create all possible combinations of the products of indicators
datProd2way <- matrix(0, nrow(data), 1)
for (i in 1:length(var1)) datProd2way <- data.frame(datProd2way, matrix(rep(dat1[, i], length(var2)), ncol = length(var2)) * dat2)
for (i in 1:length(var1)) datProd2way <- data.frame(datProd2way, matrix(rep(dat1[, i], length(var3)), ncol = length(var3)) * dat3)
for (i in 1:length(var2)) datProd2way <- data.frame(datProd2way, matrix(rep(dat2[, i], length(var3)), ncol = length(var3)) * dat3)
datProd3way <- matrix(0, nrow(data), 1)
for (i in 1:length(var1)) {
for(j in 1:length(var2)) {
datProd3way <- data.frame(datProd3way, matrix(rep(dat1[, i], length(var3)), ncol = length(var3)) * matrix(rep(dat2[, j], length(var3)), ncol = length(var3)) * dat3)
}
}
datProd2way <- datProd2way[, -1]
datProd3way <- datProd3way[, -1]
if (residualC) {
notmissing2way <- which(!apply(datProd2way, 1, function(x) any(is.na(x))))
colnames(datProd2way) <- paste("interaction2Product", 1:ncol(datProd2way), sep = "")
# Write the expression for linear model and residualize the two-way products
temp2 <- data.frame(datProd2way, dat1, dat2, dat3)
express2 <- paste("cbind(", paste(colnames(datProd2way), collapse = ", "),
") ~ ", paste(c(colnames(dat1), colnames(dat2),
colnames(dat3)), collapse = " + "), sep = "")
datProd2way[notmissing2way,] <- lm(express2, data = temp2)$residuals
notmissing3way <- which(!apply(datProd3way, 1, function(x) any(is.na(x))))
colnames(datProd3way) <- paste("interaction3Product", 1:ncol(datProd3way), sep = "")
# Write the expression for linear model and residualize the three-way products
temp3 <- data.frame(datProd3way, dat1, dat2, dat3, datProd2way)
express3 <- paste("cbind(", paste(colnames(datProd3way), collapse = ", "),
") ~ ", paste(c(colnames(dat1), colnames(dat2),
colnames(dat3), colnames(datProd2way)),
collapse = " + "), sep = "")
datProd3way[notmissing3way,] <- lm(express3, data = temp3)$residuals
}
datProd <- cbind(datProd2way, datProd3way)
}
## Double-mean centering
if (doubleMC) datProd <- scale(datProd, scale = FALSE)
## Name the resulting product terms
if (is.null(namesProd)) {
temp <- NULL
if (is.null(var3)) {
for (i in 1:length(var1)) temp <- c(temp, paste(var1[i], var2, sep = "."))
} else {
for (i in 1:length(var1)) temp <- c(temp, paste(var1[i], var2, sep = "."))
for (i in 1:length(var1)) temp <- c(temp, paste(var1[i], var3, sep = "."))
for (i in 1:length(var2)) temp <- c(temp, paste(var2[i], var3, sep = "."))
for (i in 1:length(var1)) {
for(j in 1:length(var2)) {
temp <- c(temp, paste(var1[i], var2[j], var3, sep = "."))
}
}
}
colnames(datProd) <- temp
} else {
colnames(datProd) <- namesProd
}
}
## Bind the products back to the original data
data.frame(data, datProd)
}
##' @rdname indProd
##' @export
orthogonalize <- function(data, var1, var2, var3 = NULL,
match = TRUE, namesProd = NULL) {
indProd(data = data, var1 = var1, var2 = var2, var3 = var3,
match = match, meanC = FALSE, residualC = TRUE, doubleMC = FALSE,
namesProd = namesProd)
}
semTools/R/singleParamTest.R 0000644 0001762 0000144 00000042047 15142331256 015516 0 ustar ligges users ### Sunthud Pornprasertmanit
### Last updated: 9 February 2026
##' Single Parameter Test Divided from Nested Model Comparison
##'
##' In comparing two nested models, \eqn{\Delta\chi^2} test may indicate that
##' two models are different. However, like other omnibus tests, researchers do
##' not know which fixed parameters or constraints make these two models
##' different. This function will help researchers identify the significant
##' parameter.
##'
##' This function first identifies the differences between these two models. The
##' model with more free parameters is referred to as parent model and the model
##' with fewer free parameters is referred to as nested model. Two tests are
##' implemented here:
##'
##' \enumerate{
##' \item `free`: The nested model is used as a template. Then,
##' one parameter indicating the differences between two models is freed. The new
##' model is compared with the nested model. This process is repeated for all
##' differences between two models.
##' \item`fix`: The parent model is used
##' as a template. Then, one parameter indicating the differences between two
##' models is fixed or constrained to be equal to other parameters. The new
##' model is then compared with the parent model. This process is repeated for
##' all differences between two models.
##' \item`mi`: No longer available
##' because the test of modification indices is not consistent. For example, if
##' two parameters are equality constrained, the modification index from the
##' first parameter is not equal to the second parameter.
##' }
##'
##' Note that this function does not adjust for the inflated Type I error rate
##' from multiple tests.
##'
##' @param model1 Model 1.
##' @param model2 Model 2. Note that two models must be nested models. Further,
##' the order of parameters in their parameter tables are the same. That is,
##' nested models with different scale identifications may not be able to test
##' by this function.
##' @param return.fit Return the submodels fitted by this function
##' @param method The method used to calculate likelihood ratio test. See
##' [lavaan::lavTestLRT()] for available options
##' @return If `return.fit = FALSE`, the result tables are provided.
##' \eqn{\chi^2} and *p* value are provided for all methods. Note that the
##' \eqn{\chi^2} is all based on 1 *df*. Expected parameter changes
##' and their standardized forms are also provided.
##'
##' If `return.fit = TRUE`, a list with two elements are provided. The
##' first element is the tabular result. The second element is the submodels
##' used in the `free` and `fix` methods.
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' @examples
##'
##' library(lavaan)
##'
##' # Nested model comparison by hand
##' HS.model1 <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6'
##' HS.model2 <- ' visual =~ a*x1 + a*x2 + a*x3
##' textual =~ b*x4 + b*x5 + b*x6'
##'
##' m1 <- cfa(HS.model1, data = HolzingerSwineford1939, std.lv = TRUE,
##' estimator = "MLR")
##' m2 <- cfa(HS.model2, data = HolzingerSwineford1939, std.lv = TRUE,
##' estimator = "MLR")
##' anova(m1, m2)
##' singleParamTest(m1, m2)
##'
##'
##' ## Nested models to test measurement invariance
##' HW.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' mod.config <- cfa(model = HW.model, data = HolzingerSwineford1939,
##' group = "school")
##' mod.metric <- cfa(model = HW.model, data = HolzingerSwineford1939,
##' group = "school", group.equal = "loadings")
##' singleParamTest(mod.config, mod.metric)
##'
##'
##' @export
singleParamTest <- function(model1, model2, return.fit = FALSE,
method = "satorra.bentler.2001") {
# Check nested models without any swaps
if(lavaan::fitMeasures(model1, "df")[[1]] > lavaan::fitMeasures(model2, "df")[[1]]) {
fit0 <- model1
fit1 <- model2
} else {
fit0 <- model2
fit1 <- model1
}
# fit0 = Nested model, fit1 = Parent model
pt1 <- parTable(fit1)
pt0 <- parTable(fit0)
namept1 <- paramNameFromPt(pt1)
namept0 <- paramNameFromPt(pt0)
# Two possible constraints: fixed parameters and equality constraints
free1 <- (pt1$free != 0) & !(duplicated(pt1$free))
free0 <- (pt0$free != 0) & !(duplicated(pt0$free))
iscon1 <- pt1$op == "=="
iscon0 <- pt0$op == "=="
con1 <- list(id = integer(0), lhs = character(0), op = character(0), rhs = character(0))
con0 <- list(id = integer(0), lhs = character(0), op = character(0), rhs = character(0))
if(any(iscon1)) con1 <- list(id = pt1$id[iscon1], lhs = pt1$lhs[iscon1], op = pt1$op[iscon1], rhs = pt1$rhs[iscon1])
if(any(iscon0)) con0 <- list(id = pt0$id[iscon0], lhs = pt0$lhs[iscon0], op = pt0$op[iscon0], rhs = pt0$rhs[iscon0])
if(length(free1[!iscon1]) != length(free0[!iscon0])) stop("Parameter tables in two models do not have equal lengths. This function does not work.")
if(!all(free1[free0])) stop("Model are not nested or are not arranged in the way that this function works.")
if(sum(iscon1) > sum(iscon0)) stop("There are equality constraints in the model with less degrees of freedom that do not exist in the model with higher degrees of freedom. Thus, two models are not nested.")
if(!all.equal(lapply(pt1[2:4], "[", !iscon1), lapply(pt0[2:4], "[", !iscon0))) stop("This function needs parameter tables of two models to have the same orders of the same parameters.")
# Find fixed values or constraints
difffree <- !free0[!iscon0] & free1[!iscon1]
textcon1 <- paste0(con1$lhs, con1$op, con1$rhs)
textcon0 <- paste0(con0$lhs, con0$op, con0$rhs)
indexsamecon <- match(textcon1, textcon0)
indexdiffcon <- setdiff(seq_along(textcon0), indexsamecon)
diffcon <- lapply(con0, "[", indexdiffcon)
fixval <- which(difffree)
index <- c(fixval, diffcon$id)
if(length(index) <= 0) stop("Two models are identical. No single parameter test can be done.")
# Find nested model and release 1-by-1
freeCon <- matrix(NA, length(index), 2)
colnames(freeCon) <- c("free.chi", "free.p")
listFreeCon <- list()
runnum <- 1
for(i in seq_along(fixval)) {
temp <- freeParTable(pt0, pt0$lhs[fixval[i]], pt0$op[fixval[i]], pt0$rhs[fixval[i]], pt0$group[fixval[i]])
tryresult <- try(tempfit <- refit(temp, fit0), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit0, method = method), silent = TRUE)
if(!is(compresult, "try-error")) freeCon[runnum,] <- unlist(modelcomp[2, c(5, 7)])
}
listFreeCon <- c(listFreeCon, tryresult)
runnum <- runnum + 1
}
rownames(freeCon)[seq_along(fixval)] <- names(listFreeCon)[seq_along(fixval)] <- namept0[fixval]
for(i in seq_along(diffcon$id)) {
temp <- removeEqCon(pt0, diffcon$id[i])
tryresult <- try(tempfit <- refit(temp, fit0), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit0, method = method), silent = TRUE)
if(!is(compresult, "try-error")) freeCon[runnum,] <- unlist(modelcomp[2, c(5, 7)])
}
listFreeCon <- c(listFreeCon, tryresult)
runnum <- runnum + 1
}
poscon <- seq_along(diffcon$id) + length(fixval)
rownames(freeCon)[poscon] <- names(listFreeCon)[poscon] <- namept0[diffcon$id]
# Find parent model and constrain 1-by-1
fixCon <- matrix(NA, length(index), 2)
colnames(fixCon) <- c("fix.chi", "fix.p")
listFixCon <- list()
runnum <- 1
for(i in seq_along(fixval)) {
temp <- fixParTable(pt1, pt1$lhs[fixval[i]], pt1$op[fixval[i]], pt1$rhs[fixval[i]], pt1$group[fixval[i]], pt0$ustart[fixval[i]])
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[runnum,] <- unlist(modelcomp[2,c(5, 7)])
}
listFixCon <- c(listFixCon, tryresult)
runnum <- runnum + 1
}
rownames(fixCon)[seq_along(fixval)] <- names(listFixCon)[seq_along(fixval)] <- namept0[fixval]
for(i in seq_along(diffcon$id)) {
temp <- patMerge(pt1, list(lhs = diffcon$lhs[i], op = diffcon$op[i], rhs = diffcon$rhs[i]))
tryresult <- try(tempfit <- refit(temp, fit1), silent = TRUE)
if(!is(tryresult, "try-error")) {
compresult <- try(modelcomp <- lavaan::lavTestLRT(tempfit, fit1, method = method), silent = TRUE)
if(!is(compresult, "try-error")) fixCon[runnum,] <- unlist(modelcomp[2, c(5, 7)])
}
listFixCon <- c(listFixCon, tryresult)
runnum <- runnum + 1
}
poscon <- seq_along(diffcon$id) + length(fixval)
rownames(fixCon)[poscon] <- names(listFixCon)[poscon] <- namept0[diffcon$id]
result <- cbind(freeCon, fixCon)
if(return.fit) {
return(invisible(list(result = result, models = list(free = listFreeCon, fix = listFixCon))))
} else {
return(result)
}
}
## ----------------
## Hidden Functions
## ----------------
paramNameFromPt <- function(pt) {
ngroups <- max(pt$group)
result <- NULL
if (ngroups == 1) {
result <- paste0(pt$lhs, pt$op, pt$rhs)
} else {
grouplab <- paste0(".g", pt$group)
grouplab[grouplab == ".g0" | grouplab == ".g1"] <- ""
result <- paste0(pt$lhs, pt$op, pt$rhs, grouplab)
}
con <- pt$op == "=="
pt$lhs[con] <- result[match(pt$lhs[con], pt$plabel)]
pt$rhs[con] <- result[match(pt$rhs[con], pt$plabel)]
result[con] <- paste(pt$lhs[con], pt$op[con], pt$rhs[con])
result
}
##' @importFrom lavaan lavInspect
refit <- function(pt, object, resetstart = TRUE) {
if (resetstart && "start" %in% names(pt)) pt <- pt[-which("start" == names(pt))]
previousCall <- lavInspect(object, "call")
## Why this?
args <- previousCall[-1]
args$model <- pt
funcall <- as.character(previousCall[[1]])
do.call(funcall[length(funcall)], args)
## instead of this?
# previousCall$model <- pt
# eval(previousCall)
}
## MOVED FROM lonInvariance.R when it was removed from semTools 0.5-8 (9 Feb 2026)
# rearrangeFreeElement: Rearrange the number listed in 'free' in parameter tables
rearrangeFreeElement <- function(vec) {
vec2 <- vec
vec <- vec[vec != 0]
uvec <- unique(vec)
newvec <- 1:length(unique(vec))
vec2[vec2 != 0] <- newvec[match(vec, uvec)]
class(vec2) <- "integer"
vec2
}
# rearrangept: Rearrange parameter table and plabel
rearrangept <- function(pt) {
createplabel <- function(num) {
result <- paste0(".p", num, ".")
result[num == 0] <- ""
result
}
oldfree <- pt$free
newfree <- rearrangeFreeElement(oldfree)
oldplabel <- pt$plabel
newplabel <- createplabel(seq_along(pt$op))
eqpos <- which(pt$op == "==")
newplabel[eqpos] <- ""
if (length(eqpos) > 0) {
eqlhs <- pt$lhs[eqpos]
eqrhs <- pt$rhs[eqpos]
matchlhs <- match(eqlhs, oldplabel)
matchrhs <- match(eqrhs, oldplabel)
neweqlhs <- newplabel[matchlhs]
neweqrhs <- newplabel[matchrhs]
neweqlhs[is.na(matchlhs)] <- eqlhs[is.na(matchlhs)]
neweqrhs[is.na(matchrhs)] <- eqrhs[is.na(matchrhs)]
pt$lhs[eqpos] <- neweqlhs
pt$rhs[eqpos] <- neweqrhs
}
pt$free <- newfree
pt$plabel <- newplabel
pt
}
# freeParTable: Free elements in parameter table
# also used in partialInvariance
freeParTable <- function(parTable, lhs, op, rhs, group, ustart = NA) {
parTable$start <- parTable$est <- parTable$se <- NULL
target <- cbind(lhs, op, rhs, group)
for (i in 1:nrow(target)) {
targetElem <- matchElement(parTable = parTable, vec = target[i,])
ptargetElem <- parTable$plabel[targetElem]
if ((length(targetElem) == 0) || is.na(targetElem)) {
newline <- list(lhs = as.character(target[i, 1]),
op = as.character(target[i, 2]),
rhs = as.character(target[i, 3]),
group = as.integer(target[i, 4]),
free = as.integer(max(parTable$free) + 1),
ustart = as.numeric(NA))
parTable <- patMerge(pt1 = parTable, pt2 = newline)
} else {
if (parTable$free[targetElem] == 0) {
parTable$ustart[targetElem] <- ustart
parTable$user[targetElem] <- 1
parTable$free[targetElem] <- max(parTable$free) + 1
}
equalelement <- which(parTable$op == "==")
rmelem <- intersect(union(match(ptargetElem, parTable$lhs),
match(ptargetElem, parTable$rhs)),
equalelement)
if (length(rmelem) > 0) parTable <- removeEqCon(parTable, rmelem)
}
}
parTable <- rearrangept(parTable)
parTable
}
# fixParTable: Fix elements in parameter table
# also used in partialInvariance
fixParTable <- function(parTable, lhs, op, rhs, group, ustart = NA) {
parTable$start <- parTable$est <- parTable$se <- NULL
target <- cbind(lhs, op, rhs, group)
element <- apply(target, 1, matchElement, parTable=parTable)
for (i in 1:nrow(target)) {
## Why was Sunthud printing warnings? (originally used warnings(), not warning()...)
# if (parTable$free[element[i]] == 0) warning('The parameter ', lhs, op, rhs,
# ' in group ', group,
# ' is already fixed.')
# equalelement <- which(parTable$op == "==")
# targetElem <- matchElement(parTable = parTable, vec = target[i,])
# ptargetElem <- parTable$plabel[targetElem]
# rmelem <- intersect(union(match(ptargetElem, parTable$lhs), match(ptargetElem, parTable$rhs)), equalelement)
# if(length(rmelem) > 0) parTable <- removeEqCon(parTable, rmelem)
parTable$ustart[element[i]] <- ustart
parTable$user[element[i]] <- 1
parTable$free[element[i]] <- 0
}
parTable <- rearrangept(parTable)
# rearrangePlabel with change all equality constraints
parTable
}
# removeEqCon: Remove equality constraints
removeEqCon <- function(pt, element) {
pt <- lapply(pt, "[", -element)
pt$id <- seq_along(pt$id)
pt
}
patMerge <- function (pt1 = NULL, pt2 = NULL, remove.duplicated = FALSE,
fromLast = FALSE, warn = TRUE) {
pt1 <- as.data.frame(pt1, stringsAsFactors = FALSE)
pt2 <- as.data.frame(pt2, stringsAsFactors = FALSE)
stopifnot(!is.null(pt1$lhs), !is.null(pt1$op), !is.null(pt1$rhs),
!is.null(pt2$lhs), !is.null(pt2$op), !is.null(pt2$rhs))
if (is.null(pt1$group) && is.null(pt2$group)) {
TMP <- rbind(pt1[, c("lhs", "op", "rhs", "group")],
pt2[, c("lhs", "op", "rhs", "group")])
}
else {
if (is.null(pt1$group) && !is.null(pt2$group)) {
pt1$group <- rep(1L, length(pt1$lhs))
}
else if (is.null(pt2$group) && !is.null(pt1$group)) {
pt2$group <- rep(1L, length(pt2$lhs))
}
TMP <- rbind(pt1[, c("lhs", "op", "rhs", "group")],
pt2[, c("lhs", "op", "rhs", "group")])
}
if (is.null(pt1$user) && !is.null(pt2$user)) {
pt1$user <- rep(0L, length(pt1$lhs))
}
else if (is.null(pt2$user) && !is.null(pt1$user)) {
pt2$user <- rep(0L, length(pt2$lhs))
}
if (is.null(pt1$free) && !is.null(pt2$free)) {
pt1$free <- rep(0L, length(pt1$lhs))
}
else if (is.null(pt2$free) && !is.null(pt1$free)) {
pt2$free <- rep(0L, length(pt2$lhs))
}
if (is.null(pt1$ustart) && !is.null(pt2$ustart)) {
pt1$ustart <- rep(0, length(pt1$lhs))
}
else if (is.null(pt2$ustart) && !is.null(pt1$ustart)) {
pt2$ustart <- rep(0, length(pt2$lhs))
}
if (is.null(pt1$exo) && !is.null(pt2$exo)) {
pt1$exo <- rep(0L, length(pt1$lhs))
}
else if (is.null(pt2$exo) && !is.null(pt1$exo)) {
pt2$exo <- rep(0L, length(pt2$lhs))
}
if (is.null(pt1$label) && !is.null(pt2$label)) {
pt1$label <- rep("", length(pt1$lhs))
}
else if (is.null(pt2$label) && !is.null(pt1$label)) {
pt2$label <- rep("", length(pt2$lhs))
}
if (is.null(pt1$plabel) && !is.null(pt2$plabel)) {
pt1$plabel <- rep("", length(pt1$lhs))
}
else if (is.null(pt2$plabel) && !is.null(pt1$plabel)) {
pt2$plabel <- rep("", length(pt2$lhs))
}
if (is.null(pt1$start) && !is.null(pt2$start)) {
pt1$start <- rep(as.numeric(NA), length(pt1$lhs))
}
else if (is.null(pt2$start) && !is.null(pt1$start)) {
pt2$start <- rep(as.numeric(NA), length(pt2$lhs))
}
if (!is.null(pt1$est)) pt1$est <- NULL
if (!is.null(pt2$est)) pt2$est <- NULL
if (!is.null(pt1$se)) pt1$se <- NULL
if (!is.null(pt2$se)) pt2$se <- NULL
if (remove.duplicated) {
idx <- which(duplicated(TMP, fromLast = fromLast))
if (length(idx)) {
if (warn) {
warning("lavaan WARNING: duplicated parameters are ignored:\n",
paste(apply(pt1[idx, c("lhs", "op", "rhs")],
1, paste, collapse = " "), collapse = "\n"))
}
if (fromLast) {
pt1 <- pt1[-idx, ]
}
else {
idx <- idx - nrow(pt1)
pt2 <- pt2[-idx, ]
}
}
} else if (!is.null(pt1$start) && !is.null(pt2$start)) {
for (i in 1:length(pt1$lhs)) {
idx <- which(pt2$lhs == pt1$lhs[i] & pt2$op == pt1$op[i] &
pt2$rhs == pt1$rhs[i] & pt2$group == pt1$group[i])
pt2$start[idx] <- pt1$start[i]
}
}
if (is.null(pt1$id) && !is.null(pt2$id)) {
nid <- max(pt2$id)
pt1$id <- (nid + 1L):(nid + nrow(pt1))
}
else if (is.null(pt2$id) && !is.null(pt1$id)) {
nid <- max(pt1$id)
pt2$id <- (nid + 1L):(nid + nrow(pt2))
}
NEW <- base::merge(pt1, pt2, all = TRUE, sort = FALSE)
NEW
}
semTools/R/dataDiagnosis.R 0000644 0001762 0000144 00000024470 15142343123 015162 0 ustar ligges users ### Sunthud Pornprasertmanit & Terrence D. Jorgensen
### Last updated: 9 February 2026
### Higher-order moments. Initial version from the simsem package.
##' Finding skewness
##'
##' Finding skewness (\eqn{g_{1}}) of an object
##'
##' The skewness computed by default is \eqn{g_{1}}, the third standardized
##' moment of the empirical distribution of `object`.
##' The population parameter skewness \eqn{\gamma_{1}} formula is
##'
##' \deqn{\gamma_{1} = \frac{\mu_{3}}{\mu^{3/2}_{2}},}
##'
##' where \eqn{\mu_{i}} denotes the \eqn{i} order central moment.
##'
##' The skewness formula for sample statistic \eqn{g_{1}} is
##'
##' \deqn{g_{1} = \frac{k_{3}}{k^{2}_{2}},}
##'
##' where \eqn{k_{i}} are the \eqn{i} order *k*-statistic.
##'
##' The standard error of the skewness is
##'
##' \deqn{Var(\hat{g}_1) = \frac{6}{N}}
##'
##' where \eqn{N} is the sample size.
##'
##'
##' @importFrom stats pnorm
##'
##' @param object A vector used to find a skewness
##' @param population `TRUE` to compute the parameter formula. `FALSE`
##' to compute the sample statistic formula.
##' @return A value of a skewness with a test statistic if the population is
##' specified as `FALSE`
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##' @seealso \itemize{
##' \item [kurtosis()] Find the univariate excessive kurtosis
##' of a variable
##' \item [mardiaSkew()] Find Mardia's multivariate skewness
##' of a set of variables
##' \item [mardiaKurtosis()] Find the Mardia's multivariate
##' kurtosis of a set of variables
##' }
##' @references Weisstein, Eric W. (n.d.). *Skewness*. Retrieved from
##' *MathWorld*--A Wolfram Web Resource:
##'
##' @examples
##'
##' skew(1:5)
##'
##' @export
skew <- function(object, population = FALSE) {
if(any(is.na(object))) {
object <- object[!is.na(object)]
warning("Missing observations are removed from a vector.")
}
if(population) {
out <- centralMoment(object, 3) / (centralMoment(object, 2)^(3/2))
} else {
est <- kStat(object, 3) / (kStat(object, 2)^(3/2))
se <- sqrt(6/length(object))
z <- est/se
p <- (1 - pnorm(abs(z)))*2
out <- c("skew (g1)"=est, se=se, z=z, p=p)
}
class(out) <- c("lavaan.vector", "numeric")
out
}
##' Finding excessive kurtosis
##'
##' Finding excessive kurtosis (\eqn{g_{2}}) of an object
##'
##' The excessive kurtosis computed by default is \eqn{g_{2}}, the fourth
##' standardized moment of the empirical distribution of `object`.
##' The population parameter excessive kurtosis \eqn{\gamma_{2}} formula is
##'
##' \deqn{\gamma_{2} = \frac{\mu_{4}}{\mu^{2}_{2}} - 3,}
##'
##' where \eqn{\mu_{i}} denotes the \eqn{i} order central moment.
##'
##' The excessive kurtosis formula for sample statistic \eqn{g_{2}} is
##'
##' \deqn{g_{2} = \frac{k_{4}}{k^{2}_{2}} - 3,}
##'
##' where \eqn{k_{i}} are the \eqn{i} order *k*-statistic.
##'
##' The standard error of the excessive kurtosis is
##'
##' \deqn{Var(\hat{g}_{2}) = \frac{24}{N}}
##'
##' where \eqn{N} is the sample size.
##'
##'
##' @importFrom stats pnorm
##'
##' @param object A vector used to find a excessive kurtosis
##' @param population `TRUE` to compute the parameter formula. `FALSE`
##' to compute the sample statistic formula.
##' @return A value of an excessive kurtosis with a test statistic if the
##' population is specified as `FALSE`
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##' @seealso \itemize{
##' \item [skew()] Find the univariate skewness of a variable
##' \item [mardiaSkew()] Find the Mardia's multivariate
##' skewness of a set of variables
##' \item [mardiaKurtosis()] Find the Mardia's multivariate kurtosis
##' of a set of variables
##' }
##' @references Weisstein, Eric W. (n.d.). *Kurtosis.* Retrieved from
##' *MathWorld*--A Wolfram Web Resource:
##'
##'
##' @examples
##'
##' kurtosis(1:5)
##'
##' @export
kurtosis <- function(object, population = FALSE) {
if(any(is.na(object))) {
object <- object[!is.na(object)]
warning("Missing observations are removed from a vector.")
}
if(population) {
out <- (centralMoment(object, 4) / (centralMoment(object, 2)^2)) - 3
} else {
est <- kStat(object, 4) / (kStat(object, 2)^(2))
se <- sqrt(24/length(object))
z <- est/se
p <- (1 - pnorm(abs(z)))*2
out <- c("Excess Kur (g2)"=est, se=se, z=z, p=p)
}
class(out) <- c("lavaan.vector", "numeric")
out
}
##' Finding Mardia's multivariate skewness
##'
##' Finding Mardia's multivariate skewness of multiple variables
##'
##' The Mardia's multivariate skewness formula (Mardia, 1970) is
##' \deqn{ b_{1, d} = \frac{1}{n^2}\sum^n_{i=1}\sum^n_{j=1}\left[
##' \left(\bold{X}_i - \bold{\bar{X}} \right)^{'} \bold{S}^{-1}
##' \left(\bold{X}_j - \bold{\bar{X}} \right) \right]^3, }
##' where \eqn{d} is the number of variables, \eqn{X} is the target dataset
##' with multiple variables, \eqn{n} is the sample size, \eqn{\bold{S}} is
##' the sample covariance matrix of the target dataset, and \eqn{\bold{\bar{X}}}
##' is the mean vectors of the target dataset binded in \eqn{n} rows.
##' When the population multivariate skewness is normal, the
##' \eqn{\frac{n}{6}b_{1,d}} is asymptotically distributed as \eqn{\chi^2}
##' distribution with \eqn{d(d + 1)(d + 2)/6} degrees of freedom.
##'
##'
##' @importFrom stats cov pchisq
##'
##' @param dat The target matrix or data frame with multiple variables
##' @param use Missing data handling method from the [stats::cov()]
##' function.
##' @return A value of a Mardia's multivariate skewness with a test statistic
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##' @seealso \itemize{
##' \item [skew()] Find the univariate skewness of a variable
##' \item [kurtosis()] Find the univariate excessive
##' kurtosis of a variable
##' \item [mardiaKurtosis()] Find the Mardia's multivariate
##' kurtosis of a set of variables
##' }
##' @references Mardia, K. V. (1970). Measures of multivariate skewness and
##' kurtosis with applications. *Biometrika, 57*(3), 519--530.
##' \doi{10.2307/2334770}
##' @examples
##'
##' library(lavaan)
##' mardiaSkew(HolzingerSwineford1939[ , paste0("x", 1:9)])
##'
##' @export
mardiaSkew <- function(dat, use = "everything") {
centeredDat <- scale(dat, center=TRUE, scale=FALSE)
invS <- solve(cov(dat, use = use))
FUN <- function(vec1, vec2, invS) {
as.numeric(t(as.matrix(vec1)) %*% invS %*% as.matrix(vec2))
}
FUN2 <- function(vec1, listVec2, invS) {
sapply(listVec2, FUN, vec1=vec1, invS=invS)
}
indivTerm <- sapply(as.list(data.frame(t(centeredDat))), FUN2,
listVec2=as.list(data.frame(t(centeredDat))), invS=invS)
b1d <- sum(indivTerm^3, na.rm = TRUE) / (nrow(dat)^2)
d <- ncol(dat)
chi <- nrow(dat) * b1d / 6
df <- d * (d + 1) * (d + 2) / 6
p <- pchisq(chi, df = df, lower.tail = FALSE)
out <- c(b1d = b1d, chi = chi, df=df, p=p)
class(out) <- c("lavaan.vector", "numeric")
return(out)
}
##' Finding Mardia's multivariate kurtosis
##'
##' Finding Mardia's multivariate kurtosis of multiple variables
##'
##' The Mardia's multivariate kurtosis formula (Mardia, 1970) is
##' \deqn{ b_{2, d} = \frac{1}{n}\sum^n_{i=1}\left[ \left(\bold{X}_i -
##' \bold{\bar{X}} \right)^{'} \bold{S}^{-1} \left(\bold{X}_i -
##' \bold{\bar{X}} \right) \right]^2, }
##' where \eqn{d} is the number of variables, \eqn{X} is the target
##' dataset with multiple variables, \eqn{n} is the sample size, \eqn{\bold{S}}
##' is the sample covariance matrix of the target dataset, and
##' \eqn{\bold{\bar{X}}} is the mean vectors of the target dataset binded in
##' \eqn{n} rows. When the population multivariate kurtosis is normal, the
##' \eqn{b_{2,d}} is asymptotically distributed as normal distribution with the
##' mean of \eqn{d(d + 2)} and variance of \eqn{8d(d + 2)/n}.
##'
##'
##' @importFrom stats cov pnorm
##'
##' @param dat The target matrix or data frame with multiple variables
##' @param use Missing data handling method from the [stats::cov()]
##' function.
##' @return A value of a Mardia's multivariate kurtosis with a test statistic
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##' @seealso \itemize{
##' \item [skew()] Find the univariate skewness of a variable
##' \item [kurtosis()] Find the univariate excessive kurtosis
##' of a variable
##' \item [mardiaSkew()] Find the Mardia's multivariate skewness
##' of a set of variables
##' }
##' @references Mardia, K. V. (1970). Measures of multivariate skewness and
##' kurtosis with applications. *Biometrika, 57*(3), 519--530.
##' \doi{10.2307/2334770}
##' @examples
##'
##' library(lavaan)
##' mardiaKurtosis(HolzingerSwineford1939[ , paste0("x", 1:9)])
##'
##' @export
mardiaKurtosis <- function(dat, use = "everything") {
centeredDat <- scale(dat, center=TRUE, scale=FALSE)
invS <- solve(cov(dat, use = use))
FUN <- function(vec, invS) {
as.numeric(t(as.matrix(vec)) %*% invS %*% as.matrix(vec))
}
indivTerm <- sapply(as.list(data.frame(t(centeredDat))), FUN, invS=invS)
b2d <- sum(indivTerm^2, na.rm = TRUE) / nrow(dat)
d <- ncol(dat)
m <- d * (d + 2)
v <- 8 * d * (d + 2) / nrow(dat)
z <- (b2d - m)/sqrt(v)
p <- pnorm(-abs(z)) * 2
out <- c(b2d = b2d, z = z, p=p)
class(out) <- c("lavaan.vector", "numeric")
return(out)
}
## ----------------
## Hidden Functions
## ----------------
## centralMoment
## Calculate central moments of a variable
## Arguments:
## x: vector of a variable
## ord: order of the moment
## weight: weight variable
centralMoment <- function(x, ord) {
if(ord < 2) stop("Central moment can be calculated for order 2 or more in an integer.")
wm <- mean(x)
result <- sum((x - wm)^(ord))/length(x)
return(result)
}
## Example
## centralMoment(1:5, 2)
## kStat
## Calculate the k-statistic (i.e., unbiased estimator of a cumulant) of a variable
## Arguments:
## x: vector of a variable
## ord: order of the k-statistics
kStat <- function(x, ord) {
# Formula from mathworld wolfram
n <- length(x)
if(ord == 1) {
return(mean(x))
} else if (ord == 2) {
return(centralMoment(x, 2) * n / (n - 1))
} else if (ord == 3) {
return(centralMoment(x, 3) * n^2 / ((n - 1) * (n - 2)))
} else if (ord == 4) {
num1 <- n^2
num2 <- (n + 1) * centralMoment(x, 4)
num3 <- 3 * (n - 1) * centralMoment(x, 2)^2
denom <- (n - 1) * (n - 2) * (n - 3)
return((num1 * (num2 - num3))/denom)
} else {
stop("Order can be 1, 2, 3, or 4 only.")
}
}
## Example
## kStat(1:5, 4)
semTools/R/fitIndices.R 0000644 0001762 0000144 00000074076 15142343266 014510 0 ustar ligges users ### Title: Compute more fit indices
### Authors: Terrence D. Jorgensen, Sunthud Pornprasertmanit,
### Aaron Boulton, Ruben Arslan, Mauricio Garnier-Villarreal
### Last updated: 9 February 2026
### Description: Calculations for promising alternative fit indices
##' Calculate more fit indices
##'
##' Calculate more fit indices that are not already provided in lavaan.
##'
##' See [nullRMSEA()] for the further details of the computation of
##' RMSEA of the null model.
##'
##' Gamma-Hat (`gammaHat`; West, Taylor, & Wu, 2012) is a global
##' goodness-of-fit index which can be computed (assuming equal number of
##' indicators across groups) by
##'
##' \deqn{ \hat{\Gamma} =\frac{p}{p + 2 \times \frac{\chi^{2}_{k} - df_{k}}{N}},}
##'
##' where \eqn{p} is the number of variables in the model, \eqn{\chi^{2}_{k}} is
##' the \eqn{\chi^2} test statistic value of the target model, \eqn{df_{k}} is
##' the degree of freedom when fitting the target model, and \eqn{N} is the
##' sample size (or sample size minus the number of groups if `mimic` is
##' set to `"EQS"`).
##'
##' Adjusted Gamma-Hat (`adjGammaHat`; West, Taylor, & Wu, 2012) is a
##' global fit index which can be computed by
##'
##' \deqn{ \hat{\Gamma}_\textrm{adj} = \left(1 - \frac{K \times p \times
##' (p + 1)}{2 \times df_{k}} \right) \times \left( 1 - \hat{\Gamma} \right),}
##'
##' where \eqn{K} is the number of groups (please refer to Dudgeon, 2004, for
##' the multiple-group adjustment for `adjGammaHat`).
##'
##' Note that if Satorra--Bentler's or Yuan--Bentler's method is used, the fit
##' indices using the scaled \eqn{\chi^2} values are also provided.
##'
##' The remaining indices are information criteria calculated using the
##' `object`'s \eqn{-2 \times} log-likelihood, abbreviated \eqn{-2LL}.
##'
##' Corrected Akaike Information Criterion (`aic.smallN`; Burnham &
##' Anderson, 2003) is a corrected version of AIC for small sample size, often
##' abbreviated AICc:
##'
##' \deqn{ \textrm{AIC}_{\textrm{small}-N} = AIC + \frac{2q(q + 1)}{N - q - 1},}
##'
##' where \eqn{AIC} is the original AIC: \eqn{-2LL + 2q} (where \eqn{q}
##' = the number of estimated parameters in the target model). Note that AICc is
##' a small-sample correction derived for univariate regression models, so it is
##' probably *not* appropriate for comparing SEMs.
##'
##' Corrected Bayesian Information Criterion (`bic.priorN`; Kuha, 2004) is
##' similar to BIC but explicitly specifying the sample size on which the prior
##' is based (\eqn{N_{prior}}) using the `nPrior` argument.
##'
##' \deqn{ \textrm{BIC}_{\textrm{prior}-N} = -2LL + q\log{( 1 + \frac{N}{N_{prior}} )}.}
##'
##' Bollen et al. (2012, 2014) discussed additional BICs that incorporate more
##' terms from a Taylor series expansion, which the standard BIC drops. The
##' "Scaled Unit-Information Prior" BIC is calculated depending on whether the
##' product of the vector of estimated model parameters (\eqn{\hat{\theta}}) and
##' the observed information matrix (FIM) exceeds the number of estimated model
##' parameters (Case 1) or not (Case 2), which is checked internally:
##'
##' \deqn{ \textrm{SPBIC}_{\textrm{Case 1}} = -2LL + q(1 - \frac{q}{\hat{\theta}^{'} \textrm{FIM} \hat{\theta}}), \textrm{ or}}
##' \deqn{ \textrm{SPBIC}_{\textrm{Case 2}} = -2LL + \hat{\theta}^{'} \textrm{FIM} \hat{\theta},}
##'
##' Note that this implementation of SPBIC is calculated on the assumption that
##' priors for all estimated parameters are centered at zero, which is
##' inappropriate for most SEMs (e.g., variances should not have priors centered
##' at the lowest possible value; Bollen, 2014, p. 6).
##'
##' Bollen et al. (2014, eq. 14) credit the HBIC to Haughton (1988):
##'
##' \deqn{ \textrm{HBIC} = -2LL + q\log{\frac{N}{2 \pi}}.}
##'
##' Bollen et al. (2012, p. 305) proposed the information matrix (\eqn{I})-based BIC by
##' adding another term:
##'
##' \deqn{ \textrm{IBIC} = -2LL + q\log{\frac{N}{2 \pi}} + \log{\det{\textrm{FIM}}},}
##'
##' or equivalently, using the inverse information (the asymptotic sampling
##' covariance matrix of estimated parameters: ACOV):
##'
##' \deqn{ \textrm{IBIC} = -2LL - q\log{2 \pi} - \log{\det{\textrm{ACOV}}}.}
##'
##' Stochastic information criterion (SIC; see Preacher, 2006, for details) is
##' similar to IBIC but does not include the \eqn{q\log{2 \pi}} term that is
##' also in HBIC. SIC and IBIC both account for model complexity in a model's
##' functional form, not merely the number of free parameters. The SIC can be
##' computed as:
##'
##' \deqn{ \textrm{SIC} = -2LL + q\log{N} + \log{\det{\textrm{FIM}}} = -2LL - \log{\det{\textrm{ACOV}}}.}
##'
##' Hannan--Quinn Information Criterion (HQC; Hannan & Quinn, 1979) is used for
##' model selection, similar to AIC or BIC.
##'
##' \deqn{ \textrm{HQC} = -2LL + 2q\log{(\log{N})},}
##'
##' Bozdogan Information Complexity (ICOMP) Criteria (Howe et al., 2011),
##' instead of penalizing the number of free parameters directly,
##' ICOMP penalizes the covariance complexity of the model.
##'
##' \deqn{ \textrm{ICOMP} = -2LL + s \times log(\frac{\bar{\lambda_a}}{\bar{\lambda_g}}) }
##'
##'
##' @importFrom lavaan lavInspect
##'
##' @param object The lavaan model object provided after running the `cfa`,
##' `sem`, `growth`, or `lavaan` functions.
##' @param fit.measures Additional fit measures to be calculated. All additional
##' fit measures are calculated by default
##' @param nPrior The sample size on which prior is based. This argument is used
##' to compute `bic.priorN`.
##'
##' @return A `numeric` `lavaan.vector` including any of the
##' following requested via `fit.measures=`
##' \enumerate{
##' \item `gammaHat`: Gamma-Hat
##' \item `adjGammaHat`: Adjusted Gamma-Hat
##' \item `baseline.rmsea`: RMSEA of the default baseline (i.e., independence) model
##' \item `gammaHat.scaled`: Gamma-Hat using scaled \eqn{\chi^2}
##' \item `adjGammaHat.scaled`: Adjusted Gamma-Hat using scaled \eqn{\chi^2}
##' \item `baseline.rmsea.scaled`: RMSEA of the default baseline (i.e.,
##' independence) model using scaled \eqn{\chi^2}
##' \item `aic.smallN`: Corrected (for small sample size) AIC
##' \item `bic.priorN`: BIC with specified prior sample size
##' \item `spbic`: Scaled Unit-Information Prior BIC (SPBIC)
##' \item `hbic`: Haughton's BIC (HBIC)
##' \item `ibic`: Information-matrix-based BIC (IBIC)
##' \item `sic`: Stochastic Information Criterion (SIC)
##' \item `hqc`: Hannan-Quinn Information Criterion (HQC)
##' \item `icomp`: Bozdogan Information Complexity (ICOMP) Criteria
##' }
##'
##' @author Sunthud Pornprasertmanit (\email{psunthud@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' Aaron Boulton (University of Delaware)
##'
##' Ruben Arslan (Humboldt-University of Berlin, \email{rubenarslan@@gmail.com})
##'
##' Yves Rosseel (Ghent University; \email{Yves.Rosseel@@UGent.be})
##'
##' Mauricio Garnier-Villarreal (Vrije Universiteit Amsterdam; \email{mgv@pm.me})
##'
##' A great deal of feedback was provided by Kris Preacher regarding Bollen et
##' al.'s (2012, 2014) extensions of BIC.
##'
##' @seealso
##' \itemize{
##' \item [epcEquivFit()] For the equivalence testing based on expected
##' parameter changes for model fit evaluation
##' \item [nullRMSEA()] For RMSEA of the default independence model
##' }
##'
##' @references
##'
##' Bollen, K. A., Ray, S., Zavisca, J., & Harden, J. J. (2012). A comparison of
##' Bayes factor approximation methods including two new methods.
##' *Sociological Methods & Research, 41*(2), 294--324.
##' \doi{10.1177/0049124112452393}
##'
##' Bollen, K. A., Harden, J. J., Ray, S., & Zavisca, J. (2014). BIC and
##' alternative Bayesian information criteria in the selection of structural
##' equation models. *Structural Equation Modeling, 21*(1), 1--19.
##' \doi{10.1080/10705511.2014.856691}
##'
##' Burnham, K., & Anderson, D. (2003). *Model selection and
##' multimodel inference: A practical--theoretic approach*. New York, NY:
##' Springer--Verlag.
##'
##' Dudgeon, P. (2004). A note on extending Steiger's (1998) multiple sample
##' RMSEA adjustment to other noncentrality parameter-based statistic.
##' *Structural Equation Modeling, 11*(3), 305--319.
##' \doi{10.1207/s15328007sem1103_1}
##'
##' Howe, E. D., Bozdogan, H., & Katragadda, S. (2011). Structural equation
##' modeling (SEM) of categorical and mixed-data using the novel Gifi
##' transformations and information complexity (ICOMP) criterion.
##' *Istanbul University Journal of the School of Business Administration, 40*(1), 86--123.
##'
##' Kuha, J. (2004). AIC and BIC: Comparisons of assumptions and performance.
##' *Sociological Methods Research, 33*(2), 188--229.
##' \doi{10.1177/0049124103262065}
##'
##' Preacher, K. J. (2006). Quantifying parsimony in structural equation
##' modeling. *Multivariate Behavioral Research, 43*(3), 227--259.
##' \doi{10.1207/s15327906mbr4103_1}
##'
##' West, S. G., Taylor, A. B., & Wu, W. (2012). Model fit and model selection
##' in structural equation modeling. In R. H. Hoyle (Ed.), *Handbook of
##' structural equation modeling* (pp. 209--231). New York, NY: Guilford.
##'
##'
##' @examples
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' fit <- cfa(HS.model, data = HolzingerSwineford1939)
##' moreFitIndices(fit)
##'
##' fit2 <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "mlr")
##' moreFitIndices(fit2)
##'
##' @export
moreFitIndices <- function(object, fit.measures = "all", nPrior = 1) {
## check for validity of user-specified "fit.measures" argument
fit.choices <- c("gammaHat","adjGammaHat","baseline.rmsea",
"gammaHat.scaled","adjGammaHat.scaled","baseline.rmsea.scaled",
"aic.smallN","bic.priorN","spbic","hbic","ibic","sic","hqc","icomp")
flags <- setdiff(fit.measures, c("all", fit.choices))
if (length(flags)) stop(paste("Argument 'fit.measures' includes invalid options:",
paste(flags, collapse = ", "),
"Please choose 'all' or among the following:",
paste(fit.choices, collapse = ", "), sep = "\n"))
if ("all" %in% fit.measures) fit.measures <- fit.choices
# Extract fit indices information from lavaan object
fit <- lavInspect(object, "fit")
# Get the number of variable
p <- length(lavaan::lavNames(object, type = "ov", group = 1))
# Get the number of parameters
nParam <- fit["npar"]
# Find the number of groups
ngroup <- lavInspect(object, "ngroups")
# Get number of observations
N <- n <- lavInspect(object, "ntotal")
if (lavInspect(object, "options")$mimic == "EQS") n <- n - ngroup
## Calculate -2*log(likelihood)
f <- -2 * fit["logl"]
## Compute fit indices
result <- list()
if (length(grep("gamma", fit.measures, ignore.case = TRUE))) {
gammaHat <- p / (p + 2 * ((fit["chisq"] - fit["df"]) / n))
adjGammaHat <- 1 - (((ngroup * p * (p + 1)) / 2) / fit["df"]) * (1 - gammaHat)
result["gammaHat"] <- gammaHat
result["adjGammaHat"] <- adjGammaHat
if (any(grepl(pattern = "scaled", x = names(fit)))) {
gammaHatScaled <- p / (p + 2 * ((fit["chisq.scaled"] - fit["df.scaled"]) / n))
adjGammaHatScaled <- 1 - (((ngroup * p * (p + 1)) / 2) / fit["df.scaled"]) * (1 - gammaHatScaled)
result["gammaHat.scaled"] <- gammaHatScaled
result["adjGammaHat.scaled"] <- adjGammaHatScaled
}
}
if (length(grep("rmsea", fit.measures))) {
result["baseline.rmsea"] <- nullRMSEA(object, silent = TRUE)
if (any(grepl(pattern = "scaled", x = names(fit)))) {
result["baseline.rmsea.scaled"] <- nullRMSEA(object, scaled = TRUE, silent = TRUE)
}
}
## Compute information criteria
if (!is.na(f)) {
if ("aic.smallN" %in% fit.measures) {
warning('AICc (aic.smallN) was developed for univariate linear models.',
' It is probably not appropriate to use AICc to compare SEMs.')
result["aic.smallN"] <- fit[["aic"]] + (2 * nParam * (nParam + 1)) / (N - nParam - 1)
}
if ("bic.priorN" %in% fit.measures) {
result["bic.priorN"] <- f + log(1 + N/nPrior) * nParam
}
if ("spbic" %in% fit.measures) {
theta <- lavaan::coef(object)
FIM <- lavInspect(object, "information.observed")
junk <- t(theta) %*% FIM %*% theta
if (nParam < junk) {
result["spbic"] <- f + nParam*(1 - log(nParam / junk)) # Case 1
} else result["spbic"] <- f + junk # Case 2
}
if ("hbic" %in% fit.measures) result["hbic"] <- f + nParam*log(N/(2*pi))
if ("icomp" %in% fit.measures) {
Fhatinv <- lavInspect(object, "inverted.information.expected")
s <- qr(Fhatinv)$rank
C1 <- (s/2)*log((sum(diag(Fhatinv)))/s) - .5*log(det(Fhatinv))
result["icomp"] <- f + 2*C1
}
## check determinant of ACOV for IBIC and SIC
if (any(c("ibic","sic") %in% fit.measures)) {
ACOV <- lavInspect(object, "vcov")
detACOV <- det(ACOV)
if (detACOV <= 0) {
## look for duplicate names (simple equality constraints) to
## remove any obviously redundant parameters
#TODO: any way to check for redundancies implied by "==" constraints?
RN <- unique(rownames(ACOV))
if (length(RN) < nrow(ACOV)) detACOV <- det(ACOV[RN, RN])
}
}
if ("ibic" %in% fit.measures) {
if (detACOV <= 0) {
result["ibic"] <- NA
message('Determinant of vcov(object) <= 0, so IBIC cannot be calculated')
} else result["ibic"] <- f - nParam*log(2*pi) - log(detACOV)
}
if ("sic" %in% fit.measures) {
if (detACOV <= 0) {
result["sic"] <- NA
message('Determinant of vcov(object) <= 0, so SIC cannot be calculated')
} else {
result["sic"] <- f - log(detACOV)
## doi:10.1007/s10519-004-5587-0 (p. 596) says to use observed Fisher information,
## but ACOV will be lavInspect(object, "options")$information
## legacy code:
# E.inv <- lavaan::lavTech(object, "inverted.information.observed")
# E <- MASS::ginv(E.inv) * lavaan::nobs(object) # multiply unit information by N
}
}
if ("hqc" %in% fit.measures) result["hqc"] <- f + 2*nParam*log(log(N))
}
result <- unlist(result[fit.measures])
class(result) <- c("lavaan.vector","numeric")
result
}
##' Calculate the RMSEA of the null model
##'
##' Calculate the RMSEA of the null (baseline) model
##'
##' RMSEA of the null model is calculated similar to the formula provided in the
##' `lavaan` package. The standard formula of RMSEA is
##'
##' \deqn{ RMSEA =\sqrt{\frac{\chi^2}{N \times df} - \frac{1}{N}} \times
##' \sqrt{G} }
##'
##' where \eqn{\chi^2} is the chi-square test statistic value of the target
##' model, \eqn{N} is the total sample size, \eqn{df} is the degree of freedom
##' of the hypothesized model, \eqn{G} is the number of groups. Kenny proposed
##' in his website that
##'
##' "A reasonable rule of thumb is to examine the RMSEA for the null model and
##' make sure that is no smaller than 0.158. An RMSEA for the model of 0.05 and
##' a TLI of .90, implies that the RMSEA of the null model is 0.158. If the
##' RMSEA for the null model is less than 0.158, an incremental measure of fit
##' may not be that informative."
##'
##' See also the paper cited in **References**.
##'
##'
##' @importFrom lavaan lavInspect
##'
##' @param object The lavaan model object provided after running the `cfa`,
##' `sem`, `growth`, or `lavaan` functions.
##' @param scaled If `TRUE`, the scaled (or robust, if available) RMSEA
##' is returned. Ignored if a robust test statistic was not requested.
##' @param silent If `TRUE`, do not print anything on the screen.
##'
##' @return A value of RMSEA of the null model (a `numeric` vector)
##' returned invisibly.
##'
##' @author
##' Ruben Arslan (Humboldt-University of Berlin, \email{rubenarslan@@gmail.com})
##'
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @seealso
##' \itemize{
##' \item [miPowerFit()] For the modification indices and their
##' power approach for model fit evaluation
##' \item [moreFitIndices()] For other fit indices
##' }
##'
##' @references Kenny, D. A., Kaniskan, B., & McCoach, D. B. (2015). The
##' performance of RMSEA in models with small degrees of freedom.
##' *Sociological Methods Research, 44*(3), 486--507.
##' \doi{10.1177/0049124114543236}
##'
##' @examples
##'
##' HS.model <- ' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9 '
##'
##' fit <- cfa(HS.model, data = HolzingerSwineford1939)
##' nullRMSEA(fit)
##'
##' @export
nullRMSEA <- function(object, scaled = FALSE, silent = FALSE) {
fit <- lavaan::update(object, slotData = object@Data,
model = lavaan::lav_partable_independence(object))
fits <- lavaan::fitMeasures(fit, fit.measures = c("rmsea","rmsea.scaled",
"rmsea.robust"))
if (scaled) {
RMSEA <- fits["rmsea.robust"]
if (is.na(RMSEA)) RMSEA <- fits["rmsea.scaled"]
if (is.na(RMSEA)) RMSEA <- fits["rmsea"]
} else RMSEA <- fits["rmsea"]
if (!silent) {
cat("The baseline model's RMSEA =", RMSEA, "\n\n")
if (RMSEA < 0.158 ) {
cat("CFI, TLI, and other incremental fit indices may not be very",
"informative because the baseline model's RMSEA < 0.158",
"(Kenny, Kaniskan, & McCoach, 2015). \n")
}
}
invisible(RMSEA)
}
## Stochastic Information Criterion
## f = minimized discrepancy function
## lresults = lavaan sem output object
#TODO: update to extract f from lresults. Make public?
sic <- function(f, lresults = NULL) {
## p. 596 of doi:10.1007/s10519-004-5587-0 says to use observed Fisher information
E.inv <- lavaan::lavTech(lresults, "inverted.information.observed")
if (inherits(E.inv, "try-error")) {
return(as.numeric(NA))
}
E <- MASS::ginv(E.inv) * lavaan::nobs(lresults)
eigvals <- eigen(E, symmetric = TRUE, only.values = TRUE)$values
# only positive ones
eigvals <- eigvals[ eigvals > sqrt(.Machine$double.eps)]
DET <- prod(eigvals)
## check singular
if (DET <= 0) return(NA)
## return SIC
}
##' Small-*N* correction for \eqn{chi^2} test statistic
##'
##' Calculate small-*N* corrections for \eqn{chi^2} model-fit test
##' statistic to adjust for small sample size (relative to model size).
##'
##' Four finite-sample adjustments to the chi-squared statistic are currently
##' available, all of which are described in Shi et al. (2018). These all
##' assume normally distributed data, and may not work well with severely
##' nonnormal data. Deng et al. (2018, section 4) review proposed small-*N*
##' adjustments that do not assume normality, which rarely show promise, so
##' they are not implemented here. This function currently will apply
##' small-*N* adjustments to scaled test statistics with a warning that
##' they do not perform well (Deng et al., 2018).
##'
##' @importFrom lavaan lavInspect lavNames
##' @importFrom stats pchisq
##' @importFrom methods getMethod
##'
##' @param fit0,fit1 [lavaan::lavaan-class] or [lavaan.mi::lavaan.mi-class] object(s)
##' @param smallN.method `character` indicating the small-*N*
##' correction method to use. Multiple may be chosen (all of which assume
##' normality), as described in Shi et al. (2018):
##' `c("swain","yuan.2015","yuan.2005","bartlett")`. Users may also
##' simply select `"all"`.
##' @param \dots Additional arguments to the [lavaan::lavTestLRT()] or
##' [lavaan.mi::lavTestLRT.mi()] functions. Ignored when `is.null(fit1)`.
##' @param omit.imps `character` vector specifying criteria for omitting
##' imputations from pooled results. Ignored unless `fit0` (and
##' optionally `fit1`) is a [lavaan.mi::lavaan.mi-class] object. See
##' [lavaan.mi::lavTestLRT.mi()] for a description of options and defaults.
##'
##' @return A `list` of `numeric` vectors: one for the originally
##' requested statistic(s), along with one per requested `smallN.method`.
##' All include the the (un)adjusted test statistic, its *df*, and the
##' *p* value for the test under the null hypothesis that the model fits
##' perfectly (or that the 2 models have equivalent fit).
##' The adjusted chi-squared statistic(s) also include(s) the scaling factor
##' for the small-*N* adjustment.
##'
##' @author
##' Terrence D. Jorgensen (University of Amsterdam; \email{TJorgensen314@@gmail.com})
##'
##' @references
##' Deng, L., Yang, M., & Marcoulides, K. M. (2018). Structural equation
##' modeling with many variables: A systematic review of issues and
##' developments. *Frontiers in Psychology, 9*, 580.
##' \doi{10.3389/fpsyg.2018.00580}
##'
##' Shi, D., Lee, T., & Terry, R. A. (2018). Revisiting the model
##' size effect in structural equation modeling.
##' *Structural Equation Modeling, 25*(1), 21--40.
##' \doi{10.1080/10705511.2017.1369088}
##'
##' @examples
##'
##' HS.model <- '
##' visual =~ x1 + b1*x2 + x3
##' textual =~ x4 + b2*x5 + x6
##' speed =~ x7 + b3*x8 + x9
##' '
##' fit1 <- cfa(HS.model, data = HolzingerSwineford1939[101:150,])
##' ## test a single model (implicitly compared to a saturated model)
##' chisqSmallN(fit1)
##'
##' ## fit a more constrained model
##' fit0 <- cfa(HS.model, data = HolzingerSwineford1939[101:150,],
##' orthogonal = TRUE)
##' ## compare 2 models
##' chisqSmallN(fit1, fit0)
##'
##' @export
chisqSmallN <- function(fit0, fit1 = NULL,
smallN.method = if (is.null(fit1)) c("swain","yuan.2015") else "yuan.2005",
..., omit.imps = c("no.conv","no.se")) {
if ("all" %in% smallN.method) smallN.method <- c("swain","yuan.2015",
"yuan.2005","bartlett")
smallN.method <- intersect(tolower(smallN.method),
c("swain","yuan.2015","yuan.2005","bartlett"))
if (!any(smallN.method %in% c("swain","yuan.2015","yuan.2005","bartlett")))
stop('No recognized options for "smallN.method" argument')
## check class
if (!inherits(fit0, what = c("lavaan","lavaan.mi")))
stop("this function is only applicable to fitted lavaan(.mi) models")
## necessary to load lavaan.mi?
if (inherits(fit0, what = c("lavaan.mi"))) {
if (!"package:lavaan.mi" %in% search()) attachNamespace("lavaan.mi")
}
## if there are 2 models...
if (!is.null(fit1)) {
## check classes
if (!inherits(fit1, what = c("lavaan","lavaan.mi")))
stop("this function is only applicable to fitted lavaan(.mi) models")
modClass <- unique(sapply(list(fit0, fit1), class))
if (length(modClass) > 1L) stop('All models must be of the same class (e.g.,',
' cannot compare lavaan objects to lavaan.mi)')
## check order of DF
# suppressMessages(DF0 <- getMethod("fitMeasures", class(fit0))(fit0, fit.measures = "df",
# omit.imps = omit.imps)[1])
# suppressMessages(DF1 <- getMethod("fitMeasures", class(fit1))(fit1, fit.measures = "df",
# omit.imps = omit.imps)[1])
if (inherits(fit0, "lavaan.mi")) {
DF0 <- fitMeasures(fit0, fit.measures = "df", omit.imps = omit.imps)[1]
DF1 <- fitMeasures(fit1, fit.measures = "df", omit.imps = omit.imps)[1]
} else {
DF0 <- fitMeasures(fit0, fit.measures = "df")[1]
DF1 <- fitMeasures(fit1, fit.measures = "df")[1]
}
if (DF0 == DF1) stop("Models have the same degrees of freedom.")
parent <- which.min(c(DF0, DF1))
if (parent == 1L) {
parent <- fit0
fit0 <- fit1
fit1 <- parent
parent <- DF0
DF0 <- DF1
DF1 <- parent
}
if (min(c(DF0, DF1)) == 0L) {
message('Less restricted model has df=0, so chi-squared difference ',
'not needed to compare models. Using only the restricted ',
"model's chi-squared statistic.")
fit1 <- NULL
}
}
## check whether methods can be used
if (!is.null(fit1)) {
if (any(smallN.method %in% c("yuan.2015","swain"))) {
message('Swain (1975) and Yuan (2015) corrections depend on the number ',
'of free parameters, so it is unavailable for model comparison.')
smallN.method <- smallN.method[-which(smallN.method %in% c("yuan.2015","swain"))]
}
if (!length(smallN.method)) {
stop('No valid options for "smallN.method" argument')
} else warning('Small-N corrections developed for single models, not for ',
'model comparison. Experimentally applying correction to ',
'chi-squared difference statistic, which might be invalid.')
}
## save quantities relevant across correction methods
N <- lavInspect(fit0, "ntotal")
Ng <- lavInspect(fit0, "ngroups")
if (!lavInspect(fit0, "options")$sample.cov.rescale) N <- N - Ng
P <- length(lavNames(fit0))
K <- length(lavNames(fit0, type = "lv")) # count latent factors
if (is.null(fit1)) {
# FIT <- getMethod("fitMeasures", class(fit0))(fit0,
# ## lavaan.mi arguments ignored
# ## for lavaan objects
# omit.imps = omit.imps,
# asymptotic = TRUE,
# fit.measures = c("npar","chisq",
# "df","pvalue"))
if (inherits(fit0, "lavaan.mi")) {
FIT <- fitMeasures(fit0, fit.measures = c("npar","chisq","df","pvalue"),
omit.imps = omit.imps, asymptotic = TRUE)
} else {
FIT <- fitMeasures(fit0, fit.measures = c("npar","chisq","df","pvalue"))
}
scaled <- any(grepl(pattern = "scaled", x = names(FIT)))
if (scaled) warning('Small-N corrections developed assuming normality, but',
' a scaled test was requested. Applying correction(s) ',
'to the scaled test statistic, but this has not ',
'performed well in past simulations.')
NPAR <- FIT[["npar"]]
chi <- FIT[[if (scaled) "chisq.scaled" else "chisq"]]
DF <- FIT[[if (scaled) "df.scaled" else "df"]]
PV <- FIT[[if (scaled) "pvalue.scaled" else "pvalue"]]
} else {
## Compare to a second model. Check matching stats.
N1 <- lavInspect(fit1, "ntotal")
Ng1 <- lavInspect(fit1, "ngroups")
if (!lavInspect(fit1, "options")$sample.cov.rescale) N1 <- N1 - Ng1
if (N != N1) stop("Unequal sample sizes")
if (P != length(lavNames(fit1))) stop("Unequal number of observed variables")
K1 <- length(lavNames(fit1, type = "lv"))
if (K != K1 && any(smallN.method %in% c("yuan.2005","bartlett"))) {
warning("Unequal number of latent variables (k). Unclear how to apply ",
"Yuan (2005) or Bartlett (2015) corrections when comparing ",
"models with different k. Experimentally using the larger ",
"model's k, but there is no evidence this is valid.")
K <- max(K, K1)
}
AOV <- try(compareFit(fit0, fit1, argsLRT = list(...), indices = FALSE),
silent = TRUE)
if (inherits(AOV, "try-error")) stop('Model comparison failed. Try using ',
'lavTestLRT() to investigate why.')
if (inherits(fit0, "lavaan")) {
if (grepl("scaled", attr(AOV@nested, "heading"), ignore.case = TRUE))
warning('Small-N corrections developed assuming normality, but scaled ',
'tests were requested. Applying correction(s) to the scaled test',
' statistic, but this has not performed well in past simulations.')
chi <- AOV@nested[["Chisq diff"]][2]
DF <- AOV@nested[["Df diff"]][2]
PV <- AOV@nested[["Pr(>Chisq)"]][2]
} else if (inherits(fit0, "lavaan.mi")) {
if (any(grepl("scaled", colnames(AOV@nested), ignore.case = TRUE)))
warning('Small-N corrections developed assuming normality, but scaled ',
'tests were requested. Applying correction(s) to the scaled test',
' statistic, but this has not performed well in past simulations.')
chi <- AOV@nested[1, 1]
DF <- AOV@nested[1, 2]
PV <- AOV@nested[1, 3]
}
}
## empty list to store correction(s)
out <- list()
out[[ lavInspect(fit0, "options")$test ]] <- c(chisq = chi, df = DF,
pvalue = PV)
class(out[[1]]) <- c("lavaan.vector","numeric")
## calculate Swain's (1975) correction
## (undefined for model comparison)
if ("swain" %in% smallN.method) {
Q <- (sqrt(1 + 8*NPAR) - 1) / 2
num <- P*(2*P^2 + 3*P - 1) - Q*(2*Q^2 + 3*Q - 1)
SC <- 1 - num / (12*DF*N)
out[["swain"]] <- c(chisq = chi*SC, df = DF,
pvalue = pchisq(chi*SC, DF, lower.tail = FALSE),
smallN.factor = SC)
class(out[["swain"]]) <- c("lavaan.vector","numeric")
}
## calculate Yuan's (2015) correction
## (undefined for model comparison)
if ("yuan.2015" %in% smallN.method) {
## numerator uses actual N regardless of sample.cov.rescale
SC <- (lavInspect(fit0, "ntotal") - (2.381 + .361*P + .006*NPAR)) / N
out[["yuan.2015"]] <- c(chisq = chi*SC, df = DF,
pvalue = pchisq(chi*SC, DF, lower.tail = FALSE),
smallN.factor = SC)
class(out[["yuan.2015"]]) <- c("lavaan.vector","numeric")
}
## calculate Yuan's (2005) correction
if ("yuan.2005" %in% smallN.method) {
SC <- 1 - ((2*P + 2*K + 7) / (6*N))
out[["yuan.2005"]] <- c(chisq = chi*SC, df = DF,
pvalue = pchisq(chi*SC, DF, lower.tail = FALSE),
smallN.factor = SC)
class(out[["yuan.2005"]]) <- c("lavaan.vector","numeric")
}
## calculate Bartlett's (1950) k-factor correction (ONLY appropriate for EFA)
if ("bartlett" %in% smallN.method) {
message('Bartlett\'s k-factor correction was developed for EFA models, ',
'not for general SEMs.')
SC <- 1 - ((2*P + 4*K + 5) / (6*N))
out[["bartlett"]] <- c(chisq = chi*SC, df = DF,
pvalue = pchisq(chi*SC, DF, lower.tail = FALSE),
smallN.factor = SC)
class(out[["bartlett"]]) <- c("lavaan.vector","numeric")
}
out[c(lavInspect(fit0, "options")$test, smallN.method)]
}
semTools/R/missingBootstrap.R 0000644 0001762 0000144 00000101130 15142343550 015750 0 ustar ligges users ### Terrence D. Jorgensen
### Last updated: 9 February 2026
### Savalei & Yuan's (2009) model-based bootstrap for missing data
## ----------------------------
## "BootMiss" Class and Methods
## ----------------------------
##' Class For the Results of Bollen-Stine Bootstrap with Incomplete Data
##'
##' This class contains the results of Bollen-Stine bootstrap with missing data.
##'
##'
##' @name BootMiss-class
##' @aliases BootMiss-class show,BootMiss-method summary,BootMiss-method
##' hist,BootMiss-method
##' @docType class
##' @section Objects from the Class: Objects can be created via the
##' [bsBootMiss()] function.
##' @slot time A list containing 2 `difftime` objects (`transform`
##' and `fit`), indicating the time elapsed for data transformation and
##' for fitting the model to bootstrap data sets, respectively.
##' @slot transData Transformed data
##' @slot bootDist The vector of \eqn{chi^2} values from bootstrap data sets
##' fitted by the target model
##' @slot origChi The \eqn{chi^2} value from the original data set
##' @slot df The degree of freedom of the model
##' @slot bootP The *p* value comparing the original \eqn{chi^2} with the
##' bootstrap distribution
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##' @seealso [bsBootMiss()]
##' @examples
##'
##' # See the example from the bsBootMiss function
##'
setClass("BootMiss", representation(time = "list",
transData = "data.frame",
bootDist = "vector",
origChi = "numeric",
df = "numeric",
bootP = "numeric"))
##' @rdname BootMiss-class
##' @aliases show,BootMiss-method
##' @importFrom stats pchisq
##' @export
setMethod("show", "BootMiss",
function(object) {
cat("Chi-Squared = ", object@origChi, "\nDegrees of Freedom = ",
object@df, "\nTheoretical p value = ",
pchisq(object@origChi, object@df, lower.tail = FALSE),
"\n i.e., pchisq(", object@origChi, ", df = ",
object@df, ", lower.tail = FALSE)\n",
"\nBootstrapped p value = ", object@bootP, "\n\n", sep = "")
invisible(object)
})
##' @rdname BootMiss-class
##' @aliases summary,BootMiss-method
##' @importFrom stats var
##' @export
setMethod("summary", "BootMiss",
function(object) {
cat("Time elapsed to transform the data:\n")
print(object@time$transform)
cat("\nTime elapsed to fit the model to", length(object@bootDist),
"bootstrapped samples:\n")
print(object@time$fit)
cat("\nMean of Theoretical Distribution = DF =", object@df,
"\nVariance of Theoretical Distribution = 2*DF =", 2*object@df,
"\n\nMean of Bootstrap Distribution =", mean(object@bootDist),
"\nVariance of Bootstrap Distribution =",
var(object@bootDist), "\n\n")
show(object)
invisible(object)
})
##' @rdname BootMiss-class
##' @aliases hist,BootMiss-method
##' @importFrom stats qchisq dchisq quantile
##' @param object,x object of class `BootMiss`
##' @param ... Additional arguments to pass to [graphics::hist()]
##' @param alpha alpha level used to draw confidence limits
##' @param nd number of digits to display
##' @param printLegend `logical`. If `TRUE` (default), a legend will
##' be printed with the histogram
##' @param legendArgs `list` of arguments passed to the
##' [graphics::legend()] function. The default argument is a list
##' placing the legend at the top-left of the figure.
##' @return The `hist` method returns a list of `length == 2`,
##' containing the arguments for the call to `hist` and the arguments
##' to the call for `legend`, respectively.
##' @export
setMethod("hist", "BootMiss",
function(x, ..., alpha = .05, nd = 2, printLegend = TRUE,
legendArgs = list(x = "topleft")) {
ChiSq <- x@origChi
DF <- x@df
bootDist <- x@bootDist
bCrit <- round(quantile(bootDist, probs = 1 - alpha), nd)
theoDist <- dchisq(seq(.1, max(c(ChiSq, bootDist)), by = .1), df = DF)
Crit <- round(qchisq(p = alpha, df = DF, lower.tail = FALSE), nd)
Lim <- c(0, max(c(ChiSq, bootDist, Crit)))
if (ChiSq > Lim[2]) Lim[2] <- ChiSq
histArgs <- list(...)
histArgs$x <- bootDist
histArgs$freq <- FALSE
if (is.null(histArgs$col)) histArgs$col <- "grey69"
if (is.null(histArgs$xlim)) histArgs$xlim <- Lim
if (is.null(histArgs$main)) histArgs$main <- expression("Model-Based Bootstrap Distribution of" ~ chi^2)
if (is.null(histArgs$ylab)) histArgs$ylab <- "Probability Density"
if (is.null(histArgs$xlab)) histArgs$xlab <- expression(chi^2)
if (printLegend) {
if (nd < length(strsplit(as.character(1 / alpha), "")[[1]]) - 1) {
warning(paste0("The number of digits argument (nd = ", nd ,
") is too low to display your p value at the",
" same precision as your requested alpha level (alpha = ",
alpha, ")"))
}
if (x@bootP < (1 / 10^nd)) {
pVal <- paste(c("< .", rep(0, nd - 1),"1"), collapse = "")
} else {
pVal <- paste("=", round(x@bootP, nd))
}
if (is.null(legendArgs$box.lty)) legendArgs$box.lty <- 0
if (is.null(legendArgs$lty)) legendArgs$lty <- c(1, 2, 2, 1, 0, 0)
if (is.null(legendArgs$lwd)) legendArgs$lwd <- c(2, 2, 2, 3, 0, 0)
#if (is.null(legendArgs$cex)) legendArgs$cex <- c(1.1, 1, 1, 1, 1, 1)
if (is.null(legendArgs$col)) legendArgs$col <- c("black","black","grey69","red","", "")
legendArgs$legend <- c(bquote(chi[.(paste("df =", DF))]^2),
bquote(Critical ~ chi[alpha ~ .(paste(" =", alpha))]^2 == .(Crit)),
bquote(Bootstrap~Critical~chi[alpha ~ .(paste(" =", alpha))]^2 == .(bCrit)),
expression(Observed ~ chi^2),
bquote(.("")),
bquote(Bootstrap ~ italic(p) ~~ .(pVal)))
}
H <- do.call(hist, c(histArgs["x"], plot = FALSE))
histArgs$ylim <- c(0, max(H$density, theoDist))
suppressWarnings({
do.call(hist, histArgs)
lines(x = seq(.1, max(c(ChiSq, bootDist)), by = .1), y = theoDist, lwd = 2)
abline(v = Crit, col = "black", lwd = 2, lty = 2)
abline(v = bCrit, col = "grey69", lwd = 2, lty = 2)
abline(v = ChiSq, col = "red", lwd = 3)
if (printLegend) do.call(legend, legendArgs)
})
## return arguments to create histogram (and optionally, legend)
invisible(list(hist = histArgs, legend = legendArgs))
})
## --------------------
## Constructor Function
## --------------------
##' Bollen-Stine Bootstrap with the Existence of Missing Data
##'
##' Implement the Bollen and Stine's (1992) Bootstrap when missing observations
##' exist. The implemented method is proposed by Savalei and Yuan (2009). This
##' can be used in two ways. The first and easiest option is to fit the model to
##' incomplete data in `lavaan` using the FIML estimator, then pass that
##' `lavaan` object to `bsBootMiss`.
##'
##' The second is designed for users of other software packages (e.g., LISREL,
##' EQS, Amos, or Mplus). Users can import their data, \eqn{\chi^2} value, and
##' model-implied moments from another package, and they have the option of
##' saving (or writing to a file) either the transformed data or bootstrapped
##' samples of that data, which can be analyzed in other programs. In order to
##' analyze the bootstrapped samples and return a *p* value, users of other
##' programs must still specify their model using lavaan syntax.
##'
##'
##' @importFrom lavaan lavInspect parTable
##' @param x A target `lavaan` object used in the Bollen-Stine bootstrap
##' @param transformation The transformation methods in Savalei and Yuan (2009).
##' There are three methods in the article, but only the first two are currently
##' implemented here. Use `transformation = 1` when there are few missing
##' data patterns, each of which has a large size, such as in a
##' planned-missing-data design. Use `transformation = 2` when there are
##' more missing data patterns. The currently unavailable
##' `transformation = 3` would be used when several missing data patterns
##' have n = 1.
##' @param nBoot The number of bootstrap samples.
##' @param model Optional. The target model if `x` is not provided.
##' @param rawData Optional. The target raw data set if `x` is not
##' provided.
##' @param Sigma Optional. The model-implied covariance matrix if `x` is
##' not provided.
##' @param Mu Optional. The model-implied mean vector if `x` is not
##' provided.
##' @param group Optional character string specifying the name of the grouping
##' variable in `rawData` if `x` is not provided.
##' @param ChiSquared Optional. The model's \eqn{\chi^2} test statistic if
##' `x` is not provided.
##' @param EMcov Optional, if `x` is not provided. The EM (or Two-Stage ML)
##' estimated covariance matrix used to speed up Transformation 2 algorithm.
##' @param transDataOnly Logical. If `TRUE`, the result will provide the
##' transformed data only.
##' @param writeTransData Logical. If `TRUE`, the transformed data set is
##' written to a text file, `transDataOnly` is set to `TRUE`, and the
##' transformed data is returned invisibly.
##' @param bootSamplesOnly Logical. If `TRUE`, the result will provide
##' bootstrap data sets only.
##' @param writeBootData Logical. If `TRUE`, the stacked bootstrap data
##' sets are written to a text file, `bootSamplesOnly` is set to
##' `TRUE`, and the list of bootstrap data sets are returned invisibly.
##' @param writeArgs Optional `list`. If `writeBootData = TRUE` or
##' `writeBootData = TRUE`, user can pass arguments to the
##' [utils::write.table()] function as a list. Some default values
##' are provided: `file` = "bootstrappedSamples.dat", `row.names` =
##' `FALSE`, and `na` = "-999", but the user can override all of these
##' by providing other values for those arguments in the `writeArgs` list.
##' @param seed The seed number used in randomly drawing bootstrap samples.
##' @param suppressWarn Logical. If `TRUE`, warnings from `lavaan`
##' function will be suppressed when fitting the model to each bootstrap sample.
##' @param showProgress Logical. Indicating whether to display a progress bar
##' while fitting models to bootstrap samples.
##' @param \dots The additional arguments in the [lavaan::lavaan()]
##' function. See also [lavaan::lavOptions()]
##' @return As a default, this function returns a [BootMiss-class]
##' object containing the results of the bootstrap samples. Use `show`,
##' `summary`, or `hist` to examine the results. Optionally, the
##' transformed data set is returned if `transDataOnly = TRUE`. Optionally,
##' the bootstrap data sets are returned if `bootSamplesOnly = TRUE`.
##' @author Terrence D. Jorgensen (University of Amsterdam;
##' \email{TJorgensen314@@gmail.com})
##'
##' Syntax for transformations borrowed from supplementary materials
##' in Savalei & Yuan (2009)
##'
##' @seealso [BootMiss-class]
##'
##' @references
##'
##' Bollen, K. A., & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures
##' in structural equation models. *Sociological Methods &
##' Research, 21*(2), 205--229. \doi{10.1177/0049124192021002004}
##'
##' Savalei, V., & Yuan, K.-H. (2009). On the model-based bootstrap with missing
##' data: Obtaining a p-value for a test of exact fit. *Multivariate
##' Behavioral Research, 44*(6), 741--763. \doi{10.1080/00273170903333590}
##'
##' @examples
##'
##' dat1 <- HolzingerSwineford1939
##' dat1$x5 <- ifelse(dat1$x1 <= quantile(dat1$x1, .3), NA, dat1$x5)
##' dat1$x9 <- ifelse(is.na(dat1$x5), NA, dat1$x9)
##'
##' targetModel <- "
##' visual =~ x1 + x2 + x3
##' textual =~ x4 + x5 + x6
##' speed =~ x7 + x8 + x9
##' "
##' targetFit <- sem(targetModel, dat1, meanstructure = TRUE, std.lv = TRUE,
##' missing = "fiml", group = "school")
##' summary(targetFit, fit = TRUE, standardized = TRUE)
##'
##' \donttest{
##' ## The number of bootstrap samples should be much higher than this example
##' temp <- bsBootMiss(targetFit, transformation = 1, nBoot = 10, seed = 31415)
##'
##' temp
##' summary(temp)
##' hist(temp)
##' hist(temp, printLegend = FALSE) # suppress the legend
##' ## user can specify alpha level (default: alpha = 0.05), and the number of
##' ## digits to display (default: nd = 2). Pass other arguments to hist(...),
##' ## or a list of arguments to legend() via "legendArgs"
##' hist(temp, alpha = .01, nd = 3, xlab = "something else", breaks = 25,
##' legendArgs = list("bottomleft", box.lty = 2))
##' }
##'
##' @export
bsBootMiss <- function(x, transformation = 2, nBoot = 500, model, rawData,
Sigma, Mu, group, ChiSquared, EMcov,
writeTransData = FALSE, transDataOnly = FALSE,
writeBootData = FALSE, bootSamplesOnly = FALSE,
writeArgs, seed = NULL, suppressWarn = TRUE,
showProgress = TRUE, ...) {
if(writeTransData) transDataOnly <- TRUE
if(writeBootData) bootSamplesOnly <- TRUE
check.nBoot <- (!is.numeric(nBoot) | nBoot < 1L) & !transDataOnly
if (check.nBoot) stop("The \"nBoot\" argument must be a positive integer.")
## Which transformation?
if (!(transformation %in% 1:2)) stop("User must specify transformation 1 or 2.
Consult Savalei & Yuan (2009) for advice.
Transformation 3 is not currently available.")
if (transformation == 2) SavaleiYuan <- trans2
#if (transformation == 3) SavaleiYuan <- trans3
######################
## Data Preparation ##
######################
## If a lavaan object is supplied, the extracted values for rawData, Sigma, Mu,
## EMcov, and EMmeans will override any user-supplied arguments.
if (hasArg(x)) {
if (!lavInspect(x, "options")$meanstructure)
stop('You must fit the lavaan model with the argument "meanstructure=TRUE".')
nG <- lavInspect(x, "ngroups")
if (nG == 1L) {
rawData <- list(as.data.frame(lavInspect(x, "data")))
} else rawData <- lapply(lavInspect(x, "data"), as.data.frame)
for (g in seq_along(rawData)) {
colnames(rawData[[g]]) <- lavaan::lavNames(x)
checkAllMissing <- apply(rawData[[g]], 1, function(x) all(is.na(x)))
if (any(checkAllMissing)) rawData[[g]] <- rawData[[g]][!checkAllMissing, ]
}
ChiSquared <- lavInspect(x, "fit")[c("chisq", "chisq.scaled")]
ChiSquared <- ifelse(is.na(ChiSquared[2]), ChiSquared[1], ChiSquared[2])
group <- lavInspect(x, "group")
if (length(group) == 0) group <- "group"
group.label <- lavInspect(x, "group.label")
if (length(group.label) == 0) group.label <- 1
Sigma <- lavInspect(x, "cov.ov")
Mu <- lavInspect(x, "mean.ov")
EMcov <- lavInspect(x, "sampstat")$cov
if (nG == 1L) {
Sigma <- list(Sigma)
Mu <- list(Mu)
EMcov <- list(EMcov)
}
} else {
## If no lavaan object is supplied, check that required arguments are.
suppliedData <- c(hasArg(rawData), hasArg(Sigma), hasArg(Mu))
if (!all(suppliedData)) {
stop("Without a lavaan fitted object, user must supply raw data and",
" model-implied moments.")
}
if (!hasArg(model) & !(transDataOnly | bootSamplesOnly)) {
stop("Without model syntax or fitted lavaan object, user can only call",
" this function to save transformed data or bootstrapped samples.")
}
if (!hasArg(ChiSquared) & !(transDataOnly | bootSamplesOnly)) {
stop("Without a fitted lavaan object or ChiSquared argument, user can",
" only call this function to save transformed data, bootstrapped",
" samples, or bootstrapped chi-squared values.")
}
if (!any(c(transDataOnly, bootSamplesOnly))) {
if (!is.numeric(ChiSquared)) stop("The \"ChiSquared\" argument must be numeric.")
}
## If user supplies one-group data & moments, convert to lists.
if ( inherits(rawData, "data.frame")) rawData <- list(rawData)
if (!inherits(rawData, "list")) {
stop("The \"rawData\" argument must be a data.frame or list of data frames.")
} else {
if (!all(sapply(rawData, is.data.frame))) stop("Every element of \"rawData\" must be a data.frame")
}
if (inherits(Sigma, "matrix")) Sigma <- list(Sigma)
if (is.numeric(Mu)) Mu <- list(Mu)
## check whether EMcov was supplied for starting values in Trans2/Trans3
if (!hasArg(EMcov)) {
EMcov <- vector("list", length(Sigma))
} else {
if (inherits(EMcov, "matrix")) EMcov <- list(EMcov)
## check EMcov is symmetric and dimensions match Sigma
for (g in seq_along(EMcov)) {
if (!isSymmetric(EMcov[[g]])) stop("EMcov in group ", g, " not symmetric.")
unequalDim <- !all(dim(EMcov[[g]]) == dim(Sigma[[g]]))
if (unequalDim) stop("Unequal dimensions in Sigma and EMcov.")
}
}
## Check the number of groups by the size of the lists.
unequalGroups <- !all(length(rawData) == c(length(Sigma), length(Mu)))
if (unequalGroups) stop("Unequal number of groups in rawData, Sigma, Mu.
For multiple-group models, rawData must be a list of data frames,
NOT a single data frame with a \"group\" column.")
nG <- length(Sigma)
## In each group, check Sigma is symmetric and dimensions match rawData and Mu.
for (g in seq_along(rawData)) {
if (!isSymmetric(Sigma[[g]])) stop("Sigma in group ", g, " not symmetric.")
unequalDim <- !all(ncol(rawData[[g]]) == c(nrow(Sigma[[g]]), length(Mu[[g]])))
if (unequalDim) stop("Unequal dimensions in rawData, Sigma, Mu.")
}
## Check for names of group levels. If NULL, assign arbitrary ones.
if (!hasArg(group)) group <- "group"
if (!is.character(group)) stop("The \"group\" argument must be a character string.")
if (is.null(names(rawData))) {
group.label <- paste0("g", seq_along(rawData))
} else {
group.label <- names(rawData)
}
}
## save a copy as myTransDat, whose elements will be replaced iteratively by
## group and by missing data pattern within group.
myTransDat <- rawData
names(myTransDat) <- group.label
output <- list()
#########################
## Data Transformation ##
#########################
for (g in seq_along(group.label)) {
if (transformation == 1) {
## get missing data patterns
R <- ifelse(is.na(rawData[[g]]), 1, 0)
rowMissPatt <- apply(R, 1, function(x) paste(x, collapse = ""))
patt <- unique(rowMissPatt)
myRows <- lapply(patt, function(x) which(rowMissPatt == x))
## for each pattern, apply transformation
tStart <- Sys.time()
transDatList <- lapply(patt, trans1, rowMissPatt = rowMissPatt,
dat = rawData[[g]], Sigma = Sigma[[g]], Mu = Mu[[g]])
output$timeTrans <- Sys.time() - tStart
for (i in seq_along(patt)) myTransDat[[g]][myRows[[i]], ] <- transDatList[[i]]
} else {
tStart <- Sys.time()
myTransDat[[g]] <- SavaleiYuan(dat = rawData[[g]], Sigma = Sigma[[g]],
Mu = Mu[[g]], EMcov = EMcov[[g]])
output$timeTrans <- Sys.time() - tStart
}
}
## option to end function here
if (transDataOnly) {
for (g in seq_along(myTransDat)) myTransDat[[g]][ , group] <- group.label[g]
## option to write transformed data to a file
if (writeTransData) {
## Set a few options, if the user didn't.
if (!hasArg(writeArgs)) writeArgs <- list(file = "transformedData.dat",
row.names = FALSE, na = "-999")
if (!exists("file", where = writeArgs)) writeTransArgs$file <- "transformedData.dat"
if (!exists("row.names", where = writeArgs)) writeArgs$row.names <- FALSE
if (!exists("na", where = writeArgs)) writeArgs$na <- "-999"
## add grouping variable and bind together into one data frame
for (g in seq_along(myTransDat)) myTransDat[[g]][ , group] <- group.label[g]
writeArgs$x <- do.call("rbind", myTransDat)
## write to file, print details to screen
do.call("write.table", writeArgs)
cat("Transformed data was written to file \"", writeArgs$file, "\" in:\n\n",
getwd(), "\n\nunless path specified by user in 'file' argument.\n", sep = "")
return(invisible(writeArgs$x))
}
return(do.call("rbind", myTransDat))
}
#############################################
## Bootstrap distribution of fit statistic ##
#############################################
## draw bootstrap samples
if (!is.null(seed)) set.seed(seed)
bootSamples <- lapply(1:nBoot, function(x) getBootSample(myTransDat, group, group.label))
## option to write bootstrapped samples to file(s)
if (writeBootData) {
## Set a few options, if the user didn't.
if (!hasArg(writeArgs)) writeArgs <- list(file = "bootstrappedSamples.dat",
row.names = FALSE, na = "-999")
if (!exists("file", where = writeArgs)) writeTransArgs$file <- "bootstrappedSamples.dat"
if (!exists("row.names", where = writeArgs)) writeArgs$row.names <- FALSE
if (!exists("na", where = writeArgs)) writeArgs$na <- "-999"
## add indicator for bootstrapped sample, bind together into one data frame
for (b in seq_along(bootSamples)) bootSamples[[b]]$bootSample <- b
writeArgs$x <- do.call("rbind", bootSamples)
## write to file, print details to screen
do.call("write.table", writeArgs)
cat("Bootstrapped samples written to file \"", writeArgs$file, "\" in:\n\n",
getwd(), "\n\nunless path specified by user in 'file' argument.\n", sep = "")
return(invisible(bootSamples))
}
## option to end function here
if (bootSamplesOnly) return(bootSamples)
## check for lavaan arguments in (...)
lavaanArgs <- list(...)
lavaanArgs$group <- group
## fit model to bootstrap samples, save distribution of chi-squared test stat
if (hasArg(x)) {
## grab defaults from lavaan object "x"
lavaanArgs$slotParTable <- as.list(parTable(x))
lavaanArgs$slotModel <- x@Model
lavaanArgs$slotOptions <- lavInspect(x, "options")
} else {
lavaanArgs$model <- model
lavaanArgs$missing <- "fiml"
## set defaults that will be necessary for many models to run, that will
## probably not be specified explictly or included in lavaan syntax
lavaanArgs$meanstructure <- TRUE
if (!exists("auto.var", where = lavaanArgs)) lavaanArgs$auto.var <- TRUE
if (!exists("auto.cov.y", where = lavaanArgs)) lavaanArgs$auto.cov.y <- TRUE
if (!exists("auto.cov.lv.x", where = lavaanArgs)) lavaanArgs$auto.cov.lv.x <- TRUE
}
## run bootstrap fits
if (showProgress) {
mypb <- utils::txtProgressBar(min = 1, max = nBoot, initial = 1, char = "=",
width = 50, style = 3, file = "")
bootFits <- numeric()
tStart <- Sys.time()
for (j in 1:nBoot) {
bootFits[j] <- fitBootSample(bootSamples[[j]], args = lavaanArgs,
suppress = suppressWarn)
utils::setTxtProgressBar(mypb, j)
}
close(mypb)
output$timeFit <- Sys.time() - tStart
} else {
tStart <- Sys.time()
bootFits <- sapply(bootSamples, fitBootSample, args = lavaanArgs,
suppress = suppressWarn)
output$timeFit <- Sys.time() - tStart
}
## stack groups, save transformed data and distribution in output object
for (g in seq_along(myTransDat)) myTransDat[[g]][ , group] <- group.label[g]
output$Transformed.Data <- do.call("rbind", myTransDat)
output$Bootstrapped.Distribution <- bootFits
output$Original.ChiSquared <- ChiSquared
if (hasArg(x)) {
output$Degrees.Freedom <- lavInspect(x, "fit")["df"]
} else {
convSamp <- which(!is.na(bootFits))[1]
lavaanArgs$data <- bootSamples[[convSamp]]
lavaanlavaan <- function(...) { lavaan::lavaan(...) }
output$Degrees.Freedom <- lavInspect(do.call(lavaanlavaan, lavaanArgs), "fit")["df"]
}
## calculate bootstrapped p-value
output$Bootstrapped.p.Value <- mean(bootFits >= ChiSquared, na.rm = TRUE)
## print warning if any models didn't converge
if (any(is.na(bootFits))) {
nonConvMessage <- paste("Model did not converge for the following bootstrapped samples",
paste(which(is.na(bootFits)), collapse = "\t"), sep = ":\n")
warning(nonConvMessage)
}
finalResult <- new("BootMiss", time = list(transform = output$timeTrans, fit = output$timeFit), transData = output$Transformed.Data, bootDist = output$Bootstrapped.Distribution, origChi = output$Original.ChiSquared, df = output$Degrees.Freedom, bootP = output$Bootstrapped.p.Value)
finalResult
}
## ----------------
## Hidden Functions
## ----------------
## Function to execute Transformation 1 on a single missing-data pattern
trans1 <- function(MDpattern, rowMissPatt, dat, Sigma, Mu) {
myRows <- which(rowMissPatt == MDpattern)
X <- apply(dat[myRows, ], 2, scale, scale = FALSE)
observed <- !is.na(X[1, ])
Xreduced <- X[ , observed]
Mreduced <- as.numeric(Mu[observed])
SigmaChol <- chol(Sigma[observed, observed])
S <- t(Xreduced) %*% Xreduced / nrow(X)
Areduced <- t(SigmaChol) %*% t(solve(chol(S)))
Yreduced <- t(Areduced %*% t(Xreduced) + Mreduced)
Y <- replace(X, !is.na(X), Yreduced)
Y
}
## Function to execute Transformation 2 on a single group
trans2 <- function(dat, Sigma, Mu, EMcov) {
## Computing Function of A (eq. 12), whose root is desired
eq12 <- function(A) {
ga <- rep(0, pStar)
for (j in 1:J) {
Tj <- Mjs[[j]] %*% A %*% Hjs[[j]] %*% A %*% Mjs[[j]] - Mjs[[j]]
ga <- ga + Njs[j] * Dupinv %*% c(Tj) # same as vech(Tj)
}
ga
}
## Computing Derivative of Function of A (eq. 13)
eq13 <- function(A) {
deriv12 <- matrix(0, nrow = pStar, ncol = pStar)
for (j in 1:J) {
Tj1 <- Mjs[[j]] %*% A %*% Hjs[[j]]
deriv12 <- deriv12 + 2*Njs[j]*Dupinv %*% kronecker(Tj1, Mjs[[j]]) %*% Dup
}
deriv12
}
## get missing data patterns
R <- ifelse(is.na(dat), 1, 0)
rowMissPatt <- apply(R, 1, function(x) paste(x, collapse = ""))
MDpattern <- unique(rowMissPatt)
## sample size within each MD pattern
Njs <- sapply(MDpattern, function(patt) sum(rowMissPatt == patt))
J <- length(MDpattern) # number of MD patterns
p <- ncol(dat) # number of variables in model
pStar <- p*(p + 1) / 2 # number of nonredundant covariance elements
## create empty lists for each MD pattern
Xjs <- vector("list", J)
Hjs <- vector("list", J)
Mjs <- vector("list", J)
## Create Duplication Matrix and its inverse (Magnus & Neudecker, 1999)
Dup <- lavaan::lav_matrix_duplication(p)
Dupinv <- solve(t(Dup) %*% Dup) %*% t(Dup)
## step through each MD pattern, populate Hjs and Mjs
for (j in 1:J) {
Xjs[[j]] <- apply(dat[rowMissPatt == MDpattern[j], ], 2, scale, scale = FALSE)
if (!is.matrix(Xjs[[j]])) Xjs[[j]] <- t(Xjs[[j]])
observed <- !is.na(Xjs[[j]][1, ])
Sj <- t(Xjs[[j]]) %*% Xjs[[j]] / Njs[j]
Hjs[[j]] <- replace(Sj, is.na(Sj), 0)
Mjs[[j]] <- replace(Sj, !is.na(Sj), solve(Sigma[observed, observed]))
Mjs[[j]] <- replace(Mjs[[j]], is.na(Mjs[[j]]), 0)
}
## Compute starting Values for A
if (is.null(EMcov)) {
A <- diag(p)
} else {
EMeig <- eigen(EMcov)
EMrti <- EMeig$vectors %*% diag(1 / sqrt(EMeig$values)) %*% t(EMeig$vectors)
Sigeig <- eigen(Sigma)
Sigrt <- Sigeig$vectors %*% diag(sqrt(Sigeig$values)) %*% t(Sigeig$vectors)
B <- Sigrt %*% EMrti
A <- .5*(B + t(B))
}
## Newton Algorithm for finding root (eq. 14)
crit <- .1
a <- c(A)
fA <- eq12(A)
while (crit > 1e-11) {
dvecF <- eq13(A)
a <- a - Dup %*% solve(dvecF) %*% fA
A <- matrix(a, ncol = p)
fA <- eq12(A)
crit <- max(abs(fA))
}
## Transform dataset X to dataset Y
Yjs <- Xjs
for (j in 1:J) {
observed <- !is.na(Xjs[[j]][1, ])
XjReduced <- Xjs[[j]][ , observed, drop = FALSE]
Aj <- A[observed, observed, drop = FALSE]
Mj <- as.numeric(Mu[observed])
Yj <- t(Aj %*% t(XjReduced) + Mj)
Yjs[[j]] <- replace(Yjs[[j]], !is.na(Yjs[[j]]), Yj)
}
Y <- as.data.frame(do.call("rbind", Yjs))
colnames(Y) <- colnames(dat)
Y
}
## Function to execute Transformation 3 on a single group -- TRANSFORMATION DOES NOT RETURN CH-SQ = 0
trans3 <- function(dat, Sigma, Mu, EMcov) {
# Computing Saturated Means as a Function of A (eq. B1 in Appendix B)
mut <- function(A) {
M <- matrix(0, ncol = 1, nrow = p)
for (j in 1:J) {
M <- M + Njs[[j]] * Mjs[[j]] %*% A %*% Ybarjs[[j]]
}
Mjtoti %*% M
}
# Computing Function of A (eq. 18) whose root is desired
eq18 <- function(A) {
ga <- rep(0, pStar)
mutilda <- mut(A)
for (j in 1:J) {
Tj <- Mjs[[j]] %*% A %*% Hjs[[j]] %*% A %*% Mjs[[j]] - Mjs[[j]]
dif <- A %*% Ybarjs[[j]] - mutilda
middle <- dif %*% t(dif)
Tjnew <- Tj + Mjs[[j]] %*% middle %*% Mjs[[j]]
ga <- ga + Njs[j] * Dupinv %*% c(Tjnew)
}
ga
}
# Computing Derivative of Function eq. 18
deriv18 <- function(A) {
d18 <- matrix(0, nrow = pStar, ncol = pStar)
for (j in 1:J) {
Tj1 <- Mjs[[j]] %*% A %*% Hjs[[j]]
mutilda <- mut(A)
dif <- A %*% Ybarjs[[j]] - mutilda
Tj2 <- Mjs[[j]] %*% dif %*% t(Ybarjs[[j]])
Tj3 <- kronecker(Mjs[[j]] %*% dif, Mjs[[j]]) %*% Mjtoti %*% Tj3add
d18 <- d18 + 2*Njs[j]*Dupinv %*% ((kronecker((Tj1 + Tj2), Mjs[[j]])) - Tj3) %*% Dup
}
d18
}
## get missing data patterns
R <- ifelse(is.na(dat), 1, 0)
rowMissPatt <- apply(R, 1, function(x) paste(x, collapse = ""))
MDpattern <- unique(rowMissPatt)
## sample size within each MD pattern
Njs <- sapply(MDpattern, function(patt) sum(rowMissPatt == patt))
J <- length(MDpattern) # number of MD patterns
p <- ncol(dat) # number of variables in model
pStar <- p*(p + 1) / 2 # number of nonredundant covariance elements
## create empty lists for each MD pattern
Xjs <- vector("list", J)
Ybarjs <- vector("list", J)
Hjs <- vector("list", J)
Mjs <- vector("list", J)
Mjtot <- matrix(0, ncol = p, nrow = p)
Tj3add <- matrix(0, nrow = p, ncol = p * p)
## Create Duplication Matrix and its inverse (Magnus & Neudecker, 1999)
Dup <- lavaan::lav_matrix_duplication(p)
Dupinv <- solve(t(Dup) %*% Dup) %*% t(Dup)
## step through each MD pattern, populate Hjs and Mjs
for (j in 1:J) {
Xjs[[j]] <- apply(dat[rowMissPatt == MDpattern[j], ], 2, scale, scale = FALSE)
if (!is.matrix(Xjs[[j]])) Xjs[[j]] <- t(Xjs[[j]])
observed <- !is.na(Xjs[[j]][1, ])
pj <- p - sum(observed)
means <- colMeans(dat[rowMissPatt == MDpattern[j], ])
Ybarjs[[j]] <- replace(means, is.na(means), 0)
Sj <- t(Xjs[[j]]) %*% Xjs[[j]] / Njs[j]
Hjs[[j]] <- replace(Sj, is.na(Sj), 0)
Mjs[[j]] <- replace(Sj, !is.na(Sj), solve(Sigma[observed, observed]))
Mjs[[j]] <- replace(Mjs[[j]], is.na(Mjs[[j]]), 0)
Mjtot <- Mjtot + Njs[[j]] * Mjs[[j]]
Tj3add <- Tj3add + Njs[[j]] * kronecker(t(Ybarjs[[j]]), Mjs[[j]])
}
Mjtoti <- solve(Mjtot)
## Compute starting Values for A
if (is.null(EMcov)) {
A <- diag(p)
} else {
EMeig <- eigen(EMcov)
EMrti <- EMeig$vectors %*% diag(1 / sqrt(EMeig$values)) %*% t(EMeig$vectors)
Sigeig <- eigen(Sigma)
Sigrt <- Sigeig$vectors %*% diag(sqrt(Sigeig$values)) %*% t(Sigeig$vectors)
B <- Sigrt %*% EMrti
A <- .5*(B + t(B))
}
## Newton Algorithm for finding root (eq. 14)
crit <- .1
a <- c(A)
fA <- eq18(A)
while (crit > 1e-11) {
dvecF <- deriv18(A)
a <- a - Dup %*% solve(dvecF) %*% fA
A <- matrix(a, ncol = p)
fA <- eq18(A)
crit <- max(abs(fA))
}
## Transform dataset X to dataset Y (Z in the paper, eqs. 15-16)
Yjs <- Xjs
for (j in 1:J) {
observed <- !is.na(Xjs[[j]][1, ])
XjReduced <- Xjs[[j]][ , observed, drop = FALSE]
Aj <- A[observed, observed, drop = FALSE]
Mj <- as.numeric((Mu - mut(A))[observed])
Yj <- t(Aj %*% t(XjReduced) + Mj)
Yjs[[j]] <- replace(Yjs[[j]], !is.na(Yjs[[j]]), Yj)
}
Y <- as.data.frame(do.call("rbind", Yjs))
colnames(Y) <- colnames(dat)
Y
}
## Get a single bootstrapped sample from the transformed data. If there are
## multiple groups, bootstrapping occurs independently within each group, and
## a single data frame is returned. A new column is added to indicate group
## membership, which will be ignored in a single-group analysis.
getBootSample <- function(groupDat, group, group.label) {
bootSamp <- list()
for (g in seq_along(groupDat)) {
dat <- groupDat[[g]]
dat[ , group] <- group.label[g]
bootSamp[[g]] <- dat[sample(1:nrow(dat), nrow(dat), replace = TRUE), ]
}
do.call("rbind", bootSamp)
}
## fit the model to a single bootstrapped sample and return chi-squared
##' @importFrom lavaan lavInspect
fitBootSample <- function(dat, args, suppress) {
args$data <- dat
lavaanlavaan <- function(...) { lavaan::lavaan(...) }
if (suppress) {
fit <- suppressWarnings(do.call(lavaanlavaan, args))
} else {
fit <- do.call(lavaanlavaan, args)
}
if (!exists("fit")) return(c(chisq = NA))
if (lavInspect(fit, "converged")) {
chisq <- lavInspect(fit, "fit")[c("chisq", "chisq.scaled")]
} else {
chisq <- NA
}
if (is.na(chisq[2])) return(chisq[1]) else return(chisq[2])
}
semTools/R/emmeans_lavaan.R 0000644 0001762 0000144 00000042072 14752750163 015371 0 ustar ligges users ### Mattan S. Ben-Shachar & Terrence D. Jorgensen
### Last updated: 11 February 2025
### emmeans support for lavaan objects
##' `emmeans` Support Functions for `lavaan` Models
##'
##' @description Provide emmeans support for lavaan objects
##'
##' @param object An object of class [lavaan::lavaan()].
##' See **Details**.
##' @param lavaan.DV `character` string naming the variable(s) for which
##' expected marginal means / trends should be produced.
##' A vector of names indicates a multivariate outcome, treated by default
##' as repeated measures.
##' @param data An optional `data.frame` without missing values, to be passed
##' when `missing="FIML"` estimation was useed, thus avoiding a reference-grid
##' with missing values.
##' @param trms,xlev,grid See `emmeans::emm_basis`
##' @param ... Further arguments passed to `emmeans::recover_data.lm` or
##' `emmeans::emm_basis.lm`
##'
##' @details
##'
##' \subsection{Supported DVs}{
##' `lavaan.DV` must be an *endogenous variable*, by appearing on
##' the left-hand side of either a regression operator (`"~"`)
##' or an intercept operator (`"~1"`), or both.
##' \cr\cr
##' `lavaan.DV` can also be a vector of endogenous variable, in which
##' case they will be treated by `emmeans` as a multivariate outcome
##' (often, this indicates repeated measures) represented by an additional
##' factor named `rep.meas` by default. The `mult.name=` argument
##' can be used to overwrite this default name.
##' }
##'
##' \subsection{Unsupported Models}{
##' This functionality does not support the following models:
##' \itemize{
##' \item Multi-level models are not supported.
##' \item Models not fit to a `data.frame` (i.e., models fit to a
##' covariance matrix).
##' }
##' }
##'
##' \subsection{Dealing with Fixed Parameters}{
##' Fixed parameters (set with `lavaan`'s modifiers) are treated as-is:
##' their values are set by the users, and they have a *SE* of 0 (as such,
##' they do not co-vary with any other parameter).
##' }
##'
##' \subsection{Dealing with Multigroup Models}{
##' If a multigroup model is supplied, a factor is added to the reference grid,
##' the name matching the `group` argument supplied when fitting the model.
##' *Note that you must set* `nesting = NULL`.
##' }
##'
##' \subsection{Dealing with Missing Data}{
##' Limited testing suggests that these functions do work when the model was fit
##' to incomplete data.
##' }
##'
##' \subsection{Dealing with Factors}{
##' By default `emmeans` recognizes binary variables (0,1) as a "factor"
##' with two levels (and not a continuous variable). With some clever contrast
##' defenitions it should be possible to get the desired emmeans / contasts.
##' See example below.
##' }
##'
##' @author Mattan S. Ben-Shachar (Ben-Gurion University of the Negev;
##' \email{matanshm@@post.bgu.ac.il})
##'
##' @example inst/examples/lavaan2emmeans.R
##'
##' @name lavaan2emmeans
NULL
##' @rdname lavaan2emmeans
recover_data.lavaan <- function(object, lavaan.DV, data = NULL, ...){
if (!requireNamespace("emmeans", quietly = TRUE)){
stop("'emmeans' is not installed.")
}
.emlav_test_DV(object, lavaan.DV)
## testing multi-group requires access to ...
dots <- list(...)
#FIXME: the nesting= argument is not passed to ... here, so the warning is
# annoyingly printed even when nesting = NULL is specified.
if (lavInspect(object, 'ngroups') > 1L && !("nesting" %in% names(dots))) {
warning(
"For multi-group models, don't forget to set 'nesting = NULL'.\n",
"See `?lavaan2emmeans` for more info.",
call. = FALSE
)
}
# Fake it
lavaan_data <- .emlav_recover_data(object, data)
recovered <- emmeans::recover_data(.emlav_fake_fit(object, lavaan.DV, lavaan_data),
data = lavaan_data, ...)
# Make it
lavaan_data <- lavaan_data[, colnames(recovered), drop = FALSE]
# Fill attributes (but keep lavaan_data in case of missing data)
mostattributes(lavaan_data) <- attributes(recovered)
return(lavaan_data)
}
##' @rdname lavaan2emmeans
##' @importFrom lavaan lavInspect
emm_basis.lavaan <- function(object,trms, xlev, grid, lavaan.DV, ...){
if (!requireNamespace("emmeans", quietly = TRUE)) {
stop("'emmeans' is not installed.")
}
# Fake it
emmb <- emmeans::emm_basis(.emlav_fake_fit(object, lavaan.DV),
trms, xlev, grid, ...)
# bhat --------------------------------------------------------------------
pars <- .emlav_clean_pars_tab(object, lavaan.DV, "bhat")
par_names <- pars$rhs
if(nrow(pars) < length(emmb$bhat)) {
warning(
"Not all parameters have been estimated.\n",
"This is usually caused by a missing mean structure.\n",
"Fixing estimates for these parameters at 0.",
call. = FALSE
)
}
# re-shape to deal with any missing estimates
temp_bhat <- rep(0, length = length(emmb$bhat))
temp_bhat[seq_len(nrow(pars))] <- pars$est
names(temp_bhat) <- .emlab_find_term_names(par_names, colnames(emmb$V))
# re-order
b_ind <- match(colnames(emmb$V), names(temp_bhat))
emmb$bhat <- temp_bhat[b_ind]
# VCOV --------------------------------------------------------------------
lavVCOV <- lavInspect(object, "vcov")
pars <- .emlav_clean_pars_tab(object, lavaan.DV, "vcov")
par_names <- paste0(pars$lhs, pars$op, pars$rhs)
# only regression estimates
pattern <- paste0("^(", paste0(lavaan.DV, collapse = "|"), ")")
is_reg <- grepl(paste0(pattern, "~"), par_names)
is_cov <- grepl(paste0(pattern, "~~"), par_names)
only_reg <- is_reg & !is_cov
lavVCOV <- lavVCOV[only_reg, only_reg, drop = FALSE]
if(ncol(lavVCOV) < nrow(emmb$V)) {
warning(
"Not all parameters are included in the VCOV matrix.\n",
"Perhaps some are fixed with a modifier, or the mean structure is missing.\n",
"Fixing SEs for these parameters at 0.",
call. = FALSE
)
}
# get only RHS
par_names <- par_names[only_reg]
par_names <- sub(paste0("~1$"), "~(Intercept)", par_names)
par_names <- sub(paste0(pattern, "~"), "", par_names)
# re-shape to deal with any missing estimates
temp_vcov <- matrix(0, nrow = nrow(emmb$V), ncol = ncol(emmb$V))
temp_vcov[seq_len(ncol(lavVCOV)), seq_len(ncol(lavVCOV))] <- lavVCOV
colnames(temp_vcov) <- .emlab_find_term_names(par_names, colnames(emmb$V))
rownames(temp_vcov) <- .emlab_find_term_names(par_names, colnames(emmb$V))
# re-order
v_ind <- match(colnames(emmb$V), colnames(temp_vcov))
emmb$V <- temp_vcov[v_ind, v_ind]
# dffun & dfargs ----------------------------------------------------------
emmb$dffun <- function(...) Inf
emmb$dfargs <- list(df = Inf)
# nbasis and misc ---------------------------------------------------------
## DONT CHANGE! MESSES UP MULTI-DV REF_GRID
# emmb$nbasis <- matrix(NA, 1, 1)
# emmb$misc <- list()
return(emmb)
}
##' @keywords internal
.emlab_find_term_names <- function(terms, candidates) {
terms_split <- strsplit(terms, split = ":")
candidates_split <- strsplit(candidates, split = ":")
is_in <- matrix(NA,
nrow = length(terms_split),
ncol = length(candidates_split))
for (i in seq_along(terms_split)) {
for (j in seq_along(candidates_split)) {
is_in[i,j] <-
all(candidates_split[[j]] %in% terms_split[[i]]) &&
all(terms_split[[i]] %in% candidates_split[[j]])
}
}
if (length(terms) > 1L) is_in <- apply(is_in, 2, any)
c(terms, if (any(!is_in)) candidates[!is_in])
}
##' @keywords internal
##' @importFrom lavaan lavInspect
.emlav_test_DV <- function(object, lavaan.DV){
# has DV?
pars <- lavaan::parameterEstimates(object)
pars <- pars[pars$op %in% c("~1", "~"), ]
if (!all(lavaan.DV %in% pars$lhs)) {
lavaan.DV <- lavaan.DV[!lavaan.DV %in% pars$lhs]
lavaan.DV <- paste0(lavaan.DV, collapse = ",")
stop(
"{", lavaan.DV, "} is not predicted (endogenous) in this model!\n",
"See `?lavaan2emmeans` for more info.",
call. = FALSE
)
}
# Is DV ordered?
if (any(lavaan.DV %in% lavInspect(object, 'ordered'))) {
lavaan.DV <- lavaan.DV[lavaan.DV %in% lavInspect(object, 'ordered')]
lavaan.DV <- paste0(lavaan.DV, collapse = ",")
stop(
"{", lavaan.DV, "} is an ordered variable! ",
"Currently only continuous DVs are supported.\n",
"See `?lavaan2emmeans` for more info.",
call. = FALSE
)
}
# is multilevel?
if (lavInspect(object, 'nlevels') > 1L){
warning(
"emmeans support is unavailable for multilevel SEMs.",
call. = FALSE
)
}
invisible(NULL)
}
##' @keywords internal
##' @importFrom lavaan lavInspect
.emlav_recover_data <- function(object, data = NULL, silent = FALSE){
##This function was contributed by TDJ
dat <- lavaan::lavPredict(object, newdata = data,
type = "lv",
assemble = TRUE,
append.data = TRUE)
## convert to data.frame, if necessary (single group)
dat <- as.data.frame(dat)
if (anyNA(dat)) {
## mean-impute any NAs
if (!silent) {
warning("'data' contains missing value. Mean-imputing them.", call. = FALSE)
}
dat[] <- lapply(dat, function(v) {
if (is.numeric(v) && (ix <- length(which(is.na(v))))) {
v[ix] <- mean(v, na.rm = TRUE)
}
v
})
}
dat
}
#TODO: delete old function after verifying the new one (above) works
# function(object){
# data_obs <- lavInspect(object, "data")
# data_lat <- lavaan::lavPredict(object, type = "lv")
#
# # If multi group
# if (lavInspect(object, 'ngroups') > 1L) {
# # make single data frame + add group labels
# group_labels <- sapply(seq_along(names(data_obs)), function(i) {
# label_ <- names(data_obs)[i]
# nobs_ <- nrow(data_obs[[i]])
# rep(label_, times = nobs_)
# }, simplify = FALSE)
#
# data_obs <- data.frame(do.call(rbind, data_obs))
# data_obs[[lavInspect(object, "group")]] <- unlist(group_labels)
# data_lat <- do.call(rbind, data_lat)
# }
#
# data_full <- cbind(data_obs, data_lat)
# return(data.frame(data_full))
# }
##' @keywords internal
##' @importFrom lavaan lavInspect
.emlav_fake_fit <- function(object, lavaan.DV, lavaan_data = NULL){
if (is.null(lavaan_data)) {
lavaan_data <- .emlav_recover_data(object, silent = TRUE)
}
# Fake it
pars <- lavaan::parameterEstimates(object)
pars <- pars[pars$lhs %in% lavaan.DV & pars$op == "~", ]
# If multi-group
if (lavInspect(object, 'ngroups') > 1L) {
# condition on group (no intercept!)
RHS <- paste0(
"0 +",
lavInspect(object, "group"),
"+",
lavInspect(object, "group"),
"/(",
paste0(pars$rhs, collapse = " + "),
")"
)
} else {
RHS <- paste0(pars$rhs, collapse = " + ")
}
lavaan_formula <- stats::as.formula(paste0(
paste0("cbind(",paste0(lavaan.DV, collapse = ","),")"),
"~",
RHS
))
return(lm(lavaan_formula, lavaan_data))
}
##' @keywords internal
##' @importFrom lavaan lavInspect
.emlav_clean_pars_tab <- function(object, lavaan.DV, type = c("bhat", "vcov")){
type <- match.arg(type)
if (type == "bhat") {
pars <- lavaan::parameterEstimates(object)
pars <- pars[pars$lhs %in% lavaan.DV & pars$op %in% c("~", "~1"), ]
} else {
pars <- lavaan::parameterEstimates(object,
remove.nonfree = TRUE,
remove.def = TRUE)
}
pars$rhs[pars$op == "~1"] <- "(Intercept)"
pars$op[pars$op == "~1"] <- "~"
if (lavInspect(object, 'ngroups') > 1L) {
group_labs <- paste0(lavInspect(object, 'group'),
lavInspect(object, 'group.label'))
pars$group <- group_labs[pars$group]
temp_rhs <- paste0(pars$group, ":", pars$rhs)
temp_rhs[grepl("(Intercept)", temp_rhs)] <-
pars$group[grepl("(Intercept)", temp_rhs)]
pars$rhs <- temp_rhs
}
if (length(lavaan.DV) > 1L) {
pars$rhs <- paste0(pars$lhs, ":", pars$rhs)
}
return(pars[, colnames(pars) %in% c("lhs", "op", "rhs", "label", "est")])
}
##' @keywords internal test
.emlav_run_tests <- function() {
if (!requireNamespace("testthat")) {
stop("Need 'testthat' for testing")
}
if (!requireNamespace("emmeans")) {
stop("Need 'emmeans' for testing")
}
testthat::test_that("moderation", {
model <- '
# regressions
Sepal.Length ~ b1 * Sepal.Width + b2 * Petal.Length + b3 * Sepal.Width:Petal.Length
# simple slopes for condition effect
below := b2 + b3 * (-1)
above := b2 + b3 * (+1)
'
semFit <- lavaan::sem(model = model,
data = datasets::iris,
meanstructure = TRUE)
em_ <- summary(
emmeans::emtrends(
semFit,
~ Sepal.Width,
"Petal.Length",
lavaan.DV = "Sepal.Length",
at = list(Sepal.Width = c(-1, 1))
)
)
em_est <- em_$Petal.Length.trend
em_se <- em_$SE
lv_est <-
lavaan::parameterEstimates(semFit, output = "pretty")[15:16, "est"]
lv_se <-
lavaan::parameterEstimates(semFit, output = "pretty")[15:16, "se"]
testthat::expect_equal(em_est, lv_est, tolerance = 1e-4)
testthat::expect_equal(em_se, lv_se, tolerance = 1e-4)
})
testthat::test_that("latent", {
model <- '
LAT1 =~ Sepal.Length + Sepal.Width
LAT1 ~ b1 * Petal.Width + 1 * Petal.Length
Petal.Length ~ Petal.Length.mean * 1
V1 := 1 * Petal.Length.mean + 1 * b1
V2 := 1 * Petal.Length.mean + 2 * b1
'
semFit <- suppressWarnings(
lavaan::sem(model = model,
data = datasets::iris,
std.lv = TRUE)
)
em_ <- suppressWarnings(summary(emmeans::emmeans(
semFit,
~ Petal.Width,
lavaan.DV = "LAT1",
at = list(Petal.Width = 1:2)
)))
em_est <- em_$emmean
lv_est <-
lavaan::parameterEstimates(semFit, output = "pretty")[15:16, "est"]
testthat::expect_equal(em_est, lv_est, tolerance = 1e-4)
})
testthat::test_that("multi-dv", {
model <- '
ind60 =~ x1 + x2 + x3
# metric invariance
dem60 =~ y1 + a*y2 + b*y3 + c*y4
dem65 =~ y5 + a*y6 + b*y7 + c*y8
# scalar invariance
y1 + y5 ~ d*1
y2 + y6 ~ e*1
y3 + y7 ~ f*1
y4 + y8 ~ g*1
# regressions (slopes differ: interaction with time)
dem60 ~ b1*ind60
dem65 ~ b2*ind60 + NA*1 + Mean.Diff*1
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
# conditional mean differences (besides mean(ind60) == 0)
low := (-1*b2 + Mean.Diff) - (-1*b1) # 1 SD below M
high := (b2 + Mean.Diff) - b1 # 1 SD above M
'
semFit <- lavaan::sem(model, data = lavaan::PoliticalDemocracy)
em_ <- suppressWarnings(summary(emmeans::emmeans(
semFit,
pairwise ~ rep.meas | ind60,
lavaan.DV = c("dem60", "dem65"),
at = list(ind60 = c(-1, 1))
)[[2]]))
em_est <- em_$estimate
lv_est <-
lavaan::parameterEstimates(semFit, output = "pretty")[49:50, "est"]
em_se <- em_$SE
lv_se <-
lavaan::parameterEstimates(semFit, output = "pretty")[49:50, "se"]
testthat::expect_equal(em_est, -lv_est, tolerance = 1e-4)
testthat::expect_equal(em_se, lv_se, tolerance = 1e-4)
})
testthat::test_that("Multi Group", {
model <- '
x1 ~ c(int1, int2)*1 + c(b1, b2)*ageyr
diff_11 := (int2 + b2*11) - (int1 + b1*11)
diff_13 := (int2 + b2*13) - (int1 + b1*13)
diff_15 := (int2 + b2*15) - (int1 + b1*15)
'
semFit <-
lavaan::sem(model,
group = "school",
data = lavaan::HolzingerSwineford1939)
em_ <-
suppressWarnings(summary(
emmeans::emmeans(
semFit,
pairwise ~ school | ageyr,
lavaan.DV = "x1",
at = list(ageyr = c(11, 13, 15)),
nesting = NULL
)[[2]]
))
em_est <- em_$estimate
lv_est <-
lavaan::parameterEstimates(semFit, output = "pretty")$est[11:13]
em_se <- em_$SE
lv_se <-
lavaan::parameterEstimates(semFit, output = "pretty")$se[11:13]
testthat::expect_equal(em_est, lv_est, tolerance = 1e-4)
testthat::expect_equal(em_se, lv_se, tolerance = 1e-4)
})
testthat::test_that("all!", {
model <- '
LAT1 =~ x1 + x2 + x3
LAT2 =~ x4 + x5 + x6
LAT3 =~ LAT1 + LAT2 + x7 + x8 + x9
LAT3 ~ c(b1,b1)*ageyr + agemo
grade ~ ageyr
'
semFit <- lavaan::sem(model,
data = lavaan::HolzingerSwineford1939,
group = "school")
rg <- suppressWarnings(emmeans::ref_grid(semFit, lavaan.DV = c("LAT3", "grade")))
testthat::expect_s4_class(rg, "emmGrid")
})
testthat::test_that("missing data", {
raw_mtcars <- mtcars_na <- datasets::mtcars
mtcars_na$hp[1] <- NA
model <- " mpg ~ hp + drat + hp:drat "
fit <- lavaan::sem(model, mtcars_na, missing = "fiml.x")
testthat::expect_warning(
rg <- emmeans::ref_grid(fit, lavaan.DV = "mpg"),
regexp = "missing")
testthat::expect_false(anyNA(rg@grid))
testthat::expect_equal(rg@grid$hp, 147.871, tolerance = 0.01)
testthat::expect_warning(
rg2 <- emmeans::ref_grid(fit, lavaan.DV = "mpg",
data = raw_mtcars),
regexp = NA)
testthat::expect_equal(rg2@grid$hp, mean(raw_mtcars$hp), tolerance = 0.01)
})
message("All good!")
}
semTools/vignettes/ 0000755 0001762 0000144 00000000000 15144013752 014071 5 ustar ligges users semTools/vignettes/partialInvariance.bib 0000644 0001762 0000144 00000002704 14006342740 020204 0 ustar ligges users @preamble{ " \newcommand{\noop}[1]{} " }
@article{cheung2002,
title={Evaluating goodness-of-fit indexes for testing measurement invariance},
author={Cheung, Gordon W and Rensvold, Roger B},
journal={Structural equation modeling},
volume={9},
number={2},
pages={233--255},
year={2002},
publisher={Taylor \& Francis}
}
@article{meade2008,
title={Power and sensitivity of alternative fit indices in tests of measurement invariance.},
author={Meade, Adam W and Johnson, Emily C and Braddy, Phillip W},
journal={Journal of Applied Psychology},
volume={93},
number={3},
pages={568},
year={2008},
publisher={American Psychological Association}
}
@book{muthen1998,
title={Mplus technical appendices},
author={Muth{\'e}n, Bengt O},
publisher={Muth{\'e}n \& Muth{\'e}n},
address={Los Angeles, CA},
year={\noop{1998}1998--2004}
}
@article{cohen1992,
title={A power primer.},
author={Cohen, Jacob},
journal={Psychological bulletin},
volume={112},
number={1},
pages={155--159},
year={1992},
publisher={American Psychological Association}
}
@incollection{millsap2012,
author = {Millsap, Roger E and Olivera-Aguilar, Margarita},
title = {Investigating measurement invariance using confirmatory factor analysis},
editor = {Hoyle, Rick H},
booktitle = {Handbook of structural equation modeling},
pages = {380--392},
publisher = {Guilford},
address = {New York},
year = {2012}
} semTools/vignettes/partialInvariance.Rnw 0000644 0001762 0000144 00000024372 14006342740 020223 0 ustar ligges users \documentclass[12pt]{article}
%%\VignetteIndexEntry{Partial Invariance}
%%\VignetteDepends{semTools}
\usepackage[utf8]{inputenc}
\usepackage{amsfonts}
\usepackage{amstext}
\usepackage{amsmath}
\usepackage{natbib}
\title{A Note on Effect Size for Measurement Invariance}
\author{Sunthud Pornprasertmanit}
\begin{document}
\maketitle
This article aims to show the mathematical reasoning behind all effect sizes used in the \texttt{partialInvariance} and \texttt{partialInvarianceCat} functions in \texttt{semTools} package. In the functions, the following statistics are compared across groups: factor loadings, item intercepts (for continuous items), item thresholds (for categorical items), measurement error variances, and factor means.
The comparison can be compared between two groups (e.g., Cohen's \emph{d}) or multiple groups (e.g., $R^2$). This note provides the details of the effect sizes in comparing two groups only. The comparison between multiple groups can be done by picking the reference group and compare the other groups with the reference group in the similar fashion to dummy variables. For example, the comparison between four groups would create three effect size values (i.e., Group 1 vs. Reference, Group 2 vs. Reference, and Group 3 vs. Reference). Alternatively, for the measurement invariance, the change in comparative fit index (CFI) can be used as the measure of effect size. In the measurement invariance literature \citep{cheung2002, meade2008}, the change in CFI is used to test the equality constraints for multiple items simultaneously. The functions in \texttt{semTools} will show the change in CFI for each individual item. That is, if an item were to allow to have different statistics (e.g., loading), how large the CFI would drop from the original model. Please note that more research is needed in finding the appropriate cutoffs for the change in CFI for individual items. Are the cutoffs of .002 or .01 appropriate for this context?
In creating effect size, a target statistic needs to be standardized. Sample variances are used in the standardization formula. If researchers can assume that target variances across groups are equal in population, then pooled variances can be used in the standardization. The pooled variance $s^2_P$ can be computed as follows:
$$s^2_P = \frac{\sum^G_{g=1}(n_g - 1)s^2_g}{\sum^G_{g=1}(n_g - 1)},$$
\noindent
where $g$ represents the index of groups, $G$ is the number of groups, $s^2_g$ represents the variance of Group $g$, and $n_g$ is the Group $g$ size. If the variances are not assumed to be equal across groups, I recommend to pick a reference (baseline) group for the standardization.
In the following sections, I will show how effect sizes are defined in each type of partial invariance testing.
\section{Factor Loading}
Let $\lambda_{ijg}$ be the unstandardized factor loading of Item $i$ from Factor $j$ in Group $g$. A standardized factor loading $\lambda^*_{ijg}$ can be computed \citep{muthen1998}:
$$\lambda^*_{ijg} = \lambda_{ijg}\cdot\frac{\psi_{jg}}{\sigma_{ig}},$$
\noindent
where $\psi_{jg}$ is the standard deviation of Factor $j$ from Group $g$ and $\sigma_{ig}$ is the total standard deviation of Item $i$ from Group $g$. To quantify the difference in factor loadings between groups in standardized scale, the standard deviation in the standardization formula needs to be the same across groups. If Group A and Group B are compared, the standardized difference in factor loading is defined:
$$\Delta\lambda^*_{ij} = (\lambda_{ijA} - \lambda_{ijB})\cdot\frac{\psi_{jP}}{\sigma_{iP}},$$
\noindent
where $\psi_{jP}$ is the pooled standard deviation of Factor $j$ and $\sigma_{iP}$ is the pooled total standard deviation of Item $i$. If Group A is the reference group, $\psi_{jA}$ and $\sigma_{iA}$ can substitute $\psi_{jP}$ and $\sigma_{iP}$. Assume that standardized factor loadings are from congeneric measurement model, standardized factor loadings represent the correlation between items and factors. \cite{cohen1992} provide a guideline for interpreting the magnitude of the difference in correlations for independent groups. The correlations are transformed to Fisher's z transformation:
$$q = \arctan\left(\lambda_{ijA}\cdot\frac{\psi_{jP}}{\sigma_{iP}}\right) - \arctan\left(\lambda_{ijB}\cdot\frac{\psi_{jP}}{\sigma_{iP}}\right)$$
Then, the $q$ values of .1, .3, and .5 are interpreted as small, medium, and large effect sizes.
For continuous outcomes, the amount of mean differences implied by the factor loading difference given a factor score can be used as an effect size \citep{millsap2012}. Let $X_ijg$ be the observed score of Item $i$ loaded on Factor $j$ from Group $g$ and $W_{j}$ represents the score of Factor $j$. The expected value of the observed score differences between Group A and Group B is calculated as follows:
$$E\left(X_{iA} - X_iB | W_j \right) = \left( \nu_{iA} - \nu_{iB} \right) + \left( \lambda_{ijA} - \lambda_{ijB} \right) \times W_{j},
$$
\noindent
where $\nu_{ig}$ represents the intercept of Item $i$ in Group $g$. Let the values between $W_{jl}$ and $W_{jh}$ be the values of interest. We can find the expected difference in the observed scores under this range of the factor scores. \cite{millsap2012} proposed that, if the size of the expected difference is over the value of meaningful differences, the loading difference is not negligible. See their article for the discussion of the meaningful difference.
Note that, in the \texttt{partialInvariance} function, $W_{jl}$ is calculated by (a) finding the factor scores representing a low \emph{z}-score (e.g., -2) from all groups and (b) selecting the lowest factor score across all groups. $W_{jh}$ is calculated by (a) finding the factor scores representing a high \emph{z}-score (e.g., 2) from all groups and (b) selecting the highest factor score across all groups.
\section{Item Intercepts}
Let $\nu_{ig}$ be the intercept of Item $i$ in Group $g$. A standardized intercept $\nu^*_{ig}$ is defined as follows \citep{muthen1998}:
$$\nu^*_{ig} = \nu_{ig} / \sigma_{ig}.$$
Thus, the standardized difference between Groups A and B in item intercepts is defined:
$$\Delta\nu^*_{i} = (\nu_{iA} - \nu_{iB}) / \sigma_{iP}.$$
Note that $\sigma_{iA}$ can substitute $\sigma_{iP}$ if Group A is the reference group. By using this scale, .2, .5, and .8 can be interpreted as small, medium, and large effect sizes according to \cite{cohen1992}.
The proportion of the intercept difference over the observed score difference may be used as an effect size \citep{millsap2012}:
$$(\nu_{iA} - \nu_{iB}) / (M_{iA} - M_{iB}),
$$
\noindent
where $M_{ig}$ represents the observed mean of Item $i$ in Group $g$. \cite{millsap2012} noted that a relatively small proportion (e.g., less than 20\%) is ignorable. If the sign is negative or the value is over 1, the interpretation is doubtful.
\section{Item Thresholds}
Let $\tau_{cig}$ be the threshold categorizing between category $c$ and $c + 1$ for Item $i$ in Group $g$. Note that the maximum number of $c$ is the number of categories minus 1. Because thresholds are the location of the distribution underlying ordered categorical items (usually normal distribution), the location statistic can be standardized by dividing it by the standard deviation of the underlying distribution. The standardized threshold $\tau^*_{cig}$ is defined as follows:
$$\tau^*_{cig} = \tau_{cig} / \sigma^u_{ig},$$
\noindent
where $\sigma^u_{ig}$ is the standard deviation of the distribution underlying the categorical data for Item $i$ in Group $g$. In theta parameterization of categorical confirmatory factor analysis, $\sigma^u_{ig}$ may not be equal across groups. The standardized difference in thresholds between Group A and B needs the pooled standard deviation. The standardized difference in thresholds is defined:
$$\Delta\tau^*_{ci} = (\tau_{ciA} - \tau_{ciB}) / \sigma^u_{iP}.$$
Note that $\sigma^u_{iA}$ can substitute $\sigma^u_{iP}$ if Group A is the reference group. By using this scale, .2, .5, and .8 can be interpreted as small, medium, and large effect sizes according to \cite{cohen1992}.
\section{Measurement Error Variances}
Let $\theta_{ig}$ be the measurement error variance of Item $i$ in Group $g$. A standardized measurement error variance $\theta^*_{ig}$ is defined as follows \citep{muthen1998}:
$$\theta^*_{ig} = \theta_{ig} / \sigma_{ig},$$
Thus, the standardized difference between Groups A and B in measurement error variances could be defined:
$$\Delta\theta^*_{i} = (\theta_{iA} - \theta_{iB}) / \sigma_{iP}.$$
Note that $\sigma_{iA}$ can substitute $\sigma_{iP}$ if Group A is the reference group. However, there is no direct guideline to interpret the magnitude of the difference in measurement error variances according to Cohen (1992). A new standardized difference in measurement error variances is needed.
Assume that $\sigma_{iP}$ is always greater than $\theta_{iA}$ and $\theta_{iB}$, which is usually correct, then $\theta_{iA} / \sigma_{iP}$ and $\theta_{iB} / \sigma_{iP}$ ranges between 0 and 1 similar to a proportion statistic. \cite{cohen1992} provided a guideline in interpreting the magnitude of the difference in proportions using arcsine transformation. The new index ($h$) is defined as follows:
$$h = \sin^{-1}\sqrt{\frac{\theta_{iA}}{\sigma_{iP}}} - \sin^{-1}\sqrt{\frac{\theta_{iB}}{\sigma_{iP}}}.$$
Then, the $h$ values of .2, .5, and .8 are interpreted as small, medium, and large effect sizes.
If items are continuous, the proportion of the error variance difference over the observed variance difference may be used as an effect size \citep{millsap2012}:
$$(\theta_{iA} - \theta_{iB}) / (\sigma_{iA} - \sigma_{iB}).
$$
\noindent
If the sign is negative or the value is over 1, the interpretation is doubtful.
\section{Factor Means}
Let $\alpha_{jg}$ be the mean of Factor $j$ in Group $g$. A standardized factor mean $\alpha^*_{jg}$ is defined as follows \citep{muthen1998}:
$$\alpha^*_{jg} = \alpha_{jg} / \psi_{jg}$$
Thus, the standardized difference between Groups A and B in factor means is defined:
$$\Delta\alpha^*_{j} = (\alpha_{jA} - \alpha_{jB}) / \psi_{jP}.$$
Note that $\psi_{jA}$ can substitute $\psi_{jP}$ if Group A is the reference group. By using this scale, .2, .5, and .8 can be interpreted as small, medium, and large effect sizes according to \cite{cohen1992}.
\bibliographystyle{plainnat}
\bibliography{partialInvariance}
\end{document} semTools/data/ 0000755 0001762 0000144 00000000000 14006342740 012770 5 ustar ligges users semTools/data/exLong.rda 0000644 0001762 0000144 00000033516 14006342740 014724 0 ustar ligges users {?UqQfJd
IjLIdΔY*D$D1D$!IuIf<㾟y?{]u뵗4>#,;[v,<5p8jH=;7";w;?YKvuDG{˿s3w\wcޱjnZ)h~ͪ-2##q^ZK7Z ĊiGQ1_hõ`Ρo*. 3UR=J3zٔ?c[J: gJǡx@7guY!Dil}
y
QQ]~#5 NEK1-FHÌ:/Q+:Ïv:=t>*\WʇoIȓ>,;^}D f{uwҙ j4@lLu&sh2)'sϽȏtZ] ɬ'Х5HfJ,5>?km~wcG<0Psu:.<#JrY?)fiꧣоŢr飾>t q{ zbsTnEMGצj)0/vk^0teZ))/iϨFFu!:`.:=w"6xx=rʁ0IZʶ/ݳ4Z1\2äm36,nrq&XlEL-4&,#.܋g`tQ8bBexH0|jOQ+\G\91@;$B]Gjeq7gǪS$/ -r<9'=3Eڡ
-gPu1fxlӬ ,kBYi@٥; {;|ǥTa
Ra^-O!*ȋr>lfrrw8m.ZN5Kk.~&DVfvw0Zic:w
)Q=5\0~ZLgmia(
ǧ?~iKd(04T1rLNMR\o|^:>0/1\qh4&Sm%䡥,̭hrV'2:DŽ4HeB5.4ƸQ__ hfb BU;1M u/L9${l
Pg[O@3fo IXfh@#{ʪ Me)BwAQK!Q^u<`UՊ9ysspAidBa@;H1
TR6\Sa.g.T/۹~3OG.CwWV%ۏ_5
=.bhW d@Ge>^=.^EF']sݾ7AcMgr'r 3T-@KYGǰ5N^Y L=L%vͅ2\A{z/|B &]hwUQs6Xd: LgP-M:)㳾D`[o<
]τCZ+[_H
<?ˡytchaX$_@daA6]ߏD9q^Dm~O "F̈
&jRn0} G~*B$eɷaE+
ly_j84HĻMp]S*Pu {8p.Ѡ-d*D~!;ċSHyثPV'>KPE#9E6dTc6sxUDCcyk+hRiCUP3;|P|0Z>,UmS/5Lr;_Mg$
^4|LW4I፶ Pҙ/fXfaX '"қQQo9 ,J-;~E;CBS&Ƣ&ڦq ~s8:EPcӜԣ{F$tmeH@m8ᥒҮɂq=ӨVw)ϢS<7<_>z"8r$?7U\Z!шF#f}Cɵ:( cb|иfy0Q/'y9COԫq3
6y 6wo|u|s*4L!ftȠo
f;5ѲZ(D1$6ʡSf4$W/:
{wѠaUʝwLݸ,L]Z{RO&F^~QJǾ-ePGERAY@Zv9Lyil_GCbAJyhhްڢd7AgB?O އ|҈6[7Ԩ V0ς*I^4ϭ?8BB~r7C
Bw~1EW묐uwt+~=?
g=>Zv"hQYv!Qݺ)L&C]bl<<3?1VvhڟW.Ynzqjʷ]n0wlԩi).*;MN)a̻ĥ$eظ+Ȭ\2\o
|`*7A_rNJ`1չ0xWc
*V;+2t%m0d&\caP[zPA(䅍S2VB4uY Uǃ%1h2{ZX9baJ*Ku$b7?r\聉?.{)Un*Ætb@!10o0~J~9g1\ -G%q{i
Du
Pd̆.rVV=\a0dK74`Yzz9싄rF$L}X<!heOM)7ڗH0ʞٓc`F4Ry
-t|4x-˒+ݡ$?
6
L~+я5=GXaP}``lcGIOVkVeg0o3w~~~NsPX 4Gb[
h*<ަuݏ1Wb(sҔRCΫ(gm\X<ԙl[kbcB`JCcx
(O/\OA<;:huMagug2T-4k4zGr.K6H[0+)X&fF>1An쁜{26ƆvU P'5
u'{y=z\0{Su
l58T[]yDTX`.tao':v5Hq܋.dX)}v%"&'RyBՊۘ=\a">Bƹ{U0|ؐa蹼x|dԍva|>!Ubt9X
㎬u^FSZocĚzrb,s$g`I`HCe?/В!hsU
<[g`;ԹrRL/}0h^>:S/,U(SE)wk3&E[+~C'"`*`̛8?ElPmP@>+X<%S\A)4@i|U%+ zYsUZ%-,}K0{q}Fp4ucXg =DC9mÐK gS@/\l]6
*5`ⅵxl4Jӭ-9z.(| t=x
-wH%K9ܨ|i8I3+ˠvPx!)drUDbZǻ`H:,,ľ,m<8~UtZe?Y\C&.JeS2\b3颪;lBԠ
^l*1<'>J*[9!Cч8R#YQJ\?[