| Title: | Case Influence in Structural Equation Models |
|---|---|
| Description: | A set of tools for evaluating several measures of case influence for structural equation models. |
| Authors: | Massimiliano Pastore [aut, cre], Gianmarco Altoe' [ctb] |
| Maintainer: | Massimiliano Pastore <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.4 |
| Built: | 2026-05-23 06:55:20 UTC |
| Source: | https://github.com/cran/influence.SEM |
Internal function, called by Likedist.
bollen.loglik(N, S, Sigma)
N: Sample size.
S: Observed covariance matrix.
Sigma: Model fitted covariance matrix.
The log-likelihood is computed using formula 4B2 described by Bollen (1989, p. 135).
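As a rough illustration of what such a computation involves, the sketch below evaluates a multivariate-normal log-likelihood kernel from a sample covariance matrix and a model-implied one. It assumes the form -(N/2){log|Sigma| + tr(S Sigma^-1)} with the additive constant omitted. The name `loglik_kernel` is hypothetical and the code is not taken from the package source, whose scaling may differ.

```r
# Hypothetical helper (not the package's bollen.loglik): Gaussian
# log-likelihood kernel, omitting the additive constant.
loglik_kernel <- function(N, S, Sigma) {
  # determinant() returns the log-determinant directly, which is more
  # stable than log(det(Sigma)) for larger matrices
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  # trace of S %*% solve(Sigma)
  tr <- sum(diag(S %*% solve(Sigma)))
  -(N / 2) * (logdet + tr)
}

S <- matrix(c(2, 0.5, 0.5, 1), 2, 2)        # made-up sample covariance
ll_at_S  <- loglik_kernel(75, S, S)         # evaluated at Sigma = S
ll_other <- loglik_kernel(75, S, diag(2))   # evaluated at another Sigma
```

For a fixed S this kernel is largest when Sigma equals S, so `ll_at_S` exceeds `ll_other`.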
Returns the log-likelihood.
Massimiliano Pastore, Gianmarco Altoe'
Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    fit0 <- sem(model, data=PDII)
    N <- fit0@Data@nobs[[1]]
    S <- fit0@SampleStats@cov[[1]]
    Sigma <- fitted(fit0)$cov
    bollen.loglik(N,S,Sigma)
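Quantifies case influence on overall model fit by the change in the test statistic

$$\Delta\chi^2_i = \chi^2 - \chi^2_{(i)}$$

where $\chi^2$ and $\chi^2_{(i)}$ are the test statistics obtained from the original sample and from the sample with case $i$ deleted.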
This function depends on the lavaan package.
Deltachi(model, data, ..., scaled = FALSE)
model: A description of the user-specified model using the lavaan model syntax. See the lavaan documentation for details.
data: A data frame containing the observed variables used in the model. If any variables are declared as ordered factors, this function will treat them as ordinal variables.
...: Additional parameters for the model-fitting call (sem in the examples).
scaled: Logical; if TRUE, the scaled test statistic is used.
Returns a vector of $\Delta\chi^2_i$ values.
If, for observation $i$, the model does not converge or yields a solution with negative estimated variances, the associated value of $\Delta\chi^2_i$ is set to NA.
This function is a particular case of fitinfluence; see the example below.
Massimiliano Pastore
Pek, J., MacCallum, R.C. (2011). Sensitivity Analysis in Structural Equation Models: Cases and Their Influence. Multivariate Behavioral Research, 46, 202-228.
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48, 1-36.
Rosseel, Y. (2022). The lavaan tutorial. URL: https://lavaan.ugent.be/tutorial/.
    ## not run: this example takes several minutes
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    # fit0 <- sem(model, data=PDII)
    # Dchi <- Deltachi(model,data=PDII)
    # plot(Dchi,pch=19,xlab="observations",ylab="Delta chisquare")

    ## not run: this example takes several minutes
    ## an example in which the deletion of a case yields a solution
    ## with negative estimated variances
    model <- "
      F1 =~ x1+x2+x3
      F2 =~ y1+y2+y3+y4
      F3 =~ y5+y6+y7+y8
    "
    # fit0 <- sem(model, data=PDII)
    # Dchi <- Deltachi(model,data=PDII)
    # plot(Dchi,pch=19,xlab="observations",ylab="Delta chisquare",main="Deltachi function")

    ## the case that produces negative estimated variances
    # sem(model,data=PDII[-which(is.na(Dchi)),])

    ## same results
    # Dchi <- fitinfluence("chisq",model,data=PDII)$Dind$chisq
    # plot(Dchi,pch=19,xlab="observations",ylab="Delta chisquare",main="fitinfluence function")
Explores case influence. Cases with extreme values of the considered measure of influence are reported. Extreme values are determined using the boxplot criterion (Tukey, 1977) or user-defined cut-offs. Cases whose deletion leads to a model that does not converge or yields a solution with negative estimated variances are also reported. In addition, explore.influence provides a graphical representation of case influence.
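The boxplot criterion can be sketched in a few lines of base R. The helper below is hypothetical (it is not the package code): it assumes cut-offs at 1.5 interquartile ranges beyond the quartiles, computed with the default `quantile()` definition, and it treats NA values as the cases whose deletion gave no admissible solution.

```r
# Hypothetical helper illustrating the boxplot criterion; the quantile
# definition used by explore.influence may differ.
flag_extremes <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  cut.low <- q[1] - coef * iqr
  cut.upp <- q[2] + coef * iqr
  list(
    cut.low = cut.low,
    cut.upp = cut.upp,
    not.allowed = which(is.na(x)),   # refit failed or was inadmissible
    below = which(x < cut.low),      # cases under the lower cut-off
    above = which(x > cut.upp)       # cases over the upper cut-off
  )
}

infl <- c(1:10, 100, NA)   # toy influence values; NA marks a failed refit
res <- flag_extremes(infl)
```

With these toy values the quartiles are 3.5 and 8.5, so the cut-offs are -4 and 16; case 11 is flagged as extreme and case 12 as not allowed.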
explore.influence(x, cut.offs = 'default', plot = 'TRUE', cook = 'FALSE', ...)
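x: A vector containing the influence of each case, as returned by the case-influence functions of this package (for example genCookDist, or a column of the Dind or Dparm components returned by fitinfluence and parinfluence).
cut.offs: A vector of two numeric elements containing the lower and the upper cut-offs to be considered. If 'default', the cut-offs are determined by the boxplot criterion.
plot: If 'TRUE', a graphical representation of case influence is produced.
cook: If 'TRUE', x is treated as a vector of generalized Cook's distances.
...: Additional parameters for the plot.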
A list with the following components:
n: number of cases.
cook: logical, indicating if x is a vector of generalized Cook's distances.
cut.low: the lower cut-off.
cut.upp: the upper cut-off.
not.allowed: a vector containing cases with negative variance or not converging models.
less.cut.low: a vector containing cases with influence value less than the lower cut-off.
greater.cut.low: a vector containing cases with influence value greater than the upper cut-off.
Gianmarco Altoe'
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    fit0 <- sem(model, data=PDII,std.lv=TRUE)
    ## not run
    # gCD <- genCookDist(model,data=PDII,std.lv=TRUE)
    # explore.influence(gCD,cook=TRUE)

    ## not run: this example takes several minutes
    model <- "
      F1 =~ x1+x2+x3
      F2 =~ y1+y2+y3+y4
      F3 =~ y5+y6+y7+y8
    "
    # fit0 <- sem(model, data=PDII)
    # FI <- fitinfluence('rmsea',model,PDII)
    ## fitinfluence returns a list; pass the column of case influences
    # explore.influence(FI$Dind$rmsea)
This function evaluates each case's effect on a user-defined fit index.
This function depends on the lavaan package.
fitinfluence(index, model, data, ...)
index: A model fit index.
model: A description of the user-specified model using the lavaan model syntax. See the lavaan documentation for details.
data: A data frame containing the observed variables used in the model. If any variables are declared as ordered factors, this function will treat them as ordinal variables.
...: Additional parameters for the model-fitting call (sem in the examples).
For each case $i$, the function evaluates the influence on one or more fit indices: the difference between the chosen fit index calculated for the SEM target model and the same index computed for the SEM model fitted excluding case $i$.
Returns a list:
Dind: a data.frame of case influence.
Oind: observed fit indices.
If, for observation $i$, the model does not converge or yields a solution with negative estimated variances, the associated value of influence is set to NA.
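The case-deletion logic described here (refit without case i, take the difference from the full-sample index, record NA when the refit fails) can be sketched generically in base R. Everything below is illustrative: `case_influence` and `indep_stat` are hypothetical names, and the toy statistic, based on the log-determinant of the correlation matrix, merely stands in for a SEM fit index so that the example runs without lavaan.

```r
# Hypothetical helper: case-deletion influence on an arbitrary fit measure.
# fit_fun must take a data frame and return a single number.
case_influence <- function(data, fit_fun) {
  full <- fit_fun(data)
  vapply(seq_len(nrow(data)), function(i) {
    # a failed refit (error) is recorded as NA
    reduced <- tryCatch(fit_fun(data[-i, , drop = FALSE]),
                        error = function(e) NA_real_)
    full - reduced
  }, numeric(1))
}

# Toy stand-in for a fit index (not a SEM fit index)
indep_stat <- function(d) -(nrow(d) - 1) * log(det(cor(d)))

d <- attitude[, 1:4]                  # built-in data set, 30 cases
infl <- case_influence(d, indep_stat)

# A fit function that fails on every reduced sample yields all NA
failing <- function(d) if (nrow(d) < 30) stop("no convergence") else 1
infl_na <- case_influence(d, failing)
```

The sign convention follows the text: full-sample value minus the value obtained without case i.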
Massimiliano Pastore
Pek, J., MacCallum, R.C. (2011). Sensitivity Analysis in Structural Equation Models: Cases and Their Influence. Multivariate Behavioral Research, 46, 202-228.
    ## not run: this example takes several minutes
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    # fit0 <- sem(model, data=PDII)
    # FI <- fitinfluence("cfi",model,data=PDII)
    # plot(FI$Dind,pch=19)

    ## not run: this example takes several minutes
    ## an example in which the deletion of a case yields a solution
    ## with negative estimated variances
    model <- "
      F1 =~ x1+x2+x3
      F2 =~ y1+y2+y3+y4
      F3 =~ y5+y6+y7+y8
    "
    # fit0 <- sem(model, data=PDII)
    # FI <- fitinfluence(c("tli","rmsea"),model,PDII)
    # explore.influence(FI$Dind$tli)
    # explore.influence(FI$Dind$rmsea)
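Case influence on a vector of parameters may be quantified by the generalized Cook's Distance ($gCD$; Cook 1977, 1986):

$$gCD_i = (\hat{\theta} - \hat{\theta}_{(i)})' \, [\widehat{\mathrm{VAR}}(\hat{\theta}_{(i)})]^{-1} \, (\hat{\theta} - \hat{\theta}_{(i)})$$

where $\hat{\theta}$ and $\hat{\theta}_{(i)}$ are vectors of parameter estimates obtained from the original and deleted samples, and $\widehat{\mathrm{VAR}}(\hat{\theta}_{(i)})$ is the estimated asymptotic covariance matrix of the parameter estimates obtained from the reduced sample.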
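Numerically, the distance is a quadratic form: the difference between the two estimate vectors, weighted by the inverse covariance matrix from the reduced sample. A minimal base R sketch follows; `gcd_sketch` is a hypothetical name and all input values are made up for illustration.

```r
# Hypothetical helper: quadratic form behind a generalized Cook's distance.
gcd_sketch <- function(theta, theta_i, V_i) {
  d <- theta - theta_i
  # solve(V_i, d) avoids forming the explicit inverse of V_i
  as.numeric(crossprod(d, solve(V_i, d)))
}

theta   <- c(1.0, 2.0)          # estimates from the full sample (made up)
theta_i <- c(0.5, 2.0)          # estimates with case i deleted (made up)
V_i     <- diag(c(0.25, 1.0))   # covariance of the deleted-sample estimates
gcd_i   <- gcd_sketch(theta, theta_i, V_i)   # 0.5^2 / 0.25 = 1
```

Because the covariance matrix is positive definite, the result is never negative; it says how far the estimates moved, not in which direction.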
This function depends on the lavaan package.
genCookDist(model, data, ...)
model: A description of the user-specified model using the lavaan model syntax. See the lavaan documentation for details.
data: A data frame containing the observed variables used in the model. If any variables are declared as ordered factors, this function will treat them as ordinal variables.
...: Additional parameters for the model-fitting call (sem in the examples).
Returns a vector of $gCD_i$ values.
If, for observation $i$, the model does not converge or yields a solution with negative estimated variances, the associated value of $gCD_i$ is set to NA.
Massimiliano Pastore
Cook, R.D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15-18.
Cook, R.D. (1986). Assessment of local influence. Journal of the Royal Statistical Society B, 48, 133-169.
Pek, J., MacCallum, R.C. (2011). Sensitivity Analysis in Structural Equation Models: Cases and Their Influence. Multivariate Behavioral Research, 46, 202-228.
    ## not run: this example takes several minutes
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    # fit0 <- sem(model, data=PDII)
    # gCD <- genCookDist(model,data=PDII)
    # plot(gCD,pch=19,xlab="observations",ylab="Cook distance")

    ## not run: this example takes several minutes
    ## an example in which the deletion of a case produces a solution
    ## with negative estimated variances
    model <- "
      F1 =~ x1+x2+x3
      F2 =~ y1+y2+y3+y4
      F3 =~ y5+y6+y7+y8
    "
    # fit0 <- sem(model, data=PDII)
    # gCD <- genCookDist(model,data=PDII)
    # plot(gCD,pch=19,xlab="observations",ylab="Cook distance")
A general model-based measure of case influence on model fit is the likelihood distance (Cook, 1977, 1986; Cook & Weisberg, 1982), defined as

$$LD_i = 2\,[L(\hat{\theta}) - L(\hat{\theta}_{(i)})]$$

where $\hat{\theta}$ and $\hat{\theta}_{(i)}$ are the vectors of estimated model parameters on the original and deleted samples, respectively, where $i = 1, \ldots, N$. The subscript $(i)$ indicates that the estimate was computed on the sample excluding case $i$. $L(\hat{\theta})$ and $L(\hat{\theta}_{(i)})$ are the log-likelihoods based on the original and the deleted samples, respectively.
This function depends on the lavaan package.
Likedist(model, data, ...)
model: A description of the user-specified model using the lavaan model syntax. See the lavaan documentation for details.
data: A data frame containing the observed variables used in the model. If any variables are declared as ordered factors, this function will treat them as ordinal variables.
...: Additional parameters for the model-fitting call (sem in the examples).
The log-likelihoods $L(\hat{\theta})$ and $L(\hat{\theta}_{(i)})$ are computed by the function bollen.loglik using formula 4B2 described by Bollen (1989, p. 135).
The likelihood distance gives the amount by which the log-likelihood of the full data changes if one were to evaluate it at the reduced-data estimates. The important point is that $L(\hat{\theta}_{(i)})$ is not the log-likelihood obtained by fitting the model to the reduced data set. It is obtained by evaluating the likelihood function based on the full data set (containing all observations) at the reduced-data estimates (Schabenberger, 2005).
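The point about evaluating the full-data likelihood at the reduced-data estimates can be made concrete with a toy case. The sketch below is not the package implementation: it assumes a Gaussian log-likelihood kernel (additive constant omitted, scaling possibly different from bollen.loglik) and uses a saturated "model" whose fitted covariance matrix is simply the sample covariance, so no SEM fitting is needed. The name `loglik_kernel` is hypothetical.

```r
# Gaussian log-likelihood kernel (additive constant omitted); a stand-in
# for the package's internal log-likelihood, whose scaling may differ.
loglik_kernel <- function(N, S, Sigma) {
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -(N / 2) * (logdet + sum(diag(S %*% solve(Sigma))))
}

X <- as.matrix(attitude[, 1:4])   # built-in data, 30 cases
N <- nrow(X)
S <- cov(X)                       # full-sample covariance

# Toy "model": the saturated model, whose fitted matrix is the sample
# covariance. Both terms use the FULL-data S; only Sigma changes.
LD <- vapply(seq_len(N), function(i) {
  Sigma_i <- cov(X[-i, , drop = FALSE])   # estimates without case i
  2 * (loglik_kernel(N, S, S) - loglik_kernel(N, S, Sigma_i))
}, numeric(1))
```

Because the full-data kernel is maximized at the full-data estimates, every value in `LD` is non-negative in this toy setting; larger values mark cases whose removal moves the estimates further from the full-sample solution.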
Returns a vector of $LD_i$ values.
If, for observation $i$, the model does not converge or yields a solution with negative estimated variances, the associated value of $LD_i$ is set to NA.
Massimiliano Pastore, Gianmarco Altoe'
Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.
Cook, R.D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15-18.
Cook, R.D. (1986). Assessment of local influence. Journal of the Royal Statistical Society B, 48, 133-169.
Cook, R.D., Weisberg, S. (1982). Residuals and influence in regression. New York, NY: Chapman & Hall.
Pek, J., MacCallum, R.C. (2011). Sensitivity Analysis in Structural Equation Models: Cases and Their Influence. Multivariate Behavioral Research, 46, 202-228.
Schabenberger, O. (2005). Mixed model influence diagnostics. In SUGI, 29, 189-29. SAS institute Inc, Cary, NC.
    ## not run: this example takes several minutes
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    # fit0 <- sem(model, data=PDII)
    # LD <- Likedist(model,data=PDII)
    # plot(LD,pch=19,xlab="observations",ylab="Likelihood distances")

    ## not run: this example takes several minutes
    ## an example in which the deletion of a case yields a solution
    ## with negative estimated variances
    model <- "
      F1 =~ x1+x2+x3
      F2 =~ y1+y2+y3+y4
      F3 =~ y5+y6+y7+y8
    "
    # fit0 <- sem(model, data=PDII)
    # LD <- Likedist(model,data=PDII)
    # plot(LD,pch=19,xlab="observations",ylab="Likelihood distances")
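Computes the direction of change in parameter estimates with

$$\Delta\hat{\theta}_{ji} = \frac{\hat{\theta}_j - \hat{\theta}_{j(i)}}{[\widehat{\mathrm{VAR}}(\hat{\theta}_{j(i)})]^{1/2}}$$

where $\hat{\theta}_j$ and $\hat{\theta}_{j(i)}$ are the parameter estimates obtained from the original and deleted samples, and $\widehat{\mathrm{VAR}}(\hat{\theta}_{j(i)})$ is the estimated variance of the parameter estimate in the deleted sample.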
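A small numeric sketch may help with reading the sign of the result. It assumes, following Pek and MacCallum (2011), that the change in each parameter is divided by its standard error from the deleted sample; `dparm_sketch` is a hypothetical name and all numbers are made up.

```r
# Hypothetical helper: signed, standardized change in single parameters.
dparm_sketch <- function(theta, theta_i, var_i) {
  (theta - theta_i) / sqrt(var_i)
}

theta   <- c(a = 0.80, b = 1.20)   # full-sample estimates (made up)
theta_i <- c(a = 0.90, b = 1.10)   # estimates with case i deleted (made up)
var_i   <- c(a = 0.04, b = 0.01)   # their variances in the deleted sample
dparm   <- dparm_sketch(theta, theta_i, var_i)
```

Here parameter `a` gets a negative value (deleting the case raises the estimate) and parameter `b` a positive one (deleting the case lowers it).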
This function depends on the lavaan package.
parinfluence(parm, model, data, cook = FALSE, ...)
parm: Single parameter or vector of parameters.
model: A description of the user-specified model using the lavaan model syntax. See the lavaan documentation for details.
data: A data frame containing the observed variables used in the model. If any variables are declared as ordered factors, this function will treat them as ordinal variables.
cook: Logical; if TRUE, the generalized Cook's Distance is also returned.
...: Additional parameters for the model-fitting call (sem in the examples, for instance ordered).
Returns a list:
gCD: Generalized Cook's Distance, if cook = TRUE.
Dparm: Direction of change in parameter estimates.
If, for observation $i$, the model does not converge or yields a solution with negative estimated variances or NA parameter values, the associated values of $\Delta\hat{\theta}_{ji}$ are set to NA.
Massimiliano Pastore
Pek, J., MacCallum, R.C. (2011). Sensitivity Analysis in Structural Equation Models: Cases and Their Influence. Multivariate Behavioral Research, 46, 202-228.
    ## not run: this example takes several minutes
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    # fit0 <- sem(model, data=PDII)
    # PAR <- c("F1=~y2","F1=~y3","F1=~y4")
    # LY <- parinfluence(PAR,model,PDII)
    # str(LY)
    # explore.influence(LY$Dparm[,1])

    ## not run: this example takes several minutes
    ## an example in which the deletion of a case yields a solution
    ## with negative estimated variances
    model <- "
      F1 =~ x1+x2+x3
      F2 =~ y1+y2+y3+y4
      F3 =~ y5+y6+y7+y8
    "
    # fit0 <- sem(model, data=PDII)
    # PAR <- c("F2=~y2","F2=~y3","F2=~y4")
    # LY <- parinfluence(PAR,model,PDII)

    ## not run: this example takes several minutes
    ## dealing with ordinal data
    data(Q)
    model <- "
      F1 =~ it1+it2+it3+it4+it5+it6+it7+it8+it9+it10
    "
    # fit0 <- sem(model, data=Q, ordered=colnames(Q))
    # LY <- parinfluence("F1=~it4",model,Q,ordered=colnames(Q))
    # explore.influence(LY$Dparm[,1])
Simulated data set generated from the covariance matrix reported in Bollen (1989).
data(PDII)
This data frame contains 75 obs. of 11 variables:
x1: num, gross national product per capita.
x2: num, consumption per capita.
x3: num, percentage of the labor force in industrial occupations.
y1: num, freedom of the press in 1960.
y2: num, freedom of group opposition in 1960.
y3: num, fairness of elections in 1960.
y4: num, elective nature and effectiveness of the legislative body in 1960.
y5: num, freedom of the press in 1965.
y6: num, freedom of group opposition in 1965.
y7: num, fairness of elections in 1965.
y8: num, elective nature and effectiveness of the legislative body in 1965.
Bollen, K.A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.
data(PDII)
Simulated data set.
data(Q)
This data frame contains 919 obs. of 10 ordinal discrete variables.
data(Q)
Calculates the expected values and the residuals of a SEM model.
sem.fitres(object)
obs.fitres(object)
lat.fitres(object)
object: An object of class lavaan.
The main function, sem.fitres(), calls one of the other two routines depending on the type of model. If the model does not contain latent variables, sem.fitres() calls obs.fitres(); otherwise it calls lat.fitres().
The functions obs.fitres() and lat.fitres() are internal; do not call them directly.
Returns a data frame containing:
1) The observed model variables.
2) The expected values on dependent variables (indicated with hat.).
3) The residuals on dependent variables (indicated with e.).
In order to compute more interpretable fitted values and residuals, the model is forced to have meanstructure = TRUE and std.lv = TRUE.
Massimiliano Pastore
    data("PDII")
    model <- "
      F1 =~ y1+y2+y3+y4
    "
    fit0 <- sem(model, data=PDII)
    out <- sem.fitres(fit0)
    head(out)

    par(mfrow=c(2,2))
    plot(e.y1~hat.y1,data=out)
    plot(e.y2~hat.y2,data=out)
    plot(e.y3~hat.y3,data=out)
    plot(e.y4~hat.y4,data=out)

    qqnorm(out$e.y1); qqline(out$e.y1)
    qqnorm(out$e.y2); qqline(out$e.y2)
    qqnorm(out$e.y3); qqline(out$e.y3)
    qqnorm(out$e.y4); qqline(out$e.y4)