Multivariate statistical modelling and
analysis using Exploratory and
Confirmatory factor analysis and Structural
Equation Modelling: By Sourabh Kishore


Please contact us at consulting@etcoindia.co.in
or consulting@etcoindia.net.in to
discuss your topic or to get ideas about new
topics pertaining to your subject area.

A research problem may be univariate,
bivariate, or multivariate. A univariate
problem is concerned with only one research
variable and a bivariate problem is concerned
with linearity of relationship between two
research variables. Normally, univariate
problems comprise the study of multiple
independent research variables without
examining their quantitative mutual
relationships. For example, a single research
study may examine attitude,
organizational commitment, and employee
performance separately in a fast food chain
without quantifying the relationships among
them.

Some researchers may design triangulation
studies by collecting numerical data about the
three variables but establishing their
interrelationships qualitatively.

On the other hand, bivariate research
problems incorporate study of relationships
between two variables by establishing a null
and an alternate hypothesis. Most bivariate
research problems are concerned with mutual
relationships between two variables
investigated through multiple independent
hypotheses. However, the hypotheses may
not be interrelated in the form of a structure or
theoretical framework. The hypotheses may
be tested using bivariate techniques such as
correlation analysis, regression analysis,
analysis of variance, Student's t-test, or the
Chi-square test, with significance judged
from the resulting p-value.
The outcomes may be definitive causal
relationships (influence of an independent
variable on a dependent variable) or simply a
reflection of how a parameter varies with
respect to another within a controlled research
setting. Normally, establishing a relationship
between two variables does not guarantee
that a causal relationship is found.
Cause-effect relationships can be established
by taking support from established theories or
by investigating more variables in action
influencing the two variables. This is where
multivariate problems come into the picture.
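
As a minimal sketch of a bivariate test, the snippet below computes a Pearson correlation coefficient and the matching t statistic for the null hypothesis of no linear relationship. The function names and the employee scores are hypothetical and serve only as an illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no linear relationship (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical scores for eight employees (illustrative data only).
commitment = [3.1, 4.0, 2.5, 4.4, 3.6, 2.9, 4.8, 3.3]
performance = [55, 68, 49, 72, 63, 50, 80, 58]

r = pearson_r(commitment, performance)
t = t_statistic(r, len(commitment))
print(f"r = {r:.3f}, t = {t:.2f}")
```

A large t statistic only indicates association between the two variables; as noted above, it does not by itself establish a causal relationship.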

Multivariate problems are different and
complex, requiring sophisticated techniques
for investigating relationships among multiple
variables. Most multivariate problems
require the investigation of complex
structures rather than mere relationships.
Hence, applying statistics in multivariate
problems is not only about statistical
calculations but also involves complex
statistical modeling. A model may be
in the form of a theoretical framework or an
initial measurement model. Before the
multivariate techniques are discussed, it is
important to differentiate between a
theoretical framework and an initial
measurement model.

A theoretical framework is formed by
conducting an intensive literature review and
creating a structure whose relationships are
grounded in theory. On the other hand, an
initial measurement model can be established
using the principal component analysis
technique employing orthogonal factor
rotation.

Technically, the models created by
both approaches are considered initial
models and are taken through the same
reliability, validity, and model fit tests.
However, research studies involving
theory-based formation of the initial model
(commonly referred to as the theoretical
framework) are confirmatory or extended
studies, whereas research studies
involving the principal component analysis
technique are exploratory studies. In practice, a
theory-based modeling approach should be
chosen if the model can be grounded on an
extensive and deep theoretical foundation,
whereas the principal component analysis
technique should be chosen if the model is not
sufficiently supported by theories.

Multivariate problems come in two flavours:
relationships among multiple observable
(measurable) variables, or relationships
between single or multiple groups of
observable variables and a group of latent
(unobservable, or immeasurable) variables.
The latter is used in highly complex research
studies.

The sequence of techniques used in
multivariate statistical modeling is:
exploratory factor analysis, confirmatory
factor analysis, and structural equation
modeling. The exploratory factor analysis
technique may be skipped if theory-based
initial modeling has been preferred. In
exploratory factor analysis, the number of
latent (unobserved) variables represented by
a set of observed variables is explored by
obtaining a rotated factor solution.
VARIMAX, QUARTIMAX, and EQUAMAX
are orthogonal rotation methods, whereas
PROMAX and DIRECT OBLIMIN are oblique
methods that allow the factors to correlate.
The most used orthogonal rotation method is
VARIMAX. The number of latent variables is
determined by the number of extracted
factors having an eigenvalue above unity.
The researcher may predetermine the number
of latent variables or simply proceed to
investigate the factors having eigenvalues
greater than unity. The number of latent
variables retained should not exceed the
number of factors having eigenvalues greater
than unity. This analysis is done on a scree
plot, which charts the eigenvalues in
descending order.
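
The eigenvalue rule described above can be sketched in a few lines. The correlation matrix below is hypothetical, and NumPy is assumed here only for illustration; SPSS reports the same eigenvalues in its output.

```python
import numpy as np

# Hypothetical correlation matrix for four observed variables:
# items 1-2 correlate strongly, as do items 3-4 (illustrative values only).
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

# eigvalsh is meant for symmetric matrices; sort largest first,
# which is the order in which a scree plot displays them.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Retain the components whose eigenvalue exceeds unity.
n_factors = int(np.sum(eigenvalues > 1.0))
print(eigenvalues, n_factors)
```

The eigenvalues of a correlation matrix always sum to the number of observed variables, so an eigenvalue above unity marks a component that explains more variance than a single observed variable does.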

The rotated factor table obtained after rotation
is of prime importance. It gives the level of
loading by each observed variable on each
latent variable.
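
To illustrate how a rotated factor table arises, the following is a sketch of a commonly used iterative VARIMAX procedure based on singular value decomposition. It is not the SPSS implementation; the `varimax` function and the unrotated loadings are assumptions made for this example.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x factors) loading matrix with the varimax criterion."""
    L = np.asarray(loadings, dtype=float)
    n_vars, n_factors = L.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = L.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n_vars
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        current = s.sum()
        if current < previous * (1 + tol):
            break  # criterion has stopped improving
        previous = current
    return L @ rotation, rotation

# Hypothetical unrotated loadings of four observed variables on two factors.
unrotated = np.array([
    [0.72, 0.34],
    [0.63, 0.30],
    [-0.38, 0.82],
    [-0.25, 0.54],
])

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged; only the distribution of loading across the factors moves, which makes the table easier to interpret.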

Normally, variables with significant loadings
are selected and the rest rejected. The
significance of a loading is determined by the
loading value (normally 0.5 or greater) or by
the importance of the observed variable in the
reliability test. The researcher may name each
latent variable by analyzing the group of
observed variables loading on it, or with the
help of the literature. Each group forms a scale
representing the corresponding latent
variable. The researcher may test the
reliability of each scale using the Cronbach
Alpha, Split Half, Guttman, Parallel, or Strict
Parallel techniques. In the Cronbach Alpha
test, an alpha value of 0.7 or greater is
considered a good reliability indicator for a
scale if the research involves responses from
human subjects (for example, phenomenology
and grounded theory studies). However,
researchers prefer a higher alpha value in
scientific and technology-based research
studies in which the primary data is collected
from experiments or simulations. It is normally
observed that an observed variable having a
high loading on the latent variable is a good
contributor to the Cronbach Alpha value.
However, sometimes an observed variable
with a low loading (below 0.5) may turn out
to be a better contributor to the Cronbach
Alpha value. The contribution of each
observed variable to the Cronbach Alpha
value of the scale can be determined from a
table called "scale if item deleted". In
some research studies, the researcher may
decide to conclude the research if very high
reliability values of the scales are achieved.
However, it is not guaranteed that these scales
comprising groups of highest loading
observed variables are the causal factors
influencing the latent variables. It is
recommended that a few validity tests are
also conducted. This is where the
confirmatory factor analysis technique is
useful.
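
The reliability check described above can be sketched as follows. The snippet computes the Cronbach Alpha of a scale and the alpha obtained when each item is dropped in turn, which is the information the "scale if item deleted" table conveys. The function names and the Likert-style responses are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of respondent scores per observed variable."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale score per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha of the remaining scale after removing each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical responses from six respondents to a four-item scale.
# The fourth item is deliberately inconsistent with the other three.
scale = [
    [4, 5, 3, 2, 4, 1],
    [4, 4, 3, 2, 5, 1],
    [5, 5, 2, 2, 4, 2],
    [2, 4, 5, 1, 3, 4],
]

alpha = cronbach_alpha(scale)
alpha_deleted = alpha_if_item_deleted(scale)
print(round(alpha, 3), [round(a, 3) for a in alpha_deleted])
```

An item whose removal raises the alpha of the scale is a candidate for rejection, whatever its loading in the rotated factor table.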

The confirmatory factor analysis technique
helps in running validity tests on the model
determined either through the theory-based
approach or through the exploratory factor
analysis technique. It involves computation of
Average Variance Extracted (AVE), Cronbach
Alpha, Degrees of Freedom, Root Mean
Square Error of Approximation (RMSEA),
Root Mean Square Residual (RMR), and
Standardized Root Mean Square Residual
(SRMR) values. Various research scholars
recommend thresholds for determining the
validity of the model, based on the research
area and the sample size. One should decide
on the thresholds carefully before validating
the model. If the
objective is to simply validate the initial
model, the researcher may conclude the
research at this stage. However, there can be
situations when the initial model returns
unreliable scales and invalid relationships.
This is unlikely if the initial model has been
constructed with utmost care. Still, the
researcher should be prepared for surprises
and need not panic, because the Structural
Equation Modeling technique can rescue the
research from a probable failure.
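
Two of the values named above can be computed by hand once the software has reported the loadings and the model chi-square. The sketch below assumes standardized loadings for AVE and the (N - 1) convention for RMSEA; some packages use N instead. All input numbers are hypothetical.

```python
import math

def average_variance_extracted(std_loadings):
    """AVE of one scale, assuming standardized factor loadings."""
    return sum(loading ** 2 for loading in std_loadings) / len(std_loadings)

def rmsea(chi_square, df, sample_size):
    """RMSEA from the model chi-square, using the (N - 1) convention."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (sample_size - 1)))

# Hypothetical standardized loadings of three observed variables on one
# latent variable, and a hypothetical model chi-square.
ave = average_variance_extracted([0.82, 0.76, 0.71])
rmsea_value = rmsea(chi_square=85.0, df=40, sample_size=200)
print(round(ave, 3), round(rmsea_value, 3))
```

Whether these values pass depends on the thresholds chosen beforehand, as discussed above.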

Structural Equation Modeling helps in finding
an alternate model having acceptable
reliability and validity scores if the initial
model has failed due to some unavoidable
and irreparable issues. The technique allows
the researcher to test multiple models by
varying the relationships among variables and
finally choose the best-fitting model. The test
statistics that help in choosing the best-fitting
model are the goodness-of-fit index (GFI),
adjusted goodness-of-fit index (AGFI), normed
fit index (NFI), non-normed fit index (NNFI),
comparative fit index (CFI), parsimony fit
index, and incremental fit index (IFI). It should
be noted that not all of these are suitable for
every research study. The researcher should
choose the most appropriate ones depending
upon the area of research and the sample size.
It is recommended to study the literature
before choosing the most appropriate
fit indices in structural equation
modeling.
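
Several of these indices compare the chi-square of the tested model with that of a baseline (null) model in which the variables are unrelated. The sketch below uses the standard textbook formulas for NFI, NNFI, CFI, and IFI to compare two candidate models. The chi-square values, degrees of freedom, and model names are hypothetical; in practice LISREL reports these indices directly.

```python
def fit_indices(chi2_model, df_model, chi2_null, df_null):
    """Incremental fit indices from model and null-model chi-squares."""
    nfi = (chi2_null - chi2_model) / chi2_null
    ratio_null = chi2_null / df_null
    nnfi = (ratio_null - chi2_model / df_model) / (ratio_null - 1)
    excess_model = max(chi2_model - df_model, 0.0)
    excess_null = max(chi2_null - df_null, excess_model)
    cfi = 1.0 if excess_null == 0 else 1 - excess_model / excess_null
    ifi = (chi2_null - chi2_model) / (chi2_null - df_model)
    return {"NFI": nfi, "NNFI": nnfi, "CFI": cfi, "IFI": ifi}

# Hypothetical results: one null model and two alternative structural models.
null_chi2, null_df = 900.0, 55
candidates = {
    "model_A": (85.0, 40),
    "model_B": (140.0, 43),
}

results = {
    name: fit_indices(chi2, df, null_chi2, null_df)
    for name, (chi2, df) in candidates.items()
}

# Pick the candidate with the highest CFI (one possible selection rule).
best = max(results, key=lambda name: results[name]["CFI"])
print(best, {k: round(v, 3) for k, v in results[best].items()})
```

Selecting on a single index is a simplification made for this example; the choice of indices should follow the research area and the sample size, as stated above.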

The recommended tool for applying
exploratory factor analysis technique is SPSS,
and the tool recommended for confirmatory
factor analysis and structural equation
modeling is LISREL. If you need any help in
designing a research study, collecting data,
applying techniques for data analysis, and
deriving meaningful conclusions and
recommendations in multivariate research
involving exploratory factor analysis,
confirmatory factor analysis, and structural
equation modeling, you may contact us
at consulting@etcoindia.co.in or
consulting@etcoindia.net.in. We recommend
using Survey Monkey for collecting data and
the latest academic versions of SPSS and
LISREL for applying these techniques. The
academic version of LISREL cannot be used if
the number of variables is greater than 15.
However, in most cases the number of
variables can be reduced to 15 or fewer if the
Principal Component Analysis technique has
been used and reliable scales have been
constructed by testing their Cronbach Alpha
values. This is another advantage of starting
the research with exploratory factor analysis
rather than a theory-based structural
framework. In some research studies, it may
not be possible to keep the number of
variables at or below 15. In such cases, it is
recommended that a professional copy of
LISREL is purchased. Ideally, the number of
variables should be kept as low as possible,
especially if the sample size is small (say, less
than 100). The higher the number of variables,
the greater the difficulty in determining the
best-fitting model employing
Structural Equation Modeling. It is observed
that most of the modern causal research
problems require application of multivariate
techniques and hence, it is recommended to
master SPSS and LISREL in this context. We
can support multivariate research studies in
all the research areas mentioned on the page
detailing our SUBJECT AREAS OF
SPECIALIZATION.

The factors and latent variables may be
chosen as per the problem description.
Typically, latent variables are the ones that
cannot be measured directly. Examples are:
human attitude, human feelings, commitment
to the organisation, willingness to work in a
particular field, and behavioural aspects in
groups or teams. However, variables for
which data is unavailable because of a lack of
systems and processes can also be chosen as
latent variables. The factors influencing the
chosen latent variables under study may be
taken from past research studies, journal
articles, professional studies, industrial
reports, press releases, and expert advice.
The structure of the theoretical framework
may be designed by applying the exploratory
factor analysis technique, or from literature
reviews that provide adequate information on
structural models involving the factors
(observed variables) and the latent variables
under study.

Some of the examples of multivariate
problems are the following:
(a) Influence of organisational citizenship
behaviour, organisational commitment,
behavioural aspects with peers and superiors,
and willingness to participate on effectiveness
of information security governance in an
organisation
(b) Influence of organisational citizenship
behaviour, organisational commitment,
behavioural aspects with peers and superiors,
and willingness to participate on project
performance
(c) Influence of multiple personality types on
effectiveness of crisis management
decision-making and change management

In the above examples, the influencing
variables are unobservable and hence need to
be treated as latent variables. In order to
measure them, the factors affecting them need
to be taken from the literature. The models
will comprise relationships of the form:

Factor groups ---> Latent variables ---> Output variables

The factor groups representing each latent
variable are the scales with high reliability
(Cronbach Alpha value of 0.6 or more). The
scales can be obtained from exploratory factor
analysis or from literature-supported groups.
The rest of the analysis can be completed
through confirmatory factor analysis and
structural equation modelling.

Copyright 2020 - 2026 ETCO INDIA. All Rights Reserved