EMCOV REFERENCE MANUAL
September 14, 1993
(updated August 11, 1995)
OVERVIEW
EMCOVxx.EXE (v2.2, EMCOV22.EXE; v2.3, EMCOV23.EXE)
was compiled using Microsoft FORTRAN Powerstation (version 1.0)
for use in a 32-bit MS-DOS-extended environment. Additionally,
dynamic array allocation was utilized in the programming of
EMCOVxx.EXE, making it possible to accommodate any size problem
within the limit of available RAM.
****************************************************
EMCOV (both v2.2 and v2.3) is a beta version program. We
have used it extensively in our work and tested it extensively against
other available programs (e.g., BMDP AM). While we have confidence
in this program, we cannot be responsible for its use. If you do
suspect any problems or have ideas for improvement, please contact
us. We welcome any comments regarding the use of this program.
Direct these comments and inquiries to:
John Graham
Department of Biobehavioral Health
E-210 Health & Human Development Bldg.
Penn State University
University Park, PA 16802-6508
Phone: (814) 863-0200
email: jwg4@psuvm.psu.edu
fax: (814) 863-7525
****************************************************
PROGRAM USE
The program (v2.x) is invoked from the MS-DOS command line
by typing "EMCOV22" or "EMCOV23". The accompanying file,
DOSXMSF.EXE, must be either in the same directory as EMCOV or
resident in an accessible directory (i.e., named in the path statement
in the AUTOEXEC.BAT file).
Upon execution, the program will display the title lines and then
prompt for input of the necessary parameters. Press the carriage
return to accept the defaults specified in the parentheses.
The following is a list of the program command lines with some
explanation where necessary:
COVARIANCE ESTIMATION WITH MISSING VALUES
An application of the EM algorithm
EMCOV.EXE v2.3
Graham & Hofer, 1993
(1) Specify input data file:
The file name for the input raw data file can include the path.
Note: The raw data file must have at least one SPACE between
each data element.
(2) Sample size (n):
(3) Number of variables (k):
(4) Input maximum # of missing patterns (20):
The exact number of missingness patterns does not need to be
known. However, an upper limit for this matrix needs to be set.
If the program fails because one of the matrix limits was
exceeded, choose a larger number here.
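One way to choose this limit is to count the distinct missingness
patterns before running EMCOV. The following Python sketch is not part
of the EMCOV distribution. It assumes the raw data have already been
read into rows of numbers and that 9 is the missing value indicator;
the function name count_missing_patterns and the sample values are
hypothetical.

```python
def count_missing_patterns(rows, missing=9):
    """Return the number of distinct missingness patterns in rows.

    A pattern is a tuple of booleans, True where the value equals the
    missing value indicator (assumed here to be an exact match).
    """
    patterns = {tuple(value == missing for value in row) for row in rows}
    return len(patterns)


# Made-up dataset with k = 3 variables.
rows = [
    [1.2, 3.4, 5.6],  # complete case
    [9, 2.0, 4.1],    # first variable missing
    [2.2, 9, 9],      # second and third variables missing
    [9, 1.5, 3.3],    # same pattern as the second row
]
n_patterns = count_missing_patterns(rows)
```

The count here includes the complete-case pattern. Whether EMCOV counts
that pattern toward its limit is not stated in this manual, so entering
a value somewhat larger than the count is the cautious choice.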
(5) Input missing data indicator (9):
Type the numeric placeholder indicating a variable's
missingness.
In v2.2, this missing value indicator may take on NON-ZERO
integer values from -9 to 99. In v2.3, the missing value
indicator may take on NON-ZERO integer values from -999 to
9999.
Note of Caution: Take care to select a missing value indicator that is
clearly out of range of real (non-missing) values in your dataset.
In particular, a value should NOT be used as the missing value
indicator if a truncated non-missing value would equal the
missing value indicator.
For example, if you have real (non-missing) values in your
dataset in the range 9.000001 to 9.999999, you should NOT
use 9 as your missing value indicator, even if you have no real
values in your dataset exactly equal to 9.
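A dataset can be screened for this problem before choosing the
indicator. The Python sketch below is an illustration only, not part of
EMCOV. It assumes that "truncated" means dropping the fractional part
(truncation toward zero); the function name indicator_conflicts and the
sample values are hypothetical.

```python
import math


def indicator_conflicts(values, missing=9):
    """Return real values whose truncated value equals the indicator.

    Values exactly equal to the indicator are skipped, since they are
    taken to be the missing-value codes themselves.
    """
    return [v for v in values
            if v != missing and math.trunc(v) == missing]


# Made-up real (non-missing) values.
real_values = [9.000001, 9.5, 8.7, 10.2, 3.0]
conflicts = indicator_conflicts(real_values, missing=9)
```

If the returned list is not empty, a different missing value indicator
should be selected.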
(6) Select start value (1=LW, 2=PW, 3=CM):
LW: Listwise
PW: Pairwise
CM: Covariance Matrix
(6a) Specify start value (CM) file (START.COV):
If CM was specified at prompt (6), the file containing the
starting covariances must be indicated here.
(7) Maximum number of iterations (200):
(8) Convergence criterion (Default=.00001):
Convergence is reached when the difference in the accumulated
covariances between the present and previous iterations falls below a
specified criterion level. Currently, only the elements in the
lower triangle, including variances, are used for this calculation.
The default convergence criterion is .00001.
In v2.2, convergence is reached when the SUM of changes in
all k(k+1)/2 elements of the covariance matrix is less than the
critical value (v2.2 default = .000001).
In v2.3, convergence is reached when the LARGEST of changes
in all elements is less than the critical value. The default critical
value in v2.3 is .00001.
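The difference between the two rules can be illustrated with a short
Python sketch. This is not the EMCOV source code. It assumes that
"change" means the absolute difference between iterations, and the
function names and sample matrices are hypothetical.

```python
def lower_triangle_changes(prev, curr):
    """Absolute changes in the k(k+1)/2 lower-triangle elements."""
    k = len(curr)
    return [abs(curr[i][j] - prev[i][j])
            for i in range(k) for j in range(i + 1)]


def converged_v22(prev, curr, crit=0.000001):
    """v2.2-style rule: the SUM of changes is below the criterion."""
    return sum(lower_triangle_changes(prev, curr)) < crit


def converged_v23(prev, curr, crit=0.00001):
    """v2.3-style rule: the LARGEST change is below the criterion."""
    return max(lower_triangle_changes(prev, curr)) < crit


# Made-up covariance matrices from two successive iterations (k = 2).
prev = [[1.0, 0.5], [0.5, 2.0]]
curr = [[1.000004, 0.500004], [0.500004, 2.000004]]

sum_rule = converged_v22(prev, curr, crit=0.00001)
max_rule = converged_v23(prev, curr, crit=0.00001)
```

With the same criterion, the sum rule is the stricter of the two: each
of the three elements changes by about .000004, so the largest change
is below .00001 while the sum of changes is not.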
(9) Additional output? (1=Yes,=No):
Several additional files can be output besides the ML
covariances. The files all have the prefix TEMP and the
following suffixes indicating the contents.
.COV maximum likelihood covariances (default)
.PTN pattern matrix showing type and number of
missing data patterns
.DAT full data matrix with missing values imputed by
single regression imputation using the b-weights
obtained from EM
.RES data matrix of residuals--can be used in
subsequent multiple imputation procedures
Note of Caution: One should NOT make use of the file TEMP.DAT by
itself. Because the missing values have been imputed without
error (i.e., the imputed value lies exactly on the regression line),
variables with imputed values have too little (error) variance.
Adding a randomly selected (non-zero) residual term from
TEMP.RES restores this variability. Please see the Reference
manual for COVIMP.EXE and ADDRES.EXE for further details.
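The idea of restoring variability can be sketched in Python as follows.
This sketch does not reproduce ADDRES.EXE or COVIMP.EXE, and it does
not read the TEMP files, whose layout is not described here. It assumes
the imputed data, a record of which values were imputed, and the
residuals are already available as lists of rows, and that residuals
are drawn from the same variable. All names and values are
hypothetical.

```python
import random


def add_random_residuals(imputed_rows, was_missing, residual_rows, seed=0):
    """Add a randomly chosen non-zero residual to each imputed value."""
    rng = random.Random(seed)
    k = len(imputed_rows[0])
    # One pool of non-zero residuals per variable.
    pools = [[row[j] for row in residual_rows if row[j] != 0.0]
             for j in range(k)]
    result = []
    for row, mask in zip(imputed_rows, was_missing):
        new_row = list(row)
        for j in range(k):
            # Observed values are left untouched.
            if mask[j] and pools[j]:
                new_row[j] = row[j] + rng.choice(pools[j])
        result.append(new_row)
    return result


# Made-up example: the second value of the second case was imputed.
imputed = [[1.0, 2.0], [3.0, 2.5]]
was_missing = [[False, False], [False, True]]
residuals = [[0.1, -0.2], [0.0, 0.0]]
restored = add_random_residuals(imputed, was_missing, residuals)
```

Only values that were imputed are changed; observed values pass through
unaltered.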
(10) Specify output format (6(F12.6,1X)):
The format can be altered to suit the user's need for precision.
Enclose the Fortran format statement in single parentheses,
e.g.:
(8F10.6)
In v2.3 the default output format is (6F12.7).
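For readers unfamiliar with Fortran formats, (6F12.7) writes up to six
values per record, each in a field 12 characters wide with 7 decimal
places, and starts a new record when more values remain. The Python
sketch below imitates that layout for illustration. The function name
format_record is hypothetical, and the sketch does not imitate
Fortran's behavior of printing asterisks when a value is too wide for
its field.

```python
def format_record(values, per_line=6, width=12, decimals=7):
    """Lay out values like a Fortran (6F12.7) format, one string per record."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("".join(f"{v:{width}.{decimals}f}" for v in chunk))
    return lines


# Seven values need two records: six on the first, one on the second.
records = format_record([1.0] * 7)
```

Passing per_line=8, width=10, decimals=6 would correspond to the
(8F10.6) example above.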
APPENDIX A
Raw Data and Covariance Matrices: Input and Output
This appendix contains sample code for several of the major
statistical packages (SAS, SPSSx, BMDP, Systat, and Minitab), for
output of raw data and input of covariance/correlation matrices output
by EMCOV.EXE.
COVTOCOR. A utility program is included that will allow for
easier data transition between the EMCOV.EXE program and other
statistical packages. The program, COVTOCOR.EXE will read in a
covariance or correlation matrix, with or without a vector of means, in
lower triangle or full matrix form, and provide output in the following
forms:
1) Covariance or correlation matrix
2) Full or lower triangle matrix
3) With or without additional vector of means and sample sizes
4) Raw data, SAS, or SPSSx covariance (or correlation) data set
format
SAS or SPSS Format. With the additional input of a file
containing the variable names, each enclosed in single quotation
marks, the COVTOCOR program can read in the covariance matrix
from EMCOV.EXE and provide output in SAS or SPSS data set form.
This option will allow the covariance or correlation matrix to be read
directly into such SAS or SPSS procedures as FACTOR or
REGRESSION.
Raw Data Output
The raw data to be read into EMCOV.EXE must be in free
format (spaces only between data fields). The data can, of course, be
wrapped onto multiple lines. The missing data points must be coded
numerically, with the default being "9".
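If the data are prepared outside the packages listed below, a file in
this layout can also be produced with a short script. The following
Python sketch is one possibility, not part of the EMCOV distribution.
It assumes missing values are held as None and are written as 9; the
function name to_free_format and the sample values are hypothetical.

```python
def to_free_format(rows, missing=9, per_line=5):
    """Return space-delimited text, wrapping each case after per_line fields."""
    lines = []
    for row in rows:
        fields = [str(missing) if v is None else f"{v:.6f}" for v in row]
        for start in range(0, len(fields), per_line):
            lines.append(" ".join(fields[start:start + per_line]))
    return "\n".join(lines) + "\n"


# Made-up case with the second value missing.
text = to_free_format([[1.5, None, 3.25]])
```

The returned text can then be written to a file such as rawout.dat with
the usual file-writing calls.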
SAS
OPTIONS MISSING='9';
DATA second; SET first;
FILE 'rawout.dat';
PUT(x1-x10)(5*9.6,/,5*9.6);
RUN;
SPSS
WRITE OUTFILE=rawout.dat
/ varlist (format)
EXECUTE
See chapter 10 of the SPSSx (3rd edition) manual for further
definitions and options.
BMDP
Within a BMDP procedure, the /SAVE command can be used to
save data to an external file. The following example shows how a raw
data file is input using free format (with missing values specified by
".") and then output to a different external file using a different format.
/INPUT FILE = 'rawin.dat'.
VARIABLES = 5.
FORMAT=FREE.
MCHAR = '.'.
/VARIABLE NAMES = var1, var2, var3, var4, var5.
/SAVE FILE = 'rawout.dat'.
KEEP = var1 TO var5.
FORMAT = '(5F6.2)'.
SYSTAT
The following will write raw data to a file called
'rawout.dat' from a SYSTAT system file called 'mydata.sys':
USE mydata
PUT rawout
RUN
MINITAB
write 'rawout.dat' c1-c10;
format (10f9.6).
stop
Augmented Covariance Matrix Input
The covariances can be read out of EMCOV.EXE using the
default format (8F9.6) or one that is specified by the user. The
output matrix is a K+1 by K matrix, the last record containing the
maximum likelihood means.
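As an illustration of this layout, the Python sketch below separates
the K covariance rows from the final record of means. It is not part of
EMCOV. It assumes the output values are separated by whitespace, which
may not hold for every output format, and the function name
split_cov_and_means and the sample values are hypothetical.

```python
def split_cov_and_means(text, k):
    """Split K+1 by K output into (covariance rows, means)."""
    numbers = [float(token) for token in text.split()]
    if len(numbers) != (k + 1) * k:
        raise ValueError("expected (k+1)*k values, found %d" % len(numbers))
    rows = [numbers[i * k:(i + 1) * k] for i in range(k + 1)]
    return rows[:k], rows[k]


# Made-up output for k = 2 variables.
sample = "1.000000 0.500000\n0.500000 2.000000\n3.100000 4.200000\n"
cov, means = split_cov_and_means(sample, k=2)
```

Because the values are read in order regardless of line breaks, the
sketch also handles output that wraps a long row across several lines.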
SAS
The matrix of covariances and means must first be augmented
by variable names and _type_ names. Each line of the covariance data
set to be input should be set up as follows, with variable name first,
followed by the _type_, and then the covariances:
varname cov xxx xxx xxx etc.
varname cov xxx xxx xxx etc.
.
.
mean mean XXX XXX XXX etc.
N N XXX XXX XXX etc.
The SAS data step for reading in the covariance data set is:
DATA first(TYPE=COV);
_TYPE_='cov';
INFILE 'emout.dat';
INPUT _name_ $ _type_ $ var1 var2 var3 ... ;
RUN;
PROC FACTOR DATA=first;
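The augmented lines described above can also be generated with a short
script instead of being typed by hand or produced by COVTOCOR. The
Python sketch below follows the layout shown earlier in this section
(variable name, _type_, then the values). It is an illustration only;
the function name sas_cov_lines, the variable names, and the numbers
are hypothetical.

```python
def sas_cov_lines(names, cov, means, n):
    """Build the augmented lines: name, _type_, then the values."""
    lines = []
    for name, row in zip(names, cov):
        lines.append(" ".join([name, "cov"] + [f"{v:.6f}" for v in row]))
    lines.append(" ".join(["mean", "mean"] + [f"{m:.6f}" for m in means]))
    # The same sample size is repeated for every variable.
    lines.append(" ".join(["N", "N"] + [str(n)] * len(names)))
    return lines


# Made-up two-variable example.
lines = sas_cov_lines(
    ["var1", "var2"],
    [[1.0, 0.5], [0.5, 2.0]],
    [3.1, 4.2],
    100,
)
```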
SPSS
The input raw data matrix consists of lower triangle covariances
(including the diagonal), means, and ROWTYPE_ designations and is
structured as follows:
N xxx xxx xxx etc.
MEAN xxx xxx xxx etc.
COV xxx .
COV xxx xxx
.
.
COV xxx xxx xxx etc.
Using the following SPSSx commands, the raw data matrix,
residing in file "fileone", is converted into an active system file:
MATRIX DATA FILE=fileone
/VARIABLES=ROWTYPE_ varlist
/FORMAT=FREE
The SPSSx active system file can then be read into such
procedures as FACTOR or REGRESSION. Because we only use one
vector for the sample sizes (default is a matrix of Ns), we must
specify MISSING=LISTWISE in the REGRESSION or other SPSSx
procedures.
FACTOR MATRIX IN(*)
REGRESSION MATRIX IN(*)
/MISSING=LISTWISE
To convert the system file containing the covariance matrix to a
file containing the correlation matrix plus a vector of standard
deviations use the following command:
MCONVERT IN(*) OUT(CORMTX)
For further information on the allowable specifications and other
options for the input of correlation or covariance
matrices, consult chapter 13 in the SPSSx (3rd edition) manual.
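The arithmetic behind a covariance-to-correlation conversion is simple:
each standard deviation is the square root of the corresponding
variance, and each correlation is the covariance divided by the product
of the two standard deviations. The Python sketch below illustrates
this calculation. It is not the code used by MCONVERT or COVTOCOR, and
the function name cov_to_corr and the sample matrix are hypothetical.

```python
import math


def cov_to_corr(cov):
    """Return (correlation matrix, standard deviations) for a full matrix."""
    k = len(cov)
    sds = [math.sqrt(cov[i][i]) for i in range(k)]
    corr = [[cov[i][j] / (sds[i] * sds[j]) for j in range(k)]
            for i in range(k)]
    return corr, sds


# Made-up covariance matrix: variances 4 and 9, covariance 3.
corr, sds = cov_to_corr([[4.0, 3.0], [3.0, 9.0]])
```

In this example the standard deviations are 2 and 3, so the correlation
is 3 / (2 * 3) = .5.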
BMDP
A covariance or correlation matrix can be input into BMDP using either
"lower" triangle or "square" form. For further options, see the chapter
on 4M Factor Analysis in Vol. 1 of the BMDP Statistical Software
Manual (1988). The following example reads in a full correlation
matrix for use in a factor analysis:
/INPUT FILE = 'emout.cor'.
VARIABLES = 5.
FORMAT=FREE.
TYPE = CORR.
SHAPE = SQUARE.
/VARIABLE NAMES = var1, var2, var3, var4, var5.
/FACTOR
SYSTAT
The following will read in a covariance matrix from file
'emout.cov' and write this out to a SYSTAT system file called
'newcov.sys':
GET emout.cov
INPUT var(1-5)
TYPE=COVARIANCE (or CORRELATION, if desired)
SAVE newcov
RUN
MINITAB
The following can be used to read in the covariances only:
read 'emout.cov' into c1-c10;
format(10f8.3).
copy c1-c10 into m1
note m1 is the covariance matrix