EMCOV REFERENCE MANUAL
September 14, 1993
(updated August 11, 1995)

OVERVIEW

     EMCOVxx.EXE (v2.2, EMCOV22.EXE; v2.3, EMCOV23.EXE)
was compiled using Microsoft FORTRAN Powerstation (version 1.0)
for use in a 32-bit MS-DOS-extended environment. Additionally,
EMCOVxx.EXE uses dynamic array allocation, so it can accommodate
a problem of any size within the limit of available RAM.

****************************************************

     EMCOV (both v2.2 and v2.3) is a beta version program. We
have used it extensively in our work and tested it against other
available programs (e.g., BMDP AM).  While we have confidence
in this program, we cannot be responsible for its use.  If you
suspect any problems or have ideas for improvement, please contact
us.  We welcome any comments regarding the use of this program.

     Direct these comments and inquiries to:

     John Graham
     Department of Biobehavioral Health
     E-210 Health & Human Development Bldg.
     Penn State University
     University Park, PA  16802-6508

     Phone: (814) 863-0200
     email: jwg4@psuvm.psu.edu
     fax:   (814) 863-7525

****************************************************

PROGRAM USE

     The program (v2.x) is invoked from the MS-DOS command line
by typing "EMCOV22" or "EMCOV23".  The accompanying file,
DOSXMSF.EXE, must be either in the same directory as EMCOV or
resident in an accessible directory (i.e., named in the path statement
in the AUTOEXEC.BAT file).

     Upon execution, the program will display the title lines and then
prompt for input of the necessary parameters. Press the carriage
return (<CR>) to accept the defaults specified in the parentheses.
The following is a list of the program command lines with some
explanation where necessary:

      COVARIANCE ESTIMATION WITH MISSING VALUES
         An application of the EM algorithm 
                   EMCOV.EXE v2.3
                Graham & Hofer, 1993 

(1) Specify input data file:

     The file name for the input raw data file can include the path. 

     Note:  The raw data file must have at least one SPACE between
          each data element.

(2) Sample size (n):

(3) Number of variables (k):

(4) Input maximum # of missing patterns (20):

     The exact number of missingness patterns does not need to be
     known.  However, an upper limit for the pattern matrix needs to
     be set.  If the program fails because one of the matrix limits
     was exceeded, choose a larger number here.
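
     A rough value for this limit can be obtained before running
     EMCOV by counting the distinct missingness patterns in the raw
     data.  The Python sketch below is only an illustration and is not
     part of EMCOV; the function name and the sample data are
     hypothetical.  It compares values to the indicator by exact
     equality and counts the fully observed pattern as well, which may
     differ from how EMCOV counts patterns internally.

```python
def count_missing_patterns(rows, missing=9):
    """Return the number of distinct missingness patterns in `rows`.

    A pattern is a tuple of booleans, True where the value equals the
    missing-data indicator.  The complete-data pattern is counted too.
    """
    patterns = {tuple(value == missing for value in row) for row in rows}
    return len(patterns)


# Hypothetical 4-case, 3-variable data set with 9 as the indicator.
data = [
    [1.2, 3.4, 9],
    [2.1, 9, 9],
    [0.7, 2.2, 9],
    [1.5, 2.8, 4.0],
]
print(count_missing_patterns(data))  # 3
```

     Any number at or above the count reported this way should be a
     safe entry for prompt (4).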

(5) Input missing data indicator (9):

     Type the numeric placeholder indicating a variable's
     missingness.

     In v2.2, this missing value indicator may take on NON-ZERO
     integer values from -9 to 99.  In v2.3, the missing value
     indicator may take on NON-ZERO integer values from -999 to
     9999.

Note of Caution:  Take care to select a missing value indicator that is
     clearly out of range of real (non-missing) values in your dataset. 
     In particular, a value should NOT be used as the missing value
     indicator if a truncated non-missing value would equal the
     missing value indicator.

          For example, if you have real (non-missing) values in your
     dataset in the range 9.000001 to 9.999999, you should NOT
     use 9 as your missing value indicator, even if you have no real
     values in your dataset exactly equal to 9. 
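
     A data set can be screened for this problem before a missing
     value indicator is chosen.  The Python sketch below is an
     illustration only; the function name is hypothetical, and it
     assumes that truncation means dropping the fractional part
     (toward zero), which this manual does not state explicitly.

```python
import math


def truncation_collisions(values, missing=9):
    """Return non-missing values whose integer part equals the indicator."""
    return [v for v in values if v != missing and math.trunc(v) == missing]


sample = [9.25, 3.0, 9, 8.999999, -9.5]
print(truncation_collisions(sample))       # [9.25]
print(truncation_collisions(sample, -9))   # [-9.5]
```

     An empty list suggests the candidate indicator is safe under
     this assumption.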

(6) Select start value  (1=LW, 2=PW, 3=CM):

     LW:  Listwise
     PW:  Pairwise
     CM:  Covariance Matrix 

(6a) Specify start value (CM) file (START.COV):

     If CM was specified at prompt (6) above, then the file
     containing the covariances must be indicated here.

(7) Maximum number of iterations (200):

(8) Convergence criterion (Default=.00001):

     Convergence is reached when the difference in the accumulated
     covariances between the present and previous iterations falls
     below a specified criterion level.  Currently, only the elements
     in the lower triangle, including variances, are used for this
     calculation.  The default convergence criterion is .00001.

     In v2.2, convergence is reached when the SUM of changes in
     all k(k+1)/2 elements of the covariance matrix is less than the
     critical value (v2.2 default = .000001).

     In v2.3, convergence is reached when the LARGEST of changes
     in all elements is less than the critical value.  The default critical
     value in v2.3 is .00001.
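
     The two rules can give different answers for the same pair of
     iterations.  The Python sketch below is an illustration only and
     is not taken from the EMCOV source; the function names are
     hypothetical, and it assumes that "changes" means absolute
     differences between corresponding lower-triangle elements.

```python
def lower_triangle_changes(prev, curr):
    """Absolute changes in the lower triangle, diagonal included."""
    k = len(curr)
    return [abs(curr[i][j] - prev[i][j]) for i in range(k) for j in range(i + 1)]


def converged_sum(prev, curr, crit=0.000001):
    """v2.2-style rule: the SUM of the k(k+1)/2 changes is below crit."""
    return sum(lower_triangle_changes(prev, curr)) < crit


def converged_max(prev, curr, crit=0.00001):
    """v2.3-style rule: the LARGEST change is below crit."""
    return max(lower_triangle_changes(prev, curr)) < crit


# Hypothetical covariance matrices from two successive iterations.
prev = [[1.000000, 0.500000], [0.500000, 2.000000]]
curr = [[1.000004, 0.500004], [0.500004, 2.000004]]
print(converged_sum(prev, curr))  # False
print(converged_max(prev, curr))  # True
```

     Because the sum grows with the number of variables, the v2.2
     rule becomes stricter as k increases, while the v2.3 rule does
     not depend on k in this way.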

(9) Additional output? (1=Yes,<CR>=No):
     
     Several additional files can be output besides the ML
     covariances.  The files all have the prefix TEMP and the
     following suffixes indicating the contents. 

     .COV     maximum likelihood covariances (default)      
     .PTN     pattern matrix showing type and number of
              missing data patterns
     .DAT     full data matrix with missing values imputed by
              single regression imputation using the b-weights
              obtained from EM
     .RES     data matrix of residuals--can be used in
              subsequent multiple imputation procedures

Note of Caution:  One should NOT make use of the file TEMP.DAT by
     itself.  Because the missing values have been imputed without
     error (i.e., the imputed value lies exactly on the regression line),
     variables with imputed values have too little (error) variance. 
     Adding a randomly selected (non-zero) residual term from
     TEMP.RES restores this variability.  Please see the Reference
     manual for COVIMP.EXE and ADDRES.EXE for further details.
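
     The idea of restoring variability can be sketched in Python as
     follows.  This is an illustration only and does not reproduce
     ADDRES.EXE; the function name, the in-memory layout of the three
     matrices, and the choice to draw residuals separately for each
     variable are all assumptions.

```python
import random


def add_residuals(imputed, was_missing, residuals, rng):
    """Add a randomly drawn non-zero residual to each imputed value.

    imputed     -- rows of the completed data matrix
    was_missing -- same shape, True where the value was imputed
    residuals   -- same shape, one residual per case and variable
    rng         -- a random.Random instance
    """
    k = len(imputed[0])
    # Pool of non-zero residuals for each variable (column).
    pools = [[row[j] for row in residuals if row[j] != 0.0] for j in range(k)]
    result = []
    for row, flags in zip(imputed, was_missing):
        result.append([
            value + rng.choice(pools[j]) if flags[j] else value
            for j, value in enumerate(row)
        ])
    return result


# Hypothetical 2-case example; only the last value was imputed.
imputed = [[1.0, 2.0], [3.0, 4.0]]
was_missing = [[False, False], [False, True]]
residuals = [[0.1, 0.5], [-0.1, 0.0]]
completed = add_residuals(imputed, was_missing, residuals, random.Random(0))
print(completed)  # [[1.0, 2.0], [3.0, 4.5]]
```

     Observed values are left unchanged; only imputed values receive
     a residual.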

(10) Specify output format (6(F12.6,1X)):

     The format can be altered to suit the user's need for precision. 
     Enclose the Fortran format statement in single parentheses,
     e.g.: 

          (8F10.6)

     In v2.3 the default output format is (6F12.7).

                               APPENDIX A

Raw Data and Covariance Matrices: Input and Output 

     This appendix contains sample code for several of the major
statistical packages (SAS, SPSSx, BMDP, Systat, and Minitab), for
output of raw data and input of covariance/correlation matrices output
by EMCOV.EXE. 

     COVTOCOR. A utility program is included that will allow for
easier data transition between the EMCOV.EXE program and other
statistical packages. The program, COVTOCOR.EXE, will read in a
covariance or correlation matrix, with or without a vector of means, in
lower triangle or full matrix form, and provide output in the following
forms:

1)   Covariance or correlation matrix
2)   Full or lower triangle matrix
3)   With or without additional vector of means and sample sizes
4)   Raw data, SAS, or SPSSx covariance (or correlation) data set
     format
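
     The covariance-to-correlation step itself is simple: each
covariance is divided by the product of the two standard deviations.
The Python sketch below illustrates that calculation only; it is not
the COVTOCOR.EXE code, and the function name is hypothetical.

```python
import math


def cov_to_corr(cov):
    """Return (correlation matrix, standard deviations) for a full
    covariance matrix given as a list of rows."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]
    return corr, sd


cov = [[4.0, 3.0], [3.0, 9.0]]
corr, sd = cov_to_corr(cov)
print(corr)  # [[1.0, 0.5], [0.5, 1.0]]
print(sd)    # [2.0, 3.0]
```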

     SAS or SPSS Format. With the additional input of a file
containing the variable names, each enclosed in single quotation
marks, the COVTOCOR program can read in the covariance matrix
from EMCOV.EXE and provide output in SAS or SPSS data set form.
This option will allow the covariance or correlation matrix to be read
directly into such SAS or SPSS procedures as FACTOR or
REGRESSION.

Raw Data Output

     The raw data to be read into EMCOV.EXE must be in free
format (spaces only between data fields). The data can, of course, be
wrapped onto multiple lines.  The missing data points must be coded
numerically, with the default being "9".
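
     For data held outside the packages listed below, a file in this
layout can be produced with a few lines of code.  The Python sketch
below is an illustration only; the function name, the use of None for
missing values, and the six-decimal precision are assumptions.

```python
def to_free_format(rows, missing=9, per_line=5):
    """Format rows as space-separated text, coding None as `missing`
    and wrapping each case onto lines of at most `per_line` values."""
    lines = []
    for row in rows:
        tokens = [str(missing) if v is None else f"{v:.6f}" for v in row]
        for start in range(0, len(tokens), per_line):
            lines.append(" ".join(tokens[start:start + per_line]))
    return "\n".join(lines) + "\n"


# Hypothetical case with one missing value, wrapped two values per line.
text = to_free_format([[1.5, None, 2.0]], per_line=2)
print(text, end="")
```

     The returned string can then be written to a file such as
'rawout.dat' and named at prompt (1).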


SAS

     OPTIONS MISSING='9';
     DATA second; SET first;
       FILE 'rawout.dat';
       PUT(x1-x10)(5*9.6,/,5*9.6);
     RUN;

SPSS

     WRITE OUTFILE=rawout.dat
       / varlist (format)
     EXECUTE

     See chapter 10 of the SPSSx (3rd edition) manual for further
definitions and options.
    
BMDP

     Within a BMDP procedure, the /SAVE command can be used to
save data to an external file. The following example shows how a raw
data file is input using free format (with missing values specified by
".") and then output to a different external file using a different format.

     /INPUT    FILE = 'rawin.dat'.
               VARIABLES = 5.
               FORMAT=FREE.
               MCHAR = '.'.

     /VARIABLE NAMES = var1, var2, var3, var4, var5.

     /SAVE     FILE = 'rawout.dat'.
               KEEP = var1 TO var5.
               FORMAT = '(5F6.2)'.

SYSTAT

     The following will write raw data to a file called
'rawout.dat' from a SYSTAT system file called 'mydata.sys':

     USE mydata
     PUT rawout
     RUN

MINITAB

     write 'rawout.dat' c1-c10;
     format (10f9.6).    
     stop

Augmented Covariance Matrix Input

     The covariances can be read out of EMCOV.EXE using the
default format (8F9.6) or one that is specified by the user.  The
output matrix is a K+1 by K matrix, the last record containing the
maximum likelihood means.
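
     The layout just described can be split back into a covariance
matrix and a vector of means.  The Python sketch below is an
illustration only; the function name is hypothetical, and it assumes
the values in the file are separated by whitespace, which may not
hold for every output format (adjacent fixed-width fields can run
together).

```python
def read_emcov_output(text, k):
    """Split a (k+1)-by-k block of numbers into (covariances, means)."""
    values = [float(tok) for tok in text.split()]
    if len(values) != (k + 1) * k:
        raise ValueError("expected %d values, found %d" % ((k + 1) * k, len(values)))
    cov = [values[i * k:(i + 1) * k] for i in range(k)]
    means = values[k * k:]
    return cov, means


# Hypothetical output for k = 2: two covariance records, then the means.
sample = """
  4.000000   3.000000
  3.000000   9.000000
 10.500000  20.250000
"""
cov, means = read_emcov_output(sample, 2)
print(cov)    # [[4.0, 3.0], [3.0, 9.0]]
print(means)  # [10.5, 20.25]
```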

SAS

     The matrix of covariances and means must first be augmented
by variable names and _type_ names. Each line of the covariance data
set to be input should be set up as follows, with variable name first,
followed by the _type_, and then the covariances:

     varname cov  xxx xxx xxx etc.
     varname cov  xxx xxx xxx etc.
     .
     .
     mean mean XXX XXX XXX etc.
     N      N      XXX XXX XXX etc.
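
     Lines in this layout can also be generated by a short script
when COVTOCOR.EXE is not at hand.  The Python sketch below is an
illustration only; the function name, the six-decimal precision, and
the use of a single sample size repeated for every variable are
assumptions.

```python
def sas_cov_lines(names, cov, means, n):
    """Build the augmented lines: one 'cov' line per variable,
    then a 'mean' line and an 'N' line."""
    lines = []
    for name, row in zip(names, cov):
        lines.append(" ".join([name, "cov"] + [f"{v:.6f}" for v in row]))
    lines.append(" ".join(["mean", "mean"] + [f"{m:.6f}" for m in means]))
    lines.append(" ".join(["N", "N"] + [str(n)] * len(names)))
    return lines


# Hypothetical two-variable example.
lines = sas_cov_lines(["x1", "x2"], [[4.0, 3.0], [3.0, 9.0]], [10.5, 20.25], 100)
print("\n".join(lines))
```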

     The SAS data step for reading in the covariance data set is: 

     DATA first(TYPE=COV);
       LENGTH _name_ $ 8 _type_ $ 8;   /* wide enough to hold 'mean' */
       INFILE 'emout.dat';
       INPUT  _name_ $ _type_ $  var1  var2  var3 ... ;
     RUN;
     PROC FACTOR DATA=first;

SPSS

     The input raw data matrix consists of lower triangle covariances
(including the diagonal), means, and ROWTYPE_ designations and is
structured as follows:

     N         xxx  xxx  xxx  etc.
     MEAN      xxx  xxx  xxx  etc.
     COV       xxx  .
     COV       xxx  xxx
     .
     .
     COV       xxx  xxx  xxx  etc.

     Using the following SPSSx commands, the raw data matrix,
residing in file "fileone", is converted into an active system file: 

     MATRIX DATA FILE=fileone
       /VARIABLES=ROWTYPE_ varlist
       /FORMAT=FREE

     The SPSSx active system file can then be read into such
procedures as FACTOR or REGRESSION. Because we only use one
vector for the sample sizes (default is a matrix of Ns), we must
specify MISSING=LISTWISE in the REGRESSION or other SPSSx
procedures. 

     FACTOR MATRIX IN(*)
     REGRESSION MATRIX IN(*)
       /MISSING=LISTWISE

     To convert the system file containing the covariance matrix to a
file containing the correlation matrix plus a vector of standard
deviations use the following command:

     MCONVERT IN(*) OUT(CORMTX)

     For further information on the allowable specifications and other
options for the input of correlation or covariance
matrices, consult chapter 13 in the SPSSx (3rd edition) manual. 

BMDP

     A covariance or correlation matrix can be input into BMDP using
either "lower" triangle or "square" form. For further options, see the chapter
on 4M Factor Analysis in Vol. 1 of the BMDP Statistical Software
Manual (1988). The following example reads in a full correlation
matrix for use in a factor analysis:

     /INPUT    FILE = 'emout.cor'.
               VARIABLES = 5.
               FORMAT=FREE.
               TYPE = CORR.
               SHAPE = SQUARE.
     /VARIABLE NAMES = var1, var2, var3, var4, var5.
     /FACTOR

SYSTAT

     The following will read in a covariance matrix from file
'emout.cov' and write this out to a SYSTAT system file called
'newcov.sys': 

     GET emout.cov
     INPUT var(1-5)
     TYPE=COVARIANCE          (or CORRELATION, if desired)
     SAVE newcov
     RUN

MINITAB

     The following can be used to read in the covariances only: 

     read 'emout.cov' into c1-c10;
     format(10f8.3).
     copy c1-c10 into m1
     note m1 is the covariance matrix
