EMCOV REFERENCE MANUAL

September 14, 1993 (updated August 11, 1995)


OVERVIEW

EMCOVxx.EXE (v2.2, EMCOV22.EXE; v2.3, EMCOV23.EXE) was compiled using Microsoft FORTRAN PowerStation (version 1.0) for use in a 32-bit MS-DOS-extended environment. EMCOVxx.EXE uses dynamic array allocation, so it can accommodate a problem of any size within the limit of available RAM.

****************************************************
EMCOV (both v2.2 and v2.3) is a beta version program. We have used it extensively in our work and tested it extensively against other available programs (e.g., BMDP AM). While we have confidence in this program, we cannot be responsible for its use. If you suspect any problems or have ideas for improvement, please contact us. We welcome any comments regarding the use of this program. Direct these comments and inquiries to:

    John Graham
    Department of Biobehavioral Health
    E-210 Health & Human Development Bldg.
    Penn State University
    University Park, PA 16802-6508
    Phone: (814) 863-0200
    email: jwg4@psuvm.psu.edu
    fax: (814) 863-7525
****************************************************


PROGRAM USE

The program (v2.x) is invoked from the MS-DOS command line by typing "EMCOV22" or "EMCOV23". The accompanying file, DOSXMSF.EXE, must be either in the same directory as EMCOV or in an accessible directory (i.e., one named in the path statement in the AUTOEXEC.BAT file).

Upon execution, the program displays the title lines and then prompts for the necessary parameters. Press the carriage return to accept the defaults shown in parentheses. The program prompts are listed below, with explanation where necessary:

    COVARIANCE ESTIMATION WITH MISSING VALUES
    An application of the EM algorithm
    EMCOV.EXE v2.3
    Graham & Hofer, 1993

(1) Specify input data file:

    The file name for the input raw data file can include the path.
    Note: The raw data file must have at least one SPACE between each data element.

(2) Sample size (n):

(3) Number of variables (k):

(4) Input maximum # of missing patterns (20):

    The exact number of missingness patterns does not need to be known. However, an upper limit for the matrix of missingness patterns must be set. If the program fails because one of the matrix limits was exceeded, choose a larger number here.

(5) Input missing data indicator (9):

    Type the numeric placeholder that marks a value as missing. In v2.2, the missing value indicator may take on NON-ZERO integer values from -9 to 99. In v2.3, the missing value indicator may take on NON-ZERO integer values from -999 to 9999.

    Note of Caution: Take care to select a missing value indicator that is clearly out of range of the real (non-missing) values in your dataset. In particular, a value should NOT be used as the missing value indicator if a truncated non-missing value would equal it. For example, if you have real (non-missing) values in your dataset in the range 9.000001 to 9.999999, you should NOT use 9 as your missing value indicator, even if no real value in your dataset is exactly equal to 9.

(6) Select start value (1=LW, 2=PW, 3=CM):

        LW: Listwise
        PW: Pairwise
        CM: Covariance Matrix

(6a) Specify start value (CM) file (START.COV):

    If CM was specified at prompt (6), the file containing the starting covariances must be indicated here.

(7) Maximum number of iterations (200):

(8) Convergence criterion (Default=.00001):

    Convergence is reached when the difference between the covariances accumulated in the present and previous iterations falls below the specified criterion level. Currently, only the elements in the lower triangle, including the variances, are used for this calculation. The default convergence criterion is .00001.

    In v2.2, convergence is reached when the SUM of changes in all k(k+1)/2 elements of the covariance matrix is less than the critical value (v2.2 default = .000001).
    In v2.3, convergence is reached when the LARGEST of the changes in all elements is less than the critical value. The default critical value in v2.3 is .00001.

(9) Additional output? (1=Yes,=No):

    Several additional files can be output besides the ML covariances. All of the files have the prefix TEMP and one of the following suffixes indicating the contents:

        .COV   maximum likelihood covariances (default)
        .PTN   pattern matrix showing type and number of missing data patterns
        .DAT   full data matrix with missing values imputed by single regression imputation using the b-weights obtained from EM
        .RES   data matrix of residuals; can be used in subsequent multiple imputation procedures

    Note of Caution: One should NOT make use of the file TEMP.DAT by itself. Because the missing values have been imputed without error (i.e., each imputed value lies exactly on the regression line), variables with imputed values have too little (error) variance. Adding a randomly selected (non-zero) residual term from TEMP.RES restores this variability. Please see the Reference manual for COVIMP.EXE and ADDRES.EXE for further details.

(10) Specify output format (6(F12.6,1X)):

    The format can be altered to suit the user's need for precision. Enclose the Fortran format statement in a single pair of parentheses, e.g.: (8F10.6)

    In v2.3 the default output format is (6F12.7).


APPENDIX A

Raw Data and Covariance Matrices: Input and Output

This appendix contains sample code for several of the major statistical packages (SAS, SPSSx, BMDP, Systat, and Minitab) for writing out raw data and for reading in the covariance/correlation matrices output by EMCOV.EXE.

COVTOCOR. A utility program is included that eases the transfer of data between the EMCOV.EXE program and other statistical packages.
The program, COVTOCOR.EXE, reads in a covariance or correlation matrix, with or without a vector of means, in lower triangle or full matrix form, and provides output in the following forms:

    1) Covariance or correlation matrix
    2) Full or lower triangle matrix
    3) With or without additional vector of means and sample sizes
    4) Raw data, SAS, or SPSSx covariance (or correlation) data set format

SAS or SPSS Format. Given an additional input file containing the variable names, each enclosed in single quotation marks, the COVTOCOR program can read in the covariance matrix from EMCOV.EXE and provide output in SAS or SPSS data set form. This option allows the covariance or correlation matrix to be read directly into such SAS or SPSS procedures as FACTOR or REGRESSION.


Raw Data Output

The raw data to be read into EMCOV.EXE must be in free format (only spaces between data fields). The data can, of course, be wrapped onto multiple lines. The missing data points must be coded numerically, with the default being "9".

SAS

    OPTIONS MISSING='9';
    DATA second;
      SET first;
      FILE 'rawout.dat';
      PUT (x1-x10) (5*9.6,/,5*9.6);
    RUN;

SPSS

    WRITE OUTFILE=rawout.dat
      / varlist (format)
    EXECUTE

See chapter 10 of the SPSSx (3rd edition) manual for further definitions and options.

BMDP

Within a BMDP procedure, the /SAVE command can be used to save data to an external file. The following example shows how a raw data file is input using free format (with missing values specified by ".") and then output to a different external file using a different format.

    /INPUT     FILE = 'rawin.dat'.
               VARIABLES = 5.
               FORMAT = FREE.
               MCHAR = '.'.
    /VARIABLE  NAMES = var1, var2, var3, var4, var5.
    /SAVE      FILE = 'rawout.dat'.
               KEEP = var1 TO var5.
               FORMAT = '(5F6.2)'.

SYSTAT

The following will write raw data to a file called 'rawout.dat' from a SYSTAT system file called 'mydata.sys':

    USE mydata
    PUT rawout
    RUN

MINITAB

    write 'rawout.dat' c1-c10;
    format (10f9.6).
    stop


Augmented Covariance Matrix Input

The covariances can be read out of EMCOV.EXE using the default format (8F9.6) or one that is specified by the user. The output matrix is a K+1 by K matrix, the last record containing the maximum likelihood means.

SAS

The matrix of covariances and means must first be augmented by variable names and _type_ names. Each line of the covariance data set to be input should be set up as follows, with the variable name first, followed by the _type_, and then the covariances:

    varname  cov   xxx xxx xxx etc.
    varname  cov   xxx xxx xxx etc.
    .
    .
    mean     mean  XXX XXX XXX etc.
    N        N     XXX XXX XXX etc.

The SAS data step for reading in the covariance data set is:

    DATA first(TYPE=COV);
      _TYPE_='cov';
      INFILE 'emout.dat';
      INPUT _name_ $ _type_ $ var1 var2 var3 ...;
    RUN;
    PROC FACTOR DATA=first;

SPSS

The input matrix consists of lower triangle covariances (including the diagonal), means, and ROWTYPE_ designations, and is structured as follows:

    N     xxx xxx xxx etc.
    MEAN  xxx xxx xxx etc.
    COV   xxx
    COV   xxx xxx
    .
    .
    COV   xxx xxx xxx etc.

Using the following SPSSx commands, the matrix, residing in file "fileone", is converted into an active system file:

    MATRIX DATA FILE=fileone
      /VARIABLES=ROWTYPE_ varlist
      /FORMAT=FREE

The SPSSx active system file can then be read into such procedures as FACTOR or REGRESSION. Because we use only one vector for the sample sizes (the default is a matrix of Ns), we must specify MISSING=LISTWISE in REGRESSION or other SPSSx procedures.

    FACTOR MATRIX IN(*)

    REGRESSION MATRIX IN(*)
      /MISSING=LISTWISE

To convert the system file containing the covariance matrix to a file containing the correlation matrix plus a vector of standard deviations, use the following command:

    MCONVERT IN(*) OUT(CORMTX)

For further information on the allowable specifications and other options for the input of correlation or covariance matrices, consult chapter 13 in the SPSSx (3rd edition) manual.
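The SAS and SPSS layouts described above can also be assembled from an EMCOV output file with a short script. The following Python sketch is not part of the EMCOV distribution; it is a minimal illustration that assumes the output file holds a full K by K covariance matrix followed by one record of K means, with all values separated by whitespace. The function names (parse_emcov_output, sas_rows, spss_rows) are hypothetical, and the sample size must be supplied by the user because the description above lists only covariances and means in the output file.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of EMCOV or COVTOCOR.

def parse_emcov_output(text, k):
    """Split whitespace-separated output into a k x k covariance matrix
    (list of rows) and a list of k means.

    Assumes a full (not lower-triangle) matrix followed by the means,
    i.e. (k + 1) * k numbers in total, possibly wrapped across lines.
    """
    values = [float(token) for token in text.split()]
    expected = (k + 1) * k
    if len(values) != expected:
        raise ValueError(
            f"expected {expected} numbers for k={k}, found {len(values)}"
        )
    rows = [values[i * k:(i + 1) * k] for i in range(k + 1)]
    return rows[:k], rows[k]


def sas_rows(names, cov, means, n):
    """Build lines in the SAS layout: variable name, type, then k values."""
    lines = []
    for name, row in zip(names, cov):
        lines.append(" ".join([name, "cov"] + [f"{v:.6f}" for v in row]))
    lines.append(" ".join(["mean", "mean"] + [f"{v:.6f}" for v in means]))
    lines.append(" ".join(["N", "N"] + [str(n)] * len(names)))
    return lines


def spss_rows(cov, means, n):
    """Build lines in the SPSS layout: N, MEAN, then lower-triangle COV rows."""
    k = len(means)
    lines = [
        " ".join(["N"] + [str(n)] * k),
        " ".join(["MEAN"] + [f"{v:.6f}" for v in means]),
    ]
    for i, row in enumerate(cov):
        # Keep the diagonal and everything to its left.
        lines.append(" ".join(["COV"] + [f"{v:.6f}" for v in row[:i + 1]]))
    return lines
```

For a two-variable problem, calling parse_emcov_output on the file contents with k=2 and passing the result to sas_rows(['x1', 'x2'], cov, means, 100) yields four lines: two "cov" rows, one "mean" row, and one "N" row. Splitting on whitespace means wrapped records are handled, but values written with no space between fields would not be; in that case a format with an explicit separator, such as (6(F12.6,1X)), is the safer choice.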
BMDP

A covariance or correlation matrix can be input into BMDP using either "lower" triangle or "square" form. For further options, see the chapter on 4M Factor Analysis in Vol. 1 of the BMDP Statistical Software Manual (1988). The following example reads in a full correlation matrix for use in a factor analysis:

    /INPUT     FILE = 'emout.cor'.
               VARIABLES = 5.
               FORMAT = FREE.
               TYPE = CORR.
               SHAPE = SQUARE.
    /VARIABLE  NAMES = var1, var2, var3, var4, var5.
    /FACTOR

SYSTAT

The following will read in a covariance matrix from file 'emout.cov' and write it out to a SYSTAT system file called 'newcov.sys':

    GET emout.cov
    INPUT var(1-5)
    TYPE=COVARIANCE     (or CORRELATION, if desired)
    SAVE newcov
    RUN

MINITAB

The following can be used to read in the covariances only:

    read 'emout.cov' into c1-c10;
    format(10f8.3).
    copy c1-c10 into m1
    note m1 is the covariance matrix
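Some of the examples in this appendix expect a correlation matrix rather than a covariance matrix (the BMDP example reads 'emout.cor', and MCONVERT produces correlations plus a vector of standard deviations). The conversion itself is the standard one: each covariance is divided by the product of the two standard deviations. The Python sketch below illustrates that arithmetic only; it is not the COVTOCOR.EXE source, and the function names cov_to_corr and lower_triangle are hypothetical.

```python
import math

# Illustrative sketch of the standard covariance-to-correlation formula,
# r_ij = c_ij / (sd_i * sd_j). Hypothetical helpers, not COVTOCOR.EXE code.

def cov_to_corr(cov):
    """Return (corr, sds) for a full symmetric covariance matrix.

    cov is a list of k rows, each a list of k numbers. sds holds the
    square roots of the diagonal (the standard deviations).
    """
    k = len(cov)
    sds = []
    for i in range(k):
        if cov[i][i] <= 0:
            raise ValueError(f"variance of variable {i + 1} is not positive")
        sds.append(math.sqrt(cov[i][i]))
    corr = [
        [cov[i][j] / (sds[i] * sds[j]) for j in range(k)]
        for i in range(k)
    ]
    return corr, sds


def lower_triangle(matrix):
    """Return the lower triangle (including the diagonal) as ragged rows."""
    return [row[:i + 1] for i, row in enumerate(matrix)]
```

Returning the standard deviations alongside the correlations keeps enough information to rebuild the covariances later, and lower_triangle covers packages that read the "lower" rather than the "square" shape.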