**Output from Molecular Modeling Pro Plus**

Example of part of the printout from a multiple regression analysis. The model investigated here is flash point = intercept + b*(1/enthalpy of vaporization). A brute-force regression search found the inverse transformation of enthalpy of vaporization to be the best one-variable model for flash point in the data set investigated. The analysis below is a follow-up (explained in more detail in the tutorial in the help file).

Analysis of variance

| Variation source | df | SS | MS | Statistics |
|---|---|---|---|---|
| Total (uncorrected) | 360 | 6818859.7389 | | F=1066.68276 |
| Mean | 1 | 3523457.24432 | | rsquare=0.74872 |
| Total (corrected) | 359 | 3295402.49458 | | s=48.09447 |
| Regression | 1 | 2467320.53578 | 2467320.53578 | |
| Residual | 358 | 828081.9588 | 2313.0781 | |

Note: probability of significant F =<0.0001

Model coefficients and standard errors:

| parameter | coefficient | standard error | t | prob |
|---|---|---|---|---|
| intercept | 259.139 | 5.52153 | 46.9325 | <<0.00001 |
| 1/_Enthalpy_of_vaporization_at_the_boiling_point__kJ/mole__ | -7722.05 | 236.437 | 32.6601 | <<0.00001 |

note: response variable: Flash_Point__C_

Printout of response values, predicted values and residuals:

| | observed | predicted | residual |
|---|---|---|---|
| acetal | -21 | 27.8085 | -48.8085 |
| acetaldehyde | -40 | -29.363 | -10.637 |
| acetic acid | 40 | 51.613 | -11.613 |
| acetic anhydride | 54 | 62.7094 | -8.7094 |
| acetol | 56 | 90.1554 | -34.1554 |
| acetone | -17 | -6.97324 | -10.0268 |
| acetone cyanohydrin | 63 | 105.799 | -42.7992 |

...(and so on)...

Figure 26. Analysis of variance table, model coefficients and partial printout of the table of response, predicted and residual values for the flash point one-variable model.

Analysis of variance table: Abbreviations used: df = degrees of freedom; SS = sum of squares; MS = mean square; F = Fisher's F test; r squared = proportion of variance accounted for by the model (in this example about 75%); s = model standard deviation (about 95% of the data lies within 2 standard deviations, thus plus or minus about 96 degrees C).

Coefficients table: The model is: flash point (C) = 259.1 - 7722.05*(1/enthalpy of vaporization). Both the intercept and the regression coefficient are highly statistically significant (prob <<0.00001).

Printout of predicted and residual values: From this table, find the largest outliers (the largest residuals) and determine whether they have something obviously in common that will lead to a better model. For instance, if all solvents are well accounted for but surfactants are poorly predicted, consider developing separate models for solvents and surfactants.
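The summary statistics in the analysis of variance table follow directly from the sums of squares, so they can be re-derived as a check. The Python sketch below is not part of Molecular Modeling Pro Plus; it only repeats the arithmetic with numbers copied from the printout. The function and variable names are illustrative, and the 40 kJ/mole input is an arbitrary value chosen for the example, not a compound from the data set.

```python
import math

# Values copied from the analysis of variance printout (Figure 26).
ss_total_corrected = 3295402.49458
ss_regression = 2467320.53578
ss_residual = 828081.9588
df_regression = 1
df_residual = 358

ms_regression = ss_regression / df_regression   # mean square = SS / df
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total_corrected  # proportion of variance explained
f_statistic = ms_regression / ms_residual       # F ratio
s = math.sqrt(ms_residual)                      # model standard deviation

# Coefficients copied from the coefficients table.
INTERCEPT = 259.139
SLOPE = -7722.05


def predict_flash_point(enthalpy_kj_per_mole):
    """Predicted flash point (C) from enthalpy of vaporization (kJ/mole)."""
    return INTERCEPT + SLOPE * (1.0 / enthalpy_kj_per_mole)


print(f"r squared = {r_squared:.5f}")
print(f"F = {f_statistic:.2f}")
print(f"s = {s:.5f}, so 2s = {2 * s:.1f} degrees C")
# Hypothetical input: 40 kJ/mole is an arbitrary illustration.
print(f"predicted flash point at 40 kJ/mole = {predict_flash_point(40.0):.1f} C")
```

The recomputed values should agree with the rsquare, F and s entries in the Statistics column to within rounding.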

Contributions to PRESS (Predictive Residual Sum of Squares):

| Compound | Predictive discrepancy |
|---|---|
| acetal | 2405.442 |
| acetaldehyde | 115.3149 |
| acetic acid | 135.8609 |
| acetic anhydride | 76.35809 |
| acetol | 1173.177 |
| acetone | 102.0248 |
| acetone cyanohydrin | 1842.063 |
| acetonitrile | 1.354055 |

...(and so on)...

Total PRESS = 843833.3

Sum of squares of response (SSY) = 8277222

Press/SSY = 0.1019464

The model appears to be reasonable and has passed the cross-validation test.

Figure 27. PRESS output for the one-variable flash point model. The larger the predictive discrepancy, the more influential the data point. Leaving out an influential data point can greatly affect the model. You may want to leave out some of the more influential points and rerun the analysis. Also, as with the table of residuals (Figure 26), you may want to look for obvious patterns in the types of chemistries that are influential. At the bottom of the table is an assessment of whether the model meets the Press/SSY <= 0.4 cut-off. Failure would mean that just a few data points contribute most of what the model accounts for, and you may want to remove some of these data points and redo the whole analysis.
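The help text does not spell out how each predictive discrepancy is calculated. The sketch below assumes the conventional leave-one-out definition of PRESS: each point is removed in turn, the line is refitted to the remaining points, and the squared error in predicting the omitted point is recorded. That assumption is at least consistent with the printout, where the acetal contribution (2405.442) is slightly larger than its squared ordinary residual from Figure 26 (about 2382). The x and y values are hypothetical and the function names are illustrative; only the PRESS and SSY totals are copied from the printout.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press_contributions(xs, ys):
    """Squared leave-one-out prediction error for each data point."""
    contributions = []
    for i in range(len(xs)):
        # Refit the model with point i left out, then predict point i.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(rest_x, rest_y)
        predicted = intercept + slope * xs[i]
        contributions.append((ys[i] - predicted) ** 2)
    return contributions


def passes_cross_validation(press, ssy, cutoff=0.4):
    """Apply the Press/SSY <= cutoff rule described in the text."""
    return press / ssy <= cutoff


# Hypothetical data (not from the flash point data set); the last point is off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]
contribs = press_contributions(x, y)
most_influential = max(range(len(x)), key=lambda i: contribs[i])
print("PRESS contributions:", [round(c, 3) for c in contribs])
print("most influential point index:", most_influential)

# Totals copied from the printout in Figure 27.
print("Press/SSY =", 843833.3 / 8277222)
print("passes cut-off:", passes_cross_validation(843833.3, 8277222))
```

In the hypothetical data the off-trend last point has by far the largest contribution, which is the pattern to look for when deciding which compounds to examine or leave out.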

Figure 28. Plot of observed versus predicted values of flash point in the one-variable model. Outliers (the data points farthest from the regression line) can be identified by clicking on them on the graph with the mouse. Make sure that outlying data points are not typos (check for errors). Consider whether the calculated field (1/enthalpy of vaporization) is likely to calculate the outlying data points well; if not, exclude them from the analysis and tell the users of the model that it is not reliable for those types of molecules.
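Outside the program, the same outlier search can be done by sorting compounds on the absolute size of their residuals. The sketch below uses only the seven rows shown in the partial printout of Figure 26, with residual taken as observed minus predicted, which matches the printed values. The 2s flag is an assumption borrowed from the "about 95% within 2 standard deviations" rule of thumb, not a rule stated by the program.

```python
# (compound, observed, predicted) rows copied from the partial printout in Figure 26.
rows = [
    ("acetal", -21, 27.8085),
    ("acetaldehyde", -40, -29.363),
    ("acetic acid", 40, 51.613),
    ("acetic anhydride", 54, 62.7094),
    ("acetol", 56, 90.1554),
    ("acetone", -17, -6.97324),
    ("acetone cyanohydrin", 63, 105.799),
]

S = 48.09447  # model standard deviation from the analysis of variance table


def rank_by_residual(rows):
    """Return (compound, residual) pairs, largest absolute residual first."""
    residuals = [(name, observed - predicted) for name, observed, predicted in rows]
    return sorted(residuals, key=lambda pair: abs(pair[1]), reverse=True)


for name, residual in rank_by_residual(rows):
    # Flag anything beyond 2 standard deviations; this threshold is an assumption.
    flag = "  <-- beyond 2s" if abs(residual) > 2 * S else ""
    print(f"{name:22s} {residual:10.4f}{flag}")
```

Among these seven rows, acetal and acetone cyanohydrin have the largest residuals, though neither lies beyond 2s; the full table of 360 compounds would need to be ranked to find the true outliers.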

Figure 29. Plot of predicted values of flash point against the residual values (residuals = difference between predicted and observed values). Optimally this plot is a scatter plot with a completely random arrangement of the data points. Curvature or patterns may indicate problems such as intercorrelation of the dependent and independent variables, or the need for transformation of one of the fields or addition of a parabolic or cross-product term. Examination of the residuals is a critical part of model validation. Further discussion is in the on-line Help file...
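A visual check of the residual plot can be supplemented with a simple number. The sketch below is one possible diagnostic, not something the program is documented to compute: it correlates the residuals with the squared, centred predicted values. A value near zero is consistent with random scatter, while a value near +1 or -1 suggests curvature that a parabolic term or a transformation might remove. All data and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def curvature_indicator(xs, ys):
    """Correlation between residuals and squared, centred predicted values."""
    intercept, slope = fit_line(xs, ys)
    predicted = [intercept + slope * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    mean_pred = sum(predicted) / len(predicted)
    squared_centred = [(p - mean_pred) ** 2 for p in predicted]
    return pearson(residuals, squared_centred)


# Hypothetical data: a parabola fitted with a straight line leaves curved residuals.
xs_curved = [float(i) for i in range(11)]
ys_curved = [x ** 2 for x in xs_curved]

# Hypothetical data: a straight line with alternating +1/-1 noise and no curvature.
xs_flat = [float(i) for i in range(10)]
ys_flat = [2.0 * x + 1.0 + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs_flat)]

print("curved data indicator:", round(curvature_indicator(xs_curved, ys_curved), 3))
print("flat data indicator:  ", round(curvature_indicator(xs_flat, ys_flat), 3))
```

This indicator only detects parabola-like curvature; other patterns, such as residuals that fan out as the predicted value grows, still need to be judged from the plot itself.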