Output from Molecular Modeling Pro Plus

Example of part of the printout from a multiple regression analysis. The model investigated here is flash point = intercept + b*(1/enthalpy of vaporization). A brute-force regression search of the data set found the inverse of the enthalpy of vaporization to be the best one-variable predictor of flash point. The analysis below is a follow-up to that search (it is explained in more detail in the tutorial in the Help file).
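The brute-force search itself is not described in this printout. As a minimal sketch, assuming it amounts to fitting the response against each of a fixed set of transformations of a descriptor and keeping the transformation with the highest r squared, it could look like the following. The transformation list and the function names are hypothetical, and the test data are synthetic rather than the flash point data set.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical candidate transformations of a single (positive) descriptor.
TRANSFORMS = {
    "x": lambda v: v,
    "1/x": lambda v: 1.0 / v,
    "log(x)": math.log,
    "x^2": lambda v: v * v,
}


def best_transform(x, y):
    """Fit y against every transformation of x; return the best by r squared."""
    results = {name: fit_line([f(v) for v in x], y) for name, f in TRANSFORMS.items()}
    best = max(results, key=lambda name: results[name][2])
    return best, results[best]
```

With real data the winning transformation and its coefficients would then be refitted and examined in detail, as in the printout below.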

                       Analysis of variance

Variation source     df    SS              MS              Statistics
Total (uncorrected)  360   6818859.7389                    F=1066.68276
Mean                 1     3523457.24432                   rsquare=0.74872
Total (corrected)    359   3295402.49458                   s=48.09447
Regression           1     2467320.53578   2467320.53578
Residual             358   828081.9588     2313.0781

Note: probability of significant F =<0.0001

Model coefficients and standard errors:

parameter        coefficient   standard error   t          prob
intercept =      259.139       5.52153          46.9325    <<0.00001
                 -7722.05      236.437          32.6601    <<0.00001

note: response variable: Flash_Point__C_

Printout of response values, predicted values and residuals:

                         observed   predicted   residual
acetal                   -21        27.8085     -48.8085
acetaldehyde             -40        -29.363     -10.637
acetic acid              40         51.613      -11.613
acetic anhydride         54         62.7094     -8.7094
acetol                   56         90.1554     -34.1554
acetone                  -17        -6.97324    -10.0268
acetone cyanohydrin      63         105.799     -42.7992
...(and so on)...

Figure 26. Analysis of variance table, model coefficients and partial printout of the table of response, predicted and residual values for the one-variable flash point model.

Analysis of variance table:

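Abbreviations used: df = degrees of freedom; SS = sum of squares; MS = mean square (SS divided by df); F = Fisher's F statistic (regression MS divided by residual MS); rsquare = proportion of the variance accounted for by the model (about 75% in this example); s = model standard deviation (the square root of the residual MS). If the residuals are roughly normally distributed, about 95% of the data lie within 2 standard deviations of the predicted values, that is, within about plus or minus 96 degrees C.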
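The statistics in the table follow directly from the regression and residual sums of squares and the number of observations. A minimal sketch that reproduces them from the printed values (the function name is hypothetical):

```python
import math


def anova_summary(ss_regression, ss_residual, n, p=1):
    """Derive regression ANOVA statistics from the two sums of squares.

    n is the number of observations, p the number of predictors
    (not counting the intercept).
    """
    df_reg = p
    df_res = n - p - 1
    ms_reg = ss_regression / df_reg
    ms_res = ss_residual / df_res
    ss_corrected_total = ss_regression + ss_residual
    return {
        "df_residual": df_res,
        "MS_regression": ms_reg,
        "MS_residual": ms_res,
        "F": ms_reg / ms_res,
        "r_squared": ss_regression / ss_corrected_total,
        "s": math.sqrt(ms_res),  # model standard deviation
    }


# Values copied from the analysis of variance table in Figure 26.
stats = anova_summary(2467320.53578, 828081.9588, n=360)
```

With these inputs the sketch gives F of about 1066.68, rsquare of about 0.7487 and s of about 48.09, matching the printout; 2*s is about 96 degrees C.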

Coefficients table:

The model is: flash point (C) = 259.1 - 7722.05*(1/enthalpy of vaporization)
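Applying the fitted model is a single expression. A sketch using the printed coefficients (the names are hypothetical; the printout does not state the units of enthalpy of vaporization, so inputs must be in whatever units the data set used):

```python
INTERCEPT = 259.139  # intercept from the coefficients table
SLOPE = -7722.05     # coefficient of 1/(enthalpy of vaporization)


def predict_flash_point(h_vap):
    """Predicted flash point (degrees C) for enthalpy of vaporization h_vap."""
    return INTERCEPT + SLOPE / h_vap


def residual(observed, predicted):
    """Residual as printed in Figure 26: observed minus predicted."""
    return observed - predicted
```

For example, residual(-21, 27.8085) reproduces the acetal residual of -48.8085.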

Both the intercept and the regression coefficient are highly statistically significant (prob <<0.00001).
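Each t value is the coefficient divided by its standard error, and for a one-variable model the square of the slope's t equals the ANOVA F. A sketch checking both against the printed values (variable names are hypothetical; the printout lists the slope's t as a magnitude, without its sign):

```python
def t_statistic(coefficient, standard_error):
    """t value for testing whether a coefficient differs from zero."""
    return coefficient / standard_error


t_intercept = t_statistic(259.139, 5.52153)   # printed as 46.9325
t_slope = t_statistic(-7722.05, 236.437)      # printed as 32.6601 (magnitude)

# One predictor: t**2 for the slope should match F = 1066.68276,
# up to rounding in the printed coefficient and standard error.
f_from_t = t_slope ** 2
```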

Printout of predicted and residual values:

Use this table to find the largest outliers (the data points with the largest residuals) and check whether they have something obvious in common that could lead to a better model. For instance, if solvents are all well accounted for but surfactants are poorly predicted, consider developing separate models for solvents and surfactants.
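Ranking rows by the size of their residual is straightforward. A sketch using only the seven rows shown above (the full printout continues); dividing by s gives a rough scaled residual, not a formally studentized one. The names are hypothetical.

```python
# Observed and predicted flash points (degrees C) for the rows shown in Figure 26.
ROWS = [
    ("acetal", -21, 27.8085),
    ("acetaldehyde", -40, -29.363),
    ("acetic acid", 40, 51.613),
    ("acetic anhydride", 54, 62.7094),
    ("acetol", 56, 90.1554),
    ("acetone", -17, -6.97324),
    ("acetone cyanohydrin", 63, 105.799),
]

S = 48.09447  # model standard deviation s from the ANOVA table


def largest_outliers(rows, s, k=3):
    """Return the k rows with the largest |observed - predicted|, plus residual/s."""
    scored = [(name, obs - pred, (obs - pred) / s) for name, obs, pred in rows]
    scored.sort(key=lambda row: abs(row[1]), reverse=True)
    return scored[:k]


for name, res, scaled in largest_outliers(ROWS, S):
    print(f"{name:20s} residual={res:9.3f}  residual/s={scaled:6.2f}")
```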


Contributions to PRESS (Predictive Residual Sum of Squares):

         Compound            Predictive discrepancy
--------------------------   ----------------------
acetal                        2405.442
acetaldehyde                  115.3149
acetic acid                   135.8609
acetic anhydride              76.35809
acetol                        1173.177
acetone                       102.0248
acetone cyanohydrin           1842.063
acetonitrile                  1.354055
...(and so on)...

Total PRESS = 843833.3
Sum of squares of response (SSY) = 8277222
Press/SSY = 0.1019464

The model appears to be reasonable and has passed the cross-validation test.

Figure 27. PRESS output for the one-variable flash point model. The larger the predictive discrepancy, the more influential the data point, and leaving out an influential data point can greatly affect the model. You may want to leave out some of the more influential points and rerun the analysis. Also, as with the table of residuals (Figure 26), you may want to look for obvious patterns in the types of chemistries that are influential.

At the bottom of the table is an assessment of whether the model meets the Press/SSY <= 0.4 cut-off. Failing it would mean that just a few data points contribute most of what the model accounts for; in that case you may want to remove some of these data points and redo the whole analysis.
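The Predictive discrepancy column is consistent with squared leave-one-out prediction errors: each entry is slightly larger than the square of the corresponding residual in Figure 26. The printout does not say how SSY is computed (8277222 matches neither total sum of squares in Figure 26), so the sketch below assumes the corrected sum of squares of the response, a common choice. It uses the standard leverage shortcut for a one-variable least-squares fit; the function names are hypothetical.

```python
PRESS_SSY_CUTOFF = 0.4  # cut-off quoted in the text


def press_contributions(x, y):
    """Squared leave-one-out prediction errors for the fit y = a + b*x.

    Uses the exact shortcut e_i / (1 - h_ii): refitting without point i
    and predicting it gives the ordinary residual e_i scaled by its
    leverage h_ii.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    out = []
    for xi, yi in zip(x, y):
        h = 1.0 / n + (xi - mx) ** 2 / sxx  # leverage of point i
        e = yi - (a + b * xi)               # ordinary residual
        out.append((e / (1.0 - h)) ** 2)
    return out


def press_check(x, y):
    """Return (PRESS, PRESS/SSY, passes), assuming SSY = corrected SS of y."""
    press = sum(press_contributions(x, y))
    my = sum(y) / len(y)
    ssy = sum((yi - my) ** 2 for yi in y)
    ratio = press / ssy
    return press, ratio, ratio <= PRESS_SSY_CUTOFF
```

Sorting the output of press_contributions in descending order lists the most influential points first, which is one way to pick candidates for leaving out.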


Figure 28. Plot of observed versus predicted values of flash point for the one-variable model. Outliers (data points farthest from the regression line) can be identified by clicking on them in the graph with the mouse. Check that outlying data points are not typing errors. Consider whether the calculated field (1/enthalpy of vaporization) is likely to describe the outlying compounds well; if not, exclude them from the analysis and tell users of the model that it is not reliable for those types of molecules.


Figure 29. Plot of predicted values of flash point against the residual values (residual = observed minus predicted value). Ideally this plot is a scatter of points in a completely random arrangement. Curvature or other patterns may indicate problems such as intercorrelation of the dependent and independent variables, or the need to transform one of the fields or to add a parabolic (squared) or cross-product term. Examining the residuals is a critical part of model validation. Further discussion is in the online Help file.
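Besides inspecting the plot by eye, one simple heuristic (not a feature of the program) is to sort the points by predicted value, split them into equal-count bins, and look at the mean residual in each bin. A sketch, with hypothetical names:

```python
def binned_mean_residuals(predicted, residuals, n_bins=5):
    """Mean residual within equal-count bins ordered by predicted value.

    For a well-behaved model every bin mean should be near zero. Positive
    means at both ends and negative means in the middle (or the reverse)
    suggest curvature, e.g. a missing squared term or transformation.
    """
    if len(predicted) != len(residuals):
        raise ValueError("predicted and residuals must have the same length")
    if len(predicted) < n_bins:
        raise ValueError("need at least one point per bin")
    pairs = sorted(zip(predicted, residuals))  # order points by predicted value
    size = len(pairs) // n_bins
    means = []
    for k in range(n_bins):
        start = k * size
        # The last bin absorbs any leftover points.
        end = len(pairs) if k == n_bins - 1 else start + size
        chunk = [r for _, r in pairs[start:end]]
        means.append(sum(chunk) / len(chunk))
    return means
```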