Fitting a PCM model

PCMFit(
  X,
  tree,
  model,
  SE = matrix(0, PCMNumTraits(model), PCMTreeNumTips(tree)),
  metaI = PCMInfo(X, tree, model, SE),
  positiveValueGuard = Inf,
  argsPCMParamLowerLimit = NULL,
  argsPCMParamUpperLimit = NULL,
  matParInit = NULL,
  numRunifInitVecParams = if (is.null(matParInit)) 1000L else 0L,
  numGuessInitVecParams = if (is.null(matParInit)) 100L else 0L,
  numCallsOptim = 10L,
  control = NULL,
  doParallel = FALSE,
  verbose = FALSE
)

Arguments

X

a k x N numerical matrix with possible NA and NaN entries. For i=1,..., N, the column i of X contains the measured trait values for species i (the tip with integer identifier equal to i in tree). Missing values can be either not-available (NA) or not existing (NaN). These two values are treated differently when calculating likelihoods (see PCMPresentCoordinates).

tree

a phylo object with N tips.

model

an S3 object specifying both, the model type (class, e.g. "OU") as well as the concrete model parameter values at which the likelihood is to be calculated (see also Details).

SE

a k x N matrix specifying the standard error for each measurement in X. Alternatively, a k x k x N cube specifying an upper triangular k x k factor of the variance covariance matrix for the measurement error for each tip i=1, ..., N. When SE is a matrix, the k x k measurement error variance matrix for a tip i is calculated as VE[, , i] <- diag(SE[, i] * SE[, i], nrow = k). When SE is a cube, the way how the measurement variance matrix for a tip i is calculated depends on the runtime option PCMBase.Transpose.Sigma_x as follows:

if getOption("PCMBase.Transpose.Sigma_x", FALSE) == FALSE (default):

VE[, , i] <- SE[, , i] %*% t(SE[, , i])

if getOption("PCMBase.Transpose.Sigma_x", FALSE) == TRUE:

VE[, , i] <- t(SE[, , i]) %*% SE[, , i]

Note that the above behavior is consistent with the treatment of the model parameters Sigma_x, Sigmae_x and Sigmaj_x, which are also specified as upper triangular factors. Default: matrix(0.0, PCMNumTraits(model), PCMTreeNumTips(tree)).

metaI

a list returned from a call to PCMInfo(X, tree, model, SE), containing meta-data such as N, M and k. Alternatively, this can be a character string naming a function or a function object that returns such a list, e.g. the functionPCMInfo or the function PCMInfoCpp from the PCMBaseCpp package.

positiveValueGuard

a real number (not necessarily positive) used during the fit as a threshold for highly positive but likely incorrect log-likelihood values. This argument is set to Inf by default and should be used only as a last escape. The recommended way to of preventing such numerical errors is by setting more stringent values for the runtime options PCMBase.Threshold.EV and PCMBase.Threshold.SV (see PCMOptions).

argsPCMParamLowerLimit, argsPCMParamUpperLimit

named lists with arguments passed to the functions PCMParamLowerLimit and PCMParamUpperLimit, respectively. Default: NULL.

matParInit

a matrix of any number of rows and p columns where, p is the number of variable numerical parameters in the model (equal to PCMParamCount(model)). Each row of this matrix specifies a suggested starting location for the optim L-BFGS-B run. Default: NULL, meaning that the initial parameters are to be chosen at random and/or using calls to GuessInitVecParams function.

numRunifInitVecParams, numGuessInitVecParams

integers specifying how many parameter vectors should be drawn from a uniform distribution between PCMParamLowerLimit(model) and PCMParamUpplerLimit(model), and how many parameter vectors should be generated by jittering the resulting vector from a call to GuessInitVecParams. Before starting the optimization the model likelihood is evaluated at each of these vectors and the top numCallsOptim vectors are chosen for starting locations for optimization. The default settings are numRunifInitVecParams = if( is.null(matParInit) ) 1000L else 0L and numGuessInitVecParams = if( is.null(matParInit) ) 100L else 0L.

numCallsOptim

integer specifying the maximum number of calls to optim function. Default: 10. Note that this parameter would be overwritten by a smaller nrow(matParInit) (if matParInit is specified) or by a smaller number of generated initial parameter vectors that satisfy the parameter limits (see also the arguments numRunifInitVecParams,numGuessInitVecParams and argsPCMParamLowerLimit, argsPCMParamUpperLimit).

control

a list passed as control argument to optim. Default: NULL.

doParallel

logical indicating if optim calls should be executed in parallel. Default: FALSE.

verbose

logical indicating if information messages should be printed to the console while running. Default: FALSE.

Value

an object of class PCMFit

References

Mitov, V., Bartoszek, K., & Stadler, T. (2019). Automatic generation of evolutionary hypotheses using mixed Gaussian phylogenetic models. Proceedings of the National Academy of Sciences of the United States of America, 35, 201813823. http://doi.org/10.1073/pnas.1813823116

Mitov, V., Bartoszek, K., Asimomitis, G., & Stadler, T. (2019). Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts. Theoretical Population Biology. http://doi.org/10.1016/j.tpb.2019.11.005

See also

PCMFitMixed PCMOptions