nlmixr2 2.1.0+ new estimation methods
By Matthew Fidler in nlmixr2
February 7, 2024
nlmixr2 2.1.0 was released and I promised to talk about the new features.
One of the things that can impact many peoples work-flow is new estimation methods for population-only data. Many people use population-only estimation methods before changing the model to a mixed effect model, so I believe these can be useful for many people trying to find the best model to the data at hand.
I will talk about the new ones (and why you may want to use them).
Gradient free optimization
There are three gradient-free optimization methods.
bobyqafromnewuoa::bobyqafor gradient free constrained optimizationuobyqafromnewuoa::uobyqafor gradient free unconstrained optimizationoptimthat uses the R functionstats::optimfor the gradient-free methods;optimControl(method="Nelder-Mead"),optimControl(method="SANN")andoptimControl(method="Brent").
I know most pharmacometricians can write these models functions themselves.
However, using the nlmixr2 interface gives the following advantages:
The log-likelihood is calculated from the model by
nlmixr2est/rxode2, you don’t have to write these equations by hand.The model is loaded into memory before the optimization begins. Because of this pre-loading, the optimization does not take as long as if you wrote a
rxode2model and used a R function to optimize and find the best solution.Parameters are scaled (based on the same method as nlmixr2, which uses parsing to try to make the parameters similar on a log-scale; this parses particular expressions and uses the symbolic gradient to try to scale the parameter). If you want it un-scaled you can turn this off in the control options.
The covariance of the estimates are calculated automatically (which is not done for a simple optimization problem)
At the solution the table of individual predictions, residuals etc are automatically added to an output table for quick diagnostic plots.
The fit is a
nlmixr2model object and can be used with any of the supporting packages, and can even be used in (my favorite feature) model piping.
So, I think there is some value to these methods over simply using the optimization directly.
nls – Traditional Nonlinear Regression
nlsusing the R functionstats::nls(or since closely relatedminpack.lm::nlsNM)
The nls estimation method has similar advantages as the
gradient-free method, but it also:
Calculates the gradient of each point and possibly the Hessian depending on the
nlsmethod. This is different from the rest of the likelihood methods since they calculate the likelihood of the problem overall, but thenlsmethods calculate these for each observation time-point.In addition to scaling based on parsed parameter forms, the scaling is adjusted by calculating the gradient at the initial parameter value. Then this gradient is scaled as well to be in the neighborhood of 1.
I have personally in testing this method that nls to be a bit less
robust than say nlminb or nlm (though it could be simply the data
set I am using).
nlminb & nlm – Optimization using symbolic gradients and Shi-adjusted finite difference Hessians
nlminbusingstats::nlminbandnlmusingstats::nlm
This has the same advantages of the nls method, though only one
likelihood is optimized.
When calculating the Hessian for
optimization the problem it uses the modified Shi 2021 algorithm
(https://arxiv.org/pdf/2110.06380.pdf) to pick the optimal step-size
for calculating the Hessian from the gradient. Instead of optimizing
the gradient directly, the algorithm optimizes the harmonic mean of
the gradient of all points. This is the same method that was used for
the generalized log-likelihood in focei
(https://blog.nlmixr2.org/blog/2022-10-23-nlmixr2-llik/) and in my
testing performed better than other methods.
Optimization with symbolic gradients of likelihood
optimthat uses the R functionstats::optimfor the gradient methods [optimControl(method="L-BFGS-B"),optimControl(method="BFGS")andoptimControl(method="CG")].nlminbthat uses the R functionstats::nlminb(by default)lbfgsb3cto estimate population only likelihoods usinglbfgsb3c::lbfgsb3c().n1qn1uses the functionn1qn1::n1qn1
These are the remaining methods; They have all the same advantages as the methods above but they do not use the Hessian in the optimization.
Looking backward
The method "focei" still downgrades to a log-likelihood when there
is no between subject variability. Unlike the above methods, the
problem gradient (or Hessian) is not calculated. This matches the
behavior of the “focei” that optimizes the “outer” problem
numerically.
If/when we add a gradient-based outer optimization, we will add the gradients to that method. However, it will likely be called something different.
Looking forward
You may wonder if we would add another population estimation method
based on another excellent optimization package in R. I don’t mind,
though it will probably go in babelmixr2 instead of the core
nlmixr2. These methods were chose because they were part of base R
or already imported into nlmixr2 (with the exception of minpack.lm
which was very close to the nls implementation so it was added).
I am grateful for all the people who have found nlmixr2 in their
work, and those who have given bug reports so nlmixr2 can get
better.
- Posted on:
- February 7, 2024
- Length:
- 4 minute read, 806 words
- Categories:
- nlmixr2
- Tags:
- estimation
- See Also: