## How do I perform model selection with mgcv? ##
For general advice on model selection from author of mgcv, Simon Wood, see the mgcv documentation article [gam.selection][1]. This article starts with a section titled *Smoothness Selection Criteria*, which refers specifically to how mgcv decides how to smooth each term of a GAM--e.g. what basis function is best and where to place knots to define the spline. This may seem strange to practitioners who come from a linear modeling background and may be accustomed to "model selection" referring specifically to the process of deciding which terms to include in a model. The article covers that process next, in sections titled *Automatic Term Selection* and *Interactive Term Selection*.
Below, we address some common questions about smoothness selection and term selection. Many of these questions are addressed in the gam.selection article, albeit tersely.
## Which smoothness selection method should I use? ##
In general, we recommend you use REML: ```gam(..., method="reml")```. The main alternatives, GCV and GCV.Cp, are prone to under-smoothing (and overfitting). Also, REML yields more pronounced optima during model fitting, leading to fewer optimization issues and less variable estimates of smoothing parameters. See Section 1.1 of [Wood (2011)][2] and [this writeup][3] for more information.
![GCV vs. REML][4]
## Should I limit the degrees of freedom for model terms, e.g. by specifying s(..., k=4)? ##
Ultimately, this is a philosophical question that is best decided by considering what you are trying to accomplish with your model. Some reasons we have seen for limiting the degrees of freedom include:
- To force relationships to conform to simple shapes expected by ecological theory, thereby making them, in a certain sense, easier to interpret ecologically. For example, by limiting a term to three degrees of freedom (by specifying `s(..., k=4)` in the mgcv formula), relationships may be no more complex than unimodal, which, if the relationship is n shaped rather than u shaped, resembles an ecological envelope.
- To force relationships to be simple and focused on the strongest effect and thereby improve transferability of the model to other times and locations (for some discussion, see [Yates et al. 2018][5]).
Arguments against this practice include:
- It results in poorer fitting models.
- When used alone, it is not effective in forcing models to be parsimonious when covariates are correlated. To force parsimony you must also limit the number of terms allowed to be in the model, because when terms are correlated and k is limited, mgcv will retain more terms, thus utilizing a large number of degrees of freedom over the aggregate suite of covariates, rather than with just one. By unlimiting k in this situation, mgcv is more likely to discard correlated terms and yield a model with fewer terms.
In general, unless you are trying to force relationships to resemble particular shapes or promote transferability by forcing arbitrary restrictions on complexity, we recommend you do not limit the degrees of freedom for your model terms. Rather, we recommend you check your results to ensure the k parameter is *not set too low*. This would be your concern, for example, if your modeling priority was to provide the best possible predictions for the study area and surveyed period, rather than focusing on interpretability or transferability. For advice on adjusting k accordingly, see the mgcv documentation article [choose.k][6].
## How do I perform automatic term selection with mgcv? ##
mgcv endeavors to simplify the modeler's job by selecting model terms in one run, without needing to perform traditional processes such as forward selection, backwards selection, or fitting and comparing comparing all possible combinations of terms. This capability is enabled through a statistical technique called "shrinkage". For technical details, please see [Marra and Wood (2011)][7], who describe several methods for performing automatic term selection with GAMs, the first two of which are implemented in mgcv.
The first method, referred to by Marra and Wood as the *double penalty approach* and the [mgcv documentation][8] as *null space penalization*, is enabled by specifying `gam(..., select=TRUE)`. This effectively enables shrinkage for all smoothed terms in the model, using the "double penalty" technique described in the paper.
The second method, referred to by Marra and Wood as the *shrinkage approach* and the [mgcv documentation][9] as *shrinkage smoothers*, is enabled by specifying on a per-term basis that a shrinkage smoother should be used. For example, to specify that a term should be smoothed with the shrinkage version of the thin plate regression spline, specify `s(..., bs="ts")` in the model formula. In general, if you are unsure which smoother to use, we recommend this one. (Note that if the `bs` parameter is not specified, the default is `bs="tp"`, which is a thin plate regression spline without shrinkage.) You can also try a cubic regression spline with shrinkage, with `bs="cs"`.
The Marra and Wood paper goes on to evaluate the performance of these approaches and several others with simulated and real data. We are aware of both methods being used in density models of cetaceans in the United States and at this time we do not recommend one over the other for marine mammal density surface modeling (although this might change in the future as we accrue additional experience).
To illustrate the shrinkage smoother approach, we modeled abundance of sperm whales on survey segments used by [Roberts et al. (2016)][10] for models off the east coast of the United States:
model <- gam(Abundance ~ offset(log(Area)) + s(Depth, bs="ts") + s(Slope, bs="ts") + s(I(DistToCan/1000), bs="ts"), family=nb(), segs, method="REML")
In this model, we modeled abundance from three covariates: the depth and slope of the sea floor and the distance to the closest submarine canyon. We used thin plate regression splines with shrinkage for each term. (The code `I(DistToCan/1000)` simply converts the distance from meters to kilometers.) We used a negative binomial distribution, with mgcv estimating the $theta$ parameter during smoothness and term selection. We used the area of the survey segment as a model offset (log transformed to match the link function for negative binomial models), as usual for density surface models fitted with mgcv.
During the model fitting procedure, mgcv "shrank to zero" the Slope covariate but not the Depth or DistToCan covariate. This can clearly be seen in the model's term plots, in which the Slope term appears to be the horizontal line $y=0$ with confidence limits enclosing zero at all values of the covariate.
![Sperm whale example model term plots][11]
In the `summary(model)` output, the *Approximate significance of smooth terms* table shows an estimated degrees of freedom (edf) and Chi square score (Chi.sq) close to zero, with a p-value > 0.05:
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(Depth) 3.156046 9 71.567 <2e-16 ***
s(Slope) 0.002644 9 0.002 0.413
s(I(DistToCan/1000)) 1.219743 9 105.841 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You can use these statistics to detect a shrunk term. It is usually sufficient to check for p-values > 0.05:
> summary(model)$s.pv
[1] 1.188157e-17 4.130385e-01 5.829356e-26
## Should I remove shrunk terms and refit the model? ##
Opinions vary. Some prefer to refit and obtain a model having only significant terms, as is done with traditional backwards selection. A benefit is that fewer covariates will be needed to predict the model.
Others prefer to leave insignificant terms in, to account for what was tried.
REML and AIC scores of models with and without such terms, and predicted densities, are usually almost identical.
## Should I remove terms not shrunk to zero that have low estimated degrees of freedom (EDF)? ##
Opinions vary. For Duke’s Navy density models of the U.S. east coast and Gulf of Mexico, we generally remove terms with EDF < 0.85, out of a desire to obtain parsimonious models and because mgcv’s estimated confidence intervals enclose zero when the EDF < 0.85.
To get the EDFs:
```R
> summary(model)$edf
[1] 3.156046242 0.002643643 1.219742914
```
Example of EDF < 0.85 from a candidate model for Clymene dolphin:
```
Approximate significance of smooth terms:
edf Ref.df F p-value
s(x,y) 11.4242 49 0.939 2.21e-07 ***
s(log10(pmax(ClimEpiMnkPB1, 0.001))) 0.7695 9 0.303 0.0349 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
![Clymene dolphin functional plot where EDF < 0.85][12]
## Given a set of fitted candidate models (e.g. with different covariates) how do I select the best model? ##
It depends on what you mean by "best". Such decisions are usually judgment calls that balance multiple competing goals, such as parsimony and precision, that usually depend on the scientific or management objectives of the study. We cannot recommend a universally-applicable method for selecting between candidate models.
## Ok, if I ask you to rank them using a traditional numerical goodness-of-fit criterion such as AIC, which criterion would you use? ##
AIC can always be used.
If you used REML as the smoothing criterion, and all of the models' unpenalized fixed effects (i.e. those not smoothed with shrinkage) are specified exactly the same way, you can use REML.
In practice, we have noticed that ranking models by AIC and REML usually result in the same order, but not always (particularly when the number of terms differ).
[1]: https://www.rdocumentation.org/packages/mgcv/topics/gam.selection
[2]: http://dx.doi.org/10.1111/j.1467-9868.2010.00749.x
[3]: https://github.com/DistanceDevelopment/dsm/wiki/Why-is-the-default-smoothing-method-%22REML%22-rather-than-%22GCV.Cp%22%3F
[4]: https://files.osf.io/v1/resources/wgc4f/providers/osfstorage/5c65f75616506900160d9c09?mode=render
[5]: https://doi.org/10.1016/j.tree.2018.08.001
[6]: https://www.rdocumentation.org/packages/mgcv/topics/choose.k
[7]: https://doi.org/10.1016/j.csda.2011.02.004
[8]: https://www.rdocumentation.org/packages/mgcv/step.gam
[9]: https://www.rdocumentation.org/packages/mgcv/step.gam
[10]: http://%20https://doi.org/10.1038/srep22615
[11]: https://files.osf.io/v1/resources/wgc4f/providers/osfstorage/5cfd88b33cd70c0016028cb1?mode=render
[12]: https://files.osf.io/v1/resources/wgc4f/providers/osfstorage/5c65f64314312700199e8fa2?mode=render