Varieties of Mobility Measures: Comparing Survey and Mobile Phone Data during the COVID-19 Pandemic

Abstract
Human mobility has become a major variable of interest during the COVID-19 pandemic and central to policy decisions all around the world. To measure individual mobility, research relies on a variety of indicators that commonly stem from two main data sources: survey self-reports and behavioral mobility data from mobile phones. However, little is known about how mobility from survey self-reports relates to popular mobility estimates using data from the Global System for Mobile Communications (GSM) and the Global Positioning System (GPS). Spanning March 2020 until April 2021, this study compares self-reported mobility from a panel survey in Austria to aggregated mobility estimates utilizing (1) GSM data and (2) Google’s GPS-based Community Mobility Reports. Our analyses show that correlations in mobility changes over time are high, both in general and when comparing subgroups by age, gender, and mobility category. However, while these trends are similar, the size of relative mobility changes over time differs substantially between different mobility estimates. Overall, while our findings suggest that these mobility estimates manage to capture similar latent variables, especially when focusing on changes in mobility over time, researchers should be aware of the specific form of mobility different data sources capture.

To test whether panel attrition could have affected the survey mobility estimates, we check which respondents left the survey and whether the resampled respondents have similar mobility characteristics. We assume that those who left the panel and those who were resampled should differ in mobility behavior from the panelists remaining in the survey to a similar degree. Equation (1) models the mobility measure $\text{mobility}_i$ of each individual $i$:
$$\text{mobility}_i = \beta_1 \, \text{part}_{i,t-1} + \beta_2 \, \text{part}_{i,t} + \beta_3 \, \text{wave}_{i,t} + \alpha + \epsilon_i \tag{1}$$
Here, $\text{part}_{i,t-1}$ is a dummy variable indicating whether a respondent participated in the previous wave $t-1$, and $\text{part}_{i,t}$ is a dummy variable indicating whether the respondent was present in the current survey wave $t$. We estimate a separate model for each possible pair of survey waves and compare the coefficients of the two dummy variables using a linear hypothesis test (Fox and Weisberg 2019). The basic idea is that those who dropped out and those who were resampled should have roughly similar differences in mobility compared to those who are present in both waves. Hence, the coefficients of the two dummy variables should be roughly equal in size (formally, $\beta_1 = \beta_2$).
We find that the average difference between these two coefficients is quite small (0.0118). Of 215 differences, 16 are statistically significant at p < 0.05 and 29 at p < 0.1. These shares are roughly equal to what we would expect because of false positives. This suggests that those joining or rejoining the sample differ from those who stay in the sample to a similar degree as those who drop out. Aggregate comparisons between waves should therefore provide quite consistent results if the assumption holds that the initial selection of participants into the survey is independent of patterns of change in the population.
We note that these differences are small not only because similar panelists were resampled but also because the average mobility of those who dropped out did not differ much from that of those who remained in the sample ($\beta_1 = 0.023$).
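The comparison with the expected number of false positives can be made explicit with a little arithmetic. The sketch below uses only the counts reported above (16 and 29 significant differences out of 215) and sets the observed shares against the nominal significance levels. It assumes, purely for illustration, that every null hypothesis is true and ignores dependence between wave pairs. The function name is hypothetical.

```python
# Observed shares of significant tests versus the nominal false-positive rate.
N_TESTS = 215  # number of tested differences reported in the text


def significant_share(n_significant: int, n_tests: int = N_TESTS) -> float:
    """Return the observed share of statistically significant tests."""
    return n_significant / n_tests


for alpha, n_sig in [(0.05, 16), (0.10, 29)]:
    observed = significant_share(n_sig)
    # Expected number of rejections if every null hypothesis were true
    # (illustrative assumption; tests across wave pairs are not independent).
    expected_count = alpha * N_TESTS
    print(
        f"alpha={alpha:.2f}: observed share {observed:.1%} ({n_sig} tests), "
        f"expected count under the null {expected_count:.2f}"
    )
```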
To check for potential biases because of item nonresponse, we first assess the shares of missing values for our aggregate mobility estimate and each mobility component.
[...] in their survey questions if field periods increase in time.
Note: Item nonresponse: 0 = responded, 1 = avoided an answer ("Don't know" or "No answer"). Because of space considerations, we report only every third wave fixed effect. The model contains all wave dummies. Full results are available on request. Standard errors within parentheses.
limit, the estimate obtained from the survey is limited by the 'daily' answer option. This limit, however, should not lead to substantial measurement error due to data censoring since most people would not go to work, go shopping or do sports more often than daily.
This notion is supported by the low shares of respondents choosing the answer category 'daily' (see Figure S1 below). Despite this limit in the survey, the self-reported data feature more work-related mobility than Google's Mobility Reports. This suggests that reduced variance in relative mobility due to an upper limit in the survey scale should not be a major concern.
The survey question on whether respondents went out to 'buy non-food products' was only asked from the fourth wave onward, because until then the governmental restrictions in Austria essentially prohibited nonessential shopping. Hence, our additive index of mobility consists of 8 items until wave three and 9 items afterwards. To account for this difference, we also exclude the place category 'retail & recreation' from the Google estimate in the first three waves. Cronbach's alpha for the survey index on mobility is 0.74 across all waves, indicating that the index represents a solid construct. Dropping the items on 'work' and 'walking pets' increases alpha by only 0.02, leading to the conclusion that the index from all items is reliable.
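For readers who want to reproduce a reliability check of this kind, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the summed index). The function name and the toy responses are hypothetical and are not the survey data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for an additive index.

    `items` is a list of per-item response lists; each inner list holds one
    value per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")
    # Summed index per respondent.
    totals = [sum(values) for values in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: three items answered by five respondents (invented values).
toy_items = [
    [0, 1, 2, 3, 4],
    [0, 1, 1, 3, 4],
    [1, 1, 2, 2, 4],
]
print(f"alpha = {cronbach_alpha(toy_items):.2f}")
```

Recomputing alpha after removing one inner list from `items` gives the "alpha if item dropped" comparison described above.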

B Baseline week
Our baseline week (March 23-29) contains no major Austrian holidays or (non-pandemic related) school closures. Nevertheless, the choice of the baseline week can have substantial effects on our findings. To study how robust our results are to changes in the baseline week of our estimates, we calculated correlations and coefficients for different baselines.
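Relative mobility with respect to a baseline week can be formed in two ways: aggregate the mobility categories first and then take the relative change, or take the relative change per category and then average those changes with equal weight. The sketch below contrasts the two with invented category values. All names and numbers are hypothetical.

```python
def relative_change(value, baseline):
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


def aggregate_then_relative(week, base):
    """Sum all categories first, then compute one relative change."""
    return relative_change(sum(week.values()), sum(base.values()))


def relative_then_average(week, base):
    """Compute a relative change per category, then take the unweighted mean."""
    changes = [relative_change(week[cat], base[cat]) for cat in base]
    return sum(changes) / len(changes)


# Invented absolute mobility values for a baseline week and a later week.
base_week = {"work": 4.0, "shopping": 1.0}
later_week = {"work": 4.0, "shopping": 2.0}

print(aggregate_then_relative(later_week, base_week))  # small category barely moves the total
print(relative_then_average(later_week, base_week))    # small category counts as much as a large one
```

In the second variant a category with a low baseline value can dominate the overall index, which is one reason the selected reference week may matter more for an index built from category-level relative changes.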
Correlation coefficients between the survey and the GSM estimates are unaffected by changes in the selected reference week. Differences only emerge with regard to Google's Mobility Reports, because this estimate relies on the assumption that every relative change in the mobility categories is equally important for relative changes in the overall mobility estimate. The other estimates calculate the aggregate indices first and then calculate the relative changes. We cannot use this approach for the Google data due [...]

C Mobility trends by subgroups
Figure S2 shows the trends in relative mobility by subgroup throughout the pandemic, analogous to Figure 1 in the main manuscript. Overall, gender differences are quite similar when comparing mobile phone and survey data. Both estimate slightly higher relative increases in mobility after the lockdown for women compared to men. While these differences remain quite stable in the survey data, they disappear after summer 2020 in the GSM data. Notably, mobile phone data suggest higher effects of the end of the lockdown in April and May 2020 and at the beginning of 2021. Absolute mobility trends are depicted in Figure S3. These trends show that, while mobility of women increased more than mobility of men, absolute mobility estimates are always higher for men than for women. Again, the differences in mobility estimates between the two data sources are quite small.
The results are slightly different when focusing on subgroup differences by age. Younger and older subjects generally show higher relative increases in mobility compared to others (see Figure S2). As suggested in the main text, this might stem from differences in the importance of work-related mobility in these age groups. Because work-related mobility remained important during the lockdown (not all workers could work from home), the working-age population had fewer opportunities to decrease their mobility. This higher mobility level decreases the potential for relative increases after the end of the lockdown, as larger absolute changes in mobility nevertheless appear smaller in the relative mobility estimate. In line with this interpretation, the lockdown at the end of 2020 seems to have decreased these age differences. Again, increases in mobility at the beginning of 2020 and 2021 are more pronounced in the mobile phone data compared to the survey estimates. Figure S3, which contains trends of absolute mobility estimates by age group, supports this explanation, as those of working age show lower increases in mobility in the first half of 2020 compared to the younger and older age groups. The absolute mobility estimates also indicate that mobility decreases with higher age both in the GSM and in the survey data.
[...] Figure S5, which focuses on differences in absolute mobility for the GSM and survey estimates, shows that these relative differences also emerge due to large differences in the initial raw mobility values. As Google does not provide absolute mobility estimates, the graph shows the relative changes based on Google's initial baseline. This indicates large relative decreases in the mobility related to 'shopping (other)'. While also low in the survey data (∼0.1, indicating the average answer was between 'never' and 'on some days'), the survey mobility estimate increased less than the Google estimate. Besides higher initial values at the start of the pandemic in the survey estimate, this could also be because Google's 'retail and recreation' place category covers more mobility categories (such as visits to restaurants and movie theaters) than [...]
In addition to basic correlation coefficients, we also calculated regressions on aggregated mobility estimates with time fixed effects, including dummy variables for the data sources (models 1-4), subgroups (models 3-4), and their interactions (to account for heterogeneous effects across data sources). The coefficient estimates of these models are presented in Table S2.

E Results using a single item measure
To check the robustness of our results, we run similar analyses using a second mobility measure that utilizes respondents' self-reported frequency of leaving their home. This variable is only available every second wave. Specifically, we compare the weekly absolute shares of people having a ROG of more than 500 meters to the absolute normalized survey wave average of a question asking for respondents' frequency of staying at home (we invert this variable; the question wording can be found in Appendix B). The results reported in Figure S6 and Figure S7 are quite similar to the ones using the additive index.
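The two quantities compared here can be sketched as follows. This is an illustration under stated assumptions, not the exact processing used for the study: the per-person ROG values in meters, the 1 to 5 answer scale, and both function names are hypothetical.

```python
def share_above_threshold(rog_meters, threshold_m=500.0):
    """Share of people whose ROG exceeds the threshold (in meters)."""
    return sum(r > threshold_m for r in rog_meters) / len(rog_meters)


def inverted_normalized_mean(responses, scale_min, scale_max):
    """Rescale stay-at-home answers to 0-1 and invert them.

    After inversion, higher values mean leaving home more often.
    """
    span = scale_max - scale_min
    normalized = [(r - scale_min) / span for r in responses]
    return 1.0 - sum(normalized) / len(normalized)


# Invented example values for one week and one survey wave.
weekly_rog = [120.0, 640.0, 950.0, 410.0]  # meters per person (hypothetical)
stay_home_answers = [1, 5, 3]              # assumed scale: 1 = never ... 5 = always

print(share_above_threshold(weekly_rog))
print(inverted_normalized_mean(stay_home_answers, scale_min=1, scale_max=5))
```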
Correlations are high and rather consistent. The differences between genders are also consistent across the estimates. Differences between age groups indicate that younger individuals exceed the mobility thresholds more frequently, which is particularly visible in the survey measure. The trends in the GSM measure might indicate that this indicator reaches upper bounds for the lower age groups. However, here too, the elderly are clearly less mobile.
[...] to pass a lockdown limited to those specific provinces starting on 1st of April. Burgenland returned to the nationwide rules on 19th of April, while Vienna and Lower Austria prolonged the lockdown until 2nd of May. We use this disparity within Austria for a difference-in-differences (DiD) estimation of the effectiveness of regional lockdowns in reducing mobility, comparing the mobility in Austria's east and west using all three estimates. If the estimates follow the same underlying trends in mobility, we would expect somewhat similar sizes of the overall lockdown effect. Moreover, as possibilities for shopping were most affected by the regional lockdown, we would expect higher lockdown effects in this mobility category.
To estimate the DiD models, we leverage information on respondents' home region in the survey data and utilize the fact that Google provides mobility estimates by region in addition to the estimates by country. Furthermore, the GSM data also contain information about the regional variation of the daily median ROG. In this dataset, the home location is calculated using the nighttime location of the mobile devices.
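The logic of a two-group, two-period DiD comparison can be sketched in a few lines. The values below are invented and the function name is hypothetical; the sketch only illustrates the estimand, namely the change in the treated regions minus the change in the control regions, and does not reproduce the models estimated for this study.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences for two groups and two periods."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Invented relative mobility values (percent of a baseline), not study data.
east_march, east_april = [100.0, 104.0], [90.0, 94.0]
west_march, west_april = [98.0, 102.0], [108.0, 112.0]

effect = did_estimate(east_march, east_april, west_march, west_april)
print(f"DiD estimate: {effect:.1f} percentage points")
```

In a regression with a treatment dummy, a post-period dummy, and their interaction, this quantity corresponds to the interaction coefficient, while the constant equals the pre-treatment mean of the control group.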
We assessed the validity of this DiD estimation strategy by first testing the critical parallel trends assumption. Figure S8 visually confirms that trends in average mobility estimates in all three datasets between March 2020 and March 2021 (i.e., pre-treatment) were rather similar in the treatment group (Burgenland, Lower Austria, and Vienna) and the control group (Carinthia, Upper Austria, Salzburg, Styria, Tyrol, and Vorarlberg). Figure S9 shows that this also holds if we focus on mobility in the category 'shopping (other)' in those data sources with distinguishable mobility categories. Figure S8 shows an increase in mobility between March 2021 and April 2021 in the control group and a clear decrease in the treatment group within this time frame. Our DiD approach enabled us to estimate the causal impact of this lockdown treatment (Table 1). The constants describe the average mobility in the control group before the treatment.
[...] points. This is also captured visually in the more pronounced variation between March and April of 2021 in Figure S9.
Structurally similar results are obtained when using survey data: model 4 suggests a reduction in mobility of 14.1 percentage points in the east due to the lockdown (not statistically significant at the .05 level). This effect strongly increases in model 5, which focuses on the single mobility measure indicating how often people went shopping (47.8 percentage points, p = 0.002). The GSM data (model 3) show the most pronounced general effect of the lockdown in Austria's east, indicating a reduction in mobility of 44.9 percentage points. Again, this shows the higher variation of this measure over time.
Generally, however, all measures capture a reduction in mobility in Austria's east due to the lockdown.
In addition, we note that differences between estimates could also stem from different definitions of an individual's home region. While cross-regional commuting traffic and tourism within Austria contribute to the mobility of an individual's usual home region in the GSM and possibly also in the survey data, this is likely to be different for the mobility estimate reported by Google.
Note: As we only observe aggregate changes within the Google and GSM measures, we cannot calculate estimates of the statistical variance of these effects with these data. Standard errors within parentheses.

Literature
Fox, John and Sanford Weisberg. 2019. "An R Companion to Applied Regression", Third Edition, London: Sage.