Agreeing to disagree? Explaining self–other disagreement on leadership behaviour

ABSTRACT Leadership research tends to treat differences among ratings of the same leaders as measurement error. Our study makes such varying perceptions of leadership behaviour its main phenomenon of investigation. We conceptualize divergent leadership ratings based on the difference between managers’ self-ratings and team members’ assessments of leadership behaviour. Using data from three German public organizations on 51 teams and 190 leader–follower dyads, we find that divergent leadership ratings are a function of managers’ motivation, their use of managerial reflection routines, and team members’ personality. The findings point to the importance of using multisource feedback and developing managers’ self- and other-awareness.


Introduction
Imagine a public manager who considers themselves to be a great leader, but whose team members do not agree. False self-perceptions of leadership can be just as detrimental to organizations as poor leadership behaviour itself. Inaccurate self-ratings are not only associated with lower performance but may also have a negative effect on subordinates' job satisfaction and organizational commitment (Fleenor et al. 2010, 1019). Divergent leadership ratings among managers and employees are an indicator of lacking self-awareness, which is seen as an important antecedent of leadership effectiveness (Atkins and Wood 2002; Atwater and Yammarino 1992). Or, as Fletcher (1997, 186) states: if self-perceptions differ from those of others, '[…] then it is difficult to see how one can manage work relationships successfully, how one can contribute well as a team member, and how one can adapt one's behaviour to the circumstances and individuals.' If managers are bad leaders, they receive training or mentoring, or are assigned fewer supervisory responsibilities. If managers are bad leaders but think they do a great job, and hence overestimate their leadership skills, organizations may be even worse off. In such cases, in which managers lack self-awareness, leadership problems may go undetected, or behavioural change may require coercion because managers do not see the need to alter their behaviour (Atwater et al. 2005). Team members will become frustrated due to inadequate or absent leadership, and they may leave the team or agency, or simply respond by showing low levels of motivation, engagement, or performance (Bass and Yammarino 1991; Lee and Carpenter 2018).
Practitioners of human resource management and leadership development are aware of the self-other disagreement problem (Gentry, Cullen, and Altman 2016; Sala 2003). More than twenty-five years ago, employing the Johari Window for mapping what is known or unknown to a person's self and to others was a de rigueur aspect in leadership training. Since then, training and development specialists have promoted the use of multisource feedback to offer managers a more realistic view of their leadership behaviour (Kutcher, Donovan, and Lorenzet 2010). So-called 180 and 360 (degree) reviews have become popular components of public agencies' leadership development programs (Van Wart 2003, 220; U.S. Office of Personnel Management 2017; Hunt and Ivergard 2007). The former type of reviews refers to assessments that include feedback from subordinates and superiors, while the latter incorporates additional horizontal feedback from peers or other sources (Chappelow 2004).
Meanwhile, research is trying to catch up with these developments. Although leadership research has not been ignoring issues related to incongruent ratings, we see a tendency to discount managers' self-ratings as unreliable and a preference to replace them with other-ratings from subordinates. However, by treating divergences in ratings as measurement error that needs to be minimized, we miss an opportunity to study and better understand a phenomenon with real-life relevance: misconceptions of leadership behaviour and factors that may drive or mitigate the difference in self-other rating agreement (Fleenor et al. 2010).
Scholarship in the areas of business management, organizational behaviour, and psychology has begun to address this research gap (for an overview, see Fleenor et al. 2010 as well as Lee and Carpenter 2018). While public management research has made a great deal of progress with regard to the understanding of different leadership behaviours and their respective impact (Van Wart 2013; Kroll and Vogel 2014; Vogel and Masal 2015; Tummers and Knies 2016; Crosby and Bryson 2018), we have given little attention to the topic of self-other rater disagreement (Hassan and Rohrbaugh 2009 as well as Jacobsen and Andersen 2015 are notable exceptions).
With this article, we want to spotlight the issue of divergent self- and other-ratings of leadership and stimulate discussion among public management scholars. We think this is necessary because the rating gap may, in fact, be larger in the public than in the private sector. Rigid hierarchies, less flexible career paths, and separation due to educational degrees (Bach and Kessler 2007; van Dorp 2018) could lead to more distance among political appointees, managers at different echelons, and employees, resulting in more rating disagreement. In addition, the perception gap may be more detrimental in the public sector as well. Due to the lack of financial incentives, fewer managerial resources, and more goal ambiguity (Hvidman and Andersen 2014; Pandey and Rainey 2007; Chen and Rainey 2014), it is often difficult to manage government organizations through 'systems' or 'by design,' and instead, such organizations have to rely on functional human relations, effective leadership, and realistic perceptions of such leadership behaviour.
Incongruent leadership assessments can be caused by misperceptions on the part of the leaders as well as the team members. With regard to leaders, we argue that their leadership perceptions will be more in agreement with those of team members, the more the former are able and willing to reflect on their own behaviour. In particular, we find that leaders with a strong motivation to lead show more agreement with follower assessments, pointing towards the importance of people's interest in being or becoming leaders. We also find that the creation of managerial learning routines, involving leaders as well as team members, increases self-other agreement. On the part of the team members, we expect that certain personalities may be less prone to producing incongruent assessments than others. More specifically, we find that employees' agreeableness decreases rating incongruence, while openness increases it.
One of the implications of this research is that in addition to leaders' self-awareness, a concept widely studied and praised in the business literature (Fleenor et al. 2010), other-awareness matters too. Employees with different personalities, levels of job satisfaction, and ages seem to differ systematically in the evaluations of their leaders. Leaders who are aware of this and take into account that different people have different leadership expectations will be more likely to meet these expectations, can customize their leadership behaviour, and can satisfy team members' needs. Our empirical analysis is based on data from three German public organizations from which the responses of 190 team members were matched with those of 51 team leaders.
A model of divergent self- and other-ratings of leadership

Previous research and an integrative approach
A meta-analysis by Lee and Carpenter (2018) identifies 41 published articles on rater incongruence across industries and sectors. They find that self-assessment and other-ratings of leadership are only moderately correlated, and that leaders tend to overestimate specific behaviours such as transformational leadership, servant leadership, and ethical leadership. Studies that examine public sector datasets confirm these findings. School principals' self-assessments of instructional and transformational leadership vary greatly from the other-assessments provided by the teachers in the same schools (Ham, Duyar, and Gumus 2015; Park and Ham 2016; Wang, Wilhite, and Martino 2016; for contrasting findings, see Muterera et al. 2018).
Research in the realm of public management on this topic includes work by Hassan and Rohrbaugh (2009) who examine the 360-degree feedback ratings of 68 midlevel public managers. This study finds that self-ratings as well as ratings by supervisors, peers, and subordinates were incongruent; the gap between self- and other-ratings was bigger than that among several types of other-ratings; and that these incongruences varied in different performance domains. In a study on leadership behaviour, Jacobsen and Andersen (2015) show that only employee-based leadership ratings, as opposed to self-ratings, were positively related to performance improvements, pointing towards the difference between intended and perceived leadership practices.
Whereas the public management literature does not provide many insights on factors influencing the identified differences between self- and other-ratings, work from other fields has documented important correlates. For example, managers in higher positions are more likely to overestimate their leadership behaviour, just like managers with lower educational attainment. Empathy mitigates self-other disagreement, whereas narcissism increases it. Managers in individualist cultures show less self-other agreement with regard to their leadership skills than managers in collectivist cultures (e.g. Atwater et al. 2009; Judge, LePine, and Rich 2006; Ostroff, Atwater, and Feinberg 2004; for a more detailed overview, see Fleenor et al. 2010).
While research on the topic is accumulating, we are still in search of holistic theories that help explain the gap between self- and other-ratings. We argue that incongruences in leadership ratings between managers and followers can be a function of factors occurring at different conceptual levels shown in Figure 1. Hence, we cannot fully assess the importance of factors at one level if we do not, at the same time, account for factors at a different level, which is why we take an integrative approach to the study of rater disagreement. Our perspective is in line with previous calls for more integrative theory on leader-environment-follower interactions (Wofford 1982) that is inclusive of, among other things, social dynamics and cognitive elements (Avolio 2007). In that sense, leadership is constructed through the relationships connecting individuals rather than being an attribute of individuals (Balkundi and Kilduff 2006, 420). Figure 1 shows our theoretical framework that integrates actors at three different levels. In line with previous work, we use both self-ratings by managers and other-ratings by followers and conceptualize disagreement as the lack of consensus between the ratings as our dependent variable (Atwater and Yammarino 1997; Fleenor et al. 2010). We call this 'self-other disagreement,' but also refer to it as 'rater incongruence or divergence' or 'perception gap.' We think of leaders as people who are in charge of managing other people ('team leaders' and 'team managers'), which is a much more inclusive definition than that of the organization's top management. We consider leadership as an essential part of managers' responsibilities (Mintzberg 2009) and do not differentiate between the terms leader and manager. We speak of followers, employees, subordinates, or team members when we refer to the people being led. The team leaders themselves have to report to superiors at a higher hierarchical level, whom we refer to as higher-tier supervisors.
Although it is rarely feasible to construct a model of self-other disagreement that incorporates multiple factors at each conceptual level, it was our goal to hypothesize and test one specific variable at each level that could be linked to rating incongruences. With regard to higher-tier supervisors, we think that their social expectations of team leaders may affect the latter's leadership behaviour and related self-ratings (Merton 1957; Shivers-Blackwell 2004). An important factor for team leaders themselves that may drive behaviour and self-perceptions thereof is their own motivation (Chan and Drasgow 2001; Gagné and Deci 2005). To capture the role of social interactions between leaders and followers, we turn to managerial routines that are not directly related to leadership but offer opportunities for both sides to participate in managing the work unit (Levitt and March 1988; Schedler and Proeller 2010). At the follower level, we focus on team members' personality, which is a stable trait known to be influential in predicting workplace attitudes and perceptions (Costa and MacCrae 1992; Tett and Guterman 2000). In what follows, we explain all theoretical linkages in greater detail.

Supervisor- and leader-related factors driving incongruence
A first factor that can explain differences in self-other disagreement consists of the signals leaders receive from their direct supervisors ('higher-tier supervisors'). Most managers also report to higher-tier superiors (department or division heads) themselves. We argue that studying the relationship between these managers (team leaders) and their supervisors helps to better understand leaders' behaviours and their self-perceptions thereof. In particular, we believe that supervisors' interest in managers' leadership generates normative pressure to care about leadership and to seek ways for improvement. Furthermore, interested supervisors may also provide important insights and feedback that will help managers to reflect on their own practices more objectively. The normative pressure generated by a supervisor's interest can be explained by role theory (Merton 1957). By being a team leader, managers fulfil a social role, which is associated with certain expectations the managers themselves, as well as others, have about how a team leader should behave (Shivers-Blackwell 2004, 43). These role expectations are directly or indirectly communicated to the team leaders (Kahn et al. 1964) and have a significant impact on how leaders behave. In addition, higher-tier supervisors' expectations are especially influential because their role expectations can be transformed into role pressure. The more intense the role pressure perceived by a person, the more likely they will succumb to the resulting expectations. We assume that an interested supervisor is more likely to communicate their role expectations and create, implicitly or explicitly, pressure on managers to take on an active leadership role.
A second mechanism through which higher-tier supervisors affect managers' leadership perceptions is external feedback. Multisource feedback which includes feedback from 'above' has been found to affect leaders' performance and yield more accurate self-assessments (Atkins and Wood 2002; Atwater, Roush, and Fischthal 1995; Day et al. 2014). If a supervisor is interested in how their subordinate managers lead, and if the supervisor addresses the issue of leadership with them, the subordinate managers gain a broader understanding of how their behaviour is perceived by others. Because these managers receive more feedback on their leadership behaviour, they are able to anchor their self-perceptions and better put their leadership practices into perspective, resulting in less perception incongruence.
A second variable that may affect managers' self-assessments is their 'motivation to lead,' which is their motivation to assume leadership responsibility (Chan and Drasgow 2001). Such motivation consists of three dimensions that tap into different motives: affective-identity (taking on leadership responsibility due to self-inclination and preference), social-normative (taking on leadership responsibility due to compliance with social norms), and non-calculative motivation to lead (taking on leadership responsibility without calculating costs and benefits). Research on the impact of managers' motivation to lead has increased over the last 15 years. Such work has shown that a high motivation to lead has a positive impact on leadership emergence, emotional intelligence, leadership behaviour, and leadership effectiveness (e.g. Hong, Catano, and Liao 2011; Van Iddekinge, Ferris, and Heffner 2009; Vogel 2016).
Our model accounts for the affective-identity dimension of managers' motivation to lead and its impact on their self-assessments. Leaders with a high affective-identity motivation find intrinsic enjoyment in leading other people (Chan and Drasgow 2001, 482); such managers do not aspire to become leaders because of external rewards or expectations. We assume that managers who deeply care about leadership will also put more effort into leadership-related tasks (Vogel 2016) and are more interested in getting feedback from their team members. Since they have an intrinsic motivation to lead, they might be more responsive to the feedback given by their followers and, therefore, have a more realistic understanding of their employees' demands and perceptions. Hence, hypothesis 2 states that leaders with a high affective-identity motivation to lead show more congruent self-assessments than others.
H2: A manager's affective-identity motivation to lead will decrease the gap in leadership perceptions.
In addition to feedback from 'above' (higher-tier supervisors) and managers' motivation to lead, we will now take a closer look at the impact of broader organizational learning routines that are not necessarily linked to leadership behaviour but still offer opportunities for leaders and followers to interact and exchange as well as discuss ideas in a less hierarchical setting. The learning literature emphasizes how organizations can benefit from individual experiences through the creation of learning routines and, more specifically, forums in which individuals from different hierarchical levels and departments within the organization share, discuss, and make sense of more and less successful strategies of problem-solving (Levitt and March 1988; Piening 2013). Along these lines, public management research has documented that it is often not enough to adopt sophisticated management systems, but that there is a need to establish learning routines which allow people to regularly reflect on information from these tools to facilitate its purposeful use (Moynihan 2008).
Sophisticated management practices, which are associated with the 'modern' paradigm of public management, include management by objectives, process optimization, quality management, and performance management (Schedler and Proeller 2010). We call these practices 'managerial reflection routines' as they are supposed to help managers systematically analyse and improve work processes and results as well as provide decision-makers with a better information base. We consider managers to be users of managerial reflection routines if they regularly optimize processes, discuss and define goals with their team, reflect upon work outcomes with their team, adapt team practices to client expectations, and host regular events where they proactively discuss how to improve the team's performance.
Side-effects of such learning routines are that they establish channels for two-way communication and increase opportunities to exercise and experience leadership. Rothstein (1990) argues that observers who frequently interact with leaders have more detailed insights into the leaders' work and therefore report ratings that are less divergent from leaders' self-assessments (Lee and Carpenter 2018). Hence, we argue that reflection routines, even when built around management practices rather than specific leadership behaviours, may also lead to more self-reflection that, in turn, should decrease the perception gap between leaders and employees.
H3: The use of managerial reflection routines will decrease the gap in leadership perceptions.

Follower-related personality factors driving incongruence
While rater incongruences in leadership assessments can be a function of managers' misperceptions, they can also be attributed to follower-related factors (e.g. Hansbrough, Lord, and Schyns 2015; Felfe and Heinitz 2010). Since little is known about the impact of such factors (Fleenor et al. 2010), we focus on employees' personality, which is one of the most basic characteristics that may help explain differences in other-ratings. We employ the five-factor model of personality ('big five'), which proposes that most variation in personality can be accounted for by the following five robust factors: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism (Costa and MacCrae 1992). Although personality has been found to significantly affect self-ratings (Judge, LePine, and Rich 2006; Bell and Arthur 2008), we know much less about its impact on other-ratings, suggesting that this relationship requires further exploration (Fleenor et al. 2010). Trait activation theory (Tett and Guterman 2000) can help to understand this relationship by explaining why certain personality characteristics influence subordinates' leadership ratings. The theory states that humans react in specific situations to cues they find in this situation, which then activate a certain trait (Lievens et al. 2006). We argue that such cues, related to the context of evaluating a supervisor, activate some personality traits more than others. In particular, we think that agreeableness and conscientiousness affect other-ratings, while we do not see sufficient empirical evidence to propose hypotheses for the other three personality traits.
People who are highly agreeable are helpful, trusting, and empathetic (Costa and MacCrae 1992), and they are predisposed to view others positively (Hansbrough, Lord, and Schyns 2015, 222). In a situation in which agreeable persons have to assess others, this trait is activated, which is why their ratings of others tend to be less critical and more positive. In a study of students who rated each other's performance, Bernardin, Cooke, and Villanova (2000) find empirical evidence for such a leniency bias. A similar effect was found by Bartels and Doverspike (1997) when conducting an assessment centre and by Cheng, Hui, and Cascio (2017) regarding the rating of employees in a real estate agency in which agreeable raters gave elevated ratings. All these findings are underlined by a meta-analysis of 28 studies conducted by Harari, Rudolph, and Laginess (2015) that finds the same leniency effect of raters' agreeableness on their performance ratings. Since agreeable employees tend to provide less critical leadership ratings, we expect their assessments to be closer to leaders' self-ratings.
H4: A follower's agreeableness will decrease the gap in leadership perceptions.
The second personality trait that may be activated due to the process of assessing a supervisor's leadership behaviour is subordinates' conscientiousness. Conscientious people tend to think carefully before they act, and they pay close attention to detail (Costa and MacCrae 1992; Hansbrough, Lord, and Schyns 2015). They assess others more carefully and systematically and base their judgment on precise observations instead of categorization-based cognitive short-cuts (Hansbrough, Lord, and Schyns 2015). This accuracy hypothesis is also supported by Tziner, Murphy, and Cleveland (2005, 94) who argue that 'highly conscientious raters may be less easily swayed by the rating context than their less conscientious peers.' Bernardin, Cooke, and Villanova (2000) support this notion by showing that conscientious students rate their peers' performance more accurately.
Although conscientious people tend to be thorough and accurate in their ratings, research has also shown that they may provide less straightforward assessments in certain situations (Cheng, Hui, and Cascio 2017). According to Bernardin, Tyler, and Villanova (2009) and Bernardin et al. (2016), conscientious people do not want to make their peers feel bad, which is why at times they may sacrifice accuracy for benevolence. However, in real-life settings, conscientious raters tend to be more accurate. Harari, Rudolph, and Laginess (2015) find evidence for such a pattern in their meta-analysis in which they compare rater effects across field and laboratory settings. Since our study is set up in a field environment (subordinates rate their real-life supervisors regarding experienced leadership behaviour), we expect conscientious team members to provide thoughtful and accurate assessments of their superiors, which are unlikely to be swayed by personal biases and thus likely to differ from leaders' self-assessments.
H5: A follower's conscientiousness will increase the gap in leadership perceptions.

Research design
We use data from two different sources (leader and employee surveys) and operationalize rater incongruence, our dependent variable, as the difference between leaders' self-assessment and follower assessment of leadership behaviour. In addition to the variables we theorized about in the section above as potential predictors of self-other disagreement, we also control for confounding factors which were identified in prior research (see Fleenor et al. 2010). These include gender, age, job satisfaction, leadership training, and experience as well as the full set of the 'big five' personality traits.
To account for the fact that drivers of perception incongruence are measured at different levels (leader-related versus follower-related variables) and that employees are nested in teams that work under the same leader (increasing the possibility of correlated errors), we use hierarchical linear modelling. The following sections explain the data, measurements, and limitations in greater detail.
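The nested structure described above can be written down as a two-level model. The following is only a sketch of one common specification, a random-intercept model; the notation is illustrative and not taken from the original analysis, and the text does not state whether any slopes were allowed to vary across teams.

```latex
\begin{align}
D_{ij} &= \lvert L_{j} - F_{ij} \rvert \\
D_{ij} &= \beta_{0j} + \sum_{p} \beta_{p} X_{pij} + \varepsilon_{ij} \\
\beta_{0j} &= \gamma_{00} + \sum_{q} \gamma_{0q} Z_{qj} + u_{0j}
\end{align}
```

Here $D_{ij}$ is the rating incongruence for follower $i$ in team $j$, $L_{j}$ is the leader's self-rating, $F_{ij}$ is the follower's rating, $X_{pij}$ are follower-related predictors (for example, personality traits), $Z_{qj}$ are leader-related predictors (for example, motivation to lead), $\varepsilon_{ij}$ is the dyad-level residual, and $u_{0j}$ is a team-specific deviation that captures the correlation among followers of the same leader.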

Data
The data for our analysis were collected from three German public organizations between May and October 2014. We aimed to select organizations that directly provide specific public services to citizens or businesses and at the same time vary in tasks, size, and jurisdiction (state versus local level). 1 The idea behind creating variation in these selection variables was to make sure that potential findings hold across contexts and ensure their external validity. Our data collection strategy resembles the cluster sampling approach: only a few organizations were targeted, but a full census of all employees was conducted. 2 In the selected organizations we addressed all 'street-level managers' (i.e. line managers who do not lead other managers) and all their followers. The authors sent online-based questionnaires including all items to measure managers' leadership behaviour as well as the independent variables to 112 managers and 1,364 followers. The response rates were 57.1 per cent for managers and 34.0 per cent for followers. Based on this response, 232 manager-follower dyads were created. A nonresponse bias analysis comparing early respondents with late respondents (Kypri, Stephenson, and Langley 2004; Groves 2006) did not reveal any indication of nonresponse bias.
To be able to group responses from the same team, participants typed in a team-specific code at the beginning of the questionnaire (Vogel 2018). Participation in the survey was voluntary, and participants were assured that the raw data would not be shared with the organizations and that all participants would remain anonymous.

Measuring incongruence in leadership ratings
Leadership behaviour was assessed by using items from the Managerial Practices Survey (MPS, version G 17-4, see Table A1) (Yukl 2012). The MPS was designed to capture the 15 leadership behaviours set out in the taxonomy of effective leadership behaviour (Yukl 2012). The behaviours are grouped into four dimensions: task-oriented, relations-oriented, change-oriented and external leadership behaviours. We used one item from the MPS for each leadership behaviour (which makes 3-4 items per dimension) and asked the leaders to assess their own behaviour based on these items. In addition, followers were asked to assess their leader's behaviour using the same items. 3 We employed a second-order confirmatory factor analysis (CFA) to create one factor score based on the four leadership dimensions and 15 leadership behaviours. In line with Yukl's (2012) conceptualization, we labelled this factor score 'effective leadership behaviour' (see Figure A1). The model fit indices indicate a good fit (χ²(86) = 179.21, p < .001, CFI = .96, TLI = .96, RMSEA = .047, p(RMSEA ≤ .047) = .73, SRMR = .026), all paths are significant at p < .01, and the standardized factor loadings range between .71 and .96. The full result of the CFA is displayed in Figure A1.
Based on the results of the CFA, we operationalized our dependent variable: incongruence in leadership ratings. This incongruence is indicated by the absolute difference between a leader's self-perception of their leadership behaviour and the perception of each of their followers. In contrast to other studies on self-other agreement in leadership (e.g. Atwater et al. 2009; Braddy et al. 2014), we kept the perception of every follower as a single observation and decided not to aggregate followers' perceptions at the team level. If we aggregated followers' perceptions to a group mean, it would be impossible to fully take advantage of the nested structure of the data set and to consider follower characteristics as factors explaining divergent ratings. While our decision to keep followers' perceptions separate and calculate a single incongruence value for every dyad in the data set was mainly a conceptual one, we also employed several statistical measures to examine the appropriateness of data aggregation. The findings are mixed and justify our approach: r_WG(j) = .86 (based on 95 per cent confidence intervals this number should be .92 or higher to be significant), ICC(1) = .44, and ICC(2) = .75.
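The dyad-level operationalization and two of the aggregation checks mentioned above can be sketched as follows. This is a minimal illustration, not the original analysis code: the function names and the toy scores are hypothetical, the ICC formulas are the usual one-way ANOVA versions, and the mean team size is used as an approximation of group size for unbalanced teams.

```python
from statistics import mean


def dyad_incongruence(leader_scores, follower_scores):
    """Absolute self-other gap for every leader-follower dyad.

    leader_scores: dict mapping team id -> leader self-rating score
    follower_scores: list of (team id, follower rating score) tuples
    """
    return [(team, abs(leader_scores[team] - score))
            for team, score in follower_scores]


def icc_one_way(groups):
    """ICC(1) and ICC(2) from a one-way ANOVA on follower ratings.

    groups: dict mapping team id -> list of follower scores.
    k is the mean team size (an approximation when teams differ in size).
    """
    all_scores = [x for scores in groups.values() for x in scores]
    grand_mean = mean(all_scores)
    n_groups, n_total = len(groups), len(all_scores)
    k = n_total / n_groups
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2
                     for s in groups.values())
    ss_within = sum((x - mean(s)) ** 2
                    for s in groups.values() for x in s)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2


# Toy data (hypothetical, for illustration only)
leaders = {"A": 4.5, "B": 3.0}
followers = [("A", 3.5), ("A", 4.0), ("B", 3.5), ("B", 2.0)]
gaps = dyad_incongruence(leaders, followers)
icc1, icc2 = icc_one_way({"A": [3.5, 4.0], "B": [3.5, 2.0]})
```

As a rough plausibility check under the same approximation: with the reported ICC(1) of .44 and an average of about 190/51 ≈ 3.7 followers per team, the Spearman-Brown relation k·ICC(1) / (1 + (k − 1)·ICC(1)) gives roughly .75, which is in line with the reported ICC(2).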

Measuring independent and control variables
The independent variables have been operationalized at two levels: the leader-related variables by surveying the leaders and the follower-related variables by asking the followers. All items, if not otherwise indicated, have been measured using a five-point Likert scale (1 = do not agree at all; 5 = totally agree). The wording of all the items can be found in Table A1. The measures for our controls (leadership experience, leadership training, gender, and age) are straightforward. We capture supervisor interest with two items and managers' affective motivation to lead as well as job satisfaction with one item (Chan and Drasgow 2001; Van de Ven and Ferry 1980; Hackman and Oldham 1974).
Our measures of reflection routines pick up on 'modern' managerial practices, including process analysis, goal setting, and performance management (outcome and customer orientation as well as performance improvement efforts). However, the items measure not only the existence of these practices but also whether they have been regularly used to involve managers and team members in discussing and reflecting on the insights generated by them. Since these items are somewhat prone to social desirability bias, we asked the team members to assess them, instead of the team leaders. All reflection routines show high internal consistency (α = .93) and, according to principal factor analysis, load on a single factor.
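For readers less familiar with the internal consistency statistic reported above, Cronbach's α can be computed from item-level responses as sketched below. The function name and the toy responses are hypothetical and only illustrate the standard formula; they are not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a multi-item scale.

    responses: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    """
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy responses on a five-point scale (hypothetical, for illustration only)
toy = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2]]
alpha = cronbach_alpha(toy)
```

Values close to 1 indicate that the items vary together across respondents, which is what the reported α of .93 suggests for the reflection routine items.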
We measure followers' personality employing the German 10-item short version of the big five inventory (Rammstedt and John 2007). This questionnaire uses two items for each of the five personality factors. In addition, we followed the advice of Rammstedt and John (2007) and included an additional item for agreeableness. A Promax rotated solution of a principal factor analysis shows that the 11 items load on the five factors they were theoretically associated with (see Table A1).

Limitations
Some limitations of our study are worth noting before we get to the results and their discussion. Although we have been able to reduce common method bias by using data from different sources (managers and followers) and constructing the dependent variable based on data from both sources, the study is prone to the common weaknesses of cross-sectional survey research, especially the limited ability to make causal claims. Another limitation is the drop in observations because we can only include dyads for which we can match followers and managers, resulting in a loss of information for all individuals without an appropriate match.
Our dependent variable does not discriminate between managers' over- and underestimation of their leadership behaviour. This is because the opposite of an incongruent assessment is a congruent assessment, no matter whether the incongruence has a positive or a negative sign. More normatively, we believe that any misperception is something organizations and managers themselves should avoid. To test whether the statistical results are sensitive to our coding decisions, we reran our models and this time only included those dyads where supervisors overestimated their own leadership behaviour. However, the results we present in this article remain the same and are unaffected by these different choices. We also want to emphasize that it is not the purpose of the study to determine whose assessment (leader versus follower) is more accurate. Rather, our point is that divergences (no matter what their origin is) are problematic and will likely result in organizational dysfunctions.

Table 1 provides an overview of the descriptives of the used variables and the correlations among them. We can see that the leadership rating incongruence ranges between 0.03 and 3.12, with the mean being 0.87 and the standard deviation 0.70. One-quarter of the observations show incongruences of 0.31 or less and one-quarter of more than 1.20 (see Figure A2). 148 of 190 observations (78 per cent) are over-estimations, which means the leaders themselves reported higher scores than their followers. The theoretical maximum of the incongruence is 3.58, and it is noteworthy that the empirically observed maximum of 3.12 is relatively close to the theoretical maximum, suggesting serious differences between ratings of leaders and followers.
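The distinction between over- and underestimation, and the overestimation-only robustness check mentioned above, can be sketched as follows. The labels, the function name, and the rating pairs are hypothetical; the sketch only shows how the sign of the self-other difference separates the cases that the absolute difference treats alike.

```python
def classify_dyad(self_rating, follower_rating):
    """Label a dyad by the sign of the self-other difference."""
    diff = self_rating - follower_rating
    if diff > 0:
        return "over"       # leader rates themselves higher than the follower does
    if diff < 0:
        return "under"      # leader rates themselves lower than the follower does
    return "congruent"


# Hypothetical (self-rating, follower rating) pairs, illustration only.
pairs = [(4.5, 3.5), (3.0, 3.75), (4.0, 4.0), (5.0, 2.5)]
labels = [classify_dyad(s, f) for s, f in pairs]

# Subset used in an overestimation-only rerun: absolute gaps of "over" dyads.
over_only = [abs(s - f) for (s, f), label in zip(pairs, labels) if label == "over"]
share_over = labels.count("over") / len(labels)
```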

Results
To assess what drives incongruent leadership ratings, we use hierarchical linear modelling. Each leader-follower dyad was treated as a single observation in the analysis. Biased estimators were avoided by considering the hierarchical structure of the data. This means that the model accounts for the fact that some leaders are considered multiple times if more than one follower responded. Therefore, the analysis considers two different observational levels. Level 1 consists of the followers' responses, whereas level 2 is based on the responses by the leaders. In addition, we use agency-level fixed effects to control for organizational differences.
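One common way to formalise such a two-level structure is a random intercept model. The notation below is illustrative and assumed here rather than reported with the estimates: y_ij is the incongruence for follower i of leader j, x denotes follower-level predictors, z leader-level predictors, and d agency dummies for the fixed effects.

```latex
\begin{aligned}
\text{Level 1 (follower } i \text{ of leader } j\text{):}\quad
  & y_{ij} = \beta_{0j} + \sum_{p} \beta_{p}\, x_{pij} + \varepsilon_{ij},
  & \varepsilon_{ij} \sim N(0, \sigma^{2}) \\
\text{Level 2 (leader } j\text{):}\quad
  & \beta_{0j} = \gamma_{00} + \sum_{q} \gamma_{0q}\, z_{qj} + \sum_{a} \delta_{a}\, d_{aj} + u_{0j},
  & u_{0j} \sim N(0, \tau^{2})
\end{aligned}
```

In this sketch, the leader-specific term u_0j captures the dependence among dyads that share the same leader, which is the reason for not treating the dyads as independent observations.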
In the analysis, a random intercept model with level 1 and 2 predictors is used (Garson 2013). That is, we assume a separate intercept for each leader. The result of the hierarchical linear model is displayed in Table 2. When comparing the full model (2) to the empty baseline model (1), several indicators suggest a good fit: The significance of the deviance test for the likelihood ratios indicates that the full model explains significantly more variance in the dependent variable than the baseline model. In fact, the full model reduces the unexplained variance by 41 per cent (R², S&B). As expected, the Akaike Information Criterion (AIC) is lower for the full model compared to the baseline, while this is not the case for the Bayesian Information Criterion (BIC).

Table 2 first displays the effects of the variables measured at the leader level (level 2), while the follower level (level 1) variables are shown in the second part of the table. The first hypothesis does not find support: higher-tier supervisors' interest in the leadership behaviour of their subordinated managers (hypothesis 1) does not reduce the leadership perception gap. In contrast, managers who enjoy leading others and find pleasure in taking on leadership responsibilities show significantly smaller differences between their self-perception and the perception of their followers (b = −0.16, SE = 0.087, p = .067). Hypothesis 3, which states that leaders' use of managerial reflection routines will reduce rater incongruence, can be confirmed (b = −0.29, SE = 0.072, p < .001).
With regard to team members' personality, we find evidence for the hypothesized effect of agreeableness (hypothesis 4). Followers who are more agreeable show less of a perception gap (b = −0.18, SE = 0.083, p = .027). In contrast with our expectations formulated in hypothesis 5, conscientiousness has no significant effect on rater incongruence. Interestingly, however, we see that followers who are more open to new experiences have reported significantly higher levels of self-other disagreement in leadership ratings (b = 0.22, SE = 0.092, p = .019). As expected, extraversion and neuroticism are not associated with rating incongruences.
Among the control variables, job satisfaction, gender, and age show significant coefficients. The gender effect seems to be the most interesting one, as it suggests that female leaders are less likely to have incongruent leadership ratings.

Note (Table 2): Agency-level fixed effects included but not reported; robust standard errors (Huber-White sandwich estimator); estimator: maximum likelihood; a indicates the significance of the deviance test; * p < 0.1, ** p < 0.05, *** p < 0.01.

Some of our findings confirm intuition, others are unexpected, but all of them tend to be novel. That is, the literature on the factors we theorize about is sparse, which is why our hypothesis tests make a theoretical contribution, particularly since they are based on a robust multilevel analysis. Managers' motivation to lead is significant despite the fact that we control for two skill-related variables (training and experience); reflection matters even if centred on management routines instead of leadership behaviour; and agreeableness and openness show effects, but conscientiousness does not. We also find it noteworthy that our study provides evidence for the significance of the leadership perception gap in public organizations since the standardized mean difference between self- and other-ratings in our public-sector sample is 0.7, while it is close to zero in Lee and Carpenter's (2018) sector-spanning meta-analysis. The relevance and meaning of the hypothesized and unexpected findings require more explanation, which we offer in the following section.

Discussion
In contrast with what was hypothesized, we do not find evidence that managers provide more congruent self-assessments if their supervisor shows great interest in their leadership behaviour. Although prior research documents that feedback generally reduces bias in self-assessments (e.g. Atwater, Roush, and Fischthal 1995), it is possible that feedback 'from below' (employees) is more important than 'from above' (higher-tier supervisors). Our empirical set-up may also constitute too demanding a test of the feedback-from-above hypothesis because we tested whether supervisor feedback reduces the gap in assessments between managers and employees (a 180-degree measure of rater incongruence), without being able to include actual assessments by higher-tier supervisors (which would require a 360-degree measure). It is also possible that supervisor interest affects how managers behave, but in ways that do not create more self-awareness. As explained in our theory section, such interest can be experienced as 'role pressure' on the part of the managers, who may respond to high expectations from their supervisors by reporting inflated scores of their leadership behaviour, which would eventually increase the gap between self- and other-assessments (Shivers-Blackwell 2004).
The analysis, however, suggests that managers with a greater motivation to lead provide more accurate self-assessments of their leadership behaviour. This is particularly interesting, considering that scholars and practitioners alike have devoted more attention to 'skill' (the ability to lead) as opposed to 'will' (the desire to lead) when discussing leadership development efforts (e.g. Ingraham and Getha-Taylor 2004). Most developmental programs and selection processes focus on what abilities managers have, techniques they use to lead, or how they act in certain situations; in short, their leadership 'skills' (Seidle, Fernandez, and Perry 2016). Our findings, however, imply that there is another driver of managers' behaviour that should be assessed, developed, and cultivated: managers' urge to lead and take on responsibility, that is, their 'will' (see also Colquitt, LePine, and Noe 2000). While we suggest that the intrinsic motivation to lead needs to play a bigger role when selecting and promoting managers, this does not imply that organizations should focus their developmental efforts only on the most motivated managers (in fact, the most struggling managers may still need the most training) (Aguinis and Kraiger 2009; see note 4).

Our article also tests an extension of the generally accepted finding that feedback will reduce bias in leadership assessments (Bailey and Fletcher 2002; Day et al. 2014). While most research studied the impact of direct feedback about leadership behaviour, we examined the role of broader reflection routines, built around 'modern' management practices that involve managers as well as employees.
Our expectation was that, although these reflection routines are not primarily concerned with leadership behaviour but with organizational learning and improvement, such routines create additional opportunities for managers and employees to interact and exercise as well as experience leadership (Levitt and March 1988; Piening 2013) and hence reduce the leadership perception gap. We find support for this hypothesis and see that the use of such reflection routines makes leadership assessments of managers and employees more congruent. For scholars interested in leadership, our findings suggest it is not only the leadership-centred dialogue with employees that fosters self-awareness among managers, but that several types of leader-follower exchanges can have such an effect (see also Rothstein 1990). For management scholars, we can conclude that managerial practices related to instruments like process analysis or performance management can also have more or less intended impacts on leadership, and employees' perception thereof.
Although we did not hypothesize an effect of managers' gender on rater incongruence, it is worth discussing that we see such an effect in our analysis. We find that self-other disagreement is significantly lower if the assessed leader is female. Surprisingly, there is only limited research on gender differences in self-other assessments of leadership and therefore limited theorizing about such an effect (Fleenor et al. 2010). Ostroff, Atwater, and Feinberg (2004) suggest that women might be less likely to overestimate their own leadership behaviour, possibly because the difficulties they experienced when rising through the ranks also made them more self-critical. Another explanation assumes that women are more likely to seek feedback and act on feedback by changing their behaviour and self-assessment (Roberts and Nolen-Hoeksema 1989), leading to more accurate self-assessments by female managers.
Our study is among the first to examine whether team members' personality affects the difference between leadership ratings by these members and self-ratings by their supervisors. We find that personality has some impact, although one hypothesis could not be confirmed. Employees who are more agreeable are more likely to provide lenient ratings similar to leaders' self-assessments. Employee openness leads to more incongruent ratings, a finding that deserves further explanation. One possibility is to account for the specifics of our sample. As reported above, for most leader-follower dyads an incongruent rating means that followers assigned leaders lower scores than the latter did in their self-assessment. With regard to openness, this could mean that employees who score high on this variable are also more critical of their leaders and expect them to be equally open-minded towards ideas, feelings, and new insights, resulting in lower leadership ratings and more incongruence. At the same time, we suggest caution when interpreting this finding. While we have statistical confidence in the reported coefficient, we will need additional research to sort out the exact causal mechanism behind the finding.

Conclusion
This article examined differences in leadership perceptions between managers and employees. We put forward an integrative model (e.g. Wofford 1982; Avolio 2007) that understands leadership (and the perceptions thereof) as being constructed through the social interactions among actors at different levels of the organization. Hence, we argue that incongruent ratings are likely to be a function of factors located at different levels, including higher-tier supervisors' expectations; team leaders' motivation; managerial practices through which leaders and team members interact; and team members' personality. Using two surveys, administered in three German public organizations to collect data from 190 leader-follower dyads, we made the following observations.
First, managers' self-assessments of leadership become more aligned with assessments by their subordinates, the greater the managers' intrinsic motivation to lead is. This motivational effect was more relevant than their leadership experience, training, or feedback from their direct supervisors. Second, the establishment of managerial reflection routines reduces incongruence in leadership assessments. Although such routines were built around organizational learning and improvement efforts, we found 'spillover effects,' in that they offered additional opportunities for leaders and followers to interact, thereby reducing incongruence in leadership assessments. Third, we studied the extent to which followers' personality affects how they experience and rate their managers' leadership behaviour. While three out of five personality traits did not account for variation in rating differences, we found that agreeableness reduces rater disagreement whereas openness to experience reinforces it.
What can this article contribute to the study of leadership in public organizations? Understanding leadership behaviour is crucial, but often perceptions of behaviour are even more important than the behaviour itself. Variation in leadership perceptions can lead to differential organizational responses and consequences, including in- and out-group membership, the creation and deterioration of social capital, or absenteeism and job involvement. One main objective of this article was to direct attention to the study of self-other agreement in leadership research, a blind spot in the public management literature.
A second contribution is to provide more empirical insights into managers' self- and other-awareness. Incongruences in leader-follower ratings have been associated with little self-awareness, and we find indeed that self-awareness is driven by leadership-related 'will' rather than 'skill' factors, and that intrinsic drive matters more than external pressure. However, we also document that follower-related factors account for at least part of the rater incongruence. We, therefore, suggest that managers, in order to increase their odds of being successful, not only have to be self-aware but also 'other-aware.' Put differently, if employees perceive the same leadership behaviour in different ways, then leaders may need to adjust behaviour from individual to individual, account for variation in leadership preferences, or, at a minimum, be aware that employees will respond differently to the same stimulus.
We see ample opportunities for further research. On the one hand, we acknowledge the need for private-public comparisons and replications of generic leadership studies on rater incongruence to better understand the extent to which we can expect convergent or divergent findings in public organizations. On the other hand, we suggest conducting research that involves variables unique to the public sector setting. For example, we could examine self- and other-ratings among political appointees and managers at different echelons, specifically, as we know that the accuracy of self-assessments varies across hierarchical positions (Gentry, Cullen, and Altman 2016; Sala 2003). Additionally, examining leadership ratings in the presence of high goal ambiguity and the absence of strong financial rewards could provide important insights into self-other disagreement issues in the public sector. Further, research is needed that looks more closely at the negative consequences of incongruent leadership ratings and, hence, treats this phenomenon as the independent rather than the dependent variable.
We are in need of more research that simultaneously accounts for managers' as well as team members' personality and tests whether incongruent ratings are less likely to occur if managers and followers share similar personality profiles. We believe studying personality traits is important because these findings can provide important insights into the composition of teams and the adaptation of leadership styles when teams are dominated by certain personalities. Of course, our study should only be considered as one of the first steps in this direction, and we will need more research on the matter before being able to draw definitive conclusions.

Notes
1. Due to an anonymity agreement, we are not at liberty to disclose any additional information about the three organizations.
2. In one of the organizations, it was not possible to include all teams. In this case, the sample consists of one complete department plus one sub-unit of each of the five other departments.
3. We focus on the taxonomy of effective leadership behaviour and the Managerial Practices Survey instead of other common operationalizations of leadership, like transformational leadership, because we wanted to capture a wide range of leadership behaviours. As the study of self-other agreement of leadership ratings is only at its beginning in the public sector, we believe it is important to gain a broad overview before focusing on more detailed aspects of leadership. This is in line with the development in empirical leadership research more broadly, where the distinction between task-oriented and relations-oriented leadership behaviours has only been emphasized relatively recently (Judge, Piccolo, and Ilies 2004; Humphrey 2002; Lee and Carpenter 2018).
4. Managers' motivation to lead should complement other selection criteria like merit or diversity, rather than replace them. Psychometric instruments to assess such a motivation could be used developmentally to increase self-awareness and help future leaders align interests, needs, and aspirations. They could also be employed to identify cases in which change is needed to make leadership roles more enjoyable, meaningful, or impactful.

Disclosure statement
No potential conflict of interest was reported by the authors.

Notes on contributors
Dominik Vogel (dominik.vogel-2@uni-hamburg.de) is assistant professor of business administration, especially public management at the University of Hamburg (Germany). His research is focused on leadership in the public sector, motivation of public employees, interaction of citizens and administration, and behavioural public administration.
Alexander Kroll (akroll@fiu.edu) is assistant professor in the Steven J. Green School of International and Public Affairs at Florida International University. He studies the management of government organizations, the use of performance systems, and the role of organizational behaviour in improving public services. He has received awards from the American Society for Public Administration and the Academy of Management.

Operationalization
Leadership (Yukl, Wall, and Lepsinger 1990)
The Managerial Practices Survey (MPS) is not freely accessible. However, the wording of the items is quite close to a specific aspect of the definition of the corresponding behaviours (Yukl 2012, 84-85).

Managerial reflection routines (α = .93)
• My team and I regularly optimize our processes in order to minimize process duration and streamline interactions with other units.
• My team and I regularly discuss our goals and define new goals.
• My team and I regularly reflect on our work outcomes.
• My team and I regularly reflect on our clients' expectations of our work and adapt practices accordingly.
• My team and I have regular events, where we proactively discuss how to improve our performance.
(1 = do not agree at all; 5 = totally agree)

I see myself as someone who …

Figure A1. Confirmatory factor analysis. Source: Authors. Available at https://doi.org/10.6084/m9.figshare.7624028 under a CC-BY 4.0 license. Note: Confirmatory factor analysis of the leadership construct (standardized values are displayed). All paths are significant at p < .01. Estimation is based on maximum likelihood with clustered robust standard errors.