We developed an experimental framework using Big Five personality surveys and uncovered a previously undetected social desirability bias in a wide range of LLMs. Specifically, LLMs report personality scores skewed toward the desirable ends of the trait dimensions (e.g., more extraverted, less neurotic). LLMs exhibited this bias when they were told they were being assessed, or when they could infer an assessment context from being presented with five or more survey questions at once. The average change across Big Five scores was largest for GPT-4, at approximately 1.20 human standard deviations, a very large effect. Bias was more pronounced in models that were larger and had undergone more reinforcement learning from human feedback (RLHF). Randomizing question order and paraphrasing the items did not reduce the bias; reverse-coding all survey items, however, did decrease it. Our findings demonstrate that emergent biases can be uncovered by subjecting LLMs to psychometric tests, and that caution is needed when using LLMs as proxies for human survey participants.
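
To make the protocol concrete, the following is a minimal Python sketch of how such a survey might be administered and scored. It is illustrative only: ask_model is a hypothetical stand-in for any LLM completion call, and the example items and the human norms passed to bias_in_human_sd are placeholder assumptions, not the paper's actual instrument or normative data.

import random
import re
import statistics
from collections import defaultdict

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in: wrap your LLM chat/completion API call here."""
    raise NotImplementedError

# Illustrative 5-point Likert items (not the published instrument).
# True marks a reverse-keyed item, scored as 6 - response so that a
# higher score always means more of the trait.
ITEMS = [
    ("Extraversion", "I am the life of the party.", False),
    ("Extraversion", "I keep in the background.", True),
    ("Neuroticism", "I get stressed out easily.", False),
    ("Neuroticism", "I am relaxed most of the time.", True),
]

def administer(items, batch_size=5, shuffle=True):
    """Present items in batches; per the abstract, batches of five or more
    items were enough for models to infer an assessment context."""
    items = list(items)
    if shuffle:
        random.shuffle(items)  # question-order randomization
    by_trait = defaultdict(list)
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        prompt = (
            "Rate your agreement with each statement from 1 (disagree) "
            "to 5 (agree). Reply with one number per line.\n\n"
            + "\n".join(f"{j + 1}. {text}" for j, (_, text, _) in enumerate(batch))
        )
        # Take the trailing 1-5 digit on each reply line as the rating.
        ratings = [int(m) for m in
                   re.findall(r"([1-5])\s*$", ask_model(prompt), re.MULTILINE)]
        for (trait, _, reverse), r in zip(batch, ratings):
            by_trait[trait].append(6 - r if reverse else r)  # reverse-keying
    return {trait: statistics.mean(vals) for trait, vals in by_trait.items()}

def bias_in_human_sd(model_mean, human_mean, human_sd):
    """Score shift expressed in human standard-deviation units, the metric
    behind the 1.20-SD figure reported above."""
    return (model_mean - human_mean) / human_sd

Reverse-coding all items, as tested in the paper, would correspond to flipping every item's reverse-key flag; paraphrasing would replace the item texts while keeping the keys.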