# The Harmony Project

Harmony is an AI tool that lets you compare items from questionnaires and identify similar content.

The source code is at

You can read more at

There is a live demo at:

## Harmonising Mental Health Data

Harmony is a data harmonisation project that uses Natural Language Processing (NLP) to help researchers make better use of existing data from different studies, by helping them harmonise the measures and items those studies used.

## Who worked on Harmony?

Harmony is a collaboration between the University of Ulster, University College London, the Universidade Federal de Santa Maria in Brazil, and Fast Data Science Ltd. The Harmony team is made up of:

* Bettina Moltrecht, PhD (UCL)
* Dr Eoin McElroy (University of Ulster)
* Dr George Ploubidis (UCL)
* Dr Mauricio Scopel Hoffman (Universidade Federal de Santa Maria, Brazil)
* Thomas Wood (Fast Data Science)

## Who should I contact about Harmony?

You can contact us at

## How do I cite Harmony?

McElroy, E., Moltrecht, B., Ploubidis, G.B., Scopel Hoffman, M., Wood, T.A., Harmony [Computer software], Version 1.0, accessed at Ulster University (2022)

## Does Harmony store my data?

If you upload a questionnaire or instrument, Harmony does not store or save it. You can read more on our Privacy Policy page.

## How does Harmony work?

Harmony passes the text of each questionnaire item through a neural network called Sentence-BERT, which converts it into a vector. The similarity of two texts is then measured as the similarity between their vectors. Two identical texts have a similarity of 100%, while two texts with completely unrelated meanings have a similarity close to 0%. You can read more in our technical blog post, and you can even download and run Harmony’s source code.

## How reliable is Harmony?
Harmony reconstructed the matches of the questionnaire harmonisation tool developed by McElroy et al. (2020) with the following AUC (area under the ROC curve) scores:

* childhood: 81%
* adulthood: 77%

Harmony also matched the questions of the English and Portuguese versions of the GAD-7 instrument with an AUC of 100%. You can read more in this blog post.

## What do the numbers mean?

The numbers are the cosine similarity of document vectors. The cosine similarity of two vectors ranges from -1 to 1, depending on the angle between them. We have converted these values to percentages.

We also use a preprocessing stage that converts positive sentences to negative ones and vice versa (e.g. *I feel anxious* → *I do not feel anxious*). If the match between two sentences improves once this preprocessing has been applied, the items are assigned a negative similarity.

## Does Harmony give p-values?

Harmony does not currently give p-values, but you can interpret the percentage matches much like correlation coefficients. In future we hope to provide more statistical data to Harmony’s users.

## Who developed Harmony’s Python code?

* Thomas Wood (Fast Data Science)
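## How could a similarity score be computed?

The sketch below illustrates the step described under *How does Harmony work?* and *What do the numbers mean?*: taking two item vectors, computing their cosine similarity, and expressing it as a percentage. It is not Harmony’s own code. The helper names `cosine_similarity` and `as_percentage` are hypothetical, and the short hand-written vectors are assumptions standing in for real Sentence-BERT embeddings.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between vectors u and v, always in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def as_percentage(similarity: float) -> float:
    """Express a cosine similarity as a percentage, rounded to one decimal place."""
    return round(100 * similarity, 1)


# Hypothetical 4-dimensional vectors standing in for Sentence-BERT embeddings,
# which in practice have hundreds of dimensions.
item_a = [0.2, 0.7, 0.1, 0.4]
item_b = [0.2, 0.7, 0.1, 0.4]    # same text, so same vector
item_c = [0.7, -0.2, 0.4, -0.1]  # unrelated text, modelled as an orthogonal vector

print(as_percentage(cosine_similarity(item_a, item_b)))  # 100.0
print(as_percentage(cosine_similarity(item_a, item_c)))  # 0.0
```

Because cosine similarity depends only on the angle between vectors, not their lengths, two items score 100% whenever their vectors point in the same direction.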
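## How might a negative similarity be assigned?

The negation step described under *What do the numbers mean?* can be sketched as: compare the two items as they are, compare them again after negating one item, and report a negative score if the negated version matches better. This is a minimal illustration under several assumptions, not Harmony’s implementation. The `negate` rule below is a hypothetical one that only handles sentences starting with "I", `embed` is a toy bag-of-words vector used in place of Sentence-BERT, and the page does not say what magnitude the negative score takes, so we assume it is the improved match.

```python
import math
from collections import Counter


def negate(sentence: str) -> str:
    """Hypothetical rule: toggle 'do not' after a leading 'I'."""
    if sentence.startswith("I do not "):
        return "I " + sentence[len("I do not "):]
    if sentence.startswith("I "):
        return "I do not " + sentence[len("I "):]
    return sentence


def embed(sentence: str) -> Counter:
    """Toy bag-of-words vector, standing in for a Sentence-BERT embedding."""
    return Counter(sentence.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v)


def signed_similarity(a: str, b: str) -> float:
    """Plain similarity, or a negative one if negating item a improves the match."""
    plain = cosine(embed(a), embed(b))
    flipped = cosine(embed(negate(a)), embed(b))
    # Assumption: the negative score keeps the magnitude of the improved match.
    return -flipped if flipped > plain else plain


print(round(signed_similarity("I feel anxious", "I do not feel anxious"), 2))       # -1.0
print(round(signed_similarity("I feel anxious", "I feel nervous and anxious"), 2))  # 0.77
```

A real embedding model captures meaning rather than shared words, so actual scores would differ; the sketch only shows where the sign comes from.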
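## How is an AUC score calculated?

The reliability figures under *How reliable is Harmony?* are AUC scores. One standard way to compute the area under the ROC curve is as the probability that a randomly chosen true match receives a higher similarity score than a randomly chosen non-match, with ties counting as half. The sketch below uses that formulation; the page does not say exactly how Harmony’s AUC figures were computed, and the scores and labels are made-up example data, not Harmony results.

```python
def auc(scores: list[float], labels: list[bool]) -> float:
    """ROC AUC via pairwise comparison of matching and non-matching pairs."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one matching and one non-matching pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # true match ranked above the non-match
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores for six item pairs, labelled True where a
# reference harmonisation (e.g. one made by experts) treats the pair as a match.
scores = [0.91, 0.85, 0.40, 0.78, 0.30, 0.12]
labels = [True, True, True, False, False, False]

print(f"{auc(scores, labels):.0%}")  # 89%
```

An AUC of 100%, as reported for the GAD-7 comparison, means every true match scored higher than every non-match.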