PSYKOSE: A Motor Activity Database of Patients with Schizophrenia

Using sensor data from devices such as smart-watches or mobile phones is very popular in both computer science and medical research. Such movement data can predict certain health states or performance outcomes. However, in order to increase the reliability and replicability of the research, it is important to share data and results openly. In medicine, this is often difficult due to legal restrictions or to the fact that data collected from clinical trials is seen as very valuable and something that should be kept "in-house". In this paper, we therefore present PSYKOSE, a publicly shared dataset consisting of motor activity data collected from body sensors. The dataset contains data collected from patients with schizophrenia. Schizophrenia is a severe mental disorder characterized by psychotic symptoms like hallucinations and delusions, as well as symptoms of cognitive dysfunction and diminished motivation. In total, we have data from 22 patients with schizophrenia and 32 healthy control persons. For each person in the dataset, we provide sensor data collected over several consecutive days. In addition to the sensor data, we also provide some demographic data and medical assessments made during the observation period. The patients were assessed by medical experts from Haukeland University Hospital. In addition to the data, we provide a baseline analysis and possible use-cases of the dataset.


I. INTRODUCTION
Objective physiological parameters collected from sensors and analyzed by machine learning techniques have gained considerable interest as a tool to support the existing subjective diagnostic practice within mental health [1]. To perform reliable and reproducible research with such data it is important to share both data and results openly. In the medical field, sharing data is often problematic due to various privacy policies. We have previously shared the DEPRESJON dataset [2], containing motor activity data collected from bipolar and unipolar patients. In this paper, we present our second openly shared anonymized dataset on motor activity, containing actigraph data collected from patients with schizophrenia. The Norwegian Regional Medical Research Ethics Committee West approved the original protocol for the study collecting the data for both datasets, and all processes were in accordance with the Helsinki Declaration of 1975 [3].
Actigraphy is a non-invasive method of monitoring human rest and activity cycles, and is normally recorded with a wrist-worn device that registers gravitational acceleration units [3]. Data from actigraphs have been applied to studies of sleep [4] and psychiatric diagnoses such as bipolar disorder [5] and ADHD [6], and to some extent in the investigation of schizophrenia. Schizophrenia is characterized by "positive" symptoms like hallucinations and delusions, "negative" symptoms like diminished motivation, and cognitive symptoms like slower mental processing [7]. A recent systematic review summarised motor activity studies of schizophrenia, all applying traditional statistical analysis [8]. Overall, schizophrenia is associated with lower motor activity levels as well as repetitious and rigid patterns of behavior when compared to healthy controls. Motor activity also reflects the symptomatic state. Increasing positive symptoms correlate with augmented complexity in activity patterns and increased sleep disturbance. Increased negative symptoms are associated with overall reduced activity and amplified nighttime sleep disturbance [8].
The circadian system, an internal self-regulating clock, regulates the diurnal oscillating cycles of nighttime sleep and daytime activity [9]. Integrated and interlocked in the circadian clock are various ultradian rhythms of shorter duration regulating patterns like rest/activity cycles, feeding habits, and hormone levels. A time series of motor activity is an expression of this recurring complex clock system in interaction with daily social rhythms [10]. Disturbed sleep patterns and disrupted rest/activity cycles are characteristic symptoms of schizophrenia [7].
An alternative method to detect and classify schizophrenia is electroencephalography (EEG), which measures electrical activity in the brain [11]. Machine learning appears promising in differentiating between schizophrenic patients and healthy controls in such data [12]. Still, collecting data with electrodes placed on the scalp appears substantially more cumbersome and demanding than using a simple wrist-worn actigraph that registers motor activity.
The aim of this paper is to provide a comprehensive dataset of motor activity of patients with schizophrenia and make it publicly available. Moreover, by sharing the dataset together with ideas for further research, we aim to enable additional investigation. The main contributions of this paper are:
1) A new publicly available dataset containing sensor and demographic data of a substantial number of patients with schizophrenia.
2) Additional sensor data from a large number of healthy control persons.
3) Baseline experiments classifying schizophrenic versus non-schizophrenic patterns, including recommendations for evaluation metrics, that other researchers can use to compare their results.
In the following, we describe the diagnosis of schizophrenia (Section II), and how the data was collected and the attributes of the data itself (Section III). Section IV lists some of the potential applications of this dataset. Section V presents some suggested evaluation metrics. This is followed by an experiment section containing the baseline experiments (Section VI). In Section VII, we also discuss possible future research questions using the dataset and give a conclusion.

II. MEDICAL BACKGROUND
Schizophrenia is a severe mental disorder that affects approximately one percent of the global population. Symptoms of schizophrenia begin in early adulthood, and the debut age is younger for males than females. The disorder tends to be chronic and relapsing, although the disease burden and degree of disability vary greatly between individuals. A range of different symptoms may occur, including "positive" symptoms like hallucinations, delusions or psycho-motoric agitation, "negative" symptoms like impaired affective experience or expression and diminished motivation, and cognitive symptoms like problems with focus, paying attention and problem solving [7], [13]. The main treatment of schizophrenia is antipsychotic medication, both for acute psychotic episodes and for relapse prevention. The therapeutic effects of, and side effects related to, antipsychotics vary substantially among individuals [7]. Antipsychotics target the dopamine system, and the antidopaminergic effect may influence motor activity through side effects such as extrapyramidal syndrome and akathisia [14]. Akathisia is characterized by subjective and objective psycho-motoric restlessness [15]. These distressing side effects are an important reason why patients quit their prescribed antipsychotic medications [16]. Investigating akathisia in motor activity studies therefore appears to be a relevant and important topic for future research. Unfortunately, this is not possible with the present dataset. In retrospect, we have identified several patient characteristics that would have been beneficial to studies like this, although using them would require a larger sample size.
Variables like previous antipsychotic use, duration of use, type of antipsychotic including dosage and alteration of dosage, serum concentration of antipsychotics to verify intake, patient status (inpatient/outpatient), duration of untreated psychosis, alcohol consumption and substance use may all be valuable in further motor activity studies in schizophrenia [15], [17].

III. DATASET DETAILS
Motor activity was collected with a wrist-worn actigraph device (Actiwatch, Cambridge Neurotechnology Ltd, England, model AW4) containing a piezoelectric accelerometer programmed to record the integration of intensity, amount and duration of movement in the x, y and z axes. The sampling frequency was 32 Hz, and movements over 0.05 g were recorded. The output is an integer value proportional to the movement intensity for 1-minute epochs [3]. Figure 1 shows a 24-hour subset of the actigraphy data produced by the device for one of the patients. The dataset consists of actigraph data collected from 22 psychotic patients hospitalized at a long-term open psychiatric ward at Haukeland University Hospital. All were diagnosed with schizophrenia, and all used antipsychotic medications. The group contained 3 females and 19 males with an average age of 46.2 ± 10.9 years (range 27-69 years). The mean age at first hospitalization was 24.8 ± 9.3 years (range 10-52 years). Clinical experts diagnosed the patients using a semi-structured interview based on DSM-IV criteria [18]. Seventeen of the patients were classified as paranoid schizophrenic. For the other 5 patients, no subtype of schizophrenia was specified, beyond that they were non-paranoid. DSM-5, the currently used diagnostic manual, does not recognize schizophrenia subtypes [19]. The present psychotic symptomatic state of the patients was rated on the Brief Psychiatric Rating Scale (BPRS), a frequently used rating scale for measuring the overall psychopathology of schizophrenic patients. The BPRS consists of 18 items rated from 1 to 7, and higher sum scores indicate a more severe condition [20]. Seventeen of the patients were rated on the BPRS; the mean score was 50.0 ± 8.8 (range 34-59). Further details on the dataset are presented in previous papers analyzing the dataset with various linear and nonlinear statistical approaches [3], [21]-[23].
The dataset also contains actigraphy data from 32 healthy control persons, consisting of 23 hospital employees, 5 nursing students, and 4 healthy persons recruited from a general practitioner. None had a history of either psychotic or affective disorders. The group consists of 20 females and 12 males, with a mean age of 38.2 ± 13.0 years (range 21-66 years). The gender composition is mismatched between the groups. Nevertheless, previous studies of motor activity within mental health have not identified gender differences in activation [24].
The participants wore the actigraph devices for an average of 12.7 days across the control and condition groups. See Table I for details. The battery life of the device is about 14 days; thus, it did not need charging during the study. The total number of collected days was 687, comprising 402 days in the control group and 285 in the condition group. Note that the actigraph files might contain more days, but only the first n days were considered in our analysis, where n is the number of days reported in the days.csv file. These are the days during which the study took place. Figure 2 shows a boxplot of the average activity per day for the condition and control groups. Here, it can be seen that the condition group has lower activity levels than the control group.

A. Dataset Structure
The root folder of the dataset contains five items. Two of them are folders: one contains the activity data for the controls and the other the data for the patients. For each patient and control, a CSV file is provided containing the actigraphy activity measurements over time. The columns in the patient and control files are timestamp (one-minute intervals), date (date of measurement), and activity (activity measurement from the actigraph watch). Figure 3 shows an extract of the first 10 lines of data from patient 18. The root folder also contains a file named patients_info.csv. This file contains the following columns: Number (patient identifier), gender (male or female), age (age of the patient), days (whole days the patient wore the actigraph), schtype (type of schizophrenia), bprs (BPRS sum score), cloz (whether the patient used clozapine as antipsychotic medication), trad (whether the patient used traditional neuroleptic or modern antipsychotic medication), moodst (whether the patient used mood stabilizing medications), and agehosp (age at first hospitalization).
Another file in the root folder is named days.csv. This file contains the number of days each patient and control participated in the study. It contains the columns id (identifier) and days (number of full days).
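The rule of considering only the first n days of an actigraph file, with n taken from days.csv, could be applied along the following lines. This is a minimal sketch and not the code used for the paper. It assumes that a "day" corresponds to one distinct value in the date column and that rows are in chronological order; the function name keep_first_n_days and the sample values are hypothetical.

```python
import csv
import io


def keep_first_n_days(rows, n_days):
    """Keep rows whose 'date' is among the first n_days distinct dates.

    Assumption: a "day" is one calendar date in the 'date' column and the
    rows are already sorted chronologically.
    """
    kept_dates = []
    kept_rows = []
    for row in rows:
        date = row["date"]
        if date not in kept_dates:
            if len(kept_dates) == n_days:
                break  # every later row lies outside the study window
            kept_dates.append(date)
        kept_rows.append(row)
    return kept_rows


# Tiny in-memory stand-in for one actigraphy file (made-up values).
sample_csv = io.StringIO(
    "timestamp,date,activity\n"
    "2000-01-01 23:58:00,2000-01-01,12\n"
    "2000-01-01 23:59:00,2000-01-01,0\n"
    "2000-01-02 00:00:00,2000-01-02,3\n"
    "2000-01-03 00:00:00,2000-01-03,7\n"
)
rows = list(csv.DictReader(sample_csv))
trimmed = keep_first_n_days(rows, n_days=2)
print(len(trimmed))  # 3 rows: two from the first date, one from the second
```

With real files, rows would come from csv.DictReader over one participant's CSV file, and n_days from the matching row of days.csv.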
Finally, the root folder contains a file named schizophrenia_features.csv. This contains the statistical features used for the baseline experiments. The file contains six columns: userid (patient identifier), class (binary class to predict), class_str (class name as string), f.mean (the mean), f.sd (the standard deviation), and f.propZeros (proportion of zeros).

IV. APPLICATIONS OF THE DATASET
The main goal of publishing this dataset is to encourage other researchers to use the data to improve the quality of life for mental health patients. The dataset has several application areas, some of which are discussed in the following. Some suggested future research directions using this dataset could be:
• Use machine learning for schizophrenia vs. non-schizophrenia classification.
• Analysis of circadian and ultradian cycles in schizophrenia compared to non-schizophrenia.
• Diurnal and nocturnal activity analysis of schizophrenia versus non-schizophrenia.
We believe that this dataset can be useful to the machine learning community, since the use of machine learning for mental health has shown promising results in recent years [1], [25], [26].
In addition, we want to point out that this dataset can be combined with our previously published Depresjon dataset [2] to increase the number of persons and measurements for both datasets. When comparing the motor activity profiles of depressed patients, schizophrenic patients and healthy controls, the distribution and length of active and resting periods differ between the groups [22]. Complexity analyses have also identified motor activity profiles segregating the three groups [21], [27]. Therefore, by combining these two datasets, some potential applications emerge:
• Use machine learning for schizophrenia and depression state classification.
• Compare attributes of schizophrenia and depression patients.
• Differences in diurnal/nocturnal patterns and/or the rest/activity cycles of schizophrenia versus non-schizophrenia versus depression.
In addition to these specific medical research questions, more general research questions in the field of machine learning could also be addressed using this dataset. Examples include comparing different algorithms and metrics on the dataset, measuring the effect of over- and under-sampling techniques, and researching and developing more advanced time-series analysis algorithms. Examples of more advanced algorithms include those based on deep learning, such as convolutional neural networks or recurrent neural networks.

V. SUGGESTED METRICS
The evaluation of classification algorithms can be done in a variety of different ways. Sometimes, metrics that measure the same thing have different names depending on the discipline in which they are discussed. For example, recall in information retrieval is often called sensitivity in a medical context. In the following, we will present two experiments using different metrics that we recommend for this dataset. In general, there are two important things to take into account. Firstly, medical datasets are often imbalanced (one class is represented more often than another). For an imbalanced dataset like this, it is important to weight the metrics by the number of samples in each class. Such weighting is specifically applicable to binary classifications. Secondly, it is good practice to present a comprehensive set of outcome metrics, beyond the frequently reported limited subset of accuracy or precision, recall, and F1-score.
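One common reading of weighting metrics on an imbalanced dataset is to average per-class scores in proportion to class size. The sketch below illustrates that reading only; it is an assumption about what is meant, not the paper's implementation. The class sizes are the day counts reported for the two groups, while the per-class recall values and the function name weighted_average are made up for illustration.

```python
def weighted_average(per_class_scores, class_counts):
    """Average per-class scores, weighted by the number of samples per class."""
    total = sum(class_counts.values())
    return sum(per_class_scores[c] * class_counts[c] for c in class_counts) / total


# Class sizes: days per group as reported for the dataset.
counts = {"schizophrenia": 285, "control": 402}
# Made-up per-class recall values, for illustration only.
recall = {"schizophrenia": 0.70, "control": 0.90}

print(round(weighted_average(recall, counts), 3))  # weighted by class size
print(round(sum(recall.values()) / len(recall), 3))  # unweighted mean, for contrast
```

The weighted value leans towards the score of the larger class, which is why it should be reported together with per-class results.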
All outcome metrics we recommend are calculated from the true positives (TP, the number of correctly classified patients with schizophrenia), true negatives (TN, the number of correctly classified controls), false positives (FP, the number of misclassified controls) and false negatives (FN, the number of misclassified patients with schizophrenia). The metrics used for this dataset are False-Positive Rate, Precision, Recall/Sensitivity, Matthews Correlation Coefficient (MCC) and F1-score. In addition, we recommend using Precision-Recall Curves (PRC) and Receiver Operating Characteristic (ROC) curves. Furthermore, to obtain models that generalize better, a cross-validation approach ought to be utilized. We propose either N-fold or leave-one-patient-out cross-validation.

VI. BASELINE PERFORMANCE
To provide a baseline performance and also to inspire future work, we present two baseline experiments using the dataset. The goal of both experiments is to classify participants as schizophrenia or non-schizophrenia. For all experiments, we used statistical features extracted from the activity data. The features used are standard deviation, proportion of zeros and mean. The features are calculated per full day per participant. That is, one feature vector is extracted per day and per participant. This leads to 687 data points (feature vectors), which corresponds to the total number of study days across all participants. Of these, 285 are from schizophrenic patients and 402 from controls. Details about how many days per participant were collected can be found in the days.csv file of the dataset. The extracted features used for the experiments are shared with the dataset for reproducibility. Figure 4 shows a projection of the extracted features into a 2D plane using Multidimensional Scaling (MDS). It can be seen that these features are able to separate the two groups to some extent, but not perfectly. In the next few sections, we present baseline results using machine learning classifiers to infer each point's class (non-schizophrenia or schizophrenia).
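The per-day feature extraction described above can be sketched as follows. This is an illustration under stated assumptions rather than the original extraction code: the function name daily_features is hypothetical, the standard deviation is taken to be the sample standard deviation (n - 1 denominator), which the paper does not specify, and the activity values are made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def daily_features(rows):
    """Return {date: (mean, sd, prop_zeros)} from (date, activity) pairs."""
    by_day = defaultdict(list)
    for date, activity in rows:
        by_day[date].append(activity)

    features = {}
    for date, values in by_day.items():
        # Assumption: sample standard deviation (n - 1 denominator).
        sd = stdev(values) if len(values) > 1 else 0.0
        prop_zeros = sum(1 for v in values if v == 0) / len(values)
        features[date] = (mean(values), sd, prop_zeros)
    return features


# Made-up activity counts; a real day would hold 1440 one-minute epochs.
rows = [
    ("2000-01-01", 0), ("2000-01-01", 4), ("2000-01-01", 8),
    ("2000-01-02", 0), ("2000-01-02", 0), ("2000-01-02", 6),
]
features = daily_features(rows)
for date, (m, sd, pz) in features.items():
    print(date, round(m, 2), round(sd, 2), round(pz, 2))
```

Each participant then contributes one such feature vector per study day, which is how the 687 data points arise.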

A. Experiment 1
For the first experiment, we perform 10-fold cross-validation for the training and leave a certain amount of data out for testing (90%, 66%, 50%, 33%, and 10%). The experiments are performed using four different algorithms, namely, Logistic Regression (LR) [28], Random Forest (RF) [29], Extreme Gradient Boosting (XGB) [30] and Light Gradient Boosting (LGB) [31]. All four are commonly used for machine learning tasks. In addition, we also used an ensemble that combines the four algorithms to perform a combined classification. For all tested algorithms, we report the average precision (from the PRC) and the area under the curve (from the ROC). For the best-performing one, we also present plots of the PRC and ROC. Implementations are made using Scikit-learn [32] and the packages XGBoost and LightGBM for the two respective algorithms. The implementation details and configurations are shared with the dataset.
Looking at Table II, we can observe that all algorithms perform well, with average precision and area under the curve above 0.80. Overall, logistic regression performs best in terms of average precision and area under the curve. Figure 5 shows the precision-recall curve for LR with 90% of the data as the test set. It is interesting to see that the performance is very good, even with a small amount of training data. The random baseline would be 0.41 (the number of positive samples divided by the number of all samples). For the ROC shown in Figure 6, we can make the same observation, with an area under the curve of 0.92.
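To make the two reported summary scores concrete, the sketch below computes the area under the ROC curve and the average precision from labels and classifier scores in plain Python. It is not the paper's implementation, which relies on Scikit-learn. Average precision is defined here as the sum of precision values weighted by the increase in recall at each threshold, and the AUC as the probability that a positive sample outranks a negative one. The labels and scores are made up, and the last lines compute the precision-recall baseline from the class sizes of the full dataset.

```python
def roc_auc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Sum of precision at each threshold, weighted by the gain in recall."""
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        predicted = [y for y, s in zip(labels, scores) if s >= threshold]
        tp = sum(predicted)
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / len(predicted))
        prev_recall = recall
    return ap


# Made-up labels (1 = schizophrenia) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))
print(round(average_precision(labels, scores), 3))

# Baseline of the precision-recall curve: share of positive samples.
baseline = 285 / 687
print(round(baseline, 2))
```

A classifier that assigns random scores has an expected precision equal to this share of positive samples at every recall level, which is why it serves as the reference line in the precision-recall plot.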

B. Experiment 2
For Experiment 2, we changed the evaluation from cross-validation training with a separate test set to leave-one-patient-out cross-validation. This means we leave one patient out of the training and use that patient for testing. This is repeated until all patients have been assigned once to the test set. For these experiments, we used the WEKA [33] machine learning library. We report the weighted average of the metrics. The tested algorithms are ZeroR (which is the majority class baseline), Random Tree (RT), Random Forest (RF), and Classification via Regression (CVR). From the results in Table III, we can see that all algorithms outperform the ZeroR baseline. Looking at the Matthews correlation coefficient (MCC), we can see that Random Forest is the overall best-performing classifier. The other two algorithms seem to have problems detecting schizophrenia compared to non-schizophrenia.
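The leave-one-patient-out procedure can be sketched as a split generator over the daily feature vectors. This is an illustrative sketch, not the WEKA setup used for the experiments. It assumes that every participant, patient or control, is held out once and that all of a participant's days move to the test set together; the function name and the identifiers are hypothetical.

```python
def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_indices, test_indices) for each participant.

    participant_ids holds one entry per daily feature vector, so all days of
    the held-out participant end up in the test set together.
    """
    for held_out in sorted(set(participant_ids)):
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        yield held_out, train, test


# Hypothetical identifiers: one entry per daily feature vector.
ids = ["patient_1", "patient_1", "control_1", "control_1", "control_1", "patient_2"]
splits = list(leave_one_participant_out(ids))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Grouping by participant matters here because days from the same person are strongly correlated; splitting them between training and test sets would give overly optimistic results.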

C. Experiments Summary
Both sets of experiments showed promising results for using activity data to detect schizophrenia versus non-schizophrenia. However, the results are not optimal, and there is still potential for large improvements. For example, it might be better to analyze the complete activity time series using more sophisticated methods such as recurrent neural networks.

VII. CONCLUSIONS
In this paper, we have presented a dataset containing motor activity data from patients with schizophrenia. The baseline analysis of our experimental results showed the potential for using such data to answer medically relevant research questions. We also discussed possible applications of the dataset, such as schizophrenia versus non-schizophrenia classification of patients. In this respect, we hope that this dataset will encourage other researchers both to perform experiments using the data and to share their own insights and datasets. The PSYKOSE dataset will hopefully enable reproducible and comparable results and assist in the development of future automated systems supporting the existing subjective diagnostic practice within mental health.