**Models and materials from "*Structural dynamics of isolated myosin motor domains encode differences in their mechanochemical cycles*"**
In this OSF repository, we have uploaded large files used in our paper, "Structural dynamics of isolated myosin motor domains encode differences in their mechanochemical cycles." These include:
1. The starting structures for each myosin isoform examined.
2. The data used to fit Markov state models (MSMs) for each of the myosin motors examined in this work.
3. The PCA and *k*-NN models used to classify P-loop conformations.
4. The annotations of the 114 myosin crystal structures available at the time of writing.
## Data in this Repository
This repository is organized into folders, each containing a different type of data.
#### P-loop PCA
This folder contains the PCA model fit on the *MYH7* P-loop conformational ensemble, the hierarchical clustering of those P-loop conformations, and the *k*-NN model used to assign other P-loops to the states.
1. `myh7-pca-4dim-bb-only-full-pline.joblib` contains a fitted sklearn (version 0.21.2) `Pipeline` that chains a `PCA` object and an `InductiveClusterer` object, serialized with `joblib.dump`. Load it with `joblib.load('myh7-pca-4dim-bb-only-full-pline.joblib')`.
2. `inductive_clusterer.py` contains code that glues a hierarchical clustering object together with a *k*-NN classifier. It is based on this code from [sklearn]. You will need it to deserialize the model, since `InductiveClusterer` is not built into sklearn.
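For readers unfamiliar with the inductive-clustering pattern, here is a minimal sketch that follows scikit-learn's "Inductive Clustering" example: a hierarchical clusterer labels the training points, and a *k*-NN classifier trained on those labels assigns new points to the existing clusters. This is not the contents of `inductive_clusterer.py`. The class body, the hypothetical `build_pipeline` helper, and its defaults (`n_components=4`, `n_clusters=3`, `n_neighbors=5`) are illustrative assumptions, and the sketch targets a recent scikit-learn release rather than 0.21.2.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline


class InductiveClusterer(BaseEstimator):
    """Illustrative sketch (not the repository's file): cluster the training
    data, then fit a classifier on the cluster labels so that unseen points
    can be assigned to the same clusters."""

    def __init__(self, clusterer, classifier):
        self.clusterer = clusterer
        self.classifier = classifier

    def fit(self, X, y=None):
        # Work on fresh copies so the estimators passed in stay unfitted.
        self.clusterer_ = clone(self.clusterer)
        self.classifier_ = clone(self.classifier)
        # Hierarchical clustering only labels the points it was fit on...
        self.labels_ = self.clusterer_.fit_predict(X)
        # ...so a classifier learns those labels to generalize to new points.
        self.classifier_.fit(X, self.labels_)
        return self

    def predict(self, X):
        return self.classifier_.predict(X)


def build_pipeline(n_components=4, n_clusters=3, n_neighbors=5):
    """Hypothetical helper: PCA projection followed by inductive clustering."""
    return Pipeline([
        ("pca", PCA(n_components=n_components)),
        ("clusterer", InductiveClusterer(
            AgglomerativeClustering(n_clusters=n_clusters),
            KNeighborsClassifier(n_neighbors=n_neighbors),
        )),
    ])
```

Calling `fit` on the pipeline projects the data with PCA and clusters it; `predict` projects new points with the same PCA and labels them with the *k*-NN classifier. To load the published model instead, keep `inductive_clusterer.py` importable (for example, in the working directory) when calling `joblib.load`, because unpickling has to locate the `InductiveClusterer` class. Unpickled scikit-learn objects are only guaranteed to work under the version that saved them, so matching 0.21.2 is the safest choice.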
#### PDB Annotations
In the paper, we analyze the statistical properties of the distribution of all solved crystal structures. The CSV file `pdb_annotations.csv` contains the processed data for this analysis. Its columns include the residue numbers at which each P-loop starts and stops, the RMSD to the reference structure 4PA0, and other annotations.
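Because the exact column headers are not listed here, a reasonable first step is to load the table and inspect its columns before filtering on any of them. A minimal sketch with pandas follows; the function names are hypothetical, and the column you summarize must be taken from the file's actual header.

```python
import pandas as pd


def load_pdb_annotations(path="pdb_annotations.csv"):
    """Load the annotation table into a DataFrame."""
    return pd.read_csv(path)


def summarize_column(df, column):
    """Count, mean, standard deviation, and quartiles for one numeric column."""
    if column not in df.columns:
        # Fail loudly and show the real headers, since names here are guesses.
        raise KeyError(f"{column!r} not found; available columns: {list(df.columns)}")
    return df[column].describe()
```

Printing `list(df.columns)` after loading shows which header holds, for example, the RMSD to 4PA0; that name can then be passed to `summarize_column`.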
#### Whole Motor Models
For each myosin motor, we built an MSM with many thousands of states to represent its conformational landscape. To do this, we clustered the trajectories into microstates. The results of this clustering are posted here. Each subdirectory (named by myosin gene name, following the main text) contains the following:
1. Assignments file, ending with `-assignments.h5`. This file is an [enspara ragged array] assigning each frame of each trajectory in the dataset to a cluster center.
2. A center indices file, ending with `center-inds.npy`. This NumPy file contains the trajectory number and frame number of each cluster's center conformation.
3. A distances file, ending with `distances.h5`. This file is an [enspara ragged array] giving the distance between each frame and its nearest cluster center.
4. A feature centers file, ending with `feature-centers.npy`. This NumPy file contains the side-chain solvent-accessible surface area of each residue in each cluster's center conformation. The first dimension of this array is the cluster id (matching the assignments, center indices, and structure centers files), and the second dimension is the residue index.
5. A structure centers file, ending with `structure-centers.h5`. This file is an MDTraj HDF5 file containing the center conformation of each cluster. The `i`th "frame" of the trajectory is the center conformation of the `i`th cluster (to match the other files in the directory).
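The assignments file is the input from which an MSM is estimated. As a rough illustration of that step, the following sketch counts transitions between cluster indices observed a fixed number of frames apart and converts them into a row-stochastic transition matrix. It assumes each trajectory's assignments can be iterated as a 1D integer array (for example, the rows of the ragged array loaded with enspara's `ra` module; check the enspara documentation for the exact loader). The function names, the lag time, and the simple transpose symmetrization are illustrative choices, not necessarily the estimator used in the paper.

```python
import numpy as np


def count_transitions(assignments, n_states, lag=1):
    """Count i -> j transitions observed `lag` frames apart in each trajectory.

    Assumption: each element of `assignments` is a 1D array of cluster ids.
    """
    counts = np.zeros((n_states, n_states), dtype=np.int64)
    for traj in assignments:
        traj = np.asarray(traj, dtype=np.int64)
        if len(traj) <= lag:
            continue  # trajectory too short to contribute at this lag
        # np.add.at accumulates correctly when the same (i, j) pair repeats.
        np.add.at(counts, (traj[:-lag], traj[lag:]), 1)
    return counts


def transition_matrix(counts):
    """Symmetrize counts (an illustrative choice) and row-normalize them."""
    sym = counts + counts.T
    row_sums = sym.sum(axis=1, keepdims=True)
    # States with no observed transitions keep an all-zero row.
    return np.divide(sym, row_sums, out=np.zeros(sym.shape, dtype=float),
                     where=row_sums > 0)
```

The other files index the same clusters: `numpy.load` on the `feature-centers.npy` file gives an array with one row per cluster and one column per residue, row `i` of the `center-inds.npy` file presumably holds the (trajectory, frame) pair for cluster `i`, and `mdtraj.load` on the `structure-centers.h5` file returns a trajectory whose frame `i` is that cluster's center.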