**All Files in Project**

Folder *Code*

Required dependencies:

1. NumPy
2. Pandas
3. scikit-learn
4. SciPy
5. SoftImpute

Files:

1. AUC_SoftImpute_CV.R: R code that sets up and runs the SoftImpute algorithm on community AUCs. It performs cross-validation to select the optimal regularization parameter and then makes leave-one-out and leave-10%-out predictions.
2. Regression_Models.ipynb: a Jupyter notebook that sets up and runs the linear regressions on community AUCs. Includes weighting and cross-validation to find the optimal hyperparameters.
3. LowRankRegressor.ipynb: a Jupyter notebook that sets up and runs the Low Rank Regressor on community AUCs. Includes weighting and cross-validation to find the optimal hyperparameters.
4. Folder rank1model: source code for the Low Rank Regressor. Navigate into this folder and run `pip install .` to install it. After installation, the package can be used in LowRankRegressor.ipynb.

Folder *Data*

Files:

1. Strain_Taxonomy.xlsx: an Excel spreadsheet containing the taxonomy and full 16S sequence of all 16 strains used.
2. Soil Sampling Sites.xlsx: an Excel spreadsheet with data for the seven sites from which soil samples were obtained.
3. ALL_Strains_Tree.phylotree: a phylotree file that generates the phylogenetic tree from the 16S sequences of all 16 strains.
4. All_Communities_Compositions_and_AUCs.xlsx: an Excel spreadsheet containing the composition of all 70 original communities plus 10 generated communities in +1/-1 format (+1 indicates that a strain is present in a community, -1 that it is absent). Strains are ordered with degraders first, followed by nondegraders. These are followed by the AUC of BPA degradation of each community at each of the five concentrations. Note that every community appears twice, since technical duplicates were prepared during the experiments.
5. All_Communities_TimeSeries.xlsx: an Excel spreadsheet containing the BPA degradation dynamics of all 70 original communities plus 10 generated communities at each of the original five concentrations. Each concentration is on its own sheet. The last column gives the AUC of degradation. Note that every community appears twice, since technical duplicates were prepared during the experiments.
6. Regression_Coefficients.xlsx: an Excel spreadsheet containing the coefficient values for each of the five regressions (one for each initial BPA concentration), including the intercept term $\beta_0$.
7. MonocultureTimeSeries.xlsx: an Excel spreadsheet containing the BPA degradation dynamics of each of the ten BPA degraders at initial concentrations of 60 ppm and 150 ppm (one sheet per concentration). Note that entries are duplicated, since each strain was measured in technical duplicate.
8. Soil_Remediation_TimeSeries&AUCs.xlsx: an Excel spreadsheet containing the BPA degradation dynamics of all 10 generated communities in both remediation soils and at both BPA concentrations. The technical duplicates are reported on separate sheets, and their average is reported on a third sheet. Average AUCs of BPA degradation under all conditions are reported on a fourth sheet.
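Because the community spreadsheets list every community twice (technical duplicates) and encode composition as +1/-1, a common first step is to average the duplicates and, if needed, map the encoding to 1/0. The sketch below does this with pandas, assuming the strain and AUC columns have already been identified; `average_technical_duplicates`, `to_binary` and the column names in the usage example are hypothetical. It groups rows by their full +1/-1 composition rather than relying on row adjacency, which assumes no two distinct communities share the same composition.

```python
import pandas as pd


def average_technical_duplicates(df, strain_cols, auc_cols):
    """Collapse technical duplicates into one row per community.

    Rows with identical +1/-1 strain columns are treated as duplicates of
    the same community (an assumption), and their AUC columns are averaged.
    Communities keep their order of first appearance.
    """
    return (
        df.groupby(list(strain_cols), as_index=False, sort=False)[list(auc_cols)]
        .mean()
    )


def to_binary(df, strain_cols):
    """Map the +1/-1 presence encoding to 1/0 (present/absent)."""
    out = df.copy()
    out[list(strain_cols)] = (out[list(strain_cols)] == 1).astype(int)
    return out


# Toy example with two hypothetical strains and one AUC column.
toy = pd.DataFrame(
    {"S1": [1, 1, -1, -1], "S2": [-1, -1, 1, 1], "AUC_60": [10.0, 12.0, 5.0, 7.0]}
)
averaged = average_technical_duplicates(toy, ["S1", "S2"], ["AUC_60"])
binary = to_binary(averaged, ["S1", "S2"])
```

With real data, the `df` argument would come from reading All_Communities_Compositions_and_AUCs.xlsx (for example with `pandas.read_excel`); the actual column labels should be taken from the file.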
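Regression_Coefficients.xlsx reports one set of coefficients per initial BPA concentration, including the intercept $\beta_0$. Assuming each regression is additive in the +1/-1 strain indicators, with one coefficient $\beta_i$ per strain (the file may contain other terms; check its labels), a predicted AUC is $\hat{y} = \beta_0 + \sum_i \beta_i x_i$ with $x_i \in \{+1, -1\}$. A minimal sketch of that evaluation, with a hypothetical function name:

```python
import numpy as np


def predict_auc(presence, beta0, betas):
    """Evaluate y_hat = beta0 + sum_i beta_i * x_i for one community.

    presence: +1/-1 entries (+1 = strain present, -1 = absent), in the
              same strain order as `betas` (assumed additive model).
    beta0:    intercept term.
    betas:    one coefficient per strain.
    """
    x = np.asarray(presence, dtype=float)
    b = np.asarray(betas, dtype=float)
    if x.shape != b.shape:
        raise ValueError("need exactly one coefficient per strain")
    if not np.all(np.isin(x, (-1.0, 1.0))):
        raise ValueError("presence entries must be +1 or -1")
    return float(beta0 + x @ b)
```

One consequence of the +1/-1 encoding under this additive form: each $x_i$ averages to zero over all $2^n$ possible compositions, so $\beta_0$ equals the mean predicted AUC over that full set, and each $\beta_i$ is half the predicted change in AUC when strain $i$ is added.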
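AUC_SoftImpute_CV.R is R code; for readers working in the Python stack listed above, the SoftImpute iteration it relies on can be sketched in NumPy. This follows the published SoftImpute update of Mazumder, Hastie and Tibshirani (fill missing entries with the current estimate, take an SVD, soft-threshold the singular values by λ) and is not a translation of the project's R script. `soft_impute` and `choose_lambda` are hypothetical names, and `choose_lambda` uses a single random hold-out split as a simplified stand-in for the cross-validation the script performs.

```python
import numpy as np


def soft_impute(X, lam, max_iter=500, tol=1e-6):
    """SoftImpute sketch: X is 2-D with np.nan marking missing entries.

    Returns a low-rank estimate Z; observed entries are fitted only
    approximately because singular values are shrunk by `lam`.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    Z = np.zeros_like(X)
    for _ in range(max_iter):
        # Keep observed entries, fill missing ones with the current estimate.
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)  # soft-threshold singular values
        Z_new = (U * s_shrunk) @ Vt
        # Relative change in Frobenius norm as the stopping rule.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


def choose_lambda(X, lams, holdout_frac=0.1, seed=0):
    """Pick lam by hiding a random subset of observed entries (one split)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    obs_idx = np.argwhere(~np.isnan(X))
    n_hold = max(1, int(holdout_frac * len(obs_idx)))
    hold = obs_idx[rng.choice(len(obs_idx), size=n_hold, replace=False)]
    rows, cols = hold[:, 0], hold[:, 1]
    X_train = X.copy()
    X_train[rows, cols] = np.nan
    truth = X[rows, cols]
    errors = [
        np.mean((soft_impute(X_train, lam)[rows, cols] - truth) ** 2)
        for lam in lams
    ]
    return lams[int(np.argmin(errors))]
```

In this setting the rows of X could be communities and the columns BPA concentrations, so a missing entry is an unmeasured AUC; a leave-one-out prediction then amounts to hiding one observed AUC, running `soft_impute`, and reading back the filled value.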