# Data for: Optimizing Sensor Data Interpretation via Hybrid Parametric Bootstrapping
**Overview:**
"Optimizing Sensor Data Interpretation via Hybrid Parametric Bootstrapping" is a research study by Victor V. Golovko, published in *Sensors* in 2025. The study presents a new statistical method for analyzing data in nuclear cleanup processes, with the aim of reducing waste and operational costs.
**Publication Information:**
- **Journal**: *Sensors*
- **Year**: 2025
- **Article Number**: 143910
- **DOI**: [https://doi.org/10.3390/s25041183](https://doi.org/10.3390/s25041183)
- **Link**: [MDPI](https://www.mdpi.com/1424-8220/25/4/1183)
*Sensors* is an international, peer-reviewed, open access journal on the science and technology of sensors. It has an impact factor of 3.4 (2024).
### Podcast Discussion
For a detailed discussion of the findings of this study, listen to the [podcast](https://osf.io/3pwmk/files/osfstorage/67b5e8a09751e2d541a54a05) generated with the artificial intelligence tool NotebookLM.
## Analysis of Egyptian Granite Density Data in R (granite-Egypt-dry-bulk-density-v00.R)
This page provides an overview of a script that analyzes dry bulk density measurements for a dataset of Egyptian granite samples, calculating mean densities with uncertainties and testing data normality. This page also includes guidance on how to install R, run the script, and interpret the output.
## Overview of Script
This R script performs the following tasks:
1. **Data Setup**: Initializes a data frame with dry bulk density measurements (in kg/m³) for five groups of Egyptian granite samples.
2. **Data Formatting**: Formats density values with associated uncertainties for easier interpretation.
3. **Normality Test**: Applies the Shapiro-Wilk test to assess the normality of the density distribution.
4. **Output**: Displays formatted density values with uncertainties and the results of the Shapiro-Wilk test.
5. **Execution Time Measurement**: Measures and outputs the script’s total execution time.
## Required Libraries
- **None**: This script only uses base R functions; no additional packages are required.
## How to Get R
To run this script, you’ll need R installed on your computer. Follow these steps to get started:
1. **Download R**: Go to the [Comprehensive R Archive Network (CRAN)](https://cran.r-project.org/) and download R for your operating system (Windows, macOS, or Linux).
2. **Install RStudio (Optional)**: Although not required, [RStudio](https://posit.co/download/rstudio/) is a popular, user-friendly integrated development environment (IDE) for R. It provides a clean interface, debugging tools, and other features to make working with R easier.
## How to Run the Script
Once R (and optionally RStudio) is installed, you can run the script by following these steps:
1. **Set the Working Directory**: Set the working directory in R to the folder where this script is saved.
- Example command: `setwd("path/to/your_directory")`
2. **Open the Script in RStudio or Another R Environment**:
- If you’re using RStudio, open the script file and click "Source" to execute the whole script ("Run" executes only the current line or selection).
- Alternatively, you can run the script in the R console using the `source()` function:
```r
source("path/to/your_script.R")
```
3. **View Outputs**: The results will be printed to the console, including the formatted density values with uncertainties and Shapiro-Wilk normality test results.
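One detail worth knowing when running a script with `source()`: by default it evaluates top-level expressions without printing their values, whereas the interactive console prints them. The sketch below illustrates this with a temporary stand-in script (a hypothetical two-line file, not the granite script itself) and the `print.eval` argument of `source()`.

```r
# Hypothetical stand-in for a real script: a temp file with one top-level expression
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:5", "mean(x)"), script_path)

# Default behaviour: the value of mean(x) is computed but not printed
quiet_output <- capture.output(source(script_path))

# With print.eval = TRUE, top-level values are printed as at the console
printed_output <- capture.output(source(script_path, print.eval = TRUE))

cat("Lines printed by default:", length(quiet_output), "\n")
cat("Lines printed with print.eval = TRUE:", length(printed_output), "\n")
```

Values passed to `print()` or `cat()` inside a script are shown either way, so this only matters for bare expressions at the end of a script.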
## Example Output
After running the script, you should see output similar to the following in your console:
- **Density Values with Uncertainties**: Formatted values with standard deviations in parentheses.
- **Shapiro-Wilk Normality Test Results**: Includes W-statistic and p-value, indicating whether the density data follows a normal distribution.
- **Execution Time**: The total time taken to run the script.
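To make the interpretation of the Shapiro-Wilk result concrete, here is a minimal sketch that pulls the W-statistic and p-value out of the test object and compares the p-value with a significance level. The density values are the five group means from the script; the 0.05 threshold is a conventional choice assumed for illustration, not something the script sets.

```r
# Dry bulk densities (kg/m^3) for groups G1-G5, copied from the script's data frame
density <- c(2617.07, 2590.42, 2612.28, 2724.35, 2748.23)

sw <- shapiro.test(density)

alpha <- 0.05  # assumed significance level (not defined in the original script)
cat(sprintf("W = %.4f, p-value = %.4f\n", sw$statistic, sw$p.value))

if (sw$p.value < alpha) {
  cat("Normality is rejected at the chosen level.\n")
} else {
  cat("No evidence against normality at the chosen level.\n")
}
```

With only five values the test has little power, so a non-significant result is weak evidence of normality rather than confirmation of it.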
## Script Code
Below is the full script code for reference:
```r
# Load necessary libraries
# Only `stats` is used, which is loaded by default in R
# Set digit precision for display
options(digits = 7)
# Set the working directory (ensure this path is correct)
setwd("path/to/your_directory")
# Start timer
ptm <- proc.time()
# Create a data frame for dry bulk density
density_data <- data.frame(
  Group = c("G1", "G2", "G3", "G4", "G5"),
  Dry_Bulk_Density_kg_m3 = c(2617.07, 2590.42, 2612.28, 2724.35, 2748.23),
  Std_Dev = c(8.74, 9.25, 4.72, 3.73, 3.18),
  Lower_Bound = c(2608.29, 2579.19, 2608.13, 2720.44, 2744.60),
  Upper_Bound = c(2627.65, 2600.29, 2618.41, 2728.98, 2751.87)
)
# Display the data frame
print(density_data)
# Function to format values with uncertainties
format_with_uncertainty <- function(value, uncertainty) {
  formatted_value <- sprintf("%.2f", value)
  formatted_uncertainty <- sprintf("(%.2f)", uncertainty)
  return(paste0(formatted_value, formatted_uncertainty))
}
# Apply the function to format density data with uncertainties
formatted_density <- mapply(format_with_uncertainty, density_data$Dry_Bulk_Density_kg_m3, density_data$Std_Dev)
# Print the formatted density values
formatted_density_output <- paste(formatted_density, collapse = ", ")
cat("density values with uncertainties: ", formatted_density_output, "\n")
# Apply the Shapiro-Wilk normality test
shapiro_test <- shapiro.test(density_data$Dry_Bulk_Density_kg_m3)
# Print the results of the Shapiro-Wilk test
cat("\nShapiro-Wilk Normality Test Results for Egypt granite density (small dataset):\n")
print(shapiro_test)
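# Stop timer and print the elapsed time
# (explicit print() is needed because source() does not auto-print top-level values)
print(proc.time() - ptm)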
```
---
# Analysis of Uranium Isotope (U-235) Data in R (U235_Egypt-0.R)
This page provides an overview of an R script designed to analyze uranium isotope measurements (U-235 and U-238) from an Excel dataset. The script extracts values and uncertainties, formats the data, and calculates summary statistics, including mean and standard deviation for U-235.
## Overview of Script
The script performs the following tasks:
1. **Reads Data**: Loads an Excel file containing U-235 and U-238 measurements with uncertainties.
2. **Data Preparation**: Trims and renames columns for consistency, and extracts numerical values and uncertainties from formatted text.
3. **Formatting and Display**: Formats U-235 values with uncertainties for display.
4. **Statistical Analysis**: Calculates the sample mean and standard deviation for U-235 values.
5. **Execution Time Measurement**: Measures and displays the total time taken to run the script.
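The preparation and statistics steps above can be sketched as follows. This is an illustration under stated assumptions, not the script itself: the three entries are made-up values, the `value +/- uncertainty` text format is a guess at how the spreadsheet cells might look, and base R string functions stand in for `stringr` so the example runs without extra packages.

```r
# Hypothetical formatted entries; the real spreadsheet's text format may differ
u235_raw <- c(" 4.12 +/- 0.35", "3.80 +/- 0.41 ", "5.06 +/- 0.52")

# Trim stray spaces, then split each entry into value and uncertainty
u235_clean <- trimws(u235_raw)
parts <- strsplit(u235_clean, "\\s*\\+/-\\s*")
u235_value <- as.numeric(vapply(parts, `[`, character(1), 1))
u235_unc   <- as.numeric(vapply(parts, `[`, character(1), 2))

# Display as value(uncertainty); this display format is also an assumption
cat(sprintf("%.2f(%.2f)", u235_value, u235_unc), sep = ", ")
cat("\n")

# Sample mean and sample standard deviation of the values
cat("Sample mean:", mean(u235_value), "\n")
cat("Sample SD:  ", sd(u235_value), "\n")
```

Note that `sd()` uses the n - 1 denominator, which matches the "sample" standard deviation named in the task list.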
## Required Libraries
This script uses the following libraries:
- **dplyr**: For data manipulation (renaming columns, creating new columns, and selecting columns).
- **readxl**: For reading data from an Excel file.
- **stringr**: For string manipulation (trimming spaces, extracting numeric values and uncertainties).
## How to Get R and Install Required Packages
1. **Download and Install R**: Visit [CRAN](https://cran.r-project.org/) to download R for your operating system.
2. **Install Required Packages**: After installing R, open R or RStudio and run the following commands to install the necessary packages:
```r
install.packages("dplyr")
install.packages("readxl")
install.packages("stringr")
```
## How to Run the Script
1. **Set the Working Directory**: Ensure that the working directory in R is set to the folder where this script and the Excel file are located:
```r
setwd("path/to/your_directory")
```
2. **Prepare the Excel File**: Place the file `"U-235_U238_Egypt.xlsx"` in the working directory.
3. **Run the Script**: Open the script in an R environment (RStudio, R GUI, or R terminal) and execute it.
4. **View Output**: The results will be displayed in the console, including:
- **Formatted U-235 Values with Uncertainties**
- **Mean and Standard Deviation of U-235**
- **Total Execution Time**
## Example Output
Upon running the script, you should see output similar to the following in your console:
- **Formatted U-235 values with uncertainties**: Displays U-235 values in a user-friendly format with associated uncertainties.
- **Sample Mean and Standard Deviation**: Provides the calculated mean and standard deviation of U-235 values.
- **Execution Time**: Shows the total time taken to run the script.
---
# U-235 Specific Activity Data Analysis Script (Figure3.R)
---
### Overview
This script provides an analysis of U-235 specific activity data by performing a Kolmogorov-Smirnov (K-S) test and visualizing distributions through histograms. The analysis helps to understand the distributional differences between a subset of randomly sampled data and the original dataset with specified outliers removed.
---
### Key Functions
1. **Outlier Removal**: Removes predefined outliers from the dataset for a cleaner analysis.
2. **Random Sampling with Seed**: Uses a fixed seed (8621) to ensure reproducibility, sampling 9 elements from the cleaned dataset.
3. **Kolmogorov-Smirnov (K-S) Test**: Compares the distribution of the sampled subset to that of the original dataset without outliers.
4. **Histogram Visualization**: Provides histograms to visually compare the distributions of the original (26 elements) and sampled (9 elements) datasets.
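The first three steps can be sketched as below. The data are simulated stand-ins (the study values are not reproduced here), the outlier list is hypothetical, and sampling without replacement is an assumption; only the seed (8621), the subset size (9), and the cleaned size (26) come from the description above.

```r
set.seed(42)  # seed for the simulated stand-in data only
activity <- c(rnorm(26, mean = 50, sd = 5), 120, 150)  # simulated values, not the study data
outliers <- c(120, 150)                                 # hypothetical predefined outliers

# 1. Remove the predefined outliers
cleaned <- activity[!activity %in% outliers]

# 2. Draw 9 values (without replacement, an assumption) using the documented seed
set.seed(8621)
sampled <- sample(cleaned, size = 9)

# 3. Two-sample K-S test; exact = FALSE requests the asymptotic p-value,
#    because the subset necessarily shares (ties with) values in the full set
ks_result <- ks.test(sampled, cleaned, exact = FALSE)
cat(sprintf("D = %.4f, p-value = %.4f\n", ks_result$statistic, ks_result$p.value))
```

A large p-value here means the test found no evidence that the 9-element subset is distributed differently from the 26-element set it was drawn from.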
---
### Requirements
To successfully run this script, you will need the following R packages:
- **ggplot2**: For creating histograms.
- **gridExtra**: For arranging multiple plots side by side.
Install these packages in R if they are not already available:
```r
install.packages("ggplot2")
install.packages("gridExtra")
```
---
### How to Run the Script
1. **Open R or RStudio**: Ensure R or RStudio is installed and set up.
2. **Set the Working Directory**: In R, set the working directory to the folder containing this script using:
```r
setwd("path/to/your/directory")
```
3. **Install Necessary Packages**: Make sure `ggplot2` and `gridExtra` are installed as specified in the requirements.
4. **Execute the Script**: Run the script in R or RStudio.
---
### Expected Output
Upon successful execution, the script will:
- **Print**: Display the K-S test's D-statistic and p-value in the console.
- **Histograms**: Show histograms comparing the distributions of the original and sampled datasets, allowing visual assessment of the distributional similarities or differences.
---
# U-235 Bootstrap Sample Analysis Script (Figure4.R)
---
### Overview
This R script performs statistical analysis on U-235 bootstrap sample results. It calculates a 2-sigma confidence interval, identifies the Most Frequent Value (MFV), and generates a histogram to visually compare these metrics. This analysis helps in understanding the distribution and central tendency of the MFV values obtained from the U-235 bootstrap samples.
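As an illustration of what a 2-sigma interval from bootstrap results can look like, the sketch below takes the percentiles that enclose the same probability as plus or minus two standard deviations of a normal distribution. The percentile approach is an assumption about how the interval is defined, and the bootstrap values are simulated rather than read from `U235_9_HBM_bootstrap_results.txt`.

```r
set.seed(123)  # seed for the simulated stand-in data only
boot_mfv <- rnorm(10000, mean = 50, sd = 2)  # simulated bootstrap results, not the study data

# Tail probabilities at -2 and +2 sigma of a normal distribution
probs <- pnorm(c(-2, 2))

# Percentile interval at those probabilities
ci_2sigma <- quantile(boot_mfv, probs = probs, names = FALSE)

cat(sprintf("2-sigma interval: [%.3f, %.3f]\n", ci_2sigma[1], ci_2sigma[2]))
cat(sprintf("Coverage: %.2f%%\n", 100 * diff(probs)))
```

The coverage works out to about 95.45%, slightly wider than the familiar 95% interval.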
---
### Requirements
To run this script, you need the following:
- **R**: Download and install R from the Comprehensive R Archive Network (CRAN):
- Go to [CRAN](https://cran.r-project.org/).
- Select the version compatible with your operating system (Windows, macOS, or Linux) and follow the installation instructions.
- **RStudio (optional)**: For a user-friendly R interface, download RStudio from [RStudio Download](https://posit.co/download/rstudio/).
- **ggplot2 Package**: This package is necessary for creating histograms. Install it by running the following command in your R console:
```r
install.packages("ggplot2")
```
---
### Instructions for Running the Script
1. **Prepare the Data File**:
- Ensure the bootstrap sample data file, `U235_9_HBM_bootstrap_results.txt`, is in the same directory as this script. If the file is located elsewhere, specify the correct file path in the script.
2. **Set Working Directory** (optional):
- If needed, set the working directory in R to where the script and data file are located:
```r
setwd("path/to/your_directory")
```
3. **Execute the Script**:
- Run the script in an R environment, such as RStudio, R GUI, or directly in the R terminal.
4. **Expected Output**:
- **Console Output**:
- The script will calculate and display the 2-sigma confidence interval and the MFV for the U-235 bootstrap sample data.
- **Histogram**:
- A histogram will be generated to visualize the distribution of bootstrap sample means with:
- Annotations marking the MFV.
- Vertical lines representing the 2-sigma confidence interval boundaries.
5. **View Results**:
- Review the printed confidence intervals and MFV in the console.
- Inspect the histogram to visually assess the distribution and central tendency.
---
# Generating Histograms for U-235 Specific Activity Measurements (Figure2.R)
This R script generates histograms for U-235 specific activity measurements (Bq/kg) in two datasets: one with outliers removed and one with the original values. It calculates and annotates each histogram with the mean and Most Frequent Value (MFV) as vertical lines. The script then saves the combined histograms to a PDF file named `"U235his.pdf"`.
---
### Summary of Functionality
- **Outlier Removal**: The script first removes specified outliers from the original U-235 activity dataset.
- **Statistical Calculations**: Calculates the mean and MFV for both datasets (with and without outliers).
- **Histogram Plotting**: Creates density histograms for each dataset and overlays vertical lines marking the mean and MFV values.
- **Saving Output**: Exports the combined histogram plot to a PDF file titled `"U235his.pdf"` for easy viewing and sharing.
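For readers unfamiliar with the MFV, the sketch below shows one common formulation of Steiner's iterative procedure, in which points far from the current estimate receive small weights. This is an assumption for illustration: the function name `mfv`, the starting values, and the stopping rule are choices made here, and the script's own implementation may differ. The input values are made up to show the effect of a single outlier.

```r
# Sketch of Steiner's MFV iteration (one common formulation; an assumption here)
mfv <- function(x, tol = 1e-8, max_iter = 1000) {
  if (diff(range(x)) == 0) return(x[1])       # all values equal: nothing to iterate
  m <- mean(x)                                # starting location
  eps2 <- (sqrt(3) / 2 * diff(range(x)))^2    # starting squared scale parameter
  for (i in seq_len(max_iter)) {
    d2 <- (x - m)^2
    w <- eps2 / (eps2 + d2)                   # weights shrink for distant points
    m_new <- sum(w * x) / sum(w)              # weighted location update
    eps2_new <- 3 * sum(d2 / (eps2 + d2)^2) / sum(1 / (eps2 + d2)^2)
    converged <- abs(m_new - m) < tol && abs(eps2_new - eps2) < tol
    m <- m_new
    eps2 <- eps2_new
    if (converged) break
  }
  m
}

x <- c(10, 11, 12, 13, 14, 100)  # illustrative values with one obvious outlier
cat("Mean:", mean(x), "\n")
cat("MFV: ", mfv(x), "\n")
```

In this example the mean is pulled strongly toward the outlier, while the MFV stays close to the bulk of the values, which is why comparing the two with and without outliers is informative.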
---
### Requirements
The script requires the following R packages for plotting and arranging the output:
- **ggplot2**: For creating histograms and annotations.
- **gridExtra**: For combining and arranging multiple plots into a single output.
To install these packages, you can run the following commands in R:
```r
install.packages("ggplot2")
install.packages("gridExtra")
```
---
### How to Run the Script
1. **Setup**: Open R or RStudio and set the working directory to the folder containing this script.
```r
setwd("path/to/your_directory")
```
2. **Install Required Packages**: Ensure that `ggplot2` and `gridExtra` are installed (as described in the Requirements section).
3. **Run the Script**: Execute the script in your R environment (R, RStudio, etc.). The script will generate histograms with annotated vertical lines for both datasets.
4. **Output File**: Check your working directory for a file named `"U235his.pdf"`, which contains the combined histograms.
---
### Example Output
The generated `"U235his.pdf"` file contains two histograms arranged in a grid:
- **Top Plot**: U-235 specific activity measurements with outliers removed.
- **Bottom Plot**: U-235 specific activity measurements from the original dataset.
Each plot includes vertical lines showing the calculated **Mean** (dashed line) and **Most Frequent Value (MFV)** (solid line) for easy comparison.
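A layout like the one described can be sketched with base R graphics, used here as a stand-in for `ggplot2` and `gridExtra` so the example runs without extra packages. Everything specific is an assumption: the data are simulated, the outlier rule is hypothetical, the median is only a placeholder for the MFV line, and the output goes to a temporary file rather than `U235his.pdf`.

```r
set.seed(1)  # seed for the simulated stand-in data only
with_outliers <- c(rnorm(26, mean = 50, sd = 5), 120, 150)  # simulated Bq/kg values
no_outliers <- with_outliers[with_outliers < 100]            # hypothetical outlier rule

# Draw one density histogram with two annotated vertical lines
plot_panel <- function(x, title) {
  hist(x, freq = FALSE, main = title, xlab = "U-235 specific activity (Bq/kg)")
  abline(v = mean(x), lty = "dashed")    # mean: dashed line
  abline(v = median(x), lty = "solid")   # placeholder for the MFV: solid line
}

out_file <- file.path(tempdir(), "U235his_sketch.pdf")  # hypothetical output name
pdf(out_file, width = 6, height = 8)
par(mfrow = c(2, 1))                     # two panels stacked vertically
plot_panel(no_outliers, "Outliers removed")
plot_panel(with_outliers, "Original dataset")
invisible(dev.off())

cat("Saved:", out_file, "\n")
```

Setting `freq = FALSE` puts density rather than counts on the vertical axis, matching the density histograms mentioned in the summary.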
---