Run HISEA Mixed Stock Analysis

Main wrapper function and supporting utilities for the HISEA mixed-stock analysis framework. It estimates stock composition from mixture samples by simulation, analysis, or bootstrap.

Supported operation modes:

SIMULATION: Simulate mixtures from known proportions and evaluate the performance of the classifier and the estimators.
ANALYSIS: Apply a trained classifier to real mixture data to estimate stock proportions.
BOOTSTRAP: Resample the real mixture to evaluate the variability of the estimates.
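
As a rough illustration of what SIMULATION and BOOTSTRAP involve, the sketch below builds a mixture from known proportions and then resamples it. It uses base R only. The baseline matrices, their dimensions, and the choice to sample rows with replacement are assumptions made for this example, not a description of the package internals.

```r
set.seed(123)

# Known stock proportions and mixture size (values mirror the Examples section).
actual <- c(0.2, 0.3, 0.5)
Nmix   <- 200

# Hypothetical baseline: one matrix per stock (rows = individuals, cols = variables).
baseline <- lapply(c(0, 2, 4), function(mu) matrix(rnorm(100 * 2, mean = mu), ncol = 2))

# SIMULATION: draw how many individuals come from each stock, then sample them.
counts  <- as.vector(rmultinom(1, size = Nmix, prob = actual))
mixture <- do.call(rbind, Map(function(x, n) {
  x[sample.int(nrow(x), n, replace = TRUE), , drop = FALSE]
}, baseline, counts))

# BOOTSTRAP: resample the rows of an existing mixture with replacement.
boot_mixture <- mixture[sample.int(nrow(mixture), replace = TRUE), , drop = FALSE]
```

In a full run, each simulated or bootstrapped mixture would be classified and passed to the estimators, and the results collected over `nsamps` replicates.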

Supported classifiers: LDA, QDA, Random Forest, SVM, k-NN, ANN, XGBoost, Naive Bayes, Mclust, MLR.

Supported estimators: RAW, Cook, Constrained Cook, EM (Millar), Maximum Likelihood.
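
To make the first three estimators concrete, here is a minimal base R sketch of the general idea: raw proportions come straight from the predicted labels, and a classification-corrected estimate is obtained by solving a linear system with the confusion matrix. The numbers are invented. The orientation of `phi` (rows = assigned stock, columns = true stock) and the truncate-and-renormalise step used for the constrained variant are assumptions for illustration, and the package may implement these estimators differently.

```r
# Hypothetical confusion matrix: phi[i, j] = P(assigned to stock i | truly stock j).
# Each column sums to 1.
phi <- matrix(c(0.8, 0.1, 0.1,
                0.1, 0.8, 0.1,
                0.1, 0.1, 0.8), nrow = 3, byrow = TRUE)

# Hypothetical predicted labels for a mixture of 100 individuals.
predicted <- factor(c(rep("A", 30), rep("B", 30), rep("C", 40)),
                    levels = c("A", "B", "C"))

# RAW: share of the mixture assigned to each stock.
raw <- as.numeric(table(predicted)) / length(predicted)

# Classification-corrected estimate: solve phi %*% theta = raw for theta.
cook <- solve(phi, raw)

# One simple constrained variant: clip negatives to zero and renormalise.
constrained <- pmax(cook, 0)
constrained <- constrained / sum(constrained)
```

The unconstrained solution can fall outside the range 0 to 1 when the classifier is weak or the mixture is small, which is the motivation for a constrained version.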

Includes integrated 10-fold cross-validation and model quality evaluation (accuracy, kappa, F1, and other per-class metrics).
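
The quality metrics named above can all be derived from a confusion matrix. The sketch below shows the standard formulas in base R on an invented 3-stock matrix (rows = predicted, columns = actual). It is an illustration of the metrics, not the package's own evaluation code.

```r
# Hypothetical confusion matrix pooled over cross-validation folds.
stocks <- c("A", "B", "C")
conf <- matrix(c(45,  3,  2,
                  4, 40,  6,
                  1,  7, 42), nrow = 3, byrow = TRUE,
               dimnames = list(predicted = stocks, actual = stocks))

n <- sum(conf)

# Overall accuracy: share of individuals on the diagonal.
accuracy <- sum(diag(conf)) / n

# Cohen's kappa: accuracy adjusted for agreement expected by chance.
expected    <- sum(rowSums(conf) * colSums(conf)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Per-class precision, recall, and F1.
precision <- diag(conf) / rowSums(conf)
recall    <- diag(conf) / colSums(conf)
f1        <- 2 * precision * recall / (precision + recall)
```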

Usage

.resample_baseline_data_helper(
  original_baseline_list,
  resampled_sizes,
  stock_names_for_error,
  nv_fallback
)

Arguments

type: Character. "SIMULATION", "ANALYSIS" or "BOOTSTRAP".
np: Integer. Number of populations (stocks).
nv: Integer. Number of variables.
seed_val: Integer. Random seed for reproducibility.
nsamps: Integer. Number of replicates.
Nmix: Integer. Sample size of the simulated mixture (for SIMULATION only).
actual: Numeric vector. True proportions used in simulation.
baseline_path: Character. File path to the baseline .std file.
mix_path: Character. File path to the mixture .mix file.
export_csv: Logical. Whether to export summary and confusion matrix to CSV.
output_dir: Character. Output directory.
verbose: Logical. Print progress messages.
method_class: Character. Classification method (e.g., "LDA", "RF", "SVM").
stocks_names: Character vector. Optional vector of stock names.
resample_baseline: Logical. Resample the baseline for each replicate.
resampled_baseline_sizes: Integer vector. Sizes of resamples per stock.
phi_method: Character. "standard" or "cv" (cross-validation-based confusion matrix).
mclust_model_names: Character vector. Models to test with Mclust.
mclust_perform_cv: Logical. Whether to cross-validate Mclust.
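
The resample_baseline and resampled_baseline_sizes arguments describe drawing a fresh baseline for each replicate. A minimal sketch of that idea follows. The function name `resample_baseline_sketch`, the list-of-matrices layout, and sampling with replacement are assumptions made for this example and are not taken from the package source.

```r
set.seed(1)

# Hypothetical baseline: a named list with one matrix per stock
# (rows = individuals, columns = variables).
baseline <- list(
  A = matrix(rnorm(50 * 2), ncol = 2),
  B = matrix(rnorm(40 * 2, mean = 1), ncol = 2)
)

# Draw sizes[k] rows, with replacement, from the k-th stock.
resample_baseline_sketch <- function(baseline_list, sizes) {
  stopifnot(length(baseline_list) == length(sizes))
  Map(function(x, n) {
    idx <- sample.int(nrow(x), size = n, replace = TRUE)
    x[idx, , drop = FALSE]
  }, baseline_list, sizes)
}

resampled <- resample_baseline_sketch(baseline, sizes = c(100, 100))
```

Resampling the baseline in every replicate lets the variability of the estimates reflect uncertainty in the baseline as well as in the mixture.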

Value

A list with:

estimation_summary: Summary table with mean, SD, and RMSE of estimates.
classification_model: Final trained classifier object.
baseline_classification_quality: Accuracy, Kappa, and per-class metrics.
phi_matrix: Estimated confusion matrix used in corrections.
mixture_classification_details: List with predicted pseudo-classes and likelihoods.

A .rda file of results is also saved in output_dir.
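
For orientation, the mean, SD, and RMSE reported in estimation_summary can be computed from per-replicate estimates as sketched below. The replicate values and the column names of `summary_tab` are invented for this example, so the actual table layout may differ.

```r
# True proportions (as in the Examples section) and hypothetical estimates:
# rows = replicates, columns = stocks.
actual <- c(0.2, 0.3, 0.5)
est <- rbind(c(0.18, 0.32, 0.50),
             c(0.22, 0.28, 0.50),
             c(0.20, 0.30, 0.50),
             c(0.24, 0.30, 0.46))

# Per-stock errors: subtract the true proportion from every replicate.
err <- sweep(est, 2, actual)

summary_tab <- data.frame(
  stock  = c("A", "B", "C"),
  actual = actual,
  mean   = colMeans(est),
  sd     = apply(est, 2, sd),
  rmse   = sqrt(colMeans(err^2))
)
```

RMSE combines bias and variance, so it is only meaningful when the true proportions are known, as in SIMULATION mode.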

Examples

if (FALSE) { # \dontrun{
run_hisea_all(type="SIMULATION",
             np=3, nv=5,
             actual=c(0.2,0.3,0.5),
             Nmix=200,
             baseline_path="baseline.std",
             method_class="RF",
             resample_baseline=TRUE,
             resampled_baseline_sizes=c(100,100,100),
             output_dir="results")
} # }
