
Main wrapper function, together with its core utilities, for running the HISEA mixed-stock analysis framework. It supports simulation, analysis, and bootstrap estimation of stock composition from mixture samples.

Supported operation modes:

  • SIMULATION: Simulate mixtures from known proportions and evaluate the performance of the classifier and the estimators.

  • ANALYSIS: Apply a trained classifier to real mixture data to estimate stock proportions.

  • BOOTSTRAP: Resample the real mixture to evaluate the variability of the estimates.
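The SIMULATION and BOOTSTRAP modes can be illustrated with a minimal base R sketch. It is not the package's implementation: the `baseline` list of per-stock matrices, the normal test data, and the multinomial draw of per-stock counts are all assumptions made for illustration.

```r
set.seed(42)

actual <- c(0.2, 0.3, 0.5)  # true stock proportions (must sum to 1)
Nmix   <- 200               # size of the simulated mixture
nv     <- 5                 # number of variables

# Hypothetical baseline: one matrix per stock, rows are individuals
baseline <- lapply(seq_along(actual), function(s) {
  matrix(rnorm(300 * nv, mean = s), ncol = nv)
})

# SIMULATION: draw how many mixture individuals come from each stock,
# then sample that many rows (with replacement) from each baseline
n_per_stock <- as.vector(rmultinom(1, size = Nmix, prob = actual))
mixture <- do.call(rbind, lapply(seq_along(baseline), function(s) {
  rows <- sample(nrow(baseline[[s]]), n_per_stock[s], replace = TRUE)
  baseline[[s]][rows, , drop = FALSE]
}))

# BOOTSTRAP: resample the rows of one mixture with replacement
boot_mixture <- mixture[sample(nrow(mixture), replace = TRUE), , drop = FALSE]
```

Repeating either step once per replicate and re-estimating the proportions each time gives the spread of the estimates.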

Supported classifiers: LDA, QDA, Random Forest, SVM, k-NN, ANN, XGBoost, Naive Bayes, Mclust, MLR.

Supported estimators: RAW, Cook, Constrained Cook, EM (Millar), Maximum Likelihood.
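To show how the RAW and Cook-type estimators relate, here is a minimal sketch under stated assumptions: `phi[i, j]` is taken to be the probability that an individual from stock j is assigned to stock i, and the constrained variant is approximated by a simple clip-and-renormalise step. The package's `compute_cook_estimators` may use a different orientation or constraint procedure.

```r
# Hypothetical 3-stock classification matrix:
# phi[i, j] = P(assigned to stock i | true stock j); columns sum to 1
phi <- matrix(c(0.8, 0.1, 0.1,
                0.1, 0.8, 0.1,
                0.1, 0.1, 0.8),
              nrow = 3, byrow = TRUE)

theta_true <- c(0.2, 0.3, 0.5)

# RAW estimate: expected share of the mixture assigned to each stock.
# It is biased toward equal proportions because of misclassification.
raw <- as.vector(phi %*% theta_true)

# Cook-type correction: solve phi %*% theta = raw for theta
cook <- solve(phi, raw)

# Simplified constraint (an assumption, not necessarily the package's rule):
# set negative proportions to zero and rescale to sum to 1
constrained <- pmax(cook, 0)
constrained <- constrained / sum(constrained)
```

With real data `raw` comes from the classified mixture and is noisy, so the unconstrained solution can fall outside [0, 1]; that is the motivation for a constrained variant.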

Includes integrated 10-fold cross-validation and evaluation of model quality (accuracy, kappa, F1, and other metrics).
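The cross-validation step can be sketched as follows. This is an illustration only: it uses `MASS::lda` on the built-in `iris` data as a stand-in for a baseline (with `Species` playing the role of stock), and the column-normalised matrix at the end is one plausible way to obtain a cross-validated phi matrix, not necessarily what `get_cv_metrics_and_phi` does.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(1)
dat <- iris    # stand-in baseline: Species plays the role of stock
k   <- 10

# Assign each row to one of k folds at random
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Collect out-of-fold predictions
pred <- factor(rep(NA, nrow(dat)), levels = levels(dat$Species))
for (f in seq_len(k)) {
  test_idx <- folds == f
  fit <- lda(Species ~ ., data = dat[!test_idx, ])
  pred[test_idx] <- predict(fit, dat[test_idx, ])$class
}

# Confusion matrix: rows are true stocks, columns are predicted stocks
conf_mat <- table(truth = dat$Species, predicted = pred)
n <- sum(conf_mat)

accuracy    <- sum(diag(conf_mat)) / n
expected    <- sum(rowSums(conf_mat) * colSums(conf_mat)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Cross-validated phi: P(predicted stock i | true stock j), columns sum to 1
phi_cv <- prop.table(t(conf_mat), margin = 2)
```

Because every prediction comes from a model that never saw that row, the resulting confusion matrix is less optimistic than one computed on the training data.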

Usage

.resample_baseline_data_helper(
  original_baseline_list,
  resampled_sizes,
  stock_names_for_error,
  nv_fallback
)

Arguments

type

Character. "SIMULATION", "ANALYSIS" or "BOOTSTRAP".

np

Integer. Number of populations (stocks).

nv

Integer. Number of variables.

seed_val

Integer. Random seed for reproducibility.

nsamps

Integer. Number of replicates.

Nmix

Integer. Sample size of the simulated mixture (for SIMULATION only).

actual

Numeric vector. True proportions used in simulation.

baseline_path

Character. File path to the baseline .std file.

mix_path

Character. File path to the mixture .mix file.

export_csv

Logical. Whether to export summary and confusion matrix to CSV.

output_dir

Character. Output directory.

verbose

Logical. Print progress messages.

method_class

Character. Classification method (e.g., "LDA", "RF", "SVM").

stocks_names

Character vector. Optional stock names.

resample_baseline

Logical. Resample the baseline for each replicate.

resampled_baseline_sizes

Integer vector. Sizes of resamples per stock.

phi_method

Character. How the confusion (phi) matrix is estimated: "standard" or "cv" (cross-validation-based).

mclust_model_names

Character vector. Models to test with Mclust.

mclust_perform_cv

Logical. Whether to cross-validate Mclust.
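The effect of `resample_baseline` and `resampled_baseline_sizes` can be pictured with a short sketch. The function `resample_baseline_sketch` below is hypothetical; the internals of `.resample_baseline_data_helper` are not shown on this page, and the assumption here is that each stock's baseline is a matrix resampled row-wise with replacement.

```r
# Hypothetical helper: resample each stock's baseline rows with replacement
# to the requested size (one size per stock)
resample_baseline_sketch <- function(baseline_list, sizes) {
  stopifnot(length(baseline_list) == length(sizes))
  Map(function(x, n) {
    x[sample(nrow(x), size = n, replace = TRUE), , drop = FALSE]
  }, baseline_list, sizes)
}

set.seed(123)

# Made-up baseline with unequal sample sizes per stock (5 variables each)
baseline_list <- list(
  matrix(rnorm(40 * 5), ncol = 5),
  matrix(rnorm(60 * 5), ncol = 5),
  matrix(rnorm(50 * 5), ncol = 5)
)

resampled <- resample_baseline_sketch(baseline_list, c(100, 100, 100))
```

Drawing a fresh baseline for each replicate lets the variability of the baseline itself flow into the spread of the estimated proportions.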

Value

A list with:

estimation_summary

Summary table with mean, SD, and RMSE of estimates.

classification_model

Final trained classifier object.

baseline_classification_quality

Accuracy, Kappa, and per-class metrics.

phi_matrix

Estimated confusion matrix used in corrections.

mixture_classification_details

List with predicted pseudo-classes and likelihoods.

A .rda file of results is also saved in output_dir.
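As an illustration of what a summary with mean, SD, and RMSE contains, the sketch below computes those statistics from a matrix of replicate estimates. The `estimates` matrix is simulated here purely for demonstration, and the column names of `summary_tab` are assumptions rather than the actual layout of `estimation_summary`. RMSE requires the true proportions, so it is only meaningful when `actual` is known, as in SIMULATION mode.

```r
set.seed(7)

actual <- c(0.2, 0.3, 0.5)  # true proportions
nsamps <- 100               # number of replicates

# Made-up replicate estimates: one row per replicate, one column per stock
counts    <- t(rmultinom(nsamps, size = 200, prob = actual))
estimates <- counts / 200

summary_tab <- data.frame(
  stock  = paste0("Stock", seq_along(actual)),
  actual = actual,
  mean   = colMeans(estimates),
  sd     = apply(estimates, 2, sd),
  # RMSE: root of the mean squared deviation from the true proportion
  rmse   = sqrt(colMeans(sweep(estimates, 2, actual)^2))
)
```

RMSE combines bias and variance, so an estimator with a small SD can still have a large RMSE if its mean is far from the true proportion.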

See also

compute_cook_estimators, estimate_millar, estimate_ml, get_cv_metrics_and_phi, train_model, predict_model, .resample_baseline_data_helper

Examples

if (FALSE) { # \dontrun{
run_hisea_all(
  type = "SIMULATION",
  np = 3, nv = 5,
  actual = c(0.2, 0.3, 0.5),
  Nmix = 200,
  baseline_path = "baseline.std",
  method_class = "RF",
  resample_baseline = TRUE,
  resampled_baseline_sizes = c(100, 100, 100),
  output_dir = "results"
)
} # }