Main wrapper function and core utilities for running the HISEA mixed-stock analysis framework. It supports simulation, analysis, and bootstrap estimation of stock composition from mixture samples.
Supported operation modes:
SIMULATION: Simulate mixtures from known proportions and evaluate the performance of the classifier and the estimators.
ANALYSIS: Apply a trained classifier to real mixture data to estimate stock proportions.
BOOTSTRAP: Resample the real mixture to evaluate the variability of the estimates.
Supported classifiers: LDA, QDA, Random Forest, SVM, k-NN, ANN, XGBoost, Naive Bayes, Mclust, MLR. Supported estimators: RAW, Cook, Constrained Cook, EM (Millar), Maximum Likelihood.
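The RAW and Cook estimators can be illustrated with a small base-R sketch. It assumes the textbook classification-correction form, in which phi[i, j] is the probability that a fish from stock j is classified as stock i, so that the expected raw proportions equal phi %*% theta. The matrix, the counts, and the helper name constrain are made up for illustration, and the truncate-and-renormalise step is only one simple way to constrain the result; compute_cook_estimators may differ in detail.

```r
# Hypothetical confusion matrix: column j holds P(classified as i | true stock j).
# matrix() fills column by column, so each group of three values is one true stock.
phi <- matrix(c(0.90, 0.05, 0.05,
                0.10, 0.80, 0.10,
                0.05, 0.15, 0.80),
              nrow = 3,
              dimnames = list(classified = c("A", "B", "C"),
                              true = c("A", "B", "C")))

# Hypothetical classified counts in a mixture of 200 fish.
counts <- c(A = 52, B = 61, C = 87)

# RAW estimator: proportion of the mixture assigned to each stock.
raw <- counts / sum(counts)

# Cook-style correction: solve phi %*% theta = raw for theta.
cook <- solve(phi, raw)

# Simple constrained variant (assumption): truncate negatives, then renormalise.
constrain <- function(theta) {
  theta <- pmax(theta, 0)
  theta / sum(theta)
}
cook_constrained <- constrain(cook)

round(rbind(raw = raw, cook = cook, constrained = cook_constrained), 3)
```

Because every column of phi sums to one, the corrected proportions still sum to one; the constraint only matters when the unconstrained solution contains negative entries.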
Includes integrated 10-fold cross-validation and model quality evaluation (accuracy, kappa, F1, etc.).
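As a rough sketch of how a cross-validated confusion matrix and quality metrics can be obtained, the following uses MASS::lda on the built-in iris data as a stand-in for a baseline. The fold assignment (simple random, not stratified), the orientation of phi (columns are true classes), and all object names are assumptions for illustration; get_cv_metrics_and_phi may be implemented differently.

```r
library(MASS)  # for lda(); iris stands in for a baseline with 3 "stocks"

set.seed(1)
k <- 10
n <- nrow(iris)

# Randomly assign each row to one of k folds of near-equal size.
folds <- sample(rep(seq_len(k), length.out = n))

# Collect out-of-fold predictions.
pred <- character(n)
for (f in seq_len(k)) {
  held_out <- folds == f
  fit <- lda(Species ~ ., data = iris[!held_out, ])
  pred[held_out] <- as.character(predict(fit, iris[held_out, ])$class)
}
pred <- factor(pred, levels = levels(iris$Species))

# Confusion counts: rows are predicted classes, columns are true classes.
conf <- table(predicted = pred, true = iris$Species)

# phi[i, j] = estimated P(predicted i | true j); each column sums to 1.
phi <- sweep(conf, 2, colSums(conf), "/")

# Overall accuracy and Cohen's kappa from the confusion counts.
accuracy <- sum(diag(conf)) / sum(conf)
expected <- sum(rowSums(conf) * colSums(conf)) / sum(conf)^2
kappa <- (accuracy - expected) / (1 - expected)
```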
Usage
.resample_baseline_data_helper(
original_baseline_list,
resampled_sizes,
stock_names_for_error,
nv_fallback
)
Arguments
- type
Character. "SIMULATION", "ANALYSIS" or "BOOTSTRAP".
- np
Integer. Number of populations (stocks).
- nv
Integer. Number of variables.
- seed_val
Integer. Random seed for reproducibility.
- nsamps
Integer. Number of replicates.
- Nmix
Integer. Sample size of the simulated mixture (for SIMULATION only).
- actual
Numeric vector. True proportions used in simulation.
- baseline_path
Character. File path to the baseline .std file.
- mix_path
Character. File path to the mixture .mix file.
- export_csv
Logical. Whether to export summary and confusion matrix to CSV.
- output_dir
Character. Output directory.
- verbose
Logical. Print progress messages.
- method_class
Character. Classification method (e.g., "LDA", "RF", "SVM", etc.).
- stocks_names
Character vector. Optional vector of stock names.
- resample_baseline
Logical. Resample the baseline for each replicate.
- resampled_baseline_sizes
Integer vector. Sizes of resamples per stock.
- phi_method
Character. "standard" or "cv" (cross-validation-based confusion matrix).
- mclust_model_names
Character vector. Models to test with Mclust.
- mclust_perform_cv
Logical. Whether to cross-validate Mclust.
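When resample_baseline is TRUE, each replicate works from a freshly drawn baseline of the sizes given in resampled_baseline_sizes. A minimal sketch of that idea follows, assuming the baseline is held as a list with one matrix per stock and that rows are drawn with replacement. The function name resample_baseline_sketch and the toy data are hypothetical and are not taken from .resample_baseline_data_helper.

```r
# Hypothetical helper: bootstrap each stock's baseline to a requested size.
resample_baseline_sketch <- function(baseline_list, sizes) {
  stopifnot(length(baseline_list) == length(sizes))
  Map(function(x, n) {
    # Draw n row indices with replacement and keep the matrix shape.
    idx <- sample.int(nrow(x), size = n, replace = TRUE)
    x[idx, , drop = FALSE]
  }, baseline_list, sizes)
}

set.seed(42)
# Toy baseline: 3 stocks, 5 variables, unequal sample sizes.
baseline <- list(
  stockA = matrix(rnorm(40 * 5), ncol = 5),
  stockB = matrix(rnorm(25 * 5, mean = 1), ncol = 5),
  stockC = matrix(rnorm(60 * 5, mean = 2), ncol = 5)
)

resampled <- resample_baseline_sketch(baseline, c(100, 100, 100))
sapply(resampled, nrow)
```

Sampling with replacement lets the requested size exceed the original sample size of a stock, as it does for all three toy stocks here.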
Value
A list with:
- estimation_summary
Summary table with mean, SD, and RMSE of estimates.
- classification_model
Final trained classifier object.
- baseline_classification_quality
Accuracy, Kappa, and per-class metrics.
- phi_matrix
Estimated confusion matrix used in corrections.
- mixture_classification_details
List with predicted pseudo-classes and likelihoods.
A .rda file of results is also saved in output_dir.
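For SIMULATION runs, the mean, SD, and RMSE reported in estimation_summary can be understood through a small sketch. It assumes RMSE is measured against the actual proportions across replicates. The replicate estimates here are stand-ins generated from multinomial draws, and the column names are illustrative rather than the package's.

```r
set.seed(1)
actual <- c(0.2, 0.3, 0.5)  # true proportions
nsamps <- 500               # number of replicates
Nmix   <- 200               # mixture sample size

# Stand-in for replicate estimates: observed proportions in multinomial draws.
# rmultinom() returns one column per replicate, so transpose to replicates x stocks.
est <- t(rmultinom(nsamps, size = Nmix, prob = actual)) / Nmix

summary_tab <- data.frame(
  stock  = paste0("stock", seq_along(actual)),
  actual = actual,
  mean   = colMeans(est),
  sd     = apply(est, 2, sd),
  # Root mean squared deviation of each estimate from the true proportion.
  rmse   = sqrt(colMeans(sweep(est, 2, actual)^2))
)
summary_tab
```

RMSE combines spread and bias: its square equals the (population) variance of the estimates plus the squared difference between their mean and the true value.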
See also
compute_cook_estimators, estimate_millar, estimate_ml, get_cv_metrics_and_phi, train_model, predict_model, .resample_baseline_data_helper
Examples
if (FALSE) { # \dontrun{
run_hisea_all(type = "SIMULATION",
              np = 3, nv = 5,
              actual = c(0.2, 0.3, 0.5),
              Nmix = 200,
              baseline_path = "baseline.std",
              method_class = "RF",
              resample_baseline = TRUE,
              resampled_baseline_sizes = c(100, 100, 100),
              output_dir = "results")
} # }