This is the main wrapper function and core set of utilities for running the HISEA mixed-stock analysis framework, allowing simulation, analysis, or bootstrap estimation of stock composition from mixture samples.
Supported operation modes:

- **SIMULATION**: Simulate mixtures from known proportions and evaluate the performance of the classifier and the estimators.
- **ANALYSIS**: Apply the trained classifier to real mixture data to estimate stock proportions.
- **BOOTSTRAP**: Resample the real mixture to evaluate the variability of the estimates.
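The three modes differ mainly in where the mixture comes from. The sketch below shows one plausible way to build a SIMULATION mixture (multinomial stock counts drawn from `actual`, then rows sampled from each stock's baseline) and a BOOTSTRAP mixture (observed mixture rows resampled with replacement). The toy data frames `baseline_df` and `mixture_df` and this sampling scheme are assumptions for illustration, not the package's internals.

```r
set.seed(123456)

# Hypothetical toy data standing in for the baseline and mixture inputs
baseline_df <- data.frame(
  population = rep(c("Stock_1", "Stock_2"), each = 50),
  d13c = rnorm(100),
  d18o = rnorm(100)
)
mixture_df <- data.frame(d13c_ukn = rnorm(80), d18o_ukn = rnorm(80))

# SIMULATION (assumed scheme): stock counts ~ Multinomial(Nmix, actual),
# then draw that many baseline rows (with replacement) from each stock
actual <- c(Stock_1 = 0.3, Stock_2 = 0.7)
Nmix <- 100
counts <- setNames(as.vector(rmultinom(1, size = Nmix, prob = actual)), names(actual))
sim_mix <- do.call(rbind, lapply(names(counts), function(s) {
  rows <- which(baseline_df$population == s)
  baseline_df[rows[sample.int(length(rows), counts[[s]], replace = TRUE)], ]
}))

# BOOTSTRAP: resample the observed mixture rows with replacement
boot_mix <- mixture_df[sample.int(nrow(mixture_df), replace = TRUE), ]
```

Indexing `rows` through `sample.int()` avoids the `sample()` pitfall where a length-one vector is treated as `1:n`.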
Supported classifiers: LDA, QDA, Random Forest, SVM, k-NN, ANN, XGBoost, Naive Bayes, Mclust, MLR.
Supported estimators: RAW, Cook, Constrained Cook, EM (Millar), Maximum Likelihood (reported as the columns RAW, COOK, COOKC, EM and ML).
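The RAW, Cook, constrained Cook and EM estimators can be computed from a classifier's output and a confusion (phi) matrix alone. This is a minimal sketch, not the package's code. It assumes that `phi[i, j]` is the probability that an individual from stock `j` is classified as stock `i`, that the constrained Cook estimate clips negative proportions to zero and renormalises, and that the EM estimator maximises the likelihood of the classification counts (our reading of Millar's approach). The helpers `raw_estimate()`, `cook_estimate()`, `cook_constrained()` and `em_estimate()` are hypothetical. ML is omitted because it needs the classifier's densities, not just its labels.

```r
# phi[i, j] = P(classified as stock i | truly from stock j); columns sum to 1 (assumed)

raw_estimate <- function(pred, stocks) {
  # Proportion of mixture individuals assigned to each stock
  as.vector(table(factor(pred, levels = stocks))) / length(pred)
}

cook_estimate <- function(phi, raw) {
  # Solve phi %*% theta = raw (unconstrained; may fall outside [0, 1])
  as.vector(solve(phi, raw))
}

cook_constrained <- function(phi, raw) {
  # Assumed constraint rule: clip negatives to zero, then renormalise
  theta <- pmax(cook_estimate(phi, raw), 0)
  theta / sum(theta)
}

em_estimate <- function(phi, raw, tol = 1e-12, max_iter = 10000) {
  # EM for the misclassification model raw ~ phi %*% theta, theta on the simplex
  theta <- rep(1 / ncol(phi), ncol(phi))
  for (iter in seq_len(max_iter)) {
    fitted <- as.vector(phi %*% theta)  # expected classification proportions
    new_theta <- theta * as.vector(crossprod(phi, raw / fitted))
    if (max(abs(new_theta - theta)) < tol) return(new_theta)
    theta <- new_theta
  }
  theta
}

stocks <- c("S1", "S2")
phi <- matrix(c(0.90, 0.10,   # true S1: 90% called S1, 10% called S2
                0.20, 0.80),  # true S2: 20% called S1, 80% called S2
              nrow = 2, dimnames = list(classified = stocks, true = stocks))
pred <- rep(stocks, times = c(40, 60))  # hypothetical classified mixture

raw <- raw_estimate(pred, stocks)   # 0.40 0.60
cook <- cook_estimate(phi, raw)     # 0.2857143 0.7142857
cookc <- cook_constrained(phi, raw)
em <- em_estimate(phi, raw)
round(rbind(RAW = raw, COOK = cook, COOKC = cookc, EM = em), 6)
```

When the unconstrained Cook solution already lies inside [0, 1], it is a fixed point of the EM update (`phi %*% theta` reproduces `raw` exactly), which is consistent with the nearly identical COOK, COOKC and EM columns in the example output below.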
Includes integrated 10-fold cross-validation and model quality evaluation (accuracy, kappa, F1, etc.).
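A minimal sketch of stratified 10-fold cross-validation that yields a cross-validated confusion matrix, using `MASS::lda()` (MASS ships with standard R installations) as a stand-in for `method_class = "LDA"`. The toy data, the stratified fold assignment and the column normalisation into a phi matrix are assumptions; the package's own CV routine may differ.

```r
library(MASS)  # provides lda()

set.seed(123456)
n_per <- 60
baseline_df <- data.frame(
  population = factor(rep(c("Stock_1", "Stock_2"), each = n_per)),
  d13c = c(rnorm(n_per, mean = 0), rnorm(n_per, mean = 1.5)),
  d18o = c(rnorm(n_per, mean = 0), rnorm(n_per, mean = 1.0))
)

# Stratified fold labels: each stock is spread evenly over the k folds
k <- 10
folds <- integer(nrow(baseline_df))
for (s in levels(baseline_df$population)) {
  idx <- which(baseline_df$population == s)
  folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

# Out-of-fold predictions: each row is classified by a model that never saw it
pred <- factor(rep(NA_character_, nrow(baseline_df)),
               levels = levels(baseline_df$population))
for (f in seq_len(k)) {
  held_out <- folds == f
  fit <- lda(population ~ d13c + d18o, data = baseline_df[!held_out, ])
  pred[held_out] <- predict(fit, newdata = baseline_df[held_out, ])$class
}

# Confusion matrix (rows = predicted, columns = true) and column-normalised phi
cm_cv <- table(predicted = pred, true = baseline_df$population)
phi_cv <- prop.table(cm_cv, margin = 2)  # phi[i, j] = P(predicted i | true j)
accuracy_cv <- sum(diag(cm_cv)) / sum(cm_cv)

# Resubstitution accuracy for comparison (usually optimistic)
fit_all <- lda(population ~ d13c + d18o, data = baseline_df)
accuracy_resub <- mean(predict(fit_all, newdata = baseline_df)$class == baseline_df$population)
```

Stratifying the folds keeps every stock represented in each training set, so `lda()` always sees all groups.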
Usage
```r
run_hisea_all(
  type = "ANALYSIS",
  np,
  nv,
  seed_val = 123456,
  var_cols_std = NULL,
  var_cols_mix = NULL,
  stock_col = NULL,
  nsamps = 1000,
  Nmix = 100,
  actual = NULL,
  baseline_input = NULL,
  mix_input = NULL,
  export_csv = FALSE,
  output_dir = ".",
  verbose = FALSE,
  method_class = "LDA",
  stocks_names = NULL,
  resample_baseline = FALSE,
  resampled_baseline_sizes = NULL,
  phi_method = c("standard", "cv"),
  mclust_model_names = NULL,
  mclust_perform_cv = TRUE,
  ...
)
```
Arguments
- type
Character. "SIMULATION", "ANALYSIS" or "BOOTSTRAP".
- np
Integer. Number of populations (stocks).
- nv
Integer. Number of variables.
- seed_val
Integer. Random seed for reproducibility.
- var_cols_std
Character vector of column names for baseline variables.
- var_cols_mix
Character vector of column names for mixture variables.
- stock_col
Character name of stock column in baseline data.
- nsamps
Integer. Number of replicates (simulated mixtures in SIMULATION mode, resamples in BOOTSTRAP mode).
- Nmix
Integer. Sample size of the simulated mixture (for SIMULATION only).
- actual
Numeric vector of length `np`. True stock proportions used in SIMULATION mode; should sum to 1.
- baseline_input
Data frame or file path for baseline data.
- mix_input
Data frame or file path for mixture data.
- export_csv
Logical. Whether to export summary and confusion matrix to CSV.
- output_dir
Character. Output directory.
- verbose
Logical. Print progress messages.
- method_class
Character. Classification method, e.g. "LDA", "RF" or "SVM".
- stocks_names
Character vector. Optional stock names.
- resample_baseline
Logical. Whether to resample the baseline for each replicate.
- resampled_baseline_sizes
Integer vector. Resample size for each stock; used when `resample_baseline = TRUE`.
- phi_method
Character. Method used to build the phi (confusion) matrix for bias correction: "standard" or "cv" (cross-validation-based).
- mclust_model_names
Character vector. Mclust model names (covariance parameterizations) to evaluate.
- mclust_perform_cv
Logical. Whether to cross-validate Mclust.
- ...
Additional arguments passed to the underlying classification models (e.g., ntree for Random Forest, cost for SVM).
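A sketch of the per-stock baseline resampling that `resample_baseline = TRUE` together with `resampled_baseline_sizes` might perform. Sampling with replacement, the toy data, and the helper `resample_baseline_by_stock()` are assumptions for illustration, not the package's implementation.

```r
set.seed(123456)

# Hypothetical baseline with unequal stock sizes
baseline_df <- data.frame(
  population = rep(c("Stock_1", "Stock_2"), times = c(40, 70)),
  d13c = rnorm(110),
  d18o = rnorm(110)
)
resampled_baseline_sizes <- c(Stock_1 = 30, Stock_2 = 30)

# Draw sizes[[s]] rows (with replacement) from each stock's baseline rows
resample_baseline_by_stock <- function(df, stock_col, sizes) {
  parts <- lapply(names(sizes), function(s) {
    rows <- which(df[[stock_col]] == s)
    df[rows[sample.int(length(rows), sizes[[s]], replace = TRUE)], , drop = FALSE]
  })
  do.call(rbind, parts)
}

base_rep <- resample_baseline_by_stock(baseline_df, "population", resampled_baseline_sizes)
table(base_rep$population)
```

With equal sizes, as here, each replicate's training set would be balanced across stocks regardless of the original baseline sizes.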
Value
A list of length 8 containing the statistical summary of the estimation (the same as the `estimation_summary` element in the saved file):
- mean_estimates
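Matrix [np x 5] of estimated proportions averaged over the `nsamps` replicates, with one row per stock and one column per estimator (RAW, COOK, COOKC, EM, ML).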
- sd_estimates
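Standard deviations of the estimates across replicates.
- mse_estimates
Mean squared error of the estimates relative to `actual` (only meaningful when the true proportions are known, as in SIMULATION mode).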
- var_emp
Empirical variance of the estimates.
- covar_ml
Maximum Likelihood covariance matrix.
- cor_ml
Correlation matrix corresponding to `covar_ml`.
- covar_inv_ml
Inverse of `covar_ml`.
- det_covar_ml
Determinant of `covar_ml`, used to check for singularity (a value near zero indicates a singular or near-singular matrix).
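The summary statistics above can be reproduced from a matrix of replicate estimates. This is a minimal sketch under stated assumptions: the replicate estimates are hypothetical multinomial proportions, and the empirical covariance `S` stands in for `covar_ml`, whose exact computation is not described here.

```r
set.seed(123456)

# Hypothetical replicate estimates: nsamps rows x np stocks, each row sums to 1
nsamps <- 200
actual <- c(Stock_1 = 0.3, Stock_2 = 0.5, Stock_3 = 0.2)
est <- t(replicate(nsamps, as.vector(rmultinom(1, size = 100, prob = actual)) / 100))
colnames(est) <- names(actual)

mean_est <- colMeans(est)
sd_est <- apply(est, 2, sd)
var_emp <- apply(est, 2, var)
mse_est <- colMeans(sweep(est, 2, actual)^2)  # needs known `actual` (SIMULATION)

S <- cov(est)            # empirical stand-in for covar_ml
cor_mat <- cov2cor(S)
det_S <- det(S)

# Rows of S sum to ~0 because the proportions sum to 1, so S is singular:
# invert only if the matrix is numerically well conditioned.
S_inv <- if (rcond(S) > 1e-10) solve(S) else NULL
```

This also shows why a determinant check is useful: a full covariance matrix of proportions that sum to one is singular, so its inverse cannot be trusted. The MSE splits into variance plus squared bias, `mse_est = var_emp * (nsamps - 1) / nsamps + (mean_est - actual)^2`.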
Saved Results Structure
The function automatically saves a `.rda` file in `output_dir` containing a master list named `out`. This list includes:
- estimation_summary
The list of 8 statistical metrics described above.
- classification_model
The trained classifier object (e.g., LDA, RF).
- baseline_classification_quality
Classification quality on the baseline: accuracy, kappa, and F1 scores.
- phi_matrix
The confusion matrix used for bias correction.
- mixture_classification_details
Predicted classes and posterior probabilities.
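Metrics of the kind stored in `baseline_classification_quality` can be computed from any confusion matrix. A minimal sketch with a hypothetical helper `classification_quality()`, assuming rows are predicted classes and columns are true stocks; it reports Cohen's kappa and per-stock and macro-averaged F1, which may not match the package's exact definitions.

```r
classification_quality <- function(cm) {
  # cm: square confusion matrix, rows = predicted, columns = true (assumed)
  cm <- as.matrix(cm)
  n <- sum(cm)
  p_obs <- sum(diag(cm)) / n                       # accuracy
  p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
  kappa <- (p_obs - p_exp) / (1 - p_exp)
  precision <- diag(cm) / rowSums(cm)
  recall <- diag(cm) / colSums(cm)
  f1 <- 2 * precision * recall / (precision + recall)
  list(accuracy = p_obs, kappa = kappa, f1 = f1, macro_f1 = mean(f1))
}

# Hypothetical confusion matrix for two stocks (50 true individuals each)
cm <- matrix(c(45, 5,    # true S1: 45 correct, 5 called S2
               10, 40),  # true S2: 10 called S1, 40 correct
             nrow = 2,
             dimnames = list(predicted = c("S1", "S2"), true = c("S1", "S2")))
q <- classification_quality(cm)
```

Here accuracy is 0.85, chance agreement is 0.5, so kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7; the per-stock F1 scores are 90/105 and 80/95.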
Examples
```r
data(baseline)
data(mixture)
res <- run_hisea_all(
  baseline_input = baseline,
  mix_input = mixture,
  stock_col = "population",
  var_cols_std = c("d13c", "d18o"),
  var_cols_mix = c("d13c_ukn", "d18o_ukn"),
  output_dir = tempdir(),
  np = 2, nv = 2, nsamps = 5, Nmix = 50, method_class = "LDA"
)
print(res$mean_estimates)
#>               RAW      COOK     COOKC        EM        ML
#> Stock_1 0.3771111 0.3376263 0.3376263 0.3376264 0.3216781
#> Stock_2 0.6228889 0.6623737 0.6623737 0.6623736 0.6783219
```
