Performs balanced bootstrap (or bootknife) resampling of clustered data and calculates bootstrap bias, standard errors and confidence intervals. -- Function File: bootclust (DATA) -- Function File: bootclust (DATA, NBOOT) -- Function File: bootclust (DATA, NBOOT, BOOTFUN) -- Function File: bootclust ({D1, D2, ...}, NBOOT, BOOTFUN) -- Function File: bootclust (DATA, NBOOT, {BOOTFUN, ...}) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTSZ) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO) -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO, SEED) -- Function File: STATS = bootclust (...) -- Function File: [STATS, BOOTSTAT] = bootclust (...) 'bootclust (DATA)' uses nonparametric balanced bootstrap resampling to generate 1999 resamples from clusters of rows of the DATA (column vector or matrix). By default, each rows is it's own cluster (i.e. no clustering). The means of the resamples are then computed and the following statistics are displayed: - original: the original estimate(s) calculated by BOOTFUN and the DATA - bias: bootstrap bias of the estimate(s) - std_error: bootstrap standard error of the estimate(s) - CI_lower: lower bound(s) of the 95% bootstrap confidence interval - CI_upper: upper bound(s) of the 95% bootstrap confidence interval 'bootclust (DATA, NBOOT)' specifies the number of bootstrap resamples, where NBOOT is a scalar, positive integer corresponding to the number of bootstrap resamples. THe default value of NBOOT is the scalar: 1999. 'bootclust (DATA, NBOOT, BOOTFUN)' also specifies BOOTFUN: the function calculated on the original sample and the bootstrap resamples. BOOTFUN must be either a: <> function handle or anonymous function, <> string of function name, or <> a cell array where the first cell is one of the above function definitions and the remaining cells are (additional) input arguments to that function (other than the data arguments). In all cases BOOTFUN must take DATA for the initial input argument(s). BOOTFUN can return a scalar or any multidimensional numeric variable, but the output will be reshaped as a column vector. BOOTFUN must calculate a statistic representative of the finite data sample; it should NOT be an estimate of a population parameter (unless they are one of the same). If BOOTFUN is @mean or 'mean', narrowness bias of the confidence intervals for single bootstrap are reduced by expanding the probabilities of the percentiles using Student's t-distribution [1]. By default, BOOTFUN is @mean. 'bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)' resamples from the clusters of rows of the data vectors D1, D2 etc and the resamples are passed onto BOOTFUN as multiple data input arguments. All data vectors and matrices (D1, D2 etc) must have the same number of rows. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA)', where ALPHA is numeric and sets the lower and upper bounds of the confidence interval(s). The value(s) of ALPHA must be between 0 and 1. ALPHA can either be: <> scalar: To set the (nominal) central coverage of equal-tailed percentile confidence intervals to 100*(1-ALPHA)%. <> vector: A pair of probabilities defining the (nominal) lower and upper percentiles of the confidence interval(s) as 100*(ALPHA(1))% and 100*(ALPHA(2))% respectively. The percentiles are bias-corrected and accelerated (BCa) [2]. The default value of ALPHA is the vector: [.025, .975], for a 95% BCa confidence interval. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)' also sets CLUSTID, which are identifiers that define the grouping of the DATA rows for cluster bootstrap case resampling. CLUSTID should be a column vector or cell array with the same number of rows as the DATA. Rows in DATA with the same CLUSTID value are treated as clusters of observations that are resampled together. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTSZ)' groups consecutive DATA rows into clusters of length CLUSTSZ. This is equivalent to block bootstrap resampling. By default, CLUSTSZ is 1. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO)' sets the resampling method. If LOO is false, the resampling method used is balanced bootstrap resampling. If LOO is true, the resampling method used is balanced bootknife resampling [3]. Where N is the number of clusters, bootknife cluster resampling involves creating leave-one-out jackknife samples of size N - 1, and then drawing resamples of size N with replacement from the jackknife samples, thereby incorporating Bessel's correction into the resampling procedure. LOO must be a scalar logical value. The default value of LOO is false. 'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO, SEED)' initialises the Mersenne Twister random number generator using an integer SEED value so that bootclust results are reproducible. 'STATS = bootclust (...)' returns a structure with the following fields (defined above): original, bias, std_error, CI_lower, CI_upper. '[STATS, BOOTSTAT] = bootclust (...)' returns BOOTSTAT, a vector or matrix of bootstrap statistics calculated over the bootstrap resamples. REQUIREMENTS: The function file boot.m (or better boot.mex) and bootcdf, which are distributed with the statistics-resampling package. BIBLIOGRAPHY: [1] Hesterberg, Tim (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, http://arxiv.org/abs/1411.5279 [2] Efron, and Tibshirani (1993) An Introduction to the Bootstrap. New York, NY: Chapman & Hall [3] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling vs. Smoothing; Proceedings of the Section on Statistics & the Environment. Alexandria, VA: American Statistical Association. bootclust (version 2023.09.20) Author: Andrew Charles Penn https://www.researchgate.net/profile/Andrew_Penn/ Copyright 2019 Andrew Charles Penn This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; % 95% expanded BCa bootstrap confidence intervals for the mean bootclust (data, 1999, @mean);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mean Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.3%, 97.6%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +29.65 +1.066e-14 +2.579 +23.85 +34.68
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ... 'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'}; % 95% expanded BCa bootstrap confidence intervals for the mean with % cluster resampling bootclust (data, 1999, @mean, [0.025,0.975], clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: mean Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (1.4%, 99.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +29.65 -0.03173 +2.871 +23.28 +36.33
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; % 90% equal-tailed percentile bootstrap confidence intervals for % the variance bootclust (data, 1999, {@var, 1}, 0.1);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Percentile (equal-tailed) Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -6.648 +41.77 +98.16 +235.6
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ... 'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'}; % 90% equal-tailed percentile bootstrap confidence intervals for % the variance bootclust (data, 1999, {@var, 1}, 0.1, clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Percentile (equal-tailed) Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -9.616 +33.17 +104.8 +214.0
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; % 90% BCa bootstrap confidence intervals for the variance bootclust (data, 1999, {@var, 1}, [0.05 0.95]);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 90% (12.2%, 98.7%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -6.544 +42.20 +115.9 +263.7
The following code
% Input univariate dataset data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ... 0 33 28 34 4 32 24 47 41 24 26 30 41].'; clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ... 'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'}; % 90% BCa bootstrap confidence intervals for the variance bootclust (data, 1999, {@var, 1}, [0.05 0.95], clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: var Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 90% (13.5%, 98.8%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +171.5 -9.804 +33.43 +123.8 +231.4
The following code
% Input dataset y = randn (20,1); x = randn (20,1); X = [ones(20,1), x]; % 90% BCa confidence interval for regression coefficients bootclust ({y,X}, 1999, @(y,X) X\y, [0.05 0.95]); % Could also use @regress
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: @(y, X) X \ y Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage: 90% Bootstrap Statistics: original bias std_error CI_lower CI_upper -0.2666 +0.02243 +0.2679 -0.6970 +0.1652 -0.1811 -0.02153 +0.2923 -0.6874 +0.2454
The following code
% Input dataset y = randn (20,1); x = randn (20,1); X = [ones(20,1), x]; clustid = [1;1;1;1;2;2;2;3;3;3;3;4;4;4;4;4;5;5;5;6]; % 90% BCa confidence interval for regression coefficients bootclust ({y,X}, 1999, @(y,X) X\y, [0.05 0.95], clustid);
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: @(y, X) X \ y Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage: 90% Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.004980 +0.02224 +0.2339 -0.3471 +0.4229 +0.04880 +0.01775 +0.1999 -0.2698 +0.3704
The following code
% Input bivariate dataset x = [576 635 558 578 666 580 555 661 651 605 653 575 545 572 594].'; y = [3.39 3.3 2.81 3.03 3.44 3.07 3 3.43 ... 3.36 3.13 3.12 2.74 2.76 2.88 2.96].'; clustid = [1;1;3;1;1;2;2;2;2;3;1;3;3;3;2]; % 95% BCa bootstrap confidence intervals for the correlation coefficient bootclust ({x, y}, 1999, @cor, [], clustid); % Please be patient, the calculations will be completed soon...
Produces the following output
Summary of nonparametric cluster bootstrap estimates of bias and precision ****************************************************************************** Bootstrap settings: Function: cor Resampling method: Balanced, bootstrap cluster resampling Number of resamples: 1999 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) Nominal coverage (and the percentiles used): 95% (2.1%, 97.2%) Bootstrap Statistics: original bias std_error CI_lower CI_upper +0.7764 -0.02494 +0.1451 +0.4094 +1.003
Package: statistics-resampling