Package 'modehunt' reference manual

Title:	Multiscale Analysis for Density Functions
Description:	Given independent and identically distributed observations X(1), ..., X(n) from a density f, provides five methods to perform a multiscale analysis about f as well as the necessary critical values. The first method, introduced in Duembgen and Walther (2008), provides simultaneous confidence statements for the existence and location of local increases (or decreases) of f, based on all intervals I(all) spanned by any two observations X(j), X(k). The second method approximates the latter approach by using only a subset of I(all) and is therefore computationally much more efficient, but asymptotically equivalent. Omitting the additive correction term Gamma in either method offers another two approaches which are more powerful on small scales and less powerful on large scales, however, not asymptotically minimax optimal anymore. Finally, the block procedure is a compromise between adding Gamma or not, having intermediate power properties. The latter is again asymptotically equivalent to the first and was introduced in Rufibach and Walther (2010).
Authors:	Kaspar Rufibach <[email protected]> and Guenther Walther <[email protected]>
Maintainer:	Kaspar Rufibach <[email protected]>
License:	GPL (>= 2)
Version:	1.0.7
Built:	2025-02-28 04:46:10 UTC
Source:	https://github.com/cran/modehunt

Multiscale Analysis for Density Functions

Description

Provides five methods and corresponding critical values to perform mode hunting, i.e. to compute multiscale test statistics based on local order statistics and spacings that provide simultaneous confidence statements for the existence and location of local increases and decreases of a density.

Details

Package:	modehunt
Type:	Package
Version:	1.0.7
Date:	2015-07-03
License:	GPL (>=2)

In Duembgen and Walther (2008) a multiscale test statistic based on spacings was introduced. This method provides simultaneous confidence statements for the existence and location of local increases and decreases of a density. The procedure guarantees finite–sample significance levels and possesses certain asymptotic optimality and adaptivity properties. However, since the local test statistics are computed on all $O(n^2)$ intervals in the set

$\mathcal{I}_{all} = \Bigl\{(j, \ k ) \ : \ 0 \le j < k \le n+1, \ k - j > 1\Bigr\},$

this latter procedure is computationally very expensive. Furthermore, the correction term $\Gamma$ employed by Duembgen and Walther (2008) to prevent the global test statistic to be dominated by the values of the local test statistics on small scales needs in principle to be re–derived for any new local test statistic, a non–trivial task in general. In Rufibach and Walther (2010), two new procedures are proposed: One that within the original framework of Duembgen and Walther (2008) approximates the set $\mathcal{I}_{all}$ by a specific subset of intervals $\mathcal{I}_{app}$ that only contains $O(n \log n )$ intervals. It is shown that considering $\mathcal{I}_{app}$ yields a procedure that is in terms of power asymptotically equivalent to that based on $\mathcal{I}_{all}$ , however, computationally much more efficient.

Finally, Rufibach and Walther (2010) propose a block procedure. Here, all intervals under consideration are grouped into blocks, where each interval in a block contains approximately the same number of original observations. Critical values are then computed per block. Again, this procedure is basically asymptotically equivalent to the standard approach proposed in Duembgen and Walther (2008), but again computationally much faster. It further offers a (finite–sample) tradeoff between employing or omitting the additive correction $\Gamma$ .

The initial procedure by Duembgen and Walther (2008) is implemented as the function modeHunting. The help file to the latter function also contains some more description of the mathematical details. criticalValuesAll can be used to compute critical values for this approach and cvModeAll contains a table of critical values (with and without correction term) for some $n$ and $alpha$ .

The corresonding functions and $p$ -values for the approximation are made available as modeHuntingApprox, criticalValuesApprox, and cvModeApprox and for the block method as modeHuntingBlock, criticalValuesBlock, and cvModeBlock.

Author(s)

Kaspar Rufibach (maintainer), [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

Kaspar Rufibach acknowledges support by the Swiss National Science Foundation SNF, http://www.snf.ch.

Guenther Walther acknowledges support by NSF grants DMS-9875598, DMS-0505682, and NIH grant 5R33HL068522.

References

Duembgen, L. and Walther, G. (2008). Multiscale Inference about a density. Ann. Statist., 36, 1758–1785.

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## generate random sample
set.seed(1977)
n <- 200; a <- 0; b <- 0.5; s <- 2 / (b - a) 
X.raw <- rlin(n, a, b, s)

## input critical values
alpha <- 0.05
data(cvModeAll); data(cvModeApprox); data(cvModeBlock)
cv.all <- cvModeAll[cvModeAll$alpha == alpha & cvModeAll$n == n, 3:4]
cv.approx <- cvModeApprox[cvModeApprox$alpha == alpha & cvModeApprox$n == n, 3:4]
cv.block <- cvModeBlock[cvModeBlock$alpha == alpha & cvModeBlock$n == n, 3:11]

## standard procedure from Duembgen and Walther (2008)
mod1 <- modeHunting(X.raw, lower = 0, upper = 1, cv.all, min.int = TRUE)

## procedure from Rufibach and Walther (2010) based on I_app
mod2 <- modeHuntingApprox(X.raw, lower = 0, upper = 1, 
                          crit.vals = cv.approx, min.int = TRUE)

## block procedure from Rufibach and Walther (2010)
mod3 <- modeHuntingBlock(X.raw, lower = 0, upper = 1, 
                         crit.vals = cv.block, min.int = TRUE)

## display
mod1; mod2; mod3
## generate random sample
set.seed(1977)
n <- 200; a <- 0; b <- 0.5; s <- 2 / (b - a) 
X.raw <- rlin(n, a, b, s)

## input critical values
alpha <- 0.05
data(cvModeAll); data(cvModeApprox); data(cvModeBlock)
cv.all <- cvModeAll[cvModeAll$alpha == alpha & cvModeAll$n == n, 3:4]
cv.approx <- cvModeApprox[cvModeApprox$alpha == alpha & cvModeApprox$n == n, 3:4]
cv.block <- cvModeBlock[cvModeBlock$alpha == alpha & cvModeBlock$n == n, 3:11]

## standard procedure from Duembgen and Walther (2008)
mod1 <- modeHunting(X.raw, lower = 0, upper = 1, cv.all, min.int = TRUE)

## procedure from Rufibach and Walther (2010) based on I_app
mod2 <- modeHuntingApprox(X.raw, lower = 0, upper = 1, 
                          crit.vals = cv.approx, min.int = TRUE)

## block procedure from Rufibach and Walther (2010)
mod3 <- modeHuntingBlock(X.raw, lower = 0, upper = 1, 
                         crit.vals = cv.block, min.int = TRUE)

## display
mod1; mod2; mod3

Computes number of observations for each block

Description

In Rufibach and Walther (2010) a new multiscale mode hunting procedure is presented that compares the local test statistics with critical values given by blocks. Blocks are collection of intervals on a given grid that contain roughly the same number of original observations.

Usage

blocks(n, m0 = 10, fm = 2)blocks(n, m0 = 10, fm = 2)

Arguments

`n`	Number of observations.
`m0`	Initial parameter that determines the number of observations in one block.
`fm`	Factor by which $m$ is increased from block to block.

Details

In our block procedure, we only consider a subset $\mathcal{I}_{app}$ of all possible intervals $\mathcal{I}_{all}$ where

$\mathcal{I}_{all} = \Bigl\{(j, \ k ) \ : \ 0 \le j < k \le n+1, \ k - j > 1\Bigr\}.$

This subset $\mathcal{I}_{app}$ is computed as follows:

Set $d_1, m_1, f_m > 1$ . Then:

$for \ \ r = 1,\ldots,\#blocks$

$d_r := round(d_1 f_m^{(r-1)/2}), \ m_r := m_1 f_m^{r-1}.$

Include $(j,k)$ in $\mathcal{I}_{app}$ if

(a) $j, k \in \{1+i d_r, \ i = 0, 1, \dots \}$ \ \ (we only consider every $d$ –th observation) and

(b) $m_r \le k-j-1 \le 2m_r-1$ \ \ ( $\mathcal{I}_{jk}$ contains between $m_r$ and $2m_r - 1$ observations)

$end \ \ for$

Value

$b \times 2$ –matrix, where $b$ is the number of blocks and the columns contain the lower and the upper number of observations that form each block.

Note

The asymptotic results in Rufibach and Walther (2010) are only derived for $f_m = 2$ .

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Compute critical values based on the set of all intervals

Description

This function computes critical values that are needed to perform the multiscale analysis about a density using the function modeHunting.

Usage

criticalValuesAll(n, alpha, M, display, path)criticalValuesAll(n, alpha, M, display, path)

Arguments

`n`	Number of observations.
`alpha`	Significance level, real number in $(0,1)$ .
`M`	Number of runs to perform.
`display`	If `display == 1`, every $100$ –th step is indicated in the output window, else not.
`path`	If `path != NA`, the current number of performed simulations is saved in this location.

Details

For more details see the function modeHunting and the data set cvModeAll.

Value

A 2-dimensional vector containing the critical value for the test statistic with or without additive correction $\Gamma$ .

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## compute critical values and compare to those in cvModeAll 
## (to see output in R, press CTRL + W)
cv1 <- criticalValuesAll(n = 200, alpha = 0.05, M = 10 ^ 2, display = 1, path = NA)
data(cvModeAll)
cv2 <- cvModeAll[cvModeAll$alpha == 0.05 & cvModeAll$n == 200, 3:4]
rbind(cv1, cv2)
## compute critical values and compare to those in cvModeAll 
## (to see output in R, press CTRL + W)
cv1 <- criticalValuesAll(n = 200, alpha = 0.05, M = 10 ^ 2, display = 1, path = NA)
data(cvModeAll)
cv2 <- cvModeAll[cvModeAll$alpha == 0.05 & cvModeAll$n == 200, 3:4]
rbind(cv1, cv2)

Compute critical values for (1) the original test statistic with or without additive correction, based on the aprroximating set of intervals and (2) for the block procedure

Description

This function computes critical values that can be used to perform the multiscale analysis about a density with the functions modeHuntingApprox and modeHuntingBlock.

Usage

criticalValuesApprox(n, d0 = 2, m0 = 10, fm = 2, alpha = 0.05, 
        gam = 2, tail = 10, M = 10 ^ 5, display = 0, path = NA)criticalValuesApprox(n, d0 = 2, m0 = 10, fm = 2, alpha = 0.05, 
        gam = 2, tail = 10, M = 10 ^ 5, display = 0, path = NA)

Arguments

`n`	Number of observations.
`d0`	Initial parameter for the grid resolution.
`m0`	Initial parameter for the number of observations in one block.
`fm`	Factor by which $m$ is increased from block to block.
`alpha`	Significance level, real number in $(0,1)$ .
`gam`	Weighting exponent for level in each block.
`tail`	Offset, determines together with `gam` the decrease of the level from one block to another.
`M`	Number of runs to perform.
`display`	If `display == 1`, every $100$ –th step is indicated in the output window, else not.
`path`	If `path != NA`, the current number of performed simulations is saved in this location.

Details

For details see the function modeHuntingApprox and the data set cvModeApprox.

Value

`approx`	A 2-dimensional vector containing the critical value for the test statistic with or without additive correction $\Gamma$ .
`block`	A vector containing the critical value for each block.

Note

The asymptotic results in Rufibach and Walther (2010) are only derived for $f_m = 2$ .

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## compute critical values and compare to those in cvModeAll and cvModeBlock
## (to see output in R, press CTRL + W)
cv <- criticalValuesApprox(n = 200, d0 = 2, m0 = 10, fm = 2, 
     alpha = 0.05, gam = 2, tail = 10, M = 10 ^ 2, display = 1, path = NA)
cv1 <- cv$approx; cv2 <- cv$block

data(cvModeApprox); data(cvModeBlock)
cv3 <- cvModeApprox[cvModeApprox$alpha == 0.05 & cvModeApprox$n == 200, 3:4]
cv4 <- cvModeBlock[cvModeBlock$alpha == 0.05 & cvModeBlock$n == 200, 3:6]
rbind(cv1, cv3)
rbind(cv2, cv4)
## compute critical values and compare to those in cvModeAll and cvModeBlock
## (to see output in R, press CTRL + W)
cv <- criticalValuesApprox(n = 200, d0 = 2, m0 = 10, fm = 2, 
     alpha = 0.05, gam = 2, tail = 10, M = 10 ^ 2, display = 1, path = NA)
cv1 <- cv$approx; cv2 <- cv$block

data(cvModeApprox); data(cvModeBlock)
cv3 <- cvModeApprox[cvModeApprox$alpha == 0.05 & cvModeApprox$n == 200, 3:4]
cv4 <- cvModeBlock[cvModeBlock$alpha == 0.05 & cvModeBlock$n == 200, 3:6]
rbind(cv1, cv3)
rbind(cv2, cv4)

Critical values for test statistic based on all intervals

Description

This dataset contains critical values for some $n$ and $\alpha$ for the test statistic based on all intervals, with or without additive correction term $\Gamma$ .

Usage

data(cvModeAll)data(cvModeAll)

Format

A data frame providing 15 different combinations of $n$ and $\alpha$ and the following columns:

`alpha`	The levels at which critical values were simulated.
`n`	The number of observations for which critical values were simulated.
`withadd`	Critical values based on $T_n^+({\bf{U}})$ and the set of all intervals $\mathcal{I}_{all}$ .
`noadd`	Critical values based on $T_n({\bf{U}})$ and the set of all intervals $\mathcal{I}_{all}$ .

Details

For details on the above test statistics see modeHunting. Critical values are based on $M=100'000$ simulations of i.i.d. random vectors

${\bf{U}} = (U_1,\dots,U_n)$

where $U_i$ is a uniformly on $[0,1]$ distributed random variable, $i=1,\dots,M$ .

Remember

$n$ is the number of interior observations, i.e. if you are analyzing a sample of size $m$ , then you need critical values corresponding to

`n = m-2`	If no additional information on $a$ and $b$ is available.
`n = m-1`	If either $a$ or $b$ is known to be a certain finite number.
`n = m`	If both $a$ and $b$ are known to be certain finite numbers,

where $[a,b] = \{x \ : \ f(x) > 0\}$ is the support of $f$ .

Source

These critical values were generated using the function criticalValuesAll. Critical values for other combinations for $\alpha$ and $n$ can be computed using this latter function.

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## extract critical values for alpha = 0.05, n = 200
data(cvModeAll)
cv <- cvModeAll[cvModeAll$alpha == 0.05 & cvModeAll$n == 200, 3:4]
cv
## extract critical values for alpha = 0.05, n = 200
data(cvModeAll)
cv <- cvModeAll[cvModeAll$alpha == 0.05 & cvModeAll$n == 200, 3:4]
cv

Critical values for test statistic based on the approximating set of intervals

Description

This dataset contains critical values for some $n$ and $\alpha$ for the test statistic based on the approximating set of intervals, with or without additive correction term $\Gamma$ .

Usage

data(cvModeApprox)data(cvModeApprox)

Format

A data frame providing 15 different combinations of $n$ and $\alpha$ and the following columns:

`alpha`	The levels at which critical values were simulated.
`n`	The number of observations for which critical values were simulated.
`withadd`	Critical values based on $T_n^+({\bf{U}})$ and the approximating set of intervals $\mathcal{I}_{app}$ .
`noadd`	Critical values based on $T_n({\bf{U}})$ and the approximating set of intervals $\mathcal{I}_{app}$ .

Details

For details see modeHunting. Critical values are based on $M=100'000$ simulations of i.i.d. random vectors

${\bf{U}} = (U_1,\dots,U_n)$

where $U_i$ is a uniformly on $[0,1]$ distributed random variable, $i=1,\dots,M$ .

Remember

$n$ is the number of interior observations, i.e. if you are analyzing a sample of size $m$ , then you need critical values corresponding to

`n = m-2`	If no additional information on $a$ and $b$ is available.
`n = m-1`	If either $a$ or $b$ is known to be a certain finite number.
`n = m`	If both $a$ and $b$ are known to be certain finite numbers,

where $[a,b] = \{x \ : \ f(x) > 0\}$ is the support of $f$ .

Source

These critical values were generated using the function criticalValuesApprox. Critical values for other combinations for $\alpha$ and $n$ can be computed using this latter function.

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## extract critical values for alpha = 0.05, n = 200
data(cvModeApprox)
cv <- cvModeApprox[cvModeApprox$alpha == 0.05 & cvModeApprox$n == 200, 3:4]
cv
## extract critical values for alpha = 0.05, n = 200
data(cvModeApprox)
cv <- cvModeApprox[cvModeApprox$alpha == 0.05 & cvModeApprox$n == 200, 3:4]
cv

Critical values for test statistic based on the block procedure

Description

This dataset contains critical values for some $n$ and $\alpha$ for the block procedure.

Usage

data(cvModeBlock)data(cvModeBlock)

Format

A data frame providing 15 different combinations of $n$ and $\alpha$ and the following columns:

`alpha`	The levels at which critical values were simulated.
`n`	The number of observations for which critical values were simulated.
`block 1 - 9`	Critical values for the respective blocks.

Details

For details see modeHunting. Critical values are based on $M=100'000$ simulations of i.i.d. random vectors

${\bf{U}} = (U_1,\dots,U_n)$

where $U_i$ is a uniformly on $[0,1]$ distributed random variable, $i=1,\dots,M$ .

Remember

$n$ is the number of interior observations, i.e. if you are analyzing a sample of size $m$ , then you need critical values corresponding to

`n = m-2`	If no additional information on $a$ and $b$ is available.
`n = m-1`	If either $a$ or $b$ is known to be a certain finite number.
`n = m`	If both $a$ and $b$ are known to be certain finite numbers,

where $[a,b] = \{x \ : \ f(x) > 0\}$ is the support of $f$ .

Source

These critical values were generated using the function criticalValuesBlock. Critical values for other combinations for $\alpha$ and $n$ can be computed using this latter function.

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## extract critical values for alpha = 0.05, n = 200
data(cvModeBlock)
cv <- cvModeBlock[cvModeBlock$alpha == 0.05 & cvModeBlock$n == 200, 3:11]
cv
## extract critical values for alpha = 0.05, n = 200
data(cvModeBlock)
cv <- cvModeBlock[cvModeBlock$alpha == 0.05 & cvModeBlock$n == 200, 3:11]
cv

Perturbed Uniform Distribution

Description

Density function, distribution function, quantile function and random generation for the perturbed uniform distribution having a linear increase of slope $s$ on an interval $[a,b] \in [0,1]$ .

Usage

dlin(x, a, b, s) 
plin(q, a, b, s) 
qlin(p, a, b, s)
rlin(n, a, b, s)dlin(x, a, b, s) 
plin(q, a, b, s) 
qlin(p, a, b, s)
rlin(n, a, b, s)

Arguments

`x`, `q`	Vector of quantiles.
`p`	Vector of probabilities.
`n`	Number of observations.
`a`	Left interval endpoint, real number in $[0,1)$ .
`b`	Right interval endpoint, real number in $(0,1]$ .
`s`	Slope parameter, real number such that $\|s\| \le 2/(b-a)$ .

Details

The what we call perturbed uniform distribution (PUD) with perturbation on an interval $[a,b] \in [0,1]$ with slope parameter $s$ such that $|s| \le 2 / (b-a)$ has density function

$f_{a, b, s}(x) = \Bigl(sx-s\frac{a+b}{2}\Bigr)1\{x \in [a,b)\} + 1\{[0,a) \cup [b,1]\},$

distribution function

$F_{a, b, s}(q) = \Bigl(q+\frac{s}{2}(q^2-a^2+(a-x)(a+b)) \Bigr)1\{q \in [a,b)\} + q\{[0,a) \cup [b,1]\},$

and quantile function

$F_{a, b, s}^{-1}(p) = \Bigl(-s^{-1}+\frac{a+b}{2}+\frac{s \sqrt{(a-b)^2+\frac{4}{s}(\frac{1}{s}-(a+b)+2p)}}{2|s|} \Bigr) \ 1\{p \in [a,b)\} + p\{[0,a) \cup [b,1]\}.$

This function was used to carry out the simulations to compute the power curves given in Rufibach and Walther (2010).

Value

dlin gives the values of the density function, plin those of the distribution function, and qlin those of the quantile function of the PUD at $x, q,$ and $p$ , respectively. rlin generates $n$ random numbers, returned as an ordered vector.

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Compute set of minimal intervals

Description

In general, all intervals that have a test statistic bigger than the respective critical value are output. For a given set of intervals $\mathcal{K}$ , all intervals $J$ such that $\mathcal{K}$ does not contain a proper subset of $J$ are called minimal. Given $\mathcal{K}$ , this function computes the set of minimal intervals.

Usage

minimalIntervals(ints)minimalIntervals(ints)

Arguments

ints

Either one of the sets $\mathcal{D}^+$ or $\mathcal{D}^-$ as output by one of the functions modeHunting, modeHuntingApprox, or modeHuntingBlock.

Value

Returns the set of minimal elements $\bf{D}^\pm$ , corresponding to the set of input intervals $\mathcal{D}^\pm$ .

Note

Depending on the value of $min.int$ , this function is called by modeHunting,
modeHuntingApprox, and modeHuntingBlock.

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Minimal intervals were first introduced (although for a different multiscale procedure) on p. 517 in

Lutz Dümbgen (2002). Application of Local Rank Tests to Nonparametric Regression. Journal of Nonparametric Statistics, 14, 511–537.

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Multiscale analysis of a density on all possible intervals

Description

Simultanous confidence statements for the existence and location of local increases and decreases of a density f, computed on all intervals spanned by two observations.

Usage

modeHunting(X.raw, lower = -Inf, upper = Inf, crit.vals, min.int = FALSE)
modeHunting(X.raw, lower = -Inf, upper = Inf, crit.vals, min.int = FALSE)

Arguments

`X.raw`	Vector of observations.
`lower`	Lower support point of $f$ , if known.
`upper`	Upper support point of $f$ , if known.
`crit.vals`	2-dimensional vector giving the critical values for the desired level.
`min.int`	If `min.int = TRUE`, the set of minimal intervals is output, otherwise all intervals with a test statistic above the critical value are given.

Details

In general, the methods modeHunting, modeHuntingApprox, and modeHuntingBlock compute for a given level $\alpha \in (0, 1)$ and the corresponding critical value $c_{jk}(\alpha)$ two sets of intervals

$\mathcal{D}^\pm(\alpha) = \Bigl\{ \mathcal{I}_{jk} \ : \ \pm T_{jk}({\bf{X}} ) > c_{jk}(\alpha) \Bigr\}$

where $\mathcal{I}_{jk}:=(X_{(j)},X_{(k)})$ for $0\le j < k \le n+1, k-j> 1$ and $c_{jk}$ are appropriate critical values.

Specifically, the function modeHunting computes $\mathcal{D}^\pm(\alpha)$ based on the two test statistics

$T_n^+({\bf{X}}, \mathcal{I}) = \max_{(j,k) \in \mathcal{I}} \Bigl( |T_{jk}({\bf{X}})| / \sigma_{jk} - \Gamma \Bigl(\frac{k-j}{n+2}\Bigr)\Bigr)$

and

$T_n({\bf{X}}, \mathcal{I}) = \max_{(j,k) \in \mathcal{I}} ( |T_{jk}({\bf{X}})| / \sigma_{jk} ),$

using the set $\mathcal{I} := \mathcal{I}_{all}$ of all intervals spanned by two observations $(X_{(j)}, X_{(k)})$ :

$\mathcal{I}_{all} = \Bigl\{(j, \ k ) \ : \ 0 \le j < k \le n+1, \ k - j > 1\Bigr\}.$

We introduced the local test statistics

$T_{jk}({\bf{X}}) := \sum_{i=j+1}^{k-1} ( 2 X_{(i; j, k)} - 1) 1\{X_{(i; j, k)} \in (0,1)\},$

for local order statistics

$X_{(i; j, k)} := \frac{X_{(i)}-X_{(j)}}{X_{(k)} - X_{(j)}},$

the standard deviation $\sigma_{jk} := \sqrt{(k-j-1)/3}$ and the additive correction term $\Gamma(\delta) := \sqrt{2 \log(e / \delta)}$ for $\delta > 0$ .

If min.int = TRUE, the set $\mathcal{D}^\pm(\alpha)$ is replaced by the set ${\bf{D}}^\pm(\alpha)$ of its minimal elements. An interval $J \in \mathcal{D}^\pm(\alpha)$ is called minimal if $\mathcal{D}^\pm(\alpha)$ contains no proper subset of $J$ . This minimization post-processing step typically massively reduces the number of intervals. If we are mainly interested in locating the ranges of increases and decreases of $f$ as precisely as possible, the intervals in $\mathcal{D}^\pm(\alpha) \setminus \bf{D}^\pm(\alpha)$ do not contain relevant information.

Value

`Dp`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^+(\alpha)$ ), based on the test statistic with additive correction $\Gamma$ .
`Dm`	The set $\mathcal{D}^-(\alpha)$ (or $\bf{D}^-(\alpha)$ ), based on the test statistic with $\Gamma$ .
`Dp.noadd`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^+(\alpha)$ ), based on the test statistic without $\Gamma$ .
`Dm.noadd`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^-(\alpha)$ ), based on the test statistic without $\Gamma$ .

Note

Critical values for modeHunting and some combinations of $n$ and $\alpha$ are provided in the data set cvModeAll. Critical values for other values of $n$ and $\alpha$ can be generated using criticalValuesAll.

Parts of this function were derived from MatLab code provided on Lutz Duembgen's webpage,
http://www.staff.unibe.ch/duembgen.

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Duembgen, L. and Walther, G. (2008). Multiscale Inference about a density. Ann. Statist., 36, 1758–1785.

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## for examples type
help("mode hunting")
## and check the examples there## for examples type
help("mode hunting")
## and check the examples there

Multiscale analysis of a density on the approximating set of intervals

Description

Simultanous confidence statements for the existence and location of local increases and decreases of a density f, computed on the approximating set of intervals.

Usage

modeHuntingApprox(X.raw, lower = -Inf, upper = Inf, 
    d0 = 2, m0 = 10, fm = 2, crit.vals, min.int = FALSE)
modeHuntingApprox(X.raw, lower = -Inf, upper = Inf, 
    d0 = 2, m0 = 10, fm = 2, crit.vals, min.int = FALSE)

Arguments

`X.raw`	Vector of observations.
`lower`	Lower support point of $f$ , if known.
`upper`	Upper support point of $f$ , if known.
`d0`	Initial parameter for the grid resolution.
`m0`	Initial parameter for the number of observations in one block.
`fm`	Factor by which $m$ is increased from block to block.
`crit.vals`	2-dimensional vector giving the critical values for the desired level.
`min.int`	If `min.int = TRUE`, the set of minimal intervals is output, otherwise all intervals with a test statistic above the critical value are given.

Details

See blocks for details how $\mathcal{I}_{app}$ is generated and modeHunting for a proper introduction to the notation used here. The function modeHuntingApprox computes $\mathcal{D}^\pm(\alpha)$ based on the two test statistics $T_n^+({\bf{X}}, \mathcal{I}_{app})$ and $T_n({\bf{X}}, \mathcal{I}_{app})$ .

Value

`Dp`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^+(\alpha)$ ), based on the test statistic with additive correction $\Gamma$ .
`Dm`	The set $\mathcal{D}^-(\alpha)$ (or $\bf{D}^-(\alpha)$ ), based on the test statistic with $\Gamma$ .
`Dp.noadd`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^+(\alpha)$ ), based on the test statistic without $\Gamma$ .
`Dm.noadd`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^-(\alpha)$ ), based on the test statistic without $\Gamma$ .

Note

Critical values for modeHuntingApprox and some combinations of $n$ and $\alpha$ are provided in the data set cvModeApprox. Critical values for other values of $n$ and $\alpha$ can be generated using criticalValuesApprox.

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Duembgen, L. and Walther, G. (2008). Multiscale Inference about a density. Ann. Statist., 36, 1758–1785.

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## for examples type
help("mode hunting")
## and check the examples there## for examples type
help("mode hunting")
## and check the examples there

Multiscale analysis of a density via block procedure

Description

Simultanous confidence statements for the existence and location of local increases and decreases of a density f, computed via the block procedure.

Usage

modeHuntingBlock(X.raw, lower = -Inf, upper = Inf, d0 = 2, 
    m0 = 10, fm = 2, crit.vals, min.int = FALSE)
modeHuntingBlock(X.raw, lower = -Inf, upper = Inf, d0 = 2, 
    m0 = 10, fm = 2, crit.vals, min.int = FALSE)

Arguments

`X.raw`	Vector of observations.
`lower`	Lower support point of $f$ , if known.
`upper`	Upper support point of $f$ , if known.
`d0`	Initial parameter for the grid resolution.
`m0`	Initial parameter for the number of observations in one block.
`fm`	Factor by which $m$ is increased from block to block.
`crit.vals`	2-dimensional vector giving the critical values for the desired level.
`min.int`	If `min.int = TRUE`, the set of minimal intervals is output, otherwise all intervals with a test statistic above the critical value (in their respective block) are given.

Details

See blocks for details how $\mathcal{I}_{app}$ is generated and modeHunting for a proper introduction to the notation used here. The function modeHuntingBlock uses the test statistic $T^+_n({\bf X}, \mathcal{B}_r)$ , where $\mathcal{B}_r$ contains all intervals of Block $r$ , $r=1,\ldots,\#blocks$ . Critical values for each block individually are received via finding an $\tilde \alpha$ such that

$P(B_n({\bf{X}}) > q_{r,\tilde \alpha / (r+tail)^\gamma} \ for \ at \ least \ one \ r) \le \alpha,$

where $q_{r,\alpha}$ is the $(1-\alpha)$ –quantile of the distribution of $T^+_n({\bf X}, \mathcal{B}_r).$ We then define the sets $\mathcal{D}^\pm(\alpha)$ as

$\mathcal{D}^\pm(\alpha) := \Bigl\{\mathcal{I}_{jk} \ : \ \pm T_{jk}({\bf{X}}) > q_{r,\tilde \alpha / (r+tail)^\gamma} \, , \ r = 1,\ldots \#blocks\Bigr\}.$

Note that $\gamma$ and $tail$ are automatically determined by $crit.vals$ .

Value

`Dp`	The set $\mathcal{D}^+(\alpha)$ (or $\bf{D}^+(\alpha)$ ).
`Dm`	The set $\mathcal{D}^-(\alpha)$ (or $\bf{D}^-(\alpha)$ ).

Note

Critical values for some combinations of $n$ and $\alpha$ are provided in the data sets cvModeBlock. Critical values for other values of $n$ and $\alpha$ can be generated using criticalValuesApprox.

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

References

Duembgen, L. and Walther, G. (2008). Multiscale Inference about a density. Ann. Statist., 36, 1758–1785.

Rufibach, K. and Walther, G. (2010). A general criterion for multiscale inference. J. Comput. Graph. Statist., 19, 175–190.

Examples

## for examples type
help("mode hunting")
## and check the examples there## for examples type
help("mode hunting")
## and check the examples there

Round 5 up to the next higher integer

Description

The built-in R function round rounds a 5 to the even digit. Instead, we preferred the more intuitive rounding meaning that a 5 is always rounded to the next higher digit.

Usage

myRound(d)myRound(d)

Arguments

`d`	Real number.

Value

The biggest integer not bigger than $d$ if $d - \lfloor d \rfloor < 0.5$ and the smallest integer greater than $d$ if $d - \lfloor d \rfloor \ge 0.5$ .

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

Examples

x <- c(1.5, 2.5)

## built in R function
round(x)
## [1] 2 2

## this function
myRound(x)
## [1] 2 3
x <- c(1.5, 2.5)

## built in R function
round(x)
## [1] 2 2

## this function
myRound(x)
## [1] 2 3

Prepare data vector according to available information on support endpoints of f

Description

Preprocesses the initial data vector X.raw according to whether the upper and/or lower endpoint of the support of f is known.

Usage

preProcessX(X.raw, lower = -Inf, upper = Inf)preProcessX(X.raw, lower = -Inf, upper = Inf)

Arguments

`X.raw`	Vector of observations.
`lower`	Lower support point of $f$ , if known.
`upper`	Upper support point of $f$ , if known.

Details

Depending whether $lower$ and $upper$ are known, the vector of raw observations $X.raw$ is supplemented by $lower$ and/or $upper$ and finally sorted.

Value

Sorted vector of (processed) observations.

Note

This function is called by modeHunting, modeHuntingApprox, and modeHuntingBlock.

This function was derived from MatLab code provided on Lutz Duembgen's webpage,
http://www.staff.unibe.ch/duembgen.

Author(s)

Kaspar Rufibach, [email protected],
http://www.kasparrufibach.ch

Guenther Walther, [email protected],
www-stat.stanford.edu/~gwalther

Package 'modehunt'

Help Index

Multiscale Analysis for Density Functions

Description

Details

Author(s)

References

Examples

Computes number of observations for each block

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Compute critical values based on the set of all intervals

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Compute critical values for (1) the original test statistic with or without additive correction, based on the aprroximating set of intervals and (2) for the block procedure

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Critical values for test statistic based on all intervals

Description

Usage

Format

Details

Remember

Source

References

Examples

Critical values for test statistic based on the approximating set of intervals

Description

Usage

Format

Details

Remember

Source

References

Examples

Critical values for test statistic based on the block procedure

Description

Usage

Format

Details

Remember

Source

References

Examples

Perturbed Uniform Distribution

Description

Usage

Arguments

Details

Value

Author(s)

References

Compute set of minimal intervals

Description

Usage

Arguments

Value

Note