Title: | Least Squares, Logistic, and Cox-Regression with Ordered Predictors |
---|---|
Description: | In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable, at the same time adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of a precise knowledge of the response function, using monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators. |
Authors: | Kaspar Rufibach |
Maintainer: | Kaspar Rufibach <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.6 |
Built: | 2024-12-12 03:09:46 UTC |
Source: | https://github.com/cran/OrdFacReg |
In biomedical studies, researchers are often interested in assessing the association between one or more ordinal explanatory variables and an outcome variable, at the same time adjusting for covariates of any type. The outcome variable may be continuous, binary, or represent censored survival times. In the absence of a precise knowledge of the response function, using monotonicity constraints on the ordinal variables improves efficiency in estimating parameters, especially when sample sizes are small. This package implements an active set algorithm that efficiently computes such estimators.
Package: | OrdFacReg |
Type: | Package |
Version: | 1.0.6 |
Date: | 2015-07-03 |
License: | GPL (>=2) |
LazyLoad: | yes |
Use this package to get estimates in least squares, logistic, or Cox-regression where coefficients corresponding to
dummy variables of ordered factors are estimated to be in non-decreasing order and at least 0.
The package offers an active set algorithm implemented in the functions ordFacReg
for least squares
and logistic regression and ordFacRegCox
for Cox-regression.
Kaspar Rufibach (maintainer)
[email protected]
http://www.kasparrufibach.ch
Rufibach, K. (2010). An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors. Comput. Statist. Data Anal., 54, 1442-1456.
Examples are given in the help files of the functions ordFacReg
and ordFacRegCox
.
Internal functions for ordered factor regression functions.
These functions are not intended to be called by users directly.
Abeta
Function in Rufibach (2010) that collects the indices of
the inequalities violated by
.
constraintMats
Function that computes the matrices (collects the basis vectors
given in Theorem 3.1 of Duembgen et al. (2007)) and
(collects the vectors
that
make up the cone
in Section 3.1 of Duembgen et al. (2007)).
coxDeriv
Computes gradient of (pseudo-)log-likelihood function in Cox-regression.
coxLoglik
Computes value of (pseudo-)log-likelihood function in Cox-regression.
coxSubspace
Computes maximizer on subspace, denoted by in Table 1
of Duembgen et al. (2007).
dummy
Generate a matrix of dummy variables corresponding to the levels of the inputed factor.
The dummy variable corresponding to the lowest level of the factor is omitted.
expandBeta
After computation of on subspace “blow up” this vector again
to original dimension.
indexDummy
Compute column numbers of the dummy variables of the ordered factor(s).
lmLSE
Compute value of least squares criterion and least squares estimate.
lmSS
Compute value of least squares criterion and its gradient.
logRegDeriv
Gradient of log-likelihood function in logistic regression.
logRegLoglik
Compute value of log-likelihood function in logistic regression.
logRegSubspace
Computes maximizer on subspace, denoted by in
Table 1 of Duembgen et al. (2007).
LSEsubspace
Computes maximizer on subspace, denoted by in
Table 1 of Duembgen et al. (2007).
maxStep
Compute maximal permissible steplength, denoted by in Table 1 in
Duembgen et al. (2007).
phi_jl
Function in Rufibach (2010) that maps the original indices
to
the inequality index
.
setminus
Remove elements in vector from vector
.
shrinkBeta
Collapse according to the active constraints specified by the set
.
Kaspar Rufibach (maintainer)
[email protected]
http://www.kasparrufibach.ch
Duembgen, L., Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at http://arxiv.org/abs/0707.4643.
Rufibach, K. (2010). An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors. Comput. Statist. Data Anal., 54, 1442-1456.
All these functions are used by the ordered factor computation functions ordFacReg
and ordFacRegCox
.
This function computes estimates in least squares or logistic regression where coefficients corresponding to dummy variables of ordered factors are estimated to be in non-decreasing order and at least 0. An active set algorithm as described in Duembgen et al. (2007) is used.
ordFacReg(D, Z, fact, ordfact, ordering = NA, type = c("LS", "logreg"), intercept = TRUE, display = 0, eps = 0)
ordFacReg(D, Z, fact, ordfact, ordering = NA, type = c("LS", "logreg"), intercept = TRUE, display = 0, eps = 0)
D |
Response vector, either in |
Z |
Matrix of predictors. Factors are coded with levels from 1 to |
fact |
Specify columns in |
ordfact |
Specify columns in |
ordering |
Vector of the same length as |
type |
Specify type of response variable. |
intercept |
If |
display |
If |
eps |
Quantity to which the criterion in the Basic Procedure 2 in Duembgen et al. (2007) is compared. |
For a detailed description of the problem and the algorithm we refer to Rufibach (2010).
L |
Value of the criterion function at the maximum. |
beta |
Computed regression coefficients. |
A |
Set |
design.matrix |
Design matrix that was generated. |
Kaspar Rufibach (maintainer)
[email protected]
http://www.kasparrufibach.ch
Duembgen, L., Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at http://arxiv.org/abs/0707.4643.
Rufibach, K. (2010). An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors. Comput. Statist. Data Anal., 54, 1442-1456.
ordFacRegCox
computes estimates for Cox-regression.
## ======================================================== ## To illustrate least squares estimation, we generate the same data ## that was used in Rufibach (2010), Table 1. ## ======================================================== ## -------------------------------------------------------- ## initialization ## -------------------------------------------------------- n <- 200 Z <- NULL intercept <- FALSE ## -------------------------------------------------------- ## quantitative variables ## -------------------------------------------------------- n.q <- 3 set.seed(14012009) if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, mean = 1, sd = 2))}} ## -------------------------------------------------------- ## unordered factors ## -------------------------------------------------------- un.levels <- 3 for (i in 1:length(un.levels)){Z <- cbind(Z, sample(rep(1:un.levels[i], each = ceiling(n / un.levels)))[1:n])} fact <- n.q + 1:length(un.levels) ## -------------------------------------------------------- ## ordered factors ## -------------------------------------------------------- levels <- 8 for (i in 1:length(un.levels)){Z <- cbind(Z, sample(rep(1:levels[i], each = ceiling(n / levels)))[1:n])} ordfact <- n.q + length(un.levels) + 1:length(levels) ## -------------------------------------------------------- ## generate data matrices ## -------------------------------------------------------- Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y ## -------------------------------------------------------- ## generate response ## -------------------------------------------------------- D <- apply(Y * matrix(c(rep(c(2, -3, 0), each = n), rep(c(1, 1), each = n), rep(c(0, 2, 2, 2, 2, 5, 5), each = n)), ncol = ncol(Y)), 1, sum) + rnorm(n, mean = 0, sd = 4) ## -------------------------------------------------------- ## compute estimates ## -------------------------------------------------------- res1 <- lmLSE(D, Y) res2 <- ordFacReg(D, Z, fact, ordfact, ordering = "i", type = "LS", intercept, display = 1, eps = 0) b1 <- res1$beta g1 <- lmSS(b1, D, Y)$dL b2 <- res2$beta g2 <- lmSS(b2, D, Y)$dL Ls <- c(lmSS(b1, D, Y)$L, lmSS(b2, D, Y)$L) names(Ls) <- c("LSE", "ordFact") disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), round(cbind(b2, g2, cumsum(g2)), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls ## ======================================================== ## Artificial data is used to illustrate logistic regression. ## ======================================================== ## -------------------------------------------------------- ## initialization ## -------------------------------------------------------- set.seed(1977) n <- 500 Z <- NULL intercept <- FALSE ## -------------------------------------------------------- ## quantitative variables ## -------------------------------------------------------- n.q <- 2 if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, rgamma(2, 2, 1)))}} ## -------------------------------------------------------- ## unordered factors ## -------------------------------------------------------- un.levels <- c(8, 2) for (i in 1:length(un.levels)){Z <- cbind(Z, sample(round(runif(n, 0, un.levels[i] - 1)) + 1))} fact <- n.q + 1:length(un.levels) ## -------------------------------------------------------- ## ordered factors ## -------------------------------------------------------- levels <- c(2, 4, 10) for (i in 1:length(levels)){Z <- cbind(Z, sample(round(runif(n, 0, levels[i] - 1)) + 1))} ordfact <- n.q + length(un.levels) + 1:length(levels) ## -------------------------------------------------------- ## generate response ## -------------------------------------------------------- D <- sample(c(rep(0, n / 2), rep(1, n/2))) ## -------------------------------------------------------- ## generate design matrix ## -------------------------------------------------------- Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y ## -------------------------------------------------------- ## compute estimates ## -------------------------------------------------------- res1 <- matrix(glm.fit(Y, D, family = binomial(link = logit))$coefficients, ncol = 1) res2 <- ordFacReg(D, Z, fact, ordfact, ordering = NA, type = "logreg", intercept = intercept, display = 1, eps = 0) b1 <- res1 g1 <- logRegDeriv(b1, D, Y)$dL b2 <- res2$beta g2 <- logRegDeriv(b2, D, Y)$dL Ls <- unlist(c(logRegLoglik(res1, D, Y), res2$L)) names(Ls) <- c("MLE", "ordFact") disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), round(cbind(b2, g2, cumsum(g2)), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls ## -------------------------------------------------------- ## compute estimates when the third ordered factor should ## have *decreasing* estimated coefficients ## -------------------------------------------------------- res3 <- ordFacReg(D, Z, fact, ordfact, ordering = c("i", "i", "d"), type = "logreg", intercept = intercept, display = 1, eps = 0) b3 <- res3$beta g3 <- logRegDeriv(b3, D, Y)$dL Ls <- unlist(c(logRegLoglik(res1, D, Y), res2$L, res3$L)) names(Ls) <- c("MLE", "ordFact ddd", "ordFact iid") disp <- cbind(1:length(b1), round(cbind(b1, b2, b3), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls
## ======================================================== ## To illustrate least squares estimation, we generate the same data ## that was used in Rufibach (2010), Table 1. ## ======================================================== ## -------------------------------------------------------- ## initialization ## -------------------------------------------------------- n <- 200 Z <- NULL intercept <- FALSE ## -------------------------------------------------------- ## quantitative variables ## -------------------------------------------------------- n.q <- 3 set.seed(14012009) if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, mean = 1, sd = 2))}} ## -------------------------------------------------------- ## unordered factors ## -------------------------------------------------------- un.levels <- 3 for (i in 1:length(un.levels)){Z <- cbind(Z, sample(rep(1:un.levels[i], each = ceiling(n / un.levels)))[1:n])} fact <- n.q + 1:length(un.levels) ## -------------------------------------------------------- ## ordered factors ## -------------------------------------------------------- levels <- 8 for (i in 1:length(un.levels)){Z <- cbind(Z, sample(rep(1:levels[i], each = ceiling(n / levels)))[1:n])} ordfact <- n.q + length(un.levels) + 1:length(levels) ## -------------------------------------------------------- ## generate data matrices ## -------------------------------------------------------- Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y ## -------------------------------------------------------- ## generate response ## -------------------------------------------------------- D <- apply(Y * matrix(c(rep(c(2, -3, 0), each = n), rep(c(1, 1), each = n), rep(c(0, 2, 2, 2, 2, 5, 5), each = n)), ncol = ncol(Y)), 1, sum) + rnorm(n, mean = 0, sd = 4) ## -------------------------------------------------------- ## compute estimates ## -------------------------------------------------------- res1 <- lmLSE(D, Y) res2 <- ordFacReg(D, Z, fact, ordfact, ordering = "i", type = "LS", intercept, display = 1, eps = 0) b1 <- res1$beta g1 <- lmSS(b1, D, Y)$dL b2 <- res2$beta g2 <- lmSS(b2, D, Y)$dL Ls <- c(lmSS(b1, D, Y)$L, lmSS(b2, D, Y)$L) names(Ls) <- c("LSE", "ordFact") disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), round(cbind(b2, g2, cumsum(g2)), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls ## ======================================================== ## Artificial data is used to illustrate logistic regression. ## ======================================================== ## -------------------------------------------------------- ## initialization ## -------------------------------------------------------- set.seed(1977) n <- 500 Z <- NULL intercept <- FALSE ## -------------------------------------------------------- ## quantitative variables ## -------------------------------------------------------- n.q <- 2 if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, rgamma(2, 2, 1)))}} ## -------------------------------------------------------- ## unordered factors ## -------------------------------------------------------- un.levels <- c(8, 2) for (i in 1:length(un.levels)){Z <- cbind(Z, sample(round(runif(n, 0, un.levels[i] - 1)) + 1))} fact <- n.q + 1:length(un.levels) ## -------------------------------------------------------- ## ordered factors ## -------------------------------------------------------- levels <- c(2, 4, 10) for (i in 1:length(levels)){Z <- cbind(Z, sample(round(runif(n, 0, levels[i] - 1)) + 1))} ordfact <- n.q + length(un.levels) + 1:length(levels) ## -------------------------------------------------------- ## generate response ## -------------------------------------------------------- D <- sample(c(rep(0, n / 2), rep(1, n/2))) ## -------------------------------------------------------- ## generate design matrix ## -------------------------------------------------------- Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y ## -------------------------------------------------------- ## compute estimates ## -------------------------------------------------------- res1 <- matrix(glm.fit(Y, D, family = binomial(link = logit))$coefficients, ncol = 1) res2 <- ordFacReg(D, Z, fact, ordfact, ordering = NA, type = "logreg", intercept = intercept, display = 1, eps = 0) b1 <- res1 g1 <- logRegDeriv(b1, D, Y)$dL b2 <- res2$beta g2 <- logRegDeriv(b2, D, Y)$dL Ls <- unlist(c(logRegLoglik(res1, D, Y), res2$L)) names(Ls) <- c("MLE", "ordFact") disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), round(cbind(b2, g2, cumsum(g2)), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls ## -------------------------------------------------------- ## compute estimates when the third ordered factor should ## have *decreasing* estimated coefficients ## -------------------------------------------------------- res3 <- ordFacReg(D, Z, fact, ordfact, ordering = c("i", "i", "d"), type = "logreg", intercept = intercept, display = 1, eps = 0) b3 <- res3$beta g3 <- logRegDeriv(b3, D, Y)$dL Ls <- unlist(c(logRegLoglik(res1, D, Y), res2$L, res3$L)) names(Ls) <- c("MLE", "ordFact ddd", "ordFact iid") disp <- cbind(1:length(b1), round(cbind(b1, b2, b3), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls
This function computes estimates in Cox-regression where coefficients corresponding to dummy variables of ordered factors are estimated to be in non-decreasing order and at least 0. An active set algorithm as described in Duembgen et al. (2007) is used.
ordFacRegCox(ttf, tf, Z, fact, ordfact, ordering = NA, intercept = TRUE, display = 0, eps = 0)
ordFacRegCox(ttf, tf, Z, fact, ordfact, ordering = NA, intercept = TRUE, display = 0, eps = 0)
ttf |
Survival times. |
tf |
Censoring indicator (1 = event, 0 = censored). |
Z |
Matrix of predictors. Factors are coded with levels from 1 to |
fact |
Specify columns in |
ordfact |
Specify columns in |
ordering |
Vector of the same length as |
intercept |
If |
display |
If |
eps |
Quantity to which the criterion in the Basic Procedure 2 in Duembgen et al. (2007) is compared. |
For a detailed description of the problem and the algorithm we refer to Rufibach (2010).
L |
Value of the criterion function at the maximum. |
beta |
Computed regression coefficients. |
A |
Set |
design.matrix |
Design matrix that was generated. |
Kaspar Rufibach (maintainer)
[email protected]
http://www.kasparrufibach.ch
Duembgen, L., Huesler, A. and Rufibach, K. (2010). Active set and EM algorithms for log-concave densities based on complete and censored data. Technical report 61, IMSV, Univ. of Bern, available at http://arxiv.org/abs/0707.4643.
Rufibach, K. (2010). An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors. Comput. Statist. Data Anal., 54, 1442-1456.
ordFacReg
computes estimates for least squares and logistic regression.
## ======================================================== ## Artificial data is used to illustrate Cox-regression. ## ======================================================== ## -------------------------------------------------------- ## initialization ## -------------------------------------------------------- set.seed(1977) n <- 500 Z <- NULL intercept <- FALSE ## -------------------------------------------------------- ## quantitative variables ## -------------------------------------------------------- n.q <- 2 if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, rgamma(2, 2, 1)))}} ## -------------------------------------------------------- ## unordered factors ## -------------------------------------------------------- un.levels <- c(8, 2)[2] for (i in 1:length(un.levels)){Z <- cbind(Z, sample(round(runif(n, 0, un.levels[i] - 1)) + 1))} fact <- n.q + 1:length(un.levels) ## -------------------------------------------------------- ## ordered factors ## -------------------------------------------------------- levels <- c(4, 5, 10) for (i in 1:length(levels)){Z <- cbind(Z, sample(round(runif(n, 0, levels[i] - 1)) + 1))} ordfact <- n.q + length(un.levels) + 1:length(levels) ## -------------------------------------------------------- ## generate response ## -------------------------------------------------------- ttf <- rexp(n) tf <- round(runif(n)) ## -------------------------------------------------------- ## generate design matrix ## -------------------------------------------------------- Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y ## -------------------------------------------------------- ## compute estimates ## -------------------------------------------------------- res1 <- eha::coxreg.fit(Y, Surv(ttf, tf), max.survs = length(tf), strats = rep(1, length(tf)))$coefficients res2 <- ordFacRegCox(ttf, tf, Z, fact, ordfact, ordering = NA, intercept = intercept, display = 1, eps = 0) b1 <- matrix(res1, ncol = 1) g1 <- coxDeriv(b1, ttf, tf, Y)$dL b2 <- res2$beta g2 <- coxDeriv(b2, ttf, tf, Y)$dL Ls <- c(coxLoglik(b1, ttf, tf, Y)$L, res2$L) names(Ls) <- c("MLE", "ordFact") disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), round(cbind(b2, g2, cumsum(g2)), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls
## ======================================================== ## Artificial data is used to illustrate Cox-regression. ## ======================================================== ## -------------------------------------------------------- ## initialization ## -------------------------------------------------------- set.seed(1977) n <- 500 Z <- NULL intercept <- FALSE ## -------------------------------------------------------- ## quantitative variables ## -------------------------------------------------------- n.q <- 2 if (n.q > 0){for (i in 1:n.q){Z <- cbind(Z, rnorm(n, rgamma(2, 2, 1)))}} ## -------------------------------------------------------- ## unordered factors ## -------------------------------------------------------- un.levels <- c(8, 2)[2] for (i in 1:length(un.levels)){Z <- cbind(Z, sample(round(runif(n, 0, un.levels[i] - 1)) + 1))} fact <- n.q + 1:length(un.levels) ## -------------------------------------------------------- ## ordered factors ## -------------------------------------------------------- levels <- c(4, 5, 10) for (i in 1:length(levels)){Z <- cbind(Z, sample(round(runif(n, 0, levels[i] - 1)) + 1))} ordfact <- n.q + length(un.levels) + 1:length(levels) ## -------------------------------------------------------- ## generate response ## -------------------------------------------------------- ttf <- rexp(n) tf <- round(runif(n)) ## -------------------------------------------------------- ## generate design matrix ## -------------------------------------------------------- Y <- prepareData(Z, fact, ordfact, ordering = NA, intercept)$Y ## -------------------------------------------------------- ## compute estimates ## -------------------------------------------------------- res1 <- eha::coxreg.fit(Y, Surv(ttf, tf), max.survs = length(tf), strats = rep(1, length(tf)))$coefficients res2 <- ordFacRegCox(ttf, tf, Z, fact, ordfact, ordering = NA, intercept = intercept, display = 1, eps = 0) b1 <- matrix(res1, ncol = 1) g1 <- coxDeriv(b1, ttf, tf, Y)$dL b2 <- res2$beta g2 <- coxDeriv(b2, ttf, tf, Y)$dL Ls <- c(coxLoglik(b1, ttf, tf, Y)$L, res2$L) names(Ls) <- c("MLE", "ordFact") disp <- cbind(1:length(b1), round(cbind(b1, g1, cumsum(g1)), 4), round(cbind(b2, g2, cumsum(g2)), 4)) ## -------------------------------------------------------- ## display results ## -------------------------------------------------------- disp Ls
This function takes a matrix consisting of quantitative variables, unordered, and ordered factors
and generates the corresponding matrix of dummy variables, and some further quantities that are used
by the active set algorithm in ordFacReg
and ordFacRegCox
.
prepareData(Z, fact = NA, ordfact, ordering = NA, intercept = TRUE)
prepareData(Z, fact = NA, ordfact, ordering = NA, intercept = TRUE)
Z |
Matrix with quantitative variables in the first |
fact |
Specify columns in |
ordfact |
Specify columns in |
ordering |
Vector of the same length as |
intercept |
If |
Quantities that are used by the active set algorithm. The names of the objects roughly correspond to those in Rufibach (2010).
Kaspar Rufibach (maintainer)
[email protected]
http://www.kasparrufibach.ch
Rufibach, K. (2010). An Active Set Algorithm to Estimate Parameters in Generalized Linear Models with Ordered Predictors. Comput. Statist. Data Anal., 54, 1442-1456.
This function is used by the ordered factor computation functions ordFacReg
and ordFacRegCox
.