Last updated: 2017-06-13
Code version: 46206a9
In the old code of `flash`, we found that the greedy algorithm doesn't always increase the lower bound of the log-likelihood, so we turned to understanding the likelihood of the rank-one and rank-zero models. Here I rerun all the representative code and also provide the original document describing this issue. The old version of the document is in the ash-sfa git repo.
In that document we made several observations (please skim this section; it is just a record of the old code, and all the cases are rerun in the next section):

- `ash` might cause this problem: we found one case where the penalty term does have an effect in the rank-one model. After changing `method = "fdr"` to `method = "shrink"` in `ash`, the objective function increases monotonically in the algorithm. But with `method = "shrink"` in the `greedy` and `backfitting` functions, the log-likelihood still doesn't increase every time a new factor is added (shown in the old code).
- `method = "shrink"` can guarantee that the rank-one and backfitting algorithms (if no factor is added or removed) increase monotonically. The only problem is adding or removing a factor, which makes the log-likelihood non-monotonic. So we turn to the simplest case of adding or removing a factor: the rank-one vs rank-zero model.
- One thing we couldn't understand well is that the prediction results don't seem to correspond to the objective values (shown in the old code).

We rerun the analysis from that document below; the observations above only summarize the old documents.
The following is the main part of this document.
I think the KL(q || p) is not continuous as q, p -> delta_0 [point mass at 0], e.g. q_n = N(0, 1/n) and p_n = N(0, 2/n). I think lim_{n -> infinity} KL(q_n, p_n) may not be 0? If that is right then the lower bound F(g; qhat(g)) is not necessarily continuous in g (even though the likelihood is continuous in g, I think)… maybe this explains the gap; and may also tell us what to do about it. —— Matthew's comments
\[\lim_{n \rightarrow \infty} KL(q_n, p_n) = \frac{1}{2}\left(\log 2 - \frac{1}{2}\right) \neq 0\]
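This limit can be checked numerically. A minimal sketch, using the closed-form KL divergence between two zero-mean normals (the helper `kl_norm` is ours, not part of flashr):

```r
# KL(N(0, s1sq) || N(0, s2sq)) for zero-mean normals:
# 0.5 * (log(s2sq / s1sq) + s1sq / s2sq - 1)
kl_norm = function(s1sq, s2sq) 0.5 * (log(s2sq / s1sq) + s1sq / s2sq - 1)

# q_n = N(0, 1/n), p_n = N(0, 2/n): the divergence is constant in n
n = 10^(1:6)
kl_vals = kl_norm(1 / n, 2 / n)
print(kl_vals)   # all equal to 0.5 * (log(2) - 0.5), about 0.0966
```

So even though both q_n and p_n converge to a point mass at 0, KL(q_n, p_n) stays bounded away from 0, consistent with the limit above.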
Section 2.2 shows rank-one FLASH with 4 different methods: `method = "fdr"`, `method = "shrink"`, `method = "spike and slab"`, and `method = "uniform with point mass"`. We can see that the lower bound of the log-likelihood increases when we apply the methods without a penalty term. The objective value does not always increase in the `method = "fdr"` case.
In section 2.2, there is an interesting observation that the RMSE results don't always agree with the value of the objective function: the method with the larger objective value doesn't necessarily have the smaller RMSE (i.e., better prediction).
In section 2.3 we apply `method = "shrink"` to the same data as in section 2.1.2, where the log-likelihood is not monotonically increasing, and we find that the log-likelihood still decreases when we add the fifth factor. So this is probably not just because of the penalty term.
Section 2.4 shows that `method = "fdr"` performs very well in the rank-zero case.
In sections 2.1.2 and 2.3, we can see that the rank-4 model has a larger lower bound, but the rank-5 model performs better in RMSE.
We will start from the problem in the greedy algorithm:
library(flashr)
# simulate Y = L F' + E with K sparse rank-one factors
sim_K = function(K, N, P, SF, SL, signal, noise){
  E = matrix(rnorm(N * P, 0, noise), nrow = N)   # noise matrix
  Y = E
  L_true = array(0, dim = c(N, K))
  F_true = array(0, dim = c(P, K))
  for(k in 1:K){
    lstart = rnorm(N, 0, signal)
    fstart = rnorm(P, 0, signal)
    # zero out a proportion SL of the loadings and SF of the factors
    index = sample(seq_len(N), (N * SL))
    lstart[index] = 0
    index = sample(seq_len(P), (P * SF))
    fstart[index] = 0
    L_true[, k] = lstart
    F_true[, k] = fstart
    Y = Y + lstart %*% t(fstart)
  }
  return(list(Y = Y, L_true = L_true, F_true = F_true, Error = E))
}
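As a sanity check on this simulation scheme, here is a minimal self-contained sketch (a K = 1 version of the same generative model, with hypothetical parameter values) confirming that Y decomposes exactly into a sparse rank-one signal plus noise:

```r
set.seed(1)
N = 50; P = 80; SL = 0.5; SF = 0.5
E = matrix(rnorm(N * P), nrow = N)                  # noise
l = rnorm(N); l[sample(seq_len(N), N * SL)] = 0     # sparse loadings
f = rnorm(P); f[sample(seq_len(P), P * SF)] = 0     # sparse factors
Y = l %*% t(f) + E
# removing the noise recovers the rank-one signal (up to numerical tolerance)
stopifnot(isTRUE(all.equal(Y - E, l %*% t(f))))
```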
The "fdr" method looks fine in many cases.
Greedy_showcase = function(K = 5, N = 100, P = 200, SF = 0.8, SL = 0.8, signal = 1, noise = 1){
  set.seed(99)
  data = sim_K(K, N, P, SF, SL, signal, noise)
  Y = data$Y
  E = data$Error
  g1 = greedy(Y, K = 10)
  # track the objective value as factors are added
  plot(g1$clik_vec + cumsum(g1$priorpost_vec), main = "fdr method")
  # show the prediction: relative RMSE of the recovered signal (signal = Y - E)
  rmse_5 = sqrt(mean((Y - g1$l %*% t(g1$f) - E)^2)) / sqrt(mean((Y - E)^2))
  rmse_4 = sqrt(mean((Y - g1$l[, 1:4] %*% t(g1$f[, 1:4]) - E)^2)) / sqrt(mean((Y - E)^2))
  return(list(obj_track = g1$clik_vec + cumsum(g1$priorpost_vec), RMSE = c(rmse_4, rmse_5)))
}
Greedy_showcase(K=5,N = 100, P = 200, SF = 0.8, SL = 0.8, signal = 1,noise = 1)
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
$obj_track
[1] -29754.56 -29569.95 -29393.01 -29301.72 -29223.50
$RMSE
[1] 0.4631371 0.3659628
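Note that the RMSE reported here is relative: since Y - E is the true signal, the ratio measures how well the fitted factors recover the signal (near 0 means perfect recovery; 1 means no better than the all-zero, rank-zero fit). A minimal sketch of the metric with toy values (`rel_rmse` is our notation, not part of flashr):

```r
# relative RMSE: ||signal - fitted|| / ||signal||, where signal = Y - E
rel_rmse = function(Y, E, fitted) {
  sqrt(mean((Y - fitted - E)^2)) / sqrt(mean((Y - E)^2))
}
signal = matrix(1:6, nrow = 2)        # toy true signal
E = matrix(0.1, nrow = 2, ncol = 3)   # toy noise
Y = signal + E
rel_rmse(Y, E, signal)                # ~0: perfect recovery
rel_rmse(Y, E, matrix(0, 2, 3))       # 1: rank-zero (all-zero) fit
```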
But in some cases, the lower bound decreases when we add one more factor:
Greedy_showcase(K=5,N = 100, P=200, SF = 0.5, SL = 0.5, signal = 1,noise = 3)
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
$obj_track
[1] -51550.21 -51490.24 -51464.19 -51429.13 -51434.28
$RMSE
[1] 0.7213337 0.6820024
We guess that this is because of the penalty term in the `ash` method we use.
Here we use four different methods: `fdr`, `shrink`, `uniform with point mass` (no penalty), and `spike and slab`.
rank1test = function(K, N, P, SF, SL, signal, noise){
  set.seed(99)
  par(mfrow = c(3, 2))
  par(cex = 0.6)
  par(mar = c(3, 3, 0.8, 0.8), oma = c(1, 1, 1, 1))
  data = sim_K(K, N, P, SF, SL, signal, noise)
  Y = data$Y
  E = data$Error
  # "l" is for lower bound
  g3_1 = flash(Y, objtype = "l")
  plot(g3_1$obj_val_track, main = "fdr method", type = "l")
  g3_2 = flash(Y, objtype = "l", ash_para = list(prior = "uniform", pointmass = TRUE, method = NULL))
  plot(g3_2$obj_val_track, main = "uniform + pointmass method", type = "l")
  g3_3 = flash(Y, objtype = "l", ash_para = list(method = "shrink"))
  plot(g3_3$obj_val_track, main = "shrink method", type = "l")
  g3_4 = flash(Y, objtype = "l", ash_para = list(prior = "uniform", pointmass = TRUE, method = NULL, mixsd = c(1)))
  plot(g3_4$obj_val_track, main = "spike and slab method", type = "l")
  obj = c(g3_1$obj_val, g3_2$obj_val, g3_3$obj_val, g3_4$obj_val)
  RMSE = c(sqrt(mean((Y - g3_1$l %*% t(g3_1$f) - E)^2)) / sqrt(mean((Y - E)^2)),
           sqrt(mean((Y - g3_2$l %*% t(g3_2$f) - E)^2)) / sqrt(mean((Y - E)^2)),
           sqrt(mean((Y - g3_3$l %*% t(g3_3$f) - E)^2)) / sqrt(mean((Y - E)^2)),
           sqrt(mean((Y - g3_4$l %*% t(g3_4$f) - E)^2)) / sqrt(mean((Y - E)^2)))
  plot(RMSE, col = c("magenta", "cyan", "yellow", "green"), pch = 16, cex = 2, main = "RMSE")
  legend("top", cex = 1,
         legend = c("fdr", "Uniform", "shrink", "SpikeSlab"),
         col = c("magenta", "cyan", "yellow", "green"),
         pch = 16)
  plot(obj, col = c("magenta", "cyan", "yellow", "green"), pch = 16, cex = 2, main = "objective value")
  legend("top", cex = 1,
         legend = c("fdr", "Uniform", "shrink", "SpikeSlab"),
         col = c("magenta", "cyan", "yellow", "green"),
         pch = 16)
}
rank1test(1,100,200,0.5,0.5,1,1)
We can see that the RMSE doesn't agree with the value of the objective function.
rank1test(1,100,200,0.8,0.8,1,2)
We can see that the objective function is not increasing with `method = "fdr"`.
rank1test(1,20,30,0.5,0.6,1,1)
rank1test(1,10,200,0.5,0.6,1,1)
rank1test(1,10,200,0.5,0.5,1,3)
This case shows that `shrink` and `uniform + point mass` are not the same.
rank1test(1,10,200,0.5,0.5,1,2.5)
In this case, the `fdr` method tends to give a rank-zero estimate, which has the highest objective value, but the rank-one model with `spike and slab` performs better in prediction.
rank1test(1,100,300,0.9,0.9,1,1)
rank1test(1,100,300,0.95,0.99,1,0.5)
Greedy_showcase_1 = function(K = 5, N = 100, P = 200, SF = 0.8, SL = 0.8, signal = 1, noise = 1, flash_para = list(ash_para = list(method = "shrink"))){
  set.seed(99)
  data = sim_K(K, N, P, SF, SL, signal, noise)
  Y = data$Y
  E = data$Error
  g1 = greedy(Y, K = 10, flash_para)
  plot(g1$clik_vec + cumsum(g1$priorpost_vec), main = "shrink method")
  # show the prediction
  rmse_5 = sqrt(mean((Y - g1$l %*% t(g1$f) - E)^2)) / sqrt(mean((Y - E)^2))
  rmse_4 = sqrt(mean((Y - g1$l[, 1:4] %*% t(g1$f[, 1:4]) - E)^2)) / sqrt(mean((Y - E)^2))
  return(list(obj_track = g1$clik_vec + cumsum(g1$priorpost_vec), RMSE = c(rmse_4, rmse_5)))
}
Greedy_showcase_1(K=5,N = 100, P=200, SF = 0.5, SL = 0.5, signal = 1,noise = 3,flash_para = list(ash_para = list(method = "shrink")))
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
$obj_track
[1] -51543.11 -51477.99 -51445.58 -51408.87 -51418.96
$RMSE
[1] 0.6844070 0.6806335
By using `method = "shrink"`, we can see that the objective function gets larger, but the lower bound is still not monotonically increasing. So it's not because of the penalty we use in `ash`.
rank1test_0 = function(K, N, P, SF, SL, signal, noise){
  set.seed(99)
  par(mfrow = c(3, 2))
  par(cex = 0.6)
  par(mar = c(3, 3, 0.8, 0.8), oma = c(1, 1, 1, 1))
  data = sim_K(K, N, P, SF, SL, signal, noise)
  Y = data$Y
  E = data$Error
  # "l" is for lower bound
  g3_1 = flash(Y, objtype = "l")
  plot(g3_1$obj_val_track, main = "fdr method", type = "l")
  g3_2 = flash(Y, objtype = "l", ash_para = list(prior = "uniform", pointmass = TRUE, method = NULL))
  plot(g3_2$obj_val_track, main = "uniform + pointmass method", type = "l")
  g3_3 = flash(Y, objtype = "l", ash_para = list(method = "shrink"))
  plot(g3_3$obj_val_track, main = "shrink method", type = "l")
  g3_4 = flash(Y, objtype = "l", ash_para = list(prior = "uniform", pointmass = TRUE, method = NULL, mixsd = c(1)))
  plot(g3_4$obj_val_track, main = "spike and slab method", type = "l")
  obj = c(g3_1$obj_val, g3_2$obj_val, g3_3$obj_val, g3_4$obj_val)
  # absolute (not relative) RMSE: with SF = 1 the true signal is zero,
  # so the relative version would divide by zero
  RMSE = c(sqrt(mean((Y - g3_1$l %*% t(g3_1$f) - E)^2)),
           sqrt(mean((Y - g3_2$l %*% t(g3_2$f) - E)^2)),
           sqrt(mean((Y - g3_3$l %*% t(g3_3$f) - E)^2)),
           sqrt(mean((Y - g3_4$l %*% t(g3_4$f) - E)^2)))
  plot(RMSE, col = c("magenta", "cyan", "yellow", "green"), pch = 16, cex = 2, main = "RMSE")
  legend("top", cex = 1,
         legend = c("fdr", "Uniform", "shrink", "SpikeSlab"),
         col = c("magenta", "cyan", "yellow", "green"),
         pch = 16)
  plot(obj, col = c("magenta", "cyan", "yellow", "green"), pch = 16, cex = 2, main = "objective value")
  legend("top", cex = 1,
         legend = c("fdr", "Uniform", "shrink", "SpikeSlab"),
         col = c("magenta", "cyan", "yellow", "green"),
         pch = 16)
}
rank1test_0(1,20,60,1,0.6,1,1)
rank1test_0(1,100,200,1,0.6,1,2)
rank1test_0(1,60,100,1,0.6,1,0.5)
## Session information
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.4 (unknown)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] flashr_0.1.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 knitr_1.15.1 magrittr_1.5
[4] REBayes_0.73 MASS_7.3-45 munsell_0.4.3
[7] doParallel_1.0.10 pscl_1.4.9 colorspace_1.3-2
[10] SQUAREM_2016.8-2 lattice_0.20-34 foreach_1.4.3
[13] plyr_1.8.4 ashr_2.1-10 stringr_1.2.0
[16] tools_3.3.0 parallel_3.3.0 grid_3.3.0
[19] gtable_0.2.0 irlba_2.1.2 git2r_0.18.0
[22] htmltools_0.3.5 iterators_1.0.8 assertthat_0.2.0
[25] lazyeval_0.2.0 yaml_2.1.14 rprojroot_1.2
[28] digest_0.6.12 tibble_1.2 Matrix_1.2-8
[31] ggplot2_2.2.1 codetools_0.2-15 evaluate_0.10
[34] rmarkdown_1.3 stringi_1.1.2 Rmosek_7.1.2
[37] scales_0.4.1 backports_1.0.5 truncnorm_1.0-7
This R Markdown site was created with workflowr