Last updated: 2016-11-18

Code version: a90593a97f44cea7be2258bc57a94e5ec12670f8


In the PPS cluster

I have done something like the following to run an .R file multiple times:

scripts in PPS

R function

args <- commandArgs(TRUE)   # arguments passed on the Rscript command line
g_file <- args[1]           # path of the output file for this run

library("MASS")
N <- 100
P <- 200
# all the code .....
# ......
# and then I got the result I would like to store
result <- cbind(beta, vbb, vbp, lvbb, lvbp)
write.table(result, g_file)

The args vector here comes from the shell script shown below. In the command Rscript --verbose ${IPATH}/diag10.R ${OUTPATH}/output${i}.txt we pass the output path as an argument to the R script; commandArgs(TRUE) picks it up, assigns it to g_file, and the result is written to that .txt file.
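For example, the array task with i equal to 3 runs

Rscript --verbose /mnt/lustre/home/weidong/lvbsim/diag/diag10.R /mnt/lustre/home/weidong/lvbsim/diag/diag10out/output3.txt

so inside diag10.R the value of commandArgs(TRUE)[1] is "/mnt/lustre/home/weidong/lvbsim/diag/diag10out/output3.txt", and write.table() saves the result to that file.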

shell script

#!/bin/sh
#$ -t 1-50
#$ -o /mnt/lustre/home/weidong/lvbsim/diag/diag10.out
#$ -e /mnt/lustre/home/weidong/lvbsim/diag/diag10.error

INDIR=/mnt/lustre/home/weidong/lvbsim/diag
IPATH=/mnt/lustre/home/weidong/lvbsim/diag
OUTPATH=/mnt/lustre/home/weidong/lvbsim/diag/diag10out

cd ${INDIR}
for ((i=1; i<=50; i++))
do
  if ((i==${SGE_TASK_ID})); then
    Rscript --verbose ${IPATH}/diag10.R ${OUTPATH}/output${i}.txt
  fi
done

In this way, I run my diag10.R file 50 times and store the results in OUTPATH=/mnt/lustre/home/weidong/lvbsim/diag/diag10out with names output1.txt … output50.txt.

RCC case

I want to do a similar thing using an .sbatch script.

I can’t access the RCC cluster right now, so I can’t show my actual script.

Here is the pseudocode.

R function

# generate the data
library("R.matlab")   # provides writeMat() and readMat()

dat = data_gen(...)   # generate the data and the held-out (missing) index once
Y = dat$Y
miss_index = dat$missindex

dir.create("NSFAout", showWarnings = FALSE)
writeMat("./NSFAout/missindex.mat", missindex = miss_index)   # writeMat() takes the file path first
g_pma = pma(Y, miss_index)
g_flash = flash(Y, miss_index)
# and other methods...

# call this MATLAB function
system("matlab -nosplash -nodesktop -r \"run('...'); exit;\"")   # just call the command line to apply the MATLAB function
# In this MATLAB function, we store the result into "./NSFAout/nsfaresult.mat"
g_nsfa = readMat("./NSFAout/nsfaresult.mat")

result = c(g_pma$rmse, g_flash$rmse, g_nsfa$rmse)

sbatch script

#!/bin/bash

#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out
#SBATCH --error=arrayJob_%A_%a.err
#SBATCH --array=1-16
#SBATCH --time=01:00:00
#SBATCH --partition=sandyb
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4000


######################
# Begin work section #
######################

# Create and move into a per-task working directory (test1 ... test16)
mkdir -p test${SLURM_ARRAY_TASK_ID}
cd test${SLURM_ARRAY_TASK_ID}
Rscript --verbose Rcc_wrapper.R

questions

  • I tried passing args like this in the .sbatch and .R scripts and it doesn’t work. Is there any way to pass an argument from the .sbatch file into the R file, just like the PPS example does? (See the sketch after this list.)

  • I tried INDIR=/mnt/lustre/home/weidong/lvbsim/diag and cd ${INDIR} in the .sbatch script and it doesn’t work. Is there any way to do that?

  • Could you please show an example of running an .R file multiple times on RCC and pointing to a specific output directory, with output files named output1.txt, … output10.txt, …? Within this process, I also need to create folders called test1 … test10 to store my missindex.mat file each time.
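Below is a minimal sketch of how the PPS pattern might carry over to an sbatch array job. The paths under $HOME and the assumption that Rcc_wrapper.R takes the output file as its first argument are placeholders, not the real RCC setup:

#!/bin/bash
#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out
#SBATCH --error=arrayJob_%A_%a.err
#SBATCH --array=1-10
#SBATCH --time=01:00:00
#SBATCH --partition=sandyb
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4000

# placeholder paths; replace with the real RCC project directories
INDIR=$HOME/lvbsim/diag
OUTPATH=$HOME/lvbsim/diag/diag10out
mkdir -p ${OUTPATH}   # make sure the output directory exists

cd ${INDIR}
# one working directory per array task, so each task keeps its own missindex.mat
mkdir -p test${SLURM_ARRAY_TASK_ID}
cd test${SLURM_ARRAY_TASK_ID}
# pass the per-task output file to R, exactly as in the PPS example
Rscript --verbose ${INDIR}/Rcc_wrapper.R ${OUTPATH}/output${SLURM_ARRAY_TASK_ID}.txt

On the R side the first lines of Rcc_wrapper.R would again be

args <- commandArgs(TRUE)
g_file <- args[1]
# the task ID is also available directly from the environment if needed
task_id <- Sys.getenv("SLURM_ARRAY_TASK_ID")

The --array range plays the role of the for loop plus the SGE_TASK_ID test in the PPS script: Slurm launches one task per index and sets SLURM_ARRAY_TASK_ID in each task’s environment, so no explicit loop is needed.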

more details

I am running 4 missing-data imputation methods and comparing them. So for each data set I have, I would like to generate some indices and throw away the data points corresponding to those indices. By applying the 4 methods we can predict the missing values and compare them with the true values we threw away. In this way we can calculate the RMSE for this data set and this missing index (a sketch of that calculation follows). I would like to do this many times.
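A minimal sketch of the held-out RMSE calculation described above, assuming a data matrix Y, a vector miss_index of held-out positions, and a matrix Yhat of values imputed by one of the methods (Yhat is a placeholder name, not from the code above):

# compare imputed values with the held-out truth at the missing positions
truth   <- Y[miss_index]
imputed <- Yhat[miss_index]
rmse    <- sqrt(mean((truth - imputed)^2))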

Let’s say I do this procedure 100 times on each data set. Each time, I need to store the missing index somewhere as a .mat file to run the MATLAB function, and I also need to store the output of the MATLAB function to compare with the output of the R functions. I would like to store all the output into a .txt file somewhere each time, so that I can aggregate the results at the end (a sketch of that aggregation follows).
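A minimal sketch of the final aggregation step, assuming each run has written one result file output1.txt, output2.txt, … into diag10out with write.table() as in the PPS example (the directory name and the all_results.txt file name are assumptions):

out_dir <- "diag10out"
files   <- list.files(out_dir, pattern = "^output[0-9]+\\.txt$", full.names = TRUE)
# read every per-run result and stack them into one table
all_results <- do.call(rbind, lapply(files, read.table, header = TRUE))
write.table(all_results, file.path(out_dir, "all_results.txt"))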

I have kind of made this procedure work, but in a very clumsy way. I think the solution in the PPS example is neater. Is there any way to do something on RCC similar to what the PPS example does?

Thanks a lot!

Session information

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (unknown)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.14

loaded via a namespace (and not attached):
 [1] magrittr_1.5    formatR_1.4     tools_3.2.4     htmltools_0.3.5
 [5] yaml_2.1.13     Rcpp_0.12.7     stringi_1.1.2   rmarkdown_1.0  
 [9] stringr_1.1.0   digest_0.6.10   evaluate_0.10