POLSF issue with large-scale runs
<p>On our cluster, when I submit jobs requesting 40 or more nodes (640 or more cores), the $LSB_HOSTS variable is empty and so the job stops. I use this variable to generate a nodelist file, which I pass to the mpirun command line as follows:</p>
<pre><code>#BSUB -q cpu
#BSUB -J gromacs
#BSUB -o job.out
#BSUB -e job.err
#BSUB -n 640
#####################################################
#####################################################

INPUT=test184000atoms_verlet.tpr

echo ""
echo "----------------------- INITIALIZATIONS -----------------------------"
echo ""

source /lustre/utility/intel/composer_xe_2013.3.163/bin/compilervars.sh intel64
source /lustre/utility/intel/mkl/bin/intel64/mklvars_intel64.sh
source /lustre/utility/intel/impi/4.1.1.036/bin64/mpivars.sh

MPIRUN=/path/to/intel/impi/4.1.1.036/intel64/bin/mpirun
EXE=mdrun_mpi

if test ! -x `which $EXE` ; then
    echo
    echo "ERROR: `which $EXE` not existent or not executable"
    echo "Aborting"
    exit 1
fi

CURDIR=$PWD
cd $CURDIR

rm -f nodelist &gt;&amp; /dev/null
touch nodelist
for host in `echo $LSB_HOSTS`
do
    echo $host &gt;&gt; nodelist
    sleep 2
done

NP=`cat nodelist |wc -l`
NN=`cat nodelist |sort |uniq|tee nodes |wc -l`

echo
echo "Executable : `which $EXE`"
echo "Working directory is $CURDIR"
echo "Running on host `hostname`"
echo "Directory is `pwd`"
echo "This job runs on $NN nodes"
echo "This job has allocated $NP core(s)"
echo
ulimit -aH
echo
ls -al

echo ""
echo "----------------------- RUN -----------------------------"
echo ""
date '+RUN STARTED ON %m/%d/%y AT %H:%M:%S'
$MPIRUN -np $NN -machinefile nodes $EXE -v -deffnm $INPUT &gt;&amp; $EXE.log
date '+RUN ENDED ON %m/%d/%y AT %H:%M:%S'
echo ""
echo "----------------------- DONE ----------------------------"
echo ""
ls -al
</code></pre>
<p>Any hints here?</p>
<p>Can you see something wrong with this script?</p>
<p>Thanks,</p>
<p>Éric.</p>
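One possible direction, offered as an assumption rather than a confirmed diagnosis: LSF also exports LSB_MCPU_HOSTS, which lists each host once followed by its slot count ("hostA 16 hostB 16 ..."), so it stays much shorter than LSB_HOSTS for large jobs. If LSB_HOSTS is empty because the host list grew too long, which should be checked against the documentation for the installed LSF version, the nodelist could be rebuilt from the compact form instead. The sketch below assumes bash; `expand_mcpu_hosts` is a hypothetical helper name, not part of LSF.

```shell
# Expand an LSB_MCPU_HOSTS-style string ("hostA 16 hostB 16 ...") into
# one host name per slot, the same layout the loop over $LSB_HOSTS produces.
# expand_mcpu_hosts is a hypothetical helper, not an LSF command.
expand_mcpu_hosts() {
    local host slots i
    # Intentionally unquoted: split the string into host/count words.
    set -- $1
    while [ "$#" -ge 2 ]; do
        host=$1
        slots=$2
        shift 2
        i=0
        # Print the host name once per allocated slot.
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done
}

# Possible use inside the job script (assumes LSF sets LSB_MCPU_HOSTS):
#   expand_mcpu_hosts "$LSB_MCPU_HOSTS" > nodelist
```

With the nodelist written this way, the existing `NP` and `NN` lines in the script would work unchanged, since they only count lines and unique host names.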