Start your first job

This part is much easier: in principle, all you need to do is run

condor_submit analyze.sub
Submitting job(s)....................
20 job(s) submitted to cluster 210254.

Condor will most likely accept your jobs, but once it tries to run them it will almost certainly hit a problem and place them in the “hold” state. You can check this by running condor_q:

condor_q

-- Schedd: condor1.atlas.local : <10.20.30.16:9618?... @ 05/25/20 05:53:58
OWNER   BATCH_NAME    SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
carsten ID: 210254   5/25 05:49      _      _      _     20     20 210254.0-19

Total for query: 20 jobs; 0 completed, 0 removed, 0 idle, 0 running, 20 held, 0 suspended
Total for carsten: 20 jobs; 0 completed, 0 removed, 0 idle, 0 running, 20 held, 0 suspended
Total for all users: 29362 jobs; 0 completed, 0 removed, 7532 idle, 21810 running, 20 held, 0 suspended

There is quite a lot of information here, so let us focus on the important bits. Both condor_submit and condor_q report the ClusterId of your jobs (210254); condor_q additionally shows that this cluster contains 20 processes, listed as 210254.0-19. Unfortunately, all of them are “on hold”, so let us investigate what went wrong:
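If you want to reuse the ClusterId in later commands without copying it by hand, it can be extracted from the summary line that condor_submit prints. The following is a minimal sketch that parses the line shown above; the variable names are illustrative, and in practice you would feed it the live output of condor_submit instead of a hard-coded string.

```shell
# Summary line as printed by condor_submit in the example above.
submit_output="20 job(s) submitted to cluster 210254."

# Extract the number of jobs and the ClusterId with sed.
n_jobs=$(printf '%s\n' "$submit_output" \
    | sed -n 's/^\([0-9][0-9]*\) job(s) submitted to cluster [0-9][0-9]*\.$/\1/p')
cluster_id=$(printf '%s\n' "$submit_output" \
    | sed -n 's/^[0-9][0-9]* job(s) submitted to cluster \([0-9][0-9]*\)\.$/\1/p')

# Process numbers start at 0, so the last one is n_jobs - 1.
echo "cluster ${cluster_id} with ${n_jobs} jobs: ${cluster_id}.0-$((n_jobs - 1))"
```

The resulting range has the same form as the JOB_IDS column of condor_q.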

condor_q -hold 210254.0

-- Schedd: condor1.atlas.local : <10.20.30.16:9618?... @ 05/25/20 06:01:25
ID        OWNER          HELD_SINCE  HOLD_REASON
210254.0   carsten         5/25 05:50 Error from slot1_64@a4606.atlas.local: Failed to open '/work/carsten//Condor/FirstSteps/out/0.out' as standard output: No such file or directory (errno 2)

Total for query: 1 jobs; 0 completed, 0 removed, 0 idle, 0 running, 1 held, 0 suspended
Total for all users: 29996 jobs; 0 completed, 0 removed, 7559 idle, 22417 running, 20 held, 0 suspended
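The HOLD_REASON text already contains the path that could not be opened. As a small sketch, the file name and its parent directory can be pulled out of the message with standard shell tools; the message below is copied from the output above, and the variable names are only illustrative.

```shell
# Hold reason copied from the condor_q -hold output above.
reason="Error from slot1_64@a4606.atlas.local: Failed to open '/work/carsten//Condor/FirstSteps/out/0.out' as standard output: No such file or directory (errno 2)"

# Take the text between the single quotes: the file Condor could not open.
missing_file=$(printf '%s\n' "$reason" \
    | sed -n "s/.*Failed to open '\([^']*\)'.*/\1/p")

# Its parent directory is the one to check for existence.
missing_dir=$(dirname "$missing_file")

echo "file:      $missing_file"
echo "directory: $missing_dir"
```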

Luckily, fixing this problem is easy. In our submit file analyze.sub we told Condor to write stdout and stderr to files in the directories out and err, but we forgot to create them. So let us create both directories and tell Condor to release the held jobs so that they can run again:

mkdir err out
condor_release 210254
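To avoid the hold state in the first place, the directories can be created before submitting. The following is a minimal sketch, assuming analyze.sub writes to out and err relative to the directory you submit from; the check for condor_submit is only there so the snippet does not fail on a machine without Condor.

```shell
# Assumption: analyze.sub writes its stdout/stderr files to ./out and ./err,
# relative to the directory from which the jobs are submitted.
mkdir -p out err   # -p: no error if the directories already exist

# Submit only if condor_submit and the submit file are available here.
if command -v condor_submit >/dev/null 2>&1 && [ -f analyze.sub ]; then
    condor_submit analyze.sub
else
    echo "condor_submit or analyze.sub not found, nothing submitted" >&2
fi
```

Because mkdir -p is safe to repeat, the same lines can be run again before every later submission.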

Running condor_q again shows whether the jobs are idle, on hold again, or running:

# all jobs are idle:
OWNER   BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
carsten ID: 210254   5/25 05:49      _      _     20     20 210254.0-19
# [a short while later]:
OWNER   BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
carsten ID: 210254   5/25 05:49      9     11      _     20 210254.9-19

This means that your jobs have started, and some have already finished.