PyCBC’s workflow module configuration file(s) and command line interface

Introduction

The workflow module is designed to be flexible, so that users can build the pipeline they want to run. One way it achieves this is through a configuration file, sometimes a large one, that serves two purposes:

  • Tell the workflow planner how to run the various stages specified in the top-level workflow script.
  • Specify, as completely as possible, all command line options that will be sent to every executable that is run in the pipeline. Tags are used to identify options sent to a subset of jobs, as described more fully later.

The idea is that the only input that a user needs is the configuration file. However, it is often useful to supply certain options, such as user-specific locations and analysis start/end times, on the command line. The workflow module therefore provides a way to supply, or override, configuration file options on the command line.

Ihope used similar .ini files in every analysis. However, these files grew huge, and it became difficult for a novice to understand which options can be safely changed and which ones to leave well alone. It was also difficult to see which options were going to which job: inspiral.c, for example, looks for options in more than 10 places, and it isn’t clear where those places are.

To attempt to solve this, the workflow module has a number of features:

  • Multiple configuration files: You can now supply multiple configuration files, for example to separate out a file containing only injection generation parameters, which a user may want to change often. It is even possible to have sections split across files, so one could have a configuration file of key options, ones that might be changed, and another file of “leave alone” options.
  • Direct command line options: In the workflow module, command line options are not drawn from obscure sections; they correspond one-to-one with the executables. Options in the [inspiral] section will be sent to the inspiral executable and only to the inspiral executable.
  • Combined sections: To avoid specifying common options repeatedly, the workflow module supports combined sections. If you have two executables with a large set of shared options, you can specify an [exe1&exe2] section to provide the shared options and [exe1] and [exe2] sections to supply the individual options. One can also use [sharedoptions-NAME] sections to achieve the same thing.
  • Interpolation: As in configparser 3.0+, an option can be specified in one place and an interpolation string used to provide it in other places. This is described below.
  • Tags/subsections: In some cases options may only need to be sent to certain jobs, or you may want to call individual modules multiple times and do different things. To accommodate this, the workflow module includes a tagging (or subsections) system to provide options to only a subset of jobs, or to a specific call to a module. For example, options in [inspiral] are sent to all inspiral jobs, while options in [inspiral-h1] are sent only to inspiral jobs running on h1 data.
  • Executable expanding: The workflow module includes macros that make it easier to specify executable paths. For example, ${which:exe1} will be expanded automatically to the location of exe1 in the user’s path.

Most of these features will be applied directly after reading in the configuration file. The workflow module will then dump the parsed configuration back to disk so the user/reviewer can more easily see what the analysis is actually doing.

In this page we describe the layout of the workflow module .ini configuration file and what the various sections mean, how they are used, and how an ini file should be set out.

NOTE: A number of the features added here are available in the Python 3.X version of ConfigParser. That version also has a duplicate option check. In Python 2.X, if I do:

[inspiral]
detect-gravitational-waves = True
LOTS OF GARBAGE
detect-gravitational-waves = False

it will set the value to False and proceed happily. THERE IS NO WAY TO CATCH THIS! There is a Python 2.X backport of this new version; it is available on PyPI, but not in MacPorts. It would be good to pick up this new version and have some of these features available natively.
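
As a minimal illustration of the duplicate option check, the sketch below uses the Python 3 standard-library configparser in its default strict mode. The section content is a simplified stand-in for the example above (the some-other-option line is invented for illustration), and has_duplicate_option is a hypothetical helper, not part of the workflow module.

```python
import configparser

# Simplified version of the example above; "some-other-option" is illustrative.
INI_TEXT = """
[inspiral]
detect-gravitational-waves = True
some-other-option = 1
detect-gravitational-waves = False
"""


def has_duplicate_option(text):
    """Return True if the strict Python 3 parser rejects a repeated option."""
    parser = configparser.ConfigParser(strict=True)
    try:
        parser.read_string(text)
    except configparser.DuplicateOptionError:
        return True
    return False


print(has_duplicate_option(INI_TEXT))
```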

Supplying the config file on the command line and overriding options

The workflow module only uses two command line options: one to specify the configuration files and one to specify any overriding options. First the config files:

  • --config-files FILE1 [FILE2 FILE3 ...]

where FILEX corresponds to the configuration files. Second the overriding options:

  • --config-overrides section1:option1:value1 [section2:option2:value2 ...]

These specify options that should be added to the config files or, if already present, overwritten. Each override names the section, the option and the value to be added. If the section doesn’t already exist in the configuration file it will be added. In some cases you will want to supply an option without a value. This can be done with either

section:option:

or

section:option

Example

Here is an example of running a workflow from the command line:

python weekly_ahope.py --config-files weekly_ahope.ini pipedown.ini inj.ini --config-overrides workflow:start-time:${GPS_START_TIME} workflow:end-time:${GPS_END_TIME}

Here the analysis start and end times are being overridden with values from the user’s environment.
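
The override syntax can be sketched in a few lines of Python. The helper parse_override below is hypothetical and only illustrates the section:option[:value] format described above. Treating only the first two colons as separators, so that a value may itself contain colons, is a choice made for this sketch, and the GPS time is an arbitrary illustrative number.

```python
def parse_override(override):
    """Split 'section:option[:value]' into a (section, option, value) tuple.

    Hypothetical helper: only the first two colons act as separators,
    so the value itself may contain colons (for example a URL).
    """
    parts = override.split(":", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("expected section:option[:value], got %r" % override)
    section, option = parts[0], parts[1]
    # Both "section:option:" and "section:option" give an empty value.
    value = parts[2] if len(parts) == 3 else ""
    return section, option, value


# The GPS time below is an arbitrary illustrative value.
for item in ["workflow:start-time:966384015", "workflow-ifos:h1:", "workflow-ifos:l1"]:
    print(parse_override(item))
```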

Global options - the [workflow] section

The [workflow] section and [workflow-XXX] subsections should appear at the top of a configuration file.

The [workflow] section and [workflow-XXX] subsections of the configuration file store options that the workflow module uses to decide how to construct the workflow. Options given here are not supplied to any executable on the command line.

The [workflow] section must contain two entries:

  • start-time=START
  • end-time=END

which are used to tell the workflow that it is only to consider times in [START,END) for analysis. These will often be supplied as override options directly on the command line.

Another optional entry in the [workflow] section, which we recommend using, is the:

  • file-retention-level = all_files

entry. This can take one of four values: “all_files”, “all_triggers”, “merged_triggers” or “results”. These specify which of the files produced during the workflow should be stored after the workflow finishes. With “all_files”, which is the default value, everything produced in the workflow will be stored. With “results” only the critical result files are stored. “all_triggers” and “merged_triggers” store some subset of the full set of files. Defining whether a file should be stored under each of these levels is the job of the Executable class, which carries a current_retention_level attribute (one of executable.INTERMEDIATE_PRODUCT, executable.ALL_TRIGGERS, executable.MERGED_TRIGGERS or executable.FINAL_RESULT). When building workflows one can set this attribute when creating executable instances to control under what conditions a file should be stored.

It is okay to store other important and widely used values in here. You might often see cases where channel names are given here as these are sent to a number of codes on the command line, and it is easier to refer to them here, at the very top of the .ini file, so that the user can more easily see and change such values.

[workflow-XXX] subsections

Each module that you use when setting up your workflow will need a [workflow-XXX] subsection. The name of the subsection and the particular options needed can be found in each module’s documentation page.

If you want to call any module more than once you will need to use the workflow module’s tagging system. As an example, let’s say I want to call the template bank module twice, once to set up a pycbc template bank and once to set up an SVD template bank. I could then create [workflow-tmpltbank-pycbc] and [workflow-tmpltbank-svd] sections to provide options that are unique to each tag. I could also use [exename-pycbc] and [exename-svd] sections if the two methods are using the same executable but need different options. In both cases options in [workflow-tmpltbank] and [exename] would be used for both tags. (If the two codes were using different executables then [exename1] and [exename2] sections would suffice.)

An example of where this section might be used is in the template bank stage where one can either run with a pre-generated bank or generate banks within the workflow. This information would be provided in this section.

Requirements

The [workflow] section in every .ini file should contain a link to this page to see what options are needed.

The [workflow-XXX] sections in every .ini file should start with a link to that module’s documentation to see what options/values are relevant for that section.

Example

Here is an example of the [workflow] section of a .ini file:

[workflow]
; https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/workflow/initialization.html
; provides details of how to set up a pycbc workflow configuration .ini file
h1-channel-name = H1:LDAS-STRAIN
l1-channel-name = L1:LDAS-STRAIN
;h2-channel-name = H2:LDAS-STRAIN
workflow-html-basedir = /home/spxiwh/public_html/workflow/development/weekly_ahope/test

[workflow-ifos]
; This is the list of ifos to analyse
h1 =
l1 =

[workflow-datafind]
; See https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/workflow/datafind.html
datafind-method = AT_RUNTIME_SINGLE_FRAMES
datafind-h1-frame-type = H1_LDAS_C02_L2
datafind-l1-frame-type = L1_LDAS_C02_L2
;datafind-h2-frame-type = H2_LDAS_C02_L2
datafind-check-segment-gaps = update_times
datafind-check-frames-exist = raise_error
datafind-check-segment-summary = no_test
; Set this to specify the datafind server. If this is not set the code will
; use the value in ${LIGO_DATAFIND_SERVER}
;datafind-ligo-datafind-server = ""

[workflow-segments]
; See https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/workflow/segments.html
; PIPEDOWN demands we use AT_RUNTIME
segments-method = AT_RUNTIME
segments-H1-science-name = H1:DMT-SCIENCE:4
segments-L1-science-name = L1:DMT-SCIENCE:4
;segments-V1-science-name = V1:ITF_SCIENCEMODE:6
segments-database-url = https://segdb.ligo.caltech.edu
segments-veto-definer-url = https://www.lsc-group.phys.uwm.edu/ligovirgo/cbc/public/segments/S6/H1L1V1-S6_CBC_LOWMASS_B_OFFLINE-937473702-0.xml
segments-veto-categories = 2,3,4
segments-minimum-segment-length = 2000
segments-generate-coincident-segments =

[workflow-tmpltbank]
; See https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/workflow/template_bank.html
tmpltbank-method=WORKFLOW_INDEPENDENT_IFOS
; Remove the option below to disable linking with matchedfilter_utils
tmpltbank-link-to-matchedfltr=

[workflow-injections]
; See https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/workflow/injections.html
injections-method=IN_WORKFLOW

[workflow-timeslides]
; See https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/workflow/time_slides.html
timeslides-method=AT_RUNTIME

Executable locations - the [executables] section

This section should contain the names of each of the executables that will be used in the workflow and their locations.

executable macros

The following macros can be used only within this section to automatically fill in full path names:

which(executable)

In the following example tmpltbank’s value will be replaced with the output of which(lalapps_tmpltbank):

[executables]
tmpltbank = ${which:lalapps_tmpltbank}
inspiral = /full/path/to/lalapps_inspiral
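
The behaviour of the which macro can be approximated with the standard library. The sketch below is not the workflow module's implementation: expand_which is a hypothetical helper built on shutil.which, and raising a ValueError when the executable is not on the path is an assumption made for this example.

```python
import re
import shutil

# Matches a value that consists solely of ${which:NAME}.
WHICH_MACRO = re.compile(r"^\$\{which:(?P<name>[^}]+)\}$")


def expand_which(value):
    """Return the full path for '${which:NAME}', or value unchanged otherwise."""
    match = WHICH_MACRO.match(value.strip())
    if match is None:
        # Not a macro (for example an explicit full path): leave untouched.
        return value
    path = shutil.which(match.group("name"))
    if path is None:
        # Assumption for this sketch: a missing executable is an error.
        raise ValueError("executable %r not found on PATH" % match.group("name"))
    return path


print(expand_which("/full/path/to/lalapps_inspiral"))
```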

Requirements

All executables used in the workflow should be supplied in this section, and only in this section.

Example

Here is an example of the [executables] section of a pycbc workflow .ini file:

[executables]
tmpltbank         = /home/cbc/opt/s6b/ab577e4e5dad14e46fce511cffdb04917836ba36/bin/lalapps_tmpltbank
inspiral          = /home/cbc/opt/s6b/ab577e4e5dad14e46fce511cffdb04917836ba36/bin/lalapps_inspiral
inspinj           = /home/cbc/opt/s6b/ab577e4e5dad14e46fce511cffdb04917836ba36/bin/lalapps_inspinj
thinca            = ${which:ligolw_thinca}

Executable options

For each of the executables in the [executables] section, options for that executable should be listed under the section corresponding to that executable. Options in the [tmpltbank] section are sent to lalapps_tmpltbank, options in the [inspiral] section are sent to lalapps_inspiral etc.

It is possible to have more than one [tmpltbank] section; ConfigParser will simply combine them when reading the file. Therefore important options and options that a novice user might want to change could be supplied in a first [tmpltbank] section near the top of the .ini file. This section could be commented accordingly. The module’s documentation page should also include instructions for each of the supported executables (usually the code’s own help message). Options that are less important, and ones that a novice user would not want to change, could be placed in a second [tmpltbank] section at the bottom of the .ini file; this section would be labelled accordingly and also contain a link to documentation for that executable.

Some options are only sent to a subset of the jobs using a given executable, for example those running on H1 data. Options like these are provided in sections labelled [executable_name-subset_tag], so for the H1 example the section would be called [tmpltbank-H1]. As well as obeying the rules above, these sections must clearly state which jobs will be sent those options. This can also be used when calling a module multiple times with different tags. Nested tags (i.e. [tmpltbank-H1-pycbc]) are not supported.

Some options need to be sent to more than one executable; for example, the channel names are used by any code that reads the data. Such options should be given in a section named by combining the executable names, separated by the & token. So options sent to tmpltbank and inspiral would go in a section called [tmpltbank&inspiral]. The code parsing the .ini file will automatically separate and duplicate these options in memory. All of the above rules apply. If I want to send an option to all tmpltbank and inspiral jobs running on H1 data, I might do something like [tmpltbank-H1&inspiral-H1].

If an option is given in more than one section (i.e. if I specify --time-window 0.5 in [inspiral] and --time-window 1.0 in another [inspiral], [inspiral&tmpltbank] or [inspiral-H1] section), the code will throw an error. Specifying --time-window 1.0 in [inspiral-H1] and --time-window 0.5 in [inspiral-L1] is valid as long as the subset of H1 jobs and the subset of L1 jobs do not overlap.

If a particular code (let’s say inspiral) wants to use an option supplied in the [workflow] section (e.g. the channel names), it can do this by using:

[inspiral-h1]
channel-name = ${workflow|h1-channel}

[inspiral-l1]
channel-name = ${workflow|l1-channel}

[inspiral-v1]
channel-name = ${workflow|v1-channel}

Similar macros can be added as needed, but these should be limited to avoid namespace confusion.
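
A rough sketch of how this kind of interpolation can be resolved is given below, using the Python 3 standard-library configparser with its built-in interpolation switched off. The function interpolate and the regular expression are illustrative assumptions rather than the workflow module's own code, and the ifo and frame-type options are invented to show the same-section form ${option}.

```python
import configparser
import re

# Matches ${option} and ${section|option}; ':' is excluded so that
# executable macros such as ${which:NAME} are left alone.
MACRO = re.compile(r"\$\{(?:(?P<section>[^|}:]+)\|)?(?P<option>[^|}:]+)\}")


def interpolate(parser, text, current_section):
    """Replace each macro in text with the value it points to."""
    def substitute(match):
        # Without an explicit section, look in the current section.
        section = match.group("section") or current_section
        return parser.get(section, match.group("option"))
    return MACRO.sub(substitute, text)


parser = configparser.ConfigParser(interpolation=None)
parser.read_string("""
[workflow]
h1-channel = H1:LDAS-STRAIN

[inspiral-h1]
ifo = H1
frame-type = ${ifo}_LDAS_C02_L2
channel-name = ${workflow|h1-channel}
""")

for option in ("channel-name", "frame-type"):
    raw = parser.get("inspiral-h1", option)
    print(option, "=", interpolate(parser, raw, "inspiral-h1"))
```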

Example complete workflow .ini file

Please see individual workflow documentation pages for some examples of complete .ini files and example workflows.

[sharedoptions] section

An alternative to the [exe1&exe2] section, especially when options are split well into groups of options, is to use the [sharedoptions] section. An example of this follows:

[sharedoptions]
massranges = exe1,exe2,exe3-mass
metric = exe1,exe2-range,exe3-metric, exe5

[sharedoptions-massranges]
min-mass1 = 2.0
max-mass1 = 48.0
min-mass2 = 2.0
max-mass2 = 48.0
max-total-mass = 4.2
min-total-mass = 4.0
max-eta = 0.25
max-ns-spin-mag = 0.9899
max-bh-spin-mag = 0.9899

[sharedoptions-metric]
pn-order = threePointFivePN
f0 = 70.0
f-low = 30.0
f-upper = 1100.0
delta-f = 0.01

This will ensure that all options in [sharedoptions-massranges] are added to the [exe1], [exe2] and [exe3-mass] sections. All options in [sharedoptions-metric] are added to [exe1], [exe2-range], [exe3-metric] and [exe5].
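
The copying step can be sketched with the standard-library configparser. This is an illustration of the behaviour described above, not the module's implementation: the function body, the choice of ValueError for duplicates and the small example file are assumptions made for this sketch.

```python
import configparser


def populate_shared_sections(parser):
    """Copy each [sharedoptions-NAME] block into the sections listed for NAME."""
    if not parser.has_section("sharedoptions"):
        return
    for name, targets in parser.items("sharedoptions"):
        shared = "sharedoptions-" + name
        for target in (t.strip() for t in targets.split(",")):
            if not parser.has_section(target):
                parser.add_section(target)
            for option, value in parser.items(shared):
                # Assumption: a clash with an existing option is an error.
                if parser.has_option(target, option):
                    raise ValueError("%s already set in [%s]" % (option, target))
                parser.set(target, option, value)


parser = configparser.ConfigParser(interpolation=None)
parser.read_string("""
[sharedoptions]
massranges = exe1, exe2

[sharedoptions-massranges]
min-mass1 = 2.0
max-mass1 = 48.0

[exe1]
f-low = 30.0
""")
populate_shared_sections(parser)
print(sorted(parser.options("exe1")))
print(sorted(parser.options("exe2")))
```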

Code documentation

The parsing of .ini files and of the command line is done within the pycbc.workflow.configuration module. The functions in this module are shown below.

pycbc.workflow.configuration Module

This module provides a wrapper to the ConfigParser utilities for pycbc workflow construction. This module is described in the page here: https://ldas-jobs.ligo.caltech.edu/~cbc/docs/pycbc/ahope/initialization_inifile.html

class pycbc.workflow.configuration.WorkflowConfigParser(configFiles=None, overrideTuples=None, parsedFilePath=None, deleteTuples=None, copy_to_cwd=False)[source]

Bases: glue.pipeline.DeepCopyableConfigParser

This is a sub-class of glue.pipeline.DeepCopyableConfigParser, which lets us add a few additional helper features that are useful in workflows.

add_options_to_section(section, items, overwrite_options=False)[source]

Add a set of options and values to a section of a ConfigParser object. Will throw an error if any of the options being added already exist; this behaviour can be overridden if desired.

Parameters:
  • section (string) – The name of the section to add options+values to
  • items (list of tuples) – Each tuple contains (at [0]) the option and (at [1]) the value to add to the section of the ini file
  • overwrite_options (Boolean, optional) – By default this function will throw a ValueError if an option exists in both the original section in the ConfigParser and in the provided items. If this is set to True, the options+values given in items will instead replace the original values. Default = False
check_duplicate_options(section1, section2, raise_error=False)[source]

Check for duplicate options in two sections, section1 and section2. Will return a list of the duplicate options.

Parameters:
  • section1 (string) – The name of the first section to compare
  • section2 (string) – The name of the second section to compare
  • raise_error (Boolean, optional (default=False)) – If True, raise an error if duplicates are present.
Returns:

duplicates – List of duplicate options

Return type:

List

classmethod from_cli(opts)[source]

Initialize the config parser using options parsed from the command line.

The parsed options opts must include options provided by add_workflow_command_line_group().

Parameters: opts (argparse.ArgumentParser) – The command line arguments parsed by argparse
get_cli_option(section, option_name, **kwds)[source]

Return option using CLI action parsing

Parameters:
  • section (str) – Section to find option to parse
  • option_name (str) – Name of the option to parse from the config file
  • kwds (keywords) – Additional keywords are passed directly to the argument parser.
Returns:

The parsed value for this option

Return type:

value

get_opt_tag(section, option, tag)[source]

Convenience function accessing get_opt_tags() for a single tag: see documentation for that function. NB calling get_opt_tags() directly is preferred for simplicity.

Parameters:
  • self (ConfigParser object) – The ConfigParser object (automatically passed when this is appended to the ConfigParser class)
  • section (string) – The section of the ConfigParser object to read
  • option (string) – The ConfigParser option to look for
  • tag (string) – The name of the subsection to look in, if not found in [section]
Returns:

The value of the options being searched for

Return type:

string

get_opt_tags(section, option, tags)[source]

Supplement to ConfigParser.ConfigParser.get(). This will search for an option in [section] and if it doesn’t find it will also try in [section-tag] for every value of tag in tags. Will raise a ConfigParser.Error if it cannot find a value.

Parameters:
  • self (ConfigParser object) – The ConfigParser object (automatically passed when this is appended to the ConfigParser class)
  • section (string) – The section of the ConfigParser object to read
  • option (string) – The ConfigParser option to look for
  • tags (list of strings) – The name of subsections to look in, if not found in [section]
Returns:

The value of the options being searched for

Return type:

string
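
The lookup order described for get_opt_tags() can be sketched as a free function over a standard-library ConfigParser. This is an illustration under stated assumptions, not the method's actual source: the parser argument, the error message and the example options are invented for the sketch.

```python
import configparser


def get_opt_tags(parser, section, option, tags):
    """Look for option in [section], then in each [section-tag] in turn."""
    if parser.has_option(section, option):
        return parser.get(section, option)
    for tag in tags:
        subsection = "%s-%s" % (section, tag)
        if parser.has_option(subsection, option):
            return parser.get(subsection, option)
    raise configparser.Error(
        "no option %r in [%s] or its tagged subsections" % (option, section))


parser = configparser.ConfigParser(interpolation=None)
parser.read_string("""
[inspiral]
snr-threshold = 5.5

[inspiral-h1]
channel-name = H1:LDAS-STRAIN
""")
print(get_opt_tags(parser, "inspiral", "snr-threshold", ["h1"]))
print(get_opt_tags(parser, "inspiral", "channel-name", ["h1"]))
```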

get_subsections(section_name)[source]

Return a list of subsections for the given section name

has_option_tag(section, option, tag)[source]

Convenience function accessing has_option_tags() for a single tag: see documentation for that function. NB calling has_option_tags() directly is preferred for simplicity.

Parameters:
  • self (ConfigParser object) – The ConfigParser object (automatically passed when this is appended to the ConfigParser class)
  • section (string) – The section of the ConfigParser object to read
  • option (string) – The ConfigParser option to look for
  • tag (string) – The name of the subsection to look in, if not found in [section]
Returns:

Is the option in the section or [section-tag]

Return type:

Boolean

has_option_tags(section, option, tags)[source]

Supplement to ConfigParser.ConfigParser.has_option(). This will search for an option in [section] and if it doesn’t find it will also try in [section-tag] for each value in tags. Returns True if the option is found and False if not.

Parameters:
  • self (ConfigParser object) – The ConfigParser object (automatically passed when this is appended to the ConfigParser class)
  • section (string) – The section of the ConfigParser object to read
  • option (string) – The ConfigParser option to look for
  • tags (list of strings) – The names of the subsection to look in, if not found in [section]
Returns:

Is the option in the section or [section-tag] (for tag in tags)

Return type:

Boolean

interpolate_exe(testString)[source]

Replace testString with a path to an executable based on the format.

If this looks like

${which:lalapps_tmpltbank}

it will return the equivalent of which(lalapps_tmpltbank)

Otherwise it will return an unchanged string.

Parameters: testString (string) – The input string
Returns: newString – The output string.
Return type: string
interpolate_string(testString, section)[source]

Take a string and replace all examples of ExtendedInterpolation formatting within the string with the exact value.

For values like ${example} this is replaced with the value that corresponds to the option called example *in the same section*

For values like ${common|example} this is replaced with the value that corresponds to the option example in the section [common]. Note that in the python3 config parser this is ${common:example} but python2.7 interprets the : the same as a = and this breaks things

Nested interpolation is not supported here.

Parameters:
  • testString (String) – The string to parse and interpolate
  • section (String) – The current section of the ConfigParser object
Returns:

testString – Interpolated string

Return type:

String

perform_exe_expansion()[source]

This function will look through the executables section of the ConfigParser object and replace any values using macros with full paths.

For any values that look like

${which:lalapps_tmpltbank}

will be replaced with the equivalent of which(lalapps_tmpltbank)

Otherwise values will be unchanged.

perform_extended_interpolation()[source]

Filter through an ini file and replace all examples of ExtendedInterpolation formatting with the exact value. For values like ${example} this is replaced with the value that corresponds to the option called example *in the same section*

For values like ${common|example} this is replaced with the value that corresponds to the option example in the section [common]. Note that in the python3 config parser this is ${common:example} but python2.7 interprets the : the same as a = and this breaks things

Nested interpolation is not supported here.

populate_shared_sections()[source]

Parse the [sharedoptions] section of the ini file.

That section should contain entries according to:

  • massparams = inspiral, tmpltbank
  • dataparams = tmpltbank

This will result in all options in [sharedoptions-massparams] being copied into the [inspiral] and [tmpltbank] sections and the options in [sharedoptions-dataparams] being copied into [tmpltbank]. In the case of duplicates an error will be raised.

read_ini_file(cpFile)[source]

Read a .ini file and return it as a ConfigParser class. This function does none of the parsing/combining of sections. It simply reads the file and returns it unedited.

Stub awaiting more functionality - see configparser_test.py

Parameters: cpFile (Path to .ini file, or list of paths) – The path(s) to a .ini file to be read in
Returns: cp – The ConfigParser class containing the read in .ini file
Return type: ConfigParser
resolve_file_url(test_string)[source]

Replace test_string with a path to an executable based on the format.

If this looks like

${which:lalapps_tmpltbank}

it will return the equivalent of which(lalapps_tmpltbank)

Otherwise it will return an unchanged string.

Parameters: test_string (string) – The input string
Returns: new_string – The output string.
Return type: string
resolve_urls()[source]

This function will look through all sections of the ConfigParser object and replace any URLs that are given the resolve magic flag with a path on the local drive.

Specifically for any values that look like

${resolve:https://git.ligo.org/detchar/SOME_GATING_FILE.txt}

the value will be replaced with the output of resolve_url(URL)

Otherwise values will be unchanged.

sanity_check_subsections()[source]

This function goes through the ConfigParser and checks that any options given in the [SECTION_NAME] section are not also given in any [SECTION_NAME-SUBSECTION] sections.

section_to_cli(section, skip_opts=None)[source]

Converts a section into a command-line string.

For example:

[section_name]
foo =
bar = 10

yields: ‘--foo --bar 10’.

Parameters:
  • section (str) – The name of the section to convert.
  • skip_opts (list, optional) – List of options to skip. Default (None) results in all options in the section being converted.
Returns:

The options as a command-line string.

Return type:

str
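
The conversion can be sketched as a free function over a standard-library ConfigParser. This mirrors the example above but is only an illustration, not the method's actual source; the parser argument is an assumption of the sketch.

```python
import configparser


def section_to_cli(parser, section, skip_opts=None):
    """Build a command-line string from the options in one section."""
    skip_opts = skip_opts or []
    words = []
    for option in parser.options(section):
        if option in skip_opts:
            continue
        words.append("--" + option)
        value = parser.get(section, option)
        # Options with an empty value become bare flags.
        if value:
            words.append(value)
    return " ".join(words)


parser = configparser.ConfigParser(interpolation=None)
parser.read_string("""
[section_name]
foo =
bar = 10
""")
print(section_to_cli(parser, "section_name"))  # --foo --bar 10
```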

split_multi_sections()[source]

Parse through the WorkflowConfigParser instance and split any sections labelled with an “&” sign (e.g. [inspiral&tmpltbank]) into [inspiral] and [tmpltbank] sections. If these individual sections already exist they will be appended to. If an option exists in both the [inspiral] and [inspiral&tmpltbank] sections an error will be thrown.
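
A minimal sketch of this splitting step, written against the standard-library configparser, is shown below. It is not the method's actual source: removing the combined section afterwards and raising ValueError on a clash are assumptions of the sketch, as are the example options.

```python
import configparser


def split_multi_sections(parser):
    """Copy the options of every [a&b] section into [a] and [b]."""
    for section in list(parser.sections()):
        if "&" not in section:
            continue
        for name in section.split("&"):
            if not parser.has_section(name):
                parser.add_section(name)
            for option, value in parser.items(section):
                if parser.has_option(name, option):
                    raise ValueError(
                        "%s given in both [%s] and [%s]" % (option, name, section))
                parser.set(name, option, value)
        # Assumption: the combined section is dropped once it has been split.
        parser.remove_section(section)


parser = configparser.ConfigParser(interpolation=None)
parser.read_string("""
[tmpltbank&inspiral]
sample-rate = 4096

[inspiral]
snr-threshold = 5.5
""")
split_multi_sections(parser)
print(sorted(parser.sections()))
print(sorted(parser.options("inspiral")))
```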

pycbc.workflow.configuration.add_workflow_command_line_group(parser)[source]

The standard way of initializing a ConfigParser object in workflow will be to do it from the command line. This is done by giving a

--local-config-files filea.ini fileb.ini filec.ini

command. You can also set config file override commands on the command line. This will be most useful when setting (for example) start and end times, or active ifos. This is done by

--config-overrides section1:option1:value1 section2:option2:value2 ...

This can also be given as

--config-overrides section1:option1

where the value will be left as ‘’.

To remove a configuration option, use the command line argument

--config-delete section1:option1

which will delete option1 from [section1] or

--config-delete section1

to delete all of the options in [section1]

Deletes are implemented before overrides.

This function returns an argparse OptionGroup to ensure these options are parsed correctly and can then be sent directly to initialize a WorkflowConfigParser.

Parameters: parser (argparse.ArgumentParser instance) – The initialized argparse instance to add the workflow option group to.
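
An option group of this kind can be sketched with the standard argparse module. The function add_workflow_options below is a hypothetical stand-in, not the real add_workflow_command_line_group(); the nargs, default and metavar choices are assumptions, and the configuration file option is spelled --config-files here, following the usage example near the top of this page.

```python
import argparse


def add_workflow_options(parser):
    """Attach config-file, override and delete options to an argparse parser."""
    group = parser.add_argument_group("workflow configuration")
    group.add_argument("--config-files", nargs="+", required=True,
                       metavar="FILE", help="one or more .ini files")
    group.add_argument("--config-overrides", nargs="*", default=[],
                       metavar="SECTION:OPTION:VALUE",
                       help="options to add or overwrite")
    group.add_argument("--config-delete", nargs="*", default=[],
                       metavar="SECTION[:OPTION]",
                       help="options or whole sections to remove")
    return group


parser = argparse.ArgumentParser()
add_workflow_options(parser)
opts = parser.parse_args([
    "--config-files", "a.ini", "b.ini",
    "--config-overrides", "workflow:start-time:1", "workflow-ifos:h1",
])
print(opts.config_files, opts.config_overrides, opts.config_delete)
```
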
pycbc.workflow.configuration.istext(s, text_characters=None, threshold=0.3)[source]

Determines if the string is a set of binary data or a text file. This is done by checking if a large proportion of characters are > 0x7E (0x7F is <DEL> and unprintable) or low bit control codes. In other words, things that you wouldn’t see (often) in a text file. (ASCII past 0x7F might appear, but rarely.)

Code modified from https://www.safaribooksonline.com/library/view/python-cookbook-2nd/0596007973/ch01s12.html
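
The heuristic can be sketched in Python 3 as follows. This is a simplified illustration of the idea, not the module's code: it works on a bytes object, drops the text_characters parameter, and its set of allowed control codes and its handling of empty input and NUL bytes are assumptions of the sketch.

```python
def istext(data, threshold=0.3):
    """Guess whether a bytes object holds text rather than binary data."""
    if b"\x00" in data:
        # A NUL byte is taken as a sure sign of binary data.
        return False
    if not data:
        # Assumption: empty input counts as text.
        return True
    # Printable ASCII plus common whitespace/control codes (\b \t \n \f \r).
    allowed = set(range(0x20, 0x7F)) | {0x08, 0x09, 0x0A, 0x0C, 0x0D}
    unusual = sum(1 for byte in data if byte not in allowed)
    # Text if no more than `threshold` of the bytes look unusual.
    return unusual / len(data) <= threshold


print(istext(b"[workflow]\nstart-time = 1\n"))
print(istext(b"\xff\xfe\xfd\xfc" * 8))
```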

pycbc.workflow.configuration.resolve_url(url, directory=None, permissions=None, copy_to_cwd=True)[source]

Resolves a URL to a local file, and returns the path to that file.

If a URL is given, the file will be copied to the current working directory. If a local file path is given, the file will only be copied to the current working directory if copy_to_cwd is True (the default).