pycbc.inference package

Submodules

pycbc.inference.burn_in module

This module provides classes and functions for determining when Markov chains have burned in.

class pycbc.inference.burn_in.MCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: object

Provides methods for estimating burn-in of an ensemble MCMC.

available_tests = ('halfchain', 'min_iterations', 'max_posterior', 'posterior_step', 'nacl', 'ks_test')
evaluate(filename)[source]

Runs all of the burn-in tests.

classmethod from_config(cp, sampler)[source]

Loads burn in from section [sampler-burn_in].

halfchain(filename)[source]

Just uses half the chain as the burn-in iteration.

ks_test(filename)[source]

Applies the KS burn-in test.

max_posterior(filename)[source]

Applies max posterior test to self.

min_iterations(filename)[source]

Just checks that the sampler has been run for the minimum number of iterations.

nacl(filename)[source]

Burn in based on ACL.

This applies the following test to determine burn in:

  1. The first half of the chain is ignored.
  2. An ACL is calculated from the second half.
  3. If nacls times the ACL is less than half the length of the chain, the chain is considered to be burned in at the half-way point.
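The three steps above can be sketched in plain Python. This is a minimal illustration rather than the pycbc implementation: the function name nacl_burned_in, the default nacls=5, passing in a precomputed ACL, and reporting the chain length when the test fails are all assumptions made here.

```python
def nacl_burned_in(acl, chain_length, nacls=5):
    """Hypothetical helper applying the nacl criterion to one chain.

    acl is assumed to be the autocorrelation length already estimated
    from the second half of the chain (steps 1 and 2).
    """
    half = chain_length // 2
    # Step 3: the test passes if nacls ACLs fit inside half the chain.
    is_burned_in = nacls * acl < chain_length / 2
    # Assumed convention: the chain length marks "not burned in".
    burn_in_idx = half if is_burned_in else chain_length
    return burn_in_idx, is_burned_in
```

For a chain of 200 iterations, an ACL of 10 passes the test with burn-in at iteration 100, while an ACL of 30 does not.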
posterior_step(filename)[source]

Applies the posterior-step test.

class pycbc.inference.burn_in.MultiTemperedMCMCBurnInTests(sampler, burn_in_test, **kwargs)[source]

Bases: pycbc.inference.burn_in.MCMCBurnInTests

Adds support for multiple temperatures to the MCMCBurnInTests.

pycbc.inference.burn_in.ks_test(samples1, samples2, threshold=0.9)[source]

Applies a KS test to determine if two sets of samples are the same.

The KS test is applied parameter-by-parameter. If the two-tailed p-value returned by the test is greater than threshold, the samples are considered to be the same.

Parameters:
  • samples1 (dict) – Dictionary mapping parameters to the first set of samples.
  • samples2 (dict) – Dictionary mapping parameters to the second set of samples.
  • threshold (float) – The threshold to use for the p-value. Default is 0.9.
Returns:

Dictionary mapping parameter names to booleans indicating whether the given parameter passes the KS test.

Return type:

dict
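The per-parameter test can be illustrated without pycbc. The sketch below is an assumption-laden stand-in: ks_statistic, ks_pvalue and ks_test_sketch are hypothetical names, and the p-value uses the standard asymptotic Kolmogorov series, which may differ slightly from whatever routine the library calls.

```python
import math

def ks_statistic(a, b):
    """Largest distance between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both CDFs past every value equal to x.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(a, b):
    """Asymptotic two-tailed p-value (approximation, assumed here)."""
    en = math.sqrt(len(a) * len(b) / (len(a) + len(b)))
    lam = (en + 0.12 + 0.11 / en) * ks_statistic(a, b)
    if lam < 0.2:
        return 1.0  # the series is indistinguishable from 1 here
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))

def ks_test_sketch(samples1, samples2, threshold=0.9):
    """Map each parameter name to True if its p-value exceeds threshold."""
    return {param: ks_pvalue(samples1[param], samples2[param]) > threshold
            for param in samples1}
```

With threshold=0.9 the test is strict: only sample sets whose empirical distributions are nearly indistinguishable pass.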

pycbc.inference.burn_in.max_posterior(lnps_per_walker, dim)[source]

Burn in based on samples being within dim/2 of maximum posterior.

Parameters:
  • lnps_per_walker (2D array) – Array of values that are proportional to the log posterior values. Must have shape nwalkers x niterations.
  • dim (int) – The dimension of the parameter space.
Returns:

  • burn_in_idx (array of int) – The burn in indices of each walker. If a walker is not burned in, its index will be equal to the length of the chain.
  • is_burned_in (array of bool) – Whether or not a walker is burned in.
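A plain-Python sketch of this test follows. It assumes that the maximum is taken over all walkers and that burn-in is the first iteration at which a walker reaches the threshold; max_posterior_sketch is a hypothetical name and nested lists stand in for the 2D array.

```python
def max_posterior_sketch(lnps_per_walker, dim):
    """lnps_per_walker: list of per-walker lists (nwalkers x niterations)."""
    niterations = len(lnps_per_walker[0])
    # Threshold: within dim/2 of the largest value seen by any walker.
    criterion = max(max(chain) for chain in lnps_per_walker) - dim / 2.0
    burn_in_idx, is_burned_in = [], []
    for chain in lnps_per_walker:
        # First iteration at which this walker reaches the threshold.
        first = next((i for i, lnp in enumerate(chain) if lnp >= criterion),
                     None)
        is_burned_in.append(first is not None)
        # Walkers that never reach it get the chain length as their index.
        burn_in_idx.append(niterations if first is None else first)
    return burn_in_idx, is_burned_in
```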

pycbc.inference.burn_in.posterior_step(logposts, dim)[source]

Finds the last time a chain made a jump > dim/2.

Parameters:
  • logposts (array) – 1D array of values that are proportional to the log posterior values.
  • dim (int) – The dimension of the parameter space.
Returns:

The index of the last time the logpost made a jump > dim/2. If that never happened, returns 0.

Return type:

int
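As a rough illustration, the search for the last large jump could look like the following. The name posterior_step_sketch is hypothetical, and returning the index of the sample immediately after the jump is an assumption about the indexing convention.

```python
def posterior_step_sketch(logposts, dim):
    """Index just after the last jump larger than dim/2, or 0 if none."""
    criterion = dim / 2.0
    idx = 0
    for i in range(1, len(logposts)):
        # Keep overwriting so that the last qualifying jump wins.
        if logposts[i] - logposts[i - 1] > criterion:
            idx = i
    return idx
```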

pycbc.inference.entropy module

This module contains functions for calculating the information entropy and the Kullback-Leibler and Jensen-Shannon divergences.

pycbc.inference.entropy.check_hist_params(samples, hist_min, hist_max, hist_bins)[source]

Checks that the bound values given for the histogram are consistent, returning the range if they are or raising an error if they are not. Also checks that if hist_bins is a str, it corresponds to a method available in numpy.histogram.

Parameters:
  • samples (numpy.array) – Set of samples to get the min/max if only one of the bounds is given.
  • hist_min (numpy.float64) – Minimum value for the histogram.
  • hist_max (numpy.float64) – Maximum value for the histogram.
  • hist_bins (int or str) – If int, number of equal-width bins to use in numpy.histogram. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.
Returns:

  • hist_range (tuple or None) – The bounds (hist_min, hist_max) or None.
  • hist_bins (int or str) – Number of bins or method for optimal width bin calculation.
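The checks described above might be sketched as follows. This is an illustration under assumptions, not the library code: check_hist_params_sketch is a hypothetical name, missing bounds are detected with None, and a plain list stands in for the sample array.

```python
HIST_METHODS = ('auto', 'fd', 'doane', 'scott', 'stone', 'rice',
                'sturges', 'sqrt')

def check_hist_params_sketch(samples, hist_min=None, hist_max=None,
                             hist_bins=None):
    # Fall back to the documented default bin-width method.
    if hist_bins is None:
        hist_bins = 'fd'
    elif isinstance(hist_bins, str) and hist_bins not in HIST_METHODS:
        raise ValueError("hist_bins must be one of %s" % (HIST_METHODS,))
    # No bounds given: let the histogram choose its own range.
    if hist_min is None and hist_max is None:
        return None, hist_bins
    # One bound given: take the other from the samples.
    if hist_min is None:
        hist_min = min(samples)
    if hist_max is None:
        hist_max = max(samples)
    if hist_min >= hist_max:
        raise ValueError("hist_min must be lower than hist_max")
    return (hist_min, hist_max), hist_bins
```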

pycbc.inference.entropy.compute_pdf(samples, method, bins, hist_min, hist_max)[source]

Computes the probability density function for a set of samples.

Parameters:
  • samples (numpy.array) – Set of samples to calculate the pdf.
  • method (str) – Method to calculate the pdf. Options are ‘kde’ for the Kernel Density Estimator, and ‘hist’ to use numpy.histogram.
  • bins (str or int, optional) – This option will be ignored if method is kde. If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator).
  • hist_min (numpy.float64, optional) – Minimum of the distributions’ values to use. This will be ignored if method is ‘kde’.
  • hist_max (numpy.float64, optional) – Maximum of the distributions’ values to use. This will be ignored if method is ‘kde’.
Returns:

pdf – Discrete probability distribution calculated from samples.

Return type:

numpy.array

pycbc.inference.entropy.entropy(pdf1, base=2.718281828459045)[source]

Computes the information entropy for a single parameter from one probability density function.

Parameters:
  • pdf1 (numpy.array) – Probability density function.
  • base ({numpy.e, numpy.float64}, optional) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).
Returns:

The information entropy value.

Return type:

numpy.float64
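For a discrete pdf, the quantity computed is the Shannon entropy, H = -Σ p log(p), expressed in the chosen base. A minimal sketch, assuming the input is normalised before use (entropy_sketch is a hypothetical name):

```python
import math

def entropy_sketch(pdf, base=math.e):
    """Shannon entropy of a discrete pdf, in units set by base."""
    total = sum(pdf)
    probs = [p / total for p in pdf]
    # Empty bins contribute nothing, since p*log(p) -> 0 as p -> 0.
    nats = -sum(p * math.log(p) for p in probs if p > 0)
    return nats / math.log(base)
```

A uniform distribution over four bins carries two bits of entropy, so entropy_sketch([0.25] * 4, base=2) evaluates to 2.0.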

pycbc.inference.entropy.js(samples1, samples2, kde=False, bins=None, hist_min=None, hist_max=None, base=2.718281828459045)[source]

Computes the Jensen-Shannon divergence for a single parameter from two distributions.

Parameters:
  • samples1 (numpy.array) – Samples.
  • samples2 (numpy.array) – Samples.
  • kde (bool) – Set to True to estimate the probability density function using kernel density estimation (KDE).
  • bins (int or str, optional) – If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.
  • hist_min (numpy.float64) – Minimum of the distributions’ values to use. This will be ignored if kde=True.
  • hist_max (numpy.float64) – Maximum of the distributions’ values to use. This will be ignored if kde=True.
  • base (numpy.float64) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).
Returns:

The Jensen-Shannon divergence value.

Return type:

numpy.float64
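The histogram route (kde=False) can be sketched end to end in plain Python: bin both sample sets on common edges, form the mixture m = (p + q) / 2, and average the two KL terms against it. Everything named here (histogram_pdf, js_sketch, the default of 10 bins) is an assumption for illustration; the documented default bin method is ‘fd’.

```python
import math

def histogram_pdf(samples, bins, hist_min, hist_max):
    """Fraction of samples in each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (hist_max - hist_min) / bins
    for x in samples:
        if hist_min <= x <= hist_max:
            # The right edge belongs to the last bin.
            counts[min(int((x - hist_min) / width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the histogram range")
    return [c / total for c in counts]

def js_sketch(samples1, samples2, bins=10, hist_min=None, hist_max=None,
              base=math.e):
    lo = min(min(samples1), min(samples2)) if hist_min is None else hist_min
    hi = max(max(samples1), max(samples2)) if hist_max is None else hist_max
    pdf1 = histogram_pdf(samples1, bins, lo, hi)
    pdf2 = histogram_pdf(samples2, bins, lo, hi)
    js = 0.0
    for p, q in zip(pdf1, pdf2):
        m = 0.5 * (p + q)
        # Half of KL(p || m) plus half of KL(q || m), bin by bin.
        if p > 0:
            js += 0.5 * p * math.log(p / m)
        if q > 0:
            js += 0.5 * q * math.log(q / m)
    return js / math.log(base)
```

Unlike the KL divergence, this quantity is symmetric in its two inputs and is bounded above by log(2) in the chosen base.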

pycbc.inference.entropy.kl(samples1, samples2, pdf1=False, pdf2=False, kde=False, bins=None, hist_min=None, hist_max=None, base=2.718281828459045)[source]

Computes the Kullback-Leibler divergence for a single parameter from two distributions.

Parameters:
  • samples1 (numpy.array) – Samples or probability density function (for the latter must also set pdf1=True).
  • samples2 (numpy.array) – Samples or probability density function (for the latter must also set pdf2=True).
  • pdf1 (bool) – Set to True if samples1 is a probability density function already.
  • pdf2 (bool) – Set to True if samples2 is a probability density function already.
  • kde (bool) – Set to True if at least one of pdf1 or pdf2 is False to estimate the probability density function using kernel density estimation (KDE).
  • bins (int or str, optional) – If int, number of equal-width bins to use when calculating probability density function from a set of samples of the distribution. If str, it should be one of the methods to calculate the optimal bin width available in numpy.histogram: [‘auto’, ‘fd’, ‘doane’, ‘scott’, ‘stone’, ‘rice’, ‘sturges’, ‘sqrt’]. Default is ‘fd’ (Freedman Diaconis Estimator). This option will be ignored if kde=True.
  • hist_min (numpy.float64) – Minimum of the distributions’ values to use. This will be ignored if kde=True.
  • hist_max (numpy.float64) – Maximum of the distributions’ values to use. This will be ignored if kde=True.
  • base (numpy.float64) – The logarithmic base to use (choose base 2 for information measured in bits, default is nats).
Returns:

The Kullback-Leibler divergence value.

Return type:

numpy.float64
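When both inputs are already discrete pdfs (pdf1=True and pdf2=True), the divergence reduces to Σ p log(p / q). The sketch below assumes both pdfs share the same bins and are normalised before use; kl_sketch is a hypothetical name.

```python
import math

def kl_sketch(pdf1, pdf2, base=math.e):
    """KL divergence D(pdf1 || pdf2) for pdfs defined on the same bins."""
    norm1, norm2 = sum(pdf1), sum(pdf2)
    total = 0.0
    for p, q in zip(pdf1, pdf2):
        p, q = p / norm1, q / norm2
        if p == 0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q == 0:
            return math.inf  # pdf1 has weight where pdf2 has none
        total += p * math.log(p / q)
    return total / math.log(base)
```

The divergence is not symmetric: kl_sketch([1, 0], [0.5, 0.5], base=2) is one bit, while swapping the arguments gives infinity.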

pycbc.inference.evidence module

This module provides functions for estimating the marginal likelihood or evidence of a model.

pycbc.inference.evidence.arithmetic_mean_estimator(log_likelihood)[source]

Returns the log evidence via the prior arithmetic mean estimator (AME).

The logarithm form of AME is used. This is the most basic evidence estimator, and often requires O(billions) of samples from the prior.

Parameters: log_likelihood (1d array of floats) – The log likelihood of the data sampled from the prior distribution.
Returns: Estimation of the log of the evidence.
Return type: float
pycbc.inference.evidence.harmonic_mean_estimator(log_likelihood)[source]

Returns the log evidence via posterior harmonic mean estimator (HME).

The logarithm form of HME is used. This method is not recommended for general use: it is very slow to converge, formally has infinite variance, and is very error prone.

Parameters: log_likelihood (1d array of floats) – The log likelihood of the data sampled from the posterior distribution.
Returns: Estimation of the log of the evidence.
Return type: float
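In logarithmic form, the arithmetic and harmonic mean estimators are both a log-mean-exp of the input. The following sketch uses hypothetical names (log_mean_exp, arithmetic_mean_sketch, harmonic_mean_sketch) and plain lists; it illustrates the formulas and makes no claim about how the library evaluates them.

```python
import math

def log_mean_exp(values):
    """log(mean(exp(values))), shifted by the maximum to avoid overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def arithmetic_mean_sketch(log_likelihood):
    """log Z from prior samples: log of the mean likelihood."""
    return log_mean_exp(log_likelihood)

def harmonic_mean_sketch(log_likelihood):
    """log Z from posterior samples: log of the harmonic mean likelihood."""
    return -log_mean_exp([-ll for ll in log_likelihood])
```

For likelihood values 1 and 3, the arithmetic mean is 2 and the harmonic mean is 1.5, so the two functions return log(2) and log(1.5) respectively.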
pycbc.inference.evidence.stepping_stone_algorithm(log_likelihood, betas)[source]

Returns the log evidence of the model via stepping stone algorithm. Also returns an estimated standard deviation for the log evidence.

Parameters:
  • log_likelihood (3d array of shape (betas, walker, iteration)) – The log likelihood for each temperature separated by temperature, walker, and iteration.
  • betas (1d array) – The inverse temperatures used in the MCMC.
Returns:

  • log_evidence (float) – Estimation of the log of the evidence.
  • mcmc_std (float) – The standard deviation of the log evidence estimate from Monte-Carlo spread.
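The stepping stone estimate is a sum over adjacent temperatures of log-mean-exp terms, each using samples from the lower inverse temperature. The sketch below is a simplified illustration: stepping_stone_sketch is a hypothetical name, walkers and iterations are assumed to be flattened into one list per temperature, and the standard deviation estimate is omitted.

```python
import math

def log_mean_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def stepping_stone_sketch(log_likelihood, betas):
    """log_likelihood[k] holds every log-likelihood sample drawn at betas[k]."""
    # Order the temperatures from beta = 0 (prior) to beta = 1 (posterior).
    order = sorted(range(len(betas)), key=lambda k: betas[k])
    log_evidence = 0.0
    for lower, upper in zip(order[:-1], order[1:]):
        delta = betas[upper] - betas[lower]
        # Ratio of normalisations between adjacent temperatures,
        # estimated with samples from the lower-beta chain.
        log_evidence += log_mean_exp(
            [delta * ll for ll in log_likelihood[lower]])
    return log_evidence
```

With only the two temperatures beta = 0 and beta = 1, this reduces to the prior arithmetic mean estimator.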

pycbc.inference.evidence.thermodynamic_integration(log_likelihood, betas, method='simpsons')[source]

Returns the log evidence of the model via thermodynamic integration. Also returns an estimated standard deviation for the log evidence.

Current options are integration through the trapezoid rule, a first-order corrected trapezoid rule, and Simpson’s rule.

Parameters:
  • log_likelihood (3d array of shape (betas, walker, iteration)) – The log likelihood for each temperature separated by temperature, walker, and iteration.
  • betas (1d array) – The inverse temperatures used in the MCMC.
  • method ({"trapezoid", "trapezoid_corrected", "simpsons"}, optional) – The numerical integration method to use for the thermodynamic integration. Choices are “trapezoid”, “trapezoid_corrected”, and “simpsons”, for the trapezoid rule, the first-order correction to the trapezoid rule, and Simpson’s rule, respectively. Default is “simpsons”.
Returns:

  • log_evidence (float) – Estimation of the log of the evidence.
  • mcmc_std (float) – The standard deviation of the log evidence estimate from Monte-Carlo spread.
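Thermodynamic integration rests on the identity log Z = ∫ E_β[log L] dβ, integrated over β from 0 to 1. A minimal sketch of the plain trapezoid variant is given below; thermodynamic_integration_sketch is a hypothetical name, samples are assumed to be flattened per temperature, and neither the corrected trapezoid rule, Simpson’s rule, nor the error estimate is reproduced.

```python
from statistics import fmean

def thermodynamic_integration_sketch(log_likelihood, betas):
    """Trapezoid-rule integral of the mean log likelihood over beta."""
    # Integrate from the smallest to the largest inverse temperature.
    order = sorted(range(len(betas)), key=lambda k: betas[k])
    xs = [betas[k] for k in order]
    ys = [fmean(log_likelihood[k]) for k in order]
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))
```

The accuracy of the result depends on how densely the betas sample the region where the mean log likelihood changes quickly, which is typically near beta = 0.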

pycbc.inference.gelman_rubin module

This module provides functions for evaluating the Gelman-Rubin convergence diagnostic statistic.

pycbc.inference.gelman_rubin.gelman_rubin(chains, auto_burn_in=True)[source]

Calculates the univariate Gelman-Rubin convergence statistic, which compares multiple chains in a Markov chain Monte Carlo process to assess their convergence. The between-chain and within-chain variances are computed for each sampling parameter, and a weighted combination of the two is used to determine the convergence. As the chains converge, the potential scale reduction factor should go to 1.

Parameters:
  • chains (iterable) – An iterable of numpy.array instances that contain the samples for each chain. Each chain has shape (nparameters, niterations).
  • auto_burn_in (bool) – If True, then only use later half of samples provided.
Returns:

psrf – A numpy.array of shape (nparameters) that has the point estimates of the potential scale reduction factor.

Return type:

numpy.array
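The textbook form of the statistic for a single parameter can be sketched as follows. This is the basic estimator only; psrf_sketch is a hypothetical name, no burn-in is discarded, and any additional correction factors the library may apply are not included.

```python
import math
from statistics import fmean, variance

def psrf_sketch(chains):
    """chains: list of equal-length sample lists for ONE parameter."""
    n = len(chains[0])
    # W: average of the within-chain sample variances.
    within = fmean([variance(chain) for chain in chains])
    # B / n: sample variance of the chain means.
    between_over_n = variance([fmean(chain) for chain in chains])
    # Weighted combination of the two variance estimates.
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)
```

Chains that sample the same region give a value close to 1, whereas chains stuck around different means give a value well above 1.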

pycbc.inference.gelman_rubin.walk(chains, start, end, step)[source]

Calculates the Gelman-Rubin convergence statistic along chains of data. This function will advance along the chains and calculate the statistic for each step.

Parameters:
  • chains (iterable) – An iterable of numpy.array instances that contain the samples for each chain. Each chain has shape (nparameters, niterations).
  • start (float) – Start index of blocks to calculate all statistics.
  • end (float) – Last index of blocks to calculate statistics.
  • step (float) – Step size to take for next block.
Returns:

  • starts (numpy.array) – 1-D array of start indexes of calculations.
  • ends (numpy.array) – 1-D array of end indexes of calculations.
  • stats (numpy.array) – Array with convergence statistic. It has shape (nparameters, ncalculations).

pycbc.inference.geweke module

Functions for computing the Geweke convergence statistic.

pycbc.inference.geweke.geweke(x, seg_length, seg_stride, end_idx, ref_start, ref_end=None, seg_start=0)[source]

Calculates the Geweke convergence statistic for a chain of data. This function will advance along the chain and calculate the statistic for each step.

Parameters:
  • x (numpy.array) – A one-dimensional array of data.
  • seg_length (int) – Number of samples to use for each Geweke calculation.
  • seg_stride (int) – Number of samples to advance before next Geweke calculation.
  • end_idx (int) – Index of last start.
  • ref_start (int) – Index of beginning of end reference segment.
  • ref_end (int) – Index of end of end reference segment. Default is None which will go to the end of the data array.
  • seg_start (int) – What index to start computing the statistic. Default is 0 which will go to the beginning of the data array.
Returns:

  • starts (numpy.array) – The start index of the first segment in the chain.
  • ends (numpy.array) – The end index of the first segment in the chain.
  • stats (numpy.array) – The Geweke convergence diagnostic statistic for the segment.
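The statistic compares the mean of each early segment with the mean of a reference segment at the end of the chain, scaled by their combined variance. The sketch below is illustrative only: geweke_sketch is a hypothetical name, end_idx is treated as exclusive, and population variances are used, all of which are assumptions.

```python
import math
from statistics import fmean, pvariance

def geweke_sketch(x, seg_length, seg_stride, end_idx, ref_start,
                  ref_end=None, seg_start=0):
    # Reference segment taken from the end of the chain.
    ref = x[ref_start:ref_end]
    ref_mean, ref_var = fmean(ref), pvariance(ref)
    starts = list(range(seg_start, end_idx, seg_stride))
    ends, stats = [], []
    for start in starts:
        seg = x[start:start + seg_length]
        # Difference of means in units of the combined standard deviation.
        stats.append((fmean(seg) - ref_mean)
                     / math.sqrt(pvariance(seg) + ref_var))
        ends.append(start + seg_length)
    return starts, ends, stats
```

Values near zero indicate that a segment is statistically similar to the reference segment; large magnitudes suggest the chain had not yet settled at that point.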

pycbc.inference.option_utils module

This module contains standard options used for inference-related programs.

class pycbc.inference.option_utils.ParseLabelArg(type=<type 'str'>, nargs=None, **kwargs)[source]

Bases: argparse.Action

Argparse action that will parse arguments that can accept labels.

This assumes that the values set on the command line for its assigned argument are strings formatted like PARAM[:LABEL]. When the arguments are parsed, the LABEL bit is stripped off and added to a dictionary mapping PARAM -> LABEL. This dictionary is stored to the parsed namespace called {dest}_labels, where {dest} is the argument’s dest setting (by default, this is the same as the option string). Likewise, the argument’s dest in the parsed namespace is updated so that it is just PARAM.

If no LABEL is provided, then PARAM will be used for LABEL.

This action can work on arguments that have nargs != 0 and type set to str.
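The behaviour described above can be imitated with a small custom action. The class below is a hypothetical stand-in written for illustration, not the pycbc class; in particular, splitting on the last colon is an assumption.

```python
import argparse

class LabelArgSketch(argparse.Action):
    """Hypothetical action that parses PARAM[:LABEL] values."""

    def __call__(self, parser, namespace, values, option_string=None):
        single = isinstance(values, str)
        items = [values] if single else list(values)
        params, labels = [], {}
        for item in items:
            # Split off the label; PARAM is its own label if none is given.
            param, sep, label = item.rpartition(':')
            if not sep:
                param, label = item, item
            params.append(param)
            labels[param] = label
        # Store the bare PARAM values under dest, the mapping under dest_labels.
        setattr(namespace, self.dest, params[0] if single else params)
        setattr(namespace, self.dest + '_labels', labels)

parser = argparse.ArgumentParser()
parser.add_argument('--parameters', nargs='+', action=LabelArgSketch)
parser.add_argument('--z-arg', action=LabelArgSketch)
opts = parser.parse_args(['--parameters', 'mass1:m1', 'ra',
                          '--z-arg', 'foo:bar'])
```

After parsing, opts.parameters holds the bare names and opts.parameters_labels maps each name to its label. In this sketch the labels attribute exists only if the option was given on the command line.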

class pycbc.inference.option_utils.ParseParametersArg(type=<type 'str'>, nargs=None, **kwargs)[source]

Bases: pycbc.inference.option_utils.ParseLabelArg

Argparse action that will parse parameters and labels from an option.

Does the same as ParseLabelArg, with the additional functionality that if LABEL is a known parameter in pycbc.waveform.parameters, then the label attribute there will be used in the labels dictionary. Otherwise, LABEL will be used.

Examples

Create a parser and add two arguments that use this action (note that the first argument accepts multiple inputs while the second only accepts a single input):

>>> import argparse
>>> from pycbc.inference.option_utils import ParseParametersArg
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--parameters', type=str, nargs="+",
...                     action=ParseParametersArg)
>>> parser.add_argument('--z-arg', type=str, action=ParseParametersArg)

Parse a command line that uses these options:

>>> import shlex
>>> cli = "--parameters 'mass1+mass2:mtotal' ra ni --z-arg foo:bar"
>>> opts = parser.parse_args(shlex.split(cli))
>>> opts.parameters
['mass1+mass2', 'ra', 'ni']
>>> opts.parameters_labels
{'mass1+mass2': '$M~(\mathrm{M}_\odot)$', 'ni': 'ni', 'ra': '$\alpha$'}
>>> opts.z_arg
'foo'
>>> opts.z_arg_labels
{'foo': 'bar'}

In the above, the label given for the first argument to --parameters was mtotal. Since this is a recognized parameter in pycbc.waveform.parameters, the label dictionary contains the latex string associated with the mtotal parameter. A label was not provided for the second argument, and so ra was used. Since ra is also a recognized parameter, its associated latex string was used in the labels dictionary. Since ni and bar (the label for z-arg) are not recognized parameters, they were just used as-is in the labels dictionaries.

pycbc.inference.option_utils.add_density_option_group(parser)[source]

Adds the options needed to configure contours and density colour map.

Parameters: parser (object) – ArgumentParser instance.
pycbc.inference.option_utils.add_injsamples_map_opt(parser)[source]

Adds option to parser to specify a mapping between injection parameters and sample parameters.

pycbc.inference.option_utils.add_plot_posterior_option_group(parser)[source]

Adds the options needed to configure plots of posterior results.

Parameters: parser (object) – ArgumentParser instance.
pycbc.inference.option_utils.add_scatter_option_group(parser)[source]

Adds the options needed to configure scatter plots.

Parameters: parser (object) – ArgumentParser instance.
pycbc.inference.option_utils.expected_parameters_from_cli(opts)[source]

Parses the --expected-parameters arguments from the plot_posterior option group.

Parameters: opts (ArgumentParser) – The parsed arguments from the command line.
Returns: Dictionary of parameter name -> expected value. Only parameters that were specified in the --expected-parameters option will be included; if no parameters were provided, an empty dictionary is returned.
Return type: dict
pycbc.inference.option_utils.plot_ranges_from_cli(opts)[source]

Parses the mins and maxs arguments from the plot_posterior option group.

Parameters: opts (ArgumentParser) – The parsed arguments from the command line.
Returns:
  • mins (dict) – Dictionary of parameter name -> specified mins. Only parameters that were specified in the --mins option will be included; if no parameters were provided, an empty dictionary is returned.
  • maxs (dict) – Dictionary of parameter name -> specified maxs. Only parameters that were specified in the --maxs option will be included; if no parameters were provided, an empty dictionary is returned.
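A sketch of this kind of parsing is shown below. The PARAM:VALUE format of each argument is an assumption, and parse_param_values is a hypothetical helper; only the --mins and --maxs option names come from the description above.

```python
import argparse

def parse_param_values(items, option_name):
    """Turn ['mass1:10', ...] into {'mass1': 10.0, ...}."""
    parsed = {}
    for item in items or []:
        parts = item.split(':')
        if len(parts) != 2:
            raise ValueError("option %s expects PARAM:VALUE, got %r"
                             % (option_name, item))
        parsed[parts[0]] = float(parts[1])
    return parsed

parser = argparse.ArgumentParser()
parser.add_argument('--mins', nargs='+', default=[])
parser.add_argument('--maxs', nargs='+', default=[])
opts = parser.parse_args(['--mins', 'mass1:10',
                          '--maxs', 'mass1:80', 'mass2:40'])
mins = parse_param_values(opts.mins, '--mins')
maxs = parse_param_values(opts.maxs, '--maxs')
```

Parameters that do not appear in an option are simply absent from the corresponding dictionary, and an option that is not given yields an empty dictionary.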
pycbc.inference.option_utils.prior_from_config(cp, prior_section='prior')[source]

Loads a prior distribution from the given config file.

Parameters:
  • cp (pycbc.workflow.WorkflowConfigParser) – The config file to read.
  • prior_section (str, optional) – The name of the section to retrieve the prior from. Default is ‘prior’, in which case sections starting with ‘prior’ are read.
Returns:

The prior distribution.

Return type:

distributions.JointDistribution
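To illustrate the section lookup, the sketch below uses the standard library configparser in place of WorkflowConfigParser. The [prior-PARAM] naming and the option names in the sample text are assumptions made for this example, and prior_sections is a hypothetical helper that only lists the matching sections; it does not build a distribution.

```python
import configparser

CONFIG_TEXT = """
[variable_params]
mass1 =
mass2 =

[prior-mass1]
name = uniform
min-mass1 = 10
max-mass1 = 80

[prior-mass2]
name = uniform
min-mass2 = 10
max-mass2 = 80
"""

def prior_sections(cp, prior_section='prior'):
    """Names of the sections that belong to the given prior section."""
    return [name for name in cp.sections()
            if name == prior_section
            or name.startswith(prior_section + '-')]

cp = configparser.ConfigParser()
cp.read_string(CONFIG_TEXT)
sections = prior_sections(cp)
```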

Module contents