modeling submodule#

class footprint_tools.modeling.bias.bias_model#
offset()#
predict(probs, n=100)#

Compute cleavage propensities from sequence

Parameters
probs: numpy.ndarray

An array of probabilities (relative values)

n: int

Total number of tags to distribute
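
Example (illustrative sketch). The snippet below shows one way relative values and a total tag count can be combined: normalize the values, then scale them to n. That predict performs a proportional allocation of this kind is an assumption, and the helper name distribute_tags is hypothetical; it is not part of footprint_tools.

```python
def distribute_tags(probs, n=100):
    """Spread n tags across positions in proportion to relative values.

    Hypothetical helper for illustration; not the footprint_tools code.
    """
    total = float(sum(probs))
    if total == 0:
        # No signal to apportion: every position gets zero expected tags.
        return [0.0 for _ in probs]
    return [n * p / total for p in probs]


expected = distribute_tags([1.0, 3.0, 4.0], n=16)
print(expected)  # [2.0, 6.0, 8.0]
```

The input values only need to be relative; they do not have to sum to one.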

shuffle()#

Randomly shuffle the bias model

Returns
model: bias_model

A shuffled bias model

class footprint_tools.modeling.bias.kmer_model(filepath)#
probs(seq)#

Generate cleavage preference array from DNA sequence

Parameters
seq: str

A DNA sequence for which to compute relative sequence preference

Returns
out: numpy.ndarray

Relative sequence preference
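
Example (illustrative sketch). A k-mer model can be pictured as a lookup table from k-mer strings to relative cleavage preferences, applied to every window of the sequence. The table layout, the fallback for unknown k-mers, and the helper name kmer_preferences are assumptions made for this sketch; the actual file format and edge handling of kmer_model are not documented here.

```python
def kmer_preferences(seq, table, default=0.0):
    """Look up a relative preference for every k-mer window in seq.

    `table` maps k-mer strings to relative cleavage preferences; all keys
    are assumed to share one length k. Hypothetical helper, for illustration.
    """
    k = len(next(iter(table)))
    seq = seq.upper()
    # One value per full-length window; windows missing from the table
    # (for example those containing N) fall back to `default`.
    return [table.get(seq[i:i + k], default) for i in range(len(seq) - k + 1)]


toy_table = {"AC": 0.5, "CG": 2.0, "GT": 1.0}  # made-up values
prefs = kmer_preferences("acgtn", toy_table)
print(prefs)  # [0.5, 2.0, 1.0, 0.0]
```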

read_model(filepath)#

Read the k-mer model from a file.

Parameters
filepath: str

Path to a k-mer model file

class footprint_tools.modeling.bias.uniform_model#
probs(seq)#

This module contains classes and functions that implement a dispersion model.

footprint_tools.modeling.dispersion.base64decode()#
footprint_tools.modeling.dispersion.base64encode()#
class footprint_tools.modeling.dispersion.dispersion_model#

Dispersion model class

fit_mu()#

Computes the fitted mu term for the negative binomial from a piece-wise linear fit.

Parameters
x: float

Returns
mu: float

mu computed from the regression fit

fit_r()#

Computes the dispersion term for the negative binomial from a piece-wise linear fit. Note that the model parameters estimate the inverse.

Parameters
x: float

Returns
r: float

r computed from the regression fit
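
Example (illustrative sketch). fit_mu and fit_r both evaluate a piece-wise linear fit at x. The snippet below evaluates a continuous piece-wise linear function and then inverts the result, mirroring the note that the model parameters estimate the inverse of r. The parameter layout (breakpoints, slopes, intercept), the assumption that x is non-negative, and the helper name piecewise_linear are all hypothetical; the layout of mu_params and r_params is not documented here.

```python
def piecewise_linear(x, breakpoints, slopes, intercept):
    """Evaluate a continuous piece-wise linear function at x (x >= 0).

    breakpoints: sorted, positive x positions where the slope changes
    slopes: one slope per segment (len(breakpoints) + 1 values)
    intercept: function value at x = 0
    The parameter layout is hypothetical, chosen for illustration.
    """
    y = intercept
    prev = 0.0
    for bp, slope in zip(breakpoints, slopes):
        if x <= bp:
            return y + slope * (x - prev)
        # Accumulate the full rise of this segment and move to the next one.
        y += slope * (bp - prev)
        prev = bp
    return y + slopes[-1] * (x - prev)


# Made-up parameters: slope 0.1 up to x = 10, slope 0.02 afterwards.
inv_r = piecewise_linear(15.0, breakpoints=[10.0], slopes=[0.1, 0.02], intercept=0.5)
r = 1.0 / inv_r  # the fit is assumed to estimate 1/r
```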

h#

Histogram of observed cleavages at each predicted cleavage rate

log_pmf_values()#

Compute the log probability mass function

Parameters
exp: numpy.ndarray

Expected cleavage counts

obs: numpy.ndarray

Observed cleavage counts

Returns
logp: numpy.ndarray

Array of log probability mass function values computed from the expected cleavage distributions
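
Example (illustrative sketch). For reference, the log probability mass function of a negative binomial can be written with log-gamma terms. The snippet assumes the parameterization used by scipy.stats.nbinom (k failures before r successes, success probability p, with 0 < p < 1); whether dispersion_model uses the same convention, and how r and p are derived from the expected count, are not shown here. The helper name nbinom_log_pmf is hypothetical.

```python
import math


def nbinom_log_pmf(k, r, p):
    """Log PMF of a negative binomial: k failures before r successes,
    success probability p (the convention used by scipy.stats.nbinom)."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p))


# The PMF should sum to (nearly) one over a long enough range of counts.
total = sum(math.exp(nbinom_log_pmf(k, r=2.5, p=0.4)) for k in range(500))
```

Working in log space avoids underflow when the observed count is far from the expected count.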

log_pmf_values_0()#

Compute the log probability mass function, storing the result via a pointer

Parameters
exp: numpy.ndarray

Expected cleavage counts

obs: numpy.ndarray

Observed cleavage counts

Returns
logp: numpy.ndarray (memoryview)

Array pointer to log probability mass function values computed from the expected cleavage distributions

Notes

This function is equivalent to log_pmf_values, except it stores values to a matrix pointer

metadata#
mu_params#
p#

Array of the negative binomial MLE fit parameters p

p_values()#

Compute cumulative distribution (lower-tail p-value) from negative binomial

Parameters
exp: numpy.ndarray

Expected cleavage counts

obs: numpy.ndarray

Observed cleavage counts

Returns
pvals: numpy.ndarray

Array of p-values
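
Example (illustrative sketch). A lower-tail p-value is the cumulative probability P(X <= obs), so a small value means the observed count is low relative to the model. The snippet accumulates the negative binomial PMF with its recurrence, assuming the scipy.stats.nbinom parameterization (size r, success probability p). The helper name nbinom_lower_tail is hypothetical and the library's own implementation may differ.

```python
def nbinom_lower_tail(obs, r, p):
    """P(X <= obs) for a negative binomial with size r and success
    probability p, accumulated with the PMF recurrence."""
    term = p ** r  # P(X = 0)
    total = term
    for k in range(obs):
        # P(X = k + 1) = P(X = k) * (k + r) / (k + 1) * (1 - p)
        term *= (k + r) / (k + 1) * (1.0 - p)
        total += term
    return total


pval = nbinom_lower_tail(2, r=1.0, p=0.5)
print(pval)  # 0.875
```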

pmf_values()#

Compute the probability mass function

Parameters
exp: numpy.ndarray

Expected cleavage counts

obs: numpy.ndarray

Observed cleavage counts

Returns
p: numpy.ndarray

Array of probability mass function values computed from the expected cleavage distributions

pmf_values_0()#

Compute the probability mass function, storing the result via a pointer

Parameters
exp: numpy.ndarray

Expected cleavage counts

obs: numpy.ndarray

Observed cleavage counts

Returns
p: numpy.ndarray (memoryview)

Array pointer to probability mass function values computed from the expected cleavage distributions

Notes

This function is equivalent to pmf_values, except it stores values to a matrix pointer

r#

Array of the negative binomial MLE fit parameters r

r_params#
sample()#

Sample counts from negative binomial distribution and compute p-values

Parameters
x: numpy.ndarray

Count values specifying the distribution from which to resample. These are typically expected count values.

times: int

Number of times to sample (per element)

Returns
sampled_counts: numpy.ndarray

Array of sampled counts (2-D array - positions by number of samples)

sampled_pvals: numpy.ndarray

Array of p-values for the sampled counts (2-D array - positions by number of samples)
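
Example (illustrative sketch). Negative binomial counts can be drawn as a gamma-Poisson mixture using only the standard library. The snippet returns a positions-by-samples nested list, matching the shape described above; the p-value step is omitted. It assumes one (r, p) pair per position in the scipy.stats.nbinom convention. How those pairs are derived from expected counts is outside this sketch, and the helper names are hypothetical.

```python
import math
import random


def sample_nbinom(r, p, rng):
    """Draw one negative binomial variate as a gamma-Poisson mixture."""
    lam = rng.gammavariate(r, (1.0 - p) / p)  # shape r, scale (1 - p) / p
    # Knuth's multiplication method for Poisson(lam); fine for modest lam.
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_counts(params, times, seed=0):
    """Return a positions-by-times nested list of sampled counts.

    `params` holds one (r, p) pair per position (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [[sample_nbinom(r, p, rng) for _ in range(times)] for r, p in params]


sampled = sample_counts([(2.0, 0.5), (5.0, 0.5)], times=4, seed=1)
```

A fixed seed makes the draws reproducible, which is convenient when sampled counts feed a downstream null distribution.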

footprint_tools.modeling.dispersion.learn_dispersion_model()#

Learn a dispersion model from the expected vs. observed histogram

Parameters
h: numpy.ndarray

A 2-dimensional array containing the distribution of observed cleavages at each expected cleavage rate

cutoff: int

Minimum number of observed cleavages to perform ML negative binomial fit at each value of expected cleavages

trim: tuple (float)

Percent of data to trim from the observed cleavage count (to mitigate outlier effects)

Returns
model: dispersion_model

A dispersion model learned from observed and expected counts
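
Example (illustrative sketch). The function above performs a maximum-likelihood fit; the snippet below deliberately swaps in the simpler method of moments to show how one row of the histogram can yield negative binomial parameters. It assumes row[k] counts the positions with k observed cleavages, treats cutoff as a minimum number of observations in the row, skips trimming, and uses the scipy.stats.nbinom convention (mean = r(1-p)/p, variance = r(1-p)/p^2). The helper name moments_fit is hypothetical.

```python
def moments_fit(row, cutoff=10):
    """Estimate (r, p) from one histogram row by the method of moments.

    row[k] is the number of positions with k observed cleavages. Returns
    None when the row has fewer than `cutoff` observations or is not
    overdispersed (variance <= mean), where the estimate is undefined.
    """
    n = sum(row)
    if n < cutoff:
        return None
    mean = sum(k * c for k, c in enumerate(row)) / n
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(row)) / n
    if var <= mean:
        return None
    # Solve mean = r(1-p)/p and var = r(1-p)/p^2 for r and p.
    return mean * mean / (var - mean), mean / var


fit = moments_fit([4, 0, 0, 0, 4], cutoff=5)
print(fit)  # (2.0, 0.5)
```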

footprint_tools.modeling.dispersion.load_dispersion_model()#

Load a dispersion model encoded in JSON format

Parameters
filename: str

Path to JSON-format dispersion model

Returns
model: dispersion_model

A dispersion model loaded from file

footprint_tools.modeling.dispersion.piecewise_five()#
footprint_tools.modeling.dispersion.piecewise_four()#
footprint_tools.modeling.dispersion.piecewise_three()#
footprint_tools.modeling.dispersion.write_dispersion_model()#

Write a JSON format dispersion model

Parameters
model: dispersion_model

An instance of dispersion_model

Returns
out: str

JSON-formatted dump of dispersion model
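
Example (illustrative sketch). The module exposes base64encode and base64decode next to the JSON reader and writer, which suggests that numeric arrays are stored as base64 text inside the JSON document. The snippet shows such a round trip with the standard library only. The JSON layout, the byte packing, and the helper names encode_floats and decode_floats are assumptions; the actual on-disk schema is not documented here.

```python
import base64
import json
import struct


def encode_floats(values):
    """Pack floats as little-endian doubles and base64-encode them."""
    raw = struct.pack("<%dd" % len(values), *values)
    return base64.b64encode(raw).decode("ascii")


def decode_floats(text):
    """Reverse encode_floats: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# Toy model; field names echo the documented attributes, layout is made up.
model = {"mu_params": encode_floats([0.5, 1.25]), "metadata": {"note": "toy"}}
dumped = json.dumps(model, indent=2)   # JSON-formatted string
restored = json.loads(dumped)
values = decode_floats(restored["mu_params"])
```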

class footprint_tools.modeling.predict.prediction(read_func, fasta_func, bm, half_win_width=5, smoothing_half_win_width=0, smoothing_clip=0.01)#
Class that holds a wrapper function to compute the expected cleavage counts

Attributes
bm: bias.bias_model

Sequence bias model to apply

read_func: cutcounts.bamfile

Cut-counts reader

fasta_func: pysam.FastaFile

FASTA-file reader

half_win_width: int

Half width W of the window used to apply the bias model (final window size = 2W+1)

padding: int

Padding applied to region when retrieving per-nucleotide data

smoothing_clip: float

Fraction of nucleotides to trim when computing smoothed mean

smoothing_half_win_width: int

Half width of window used to compute windowed tag counts

Half width of window used to compute windowed tag counts

compute(x)#

Compute expected cleavage counts

Parameters
x: genome_tools.genomic_interval

Genomic region to generate predicted cleavages

Returns
out: tuple of dict

Observed, expected and windowed cleavage counts
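
Example (illustrative sketch). One plausible reading of the attributes above is that the tags observed in a window of 2W+1 nucleotides are redistributed across that window in proportion to the relative bias values. The snippet implements that reading on plain lists. It is an assumption, not a description of what compute does internally: strand handling, padding, and smoothing are omitted, windows are truncated at the edges, and the helper name expected_counts is hypothetical.

```python
def expected_counts(observed, bias, half_win_width=5):
    """Redistribute windowed tag totals according to relative bias values."""
    n = len(observed)
    out = []
    for i in range(n):
        # Window of up to 2W+1 positions centered on i, truncated at the edges.
        lo, hi = max(0, i - half_win_width), min(n, i + half_win_width + 1)
        window_tags = sum(observed[lo:hi])
        window_bias = sum(bias[lo:hi])
        out.append(window_tags * bias[i] / window_bias if window_bias else 0.0)
    return out


# Four tags at the central position, with bias favoring that position.
exp_counts = expected_counts([0, 4, 0], [1.0, 2.0, 1.0], half_win_width=1)
```

With a uniform bias the expected count at each position reduces to the mean tag count in its window.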

footprint_tools.modeling.predict.reverse_complement()#

Computes reverse complement of a DNA sequence

Parameters
seq: str

DNA sequence string

Returns
out: str

Reverse complement of seq
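
Example (illustrative sketch). A reverse complement can be computed with a translation table followed by a string reversal. This standalone version is for illustration and is not the library's implementation; the name reverse_complement_sketch is hypothetical, and the handling of N and lowercase bases is an assumption.

```python
# Map each base to its complement; N stays N, and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


rc = reverse_complement_sketch("ACGTN")
print(rc)  # NACGT
```

Applying the function twice returns the original sequence.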