modeling submodule
- class footprint_tools.modeling.bias.bias_model
- offset()
- predict(probs, n=100)
Compute cleavage propensities from sequence
- Parameters
- probs: numpy.ndarray
An array of probabilities (relative values)
- n: int
Total number of tags to distribute
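As a rough illustration of what distributing n tags over relative values can look like, the sketch below scales a list of relative probabilities so that the expected counts sum to n. The helper name distribute_tags is hypothetical and is not part of footprint_tools; the library's predict may differ in detail.

```python
def distribute_tags(probs, n=100):
    """Scale relative probabilities so the expected counts sum to n.

    Hypothetical helper for illustration; not the footprint_tools implementation.
    """
    total = sum(probs)
    if total == 0:
        # No signal to distribute: return zeros rather than dividing by zero.
        return [0.0 for _ in probs]
    return [n * p / total for p in probs]


expected = distribute_tags([1.0, 3.0, 4.0], n=80)
print(expected)  # [10.0, 30.0, 40.0]
```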
- shuffle()
Randomly shuffle the bias model
- Returns
- model: bias_model
A shuffled bias model
- class footprint_tools.modeling.bias.kmer_model(filepath)
- probs(seq)
Generate cleavage preference array from DNA sequence
- Parameters
- seq: str
A DNA sequence for which to compute relative sequence preferences
- Returns
- out: numpy.ndarray
Relative sequence preference
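To make the idea of a k-mer preference lookup concrete, here is a minimal sketch that slides a window of length k along a sequence and looks each k-mer up in a table. The function name kmer_preferences, the dictionary-based table, and the handling of unknown k-mers are all assumptions made for this example; the actual model file format and how the library aligns values to positions (see offset()) are not described in this reference.

```python
def kmer_preferences(seq, table, k):
    """Look up a relative preference for every k-mer window in seq.

    `table` maps k-mer strings to relative preference values (hypothetical
    format). Windows with k-mers missing from the table (e.g. containing 'N')
    get 0.0.
    """
    seq = seq.upper()
    return [table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1)]


toy_table = {"AC": 0.5, "CG": 2.0, "GT": 1.0}
print(kmer_preferences("acgtn", toy_table, k=2))  # [0.5, 2.0, 1.0, 0.0]
```

Note that this sketch returns one value per window, so the output is shorter than the input sequence by k - 1.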
- read_model(filepath)
Read the k-mer model from a file.
- Parameters
- filepath: str
Path to a k-mer model file
This module contains classes and functions that implement a dispersion model.
- footprint_tools.modeling.dispersion.base64decode()
- footprint_tools.modeling.dispersion.base64encode()
- class footprint_tools.modeling.dispersion.dispersion_model
Dispersion model class
- fit_mu()
Computes the fitted mu term for the negative binomial from a piece-wise linear fit.
- Parameters
- x: float
- Returns
- mu: float
mu computed from the regression fit
- fit_r()
Computes the dispersion term for the negative binomial from a piece-wise linear fit. Note that the model parameters estimate the inverse.
- Parameters
- x: float
- Returns
- r: float
r computed from the regression fit
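The following sketch shows one way a piece-wise linear fit can be evaluated and then inverted to obtain r. It assumes that "the inverse" in the note above means the fitted curve models 1/r, and it uses a knot-based parameterization (knots_x, knots_y) chosen purely for illustration. Neither the function names nor the parameter layout come from footprint_tools.

```python
from bisect import bisect_right


def piecewise_linear(x, knots_x, knots_y):
    """Evaluate a piece-wise linear function defined by (knots_x, knots_y).

    Values outside the knot range are extrapolated from the nearest segment.
    The knot-based parameterization is an assumption made for this sketch.
    """
    # Pick the segment containing x, clamped to the first/last segment.
    i = min(max(bisect_right(knots_x, x) - 1, 0), len(knots_x) - 2)
    slope = (knots_y[i + 1] - knots_y[i]) / (knots_x[i + 1] - knots_x[i])
    return knots_y[i] + slope * (x - knots_x[i])


def fit_r_sketch(x, knots_x, knots_inv_r):
    """Return r when the fitted curve models 1/r (assumed interpretation)."""
    return 1.0 / piecewise_linear(x, knots_x, knots_inv_r)


knots_x = [0.0, 10.0, 50.0]
knots_inv_r = [1.0, 0.5, 0.25]
print(fit_r_sketch(5.0, knots_x, knots_inv_r))
```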
- h
Histogram of observed cleavages at each predicted cleavage rate
- log_pmf_values()
Compute the log probability mass function
- Parameters
- exp: numpy.ndarray
Expected cleavage counts
- obs: numpy.ndarray
Observed cleavage counts
- Returns
- logp: numpy.ndarray
Array of log probability mass function values computed from the expected cleavage distributions
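For reference, the log probability mass function of a negative binomial can be written with log-gamma terms, as in the sketch below. It parameterizes the distribution by a dispersion r and a mean mu, with p = r / (r + mu). In the class above, mu and r would presumably come from fit_mu and fit_r evaluated at the expected count, but that wiring is an assumption and nb_log_pmf is a hypothetical name, not a library function.

```python
import math


def nb_log_pmf(k, r, mu):
    """Log PMF of a negative binomial with dispersion r and mean mu (mu > 0).

    Uses p = r / (r + mu), so P(k) = C(k + r - 1, k) * p**r * (1 - p)**k.
    """
    p = r / (r + mu)
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log1p(-p)
    )


print(nb_log_pmf(3, r=2.5, mu=4.0))
```

Working in log space avoids underflow when many per-position probabilities are later combined.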
- log_pmf_values_0()
Compute the log probability mass function and store the values to a pointer
- Parameters
- exp: numpy.ndarray
Expected cleavage counts
- obs: numpy.ndarray
Observed cleavage counts
- Returns
- logp: numpy.ndarray (memoryview)
Array pointer to log probability mass function values computed from the expected cleavage distributions
Notes
This function is equivalent to log_pmf_values, except that it stores values to a matrix pointer
- metadata
- mu_params
- p
Array of the negative binomial MLE fit parameters p
- p_values()
Compute cumulative distribution (lower-tail p-value) from negative binomial
- Parameters
- exp: numpy.ndarray
Expected cleavage counts
- obs: numpy.ndarray
Observed cleavage counts
- Returns
- pvals: numpy.ndarray
Array of p-values
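A lower-tail p-value is the probability of seeing a count no larger than the observed one. The sketch below computes it by summing the negative binomial PMF from 0 up to the observed count, using the same (r, mu) parameterization with p = r / (r + mu). The name nb_lower_tail and the scalar interface are assumptions for illustration; the library method operates on arrays.

```python
import math


def nb_lower_tail(obs, r, mu):
    """P(X <= obs) for a negative binomial with dispersion r and mean mu > 0."""
    p = r / (r + mu)
    total = 0.0
    for k in range(int(obs) + 1):
        log_pmf = (
            math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


print(nb_lower_tail(1, r=1.0, mu=1.0))
```

A small value indicates that the observed count is unusually low relative to the expected distribution, which is the signal of interest when looking for protected positions.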
- pmf_values()
Compute the probability mass function
- Parameters
- exp: numpy.ndarray
Expected cleavage counts
- obs: numpy.ndarray
Observed cleavage counts
- Returns
- p: numpy.ndarray
Array of probability mass function values computed from the expected cleavage distributions
- pmf_values_0()
Compute the probability mass function and store the values to a pointer
- Parameters
- exp: numpy.ndarray
Expected cleavage counts
- obs: numpy.ndarray
Observed cleavage counts
- Returns
- p: numpy.ndarray (memoryview)
Array pointer to probability mass function values computed from the expected cleavage distributions
Notes
This function is equivalent to pmf_values, except that it stores values to a matrix pointer
- r
Array of the negative binomial MLE fit parameters r
- r_params
- sample()
Sample counts from negative binomial distribution and compute p-values
- Parameters
- x: numpy.ndarray
Count values specifying the distribution from which to resample. These are typically expected count values.
- times: int
Number of times to sample (per element)
- Returns
- sampled_counts: numpy.ndarray
Array of sampled counts (2-D array - positions by number of samples)
- sampled_pvals: numpy.ndarray
Array of p-values for the sampled counts (2-D array - positions by number of samples)
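The resampling step can be pictured as drawing, for each position, several counts from the negative binomial implied by that position's expected value. The sketch below does this with inverse-CDF sampling in pure Python and returns a positions-by-times table. It uses a single dispersion r for all positions, which is a simplification: in the class above, r depends on the expected count. The function names are hypothetical.

```python
import random


def nb_sample(r, mu, rng):
    """Draw one negative binomial count by inverting the CDF."""
    p = r / (r + mu)
    u = rng.random()
    k = 0
    pmf = p ** r  # P(X = 0)
    cdf = pmf
    while cdf < u and k < 1_000_000:
        # Recurrence: P(k + 1) = P(k) * (k + r) / (k + 1) * (1 - p)
        pmf *= (k + r) / (k + 1) * (1.0 - p)
        k += 1
        cdf += pmf
    return k


def sample_counts(expected, r, times, seed=0):
    """Return a positions-by-times table of sampled counts."""
    rng = random.Random(seed)
    return [[nb_sample(r, mu, rng) for _ in range(times)] for mu in expected]


draws = sample_counts([2.0, 10.0], r=3.0, times=5)
print(len(draws), len(draws[0]))  # 2 5
```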
- footprint_tools.modeling.dispersion.learn_dispersion_model()
Learn a dispersion model from the expected vs. observed histogram
- Parameters
- h: numpy.ndarray
A 2-dimensional array containing the distribution of observed cleavages at each expected cleavage rate
- cutoff: int
Minimum number of observed cleavages required to perform the ML negative binomial fit at each value of expected cleavages
- trim: tuple (float)
Percent of data to trim from the observed cleavage counts (to mitigate outlier effects)
- Returns
- model: dispersion_model
A dispersion model learned from observed and expected counts
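To give a feel for how negative binomial parameters can be recovered from one row of such a histogram, the sketch below uses the method of moments. This is deliberately simpler than the maximum-likelihood fit the function describes, and it omits the cutoff and trim steps. It also assumes that each row of h corresponds to one expected cleavage rate and that row[k] counts positions with k observed cleavages. The function name is hypothetical.

```python
def moments_from_histogram_row(row):
    """Estimate (mu, r) from one histogram row via the method of moments.

    row[k] is the number of positions with k observed cleavages (assumed layout).
    Returns None when the row is empty or not overdispersed (variance <= mean).
    """
    n = sum(row)
    if n == 0:
        return None
    mean = sum(k * c for k, c in enumerate(row)) / n
    var = sum(c * (k - mean) ** 2 for k, c in enumerate(row)) / n
    if var <= mean:
        return None
    # For a negative binomial, var = mu + mu**2 / r, so r = mu**2 / (var - mu).
    return mean, mean ** 2 / (var - mean)


print(moments_from_histogram_row([1, 0, 0, 0, 1]))  # (2.0, 2.0)
```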
- footprint_tools.modeling.dispersion.load_dispersion_model()
Load a dispersion model encoded in JSON format
- Parameters
- filename: str
Path to JSON-format dispersion model
- Returns
- model: dispersion_model
A dispersion model loaded from file
- footprint_tools.modeling.dispersion.piecewise_five()
- footprint_tools.modeling.dispersion.piecewise_four()
- footprint_tools.modeling.dispersion.piecewise_three()
- footprint_tools.modeling.dispersion.write_dispersion_model()
Write a JSON format dispersion model
- Parameters
- model: dispersion_model
An instance of dispersion_model
- Returns
- out: str
JSON-formatted dump of dispersion model
- class footprint_tools.modeling.predict.prediction(read_func, fasta_func, bm, half_win_width=5, smoothing_half_win_width=0, smoothing_clip=0.01)
Class that holds a wrapper function to compute the expected cleavage counts
- Attributes
- bm: bias.bias_model
Sequence bias model to apply
- read_func: cutcounts.bamfile
Cut-counts reader
- fasta_func: pysam.FastaFile
FASTA-file reader
- half_win_width: int
Half-window width W used to apply the bias model (final window size = 2W+1)
- padding: int
Padding applied to region when retrieving per-nucleotide data
- smoothing_clip: float
Fraction of nucleotides to trim when computing smoothed mean
- smoothing_half_win_width: int
Half width of window used to compute windowed tag counts
- compute(x)
Compute expected cleavage counts
- Parameters
- x: genome_tools.genomic_interval
Genomic region for which to generate predicted cleavages
- Returns
- out: tuple of dict
Observed, expected and windowed cleavage counts
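The smoothing attributes above suggest a trimmed mean computed over a sliding window. The sketch below shows one plausible reading: for each position, take up to 2 * half_width + 1 neighbouring values, drop a fraction clip from each end of the sorted window, and average the rest. Whether the library trims both ends, and how it treats array edges, is not stated in this reference, so both choices here are assumptions. The function name is hypothetical.

```python
def windowed_trimmed_mean(values, half_width, clip):
    """Trimmed mean over a sliding window of up to 2 * half_width + 1 values.

    `clip` is the fraction of values dropped from each end of the sorted
    window (an assumption; the library may trim differently). Windows are
    truncated at the array edges.
    """
    out = []
    for i in range(len(values)):
        window = sorted(values[max(0, i - half_width): i + half_width + 1])
        drop = int(len(window) * clip)
        # Fall back to the full window if trimming would remove everything.
        kept = window[drop: len(window) - drop] or window
        out.append(sum(kept) / len(kept))
    return out


counts = [0, 1, 50, 2, 1, 0, 3]
print(windowed_trimmed_mean(counts, half_width=2, clip=0.2))
```

In this toy input, trimming keeps the single outlier of 50 from dominating the smoothed value at full-width windows.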
- footprint_tools.modeling.predict.reverse_complement()
Computes reverse complement of a DNA sequence
- Parameters
- seq: str
DNA sequence string
- Returns
- out: str
Reverse complement of seq
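A reverse complement can be computed in pure Python with a translation table, as sketched below. The name reverse_complement_sketch is used to make clear that this is an illustration rather than the library function, and the handling of N and lowercase bases is an assumption.

```python
# Translation table mapping each base to its complement (case preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N, any case)."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ACCGTN"))  # NACGGT
```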