learn_dm

Learn a negative binomial dispersion model

Computes expected cleavage counts from data corrected for intrinsic sequence preferences, then builds a dispersion model used for footprint detection and analysis.
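To illustrate what a negative binomial dispersion model captures, here is a minimal method-of-moments sketch. It assumes you already have paired expected and observed cleavage counts per position, groups positions by rounded expected count, and solves Var = mu + mu²/r for the dispersion r in each group. The function name `fit_dispersion` and the grouping scheme are hypothetical; this is not how learn_dm itself fits its model.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fit_dispersion(expected, observed):
    """Hypothetical sketch: per-bin negative binomial dispersion estimates.

    Positions are grouped by their rounded expected count. Within each group
    the observed counts give a mean (mu) and variance (var); the method of
    moments for a negative binomial, var = mu + mu**2 / r, yields r.
    """
    groups = defaultdict(list)
    for e, o in zip(expected, observed):
        groups[round(e)].append(o)

    model = {}
    for key, obs in sorted(groups.items()):
        if len(obs) < 2:
            continue  # a variance needs at least two observations
        mu = mean(obs)
        var = pvariance(obs)
        # Only overdispersed data (var > mu) gives a finite r;
        # otherwise the counts look Poisson-like and r is treated as infinite.
        r = mu ** 2 / (var - mu) if var > mu else float("inf")
        model[key] = {"mu": mu, "var": var, "r": r}
    return model
```

For example, `fit_dispersion([2, 2, 2, 2], [0, 2, 4, 6])` groups all four positions under expected count 2, where the mean is 3, the variance is 5, and so r = 9 / (5 - 3) = 4.5.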

INTERVAL_FILE is a BED-formatted file containing the genomic regions to be analyzed. BAM_FILE is the path to a BAM-format tag alignment file. FASTA_FILE is the path to the genome FASTA file, which requires an associated FASTA index in the same folder (see the documentation on how to create an index).
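As a reminder of the BED convention, the sketch below reads the first three columns of each line as 0-based, half-open intervals, so a region covers `end - start` bases. The helper `read_bed` is hypothetical and only illustrates the format; it assumes tab-separated lines and skips blank, `#`, `track` and `browser` lines. learn_dm's own parser may handle extra columns or headers differently.

```python
def read_bed(lines):
    """Hypothetical sketch: parse BED lines into (chrom, start, end) tuples.

    BED coordinates are 0-based and half-open: start is included,
    end is excluded, so each interval spans end - start bases.
    """
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start >= end:
            raise ValueError(f"empty or inverted interval: {line!r}")
        intervals.append((chrom, start, end))
    return intervals
```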

Outputs a JSON-formatted dispersion model.

learn_dm [OPTIONS] INTERVAL_FILE BAM_FILE FASTA_FILE

Options

--bias_model_file <bias_model_file>

Use a k-mer model for sequence bias (supplied by file). If this argument is not provided, the model defaults to uniform sequence bias.

--half_win_width <half_win_width>

Half-width of the window over which the bias model is applied

Default: 5
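One way to picture how a bias model and a half window width combine into expected counts is the sketch below. It assumes the k-mer model has already been turned into a per-position bias weight (that lookup step is omitted), and for each position shares the observed cleavages in the window [i - w, i + w] out in proportion to those weights. The function `expected_counts` is hypothetical, a simplified illustration rather than learn_dm's exact procedure. With uniform weights it reduces to a moving average of the observed counts, matching the uniform-bias default.

```python
def expected_counts(observed, bias, half_win_width=5):
    """Hypothetical sketch: bias-weighted redistribution of cleavage counts.

    observed: observed cleavage counts per position.
    bias: per-position bias weights (assumed precomputed from a k-mer model).
    For each position i, the total count in the window of width
    2 * half_win_width + 1 (clipped at the edges) is split in proportion
    to the bias weights, and position i receives its share.
    """
    n = len(observed)
    exp = [0.0] * n
    for i in range(n):
        lo = max(0, i - half_win_width)
        hi = min(n, i + half_win_width + 1)
        total = sum(observed[lo:hi])
        weight_sum = sum(bias[lo:hi])
        exp[i] = total * bias[i] / weight_sum if weight_sum > 0 else 0.0
    return exp
```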

--min_qual <min_qual>

Ignore reads with mapping quality lower than this threshold

Default: 1

--keep_dups <keep_dups>

Keep duplicate reads

Default: True

--keep_qcfail <keep_qcfail>

Keep QC-failed reads

Default: False
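The three read filters above map onto standard SAM fields: the MAPQ value and the FLAG bits 0x400 (PCR or optical duplicate) and 0x200 (fails platform/vendor quality checks). The sketch below shows that logic on plain integers using the defaults listed here; the function `keep_read` is hypothetical and is not taken from learn_dm's source.

```python
FLAG_QCFAIL = 0x200  # SAM FLAG bit: read fails platform/vendor quality checks
FLAG_DUP = 0x400     # SAM FLAG bit: PCR or optical duplicate


def keep_read(flag, mapq, min_qual=1, keep_dups=True, keep_qcfail=False):
    """Hypothetical sketch: decide whether an aligned read passes the filters."""
    if mapq < min_qual:
        return False  # mapping quality below the threshold
    if not keep_dups and flag & FLAG_DUP:
        return False  # duplicate read and duplicates are not kept
    if not keep_qcfail and flag & FLAG_QCFAIL:
        return False  # QC-failed read and QC failures are not kept
    return True
```

With these defaults, a duplicate read with MAPQ 30 is kept, while a QC-failed read or a read with MAPQ 0 is dropped.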

--outfile <outfile>

Output prefix

Default: dm.json

--bam_offset <bam_offset>

BAM file offset (enables support for other data types, e.g. Tn5/ATAC-seq)

Default: 0,-1
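A plausible reading of the `plus,minus` offset pair is sketched below. It assumes the first value shifts the 5' end of forward-strand reads and the second shifts reverse-strand reads, using pysam-style coordinates where `reference_start` is 0-based and `reference_end` is one past the last aligned base. Under that assumption the default `0,-1` places each cleavage at the read's 5' end. The helpers `parse_offset` and `cleavage_position` are hypothetical, and the `+4,-5` value in the usage note is only the shift often applied to Tn5 insertions, used here as an illustration.

```python
def parse_offset(text):
    """Hypothetical sketch: parse an offset string such as "0,-1" or "+4,-5"."""
    plus, minus = (int(x) for x in text.split(","))
    return plus, minus


def cleavage_position(reference_start, reference_end, is_reverse, offset=(0, -1)):
    """Hypothetical sketch: 0-based cleavage coordinate for one aligned read.

    Assumes offset[0] applies to forward-strand reads (relative to the
    0-based start) and offset[1] to reverse-strand reads (relative to the
    exclusive end), so (0, -1) marks each read's 5' end.
    """
    plus, minus = offset
    if is_reverse:
        return reference_end + minus
    return reference_start + plus
```

For a read aligned to [100, 150), the default gives 100 on the forward strand and 149 on the reverse strand; with `parse_offset("+4,-5")` the positions become 104 and 145.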

--n_threads <n_threads>

Number of processors to use

Default: 2

--batch_size <batch_size>

Number of intervals to process per batch

Default: 100
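The interplay of `--n_threads` and `--batch_size` can be pictured as splitting the interval list into fixed-size chunks and handing those chunks to a pool of workers. The sketch below uses a thread pool for simplicity; learn_dm may use processes or a different scheduling scheme, and the helpers `batches` and `process_intervals` as well as the `worker` callback are hypothetical. It requires Python 3.8+ for the `:=` operator.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def process_intervals(intervals, worker, n_threads=2, batch_size=100):
    """Hypothetical sketch: run worker on batches of intervals in parallel.

    worker takes a list of intervals and returns a list of results;
    Executor.map preserves batch order, so results stay in input order.
    """
    results = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for batch_result in pool.map(worker, batches(intervals, batch_size)):
            results.extend(batch_result)
    return results
```

With the defaults, 250 intervals would be split into batches of 100, 100 and 50, processed by two workers.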

Arguments

INTERVAL_FILE: Required argument

BAM_FILE: Required argument

FASTA_FILE: Required argument
