learn_dm

Learn a negative binomial dispersion model

This creates expected cleavage counts from data corrected for intrinsic sequence preferences and then builds a dispersion model used for footprint detection and analysis.

INTERVAL_FILE is a BED-formatted file containing the genomic regions to be analyzed. BAM_FILE is the path to a BAM-format tag alignment file. FASTA_FILE is the path to the genome FASTA file; it requires an associated FASTA index in the same folder (see the documentation on how to create an index).
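Because the FASTA index must sit next to the FASTA file, it can help to check for it before starting a long run. The sketch below assumes the index follows the usual `samtools faidx` convention of appending `.fai` to the FASTA file name; the helper names are hypothetical and are not part of learn_dm.

```python
from pathlib import Path


def fasta_index_path(fasta_file: str) -> Path:
    """Return the conventional index path: the FASTA file name plus '.fai'."""
    return Path(fasta_file + ".fai")


def has_fasta_index(fasta_file: str) -> bool:
    """Return True if the index file exists alongside the FASTA file."""
    return fasta_index_path(fasta_file).is_file()
```

If `has_fasta_index("genome.fa")` returns False, create the index first (for example with `samtools faidx genome.fa`) and then run learn_dm.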

Outputs a JSON-formatted dispersion model.
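Since the output is JSON, it can be inspected with any JSON parser. The following is a minimal sketch using only the Python standard library; the layout of the model inside the file is not described on this page, so the code only lists top-level keys and makes no assumption about their names. Both function names are hypothetical.

```python
import json


def load_dispersion_model(path):
    """Parse the JSON file written by learn_dm and return the decoded object."""
    with open(path) as fh:
        return json.load(fh)


def top_level_keys(model):
    """Return the sorted top-level keys if the model is a JSON object, else []."""
    return sorted(model) if isinstance(model, dict) else []
```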

learn_dm [OPTIONS] INTERVAL_FILE BAM_FILE FASTA_FILE
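As an illustration of the usage line above, the sketch below assembles a learn_dm command from Python. It uses only the option names listed on this page; the file names are placeholders, and `build_learn_dm_command` is a hypothetical helper, not something shipped with the tool.

```python
import shlex


def build_learn_dm_command(interval_file, bam_file, fasta_file, **options):
    """Build the argument list: options first, then the three positional arguments."""
    cmd = ["learn_dm"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += [interval_file, bam_file, fasta_file]
    return cmd


# Placeholder file names; replace them with real paths.
cmd = build_learn_dm_command(
    "intervals.bed", "reads.bam", "genome.fa", n_threads=4, outfile="dm.json"
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues with paths that contain spaces.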

Options

--bias_model_file <bias_model_file>

Use a k-mer model for sequence bias (supplied by file). If this argument is not provided, the model defaults to uniform sequence bias.

--half_win_width <half_win_width>

Half-width of the window over which the bias model is applied

Default

5

--min_qual <min_qual>

Ignore reads with mapping quality lower than this threshold

Default

1

--keep_dups <keep_dups>

Keep duplicate reads

Default

True

--keep_qcfail <keep_qcfail>

Keep QC-failed reads

Default

False
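The three read-filtering options above (--min_qual, --keep_dups, --keep_qcfail) can be pictured as one predicate applied to each read. This is a sketch of that logic, not the tool's actual code: it assumes the duplicate and QC-fail status come from the standard SAM flag bits 0x400 and 0x200, and `keep_read` is a hypothetical name. The defaults mirror the ones documented here.

```python
FLAG_QCFAIL = 0x200     # SAM flag bit: read failed quality checks
FLAG_DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate


def keep_read(mapq, flag, min_qual=1, keep_dups=True, keep_qcfail=False):
    """Return True if a read with this mapping quality and flag passes the filters."""
    if mapq < min_qual:
        return False  # mapping quality below the threshold
    if not keep_dups and flag & FLAG_DUPLICATE:
        return False  # duplicates are excluded only when keep_dups is False
    if not keep_qcfail and flag & FLAG_QCFAIL:
        return False  # QC-failed reads are excluded unless keep_qcfail is True
    return True
```

With the defaults, a duplicate read with mapping quality 30 is kept, while a QC-failed read or a read with mapping quality 0 is dropped.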

--outfile <outfile>

Output prefix

Default

dm.json

--bam_offset <bam_offset>

BAM file offset (enables support for other datatypes – e.g. Tn5/ATAC)

Default

0,-1
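The default above is a comma-separated pair of integers. A value in that form could be parsed as shown in the sketch below. What each number applies to is not stated on this page; reading the pair as (forward-strand offset, reverse-strand offset) is an assumption, and `parse_bam_offset` is a hypothetical helper.

```python
def parse_bam_offset(text):
    """Parse a string such as "0,-1" into a tuple of two integers."""
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected two comma-separated integers, got {text!r}")
    # Assumed order: (forward-strand offset, reverse-strand offset).
    return int(parts[0]), int(parts[1])
```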

--n_threads <n_threads>

Number of processors to use

Default

2

--batch_size <batch_size>

Batch size of intervals to process

Default

100
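To illustrate what processing intervals in batches means, the sketch below splits any sequence of intervals into consecutive chunks of at most `batch_size` items. It is a generic illustration under the documented default of 100; how learn_dm actually forms batches and hands them to its worker processes is not described on this page, and `batch_intervals` is a hypothetical name.

```python
from itertools import islice


def batch_intervals(intervals, batch_size=100):
    """Yield consecutive lists of at most batch_size intervals."""
    it = iter(intervals)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted
        yield batch
```

For 250 intervals and the default batch size, this yields batches of 100, 100, and 50 intervals.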

Arguments

INTERVAL_FILE

Required argument

BAM_FILE

Required argument

FASTA_FILE

Required argument