learn_dm
Learn a negative binomial dispersion model
This command computes expected cleavage counts from data corrected for intrinsic sequence preferences, then builds a dispersion model used for footprint detection and analysis.
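The idea of bias-corrected expected counts can be illustrated with a simplified Python sketch. It is not the tool's implementation. It assumes per-position bias weights have already been looked up (for example from a k-mer table such as one passed with `--bias_model_file`), and it redistributes the observed counts in a window of ±`half_win_width` positions in proportion to those weights. The function name `expected_counts` is hypothetical. With uniform weights, the expected count at each position reduces to the window mean of the observed counts.

```python
from typing import List, Sequence


def expected_counts(observed: Sequence[float],
                    bias: Sequence[float],
                    half_win_width: int = 5) -> List[float]:
    """Hypothetical sketch: redistribute windowed counts by per-position bias.

    For each position i, the total observed count in the window
    [i - half_win_width, i + half_win_width] (truncated at the edges)
    is scaled by bias[i] / (sum of bias weights in that window).
    """
    if len(observed) != len(bias):
        raise ValueError("observed and bias must have the same length")
    n = len(observed)
    expected = []
    for i in range(n):
        lo = max(0, i - half_win_width)
        hi = min(n, i + half_win_width + 1)
        window_obs = sum(observed[lo:hi])
        window_bias = sum(bias[lo:hi])
        # Share of the window's counts attributed to position i.
        if window_bias > 0:
            expected.append(window_obs * bias[i] / window_bias)
        else:
            expected.append(0.0)
    return expected
```

For example, `expected_counts([0, 2, 4, 2, 0], [1.0] * 5, half_win_width=1)` returns the three-position moving average of the observed counts, truncated at both ends.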
INTERVAL_FILE is a BED-formatted file containing the genomic regions to be analyzed. BAM_FILE is the path to a BAM-format tag alignment file. FASTA_FILE is the path to the genome FASTA file, which requires an associated FASTA index in the same folder (see the documentation on how to create an index).
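A quick pre-flight check of the inputs can catch common problems before a long run. The sketch below uses hypothetical helpers that are not part of the tool: `read_bed_intervals` reads BED records as 0-based, half-open `(chrom, start, end)` tuples while skipping blank, `track`, `browser`, and comment lines, and `has_fasta_index` looks for the `<FASTA_FILE>.fai` index that `samtools faidx` writes next to the FASTA file. Rejecting empty or inverted intervals is an assumption of this sketch.

```python
import os
from typing import Iterator, Tuple


def read_bed_intervals(path: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from a BED file (0-based, half-open)."""
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            # Assumption of this sketch: empty or inverted intervals are errors.
            if start < 0 or end <= start:
                raise ValueError(f"invalid interval: {line.strip()}")
            yield chrom, start, end


def has_fasta_index(fasta_path: str) -> bool:
    """True if the samtools-style index (<fasta>.fai) exists beside the FASTA."""
    return os.path.exists(fasta_path + ".fai")
```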
Outputs a JSON-formatted dispersion model.
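A negative binomial dispersion model relates an expected count to the spread of the observed counts: with mean μ and size parameter r, the variance is μ + μ²/r, so r = μ²/(variance − μ) whenever the data are overdispersed. The sketch below estimates r per expected-count bin by the method of moments and serializes the result with `json`. The binning by rounded expected count, the `min_obs` threshold, and the JSON keys (`mu`, `r`) are illustrative assumptions. The tool's own fitting procedure and file schema may differ.

```python
import json
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Iterable, Tuple


def fit_dispersion(pairs: Iterable[Tuple[float, int]],
                   min_obs: int = 2) -> Dict[int, Dict[str, float]]:
    """Method-of-moments NB fit of observed counts, binned by rounded expected count."""
    groups = defaultdict(list)
    for expected, observed in pairs:
        groups[int(round(expected))].append(float(observed))
    model = {}
    for exp_count, values in sorted(groups.items()):
        if len(values) < min_obs:
            continue  # too few observations to estimate a variance
        mu = mean(values)
        var = pvariance(values, mu)
        # NB variance is mu + mu**2 / r; r is only defined if var > mu.
        if var > mu > 0:
            model[exp_count] = {"mu": mu, "r": mu * mu / (var - mu)}
    return model


def save_model(model: Dict[int, Dict[str, float]], path: str) -> None:
    """Write the model as JSON (integer bin keys become JSON strings)."""
    with open(path, "w") as handle:
        json.dump(model, handle, indent=2)
```

Bins whose counts are not overdispersed (variance at or below the mean) are dropped here, because the negative binomial size parameter has no finite positive value for them.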
learn_dm [OPTIONS] INTERVAL_FILE BAM_FILE FASTA_FILE
Options
- --bias_model_file <bias_model_file>
Use a k-mer model of sequence bias, supplied as a file. If this option is not provided, the model defaults to uniform sequence bias.
- --half_win_width <half_win_width>
Half window width used when applying the bias model
- Default
5
- --min_qual <min_qual>
Ignore reads with mapping quality lower than this threshold
- Default
1
- --keep_dups <keep_dups>
Keep duplicate reads
- Default
True
- --keep_qcfail <keep_qcfail>
Keep QC-failed reads
- Default
False
- --outfile <outfile>
Output file path
- Default
dm.json
- --bam_offset <bam_offset>
BAM file offsets, given as two comma-separated integers (enables support for other data types, e.g. Tn5/ATAC-seq)
- Default
0,-1
- --n_threads <n_threads>
Number of processors to use
- Default
2
- --batch_size <batch_size>
Number of intervals to process per batch
- Default
100
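The read-level options above combine into one decision: whether, and where, a read contributes a cleavage count. The following sketch uses a hypothetical `AlignedRead` record in place of a real BAM reader, together with the standard SAM flag bits for reverse strand (0x10), QC failure (0x200), and duplicates (0x400). Its defaults mirror the documented ones. How the two `--bam_offset` values are applied is an assumption here: the first shifts the 5' end of forward-strand reads (`reference_start`), and the second shifts the 5' end of reverse-strand reads, taken as the exclusive `reference_end`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple

FLAG_REVERSE = 0x10   # SAM flag: read mapped to the reverse strand
FLAG_QCFAIL = 0x200   # SAM flag: read fails platform/vendor quality checks
FLAG_DUP = 0x400      # SAM flag: PCR or optical duplicate


@dataclass
class AlignedRead:
    """Hypothetical minimal read record (0-based, end-exclusive coordinates)."""
    reference_start: int
    reference_end: int
    mapping_quality: int
    flag: int


def parse_offset(text: str) -> Tuple[int, int]:
    """Parse an offset string such as '0,-1' or '+4,-5' into two ints."""
    plus, minus = (int(x) for x in text.split(","))
    return plus, minus


def cleavage_counts(reads: Iterable[AlignedRead], min_qual: int = 1,
                    keep_dups: bool = True, keep_qcfail: bool = False,
                    bam_offset: str = "0,-1") -> Counter:
    """Count cleavage positions after read filtering (illustrative sketch)."""
    plus_off, minus_off = parse_offset(bam_offset)
    counts = Counter()
    for read in reads:
        if read.mapping_quality < min_qual:
            continue
        if not keep_dups and read.flag & FLAG_DUP:
            continue
        if not keep_qcfail and read.flag & FLAG_QCFAIL:
            continue
        if read.flag & FLAG_REVERSE:
            # Assumed convention: offset applied to the exclusive reference_end.
            pos = read.reference_end + minus_off
        else:
            pos = read.reference_start + plus_off
        counts[pos] += 1
    return counts
```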
Arguments
- INTERVAL_FILE
Required argument
- BAM_FILE
Required argument
- FASTA_FILE
Required argument
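The `--batch_size` and `--n_threads` options suggest that the regions read from INTERVAL_FILE are grouped into fixed-size batches and handed to parallel workers. A minimal sketch of that pattern follows. `batched`, `process_in_batches`, and the `work` callback are hypothetical names, and a thread pool is used only to keep the example short (CPU-bound Python code would more typically use processes). The sketch makes no claim about how the tool itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process_in_batches(intervals: Iterable[T],
                       work: Callable[[List[T]], R],
                       batch_size: int = 100,
                       n_threads: int = 2) -> List[R]:
    """Apply `work` to each batch concurrently; results keep batch order."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(work, batched(intervals, batch_size)))
```

Because `pool.map` returns results in submission order, per-batch outputs can be concatenated in the same order as the input intervals.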