cutcounts submodule

This module contains classes and functions to compute cleavage counts directly from an alignment file.

exception footprint_tools.cutcounts.GenotypeError
exception footprint_tools.cutcounts.ReadError(e)
ERROR_5PROXIMITY = (1, "Variant too close to 5' end of tag")
ERROR_ALIGNMENT = (0, 'Read alignment problematic (QC fail, duplicate, or MAPQ < 1)')
ERROR_BASEQ = (2, 'Base quality < 20')
ERROR_GENOTYPE = (3, 'Base does not match reference or expected alternate allele')
ERROR_MISMATCH = (4, 'Read contains too many mismatches')
class footprint_tools.cutcounts.bamfile(filepath, min_qual=1, remove_dups=False, remove_qcfail=True, offset=(0, -1))

Class to access BAM files

Attributes
min_qual: int

Filter reads by minimum mapping quality (MAPQ)

offset: tuple

Position offsets to apply to the + and - strands. DNase I data use (0, -1); Tn5-derived data would use (4, -5). (default = (0, -1))

remove_dups: bool

Remove reads with duplicate flag (1024) set

remove_qcfail: bool

Remove reads with QC fail flag (512) set

samfile: pysam.Samfile

SAM/BAM file object
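
As a rough illustration of what the offset attribute means, the sketch below shifts the 5' end of a read by the strand-specific offset. It assumes pysam-style coordinates (0-based start, exclusive end) and that the cut is taken at the 5' end of the read. `cut_position` is a hypothetical helper written for this example and is not part of footprint_tools.

```python
def cut_position(ref_start, ref_end, is_reverse, offset=(0, -1)):
    """Hypothetical helper: return the 0-based cleavage position of one read.

    Assumes ref_start is 0-based inclusive and ref_end is 0-based exclusive,
    so the 5' end of a reverse-strand read sits at ref_end - 1.
    """
    if is_reverse:
        # Reverse strand: the 5' end is at the right edge of the alignment
        return ref_end + offset[1]
    # Forward strand: the 5' end is at the left edge of the alignment
    return ref_start + offset[0]


# DNase I style offsets (the documented default)
dnase_fwd = cut_position(100, 136, is_reverse=False)
dnase_rev = cut_position(100, 136, is_reverse=True)

# Tn5 style offsets, as suggested in the attribute description
tn5_fwd = cut_position(100, 136, is_reverse=False, offset=(4, -5))
tn5_rev = cut_position(100, 136, is_reverse=True, offset=(4, -5))
```

Under these assumptions, the default (0, -1) simply converts the exclusive end coordinate of a reverse-strand read into the position of its last aligned base.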

close()

Closes BAM file

lookup(interval)

Lookup reads in a defined genomic region

Parameters
interval: genomic_interval
Returns
counts: dict

Dictionary of read counts (keys: ‘+’ and ‘-’); each value is an array of counts for that strand
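
To make the shape of the returned dictionary concrete, here is a minimal sketch that tallies already-computed cut positions into one count vector per strand. `tally_cuts` is a hypothetical helper, and plain lists stand in for the arrays the method returns; the real method derives the positions from reads in the BAM file.

```python
def tally_cuts(cuts, start, end):
    """Hypothetical helper: build a per-strand count dictionary.

    cuts is an iterable of (strand, position) pairs with 0-based positions;
    positions outside [start, end) are ignored.
    """
    width = end - start
    counts = {"+": [0] * width, "-": [0] * width}
    for strand, pos in cuts:
        if start <= pos < end:
            # Index is relative to the start of the queried region
            counts[strand][pos - start] += 1
    return counts


counts = tally_cuts(
    [("+", 100), ("+", 100), ("-", 104), ("-", 250)],
    start=100,
    end=110,
)
```

Each vector has one entry per base in the region, so `counts["+"][i]` refers to genomic position `start + i`.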

lookup_allelic(chrom, start, end, pos, ref, alt, flip=False)

Lookup function for allelically resolved counts

Parameters
Returns
read_pair_generator(chrom, start, end)

Generator function that returns sequencing tags within a given region

Parameters
chrom: str

Chromosome

start: int

Start coordinate

end: int

End coordinate

Yields
reads: tuple

A tuple of pysam.AlignedSegment. Elements may be None for single-end sequencing or when one mate of a pair falls outside of the query range.
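
The sketch below illustrates the shape of the yielded tuples, not the library's implementation. It groups reads by name and leaves a slot as None when that mate was not seen. `Read` and `pair_reads` are hypothetical stand-ins; a real caller would receive pysam.AlignedSegment objects.

```python
from collections import namedtuple

# Stand-in for pysam.AlignedSegment with only the fields this sketch needs
Read = namedtuple("Read", ["name", "is_read1"])


def pair_reads(reads):
    """Hypothetical helper: yield (read1, read2) tuples grouped by read name.

    A slot is None when that mate is absent from the input, for example
    when the mate falls outside the queried region.
    """
    pairs = {}
    for read in reads:
        slot = pairs.setdefault(read.name, [None, None])
        slot[0 if read.is_read1 else 1] = read
    for read1, read2 in pairs.values():
        yield read1, read2


reads = [Read("a", True), Read("a", False), Read("b", False)]
pairs = list(pair_reads(reads))
```

Code that consumes such tuples needs to check each element for None before using it.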

validate_read(read)

Validate BAM tag

Parameters
read: pysam.AlignedSegment

Read from BAM/SAM file

Returns
read: pysam.AlignedSegment

Same read as input

Raises
ReadError

Raised if the read is flagged as QC-failed, is flagged as a duplicate, or has a MAPQ below the minimum
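
The checks described above can be sketched on a raw SAM flag and MAPQ value. This is a minimal illustration under stated assumptions, not the library's code: `validate` and `passes` are hypothetical, the `ReadError` class here is a local stand-in, and the flag bits are those defined by the SAM specification (0x200 for QC fail, 0x400 for duplicate).

```python
FLAG_QCFAIL = 0x200  # 512: read fails platform/vendor quality checks
FLAG_DUP = 0x400     # 1024: PCR or optical duplicate


class ReadError(Exception):
    """Local stand-in for footprint_tools.cutcounts.ReadError."""
    ERROR_ALIGNMENT = (0, "Read alignment problematic (QC fail, duplicate, or MAPQ < 1)")


def validate(flag, mapq, min_qual=1, remove_dups=False, remove_qcfail=True):
    """Hypothetical check; raises ReadError if the read should be filtered."""
    if remove_qcfail and flag & FLAG_QCFAIL:
        raise ReadError(ReadError.ERROR_ALIGNMENT)
    if remove_dups and flag & FLAG_DUP:
        raise ReadError(ReadError.ERROR_ALIGNMENT)
    if mapq < min_qual:
        raise ReadError(ReadError.ERROR_ALIGNMENT)


def passes(flag, mapq, **kwargs):
    """Return True if validate() accepts the read, False otherwise."""
    try:
        validate(flag, mapq, **kwargs)
    except ReadError:
        return False
    return True
```

With the defaults mirrored from the class signature, a duplicate-flagged read is kept unless `remove_dups=True` is passed, while a QC-failed read is rejected.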