Python API: mutyper package

Primary class for using an ancestral genome FASTA to annotate the mutation types of variants in VCF format.

class mutyper.Ancestor(fasta, k=3, target=None, strand_file=None, **kwargs)

Ancestral state of a chromosome.

Parameters:
  • fasta (str) – path to ancestral sequence FASTA

  • k (int) – the size of the context window (default 3)

  • target (Optional[int]) – index of the focal site within the k-mer (default: the middle position)

  • strand_file (Union[str, TextIO, None]) – path to a BED file (or I/O object) listing regions where the reverse strand defines the mutation context, e.g. by direction of replication or transcription. Sites outside these regions are assigned forward-strand context. If not provided, contexts are collapsed by reverse complementation so that the target base is A or C. BED regions should be 0-based and right-open.

  • kwargs – additional keyword arguments passed to the base class. Useful ones include key_function (for chromosome name parsing), read_ahead (for buffering), and sequence_always_upper (to allow lowercase nucleotides to be considered ancestrally identified).
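The context window and strand rules above can be illustrated with a small pure-Python sketch. It does not use mutyper: the functions revcomp and kmer_context are hypothetical, the sequence is an in-memory string rather than a FASTA record, and the reverse-strand regions are a list of 0-based, right-open (start, end) tuples standing in for strand_file. The default target of k // 2 is an assumption for "middle".

```python
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_context(
    seq: str,
    pos: int,
    k: int = 3,
    target: Optional[int] = None,
    reverse_regions: Optional[List[Tuple[int, int]]] = None,
) -> Optional[str]:
    """Hypothetical sketch: k-mer context of the 0-based site `pos` in `seq`.

    `target` is the index of the site within the k-mer (assumed default k // 2).
    If `reverse_regions` is given, sites inside any 0-based, right-open region
    get reverse-strand context and all others forward-strand context;
    otherwise the context is collapsed so the target base is A or C.
    """
    if target is None:
        target = k // 2
    start = pos - target
    end = start + k
    if start < 0 or end > len(seq):
        return None  # window runs past an end of the sequence
    context = seq[start:end]
    if reverse_regions is not None:
        on_reverse = any(s <= pos < e for s, e in reverse_regions)
        return revcomp(context) if on_reverse else context
    # No strand information: collapse by reverse complementation.
    return context if context[target] in "AC" else revcomp(context)
```

For example, the site at index 2 of "ACGTTGCA" has forward context "CGT"; because its target base is G, the collapsed context is the reverse complement "ACG".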

mutation_type(chrom, pos, ref, alt)

Mutation type of a given SNP, oriented or collapsed by strand. Returns a tuple of the ancestral and derived k-mers.

Parameters:
  • chrom (str) – FASTA record chromosome identifier

  • pos (int) – position (0-based)

  • ref (str) – reference allele (A, C, G, or T)

  • alt (str) – alternative allele (A, C, G, or T)

Return type:

Tuple[str, str]
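The idea of a mutation type as an (ancestral k-mer, derived k-mer) pair can be sketched in plain Python. This is not mutyper's implementation: the function mutation_type_sketch is hypothetical, it takes the ancestral k-mer directly instead of looking it up by chrom and pos, and it assumes that whichever allele matches the ancestral base is treated as ancestral, returning (None, None) if neither matches.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mutation_type_sketch(
    ancestral_kmer: str, ref: str, alt: str, target: Optional[int] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical sketch: polarize a SNP against its ancestral k-mer, then
    collapse by reverse complementation so the ancestral target base is A or C.

    Assumption (not taken from mutyper): if the ancestral base matches neither
    allele, (None, None) is returned.
    """
    if target is None:
        target = len(ancestral_kmer) // 2
    anc_base = ancestral_kmer[target]
    if anc_base == ref:
        derived_base = alt
    elif anc_base == alt:
        derived_base = ref  # the alternative allele is the ancestral one
    else:
        return None, None
    derived_kmer = ancestral_kmer[:target] + derived_base + ancestral_kmer[target + 1:]
    if anc_base in "AC":
        return ancestral_kmer, derived_kmer
    return revcomp(ancestral_kmer), revcomp(derived_kmer)
```

Under this collapsing rule, a TCA>TTA mutation and a TGA>TAA mutation are the same type, since TGA and TAA are the reverse complements of TCA and TTA.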

region_contexts(chrom, start=None, end=None)

Ancestral context of each site in a BED-style region (0-based, half-open), oriented according to self.strandedness or collapsed by reverse complementation. Yields None for any site whose ancestral state at the target position is not an uppercase A, C, G, or T.

Parameters:
  • chrom (str) – chromosome name

  • start (Optional[int]) – region start position (defaults to chromosome start)

  • end (Optional[int]) – region end position (defaults to chromosome end)

Return type:

Generator[str, None, None]
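The half-open region semantics and the per-site None values can be illustrated with a generator sketch. It is not mutyper's code: region_contexts_sketch is a hypothetical function over an in-memory string, it always collapses by reverse complementation (no strand file), and it additionally yields None where the window would run past either end of the sequence, which is an assumption of this sketch.

```python
from typing import Iterator, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def region_contexts_sketch(
    seq: str, start: Optional[int] = None, end: Optional[int] = None, k: int = 3
) -> Iterator[Optional[str]]:
    """Hypothetical sketch: yield one collapsed k-mer context per site in the
    0-based, half-open region [start, end) of `seq`, or None where the target
    base is not an uppercase A, C, G or T (or the window runs off the sequence).
    """
    target = k // 2
    start = 0 if start is None else start
    end = len(seq) if end is None else end
    for pos in range(start, end):  # half-open: `end` itself is excluded
        lo, hi = pos - target, pos - target + k
        if lo < 0 or hi > len(seq) or seq[pos] not in "ACGT":
            yield None
            continue
        context = seq[lo:hi]
        if context[target] in "GT":
            context = context.translate(_COMPLEMENT)[::-1]
        yield context
```

Because the generator yields None for unusable sites, callers can keep positions aligned with the region while skipping those sites, e.g. by filtering on `context is not None`.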

targets(bed=None)

Return a dictionary mapping each k-mer context to its number of sites.

Parameters:

bed (Union[str, TextIO, None]) – optional path to a BED mask file, or an I/O object

Return type:

Dict[str, int]
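Counting target sizes amounts to tallying the context of every usable site. The sketch below is hypothetical and does not call mutyper: targets_sketch works on an in-memory string, represents the BED mask as a list of 0-based, right-open (start, end) tuples, and assumes that the mask lists the regions to count, that regions do not overlap, and that sites without a valid context are skipped.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def collapsed_context(seq: str, pos: int, k: int = 3) -> Optional[str]:
    """Collapsed k-mer context of site `pos`, or None if unusable."""
    target = k // 2
    lo, hi = pos - target, pos - target + k
    if lo < 0 or hi > len(seq) or seq[pos] not in "ACGT":
        return None
    context = seq[lo:hi]
    return context if context[target] in "AC" else context.translate(_COMPLEMENT)[::-1]


def targets_sketch(
    seq: str, bed: Optional[Iterable[Tuple[int, int]]] = None, k: int = 3
) -> Dict[str, int]:
    """Hypothetical sketch: count sites per k-mer context, optionally only
    within non-overlapping, 0-based, right-open (start, end) regions."""
    regions = [(0, len(seq))] if bed is None else bed
    counts: Counter = Counter()
    for start, end in regions:
        for pos in range(start, end):
            context = collapsed_context(seq, pos, k)
            if context is not None:  # skip sites without a usable context
                counts[context] += 1
    return dict(counts)
```

Such counts are the natural denominators for turning per-type mutation counts into per-site rates.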