AMOS

AMOS is an assembly file format similar to ACE or SAM. It records the reads that were used to assemble each contig.

The format is divided into message blocks. The Ray assembler produces an AMOS file that contains three types of message blocks:

RED
{RED iid:\d+ eid:\d+ seq: [ATGC]+ . qlt: [A-Z]+ }
iid

Integer identifier

eid

External identifier (iid is the internal one). In Ray output it appears to hold the same value as iid.

seq

Sequence data

qlt

Quality string. The Ray assembler does not emit real quality values here, only a series of D’s
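
The RED pattern above can be tried out with Python's re module. This is only a minimal sketch and not how bio_bits parses the file: the sample block is made up, it assumes each field sits on its own line as AMOS files are usually written, and parse_red is a hypothetical helper name.

```python
import re

# Hypothetical RED block; the values are invented for illustration.
sample = """{RED
iid:1
eid:1
seq:
ATGCATGC
.
qlt:
DDDDDDDD
.
}
"""

# \s* absorbs the line breaks between fields.
RED_PATTERN = re.compile(
    r"\{RED\s*"
    r"iid:(?P<iid>\d+)\s*"
    r"eid:(?P<eid>\S+)\s*"
    r"seq:\s*(?P<seq>[ATGC\s]+?)\s*\.\s*"
    r"qlt:\s*(?P<qlt>[A-Z\s]+?)\s*\.\s*"
    r"\}"
)


def parse_red(text):
    """Return the fields of the first RED block in text, or None."""
    match = RED_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["iid"] = int(fields["iid"])
    # Sequence and quality may wrap over several lines; join them back.
    fields["seq"] = "".join(fields["seq"].split())
    fields["qlt"] = "".join(fields["qlt"].split())
    return fields


red = parse_red(sample)
print(red)
```
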
TLE
{TLE src:\d+ off:\d+ clr:\d+,\d+ }
src

iid of the RED block (read) that this tile places in the contig

off

Offset: per the AMOS message specification, the position in the contig at which the read starts

clr

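Clear range: per the AMOS message specification, the start and end of the portion of the read that is used in the contig. A start greater than the end indicates that the read is reverse-complemented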
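
As a rough illustration of the TLE pattern, the sketch below pulls src, off, and clr out of a piece of text and converts them to integers. The sample tiles are invented, and Tile and parse_tles are hypothetical names that are not part of bio_bits.

```python
import re
from collections import namedtuple

# Hypothetical container for one tile; clr is split into its two numbers.
Tile = namedtuple("Tile", "src off clr_start clr_end")

TLE_PATTERN = re.compile(
    r"\{TLE\s*src:(\d+)\s*off:(\d+)\s*clr:(\d+),(\d+)\s*\}"
)


def parse_tles(text):
    """Return one Tile per TLE block found in text."""
    return [Tile(*map(int, groups)) for groups in TLE_PATTERN.findall(text)]


# Two made-up tiles, one field per line.
sample = "{TLE\nsrc:7\noff:0\nclr:0,50\n}\n{TLE\nsrc:9\noff:25\nclr:50,0\n}\n"
tiles = parse_tles(sample)
for tile in tiles:
    print(tile)
```
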
CTG
{CTG iid:\d+ eid:\w+ com: .*$ . seq: [ATGC]+ . qlt: [A-Z]+ . {TLE ... } }
iid

Integer identifier of the contig

eid

Contig name

com

Comment field. Ray uses it to record the software that generated this contig

seq

Contig sequence data

qlt

Contig quality string. As with RED blocks, Ray only produces a series of D’s

TLE

Zero or more TLE blocks, one for each RED sequence that makes up the contig
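
Because TLE blocks are nested inside CTG blocks, a scanner has to track brace depth to tell top-level blocks from nested ones. The sketch below counts top-level blocks by type. It assumes braces only ever delimit blocks (a comment that contains a brace would confuse it), the sample text is invented, and count_top_level_blocks is a hypothetical helper.

```python
from collections import Counter


def count_top_level_blocks(text):
    """Count message blocks by their three-letter type, ignoring nested ones."""
    counts = Counter()
    depth = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                # The block type is the three characters after the brace.
                counts[text[i + 1:i + 4]] += 1
            depth += 1
        elif ch == "}":
            depth -= 1
    return counts


# Made-up file content: two reads and one contig that tiles both of them.
sample = (
    "{RED\niid:1\neid:1\nseq:\nATGC\n.\nqlt:\nDDDD\n.\n}\n"
    "{RED\niid:2\neid:2\nseq:\nGCAT\n.\nqlt:\nDDDD\n.\n}\n"
    "{CTG\niid:1\neid:contig1\ncom:\nexample\n.\nseq:\nATGCAT\n.\nqlt:\nDDDDDD\n.\n"
    "{TLE\nsrc:1\noff:0\nclr:0,4\n}\n"
    "{TLE\nsrc:2\noff:2\nclr:0,4\n}\n"
    "}\n"
)
counts = count_top_level_blocks(sample)
print(dict(counts))
```

The nested TLE blocks are not counted because they open at depth 1.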

Parsing

bio_bits provides an interface that parses an AMOS file from an open file handle.

To read in an AMOS file, do the following:

from bio_bits import amos

# Parse the AMOS file from an open file handle
with open('AMOS.afg') as fh:
    a = amos.AMOS(fh)

CTG

To get information about the contigs (CTG), access the .ctgs attribute. The contigs are indexed by their iid, so to get the sequence of the contig with iid 1 you would do the following:

ctg = a.ctgs[1]
seq = ctg.seq
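
The same attribute access extends to all contigs at once. The sketch below maps each contig iid to its sequence length. It assumes .ctgs behaves like a dictionary with an .items() method, which the text does not confirm, and it runs on stand-in objects so that it works without an AMOS file. contig_lengths is a hypothetical helper.

```python
from types import SimpleNamespace


def contig_lengths(ctgs):
    """Map each contig iid to the length of its sequence."""
    return {iid: len(ctg.seq) for iid, ctg in ctgs.items()}


# Stand-in data for illustration; with a real file, pass a.ctgs instead.
fake_ctgs = {
    1: SimpleNamespace(seq="ATGCATGC"),
    2: SimpleNamespace(seq="ATG"),
}
lengths = contig_lengths(fake_ctgs)
print(lengths)
```
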

To retrieve all the reads (RED) that belong to a specific contig:

reads = []
for tle in ctg.tlelist:
    reads.append(a.reds[tle.src])
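
The same loop can be applied to every contig to see which reads each one is built from. This sketch again assumes .ctgs is dictionary-like and uses stand-in objects in place of a parsed file. reads_per_contig is a hypothetical helper, not part of bio_bits.

```python
from types import SimpleNamespace


def reads_per_contig(amos_obj):
    """Map each contig iid to the RED iids referenced by its TLE blocks."""
    return {
        iid: [tle.src for tle in ctg.tlelist]
        for iid, ctg in amos_obj.ctgs.items()
    }


# Stand-in for a parsed AMOS object; only the attributes used above exist.
fake = SimpleNamespace(
    ctgs={
        1: SimpleNamespace(tlelist=[SimpleNamespace(src=1), SimpleNamespace(src=2)]),
        2: SimpleNamespace(tlelist=[]),
    }
)
mapping = reads_per_contig(fake)
print(mapping)
```

A contig with no TLE blocks maps to an empty list.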

RED

To get information about the reads (RED), access the .reds attribute. The reads are indexed by their iid, so to get the sequence of the read with iid 1 you would do the following:

red = a.reds[1]
seq = red.seq

To convert a RED entry into another text format, use the .format method. It takes a format string that can refer to any of the properties of a RED object, such as .iid, .eid, .seq, and .qlt. The examples below show how to do this.
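
To get a feel for the format strings, the sketch below builds a FASTA record with Python's built-in str.format. It assumes that the RED .format method fills the {iid}, {eid}, {seq}, and {qlt} placeholders in the same way, which is not confirmed here. The field values are made up.

```python
# FASTA format string using the same placeholder names as a RED object.
fasta_fmt = '>{eid}\n{seq}'

# Hypothetical field values standing in for a RED object's properties.
fields = {'iid': 1, 'eid': 'read1', 'seq': 'ATGC', 'qlt': 'DDDD'}

# Placeholders that the format string does not mention are simply ignored.
fasta = fasta_fmt.format(**fields)
print(fasta)
```
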

Examples

Here is an example of how to convert all RED blocks into a single FASTQ file:

from bio_bits import amos

# Fastq format string
fastq_fmt = '@{iid}\n{seq}\n+\n{qlt}'

with open('amos.fastq','w') as fh_out:
    with open('AMOS.afg') as fh_in:
        for iid, red in amos.AMOS(fh_in).reds.items():
            fq = red.format(fastq_fmt)
            fh_out.write(fq + '\n')
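
Contigs can be exported in a similar way. The sketch below writes contigs to FASTA with wrapped sequence lines. It assumes that CTG objects expose .eid and .seq as attributes and that .ctgs is dictionary-like. Only .seq is shown in the text above, so .eid is an assumption. write_contigs_fasta is a hypothetical helper, demonstrated on stand-in objects.

```python
import io
from types import SimpleNamespace


def write_contigs_fasta(ctgs, fh_out, width=60):
    """Write each contig as a FASTA record, ordered by iid."""
    for iid, ctg in sorted(ctgs.items()):
        fh_out.write('>{0}\n'.format(ctg.eid))
        # Wrap the sequence so that no line is longer than width.
        for start in range(0, len(ctg.seq), width):
            fh_out.write(ctg.seq[start:start + width] + '\n')


# Stand-in contigs for illustration; with a real file, pass a.ctgs instead.
fake_ctgs = {
    2: SimpleNamespace(eid='contig2', seq='ATGCATGCAT'),
    1: SimpleNamespace(eid='contig1', seq='AAAACC'),
}
buffer = io.StringIO()
write_contigs_fasta(fake_ctgs, buffer, width=4)
fasta_text = buffer.getvalue()
print(fasta_text)
```

To write to disk instead, pass a file handle opened for writing in place of the StringIO buffer.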