degen_regions¶
Finds all degenerate bases in a given FASTA input file, which may contain multiple sequences, and reports the position of each one along with the name of the annotated gene that contains it.
The sequences in the FASTA file must already be aligned to the query sequence. That is, whether you supply a GenBank annotation file or have the script download one for you, all of your input sequences should be aligned to the sequence that annotation describes.
The annotation is retrieved via a supplied GenBank accession, a GenBank file path, or a gene tab/csv file.
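To make the term "degenerate base" concrete, here is a minimal Python sketch that scans one aligned sequence for IUPAC ambiguity codes and reports where they occur. It is an illustration only and is not taken from the degen_regions source; the function name `find_degenerate_positions` and the use of 1-based positions are assumptions of this sketch.

```python
# Illustrative sketch only; not the degen_regions implementation.
# IUPAC nucleotide codes that stand for more than one possible base.
DEGENERATE_BASES = set("RYSWKMBDHVN")


def find_degenerate_positions(sequence):
    """Return (position, base) pairs for every degenerate base.

    Positions are 1-based, which is an assumption of this sketch.
    """
    return [
        (index, base)
        for index, base in enumerate(sequence.upper(), start=1)
        if base in DEGENERATE_BASES
    ]


# R (A or G) sits at position 4 and N (any base) at position 7.
print(find_degenerate_positions("ATGRCCNTA"))
```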
Using Genbank Files¶
If you have already downloaded the GenBank annotation file (typically with a .gb extension), you can use the --gb-file argument.
The following uses the test input FASTA file and the test input GenBank file to find all degenerate bases, and writes the output to a tab-separated file called output.tsv:
degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --gb-file tests/testinput/sequence.gb
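Once output.tsv exists, it can be post-processed with a few lines of Python. The sketch below groups rows by sequence id using only the standard library. It assumes the file starts with a single header line and uses the column order shown in the Output section of this page; `group_rows_by_sequence` is a hypothetical helper, not part of the tool.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_sequence(handle):
    """Group tab-separated rows by their first column (the sequence id)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # assumption: the first line is a header row
    grouped = defaultdict(list)
    for row in reader:
        if row:  # skip blank lines
            grouped[row[0]].append(row[1:])
    return dict(grouped)


# Two rows copied from the sample output on this page, used in place of a real file.
sample = io.StringIO(
    "seq id\tnt Position\taa position\tnt composition\taa composition\tgene name\n"
    "721\t991\t331\tWCA\tS/T\tenvelope protein\n"
    "721\t1307\t436\tAYA\tI/T\tenvelope protein\n"
)
for seq_id, rows in group_rows_by_sequence(sample).items():
    print(seq_id, len(rows))
```

For a real run, pass `open("output.tsv", newline="")` instead of the in-memory sample.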
Fetching Genbank Files Automatically¶
If you want the script to fetch the GenBank annotation file from the internet automatically, you can use the --gb-id option and specify an accession number.
degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --gb-id KJ189367
Using tab/csv file of gene annotation info¶
If you have a tab/csv file of gene annotations, you can supply it using the --tab-file argument.
You can read more about the format of the tab/csv annotation file in the degen docs.
degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --tab-file path/to/annotation.tsv
Manually specify CDS¶
You can use the --cds argument to set the coding region.
This argument should be comma separated, such as start,stop.
Specifying this argument will override any other CDS found in the tab file, GenBank file or fetched GenBank file.
The following would mark every location other than position 1 as NON-CODING, because you are specifying that only position 1 is coding:
degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --gb-file tests/testinput/sequence.gb --cds 1,1
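The effect of a start,stop value can be sketched as follows. This is not the script's own parsing code; `parse_cds` and `is_coding` are hypothetical helpers, and treating both bounds as inclusive, 1-based positions is an assumption based on the `--cds 1,1` example above.

```python
def parse_cds(value):
    """Parse a 'start,stop' string into a (start, stop) tuple of ints."""
    start, stop = (int(part) for part in value.split(","))
    if start > stop:
        raise ValueError("CDS start must not exceed stop: %r" % value)
    return start, stop


def is_coding(position, cds):
    """True when position lies inside the CDS (bounds assumed inclusive)."""
    start, stop = cds
    return start <= position <= stop


cds = parse_cds("1,1")
print(is_coding(1, cds))    # only position 1 falls inside the region
print(is_coding(991, cds))  # any other position is outside it
```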
Without Gene Information¶
The gene information is optional. If it is not provided, the output file will not be annotated with gene names and you will also lose the “non-coding region” flag; otherwise, the output will look the same.
degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv
Output¶
The output is a simple tab-separated file:
seq id nt Position aa position nt composition aa composition gene name
----------------------------------------- ------------- ------------- ---------------- ---------------- -------------------------------
721 991 331 WCA S/T envelope protein
721 1307 436 AYA I/T envelope protein
721 1826 609 AYA I/T envelope protein
721 1865 622 GRA E/G envelope protein
721 7766 2589 ARA K/R nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1 1927 643 RAC D/N envelope protein
2055_Den4/AY618992_1/Thailand/2001/Den4_1 2833 945 YCG P/S nonstructural protein NS1
2055_Den4/AY618992_1/Thailand/2001/Den4_1 3565 1189 YAT H/Y nonstructural protein NS2A
2055_Den4/AY618992_1/Thailand/2001/Den4_1 6271 2091 RAA E/K nonstructural protein NS3
2055_Den4/AY618992_1/Thailand/2001/Den4_1 8656 2886 YAT H/Y nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1 8998 3000 YAG */Q nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1 9811 3271 YCC P/S nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10542 3515 AGN NON-CODING -
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10543 3515 NNN NON-CODING -
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10541 3514 NNN NON-CODING -
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10539 3514 NNN NON-CODING -
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10546 3516 NNN NON-CODING -
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10544 3515 NNN NON-CODING -
2055_Den4/AY618992_1/Thailand/2001/Den4_1 10542 3515 NNN NON-CODING -
1942_Den4/AY618992_1/Thailand/2001/Den4_1 4540 1514 RTA I/V nonstructural protein NS3
1942_Den4/AY618992_1/Thailand/2001/Den4_1 10177 3393 MCA P/T nonstructural protein NS5
1942_Den4/AY618992_1/Thailand/2001/Den4_1 10546 3516 NNN NON-CODING -
1942_Den4/AY618992_1/Thailand/2001/Den4_1 10544 3515 NNN NON-CODING -
1942_Den4/AY618992_1/Thailand/2001/Den4_1 10542 3515 NNN NON-CODING -
1875_Den4/AY618992_1/Thailand/2001/Den4_1 1514 505 AYG M/T envelope protein
1875_Den4/AY618992_1/Thailand/2001/Den4_1 3056 1019 ARA K/R nonstructural protein NS1
1875_Den4/AY618992_1/Thailand/2001/Den4_1 3058 1020 KCA A/S nonstructural protein NS1
1875_Den4/AY618992_1/Thailand/2001/Den4_1 3073 1025 WTT F/I nonstructural protein NS1
1875_Den4/AY618992_1/Thailand/2001/Den4_1 3491 1164 AYC I/T nonstructural protein NS2A
1875_Den4/AY618992_1/Thailand/2001/Den4_1 3895 1299 RTG M/V nonstructural protein NS2A
1875_Den4/AY618992_1/Thailand/2001/Den4_1 7445 2482 GYA A/V nonstructural protein NS4B
948_Den4/AY618992_1/Thailand/2001/Den4_1 2819 940 ARC N/S nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 2947 983 RCC A/T nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3058 1020 KCA A/S nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3073 1025 WTT F/I nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3116 1039 GYG A/V nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3181 1061 RTW I/V nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3179 1060 RTW I/V nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3338 1113 ART N/S nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3362 1121 ARA K/R nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3373 1125 WCR S/T nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 3371 1124 WCR S/T nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1 4314 1439 ATV I/M nonstructural protein NS2B
871_Den4/AY618992_1/Thailand/2001/Den4_1 7045 2349 WCC S/T nonstructural protein NS4B
871_Den4/AY618992_1/Thailand/2001/Den4_1 10536 3513 GAW NON-CODING -
871_Den4/AY618992_1/Thailand/2001/Den4_1 10537 3513 YCA NON-CODING -
947_Den4/AY618992_1/Thailand/2001/Den4_1 2971 991 YTY F/L nonstructural protein NS1
947_Den4/AY618992_1/Thailand/2001/Den4_1 2969 990 YTY F/L nonstructural protein NS1
947_Den4/AY618992_1/Thailand/2001/Den4_1 6763 2255 YTT F/L 2K peptide
1793_Den4/AY618992_1/Thailand/2001/Den4_1 223 75 MAG K/Q anchored capsid protein
1793_Den4/AY618992_1/Thailand/2001/Den4_1 556 186 RCC A/T membrane glycoprotein precursor
1793_Den4/AY618992_1/Thailand/2001/Den4_1 586 196 RGT G/S membrane glycoprotein precursor
1793_Den4/AY618992_1/Thailand/2001/Den4_1 613 205 YCA P/S membrane glycoprotein precursor
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2875 959 YCG P/S nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2943 982 AAN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2944 982 NNG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2942 981 NNG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2976 993 ATN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2977 993 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2975 992 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2973 992 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2980 994 NTG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2987 996 ANN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2986 996 ANN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2989 997 NGT GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2996 999 TNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2995 999 TNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3001 1001 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2999 1000 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 2997 1000 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3004 1002 NCC GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3073 1025 NTT GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3086 1029 ARC N/S nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3095 1032 CNG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3116 1039 GNG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3144 1049 GAN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3159 1054 GAN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3160 1054 NNC GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3158 1053 NNC GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3206 1069 GNC GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3235 1079 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3233 1078 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3231 1078 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3238 1080 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3236 1079 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3234 1079 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3241 1081 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3239 1080 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3237 1080 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3244 1082 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3242 1081 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3240 1081 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3247 1083 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3245 1082 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3243 1082 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3250 1084 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3248 1083 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3246 1083 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3253 1085 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3251 1084 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3249 1084 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3256 1086 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3254 1085 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3252 1085 NNN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3316 1106 NGG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3337 1113 NAT GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3341 1114 GNA GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3408 1137 ATN GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3412 1138 NTG GAPFOUND nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3493 1165 MCC P/T nonstructural protein NS2A
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3509 1170 ANT GAPFOUND nonstructural protein NS2A
1793_Den4/AY618992_1/Thailand/2001/Den4_1 3837 1280 TTN GAPFOUND nonstructural protein NS2A
1793_Den4/AY618992_1/Thailand/2001/Den4_1 6185 2062 ARG K/R nonstructural protein NS3
1793_Den4/AY618992_1/Thailand/2001/Den4_1 6187 2063 RAR E/K nonstructural protein NS3
1793_Den4/AY618992_1/Thailand/2001/Den4_1 6185 2062 RAR E/K nonstructural protein NS3
1793_Den4/AY618992_1/Thailand/2001/Den4_1 6614 2205 TYT F/S nonstructural protein NS4A
1793_Den4/AY618992_1/Thailand/2001/Den4_1 6650 2217 ARA K/R nonstructural protein NS4A
1793_Den4/AY618992_1/Thailand/2001/Den4_1 8630 2877 ART N/S nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1 8844 2949 AAN GAPFOUND nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1 9938 3313 AYT I/T nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1 9941 3314 GRC D/G nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1 10015 3339 RTT I/V nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1 10087 3363 NGR GAPFOUND nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1 10085 3362 NGR GAPFOUND nonstructural protein NS5
1901_Den4/AY618992_1/Thailand/2001/Den4_1 15 6 AAN NON-CODING 5'UTR
1901_Den4/AY618992_1/Thailand/2001/Den4_1 111 38 TTN GAPFOUND anchored capsid protein
1901_Den4/AY618992_1/Thailand/2001/Den4_1 2279 760 GYT A/V envelope protein
1901_Den4/AY618992_1/Thailand/2001/Den4_1 8798 2933 ARA K/R nonstructural protein NS5
1901_Den4/AY618992_1/Thailand/2001/Den4_1 10195 3399 RAG E/K nonstructural protein NS5
1901_Den4/AY618992_1/Thailand/2001/Den4_1 10366 3456 RGG NON-CODING 3'UTR
1934_Den4/AY618992_1/Thailand/2001/Den4_1 15 6 AAN NON-CODING 5'UTR
1934_Den4/AY618992_1/Thailand/2001/Den4_1 111 38 TTN GAPFOUND anchored capsid protein
1934_Den4/AY618992_1/Thailand/2001/Den4_1 998 333 GMT A/D envelope protein
1934_Den4/AY618992_1/Thailand/2001/Den4_1 4515 1506 TTM F/L nonstructural protein NS3
1934_Den4/AY618992_1/Thailand/2001/Den4_1 8798 2933 ARA K/R nonstructural protein NS5
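The relationship between the columns can be reproduced with a short Python sketch. In every sample row above, the aa position equals the nt position integer-divided by 3, plus 1, and the aa composition matches the sorted set of amino acids obtained by expanding the degenerate codon under the standard genetic code. Both rules are inferred from the sample output, not from the tool's source. The helper names are hypothetical, and the sketch does not handle the NON-CODING case, which depends on the annotation.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG-ordered string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC codes mapped to the bases they can represent (N is handled separately).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def aa_position(nt_position):
    """Amino acid position as it appears in the sample rows (inferred rule)."""
    return nt_position // 3 + 1


def aa_composition(codon):
    """Expand a degenerate codon into its sorted, slash-joined amino acids."""
    codon = codon.upper()
    if any(base not in IUPAC for base in codon):
        # Mirrors the sample output, where codons containing N read GAPFOUND.
        return "GAPFOUND"
    choices = (IUPAC[base] for base in codon)
    amino_acids = {CODON_TABLE["".join(c)] for c in product(*choices)}
    return "/".join(sorted(amino_acids))


print(aa_position(991), aa_composition("WCA"))
print(aa_position(8998), aa_composition("YAG"))
```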