degen_regions

Finds all degenerate bases in a given fasta input file, which may contain multiple sequences, and reports the position of each base as well as the name of the annotated gene that contains it.

The sequences in the fasta file must already be aligned to the query sequence. That is, whether you supply a genbank annotation file or have the script download one for you, all of your input sequences should be aligned to the sequence in that genbank record.

The annotation is retrieved via a supplied genbank accession, a genbank file path, or a gene tab/csv file.
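To make "degenerate base" concrete, the following is a minimal sketch of how such positions could be located in aligned fasta text. It is not the degen_regions implementation: the function names `parse_fasta` and `degenerate_positions` are hypothetical, positions are counted from 1, and gap characters (`-` and `.`) are assumed to be skipped rather than reported.

```python
# Illustrative sketch only; not taken from degen_regions itself.
UNAMBIGUOUS = set("ACGT")   # bases that are not degenerate
GAP_CHARS = set("-.")       # assumption: alignment gaps are ignored


def parse_fasta(text):
    """Yield (seq_id, sequence) pairs from fasta-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def degenerate_positions(sequence):
    """Return (position, base) for every base that is not A, C, G, T or a gap.

    Positions are 1-based here; the tool's own convention may differ.
    """
    return [
        (pos, base)
        for pos, base in enumerate(sequence, start=1)
        if base not in UNAMBIGUOUS and base not in GAP_CHARS
    ]


example = ">seq1\nATGRCA\nNNT-A\n>seq2\nACGT\n"
for seq_id, seq in parse_fasta(example):
    print(seq_id, degenerate_positions(seq))
```

In this sketch `seq1` yields three degenerate positions (the `R` and the two `N` bases), while `seq2` yields none.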

Usage

You can view the usage of degen_regions via:

degen_regions --help

Using Genbank Files

If you have already downloaded the genbank annotation file (typically with the extension .gb), you can use the --gb-file argument.

The following uses the test input fasta file and the test input genbank file to find all degenerate bases, and writes the output to a tab-separated file called output.tsv:

degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --gb-file tests/testinput/sequence.gb

Fetching Genbank Files Automatically

If you want the script to automatically fetch the genbank annotation file from the internet, you can use the --gb-id option and specify an accession number.

degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --gb-id KJ189367

Using tab/csv file of gene annotation info

If you have a tab/csv file of gene annotations, you can supply it using the --tab-file argument.

You can read more about the format of the tab/csv annotation file in the degen docs.

degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --tab-file path/to/annotation.tsv

Manually specify CDS

You can use the --cds argument to set the coding region. The value should be comma-separated, in the form start,stop. Specifying this argument overrides any other CDS found in the tab file, genbank file or fetched genbank file.

The following would mark all other locations as NON-CODING, because it specifies that only position 1 is coding:

degen_regions -i tests/Den4_MAAPS_TestData16.fasta -o output.tsv --gb-file tests/testinput/sequence.gb --cds 1,1
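As an illustration of what a start,stop value means, here is a minimal sketch of parsing it and classifying a position. The helpers `parse_cds` and `is_coding` are hypothetical names, and treating both ends of the range as inclusive is an assumption, not something confirmed by the degen_regions source.

```python
# Illustrative sketch only; degen_regions may parse --cds differently.
def parse_cds(value):
    """Turn a 'start,stop' string into a (start, stop) tuple of ints."""
    start_text, stop_text = value.split(",")
    start, stop = int(start_text), int(stop_text)
    if start > stop:
        raise ValueError("CDS start must not be greater than stop")
    return start, stop


def is_coding(position, cds):
    """Assumption: a position is coding when start <= position <= stop."""
    start, stop = cds
    return start <= position <= stop


cds = parse_cds("1,1")
print(is_coding(1, cds))    # True: position 1 is the whole coding region
print(is_coding(991, cds))  # False: would be reported as NON-CODING
```

Under these assumptions, `--cds 1,1` leaves only position 1 inside the coding region, which matches the description above.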

Output

The output is a simple tab-separated file. The example below is shown with aligned columns for readability:

seq id                                       nt Position    aa position  nt composition    aa composition    gene name
-----------------------------------------  -------------  -------------  ----------------  ----------------  -------------------------------
721                                                  991            331  WCA               S/T               envelope protein
721                                                 1307            436  AYA               I/T               envelope protein
721                                                 1826            609  AYA               I/T               envelope protein
721                                                 1865            622  GRA               E/G               envelope protein
721                                                 7766           2589  ARA               K/R               nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1           1927            643  RAC               D/N               envelope protein
2055_Den4/AY618992_1/Thailand/2001/Den4_1           2833            945  YCG               P/S               nonstructural protein NS1
2055_Den4/AY618992_1/Thailand/2001/Den4_1           3565           1189  YAT               H/Y               nonstructural protein NS2A
2055_Den4/AY618992_1/Thailand/2001/Den4_1           6271           2091  RAA               E/K               nonstructural protein NS3
2055_Den4/AY618992_1/Thailand/2001/Den4_1           8656           2886  YAT               H/Y               nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1           8998           3000  YAG               */Q               nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1           9811           3271  YCC               P/S               nonstructural protein NS5
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10542           3515  AGN               NON-CODING        -
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10543           3515  NNN               NON-CODING        -
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10541           3514  NNN               NON-CODING        -
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10539           3514  NNN               NON-CODING        -
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10546           3516  NNN               NON-CODING        -
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10544           3515  NNN               NON-CODING        -
2055_Den4/AY618992_1/Thailand/2001/Den4_1          10542           3515  NNN               NON-CODING        -
1942_Den4/AY618992_1/Thailand/2001/Den4_1           4540           1514  RTA               I/V               nonstructural protein NS3
1942_Den4/AY618992_1/Thailand/2001/Den4_1          10177           3393  MCA               P/T               nonstructural protein NS5
1942_Den4/AY618992_1/Thailand/2001/Den4_1          10546           3516  NNN               NON-CODING        -
1942_Den4/AY618992_1/Thailand/2001/Den4_1          10544           3515  NNN               NON-CODING        -
1942_Den4/AY618992_1/Thailand/2001/Den4_1          10542           3515  NNN               NON-CODING        -
1875_Den4/AY618992_1/Thailand/2001/Den4_1           1514            505  AYG               M/T               envelope protein
1875_Den4/AY618992_1/Thailand/2001/Den4_1           3056           1019  ARA               K/R               nonstructural protein NS1
1875_Den4/AY618992_1/Thailand/2001/Den4_1           3058           1020  KCA               A/S               nonstructural protein NS1
1875_Den4/AY618992_1/Thailand/2001/Den4_1           3073           1025  WTT               F/I               nonstructural protein NS1
1875_Den4/AY618992_1/Thailand/2001/Den4_1           3491           1164  AYC               I/T               nonstructural protein NS2A
1875_Den4/AY618992_1/Thailand/2001/Den4_1           3895           1299  RTG               M/V               nonstructural protein NS2A
1875_Den4/AY618992_1/Thailand/2001/Den4_1           7445           2482  GYA               A/V               nonstructural protein NS4B
948_Den4/AY618992_1/Thailand/2001/Den4_1            2819            940  ARC               N/S               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            2947            983  RCC               A/T               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3058           1020  KCA               A/S               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3073           1025  WTT               F/I               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3116           1039  GYG               A/V               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3181           1061  RTW               I/V               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3179           1060  RTW               I/V               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3338           1113  ART               N/S               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3362           1121  ARA               K/R               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3373           1125  WCR               S/T               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            3371           1124  WCR               S/T               nonstructural protein NS1
871_Den4/AY618992_1/Thailand/2001/Den4_1            4314           1439  ATV               I/M               nonstructural protein NS2B
871_Den4/AY618992_1/Thailand/2001/Den4_1            7045           2349  WCC               S/T               nonstructural protein NS4B
871_Den4/AY618992_1/Thailand/2001/Den4_1           10536           3513  GAW               NON-CODING        -
871_Den4/AY618992_1/Thailand/2001/Den4_1           10537           3513  YCA               NON-CODING        -
947_Den4/AY618992_1/Thailand/2001/Den4_1            2971            991  YTY               F/L               nonstructural protein NS1
947_Den4/AY618992_1/Thailand/2001/Den4_1            2969            990  YTY               F/L               nonstructural protein NS1
947_Den4/AY618992_1/Thailand/2001/Den4_1            6763           2255  YTT               F/L               2K peptide
1793_Den4/AY618992_1/Thailand/2001/Den4_1            223             75  MAG               K/Q               anchored capsid protein
1793_Den4/AY618992_1/Thailand/2001/Den4_1            556            186  RCC               A/T               membrane glycoprotein precursor
1793_Den4/AY618992_1/Thailand/2001/Den4_1            586            196  RGT               G/S               membrane glycoprotein precursor
1793_Den4/AY618992_1/Thailand/2001/Den4_1            613            205  YCA               P/S               membrane glycoprotein precursor
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2875            959  YCG               P/S               nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2943            982  AAN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2944            982  NNG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2942            981  NNG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2976            993  ATN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2977            993  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2975            992  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2973            992  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2980            994  NTG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2987            996  ANN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2986            996  ANN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2989            997  NGT               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2996            999  TNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2995            999  TNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3001           1001  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2999           1000  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           2997           1000  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3004           1002  NCC               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3073           1025  NTT               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3086           1029  ARC               N/S               nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3095           1032  CNG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3116           1039  GNG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3144           1049  GAN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3159           1054  GAN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3160           1054  NNC               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3158           1053  NNC               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3206           1069  GNC               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3235           1079  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3233           1078  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3231           1078  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3238           1080  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3236           1079  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3234           1079  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3241           1081  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3239           1080  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3237           1080  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3244           1082  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3242           1081  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3240           1081  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3247           1083  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3245           1082  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3243           1082  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3250           1084  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3248           1083  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3246           1083  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3253           1085  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3251           1084  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3249           1084  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3256           1086  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3254           1085  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3252           1085  NNN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3316           1106  NGG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3337           1113  NAT               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3341           1114  GNA               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3408           1137  ATN               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3412           1138  NTG               GAPFOUND          nonstructural protein NS1
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3493           1165  MCC               P/T               nonstructural protein NS2A
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3509           1170  ANT               GAPFOUND          nonstructural protein NS2A
1793_Den4/AY618992_1/Thailand/2001/Den4_1           3837           1280  TTN               GAPFOUND          nonstructural protein NS2A
1793_Den4/AY618992_1/Thailand/2001/Den4_1           6185           2062  ARG               K/R               nonstructural protein NS3
1793_Den4/AY618992_1/Thailand/2001/Den4_1           6187           2063  RAR               E/K               nonstructural protein NS3
1793_Den4/AY618992_1/Thailand/2001/Den4_1           6185           2062  RAR               E/K               nonstructural protein NS3
1793_Den4/AY618992_1/Thailand/2001/Den4_1           6614           2205  TYT               F/S               nonstructural protein NS4A
1793_Den4/AY618992_1/Thailand/2001/Den4_1           6650           2217  ARA               K/R               nonstructural protein NS4A
1793_Den4/AY618992_1/Thailand/2001/Den4_1           8630           2877  ART               N/S               nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1           8844           2949  AAN               GAPFOUND          nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1           9938           3313  AYT               I/T               nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1           9941           3314  GRC               D/G               nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1          10015           3339  RTT               I/V               nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1          10087           3363  NGR               GAPFOUND          nonstructural protein NS5
1793_Den4/AY618992_1/Thailand/2001/Den4_1          10085           3362  NGR               GAPFOUND          nonstructural protein NS5
1901_Den4/AY618992_1/Thailand/2001/Den4_1             15              6  AAN               NON-CODING        5'UTR
1901_Den4/AY618992_1/Thailand/2001/Den4_1            111             38  TTN               GAPFOUND          anchored capsid protein
1901_Den4/AY618992_1/Thailand/2001/Den4_1           2279            760  GYT               A/V               envelope protein
1901_Den4/AY618992_1/Thailand/2001/Den4_1           8798           2933  ARA               K/R               nonstructural protein NS5
1901_Den4/AY618992_1/Thailand/2001/Den4_1          10195           3399  RAG               E/K               nonstructural protein NS5
1901_Den4/AY618992_1/Thailand/2001/Den4_1          10366           3456  RGG               NON-CODING        3'UTR
1934_Den4/AY618992_1/Thailand/2001/Den4_1             15              6  AAN               NON-CODING        5'UTR
1934_Den4/AY618992_1/Thailand/2001/Den4_1            111             38  TTN               GAPFOUND          anchored capsid protein
1934_Den4/AY618992_1/Thailand/2001/Den4_1            998            333  GMT               A/D               envelope protein
1934_Den4/AY618992_1/Thailand/2001/Den4_1           4515           1506  TTM               F/L               nonstructural protein NS3
1934_Den4/AY618992_1/Thailand/2001/Den4_1           8798           2933  ARA               K/R               nonstructural protein NS5
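The nt composition and aa composition columns can be related with a short sketch: expand each degenerate base into the bases it stands for, translate every resulting codon with the standard genetic code, and join the distinct amino acids. This is not the degen_regions implementation. The names `aa_composition` and `aa_position` are hypothetical, returning GAPFOUND for any codon containing N is inferred from the sample rows above, and NON-CODING handling is left out. The position formula is likewise only an observation that fits the sample rows, not a documented rule.

```python
# Illustrative sketch only; behaviour is inferred from the sample output.
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def aa_composition(codon):
    """Return the possible amino acids for a degenerate codon, e.g. 'S/T'."""
    codon = codon.upper()
    if "N" in codon:
        # Assumption: mirrors the GAPFOUND rows in the sample output.
        return "GAPFOUND"
    expansions = product(*(IUPAC[base] for base in codon))
    amino_acids = {CODON_TABLE["".join(bases)] for bases in expansions}
    return "/".join(sorted(amino_acids))


def aa_position(nt_position):
    """Relationship that fits the sample rows: nt position // 3 + 1."""
    return nt_position // 3 + 1


print(aa_composition("WCA"))  # S/T
print(aa_composition("YAG"))  # */Q
print(aa_position(991))       # 331
```

For example, W stands for A or T, so WCA expands to ACA (T) and TCA (S), giving S/T as in the first row of the table.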