Annotations and data layers
Starting with version 1.4, R2DT can visualise additional annotations as data layers on top of the secondary structure diagrams.
By default, R2DT displays the expected accuracy of the alignment between the query sequence and the template. The accuracy is calculated by the Infernal software as posterior probabilities (see page 110 of the Infernal User Guide for more information).
Alignment accuracy
For example, the following diagram shows the secondary structure of a SAM riboswitch RNA sequence from Bacillus subtilis visualised using the RF00162 Rfam template:
The colored circles highlight the hairpin loop region where Infernal is not 100% confident in the alignment.
Hovering over the nucleotides shows the alignment accuracy (posterior probability) values. Similar to Infernal #=GR PP
lines, the posterior probability p
is encoded as 11 possible characters from 0
to 10
:
0.0 ≤ p < 0.05 is coded as
0
0.05 ≤ p < 0.15 is coded as
1
(… and so on …)0.85 ≤ p < 0.95 is coded as
9
0.95 ≤ p ≤ 1.0 (shown as
*
in Infernal output) is coded as10
.
The nucleotides are colored based on a color-blind friendly palette from a paper by Bang Wong. The default colors can be found in the colorscheme.json file and are shown below:
0-5 or <55% accuracy
6 or ≥55-65% accuracy
7 or ≥65-75% accuracy
8 or ≥75-85% accuracy
9 or ≥95% accuracy
10
and*
are shown with white background.
Example annotation file
The annotations can be provided in a tab-separated format where the first two columns are called residue_index
and residue_name
. The number of rows in the file (not counting the header line) and residue names must match the query sequence.
The third column (in this case posterior_probability
) can have an arbitrary name matching the colorscheme.json
.
The following example is shortened for display purposes but a complete TSV file is available for reference:
residue_index residue_name posterior_probability
1 U 10
2 U 10
3 C 10
...
50 G 10
51 G 10
52 U 10
53 G 9
54 U 9
55 A 9
56 A 8
57 U 8
58 G 7
59 G 8
60 C 8
61 G 8
62 A 8
63 U 8
64 C 8
65 A 8
66 G 8
67 C 8
68 C 8
69 A 7
70 U 7
71 G 9
72 A 10
73 C 10
...
117 A 10
118 A 10
Example color scheme file
The colors for each possible posterior_probability
value from the TSV file are configured using a colorscheme.json
file, for example:
{
"coloring": {
"posterior_probability": {
"label": "Alignment accuracy",
"values": {
"0": "rgb(213, 94, 0)",
"1": "rgb(213, 94, 0)",
"2": "rgb(213, 94, 0)",
"3": "rgb(213, 94, 0)",
"4": "rgb(213, 94, 0)",
"5": "rgb(213, 94, 0)",
"6": "rgb(230, 159, 0)",
"7": "rgb(240, 228, 66)",
"8": "rgb(86, 180, 233)",
"9": "rgb(0, 158, 115)",
"10": "rgb(255, 255, 255)",
"*": "rgb(255, 255, 255)"
}
}
}
}
Note that the header of the third column of the annotation file must match the corresponding field of colorscheme.json
.
Example R2DT commands
You will need 3 files:
query.fasta
- a query sequencerna_id
in fasta format,annotations.tsv
- per-residue annotations in TSV format,colorscheme.json
- RGB colors for each annotation category in JSON format.
Visualise secondary structure using R2DT:
# predict and visualise secondary structure using templates r2dt.py draw <query.fasta> <output_folder> # alternatively, visualise provided secondary structure without a template r2dt.py templatefree <query.fasta> <output_folder>
Generate additional SVG file colored using custom annotations:
python3 /rna/traveler/utils/enrich_json.py --input-json <output_folder>/json/<rna_id>.colored.json \ --input-data <annotations.tsv> --output <output_folder>/json/<rna_id>.enriched.json python3 /rna/traveler/utils/json2svg.py -p <colorscheme.json> \ -i <output_folder>/json/<rna_id>.enriched.json -o <output_folder>/svg/<rna_id>.enriched.svg
The final result will be found in
<output_folder>/svg/<rna_id>.enriched.svg
.
These commands add metadata from the TSV file to an RNA 2D JSON Schema file using the enrich_json.py script and then generate an SVG file using json2svg.py (both scripts are part of the Traveler software and are available in the R2DT Docker container).
Displaying other types of data
It is possible to visualise other residue annotations, such as the location of single nucleotide polymorphisms (SNPs), conservation, SHAPE reactivity, location of RNA protein binding sites, and other information.
To visualise custom data:
the annotations should be provided in the third column of the TSV file,
the
posterior_probability
field of thecolorscheme.json
file should match the name of the third column of the TSV file,the
values
field ofcolorscheme.json
should match the possible annotation values.