= UTGB Manual = {{{ #!html
August 17, 2004
1 Welcome
2 Keyword Search
2.1 Gene List
2.2 Mapping Position List
3 Online Sequence Alignment
4 Main Window
4.1 Scrolling and Zooming the Main Window
4.2 Functions of Buttons
4.3 Addition, Deletion, and Rearrangement of Tracks
4.4 Setting for Tracks
5 Pre-defined Tracks
5.1 Overview Tracks
5.2 Ruler and Zooming Tracks
5.3 Base Color Track
5.4 Gap Track
5.5 GC Content Track
5.6 Mapped Gene Track
5.7 Genscan Track
5.8 Comparative Genomics Track
Ramen Assembler / UT Genome Browser Team Members
Acknowledgements
1. Welcome
Welcome to the UT Genome Browser (University of Tokyo Genome Browser). With this browser you can explore the genomic information of a target species (at present Medaka only) from several perspectives. For example, you can perform the following tasks, alone or in combination.
- Search for a gene by gene name, clone name, or cluster name, and display the genes surrounding it.
- Map a genomic sequence to a genome and observe its surroundings.
- Save the sequence itself to your own system.
Questions, suggestions, and comments are welcome at webmaster@utgenome.org.
2. Keyword Search
The top window displays the keyword search engine, as the picture above illustrates. Enter a keyword in the query box to see detailed information about the query. More about keyword search can be found in the help text in the same window. Matching is by substring: if the keyword is part of a target name, every such target is displayed. For example, the keyword “BCD” matches “ABCD”, “BCDE”, “BCD”, and “BCDF” (this is implemented with a suffix array).
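The substring matching described above can be sketched as follows. This is a minimal illustration, not the browser's actual implementation: it stores every suffix of every name explicitly (a real suffix array would store only offsets), and the function names `build_suffix_index` and `search` are hypothetical.

```python
def build_suffix_index(names):
    """Return a sorted list of (suffix, name_index) pairs over all names."""
    entries = []
    for idx, name in enumerate(names):
        for start in range(len(name)):
            entries.append((name[start:], idx))
    entries.sort()
    return entries


def search(entries, names, keyword):
    """Return every name that contains keyword as a substring."""
    # Binary search for the first suffix that is >= keyword.
    lo, hi = 0, len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] < keyword:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes that start with keyword are contiguous from here on.
    hits = set()
    while lo < len(entries) and entries[lo][0].startswith(keyword):
        hits.add(entries[lo][1])
        lo += 1
    return [names[i] for i in sorted(hits)]


names = ["ABCD", "BCDE", "BCD", "BCDF", "XYZ"]
index = build_suffix_index(names)
print(search(index, names, "BCD"))
```

A keyword is a substring of a name exactly when it is a prefix of one of that name's suffixes, which is why a sorted list of suffixes supports this lookup with a binary search.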
2.1 Gene List
When a keyword search matches many genes, they are all displayed in a list. Press the (*) button on the left side to go to the mapped region for that gene. This list appears when multiple candidate entries are found, typically when a class name or part of a gene number is entered as the keyword. If a GenBank accession number is given, the number is unique, and hence this list does not appear.
2.2 Mapping Position List
If the query gene maps to several regions of the genomic sequence, a list of all positions is shown. Press the (*) button on the left side to display that mapped region. If the gene has only one mapped region, the list is skipped and the region is displayed directly.
3. Online Sequence Alignment
You can map a sequence by entering it in the online mapping frame (image above) at the bottom of the main window. After selecting the species and its revision, press the search button at the bottom. The request is sent to the alignment server ALPS, which aligns the given sequence to the selected genome. If the system finds any alignments, a list of the alignment results is returned.
From the list, select an alignment by clicking its result button, as illustrated above. The aligned result then appears in the browser. The ALPS mapping track in the following snapshot displays the added mapping.
4. Main Window
The picture above shows the main screen of the UT Genome Browser. The targeted region of the genomic sequence is shown through a number of tracks. The browser provides the following functions for manipulating tracks on this screen.
- Shifting, zooming in and out, and reversing the target region. You can zoom in until individual bases are visible, or zoom out for a panoramic, structural view of the sequence.
- Adding, deleting, and reordering tracks.
- Customizing the display of each track.
- Retrieving the currently displayed sequence in FASTA format.
- Continuing the keyword search.
The following sections describe each function in detail.
4.1 Scrolling and Zooming the Main Window
In the main window you can shift and zoom in or out of the currently displayed sequence, using any of the following methods:
- Entering values in the input boxes.
- Using the shifting buttons.
- Using the overview, ruler, and zooming tracks.
You can enter values in the input boxes (see image above) inside the main window. The fields are as follows:
- Species: Select the species to display. The selectable revisions change with the species. At present, only Medaka can be selected.
- Revision: Select the revision of the target species' genome. For Medaka, the revision name has the format yyyymm. The selectable items change with the selected revision.
- Target: Select the target region to display from the selected species and revision. In the 200406 revision of Medaka, only scaffold names (e.g. scaffold123) can be selected, not chromosome names (e.g. chr10). Chromosome numbers will be available in a later revision.
- Start: Enter the start position of the region to display. Positions are 1-origin (i.e. the first base is position 1).
- End: Enter the end position of the region to display, also 1-origin. If the end position is greater than the start position, the display is normal; otherwise it is reversed. In both cases, the end base is included in the displayed sequence.
- Width: Enter the width of the track display in pixels. Choose a value that is comfortable to view. Normally the width is assumed to equal the screen width. You can also enter a larger width, such as 10 times the screen width, in which case it is convenient to use the browser's scroll bar.
After setting these items, press the “Apply” button to move to the desired position in the sequence.
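The Start and End conventions (1-origin, inclusive, reversed when End is smaller than Start) can be sketched as below. The function name `describe_range` is hypothetical, and treating Start equal to End as a forward display is an assumption, since the manual does not spell out that case.

```python
def describe_range(start, end):
    """Interpret 1-origin, inclusive Start/End values from the input boxes."""
    if start < 1 or end < 1:
        raise ValueError("positions are 1-origin, so they must be >= 1")
    left, right = min(start, end), max(start, end)
    return {
        "left": left,                # leftmost base on the forward strand
        "right": right,              # rightmost base, included in the display
        "length": right - left + 1,  # inclusive on both ends
        "reverse": end < start,      # assumption: start == end counts as forward
    }


def slice_bases(sequence, start, end):
    """Extract the displayed bases from a 0-indexed Python string."""
    r = describe_range(start, end)
    return sequence[r["left"] - 1 : r["right"]]


print(describe_range(64587, 66612))
print(slice_bases("ACGTACGT", 3, 5))
```

Because both ends are included, a range from 64587 to 66612 covers 2026 bases, not 2025.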
The second method uses the buttons (see image above). The input box method is mainly useful when the exact position in the sequence matters. Because numbers must be typed in, it can be quite cumbersome, so more intuitive buttons are provided for convenience:
- Shifting: four buttons shift the sequence. The [button] button scrolls the sequence by one full screen, whereas the [button] button scrolls half a screen to the left. The mirrored buttons shift to the right.
- Scaling: seven buttons change the scale. The [button] button scales down, so the display becomes half its present size, whereas the [button] button scales up, so the display becomes twice its present size. The fraction-style buttons specify the scale directly, in units of bp per pixel. For example, [button] means that 1 bp is displayed by 1 pixel, whereas [button] means that 10 bp are displayed by 1 pixel. [button] is the scale that enlarges the display to the base level, and [button] is a scale useful for displaying 1 Mbp on the screen. When scaling up or down, the viewpoint of the present display remains unchanged.
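As a rough sketch of the scaling behavior described above, the following code halves or doubles the displayed range while keeping its center fixed, and computes the bp-per-pixel scale. The function names are hypothetical, a forward (start <= end) display is assumed, and clamping at position 1 is an assumption about edge handling.

```python
def bases_per_pixel(start, end, width_px):
    """Scale of the display: how many bases one pixel represents."""
    return (end - start + 1) / width_px


def zoom(start, end, factor):
    """Return a new (start, end) scaled by factor around the same center.

    factor=2 scales up (half as many bases shown, each twice as large);
    factor=0.5 scales down (twice as many bases shown).
    """
    if start > end or factor <= 0:
        raise ValueError("expects a forward range and a positive factor")
    width = end - start + 1
    new_width = max(1, round(width / factor))
    center = (start + end) // 2
    new_start = max(1, center - new_width // 2)  # assumption: clamp at base 1
    return new_start, new_start + new_width - 1


print(bases_per_pixel(125001, 145000, 1000))
print(zoom(125001, 145000, 2))
print(zoom(125001, 145000, 0.5))
```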
In addition, you can shift the view with the overview, ruler, and zooming tracks: clicking any position on these tracks moves the viewpoint. For details, refer to the description of each track.
4.2 Functions of Buttons
- (rev) Displays the sequence in reverse mode. Normally a sequence is displayed from left to right, but the (rev) button swaps the start and end points of the currently displayed sequence, switching the display from normal to reverse or vice versa. The same effect can be obtained by entering a “start” value greater than the “end” value in the input boxes, but the button is easier. If you press the (fasta) button in reverse mode, the complementary strand of the displayed strand is retrieved.
- (top) Returns to the main screen. Information such as the displayed range, the tracks, and each track's settings is preserved.
- (clear) Returns to the main screen. Information such as the displayed range, the tracks, and each track's settings is reset to the initial state.
- (Track) Adds, deletes, and rearranges tracks. A detailed description is given later.
- (fasta) Retrieves the currently displayed sequence in FASTA format. You will be prompted to save it under a file name such as Medaka-200406-scaffold429 (64587-66612).fasta
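To illustrate what the (fasta) button produces in normal and reverse mode, here is a small sketch. It assumes that "complementary strand" means the reverse complement of the forward-strand bases; the header layout is modeled loosely on the file name example above and is hypothetical, as are the function names.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def fasta_record(species, revision, target, start, end, forward_bases, line_width=60):
    """Format a FASTA record; forward_bases are the forward-strand bases of the range."""
    header = f">{species}-{revision}-{target}({start}-{end})"  # hypothetical layout
    # Assumption: reverse mode (start > end) yields the reverse complement.
    body = forward_bases if start <= end else reverse_complement(forward_bases)
    lines = [body[i:i + line_width] for i in range(0, len(body), line_width)]
    return "\n".join([header] + lines) + "\n"


print(fasta_record("Medaka", "200406", "scaffold429", 5, 8, "ATGC"), end="")
print(fasta_record("Medaka", "200406", "scaffold429", 8, 5, "ATGC"), end="")
```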
4.3 Addition, Deletion, and Rearrangement of Tracks
When the (track) button on the main screen is pressed, the track-editing screen shown above pops up. Here you can add, delete, and rearrange tracks. The editing screen is divided into three parts: displaying tracks, removed tracks, and add new track.
- Displaying tracks: Tracks on the current screen can be deleted and rearranged here. Deleted tracks are moved to the removed tracks list.
- Removed tracks: This is the list of all tracks deleted from the displaying tracks. It works like a recycle bin: tracks listed here can be restored.
- Add new track: Tracks that are not available in the current display can be added here. Enter the URL of the new track and press the add button. If the URL is read properly, the new track is added after the currently displayed tracks.
4.4 Setting for Tracks
On the main screen, pressing the button beside a track name lets you customize the display of that track, as in the image above. The customizable settings differ from track to track, and some tracks cannot be customized at all. For details, refer to the description of each track.
5. Pre-defined Tracks
5.1 Overview Tracks
This track shows where the currently displayed window lies within the total genome. In the image above, for instance, the blue viewpoint marks the range from 125kb to 145kb. Clicking any position on the overview track centers the viewpoint on that position. For example, when you click around the 100k position, the viewpoint moves to 90k-110k.
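The click-to-center behavior can be sketched as below. The mapping from a pixel to a base position and the clamping at position 1 are assumptions for illustration, and both function names are hypothetical.

```python
def pixel_to_base(x, track_width_px, total_length):
    """Map a click at pixel x (0-indexed) on the overview track to a 1-origin base."""
    return 1 + (x * total_length) // track_width_px


def recenter(start, end, clicked):
    """Move the window so that it is centered on the clicked base, keeping its width."""
    width = end - start + 1
    new_start = max(1, clicked - width // 2)  # assumption: clamp at base 1
    return new_start, new_start + width - 1


# The manual's example: a 125k-145k window, clicked near 100k, moves to 90k-110k.
print(recenter(125000, 145000, 100000))
```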
5.2 Ruler and Zooming Tracks
The ruler track and the zooming track display the same content in different colors: the base positions of the currently displayed region with respect to the whole sequence. If you click any position on the ruler track, the viewpoint is centered on that position. For example, clicking the rightmost end of the ruler track has the same effect as pressing the [button] button on the main screen. Before using a zoom button on the main screen, it is best to center the region of interest with the ruler track so that subsequent enlargement goes smoothly.
Clicking a position on the zooming track has the effect of the ruler track and, in addition, applies a twofold zoom. For example, clicking the center of the zooming track is the same as pressing the [button] button on the main screen, and clicking the rightmost end of the zooming track is the same as pressing [button] and then [button] on the main screen. This track is useful for zooming to a region of interest in the sequence.
5.3 Base Color Track
The base color track displays the sequence in color: each of the four bases A, T, G, and C is converted to its own color. Typically each base is displayed by one pixel. By zooming in, you can see a specific region of the sequence as letters (bottom image above).
You can also customize the color of each base. For example, if only G and C are colored, the track serves as a simple GC content track. To obtain the sequence itself, it is better to save it in FASTA format.
5.4 Gap Track
The Gap Track displays gaps in a scaffold. Clicking anywhere except a gap retrieves the whole contig sequence in FASTA format. This differs from the (fasta) button in the main window, which retrieves only the displayed sequence.
5.5 GC Content Track
This track displays the GC content ratio for every 5 pixels. If a single base spans more than 5 pixels, each base is instead colored to show whether it is G/C or A/T.
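A minimal sketch of the per-bin GC computation follows. The function name and the handling of non-ACGT characters (ignored) are assumptions; the 5-pixel bin comes from the description above.

```python
def gc_ratios(sequence, bases_per_pixel, pixels_per_bin=5):
    """Return the GC ratio of each bin of pixels_per_bin pixels.

    When one base spans more than pixels_per_bin pixels, the bin shrinks to a
    single base, so each value is 1.0 (G or C) or 0.0 (A or T).
    """
    bin_size = max(1, int(bases_per_pixel * pixels_per_bin))
    ratios = []
    for i in range(0, len(sequence), bin_size):
        window = sequence[i:i + bin_size].upper()
        called = sum(window.count(b) for b in "ACGT")  # assumption: skip N and gaps
        gc = window.count("G") + window.count("C")
        ratios.append(gc / called if called else None)
    return ratios


print(gc_ratios("GGCCATATGCAT", bases_per_pixel=1))
print(gc_ratios("GA", bases_per_pixel=0.1))
```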
5.6 Mapped Gene Track
This track displays the results of mapping Fugu, Zebrafish, and Medaka ESTs to the Medaka genome with ALPS (http://alps.gi.k.u-tokyo.ac.jp). The mapped region of each gene is drawn as a line, and the exons are drawn as rectangular boxes. The arrow indicates the plus or minus strand of the genome. The image above displays Medaka ESTs in black and Fugu cDNAs in green. The current data sources for mapped genes are Medaka UniGene Build #10, Zebrafish UniGene Build #71, and Fugu Ensembl pufferfish v21.2.c1 (10th May, 2004). We will revise the alignments periodically as the data sources are updated.
If the button on the left side of the Mapped Gene Track is pressed, the display settings window pops up. The settings have three parts: changing the overall display style, choosing which mapping results are displayed, and modifying the colors of genes. The show/hide settings display all alignments that fulfill the conditions. A color setting is applied with the “use this color” button beside the track.
Style select setting
The display style of the genes can be changed here. There are four display styles: full, pack, small, and dense.
- Full: In full style, each gene is displayed on its own line. The GenBank accession number or Ensembl Gene ID is written on the left side.
- Pack: Pack style places multiple alignments on one line, each with its gene name (GenBank accession number or Ensembl Gene ID) on the left, which makes it more compact than full style.
- Small: In small style, the vertical space above and below each gene is reduced to the minimum. Unlike full and pack styles, it does not display gene names.
- Dense: In dense style, only the positions of aligned exons are displayed, on a single line.
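The manual does not say how pack style chooses rows. One common approach, shown here purely as an assumption, is a greedy layout: sort genes by start position and put each one in the first row where it does not overlap the previous gene. The function name and the `min_gap` parameter are hypothetical, and the space reserved for name labels is ignored.

```python
def pack_rows(genes, min_gap=1):
    """Assign each (name, start, end) gene to a row so that rows do not overlap.

    Returns a dict mapping gene name to row number (0 is the top row).
    """
    row_ends = []   # row_ends[r] = end position of the last gene placed in row r
    placement = {}
    for name, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        for row, last_end in enumerate(row_ends):
            if start - last_end >= min_gap:   # fits after the last gene in this row
                row_ends[row] = end
                placement[name] = row
                break
        else:
            row_ends.append(end)              # no row fits, so open a new one
            placement[name] = len(row_ends) - 1
    return placement


genes = [("a", 1, 100), ("b", 50, 150), ("c", 120, 200), ("d", 160, 300)]
print(pack_rows(genes))
```

Under this sketch, full style would correspond to giving every gene its own row, and dense style to collapsing everything into row 0.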
At present, in every style except full, a graphical view of the genes is displayed when more than 200 genes are mapped in a range longer than 50 kbp.
FromSpecies disp setting
Here you can choose which species' mapping results (Medaka, Zebrafish, Fugu) are displayed. By default, the results for all species are displayed. Only the species checked in the checkboxes are displayed.
FromSpecies color setting
The colors that distinguish Medaka, Zebrafish, and Fugu can be set here.
MatchRatio gradation setting
A gene's color can be varied according to its match ratio. Given one color for match ratio 0.7 and another for 1.0, the system automatically grades the color of each alignment between the lower and upper match ratios.
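The gradation can be pictured as a linear interpolation between the two chosen colors. The sketch below assumes RGB colors, linear blending, and clamping of ratios outside the 0.7 to 1.0 range; the function name and the default colors are hypothetical.

```python
def gradate(ratio, low=0.7, high=1.0,
            low_color=(255, 255, 255), high_color=(0, 0, 0)):
    """Blend linearly from low_color (at ratio=low) to high_color (at ratio=high)."""
    t = (ratio - low) / (high - low)
    t = min(1.0, max(0.0, t))  # assumption: clamp ratios outside [low, high]
    return tuple(round(a + (b - a) * t) for a, b in zip(low_color, high_color))


# Halfway between 0.7 and 1.0 gives a color halfway between the two endpoints.
print(gradate(0.85, low_color=(0, 0, 0), high_color=(200, 100, 0)))
```

The same function applies to the cover ratio by passing `low=0.4`.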
MatchRatio ulbound setting
You can specify a range of match ratios by entering the lower and upper bounds in the corresponding boxes. Only alignments whose match ratio lies within the range are displayed.
CoverRatio gradation setting
A gene's color can also be varied according to its cover ratio. Given one color for cover ratio 0.4 and another for 1.0, the system automatically grades the color of each alignment between the lower and upper cover ratios.
CoverRatio ulbound setting
You can specify a range of cover ratios by entering the lower and upper bounds in the corresponding boxes. Only alignments whose cover ratio lies within the range are displayed.
Stage disp setting
Here you can choose which developmental stages' ESTs are displayed.
Stage color setting
Colors can be assigned according to developmental stage.
In full, pack, and small styles, each gene is clickable. Clicking a gene shows its details.
5.7 Genscan Track
This track displays genes predicted by Genscan. Predicted genes are drawn as lines, with their exons as rectangles. The arrow indicates the strand of each gene. The image above shows predicted genes in pack style.
Display settings
If the button on the left side of the Genscan Track is clicked, the display settings window pops up. The details of the settings are as follows.
Style select setting
The display style of predicted genes can be changed here. There are four display styles: full, pack, small, and dense. They are similar to the styles of the Mapped Gene Track.
Linkage of each gene
In full, pack, and small styles, each gene is clickable. Clicking a gene shows the details of its Genscan prediction.
5.8 Comparative Genomics Track
Fugu Scaffold track
Description
This track shows Fugu/Medaka homologous scaffolds detected by the ALPS alignment program. Fugu scaffolds are drawn as boxes connected by arrows. The boxes represent regions of high homology (match ratio > 60%) aligned by ALPS. The arrows represent low-homology regions or gaps in the Fugu scaffold, and their direction indicates the orientation of the Fugu/Medaka alignment. Clicking a Fugu scaffold opens a new window that displays its dot plot, with the Fugu scaffold sequence provided on the same page. The Fugu sequence (Fugu v.2.0) was downloaded from JGI.
Method
Fugu scaffold sequences are split into non-overlapping 300-mers, and these 300-mers are mapped to Medaka scaffolds with ALPS. ALPS alignments with a match ratio of less than 60% are discarded, and the remaining alignments are chained by a longest monotone subsequence algorithm. Chains consisting of more than 10 alignments are displayed in the track. Note that inversions and microrearrangements are not shown: only the longest monotone subsequence is displayed for each Fugu scaffold.
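The filtering and chaining steps can be sketched as follows. This is an illustration under stated assumptions, not the team's code: each alignment is represented as a hypothetical tuple `(fugu_pos, medaka_pos, match_ratio)`, "monotone" is taken to mean either increasing or decreasing Medaka positions, and the chain is found with the standard patience-sorting method for the longest increasing subsequence.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []       # tails[k]: smallest tail value of an increasing run of length k+1
    tails_idx = []   # index in values of tails[k]
    prev = [-1] * len(values)
    for i, v in enumerate(values):
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[k] = v
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else -1
    chain = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        chain.append(i)
        i = prev[i]
    return chain[::-1]


def chain_alignments(alignments, min_ratio=0.6, min_chain_size=10):
    """Filter by match ratio, then keep the longest monotone chain if large enough."""
    kept = sorted(a for a in alignments if a[2] >= min_ratio)  # sorted by fugu_pos
    forward = longest_increasing_subsequence([a[1] for a in kept])
    backward = longest_increasing_subsequence([-a[1] for a in kept])
    best = forward if len(forward) >= len(backward) else backward
    # Only chains of more than min_chain_size alignments are displayed.
    return [kept[i] for i in best] if len(best) > min_chain_size else []


colinear = [(i * 300, 1000 + i * 300, 0.9) for i in range(12)]
noise = [(3600, 50, 0.95), (3900, 5000, 0.5)]
print(len(chain_alignments(colinear + noise)))
```

In this toy input, the out-of-order alignment and the low-ratio alignment are both dropped, which mirrors the note above that rearrangements outside the main chain are not shown.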
Linkage of each scaffold
In full, pack, and small styles, each scaffold is clickable. Clicking a scaffold shows the detailed alignments as a dot plot. The dot plot shows homologous regions of the Fugu and Medaka scaffolds as diagonal runs of dots. Each dot, plotted according to a sequence similarity score, indicates that significantly many seed matches are found between the corresponding regions. The sequence similarity score is defined so that tandem repeats are not assigned high scores while unique sequences are.
Ramen Assembler / UT Genome Browser Team Members
Ramen Genome Assembler Development Team
- Development of “Ramen” genome assembler and assembly of medaka genome: Masahiro Kasahara and Shin Sasaki
- Development of “Ramen Viewer” for genome assembly: Yukinobu Nagayasu
UT Genome Browser Development Team
- Design and development of UT Genome Browser, keyword search function, libraries for describing tracks: Yukinobu Nagayasu and Koichiro Doi
- Online mapping function for query sequences: Tomoyuki Yamada
- Comparative Genomics Track: Yoichiro Nakatani and Wei Qu
- Gene Prediction: Ahsan Budrul
- Mapped Gene Track: Yasuhiro Kasai
- Database access accelerators: Takehiro Furudate and Atsushi Mori
- Overall management: Koichiro Doi and Shinichi Morishita
Acknowledgements
This work has been supported by Grant-in-Aid for Scientific Research on Priority Areas (Grant#12209003) to Shinichi Morishita.
Ramen Assembler Development Team members are indebted to Yuji Kohara and Tadasu Shin-i for their technical discussions on the whole genome shotgun assembly.
Members in the UT Genome Browser Development Team are grateful to Kiyoshi Naruse, Daisuke Kobayashi, and Takanori Narita for their valuable input to improve the functions of the browser in a variety of ways.