| 140 | Given the limited availability of expressed sequence tags (ESTs) and |
| 141 | full-length enriched cDNA sequences, a considerable amount of 5’-end |
| 142 | serial analysis of gene expression (5’SAGE) data, comprising the 19-20 |
| 143 | 5’-end bases of mRNAs, was collected to detect transcription start |
| 144 | sites (TSSs), and protein-coding regions were subsequently predicted |
| 145 | from the TSSs using GENSCAN. In all, 1,186,742 5’SAGE tags were |
| 146 | collected from a mixture of cDNA from 0- to 7-day-old medaka |
| 147 | embryos and adult body tissues. Of these, 841,235 (70.9%) were aligned |
| 148 | to unique positions in the medaka draft genome, but most of them were |
| 149 | duplicates, originating from 344,266 distinct transcription start sites. |
| 150 | Stated another way, multiple tags are often derived from one locus, |
| 151 | and the tag frequency approximates the expression level of the gene |
| 152 | encoded at that locus. From these TSSs, 20,141 genes were predicted, |
| 153 | and each predicted gene was supported by evidence from 5’SAGE tags, |
| 154 | which was effective in reducing the false-positive rate among the |
| 155 | predicted genes. These evidence-based predicted medaka genes |
| 156 | were comprehensively compared with human genes: about 57.7% of |
| 157 | the predicted medaka genes have human orthologues, and 21.6% of the |
| 158 | genes constitute medaka-human reciprocal best 1:1 orthologue pairs, |
| 159 | indicating that medaka could serve as a model system for humans. |
| 160 | |