Identifying transcription factor-encoding genes and conserved noncoding elements in human, mouse and fugu
Sequences of human transcription factors (TFs) were obtained from a published study [1], and redundant entries were removed by a homology search against human RefSeq proteins. The remaining proteins were then mapped to EnsEMBL genes (version 37) [2]. Mouse orthologs were obtained from EnsEMBL BioMart. Fugu orthologs were obtained from EnsEMBL BioMart, supplemented by an INPARANOID search [3] to identify fish paralogs that arose from the fish-specific whole-genome duplication. The genome assemblies used were human NCBI35, mouse NCBIM34 and fugu FUGU4.
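The assembly of orthologous sets described above can be sketched in Python. This is a minimal illustration under assumed inputs, not the pipeline actually used: it assumes the BioMart results and INPARANOID clusters have already been flattened into (human gene ID, ortholog gene ID) pairs, and that a set is kept only when the human TF gene has at least one mouse and at least one fugu ortholog. The function name `build_ortholog_sets` and the gene IDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (human gene ID, ortholog gene ID); flattening BioMart/INPARANOID output to pairs is an assumption
Pair = Tuple[str, str]


def build_ortholog_sets(
    human_tf_genes: Iterable[str],
    mouse_pairs: Iterable[Pair],
    fugu_pairs: Iterable[Pair],
    fugu_inparalogs: Iterable[Pair],
) -> Dict[str, Dict[str, Set[str]]]:
    """Group mouse orthologs and fugu orthologs/in-paralogs under each human TF gene (hypothetical helper)."""
    mouse: Dict[str, Set[str]] = defaultdict(set)
    for human_id, mouse_id in mouse_pairs:
        mouse[human_id].add(mouse_id)

    # Pool BioMart orthologs with INPARANOID in-paralogs so that both fish copies
    # produced by the fish-specific genome duplication end up in the same set.
    fugu: Dict[str, Set[str]] = defaultdict(set)
    for pairs in (fugu_pairs, fugu_inparalogs):
        for human_id, fugu_id in pairs:
            fugu[human_id].add(fugu_id)

    sets: Dict[str, Dict[str, Set[str]]] = {}
    for human_id in human_tf_genes:
        # Assumption: keep only genes with at least one mouse and one fugu ortholog
        if mouse.get(human_id) and fugu.get(human_id):
            sets[human_id] = {"mouse": set(mouse[human_id]), "fugu": set(fugu[human_id])}
    return sets
```

For example, with made-up IDs, `build_ortholog_sets(["ENSG1"], [("ENSG1", "ENSMUSG1")], [("ENSG1", "SINFRUG1a")], [("ENSG1", "SINFRUG1b")])` groups both fugu copies, `SINFRUG1a` and `SINFRUG1b`, under `ENSG1`.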
The gene loci of each orthologous set of human, mouse and fugu genes, including the 5' and 3' flanking regions, were repeat-masked using RepeatMasker [4] and aligned using the global alignment program MLAGAN [5]. Conserved sequences were identified using the conservation visualization package VISTA [6], with thresholds of ≥70% identity over 100 bp for human-mouse comparisons and ≥65% identity over 50 bp for human-fugu comparisons. Conserved noncoding elements (CNEs) were then obtained by eliminating coding sequences, RNA genes and pseudogenes from these conserved sequences.
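A simplified sketch of the window-based conservation criterion and the subsequent removal of annotated features is shown below. It approximates, rather than reproduces, VISTA's scoring: identity is counted over alignment columns of a pairwise alignment, a column counts as identical only when both rows carry the same unmasked base (so hard-masked N characters and gaps never count), and qualifying windows that overlap are merged into one region. The helper names `conserved_regions` and `subtract_intervals`, and the representation of excluded features (coding exons, RNA genes, pseudogenes) as half-open column intervals, are assumptions for illustration. Thresholds are passed as integer percentages so the comparison at the cutoff is exact.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in alignment columns (assumed coordinate system)


def conserved_regions(seq_a: str, seq_b: str, min_identity_pct: int, window: int) -> List[Interval]:
    """Merge every `window`-column window whose identity is >= min_identity_pct percent."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    n = len(seq_a)
    if n < window:
        return []
    # 1 where both rows carry the same nucleotide; gaps and hard-masked N never match
    match = [1 if a == b and a in "ACGT" else 0 for a, b in zip(seq_a.upper(), seq_b.upper())]
    regions: List[List[int]] = []
    score = sum(match[:window])
    for start in range(n - window + 1):
        if start > 0:
            # slide the window one column: add the new column, drop the old one
            score += match[start + window - 1] - match[start - 1]
        # integer comparison avoids floating-point error at the threshold
        if score * 100 >= min_identity_pct * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1][1] = max(regions[-1][1], end)  # extend the previous region
            else:
                regions.append([start, end])
    return [(s, e) for s, e in regions]


def subtract_intervals(regions: Sequence[Interval], excluded: Sequence[Interval]) -> List[Interval]:
    """Cut excluded intervals (e.g. exons, RNA genes, pseudogenes) out of each region."""
    result: List[Interval] = []
    for start, end in regions:
        pieces = [(start, end)]
        for ex_start, ex_end in excluded:
            next_pieces = []
            for s, e in pieces:
                if ex_end <= s or ex_start >= e:  # no overlap: keep the piece whole
                    next_pieces.append((s, e))
                    continue
                if s < ex_start:  # part left of the excluded feature
                    next_pieces.append((s, ex_start))
                if ex_end < e:  # part right of the excluded feature
                    next_pieces.append((ex_end, e))
            pieces = next_pieces
        result.extend(pieces)
    return result
```

Under these assumptions, the human-mouse criterion corresponds to `conserved_regions(human_aln, mouse_aln, 70, 100)` and the human-fugu criterion to `conserved_regions(human_aln, fugu_aln, 65, 50)`, and the pieces returned by `subtract_intervals` would be the candidate CNEs. The text does not say whether fragments left after subtraction are re-filtered by length, so the sketch does not do so.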
Binding sites for vertebrate TFs in the human-fugu CNEs were predicted using TESS [7]. Overlapping binding sites predicted for the same TF were removed, and only string matches with a log-likelihood score ratio Lq > 0.98 and matrix matches with core similarity Sc > 90% and matrix similarity Sm > 80% were retained. In addition, only binding sites in human CNEs that have a corresponding prediction in the fugu CNEs are displayed in this database.
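The filtering rules above can be expressed as a short Python sketch. Several details are assumptions because the text does not specify them: the `Site` record and its field names are hypothetical stand-ins for parsed TESS output, Sc and Sm are written as fractions, the score thresholds are applied before overlap removal, the leftmost site of each overlapping run for a TF is the one kept, and a "corresponding prediction" is taken to mean a retained prediction for the same TF anywhere in the paired fugu CNE.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Site:
    """One predicted binding site in a CNE (hypothetical fields, not TESS's output format)."""
    tf: str                             # TF identifier
    start: int                          # 0-based start within the CNE
    end: int                            # exclusive end
    kind: str                           # "string" or "matrix" match
    lq: Optional[float] = None          # log-likelihood score ratio Lq (string matches)
    core_sim: Optional[float] = None    # core similarity Sc as a fraction (matrix matches)
    matrix_sim: Optional[float] = None  # matrix similarity Sm as a fraction (matrix matches)


def passes_thresholds(site: Site) -> bool:
    """Lq > 0.98 for string matches; Sc > 0.90 and Sm > 0.80 for matrix matches."""
    if site.kind == "string":
        return site.lq is not None and site.lq > 0.98
    if site.kind == "matrix":
        return (
            site.core_sim is not None
            and site.matrix_sim is not None
            and site.core_sim > 0.90
            and site.matrix_sim > 0.80
        )
    return False


def remove_overlaps(sites: Iterable[Site]) -> List[Site]:
    """For each TF keep the leftmost site of any overlapping run (assumed tie-breaking rule)."""
    kept: List[Site] = []
    last_end = {}  # TF -> end of the most recently kept site for that TF
    for site in sorted(sites, key=lambda s: (s.tf, s.start, s.end)):
        if site.tf in last_end and site.start < last_end[site.tf]:
            continue  # overlaps a site already kept for the same TF
        kept.append(site)
        last_end[site.tf] = site.end
    return kept


def conserved_sites(human_sites: Iterable[Site], fugu_sites: Iterable[Site]) -> List[Site]:
    """Keep filtered human sites whose TF also has a retained prediction in the paired fugu CNE."""
    human = remove_overlaps(s for s in human_sites if passes_thresholds(s))
    fugu_tfs = {s.tf for s in fugu_sites if passes_thresholds(s)}
    return [s for s in human if s.tf in fugu_tfs]
```

Applying thresholds first means a low-scoring site cannot knock out an overlapping high-scoring site for the same TF during overlap removal; if the database applied the steps in the other order, results could differ slightly.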
References
[1] Messina, D.N., Glasscock, J., Gish, W., Lovett, M. (2004) An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression. Genome Res. 14, 2041-2047.
[2] Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M., Clarke, L., Coates, G., Cox, T., Cunningham, F., et al. (2005) Ensembl 2005. Nucleic Acids Res. 33, D447-D453.
[3] Remm, M., Storm, C.E., Sonnhammer, E.L. (2001) Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314, 1041-1052.
[4] Smit, A.F.A., Green, P. RepeatMasker version open-3.1.5. http://www.repeatmasker.org
[5] Brudno, M., Do, C.B., Cooper, G.M., Kim, M.F., Davydov, E., Green, E.D., Sidow, A., Batzoglou, S.; NISC Comparative Sequencing Program (2003) LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 13, 721-731.
[6] Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M., Dubchak, I. (2004) VISTA: computational tools for comparative genomics. Nucleic Acids Res. 32, W273-W279.
[7] Schug, J. (2003) Using TESS to predict transcription factor binding sites in DNA sequence. In Baxevanis, A.D. (ed.) Current Protocols in Bioinformatics. J. Wiley and Sons.