Proteins are built by using the information contained in molecules of messenger RNA (mRNA). Cells have several ways of controlling the amounts of different proteins they make. For example, a so-called ‘microRNA’ molecule can bind to an mRNA molecule to cause it to be more rapidly degraded and less efficiently used, thereby reducing the amount of protein built from that mRNA. Indeed, microRNAs are thought to help control the amount of protein made from most human genes, and biologists are working to predict the amount of control imparted by each microRNA on each of its mRNA targets.
The human and mouse databases started with Gencode annotations (Harrow et al., 2012), for which 3′ UTRs were extended, when possible, using RefSeq annotations (Pruitt et al., 2012), recently identified long 3′-UTR isoforms (Miura et al., 2013), and 3P-seq clusters marking more distal cleavage and polyadenylation sites (Nam et al., 2014). Zebrafish reference 3′ UTRs were similarly derived in a recent 3P-seq study (Ulitsky et al., 2012).
3P-seq data were available for seven developmental stages or tissues of zebrafish, enabling isoform profiles to be generated and predictions to be tailored for each of these. For human and mouse, however, 3P-seq data were available for only a small fraction of tissues/cell types that might be most relevant for end users, and thus results from all 3P-seq datasets available for each species were combined to generate a meta 3′-UTR isoform profile for each representative ORF. Although this approach reduces accuracy of predictions involving differentially expressed tandem isoforms, it nonetheless outperforms the previous approach of not considering isoform abundance at all, presumably because isoform profiles for many genes are highly correlated in diverse cell types (Nam et al., 2014).