Human Gene Information and miRNA Annotations

This dataset describes 28,353 human genes and their miRNA annotations. For each entry it provides the transcript ID, gene ID, gene symbol, gene description, species ID, number of 3P-seq tags + 5, and a flag indicating whether the transcript is the representative transcript.


Proteins are built by using the information contained in molecules of messenger RNA (mRNA). Cells have several ways of controlling the amounts of different proteins they make. For example, a so-called ‘microRNA’ molecule can bind to an mRNA molecule to cause it to be more rapidly degraded and less efficiently used, thereby reducing the amount of protein built from that mRNA. Indeed, microRNAs are thought to help control the amount of protein made from most human genes, and biologists are working to predict the amount of control imparted by each microRNA on each of its mRNA targets.

The human and mouse databases started with Gencode annotations (Harrow et al., 2012), for which 3′ UTRs were extended, when possible, using RefSeq annotations (Pruitt et al., 2012), recently identified long 3′-UTR isoforms (Miura et al., 2013), and 3P-seq clusters marking more distal cleavage and polyadenylation sites (Nam et al., 2014). Zebrafish reference 3′ UTRs were similarly derived in a recent 3P-seq study (Ulitsky et al., 2012).

3P-seq data were available for seven developmental stages or tissues of zebrafish, enabling isoform profiles to be generated and predictions to be tailored for each of these. For human and mouse, however, 3P-seq data were available for only a small fraction of tissues/cell types that might be most relevant for end users, and thus results from all 3P-seq datasets available for each species were combined to generate a meta 3′-UTR isoform profile for each representative ORF. Although this approach reduces accuracy of predictions involving differentially expressed tandem isoforms, it nonetheless outperforms the previous approach of not considering isoform abundance at all, presumably because isoform profiles for many genes are highly correlated in diverse cell types (Nam et al., 2014).
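The pooling step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each 3P-seq dataset is summarized as a mapping from a cleavage-site position to a tag count for one ORF. The function name `meta_isoform_profile` and the sample counts are invented for illustration, and this simple sum-and-normalize scheme is a simplification rather than the authors' actual pipeline.

```python
from collections import Counter


def meta_isoform_profile(datasets):
    """Pool tag counts across datasets and return each site's share of all tags.

    `datasets` is an iterable of dicts mapping a cleavage-site position to a
    3P-seq tag count (an assumed, simplified representation).
    """
    pooled = Counter()
    for counts in datasets:
        pooled.update(counts)  # adds counts site by site
    total = sum(pooled.values())
    if total == 0:
        return {}
    return {site: n / total for site, n in sorted(pooled.items())}


# Made-up tag counts for one ORF in two hypothetical tissues.
tissue_a = {350: 30, 1200: 10}
tissue_b = {350: 10, 1200: 30, 2400: 20}

profile = meta_isoform_profile([tissue_a, tissue_b])
for site, fraction in profile.items():
    print(site, round(fraction, 2))
```

In this toy case the two tissues favor different isoforms, so the pooled profile blurs that difference, which is the loss of accuracy for differentially expressed tandem isoforms that the paragraph above mentions.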

Release 7.1

John Snow Labs Standard License

Source Citation

Vikram Agarwal, George W Bell, Jin-Wu Nam, David P Bartel. Predicting effective microRNA target sites in mammalian mRNAs. Computational and Systems Biology; Genomics and Evolutionary Biology; AUG 12 2015.


Transcript_ID (string; required): Identifier of the transcript, that is, the RNA (typically mRNA) copied from a particular segment of DNA by the enzyme RNA polymerase in the first step of gene expression.
Gene_ID (string; required): Identifier of the human gene (from the UTR input file).
Gene_Symbol (string; required): Symbol of the human gene (from the UTR input file).
Gene_Description (string): Description of the human gene, the basic physical and functional unit of heredity.
Species_ID (integer; required; level: Nominal): Identifier of the species (from the UTR input file).
ThreeP_Sequence_Tags (integer; required; level: Nominal): Number of 3P-seq tags + 5 for the transcript. 3P-seq tags mark cleavage and polyadenylation sites at the 3′ ends of transcripts; they are not chromosome 3 expressed sequence tags (ESTs).
Is_Representative_Transcript (boolean; required): Indicates whether the transcript is the representative transcript for its gene. This is distinct from the representative miRNA, which is the miRNA in a family with the lowest total context score; although only one miRNA is chosen as representative, all other miRNAs of the family are also predicted to target the same gene at the same target site(s).
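
The field list above can be turned into a small typed loader. The sketch below is illustrative only: it assumes the data is delivered as tab-separated text whose header uses the field names listed above, and that booleans are written as `1`/`0` or `true`/`false`. Neither assumption is confirmed by this description. The class `GeneRecord`, the helpers, and the sample rows are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneRecord:
    transcript_id: str
    gene_id: str
    gene_symbol: str
    gene_description: Optional[str]  # the only field not marked required
    species_id: int
    three_p_sequence_tags: int
    is_representative_transcript: bool


REQUIRED = (
    "Transcript_ID", "Gene_ID", "Gene_Symbol", "Species_ID",
    "ThreeP_Sequence_Tags", "Is_Representative_Transcript",
)


def parse_bool(text: str) -> bool:
    # Assumed encodings; adjust to match the actual file.
    value = text.strip().lower()
    if value in {"1", "true"}:
        return True
    if value in {"0", "false"}:
        return False
    raise ValueError(f"unrecognized boolean value: {text!r}")


def parse_row(row: dict) -> GeneRecord:
    for name in REQUIRED:
        if not (row.get(name) or "").strip():
            raise ValueError(f"missing required field: {name}")
    description = (row.get("Gene_Description") or "").strip() or None
    return GeneRecord(
        transcript_id=row["Transcript_ID"].strip(),
        gene_id=row["Gene_ID"].strip(),
        gene_symbol=row["Gene_Symbol"].strip(),
        gene_description=description,
        species_id=int(row["Species_ID"]),
        three_p_sequence_tags=int(row["ThreeP_Sequence_Tags"]),
        is_representative_transcript=parse_bool(row["Is_Representative_Transcript"]),
    )


# Made-up rows for illustration; these are not values from the dataset.
SAMPLE = (
    "Transcript_ID\tGene_ID\tGene_Symbol\tGene_Description\tSpecies_ID\t"
    "ThreeP_Sequence_Tags\tIs_Representative_Transcript\n"
    "TX_EXAMPLE_1\tGENE_EXAMPLE_1\tGENEA\texample gene A\t9606\t105\t1\n"
    "TX_EXAMPLE_2\tGENE_EXAMPLE_1\tGENEA\t\t9606\t5\t0\n"
)

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")]
representative = [r.transcript_id for r in records if r.is_representative_transcript]
print(representative)
```

Validating required fields at load time keeps later filtering, such as selecting only representative transcripts, from silently skipping incomplete rows.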
