Other titles
- Gene and microRNA Family Annotations
- Gene Prediction Site and miRNA Family Annotations
- Gene and miRNA Target Prediction Annotations
Keywords
- Microrna
- miRNA
- microRNA
- microRNA Sequencing
- Prediction Site
- miRNA Profiling
- miRNA Target Prediction
- miRNA Cancer
Gene and miRNA Family Annotations
This dataset describes 9991 microRNA sequences and families with annotations for Seed+m8, Species ID, miRBase ID, Mature Sequence, Family Conservation, and miRBase Accession.
Get The Data
- Research: Non-Commercial, Share-Alike, Attribution; Free Forever
- Commercial: Commercial Use, Remix & Adapt, White Label
Description
The miRNAs conserved as far as fish have been grouped into 87 families, each with a unique seed region. On average, each of these families has more than 400 conserved targeting interactions, and together these interactions involve most mammalian mRNAs (Friedman et al., 2009). In addition, many nonconserved interactions also function to reduce mRNA levels and protein output (Farh et al., 2005; Krutzfeldt et al., 2005; Lim et al., 2005; Baek et al., 2008; Selbach et al., 2008). Accordingly, miRNAs have been implicated in a wide range of biological processes in worms, flies, and mammals (Kloosterman and Plasterk, 2006; Bushati and Cohen, 2007; Stefani and Slack, 2008). Accurate prediction of miRNA–target interactions is therefore critical for understanding miRNA biology. Although numerous advances have been made, accurate and specific target prediction remains a challenge.
All RNA molecules are made up of a sequence of bases, each commonly known by a single letter: ‘A’, ‘U’, ‘C’ or ‘G’. In standard (Watson–Crick) pairing, each base pairs with one specific partner: ‘A’ pairs with ‘U’, and ‘C’ pairs with ‘G’. To direct the repression of an mRNA molecule, a region of the microRNA known as the ‘seed’ binds to a complementary sequence in the target mRNA. ‘Canonical sites’ are regions in the mRNA that contain the exact sequence of partner bases for the bases in the microRNA seed.
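To make the seed-pairing rule concrete, here is a minimal Python sketch, assuming plain Watson–Crick pairing only (no G:U wobble) and considering just one canonical site type: the exact match to nucleotides 2–8, commonly called a 7mer-m8 site. The helper names and the short UTR string are hypothetical illustrations, not part of the dataset or of TargetScan's own code.

```python
# Watson-Crick partners for RNA bases (G:U wobble pairs are deliberately ignored).
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def seed_m8(mature: str) -> str:
    """Return nucleotides 2-8 (1-based, inclusive) of a mature miRNA."""
    return mature[1:8]


def reverse_complement(rna: str) -> str:
    """Complement each base and reverse, so the result reads 5'->3' on the mRNA."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_7mer_m8_sites(mature: str, utr: str) -> list[int]:
    """Return 0-based start positions in utr that exactly pair with nt 2-8 of the miRNA."""
    site = reverse_complement(seed_m8(mature))
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_m8(let7a))                              # GAGGUAG
print(reverse_complement(seed_m8(let7a)))          # CUACCUC
print(find_7mer_m8_sites(let7a, "AAACUACCUCAGG"))  # [3]
```

For let-7a, nucleotides 2–8 are GAGGUAG (the Seedm8 value shown in the data preview), so the canonical site searched for in the mRNA is CUACCUC. Other canonical site types (for example 8mer or 6mer sites) would need additional rules.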
To partition miRNA families by conservation level, the procedure began with a high-confidence set of human miRNAs supported by small-RNA sequencing (T Tuschl, personal communication) that shared nucleotides 2–8 with a mouse miRNA supported by small-RNA sequencing (Chiang et al., 2010). For each mature miRNA, the corresponding region of the 100-way multiz alignment was then extracted from the UCSC Genome Browser, and the number of species in which nucleotides 2–8 of the miRNA were unchanged was counted.
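The counting step can be sketched as follows, assuming the aligned rows have already been extracted from the multiz alignment and trimmed to the span of the reference mature miRNA, with gaps written as ‘-’. The function name and the toy rows (labeled sp1 to sp5) are hypothetical; parsing the alignment files themselves is not shown.

```python
def count_species_with_intact_seed(ref_mature: str, aligned: dict[str, str]) -> int:
    """Count species whose aligned copy keeps nucleotides 2-8 identical to ref_mature.

    Assumes each aligned sequence covers the same span as the reference mature miRNA,
    with gaps written as '-', so a gap inside nt 2-8 counts as a change.
    """
    ref_seed = ref_mature[1:8]
    return sum(1 for seq in aligned.values() if seq[1:8] == ref_seed)


# Toy alignment rows (not real data); sp1 is the reference species itself.
toy_alignment = {
    "sp1": "UGAGGUAGUAGGUUGUAUAGUU",
    "sp2": "UGAGGUAGUAGGUUGUAUAGUU",
    "sp3": "UGAGGUAGUAGGUUGUAUAGUU",
    "sp4": "UGAGG-AGUAGGUUGUAUAGUU",  # gap in the seed region
    "sp5": "UGAGCUAGUAGGUUGUAUAGUU",  # substitution at position 5
}
n = count_species_with_intact_seed("UGAGGUAGUAGGUUGUAUAGUU", toy_alignment)
print(n)  # 3
```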
As an initial pass, miRNAs whose nucleotides 2–8 were conserved among ≥40 species were classified as mammalian conserved, and those conserved among >60 species were classified as more broadly conserved among vertebrate species. Because alignments are of poorer quality for more distantly related species, this procedure misclassified several more broadly conserved miRNAs as mammalian conserved. Therefore, mammalian conserved miRNAs that aligned with >90% homology to a mature miRNA from chicken, frog, or zebrafish, as annotated in miRBase release 21 (Kozomara and Griffiths-Jones, 2014), were re-classified as more broadly conserved.
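A sketch of the initial-pass thresholds and the re-classification rule, assuming two inputs: the number of species counted as described above, and a precomputed best identity (as a fraction from 0 to 1) against chicken, frog, or zebrafish mature miRNAs in miRBase 21. Both inputs and the function name are hypothetical, and how the >90% criterion was computed in the original analysis is not specified here.

```python
def classify_conservation(n_species: int, best_nonmammal_identity: float) -> str:
    """Assign a conservation class from the thresholds described in the text.

    n_species: species in the 100-way alignment with nt 2-8 unchanged.
    best_nonmammal_identity: hypothetical precomputed fraction (0-1) for the best
        alignment to a chicken, frog, or zebrafish mature miRNA (miRBase 21).
    """
    if n_species > 60:
        return "broadly conserved"
    if n_species >= 40:
        # Rescue broadly conserved miRNAs missed because of poor distant alignments.
        if best_nonmammal_identity > 0.90:
            return "broadly conserved"
        return "mammalian conserved"
    # Falls below both thresholds in this initial pass.
    return "not conserved"


print(classify_conservation(70, 0.0))   # broadly conserved
print(classify_conservation(45, 0.95))  # broadly conserved (re-classified)
print(classify_conservation(45, 0.50))  # mammalian conserved
```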
In addition, miR-489 was included in the broadly conserved set of TargetScanHuman (but not TargetScanMouse) despite having a seed substitution in mouse.
Some mammalian pri-miRNAs give rise to two or three abundant miRNA isoforms that have different seeds, either because both strands of the miRNA duplex load into Argonaute with near-equal efficiencies or because processing heterogeneity gives rise to alternative 5′ termini (Azuma-Mukai et al., 2008; Morin et al., 2008; Wu et al., 2009; Chiang et al., 2010).
To annotate these abundant alternative isoforms, all isoforms expressed at ≥33% of the level of the most abundant isoform were identified from high-throughput sequencing data (allowing for 3′ heterogeneity within each isoform). These isoforms were carried forward as mammalian conserved isoforms if they also met this criterion in the mouse small-RNA sequencing data (Chiang et al., 2010), and as broadly conserved isoforms if they met it in the zebrafish small-RNA sequencing data available in miRBase release 21.
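The ≥33% abundance filter and the cross-species check can be sketched as below. As assumptions, isoforms are keyed by nucleotides 2–8 so that reads differing only at their 3′ ends are pooled, and an isoform abundant in zebrafish is labeled broadly conserved without separately requiring the mouse check. Function names and the toy read counts are hypothetical.

```python
from collections import Counter


def abundant_isoforms(read_counts: dict[str, int], min_fraction: float = 0.33) -> set[str]:
    """Return nt 2-8 of isoforms expressed at >= min_fraction of the most abundant one."""
    by_seed = Counter()
    for read, count in read_counts.items():
        # Keying on nt 2-8 pools reads that differ only at their 3' ends.
        by_seed[read[1:8]] += count
    if not by_seed:
        return set()
    top = max(by_seed.values())
    return {seed for seed, total in by_seed.items() if total >= min_fraction * top}


def classify_isoforms(human: dict[str, int], mouse: dict[str, int],
                      zebrafish: dict[str, int]) -> dict[str, str]:
    """Label abundant human isoforms that are also abundant in mouse or zebrafish."""
    mouse_set = abundant_isoforms(mouse)
    fish_set = abundant_isoforms(zebrafish)
    labels = {}
    for seed in abundant_isoforms(human):
        if seed in fish_set:
            labels[seed] = "broadly conserved isoform"
        elif seed in mouse_set:
            labels[seed] = "mammalian conserved isoform"
    return labels


# Toy read counts (not real data): a canonical isoform, a 3' variant, and a 5'-shifted isoform.
human = {"UGAGGUAGUAGGUUGUAUAGUU": 100, "UGAGGUAGUAGGUUGUAUAG": 50,
         "GAGGUAGUAGGUUGUAUAGUU": 60, "AGGUAGUAGGUUGUAUAGUU": 10}
mouse = {"UGAGGUAGUAGGUUGUAUAGUU": 80, "GAGGUAGUAGGUUGUAUAGUU": 40}
zebrafish = {"UGAGGUAGUAGGUUGUAUAGUU": 90, "GAGGUAGUAGGUUGUAUAGUU": 5}
print(classify_isoforms(human, mouse, zebrafish))
```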
Following the miRNA naming convention, when two isoforms mapped to the 5′ and 3′ arms of the hairpin, they were named ‘-5p’ and ‘-3p’, respectively; when two isoforms were processed from the same arm, they were named ‘.1’ and ‘.2’ in decreasing order of their abundance in human.
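The naming rule for a pair of isoforms can be written as a small function. The base name ‘miR-X’, the (arm, abundance) input format, and the choice to give ‘.1’ to the first isoform on a tie are hypothetical.

```python
def name_isoforms(base: str, isoforms: list[tuple[str, float]]) -> list[str]:
    """Name two isoforms of one hairpin, returning names in input order.

    isoforms: two (arm, human abundance) pairs, with arm being "5p" or "3p".
    """
    (arm_a, abund_a), (arm_b, abund_b) = isoforms
    if arm_a != arm_b:
        # Different arms: suffix with the arm of origin.
        return [f"{base}-{arm_a}", f"{base}-{arm_b}"]
    # Same arm: '.1' goes to the more abundant isoform in human, '.2' to the other.
    if abund_a >= abund_b:
        return [f"{base}.1", f"{base}.2"]
    return [f"{base}.2", f"{base}.1"]


print(name_isoforms("miR-X", [("5p", 10.0), ("3p", 50.0)]))  # ['miR-X-5p', 'miR-X-3p']
print(name_isoforms("miR-X", [("5p", 10.0), ("5p", 50.0)]))  # ['miR-X.2', 'miR-X.1']
```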
All mature miRNAs were downloaded from miRBase release 21 (Kozomara and Griffiths-Jones, 2014). Those that matched a conserved miRNA at nucleotides 2–8 were considered part of that miRNA family. All miRNAs and miRNA isoforms annotated in miRBase but not meeting the criteria for conservation in mammals or beyond were also grouped into families based on the identity of nucleotides 2–8 and were classified as poorly conserved miRNAs (which included many small RNAs misclassified as miRNAs).
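Grouping by nucleotides 2–8 amounts to a dictionary keyed on the seed+m8 sequence. In this sketch the conserved seeds are assumed to be given (here, GAGGUAG for let-7/98, taken from the data preview), the example sequences are copied from the preview rows, and the ‘poorly conserved (…)’ label is a hypothetical placeholder rather than the dataset's own family naming.

```python
from collections import defaultdict


def build_families(mature: dict[str, str], conserved_seeds: dict[str, str]) -> dict[str, dict]:
    """Group mature miRNAs into families by nucleotides 2-8.

    mature: miRBase ID -> mature sequence.
    conserved_seeds: nt 2-8 -> family name for families that met the conservation criteria.
    """
    members = defaultdict(list)
    for mirbase_id, seq in mature.items():
        members[seq[1:8]].append(mirbase_id)
    families = {}
    for seed, ids in members.items():
        families[seed] = {
            # Seeds outside the conserved set are treated as poorly conserved families.
            "family": conserved_seeds.get(seed, f"poorly conserved ({seed})"),
            "conserved": seed in conserved_seeds,
            "members": sorted(ids),
        }
    return families


mature = {
    "xtr-let-7a": "UGAGGUAGUAGGUUGUAUAGUU",
    "xtr-miR-98": "UGAGGUAGUAAGUUGUAUUGUU",
    "ptr-let-7a": "UGAGGUAGUAGGUUGUAUAGUU",
    "cfa-let-7d": "CUAUACGACCUGCUGCCUUUCUUAG",
}
families = build_families(mature, {"GAGGUAG": "let-7/98"})
print(families["GAGGUAG"]["members"])  # ['ptr-let-7a', 'xtr-let-7a', 'xtr-miR-98']
```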
About this Dataset
Data Info
Field | Value |
---|---|
Date Created | 2006-10 |
Last Modified | 2021-09-01 |
Version | Release 8.0 |
Update Frequency | Irregular |
Temporal Coverage | 2006-2021 |
Spatial Coverage | N/A |
Source | John Snow Labs; TargetScanHuman Prediction of microRNA Targets |
Source License URL | |
Source License Requirements | N/A |
Source Citation | N/A |
Keywords | Microrna, miRNA, microRNA, microRNA Sequencing, Prediction Site, miRNA Profiling, miRNA Target Prediction, miRNA Cancer |
Other Titles | Gene and microRNA Family Annotations, Gene Prediction Site and miRNA Family Annotations, Gene and miRNA Target Prediction Annotations |
Data Fields
Name | Description | Type | Constraints |
---|---|---|---|
MicroRNA_Family | Name of the miRNA family. A microRNA (miRNA) family consists of miRNAs with the same seed+m8 sequence (positions 2-8 of the mature miRNA). MicroRNAs are small non-coding RNA molecules (about 22 nucleotides) found in plants, animals, and some viruses that function in RNA silencing and post-transcriptional regulation of gene expression. | string | required : 1 |
Seedm8_Sequence | Seed+m8 sequence: nucleotides 2-8 of the mature miRNA, that is, the seed (positions 2-7, as defined by TargetScanS) plus the nucleotide at position 8. | string | required : 1 |
Species_ID | NCBI Taxonomy ID of the species (from the UTR input file), for example 9598 for chimpanzee (ptr) and 8364 for Xenopus tropicalis (xtr). | integer | level : Nominal; required : 1 |
MiRBase_ID | miRBase identifier of the mature miRNA (for example, xtr-let-7a). miRBase is a biological database that archives microRNA sequences and annotations. | string | required : 1 |
Mature_Sequence | Sequence of the mature miRNA. The miRNA precursor transcribed from the genome is processed by an enzymatic complex, and only a sequence of approximately 22 nucleotides, the mature miRNA, is retained. | string | required : 1 |
Family_Conservation | Conservation level of the miRNA family under the conservation cutoffs used in TargetScan 8.0, encoded as an integer (in the preview, the broadly conserved let-7/98 family is coded 2). | integer | level : Nominal |
MiRBase_Accession | miRBase accession number (for example, MIMAT0003667), the only stable identifier for a miRBase entry. miRNA names may change from those originally published as the relationships between sequences are revised; the stable accession lets each miRNA be tracked in the database while its name evolves, and gives users full access to the entry's data and history. | string | - |
Data Preview
MicroRNA Family | Seedm8 Sequence | Species ID | MiRBase ID | Mature Sequence | Family Conservation | MiRBase Accession |
---|---|---|---|---|---|---|
let-7 | UAUACGA | 9615 | cfa-let-7d | CUAUACGACCUGCUGCCUUUCUUAG | -1 | MIMAT0034396 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7a | UGAGGUAGUAGGUUGUAUAGUU | 2 | MIMAT0003667 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7b | UGAGGUAGUAGUUUGUGUAGUU | 2 | MIMAT0003550 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7c | UGAGGUAGUAGGUUGUAUGGUU | 2 | MIMAT0003644 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7e | UGAGGUAGUAGGUUGUUUAGUU | 2 | MIMAT0003666 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7f | UGAGGUAGUAGAUUGUAUAGUU | 2 | MIMAT0003645 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7g | UGAGGUAGUUGUUUGUACAGU | 2 | MIMAT0003646 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7i | UGAGGUAGUAGUUUGUGCUGU | 2 | MIMAT0003647 |
let-7/98 | GAGGUAG | 8364 | xtr-miR-98 | UGAGGUAGUAAGUUGUAUUGUU | 2 | MIMAT0003581 |
let-7/98 | GAGGUAG | 9598 | ptr-let-7a | UGAGGUAGUAGGUUGUAUAGUU | 2 | MIMAT0007936 |
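A minimal sketch for loading rows laid out like the preview above and checking that each Seedm8_Sequence equals nucleotides 2–8 of its Mature_Sequence. The FamilyRow class and parse_row helper are hypothetical, and the downloadable files may use a different delimiter or column headers than this pipe-delimited preview.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FamilyRow:
    family: str
    seed_m8: str
    species_id: int
    mirbase_id: str
    mature: str
    conservation: int
    accession: str


def parse_row(line: str) -> FamilyRow:
    """Parse one pipe-delimited preview row (trailing '|' allowed) into a FamilyRow."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    fam, seed, species, mir_id, mature, cons, acc = cells
    return FamilyRow(fam, seed, int(species), mir_id, mature, int(cons), acc)


preview = """\
let-7 | UAUACGA | 9615 | cfa-let-7d | CUAUACGACCUGCUGCCUUUCUUAG | -1 | MIMAT0034396 |
let-7/98 | GAGGUAG | 8364 | xtr-let-7a | UGAGGUAGUAGGUUGUAUAGUU | 2 | MIMAT0003667 |
let-7/98 | GAGGUAG | 9598 | ptr-let-7a | UGAGGUAGUAGGUUGUAUAGUU | 2 | MIMAT0007936 |"""

rows = [parse_row(line) for line in preview.splitlines()]
# Seedm8_Sequence should equal nucleotides 2-8 of Mature_Sequence.
mismatches = [r.mirbase_id for r in rows if r.mature[1:8] != r.seed_m8]
per_family = Counter(r.family for r in rows)
print(mismatches)  # []
print(per_family)
```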