About the database
The HIV mutation browser is a database of mutagenesis and mutation data on HIV collected from the scientific literature. The data has been identified and catalogued using computational text-mining methods. A researcher can use the database to find literature describing the phenotype of a mutation, and/or experimental data describing the effect of a mutation.
The resource is a collaboration between the Briggs Group at the European Molecular Biology Laboratory (EMBL) in Heidelberg and the Schneider Group at the Luxembourg Centre for Systems Biomedicine (LCSB) in Luxembourg.
Source of mutation data
The content of the database is created by text-mining the available HIV literature to find mutagenesis information. A list of articles is retrieved from PubMed using the search terms "HIV" and "Human Immunodeficiency Virus". All articles from this list that are available to us and that we are permitted to analyse computationally (see the Contributing Publishers section) are downloaded and processed. We currently have permission to process approximately 40% of the literature, including the majority of basic-science publications on HIV.
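As an illustration of the retrieval step, the sketch below builds a PubMed search URL for the two search terms using the public NCBI E-utilities ESearch endpoint. It is not the pipeline used for the database: the function name `build_pubmed_query`, the choice to combine the terms with OR, and the `retmax` value are assumptions made for this example.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public service documented by NCBI).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(terms, retmax=100):
    """Return an ESearch URL that matches any of the given terms in PubMed.

    Hypothetical helper: the source does not say how the two search
    terms were combined, so OR is an assumption.
    """
    # Quote each term so multi-word phrases are searched as phrases.
    query = " OR ".join(f'"{term}"' for term in terms)
    params = {
        "db": "pubmed",       # search the PubMed database
        "term": query,        # the combined search expression
        "retmax": retmax,     # maximum number of PMIDs returned (assumed value)
        "retmode": "json",    # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_query(["HIV", "Human Immunodeficiency Virus"])
print(url)
```

The code only constructs the URL and does not contact NCBI. Fetching the URL would return a list of PubMed identifiers, which would then have to be filtered by publisher permission before any full text is processed.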
The current version of the database (version 1.0) identified approximately 275,000 papers of interest in PubMed:
- Number of papers permitted to be processed: 130,272
- Number of papers processed: 130,272
- Number of papers containing mutational information: 6,101
- Number of mutations: 119,201
- Number of distinct mutations: 8,258
The database is updated on a monthly basis to add the latest HIV literature.
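The version 1.0 counts listed above can be turned into a few summary ratios. The short calculation below is only a reading aid. It assumes that "Number of mutations" counts every annotated mention of a mutation, which the text does not state explicitly, and the variable names are illustrative.

```python
# Counts quoted for version 1.0 of the database.
papers_processed = 130_272
papers_with_mutations = 6_101
mutation_mentions = 119_201   # assumption: total annotated mentions
distinct_mutations = 8_258

# Share of processed papers that contain mutational information (percent).
hit_rate = round(100 * papers_with_mutations / papers_processed, 1)

# Average number of annotated mutations per paper that contains any.
mentions_per_paper = round(mutation_mentions / papers_with_mutations, 1)

# Average number of annotations per distinct mutation.
mentions_per_distinct = round(mutation_mentions / distinct_mutations, 1)

print(f"Papers with mutational information: {hit_rate}% of processed papers")
print(f"Annotated mutations per such paper: {mentions_per_paper}")
print(f"Annotations per distinct mutation:  {mentions_per_distinct}")
```

Under these assumptions, fewer than one in twenty processed papers contains mutational information, and each distinct mutation is annotated about 14 times on average.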
Contributing Publishers
Unfortunately, the analysis of scientific literature using computational text-mining is prohibited by the majority of publishers' access licenses. Thankfully, most of the publishing companies and societies that we approached granted us permission to text-mine and index the HIV mutation information contained in their literature. We thank the following publishers, who permit us to analyse their content.
Table 1. List of publishers that have given permission to the HIV mutation browser to access, data-mine and display articles.
List of journals text mined for the database
If an article is missing from the database, it is possible that we have not yet asked for or obtained permission from the publisher to process it for inclusion in the data. If you would like us to add a publisher or journal to the resource, feel free to contact us at feedback@hivmut.org. We have been denied permission to analyse literature published by the American Chemical Society. Articles from their journals, the most relevant of which is "Biochemistry", are not indexed in the database. The current list of permitted journals is presented in Table 2.
- BMC Biochem.
- BMC Bioinformatics
- BMC Biophys
- BMC Biotechnol.
- BMC Blood Disord
- BMC Cancer
- BMC Cell Biol.
- BMC Chem Biol
- BMC Clin Pathol
- BMC Clin Pharmacol
- BMC Complement Altern Med
- BMC Dermatol.
- BMC Evol. Biol.
- BMC Fam Pract
- BMC Gastroenterol
- BMC Genet.
- BMC Genomics
- BMC Health Serv Res
- BMC Immunol.
- BMC Infect. Dis.
- BMC Int Health Hum Rights
- BMC Med
- BMC Med Educ
- BMC Med Ethics
- BMC Med Genomics
- BMC Med Imaging
- BMC Med Inform Decis Mak
- BMC Med Res Methodol
- BMC Med. Genet.
- BMC Microbiol.
- BMC Mol. Biol.
- BMC Musculoskelet Disord
- BMC Nephrol
- BMC Neurol
- BMC Neurosci
- BMC Nurs
- BMC Oral Health
- BMC Palliat Care
- BMC Pediatr
- BMC Pharmacol.
- BMC Pregnancy Childbirth
- BMC Psychiatry
- BMC Public Health
- BMC Pulm Med
- BMC Res Notes
- BMC Struct. Biol.
- BMC Surg
- BMC Syst Biol
- BMC Urol
- BMC Womens Health
- EMBO J.
- EMBO Rep.
- J. Biol. Chem.
- J. Gen. Virol.
- J. Virol.
- Nat Protoc
- Nat Rev Drug Discov
- Nat. Biotechnol.
- Nat. Cell Biol.
- Nat. Chem. Biol.
- Nat. Genet.
- Nat. Immun.
- Nat. Immunol.
- Nat. Methods
- Nat. Neurosci.
- Nat. Rev. Cancer
- Nat. Rev. Genet.
- Nat. Rev. Microbiol.
- Nat. Rev. Mol. Cell Biol.
- Nat. Rev. Neurosci.
- Nat. Struct. Biol.
- Nat. Struct. Mol. Biol.
- PLoS Biol.
- PLoS Clin Trials
- PLoS Comput. Biol.
- PLoS Curr
- PLoS Genet.
- PLoS Med.
- PLoS Negl Trop Dis
- PLoS ONE
- PLoS Pathog.
- Proc. Natl. Acad. Sci. U.S.A.
Table 2. List of journals that have given permission to the HIV mutation browser to access, data-mine and display articles.
Database Statistics
Protein Statistics
The number of mutations per protein varies widely. Pol, which contains the three enzymatic chains (a protease, an integrase and a reverse transcriptase), is by far the best-studied protein.
Gene | Publications | Distinct Mutations | Distinct Positions |
gag | 1,203 | 1,531 | 467 |
pol | 4,390 | 4,396 | 1,240 |
env | 1,044 | 2,329 | 727 |
tat | 255 | 236 | 75 |
nef | 277 | 385 | 164 |
rev | 62 | 175 | 78 |
vif | 122 | 282 | 146 |
vpr | 153 | 145 | 59 |
vpu | 73 | 127 | 52 |
Table 3. Distribution of the mutation data across the proteins of the HIV proteome.
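To compare proteins on a common scale, the rows of Table 3 can be parsed and the number of distinct mutations per distinct mutated position computed. This is a minimal sketch: the helper `parse_rows` and the derived ratio are illustrative and not part of the resource.

```python
# Rows of Table 3, copied as pipe-delimited text:
# gene | publications | distinct mutations | distinct positions |
TABLE_3 = """\
gag | 1,203 | 1,531 | 467 |
pol | 4,390 | 4,396 | 1,240 |
env | 1,044 | 2,329 | 727 |
tat | 255 | 236 | 75 |
nef | 277 | 385 | 164 |
rev | 62 | 175 | 78 |
vif | 122 | 282 | 146 |
vpr | 153 | 145 | 59 |
vpu | 73 | 127 | 52 |
"""


def parse_rows(text):
    """Parse pipe-delimited rows into a dict keyed by gene name."""
    rows = {}
    for line in text.strip().splitlines():
        # Split on "|" and drop the empty cell left by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        gene, *numbers = cells
        # Remove thousands separators before converting to integers.
        pubs, muts, positions = (int(n.replace(",", "")) for n in numbers)
        rows[gene] = {
            "publications": pubs,
            "distinct_mutations": muts,
            "distinct_positions": positions,
        }
    return rows


rows = parse_rows(TABLE_3)

# Distinct mutations observed per distinct mutated position, per gene.
per_position = {
    gene: r["distinct_mutations"] / r["distinct_positions"]
    for gene, r in rows.items()
}

# Print genes from the highest to the lowest ratio.
for gene in sorted(per_position, key=per_position.get, reverse=True):
    print(f"{gene}: {per_position[gene]:.2f} distinct mutations per position")
```

The ratio normalises only by the number of positions with at least one reported mutation, not by protein length, so it describes how many different substitutions are reported per mutated site rather than overall coverage.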
Source of mutation
Data from 2,639 different journals are curated in the database. The top 20 journals by number of mutations annotated are listed below.
Journal | Mutations | Papers with mutations | Papers |
Journal of virology | 6,639 | 1,666 | 11,628 |
Antimicrobial agents and chemotherapy | 2,415 | 325 | 1,915 |
The Journal of biological chemistry | 2,145 | 458 | 1,946 |
PloS one | 2,040 | 471 | 7,497 |
Virology | 1,633 | 277 | 2,495 |
Antiviral research | 1,541 | 121 | 940 |
Retrovirology | 1,393 | 145 | 651 |
Proceedings of the National Academy of Sciences of the United States of America | 1,185 | 237 | 3,671 |
Journal of molecular biology | 1,083 | 160 | 1,372 |
Journal of clinical microbiology | 964 | 83 | 1,941 |
PLoS pathogens | 961 | 166 | 1,009 |
Journal of clinical virology : the official publication of the Pan American Society for Clinical Virology | 880 | 92 | 1,085 |
Viruses | 865 | 49 | 240 |
The Journal of antimicrobial chemotherapy | 758 | 75 | 143 |
Nucleic acids research | 595 | 99 | 1,508 |
Virus research | 415 | 52 | 716 |
The Journal of general virology | 410 | 65 | 602 |
Journal of virological methods | 386 | 51 | 823 |
AIDS research and human retroviruses | 373 | 19 | 94 |
AIDS (London, England) | 373 | 21 | 199 |
Table 4. Top 20 journals in the HIV mutation browser by number of mutations annotated.
Text-mining
A paper describing the text-mining algorithm for the HIV Mutation Browser resource is currently in preparation. This section will be updated once the article is accepted.
Source of ancillary data
The HIV Mutation Browser integrates information from several resources to increase the ease of interpretation of the available HIV mutation and mutagenesis data. These sources are:
- Homologue information
- HIV Subtype Reference Protein sequences were retrieved from the Los Alamos National Laboratory.
- Motif information
- Motifs were retrieved from the ELM database and from UniProt annotation.
- Protein structure information
- Protein structures were retrieved from the RCSB Protein Data Bank (PDB).
- Protein feature information
- Protein information was retrieved from UniProt.
- Disorder information
- Intrinsic disorder predictions for the proteins were calculated using the IUPred algorithm.
Further HIV resources
HIV Drug Resistance database
Viralzone HIV entry
NIH HIV sequence database
NIH HIV-Human protein interaction database
Useful HIV links
PDB guide to the structural biology of HIV (PDF poster, 14 MB)
Cell SnapShot: HIV-1 Proteins
NIH: Understanding the biology of HIV
Citation
Davey NE*, Satagopam VP*, Santiago-Mozos S*, Villacorta-Martin C, Bharat TA, Schneider R, Briggs JA.
The HIV Mutation Browser: A Resource for Human Immunodeficiency Virus Mutagenesis and Polymorphism Data.
PLoS Comput Biol. 2014 Dec 4;10(12):e1003951. doi: 10.1371/journal.pcbi.1003951. eCollection 2014. [PubMed]
License
Data in this resource can be accessed for non-commercial use according to the HIV Mutation Browser UELA.