Recent advances in sequencing technologies enable the large-scale identification of genes that are affected by various genetic alterations in cancer. However, understanding tumor development requires insights into how these changes cause altered protein function and impaired network regulation in general and/or in specific cancer types.

Results: In this work we present a novel method called iSiMPRe that identifies regions that are significantly enriched in somatic mutations and short in-frame insertions or deletions (indels). Applying this unbiased method to the complete human proteome, by using data enriched through various cancer genome projects, we identified around 500 protein regions that could be linked to one or more of 27 distinct cancer types. These regions covered the majority of known cancer genes, surprisingly even tumor suppressors. Additionally, iSiMPRe also identified novel genes and regions that have not yet been associated with cancer.

Conclusions: While local somatic mutations correspond to only a subset of genetic variations that can lead to cancer, our systematic analyses revealed that they represent an accompanying feature of most cancer driver genes regardless of the primary mechanism by which they are perturbed during tumorigenesis. These results indicate that the accumulation of local somatic mutations can be used to pinpoint genes responsible for cancer formation and can also help to understand the effect of cancer mutations at the level of functional modules in a broad range of cancer driver genes.

Reviewers: This article was reviewed by Sándor Pongor, Michael Gromiha and Zoltán Gáspári.
Keywords: Cancer, Driver gene, Somatic mutation, Protein functional modules, Missense mutation, Insertion, Deletion

1 Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, 2 Magyar Tudósok krt, Budapest H-1117, Hungary. 3 MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, 11/c Pázmány Péter stny, Budapest H-1117, Hungary. Full list of author information is available at the end of the article.

Background

Cancer genome projects use next generation sequencing technologies to identify somatic mutations, most often in exonic regions, that discriminate tumor cells from normal cells, with the aim of understanding the basis of the most common genetic disease [1?]. The observed genetic alterations showed that the genetic landscape of cancer is complex, affecting a much larger number and more varied types of genes than previously expected [1, 6]. There is also heterogeneity at the level of the underlying genetic mechanisms that lead to the variations. With advanced technologies, cancer genome projects are able to produce a more complete catalog of the variations. These include single point mutations and short insertions or deletions, which can have a localized effect on a single gene, and larger structural aberrations such as copy number alterations and genomic rearrangements, which generally affect multiple genes. These data are cataloged in various databases, such as the COSMIC database, which now contains millions of variations that are dominated by simple mutations [7, 8]. Most of the observed variations, however, correspond to randomly occurring passenger

© 2016 Mészáros et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://cr.