Shamil Sunyaev: Well, it’s great to present for this group. John, thank you for the introduction. A lot of this work is in close collaboration with John’s lab. I am going to talk about how epigenetics may actually control genetics.

Throughout this meeting there have been multiple talks pointing to the importance of genetic variation in understanding the epigenetic landscape, and there is a lot of justifiable interest in how genetics controls epigenetics. Briefly, all these studies can be summarized as: pick a favorite epigenetic feature and run a QTL study, right? You have eQTLs, methylation QTLs, chromatin accessibility QTLs, all types of QTLs. And of course we believe that understanding the effect of genetic variation on epigenetic features can take us a long way in understanding the mechanism of the [unintelligible] association, the biology, and so forth.

However, we’re interested in the inverse problem: what is the effect of the epigenetic landscape on genetics? One of these effects is how the epigenomic landscape controls mutation, because the source of variation is mutation. So we’ve been looking at data on mutations, both in the germline context (we are now sequencing data for multiple trios and quads; I won’t be talking about this today) and somatic mutations, cancer somatic [spelled phonetically] mutations. In the germline case these are differences between parents and children, where changes are happening in the DNA; in the somatic case the differences are usually between blood, as a control cell type, and the cancer cell. The idea is to see what the effects of epigenetic variables are on these changes in the DNA sequence.

Why are we interested? For multiple reasons. One interest is in the fields of statistical and medical genetics, because an understanding of mutation rate models would inform methods for gene mapping, and I’ll talk about that in a second.
Another big interest of ours is evolutionary biology, and there are two reasons we care from an evolutionary biology perspective. One is that mutation rate is a key parameter in a lot of evolutionary models, right? If we want to infer selection, if we want to understand differences between populations or differences between species, or date speciation events, we have to have some understanding of mutation rate. The other interest is the evolution of mutation rate itself, because the cell controls mutational events, and mutation rate is a phenotype which is under selection. So the question now is not only what the mutation rate is, but why it is what it is. There is also, of course, interest from a biology perspective: the biology of DNA repair and the biology of DNA replication.

So for maybe a couple of minutes I’ll talk about the statistical genetics piece of the work. There is a growing interest in gene mapping using de novo mutations. There are two areas specifically: the genetics of neuropsychiatric diseases and cancer genomics. The idea here is to map genes involved in disease progression, or cancer driver genes, using recurrence. This is not your classic genetic mapping, for example LD-based association or linkage; this is mapping using mutations, and it is the only mapping strategy which is possible in asexual systems. The idea is very simple: you find different patients carrying mutations in the same gene, collapse them by gene, and you can make an inference that this is a significantly mutated gene, right? There are more mutations than you expect.

Now the big question is, what do you actually expect? In these studies you cannot run case-control. You cannot really look at how many mutations in this gene happen in cases versus how many in controls, because you would lack the statistical power to do so. So the idea is to use some sort of model.
For example, the simplest approach, which was used in early papers on the subject: you take some estimate of the genomic mutation rate using independent samples, then you evaluate the probability of observing recurrent events in a given gene, and correct for multiple testing, right? This is not the correct strategy, because if you have heterogeneity among samples (especially a problem in cancer genomics, where some samples are basically filled with mutations and others have much lower mutation densities) you will make fallacious inferences, and this mapping will generate a lot of false positives.

So there is another strategy, which is the following. You look at your real data and just permute the data around. Over multiple permutations, you can evaluate how frequently you see two mutations landing independently in the same gene. The problem here, of course, is mutation rate variation, because if the mutation rate is heterogeneous along the genome, this may simply be a mutational hot spot which you don’t know about. So what we need is a careful model of the local mutation rate. And the problem in cancer is that, because of exposure to specific mutagens or specific genetic changes in repair systems (and I’ll be talking about that), you may have a situation where this mutation rate heterogeneity is patient-specific: not just cancer type-specific but specific to individual patients.

Now, over five years ago, again in collaboration with John’s lab, we made an observation that the density of both human SNPs and human-chimpanzee divergence is increased in later-replicating regions of the genome compared to earlier-replicating regions. So we have certain epigenomic variables that potentially control mutation rate. This is a stratification of the S phase of the cell cycle into four regions, and we’ve seen an increase in both divergence and polymorphism. So this fueled our interest in the question.
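The simplest recurrence test described in the talk (one genome-wide background rate, a per-gene test for excess mutations, then a multiple-testing correction) can be sketched roughly as follows. This is a minimal illustration and not the speaker's pipeline: the Poisson model, the Bonferroni correction, and all gene names, lengths, counts, and the background rate are assumptions made up for the example. It also shares the weakness the talk points out, since it ignores heterogeneity between samples and along the genome.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def significantly_mutated(genes, rate_per_bp, n_samples, alpha=0.05):
    """genes maps a gene name to (length_bp, observed_mutations).

    Returns the genes whose Bonferroni-corrected p-value is below alpha,
    assuming one uniform background rate (the naive model from the talk).
    """
    hits = {}
    n_tests = len(genes)
    for name, (length_bp, observed) in genes.items():
        expected = rate_per_bp * length_bp * n_samples
        p_adjusted = min(1.0, poisson_sf(observed, expected) * n_tests)
        if p_adjusted < alpha:
            hits[name] = p_adjusted
    return hits

# Hypothetical cohort: 500 tumors, background rate 1e-6 mutations per bp.
genes = {
    "GENE_A": (2_000, 12),    # 1 mutation expected, 12 observed
    "GENE_B": (2_000, 1),     # 1 expected, 1 observed
    "GENE_C": (50_000, 30),   # 25 expected, 30 observed
}
hits = significantly_mutated(genes, rate_per_bp=1e-6, n_samples=500)
print(sorted(hits))
```

If GENE_A happened to sit in a late-replicating region with a locally elevated rate, the uniform `rate_per_bp` would understate its expected count, which is exactly how this kind of test produces false positives.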
And it turned out that the same effect is observed in cancer genomics; this is in collaboration with [unintelligible]’s lab. We see the effect of replication timing in pretty much every single cancer type we analyzed: there is an increase of mutation density in later-replicating compared to earlier-replicating regions. And some genes that are located in later-replicating regions are the sort of usual false positives of mutation mapping in cancers.

There is another variable, which is the level of gene expression. Genes that are expressed at high levels have fewer mutations in cancer genomes, and the standard suspect is the transcription-coupled repair mechanism. I’ll show you the pathway, and I’ll show it again later, because I’ll be talking about this pathway throughout the talk. The idea is the following: if there is a lesion in DNA, one of the repair mechanisms is nucleotide excision repair, which starts with TFIIH, a helicase that unwinds DNA. There is an excision step in both directions, then there is resynthesis using the other strand as the template. This is a very accurate repair mechanism, and it can be recruited in two different ways. One way is a stalled RNA polymerase: if there is a lesion in DNA, transcription cannot proceed forward, and the polymerase recruits the nucleotide excision repair system downstream. The other mechanism is what we call global genome repair, which is an active search by the XPC complex for lesions in DNA.

So the first thing we decided to check is, “Okay, we think that this mechanism leads to a reduction of mutation density in actively transcribed genes. What happens in active regulatory elements?” We decided to look within DNase I hypersensitive sites; I don’t have to introduce them for this audience. You’re all familiar with them.
My naïve expectation was that mutation density may be elevated, because these sites are not protected by nucleosomes; maybe they are more accessible to some sort of damage, and so forth. But when we looked at multiple cancer types (this was published last year), multiple myeloma, colon cancer, melanoma, lung cancer, CLL, and the scale here depends on the number of samples we had, we saw a reduction in every single one, a reduction of mutation density within regions of open chromatin. Now, what’s important is that the effect is very well localized. I’m not talking megabases or hundreds of kb; this is one-kilobase resolution, right? And the reduction is compared to the immediate flank. I’m not going to go through the many regression models for how to take into account the effect of location, the effect of nucleotide composition, the mutational spectrum in this cancer type, and so forth.

Okay, so what can be behind this effect? We decided to look at one system specifically, melanoma, and there are several reasons. There are multiple samples available; it’s a high-mutation-rate cancer; and, most importantly, we know the mutation source. We have a signature, and we believe this signature corresponds to UV damage of DNA. And we know that the major repair mechanism acting on this signature is nucleotide excision repair, so we can make some biological hypotheses from looking at this system.

Okay, so now a slightly more quantitative [spelled phonetically] presentation of the same data. These are intergenic regions; these are intronic regions; we have mutation density and we have chromatin accessibility in a quantitative fashion, which is just the number of mapped DNase I cleavages. What we think is that the difference between intergenic and intronic regions is the action of transcription-coupled repair. However, within each of those there is a very strong dependency on chromatin accessibility. Okay, why is this happening? There are many possibilities.
One is that what we’re seeing is purifying selection in regulatory elements: maybe mutations are happening but negative selection purges them, and we’re not seeing them. I don’t have time to discuss this in detail, but as somebody who has now unsuccessfully spent almost three years looking for signatures of purifying selection in cancers, I don’t believe in that, right? In order to assume that this is the case, selection must be dramatically stronger than in coding regions of the genome, and we never observe that.

Another possibility is that this is an association with replication timing or another epigenetic feature, not necessarily specifically with chromatin accessibility. We tested it in two ways: you can run multivariate regression models and see that this is not the case, and also the scale of the effect is very different, right? This is a very localized phenomenon.

Okay, so another possibility is accessibility to DNA repair. Here the hypothesis is that XPC in global genome repair is a large, bulky complex, like DNase I, right? With a footprint which is much larger than the distance between nucleosomes. So it has to work with chromatinized DNA, and there is an active mechanism to assist nucleotide excision repair in working on chromatinized DNA. And if you look through the experimental literature, the access of DNA repair to naked DNA is always much faster. So again, the idea is that global genome repair may work more efficiently in open DNA compared to chromatinized DNA, and recruit the same nucleotide excision repair machinery downstream.

Now, even as bioinformaticists, we can test the hypothesis without running any experiments, because with cancer genomic data, when you look at mutations, you have phenotype and genotype in the same dataset, right? I have a phenotype, “What is the drop of mutation density in DNase I hypersensitive regions?” and I have the genotype of the tumor. And I have a hypothesis that nucleotide excision repair is involved.
So we can stratify all our melanoma samples into those where we do not see any change in nucleotide excision repair, which are marked green, and samples where we do observe a potentially deactivating mutation anywhere in the nucleotide excision repair pathway. And we see that there is a statistically significant enrichment of samples with potentially deactivated nucleotide excision repair among samples where the drop in mutation density associated with chromatin accessibility is very small.

We can further exploit the structure of the pathway, because if mutations deactivating nucleotide excision repair happen downstream, in the actual repair part of the pathway, then we should [unintelligible] both effects: the dependency of mutation density on transcription, so the correlation with expression level, and the correlation with chromatin. As we see here, these three samples, for example, where mutations happen downstream, in these genes, in the core repair part of the pathway, have a very small or no decrease in mutation density associated with either transcription or chromatin accessibility. Unfortunately, we had only one sample, this is sample number four, with mutations upstream, specifically in global genome repair. It fits the hypothesis, but I probably wouldn’t make a very strong inference from a single sample.

So, concluding this part of the talk: we observed that mutation density is remarkably reduced in regulatory regions marked by DNase hypersensitive sites. And the effect is likely mediated by global genome repair, as can be shown by the association of this effect with the presence of an intact nucleotide excision repair pathway in the same sample. Okay, so this is very focal.
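The enrichment claim above (samples with a potentially deactivated repair pathway are over-represented among samples showing only a small drop in mutation density at accessible sites) is the kind of statement a 2x2 contingency test can check. The sketch below uses a one-sided Fisher's exact test as an assumed choice; the talk does not say which test was used, and the sample counts are invented for illustration.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null, i.e. the
    probability of seeing at least this much enrichment by chance.
    """
    row1 = a + b          # e.g. samples with a mutated repair pathway
    col1 = a + c          # e.g. samples with a small drop in mutation density
    n = a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)

# Hypothetical melanoma cohort of 30 tumors:
#                      small drop   large drop
# repair mutated            6            2
# repair intact             4           18
p_value = fisher_exact_greater(6, 2, 4, 18)
print(f"one-sided p = {p_value:.4f}")
```

With a single upstream sample, as in the talk, no such test has meaningful power, which is consistent with the speaker's caution about sample number four.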
So what have we learned so far? We learned that mutation density in cancers is shifted towards later-replicating regions, regions cancer doesn’t really need, because most expressed genes and active elements are located in earlier-replicating domains. We observed that mutation density in cancers is reduced in actively transcribed genes, in genes cancer needs versus genes cancer doesn’t need. And we also learned that mutation density is reduced in actively [unintelligible] regulatory elements, right? So this is kind of the thing. These are primarily observations on expression and DNase I accessibility, specifically within potentially functional elements.

So what happens if we change the resolution and look at the megabase scale, using the data collected by the Epigenome Roadmap Consortium from multiple cell types and multiple epigenomic variables? First, again, looking at variation in DNase I hypersensitivity, just the density of peaks per megabase, versus the number of mutations. Again, I use melanoma as an example, and I use the classic UV-induced mutation density; there’s a pretty good correlation. However, one interesting feature we noted is the following. I can look at three different skin cell types: melanocytes, fibroblasts, and keratinocytes. And I see that there is a decrease in mutation density associated with the density of open chromatin regions in each of the three cell types. However, in melanocytes this decrease is much more pronounced, right? The negative correlation coefficient is much greater in magnitude.

The general phenomenon, again, is that activating marks are negatively correlated with mutation density and repressive marks are positively correlated with mutation density; again, places where cancer needs functional genes to work have a reduced density of mutations. And I’ll come back to that point.
Now, back to specific cell types. If I take, for example, mutations in liver cancers and information about methylation marks in liver, and information about melanocytes, and I also look at melanoma mutations, what I observe is that if I condition on the right cell type, the other cell type carries no information, right? So if I check liver cancer and melanoma, and I check data on methylation in liver cells (hepatocytes) and melanocytes: if I know about melanocytes, liver cells add no information about mutation density in melanoma; if I know about liver cells, melanocytes don’t add any information about mutation density in liver cancer.

Okay, so these observations hint at the importance of multiple features, and they hint at the importance of the correct cell type. Now what are we going to do? We have a highly dimensional dataset. Now, for some reason, the postdocs involved in the study like [unintelligible] regression, and I know there are probably many bioinformaticists in the room who like other methods, but I just follow the postdocs in the study, so [unintelligible] selected random forest regression for the analysis, a machine learning method. So what you do is throw everything into it, and we show that we can actually predict the mutation density per megabase with fairly remarkable accuracy. Not in every cancer, but over 80 percent of the variance can be explained in a whole bunch of cancer types.

Now, because it’s a random forest, you can look at the features that contribute to the classifier [spelled phonetically], and this is the pattern: if we look at melanoma, I see some features from epithelial cells, but most of the features come from melanocytes. If I move to liver (and this is of course a small chunk of a very large matrix like this, right?) I would look at which features significantly contribute to the predictor for liver cancer, and which features come from liver cells.
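The conditioning statement above (once the matched cell type is known, the other cell type adds no information) can be illustrated with a partial correlation. This is a simplified stand-in, not the random forest analysis from the talk. The data are simulated under the assumption that mutation density depends only on the melanocyte mark, with the liver mark merely correlated with it; all coefficients and variable names are invented.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Simulated 1 Mb windows (hypothetical values, fixed seed for repeatability).
random.seed(0)
n_windows = 2000
melanocyte_mark = [random.random() for _ in range(n_windows)]
# The liver mark is correlated with the melanocyte mark but has no direct effect.
liver_mark = [0.7 * m + random.gauss(0, 0.1) for m in melanocyte_mark]
# Melanoma mutation density is driven by the melanocyte mark only.
melanoma_density = [10 - 2 * m + random.gauss(0, 0.2) for m in melanocyte_mark]

raw_liver = pearson(melanoma_density, liver_mark)
liver_given_melanocyte = partial_corr(melanoma_density, liver_mark, melanocyte_mark)
melanocyte_given_liver = partial_corr(melanoma_density, melanocyte_mark, liver_mark)

print(f"liver mark, unconditioned:        {raw_liver:+.2f}")
print(f"liver mark, given melanocytes:    {liver_given_melanocyte:+.2f}")
print(f"melanocyte mark, given liver:     {melanocyte_given_liver:+.2f}")
```

In this simulation the liver mark looks strongly predictive on its own, yet its partial correlation collapses towards zero once the melanocyte mark is accounted for, while the melanocyte mark stays informative after conditioning on liver.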
Then I would look at colon cancer and there is the same match, multiple myeloma, and so forth. There is one cancer where it doesn’t work, and I think we probably didn’t have the right cell type: lung cancer. For lung cancer this trick didn’t work.

Okay, now I can do the following trick: I can take all of my features and cluster them by tissue. And I can ask what variance is explained by the predictor if I take only the relevant cell type versus all of the tissues and cell types. Again, for melanoma, I see that I can explain most of the variation looking only at melanocytes. The effect is not as dramatic, but I can also select the right cell type in liver cancer, and so on.

So, looking at this, we decided to develop a simple classifier. So now we’re turning this on its head. What I told you so far is this: there are regions of the genome where genes are expressed, where chromatin is active. These regions have fewer mutations than regions which are heterochromatic, later in replication, and not associated with active chromatin and transcription. And I told you that, looking at epigenomic data, if you have the right cell type, you can actually predict the mutation profile at the megabase scale. So we decided to turn it on its head, because we can develop a predictor of the cell type of origin of a cancer from mutational data. I look at the genome and I scan the database of the Epigenome Roadmap, and I’m trying to predict which cell is the cell of origin of this cancer, right? Again, we never ran the true experiment of taking tumors of unknown primary, predicting, and acting on them clinically; this wasn’t done. What we did was a very simple experiment. We took individual samples from our datasets and we developed a classifier, again looking at significant features that explain variation of mutation density per megabase.
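The inversion described above, going from a tumor's mutation profile to its likely cell of origin, can be sketched in a much simpler form than the random forest classifier used in the study. The version below substitutes a plain single-feature rule: score each candidate cell type by the variance in per-window mutation density that its epigenomic track explains, and pick the best. The function names, the two cell types, and every number are hypothetical.

```python
def r_squared(feature, target):
    """Fraction of variance in target explained by a linear fit on feature."""
    n = len(feature)
    mf, mt = sum(feature) / n, sum(target) / n
    sft = sum((f - mf) * (t - mt) for f, t in zip(feature, target))
    sff = sum((f - mf) ** 2 for f in feature)
    stt = sum((t - mt) ** 2 for t in target)
    return (sft * sft) / (sff * stt)

def predict_cell_of_origin(mutation_density, profiles):
    """profiles maps a cell type name to its per-window feature values.

    Returns the best-scoring cell type and the score of every candidate.
    """
    scores = {
        cell: r_squared(track, mutation_density)
        for cell, track in profiles.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical open-chromatin peak counts in eight 1 Mb windows.
profiles = {
    "melanocyte": [5, 1, 8, 3, 9, 2, 7, 4],
    "hepatocyte": [2, 7, 3, 8, 4, 1, 9, 5],
}
# Hypothetical tumor: mutation counts fall where melanocyte chromatin is open.
tumor_density = [52, 90, 20, 71, 12, 80, 33, 60]

best, scores = predict_cell_of_origin(tumor_density, profiles)
print(best, {cell: round(score, 3) for cell, score in scores.items()})
```

A real predictor would combine many marks per cell type and would need held-out tumors to estimate accuracy; this sketch only shows the direction of the inference.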
And what we see is that for most cancers we predict the right cell type, with an overall accuracy of 88 percent. We did not predict lung cancer, as I mentioned; again, probably we don’t have the right epigenomic profile. There was almost an anecdote with esophageal cancer, because we believed the cell type which the algorithm originally selected was a false positive. But then, looking at the literature, we realized that these are exactly the cells that people believe give rise to esophageal cancer. So at least with some reasonable accuracy, this trick works.

Okay, so now there is an important question. These are cells of origin, and we heard today about epigenomic modification due to cancer progression. This was my original thinking. This whole talk is about failures of my original hypotheses, by the way. My original thinking was the following: we observed that cancer avoids mutations in the regions it needs. We know that this is determined by the epigenomic profile. Now we can think about the evolution of mutation rate, and this is what we’re doing on the theoretical side of things, which I don’t have time to present. And you may think about the following idea: “Okay, cancer frequently starts on a high mutation rate background, then mutations keep happening, and of course many of these mutations may potentially be deleterious for the tumor. There would be selection to suppress these mutations. If you look at expression data, both the base excision repair system and the nucleotide excision repair system are overexpressed in, like, later melanoma compared to earlier melanoma. So I thought that this is active selection on mutation rate, right? To eliminate mutations in the regions a tumor needs.”

So then we asked the following question. We didn’t have plenty of data, but there are two cell types where we did have data.
We can see how mutation density is predicted by epigenomic features of liver cells versus epigenomic features of liver cancer cells, right? And what we see is that we can predict much better using liver cells than liver cancer cells. In melanoma, there is an even more interesting experiment, because we take the same cell line, and we can see that all peaks in the cell line don’t predict as well as all peaks in melanocytes. But if we take peaks specific to the cell line or specific to melanocytes, the cell-line-specific ones are pretty much non-predictive, and melanocyte peaks that are not observed in cancer still predict mutation density. I found it very surprising. I think one possible explanation is that a lot of the mutations we observe in tumors actually arise very early, before the epigenomic changes associated with cancer.

Okay, I see John’s standing there, so I’m going to my conclusion slide. Basically, again, mutation density at the one-megabase scale in cancer is very strongly associated with chromatin organization. This association is very highly specific with respect to cell of origin, and it looks like the cancer genome has enough information about the cell of origin that you can actually predict the cell of origin based on the cancer genome.

Thank you to my lab. So this is how seriously we think about our projects. Paz Polak, who recently left the lab, contributed to most of this. He’s here, listed with the lab members, and of course thanks go to John Stamatoyannopoulos and Bob Thurman, and to Rosa Karlic and Amnon Koren, who were all collaborators. Thank you.

[applause]

Male Speaker: Fabulous. Do you think that the tumors are actively going at silencing some of these mutations in order to transit from a normal state to a tumor state, if indeed the mutations are more likely to arise in the normal tissues than in an active process?

Shamil Sunyaev: So I’m a little bit in disarray with my thinking right now.
My original thinking was that, if you look at mathematical models of the evolution of mutation rate, you find that in asexual systems selection on mutation rate is much more efficient than in sexual systems. So in principle, cancer would have the ability to change mutation rate, especially if what we’re seeing is cell-type specific, to silence mutation in the regions it needs. I found this model intellectually pleasing; I don’t think this is what we’re observing. I think what we’re observing is possibly the very simple fact that most cancers are clonal, and most of these mutations possibly accumulated very early, like, before cancer progression. But to tell you the truth, by now I don’t know. I don’t have any good model anymore.

Male Speaker: Fantastic. So I was wondering: in the later part of the talk, you showed the correlation when you get the chromatin states from tissues versus cancers. The cancers that you show are cell lines, so would that be a factor, that cell lines are very selective and they probably have very selective chromatin states, very different from what the original cancer would be?

Shamil Sunyaev: Yeah, that’s --

Male Speaker: So the mutation rate prediction would be better if you take cancer tissues directly rather than cancer cell lines?

Shamil Sunyaev: This may be the case. In principle, if there is epigenetic control of mutation rate, I would be surprised if it were different in cell lines compared to cancers, but the observation is absolutely correct. The main results in the paper were done on primary tumors, and the last couple of slides were a comparison with cell-line data. We didn’t have matching datasets, so that’s of course a deficiency, but I do not see an obvious hypothesis for why there would be a substantial difference, because cell lines have been there for a reasonably long time, and if mutations keep happening and are associated with the epigenomics of cell lines, we should observe it.
Male Speaker: I have another question. It’s a very general question. It’s been known in the field, and very much propagated by followers for many years, that the mutation rate in cancer cells is the same as the normal mutation rate. Can you comment on that? Where does it stand now?

Shamil Sunyaev: It’s a very interesting question. I think there is disagreement within the field about whether mutation rate is elevated in cancer or not. People who believe that it is elevated point to a lot of mutator genes associated with cancer, both in germline predisposition and as early events in cancer. For example, we see a lot of samples in melanoma with changes in nucleotide excision repair pathways. Theoretically, it fits very well, because you would have a mutator change that would hitchhike together with cancer drivers. Now, there are people who don’t really believe that there is a substantial difference, especially if you look at mutation density. They argue that a lot of these events happen early, and they point to the dependency on age at diagnosis and this type of observation. I don’t have a strong opinion either way. I find the arguments for increased mutation rate very logical, and I’m also happy to live in a world where it’s grey: in some cases, especially where you have mutator mutations, mutation rate may be elevated, and in other cases maybe it’s the same and you just randomly hit a driver gene.

Male Speaker: Thanks.

[applause]

[end of transcript]
Epigenetic Control of Genetics: the Impact of Epigenome on Mutation - Shamil Sunyaev. Published by Chou Jasper on January 14, 2021.