EpiQuest-M

M stands for mutability. The matrix currently used by the program predicts the mutation potential of different regions of a molecule on the basis of the neutral theory.

Introduction

EpiQuest-M is our first software for predicting the stability, or enhanced mutability, of domains in protein sequences.

The stability of a specific amino acid position in a sequence is determined by many factors, such as immune response pressure, genetic drift in restricted populations, and so on.

At the moment, EpiQuest-M offers a single matrix of stability values, based on the frequencies of spontaneous mutations at a particular position of the protein sequence.

Algorithm

EpiQuest-M analyses the linear protein sequence, looking for clusters of amino acids (domains) that are more or less prone to change in the course of protein evolution. The matrix S1.3 is based on the relative probability of an amino acid being replaced by another (related or not) through occasional point mutations (neutral theory of evolution; Kimura, 1991). It also draws on the probabilities of amino acid replacements reported for mitochondrial proteins by Adachi & Hasegawa (1996). Therefore, it does not reflect variability in a specific protein region caused by, for example, selective pressure from the immune system, but rather shows the overall potential for stability or variability.

The algorithm does, however, take into account that in more complex regions even frequently replaced amino acids mutate less often because of their structural significance. Overall, it provides a good basis for analysing changes in protein sequences that arise for reasons other than occasional mutations (see below).
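
As a rough illustration of the frame-based scan described above, the following minimal Python sketch averages a per-residue score over a sliding analysis frame. It is not the EpiQuest-M implementation: the function name `window_profile`, the `frame` parameter, and the toy score values are assumptions made for this example, the values of matrix S1.3 are not reproduced here, and the complexity adjustment mentioned above is left out.

```python
from typing import Dict, List


def window_profile(sequence: str, scores: Dict[str, float], frame: int) -> List[float]:
    """Return the mean per-residue score for every window of length `frame`.

    `scores` maps a one-letter amino acid code to a stability score.
    The caller supplies the values; this sketch does not contain S1.3.
    """
    if frame < 1 or frame > len(sequence):
        raise ValueError("frame must be between 1 and the sequence length")
    values = [scores[aa] for aa in sequence]
    running = sum(values[:frame])
    profile = [running / frame]
    # Slide the frame one residue at a time, updating the running sum.
    for i in range(frame, len(values)):
        running += values[i] - values[i - frame]
        profile.append(running / frame)
    return profile


# Placeholder scores for three residues only (hypothetical, NOT matrix S1.3).
toy_scores = {"A": 1.0, "W": 0.0, "S": 2.0}
profile = window_profile("AWSAWW", toy_scores, frame=3)
print(profile)
```

A wider `frame` smooths the profile and gives an overall view, while a narrower one highlights short local clusters.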

References:

Adachi, J., and Hasegawa, M. (1996). Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol. 42, 459–468.

Kimura, M. (1991). The neutral theory of molecular evolution: A review of recent evidence. The Japanese Journal of Genetics 66, 367–386.

Defining stable regions

One use of the program is to identify regions in virus proteins that are expected to be stable and less affected by occasional mutations. Below is an example of analysis for influenza virus hemagglutinin (HA1). With a relatively wide analysis frame, which gives an overall view, you can see the regions of the sequence that are less likely to mutate.

[Figure: EpiQuest-M stability profile of influenza virus hemagglutinin (HA1)]

We compared the predicted stability of 6-mers (overlapping by 1 amino acid) covering the entire length of the HA with the variability actually observed for the amino acids in the same 6-mers. The variability was calculated as the sum of the percentages of amino acid replacements relative to the most common variant. For example, if within a 6-mer replacements were observed in 10% of sequences at position 1, 40% at position 2, 1% at position 3, and 10% at each of positions 4, 5, and 6, then the cumulative mutation level of the 6-mer is 81%.
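
The cumulative variability calculation can be sketched in a few lines of Python. This is an illustrative reading of the description, not the code used for the analysis: the function names are hypothetical, the input is assumed to be a set of equal-length aligned sequences, and the window is assumed to advance one residue at a time (the `step` parameter can be changed if a different overlap is intended).

```python
from collections import Counter
from typing import List


def position_variability(aligned: List[str]) -> List[float]:
    """Percent of sequences that differ from the most common residue, per column."""
    n = len(aligned)
    result = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        result.append(100.0 * (n - most_common_count) / n)
    return result


def cumulative_variability(per_position: List[float], k: int = 6, step: int = 1) -> List[float]:
    """Sum the per-position replacement percentages over each k-mer window."""
    last_start = len(per_position) - k
    return [sum(per_position[i:i + k]) for i in range(0, last_start + 1, step)]


# The worked example from the text: 10% + 40% + 1% + 10% + 10% + 10%.
example = [10.0, 40.0, 1.0, 10.0, 10.0, 10.0]
print(cumulative_variability(example))  # [81.0]
```

Because the values are summed rather than averaged, a 6-mer can in principle exceed 100% when several of its positions vary.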

The 6-mers were divided into quartiles according to their relative stability.
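
One way to reproduce this grouping is sketched below in Python. The rank-based split, the function names, and the input numbers are assumptions made for illustration (the numbers are invented, not HA data); quartile 4 is taken to hold the 6-mers with the highest predicted stability, as in the text.

```python
from typing import Dict, List


def quartile_labels(stability: List[float]) -> List[int]:
    """Label each item 1..4 by rank; 4 marks the highest predicted stability."""
    n = len(stability)
    order = sorted(range(n), key=lambda i: stability[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 4 // n + 1
    return labels


def mean_by_quartile(labels: List[int], observed: List[float]) -> Dict[int, float]:
    """Average the observed variability of the items in each quartile."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for label, value in zip(labels, observed):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {q: sums[q] / counts[q] for q in sorted(sums)}


# Invented toy values: predicted stability and observed variability (%) per 6-mer.
predicted = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
observed = [80.0, 5.0, 30.0, 50.0, 20.0, 60.0, 10.0, 40.0]
labels = quartile_labels(predicted)
print(mean_by_quartile(labels, observed))
```

Comparing the per-quartile means of observed variability with the predicted values is then a matter of placing the two summaries side by side.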

As can be seen, the observed level of variability for quartiles 1, 2, and 3 differs from the variability predicted on the basis of spontaneous mutations, which suggests strong selective pressure that causes new variants to appear with noticeable frequency.

At the same time, for quartile 4 (the 6-mers with the highest predicted stability) the predicted and actual levels of mutation correlate quite well. This approach allows the user to identify sequences in a virus protein that are likely to remain stable across different variants of the virus.

Stable regions and evolution

When studying changes within protein families (their subfamilies and the sequence variability among different species), it is useful to isolate the regions that, according to the neutral theory, should have few occasional mutations. In the image below, we show stability profiles for Cadherin 1 from two distantly related species, human and frog. While variability in other regions may arise from occasional mutations, changes in potentially stable regions are likely to have functional significance that can be investigated.

You can raise or lower the threshold for what counts as a "stable" region, depending on the variability of the sequences and the particular question at hand.
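
The effect of the threshold can be illustrated with a short Python sketch that extracts contiguous stretches of a stability profile lying at or above a cut-off. The function name, the `min_length` parameter, and the convention that higher values mean greater stability are assumptions of this example and do not describe the program's internals.

```python
from typing import List, Tuple


def stable_regions(profile: List[float], threshold: float,
                   min_length: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where profile >= threshold."""
    regions = []
    start = None
    for i, value in enumerate(profile):
        if value >= threshold:
            if start is None:
                start = i  # a new stable stretch begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a stretch that runs to the end of the profile.
    if start is not None and len(profile) - start >= min_length:
        regions.append((start, len(profile)))
    return regions


toy_profile = [0.2, 0.8, 0.9, 0.3, 0.7, 0.7, 0.7]  # invented values
print(stable_regions(toy_profile, threshold=0.7))   # [(1, 3), (4, 7)]
print(stable_regions(toy_profile, threshold=0.85))  # [(2, 3)]
```

Raising the threshold shrinks or removes regions, and `min_length` can be used to discard very short stretches.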

Building the distance matrix for “stable regions” may give you quite a different picture of the sequence divergence in the process of evolution than comparing the overall sequences.
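
As a hedged sketch of this comparison, the Python example below computes a simple p-distance (the fraction of differing aligned positions) either over whole sequences or only over selected regions. The choice of p-distance, the function names, and the toy sequences are assumptions made for illustration; the text does not prescribe a particular distance measure. The sequences are assumed to be pre-aligned, with `-` marking gaps.

```python
from typing import List, Optional, Tuple

Regions = Optional[List[Tuple[int, int]]]


def p_distance(a: str, b: str, regions: Regions = None) -> float:
    """Fraction of differing positions, optionally limited to (start, end) ranges."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if regions is None:
        positions = list(range(len(a)))
    else:
        positions = [i for start, end in regions for i in range(start, end)]
    compared = differing = 0
    for i in positions:
        if a[i] == "-" or b[i] == "-":
            continue  # skip alignment gaps
        compared += 1
        differing += a[i] != b[i]
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def distance_matrix(seqs: List[str], regions: Regions = None) -> List[List[float]]:
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = p_distance(seqs[i], seqs[j], regions)
    return matrix


# Invented toy alignment and hypothetical stable regions (half-open ranges).
seqs = ["ACDEFGHIKL", "ACDEFGHIKV", "TCDEYGHIRV"]
stable = [(1, 4), (5, 8)]
full = distance_matrix(seqs)
restricted = distance_matrix(seqs, stable)
```

In this toy case the sequences differ over their full length but are identical within the selected regions, which shows how the two matrices can diverge.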

One may notice that both proteins share a nearly identical general pattern of stable and variable regions. This gives a solid basis for mapping the functionally important and the variable regions of the molecule, creating a reference map for this particular type of protein.

It is also advisable to look at the complexity profile of the proteins when addressing such questions. Complexity maps can be built using EpiQuest-C.

The EpiQuest Suite and the site www.epiquest.co.uk belong to Aptum Biologics Ltd.

EpiQuest® is a registered trademark of Aptum Biologics Ltd.