Comparisons of amino acid compositions between proteins x and y.

The vector x contains 20 numbers, one per amino acid, giving the number of times each amino acid occurs within protein x. Similarly, the vector y contains the number of times each amino acid occurs within protein y.

In the following equations, $x_i$ is the ith component of vector x and $y_i$ is the ith component of vector y. $x_l$ and $y_l$ are the lengths of proteins x and y, respectively. $x_{li}$ and $y_{li}$ are the frequencies of the ith amino acid normalised by the lengths of proteins x and y, respectively, that is, $x_{li} = x_i / x_l$ and $y_{li} = y_i / y_l$.
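As an illustration of these definitions, the following minimal sketch builds the count vector, the length and the length-normalised frequencies from a one-letter sequence string. The function name `composition` and the ordering of the 20 amino acids are assumptions made for this example; the text does not specify either.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
# The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return (counts, length, frequencies) for a protein sequence.

    counts[i] is the number of occurrences of the ith amino acid,
    length is the sequence length, and frequencies[i] = counts[i] / length.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    tally = Counter(sequence.upper())
    counts = [tally.get(aa, 0) for aa in AMINO_ACIDS]
    length = len(sequence)
    frequencies = [c / length for c in counts]
    return counts, length, frequencies


counts, length, freqs = composition("ACCA")
print(counts[:2], length, freqs[:2])
```

Residues outside the 20 standard letters are counted in the length but not in the vector in this sketch, so the frequencies sum to one only for sequences made up of standard residues.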

The similarity in amino acid composition between x and y was calculated using each of the following equations.
 

One minus Euclidean distance:
 
 

(1)
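A minimal sketch of this measure, assuming it is one minus the ordinary Euclidean distance and that it is applied to the length-normalised frequencies (the inputs are an assumption; the function name `one_minus_euclidean` is hypothetical):

```python
import math


def one_minus_euclidean(x_freq, y_freq):
    """One minus the Euclidean distance between two frequency vectors."""
    if len(x_freq) != len(y_freq):
        raise ValueError("vectors must have the same number of components")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_freq, y_freq)))
    return 1.0 - distance


# Two short toy vectors stand in for the 20-component vectors.
print(one_minus_euclidean([0.5, 0.5], [0.5, 0.5]))
```

Identical compositions give a score of 1, and the score decreases as the compositions diverge.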
 
 

One over standardised Euclidean distance:
 
 

(2)

where $\sigma_i$ = standard deviation for each amino acid i.
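A minimal sketch of this measure, assuming the usual standardised Euclidean distance in which each component difference is divided by the standard deviation for that amino acid. How the standard deviations were estimated is not stated here, so they are passed in as an argument; the function name and the treatment of a zero distance are also assumptions.

```python
import math


def one_over_standardised_euclidean(x_freq, y_freq, sigma):
    """Reciprocal of the standardised Euclidean distance.

    sigma[i] is the standard deviation for amino acid i (supplied by the
    caller; its source is an assumption of this sketch).
    """
    if not (len(x_freq) == len(y_freq) == len(sigma)):
        raise ValueError("all vectors must have the same number of components")
    if any(s <= 0 for s in sigma):
        raise ValueError("standard deviations must be positive")
    squared = sum(((a - b) / s) ** 2 for a, b, s in zip(x_freq, y_freq, sigma))
    distance = math.sqrt(squared)
    # Identical compositions have zero distance; report infinite similarity
    # instead of raising ZeroDivisionError (a choice made for this sketch).
    return math.inf if distance == 0.0 else 1.0 / distance


print(one_over_standardised_euclidean([0.2, 0.8], [0.4, 0.6], [0.1, 0.2]))
```

Unlike the "one minus" measures, this score is unbounded above, so values from it are not directly comparable with the others.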

One minus cosine theta distance (normalised Euclidean distance):
 
 

(3)
 
 

where and
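A minimal sketch of the cosine of the angle theta between the two composition vectors. The exact form of the measure is not recoverable from the text, so the code computes only cos theta; under the common reading in which the cosine distance is 1 - cos theta, one minus that distance is cos theta itself. That reading, and the function name `cosine_theta`, are assumptions.

```python
import math


def cosine_theta(x, y):
    """Cosine of the angle between vectors x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_theta([1.0, 1.0], [1.0, 0.0]))
```

Because the cosine is unchanged when either vector is multiplied by a positive constant, raw counts and length-normalised frequencies give the same value.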
 
 

One minus Manhattan metric distance:
 
 

(4)
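A minimal sketch of this measure, assuming it is one minus the sum of absolute differences between the length-normalised frequencies (the choice of inputs and the function name `one_minus_manhattan` are assumptions):

```python
def one_minus_manhattan(x_freq, y_freq):
    """One minus the Manhattan (city-block) distance between two vectors."""
    if len(x_freq) != len(y_freq):
        raise ValueError("vectors must have the same number of components")
    distance = sum(abs(a - b) for a, b in zip(x_freq, y_freq))
    return 1.0 - distance


print(one_minus_manhattan([0.75, 0.25], [0.25, 0.75]))
```

When both inputs are frequency vectors that sum to one, the Manhattan distance lies between 0 and 2, so this score lies between -1 and 1.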
 
 
 
 

One minus relative entropy:
 
 

(5)
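A minimal sketch of this measure, assuming the relative entropy is the Kullback-Leibler divergence of the frequencies of x from those of y, computed with the natural logarithm. The direction of the divergence, the logarithm base, the handling of zero frequencies and the function name are all assumptions, since the text does not state them.

```python
import math


def one_minus_relative_entropy(x_freq, y_freq):
    """One minus the relative entropy sum_i x_i * ln(x_i / y_i)."""
    if len(x_freq) != len(y_freq):
        raise ValueError("vectors must have the same number of components")
    entropy = 0.0
    for a, b in zip(x_freq, y_freq):
        if a == 0.0:
            continue  # a term with zero frequency in x contributes nothing
        if b == 0.0:
            return -math.inf  # divergence is infinite when y lacks a residue x has
        entropy += a * math.log(a / b)
    return 1.0 - entropy


print(one_minus_relative_entropy([0.5, 0.5], [0.25, 0.75]))
```

Relative entropy is not symmetric, so swapping x and y generally changes the score.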
 
 

Pearson's correlation coefficient:
 
 

(6)
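A minimal sketch of the standard Pearson correlation coefficient computed across the components of the two vectors. The function name `pearson_correlation` is hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between vectors x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


print(pearson_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The coefficient is unchanged when either vector is multiplied by a positive constant, so raw counts and length-normalised frequencies give the same value.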