Distance Analysis of English Texts. II. King James Bible, Matthew Gospel.
Milan Kunz, April 11, 2001, revised March, 2003
Abstract
Distances between identical symbols in information strings (biological sequences, language texts, computer programs such as *.exe files) are described, with varying precision, by five distributions: exponential, Erlang, Weibull, lognormal and negative binomial. The correlations are sometimes highly significant. Here the distances between signs in the Matthew Gospel are analyzed. Some distance tests revealed specific formal features of the text.
INTRODUCTION
This is a continuation of the study of statistical properties of distances between identical symbols in information strings (1, 2). The Matthew Gospel was obtained as a part of Theophilos, Multilingual Application for Bible @ Christian Study (Theophilos.sk). Since the statistical program written by Rádl does not work with very long lines, the text was cut into lines of normal length and stripped of numbering, repeated spacebars and free spaces. The resulting file has 128608 bytes. It contains 124406 signs including spaces, and 100723 signs without spaces, in 2101 lines and 23683 words. The mean length of a word is therefore 4.253 signs (including apostrophes and punctuation marks). The file was split into 6 equal parts, since the statistical software used, Statgraphics, does not work with very long lists.
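The preparation steps described above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the numbering pattern (a leading verse number such as "3" or "1:3") is a guess at the file layout, and the original tools were Rádl's program and Statgraphics, not this code.

```python
import re

def clean_text(raw: str) -> str:
    """Strip leading numbering, repeated spaces and free spaces at line ends."""
    lines = []
    for line in raw.splitlines():
        # Assumed numbering format: a leading "3" or "1:3"; the real file may differ.
        line = re.sub(r"^\s*\d+(?::\d+)?\s*", "", line)
        line = re.sub(r" {2,}", " ", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)

def split_equal(text: str, parts: int = 6) -> list[str]:
    """Cut the string into consecutive pieces of nearly equal length."""
    size, extra = divmod(len(text), parts)
    pieces, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        pieces.append(text[start:end])
        start = end
    return pieces

# Mean word length from the counts quoted in the text.
mean_word_length = 100723 / 23683
```

Rounded to three decimals, `mean_word_length` reproduces the quoted value 4.253.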
After these formal corrections, the distances were determined by a program written by Rádl. Each individual symbol in the string is first indexed with its position index i (i going from 1 to m), and then the differences between the position indexes of consecutive occurrences of the same symbol are determined. The differences are considered to be the topological distances between the same symbols. The sets of these values were evaluated by different statistical tests. The program counting distances counts all signs, including spacebar, return, and punctuation marks.
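The distance counting can be illustrated with a short Python sketch. It is not Rádl's program; the function name is hypothetical, and it assumes that a distance is the difference between the position indexes of two consecutive occurrences of the same symbol.

```python
from collections import defaultdict

def symbol_distances(text: str) -> dict[str, list[int]]:
    """Return, for every symbol, the distances between its consecutive occurrences."""
    last_seen: dict[str, int] = {}
    distances: dict[str, list[int]] = defaultdict(list)
    # Positions are indexed from 1 to m, as in the text.
    for i, symbol in enumerate(text, start=1):
        if symbol in last_seen:
            distances[symbol].append(i - last_seen[symbol])
        last_seen[symbol] = i
    return dict(distances)
```

For example, `symbol_distances("abca a")` gives the distances 3 and 2 for the symbol a. Upper and lower case are kept apart here; the combined values such as [a + A] would be obtained by passing `text.lower()`.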
Of all the implemented distributions available, only four gave significant results: the exponential distribution, the Weibull distribution, the lognormal distribution, and the negative binomial distribution. Additionally, it was found that a fifth distribution is applicable, the Erlang one. If its parameter alpha equals 1, it coincides with the exponential distribution, with fewer degrees of freedom. Nevertheless, in some cases, its chi-square test value was higher. Only in one case does this parameter alpha equal 2, so that it is really the Erlang distribution.
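The special case alpha = 1 can be checked numerically. The sketch below uses the standard Erlang density with integer shape alpha and a rate parameter; the function names and the rate value are illustrative assumptions and are not taken from Statgraphics.

```python
import math

def erlang_pdf(x: float, alpha: int, rate: float) -> float:
    """Erlang density with integer shape alpha and the given rate."""
    return rate**alpha * x**(alpha - 1) * math.exp(-rate * x) / math.factorial(alpha - 1)

def exponential_pdf(x: float, rate: float) -> float:
    """Exponential density with the given rate."""
    return rate * math.exp(-rate * x)

# With alpha = 1 both densities agree at every distance.
same = all(
    math.isclose(erlang_pdf(x, 1, 0.02), exponential_pdf(x, 0.02))
    for x in (0.5, 3.0, 40.0, 200.0)
)
```

With alpha = 1 the two densities are identical, so the expected frequencies are the same, and only the number of fitted parameters (and hence the degrees of freedom) can differ. With alpha = 2 the density starts at zero, which lowers the expected number of very short distances.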
The actual values (mean, standard deviation, skewness, kurtosis, distribution parameters etc.) are of little interest, since they differ considerably between similar tested files.
Results
The distances between points (full stops) determine the length of sentences.
Table 1
Length of sentences. Chisquare test.
The lognormal distribution. Mean: 148.14, standard deviation 99.49.

Lower   Upper   Observed    Expected
Limit   Limit   Frequency   Frequency   Chisquare
    6      50          57    40.59.45     0.09351
   50     100         253       220.5     0.00659
  100     150         221       114.2     0.00116
  150     200         139       135.3     0.10379
  200     250          78        76.4     0.03143
  250     300          41        42.8     0.07628
  300     350          29        24.3     0.90988
  350     400          12        14.1     0.30918
  400     450          12         8.4     1.58751
  450     500           2         5.1     1.86104
  500    1238           3         9.1     4.09967

Chisquare = 9.08005 with 8 degrees of freedom. Significance level = 0.335588.


The length of sentences fits well with the lognormal distribution. The mean of 148 signs corresponds to about 35 words in one sentence. The tail of the distribution over 450 is shorter than expected (5 occurrences against 14.2 expected). This makes 65.6 % of the chi-square test value.
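The totals of Table 1 can be rechecked from its Chisquare column. The sketch below is an illustration, not the Statgraphics procedure: it sums the eleven tabulated contributions, evaluates the significance level with the closed form of the chi-square upper tail that holds for an even number of degrees of freedom, and computes the share of the two last classes (the tail over 450). The function and variable names are hypothetical.

```python
import math

# Chisquare column of Table 1, one value per class.
contributions = [0.09351, 0.00659, 0.00116, 0.10379, 0.03143, 0.07628,
                 0.90988, 0.30918, 1.58751, 1.86104, 4.09967]

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

chi_square = sum(contributions)
# 11 classes - 1 - 2 fitted lognormal parameters = 8 degrees of freedom.
significance = chi2_sf_even_df(chi_square, 8)
tail_share = sum(contributions[-2:]) / chi_square
```

The results agree with the quoted values: the sum is 9.080, the significance level is about 0.3356, and the tail over 450 accounts for about 65.6 % of the chi-square value.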
The other punctuation mark, the semicolon, is used according to the lognormal distribution. There are 248 semicolons; the chisquare = 20.3 with 7 degrees of freedom, the significance level = 0.005. There are two peaks, within distances 634-841 [1259-1675] (22 [16] occurrences against 13.5 [8.1] expected). These make 26.3 [38.6] % of the chi-square test value.
The comma is used according to the Weibull distribution, too. There are 2404 commas; the chisquare = 50.09 with 7 degrees of freedom. The significance level is very poor. There are no repeated commas, which makes 32 % of the chi-square test value, and there is a peak within distances 227-799 (19 occurrences against 7.9 expected). This makes 31.7 % of the chi-square test value.
The spacebar
The distances between consecutive spacebars greater than 1 determine the number of words of the length corresponding to the distance minus one. There are 23683 spacebars after corrections. This is the number of the counted words. The results are tabulated as follows. Cumulating the frequencies of shorter distances improved the fit in some cases, since below the cut the counts are scattered and the differences can balance themselves.
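The relation between spacebar distances and word lengths can be sketched as follows. The function name is hypothetical, and the sketch treats only the space character as a separator, so a line break would count as a sign of the word unless it is replaced by a space beforehand.

```python
from collections import Counter

def word_length_counts(text: str, separator: str = " ") -> Counter:
    """Count words by length from the distances between consecutive separators."""
    positions = [i for i, ch in enumerate(text, start=1) if ch == separator]
    counts: Counter = Counter()
    for previous, current in zip(positions, positions[1:]):
        distance = current - previous
        if distance > 1:              # distance 1 means two adjacent spaces
            counts[distance - 1] += 1
    return counts
```

For example, `word_length_counts(" in the beginning was ")` yields one word of length 2, two of length 3 and one of length 9. Words before the first and after the last separator are not counted by this sketch.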
Table 2. The number of words of different lengths
Length  Number  Type of distribution, chisquare value
 1         478  WE, 0.018
 2        3628  NB, 0, over 21 = 0.098
 3        6021  NB, 0, 1. part over 11 0.178, 2. part over 6 = 0.016
 4        5382  1. part EX, over 18 = 0.315, 2. part NB over 9 = 0.385
 5        3168  NB, 0
 6        1581  NB, 0.003
 7        1407  EX, 0.624
 8         824  EX, 0.150
 9         660  EX, 0.320
10         314  WE, 0.411
11         135  WE, 0.642
12          54  EX, 0.912
13          15  few data
14           9  few data
15           4  few data
16           2  few data

The distribution of the lengths of words seems to have the lognormal shape, but this guess was not tested.
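The counts of Table 2 can be cross-checked against the totals given in the Introduction. The dictionary below is a reading of the flattened table rows (length, then number of words) and should be treated as an assumption; the variable names are hypothetical.

```python
# Word length -> number of words, as read from Table 2.
word_counts = {1: 478, 2: 3628, 3: 6021, 4: 5382, 5: 3168, 6: 1581,
               7: 1407, 8: 824, 9: 660, 10: 314, 11: 135, 12: 54,
               13: 15, 14: 9, 15: 4, 16: 2}

total_words = sum(word_counts.values())
total_signs = sum(length * number for length, number in word_counts.items())
mean_length = total_signs / total_words
```

Under this reading the table contains 23682 words and 100718 signs, close to the 23683 words and 100723 signs without spaces quoted earlier, and the mean length again rounds to 4.253.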
Notes to some results:
The distribution of one letter words is poorly correlated by the Weibull distribution. There is a peak between distances 184-220 (14 occurrences against 6.3 expected). This makes 47.1 % of the chi-square test value.
The distribution of two letter words is correlated poorly by the negative binomial distribution. These words follow each other less often than corresponds to the shape (400 occurrences against 555.8 expected). This makes 32.5 % of the chi-square test value. There is a pronounced peak of distances 4-5 (780 occurrences against 623.3 expected). This makes another 31.3 % of the chi-square test value.
The distribution of three letter words was divided into two parts. These words are following each other less often than corresponds to the shape of the negative binomial distribution (455 [486] occurrences against 754.1 [777.0] expected). This alone makes 48.8 [53.1] % of the chi-square test value.
The distribution of four letter words was divided into two parts. These words are following each other in the first half less often than corresponds to the shape of the exponential distribution (671 occurrences against 761.9 expected). This makes 20.5 % of the chi-square test value. But the second half is correlated better with the negative binomial distribution, and these words repeat more often (750 occurrences against 626.8 expected). This makes 37.2 % of the chi-square test value.
The negative binomial distribution of five letter words is poor mainly due to the longer tail (16 occurrences against 6.5 expected). This makes 29.8 % of the chi-square test value. The exponential distribution of six letter words is fair. 29.1 % of the chi-square test value is formed by their low repeating within distances 35-42 (51 occurrences against 63.7 expected).
The distribution of seven letter words is described by the exponential or the Weibull distribution (the chi-square test value = 0.069).
The distribution of eight letter words is also exponential or Weibull (the chi-square test value = 0.225). The tail is longer (18 occurrences against 14.4 expected over 146). This makes 31.6 % of the chi-square test value.
The distribution of ten letter words is fairly correlated by the Weibull distribution. The shortage of these words within distances 90-114 (15 occurrences against 23.5 expected) contributes 33.5 % of the chi-square test value.
The distributions of longer words are well correlated, or the tests failed due to few data.
Distances between points and semicolons
Distances between punctuation marks show the length of sentences or clauses.
The distribution of the points (847) is lognormal, the mean is 148.14, the standard deviation 99.49. The chi-square test value is 0.336, without any great deviations, only the tail over 500 is shorter (5 occurrences against 14.2 expected). It makes 65.6% of the chi-square test value.
The semicolons (248) are dispersed according to the Weibull distribution. The correlation is fair; the chi-square test value is 0.399. There is one shortage within distances 245-366 (14 occurrences against 19.6 expected). It makes 21.6 % of the chi-square test value. There is a peak within distances 1070-1461 (11 occurrences against 6.7 expected). This makes another 38.4 % of the chi-square test value.

Distances between individual letters
The results for all letters are presented in the form of the table, where the frequencies of all symbols are given and the significance of the performed chi-square tests. Then the commentaries to all symbols of the alphabet are given. The values in the square brackets show the corresponding values of the combined lower and upper cases.
Table 7 Survey of results
Notes:
EX = exponential distribution
ER = Erlang distribution
WE = Weibull distribution
LN = lognormal distribution
NB = negative binomial distribution
* = the test was not made, since there were not enough data
Statistic = XX, the chi-square test

Symbol  Small               Capital           Both
a       7457, LN, 0         525, LN, 0.127    7982, LN, 0
b       1299, WE, 0.347     211, LN, 0.499    1510, WE, 0.050
c       1583, EX, 0.780     42, EX, 0.533     1625, EX, 0.047
d       4644, NB, 0         18, EX, 0.006     4670, NB, 0
e       12732, LN, 0        50, EX, 0.008     12777, LN, 0
f       1993, ER, 0.700     118, LN, 0.650    2111, EX, 0.006
g       1487, EX, 0.009     118, WE, 0.731    1605, EX, 0.011
h       8251, LN, 0         105, LN, 0.416    8356, LN, 0, ER
i       5561, LN, 0.137     284, WE, 0.438    5845, LN, 0
j       44, LN, 0.137       285, LN, 0.266    329, LN, 0.120
k       599, WE, 0.474      10, *             609, WE, 0.696
l       3547, WE, 0         76, EX, 0.725     3623, WE, 0
m       2490, ER, 0.953     47, WE, 0.111     1349, ER, 0.709
n       6682, WE, LN, 0     53, WE, 0.060     6735, LN, 0
o       6768, EX, 0         39, WE, 0.245     6807, EX, 0
p       1135, WE, 0.103     75, LN, 0.627     1210, WE, 0.202
q       20                  0                 20, EX, 386
r       4600, NB, 0         17, *             4617, NB, 0
s       5871, NB, 0         155, WE, 0.521    6027, NB, 0-930
t       9265, NB, 0         290, WE, 0.122    9555, NB, 0
u       2676, EX, 0         6, *              2682, EX, 0
v       951, LN, 0.545      23, EX, 0.384     974, LN, 0.547
w       2076, EX, 0         135, LN, 0.144    2211, EX, 0
x       41                  0                 41, EX, 0.446
y       2109, LN, 0.015     27, EX, 0.033     2136, LN, 0.020
z       35, EX, 0.038       12, *             47, EX, 0.069

The Weibull distribution is the best one in the case of 17 letters. The lognormal distribution fits best in 25 cases, the exponential distribution is the best in 16 of the performed tests, the negative binomial distribution in 8 cases, and the Erlang distribution in 2 cases and in one part.
The fit is at best fair. The chi-square values are sometimes practically zero, and the significance of the chi-square tests increases only when the lowest class limit is moved to greater distances by pooling the shorter distances. The commentaries to the individual letters follow.
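The pooling of short distances (the "over N" values quoted in the tables and commentaries) can be illustrated for the exponential case. This is a minimal sketch under several assumptions: all distances up to the cut form one class, the remaining classes have equal width with an open last class, the exponential mean is the sample mean, and one parameter is subtracted from the degrees of freedom. The class limits actually used in Statgraphics are not documented here, and the function name is hypothetical.

```python
import math

def pooled_exponential_chi_square(distances, cut, bin_width, n_bins):
    """Chi-square of an exponential fit with all distances <= cut pooled in one class."""
    if n_bins < 3 or cut <= 0 or bin_width <= 0:
        raise ValueError("need n_bins >= 3 and positive cut and bin_width")
    n = len(distances)
    mean = sum(distances) / n
    # Lower class limits: 0, cut, cut + w, ...; the last class is open-ended.
    edges = [0.0, float(cut)] + [float(cut + k * bin_width) for k in range(1, n_bins - 1)]
    observed = [0] * n_bins
    for d in distances:
        k = 0
        while k < n_bins - 1 and d > edges[k + 1]:
            k += 1
        observed[k] += 1
    cdf = [1.0 - math.exp(-e / mean) for e in edges] + [1.0]
    expected = [n * (cdf[k + 1] - cdf[k]) for k in range(n_bins)]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = n_bins - 1 - 1   # classes - 1 - one fitted parameter
    return chi_square, degrees_of_freedom, observed, expected
```

Raising `cut` merges more of the scattered short-distance counts into the first class, which is one way in which the deficit or surplus of doubled letters can stop dominating the chi-square value.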
A
The frequency of the capital A allowed a separate test. The result with the lognormal distribution is worsened by many small fluctuations.
The distribution of distances between the lower case a and both cases (a + A) seems to be lognormal; at least their tails fit, though poorly. The first sixth of a fits with the negative binomial distribution over 22 (the chi-square value = 0.094), that of [a + A] over 16 (the chi-square value = 0.291). The two way sample analysis shows that the parts of the lower case a are different:

Part       2       3       4       5       6
1     *0.047   0.062   0.114   0.066   0.301
2              0.879   0.654   0.906   0.351
3                      0.764   0.976   0.426
4                              0.748   0.612
5                                      0.422

Note: The asterisk shows the significant difference between tested parts.
The first sixth differs significantly from the second one but its consistency with other parts is poor, too. The second and third sixths comply at best with the fifth sixth.
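Reading such a triangular table can be sketched in Python. The values are those of the table for the lower case a; the threshold of 0.05 is an assumption that is consistent with the asterisks in the tables, and the function names are hypothetical. The test that Statgraphics applies in its two way sample analysis is not specified here.

```python
# Significance levels for pairs of sixths of the lower case a.
p_values = {
    (1, 2): 0.047, (1, 3): 0.062, (1, 4): 0.114, (1, 5): 0.066, (1, 6): 0.301,
    (2, 3): 0.879, (2, 4): 0.654, (2, 5): 0.906, (2, 6): 0.351,
    (3, 4): 0.764, (3, 5): 0.976, (3, 6): 0.426,
    (4, 5): 0.748, (4, 6): 0.612,
    (5, 6): 0.422,
}

def significant_pairs(table, alpha=0.05):
    """Pairs of parts whose difference is significant at the assumed level alpha."""
    return sorted(pair for pair, p in table.items() if p < alpha)

def closest_partner(table, part):
    """The other part with the highest significance level against `part`."""
    candidates = {pair: p for pair, p in table.items() if part in pair}
    best = max(candidates, key=candidates.get)
    return best[0] if best[1] == part else best[1]
```

Here `significant_pairs(p_values)` returns only the pair (1, 2), and `closest_partner` gives the fifth sixth for both the second and the third sixth, in line with the commentary.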
The second sixth fits best with the lognormal distribution. After a peak within distances 7-11 (277 occurrences against 334.6 expected), which makes 32 % of the chi-square test value, the chi-square test value improves to 0.559 over 12 and to 0.635 over 13. Then it is worse: 0.536 over 14 and 0.081 over 15. The best chi-square test values in the other parts are: 3. part 0.081 over 25, 4. part 0.385 over 15, 5. part 0.368 over 29, and 6. part 0.112 over 32. The first sixth fits poorly due to a peak within rather high distances 32-40 (88 occurrences against 61.2 expected), which makes 34.5 % of the chi-square test value. The most important disturbances in the other parts are tabulated:

Part  Range   Observed  Expected  % of chisquare
3     9-11         140     205.0            28.2
      19-22        127      86.5            25.6
4     5-8          293     233.7            20
      9-11         142     214.7            32.8
5     28-32         75      48.6            21.1
6     10-14        186     245.2            26.2

The two way sample analysis of both cases (a + A) gives somewhat different results:

Part       2       3       4       5       6
1     *0.046  *0.043   0.091  *0.020   0.269
2              0.996   0.739   0.707   0.376
3                      0.738   0.699   0.370
4                              0.479   0.570
5                                      0.214

The first sixth differs significantly from three parts, but its consistency with the other parts is poor, too. The second and third sixths are practically identical.
The fourth sixth fits best with the lognormal distribution; the chi-square test value improves to 0.668 over 12.
The most important disturbances in all parts are tabulated:

Part  Range   Observed  Expected  % of chisquare
1     12-14        130     177.7            22.2
      15-18        167     126.9            21.9
      32-35         43      26.9            16.8
2     17-22        191     154.5            22.8
      38-43         38      24.5            19.7
3     12-14        124     172.8            25.9
      32-35         45      28.6            17.7
4     11-14        130     180.9            30.5
      15-18        173     131.9            27.3
5     1-5          262     211              22.1
      15-18        121     160.5            17.4
      36-40         41      25.3            17.5
6     1-5          321     243.0            27.2
      6-9          263     334.1            16.4
      23-40        255     181.1            33.2

B
The distribution of distances between upper case B is lognormal. There is a peak within distances 367-700 (58 occurrences against 48.8 expected). This makes 51.9 % of the chi-square test value.
The Weibull distribution of distances between lower case b is worsened by including B. There are too few (b + B) within distances 88-134 (194 occurrences against 223.3 expected), which contributes 27.4 % of the chi-square test value. Another shortage exists within distances 229-275 (29 occurrences against 41.2 expected), which contributes 25.7 % of the chi-square test value.
C
The distribution of this letter is between the exponential and the negative binomial (the chi-square test value over 20 is 0.272). There are few doubled cc [Cc] (9 [9] occurrences against 20 [21.1] expected). This makes 20.6 [26.2] %, respectively, of the chi-square test value. There are too many c within distances 96-119 (154 occurrences against 121.7 expected), which contributes 29 % of the chi-square test value. On the other hand, there are too few (c + C) within distances 71-117 (229 occurrences against 293.4 expected), which contributes 19.1 % of the chi-square test value.
D
Here the exponential distribution and the negative binomial are applicable. The chi-square test values are as follows:

Part       Exponential      Erlang           Negative binomial
d1         over 29 = 0.494  -                over 28 = 0.687
d2         over 13 = 0.112  over 64 = 0.035  -
d3         over 9 = 0.270   -                over 11 = 0.357
d4         over 21 = 0.722  over 21 = 0.657  over 20 = 0.486
d5         over 27 = 0.677  -                over 10 = 0.433
d6         over 12 = 0.918  -                -
[d + D]1   over 28 = 0.492  over 28 = 0.400  over 28 = 0.684
[d + D]2   over 13 = 0.145  over 66 = 0.642  over 27 = 0.163
[d + D]3   over 10 = 0.176  -                over 11 = 0.381
[d + D]4   over 21 = 0.731  -                over 20 = 0.550
[d + D]5   0.261            -                over 10 = 0.155
[d + D]6   over 12 = 0.947  -                -

The frequency of the capital D allowed a separate test, but the result is difficult to interpret.
The two way sample analysis shows that the parts of the lower case d are different:


Part       2       3       4       5       6
1      0.264  *0.000  *0.035   0.323  *0.000
2             *0.000   0.331   0.872  *0.021
3                     *0.005  *0.000  *0.014
4                              0.243   0.179
5                                     *0.010

The third sixth differs significantly from all other parts, the last sixth differs significantly from all other parts, except the fourth. Only the second and the fifth sixths are close.
There are always fewer doubled dd than correspond to the exponential form (0-10 occurrences against 23-36 expected), which makes 25-66 % of the chi-square test value.
The combined [d + D] gives somewhat different results. The two way sample analysis shows that the parts of [d + D] are different, too:

Part       2       3       4       5       6
1      0.292  *0.000  *0.040   0.918  *0.001
2             *0.000   0.320   0.414  *0.026
3                     *0.006  *0.000   0.112
4                              0.094   0.219
5                                     *0.006

The third sixth differs significantly from all other parts, except the second and sixth, the last sixth differs significantly from all other parts, except the fourth and fifth. Only the first and the fifth sixths are close.
There are always fewer doubled Dd than correspond to the exponential form (0-10 occurrences against 23.5-37 expected), which makes 25-67.5 % of the chi-square test value.

E
There are relatively few E compared with the great number of e. The distribution of distances between the lower case e and both cases (e + E) seems to be lognormal; at least their tails fit, though poorly. Some parts of e fit better with the negative binomial distribution:

Part       Lognormal        Negative binomial
e1         0                0
e2         over 13 = 0.112  over 16 = 0.158
e3         over 21 = 0.070  over 14 = 0.045
e4         over 20 = 0.026  over 16 = 0.066
e5         over 17 = 0.045  over 15 = 0.089
e6         over 25 = 0.037  over 12 = 0.061
[e + E]1   0                over 25 = 0.125
[e + E]2   0                over 16 = 0.133
[e + E]3   over 21 = 0.081  over 14 = 0.042
[e + E]4   over 20 = 0.026  over 15 = 0.208
[e + E]5   0                over 14 = 0.018
[e + E]6   0                over 11 = 0.155

The two way sample analysis failed due to too large samples.
There are too many doubled ee [Ee] (49-85 [49-85] occurrences against 14.9-19.3 [15-19.6] expected). This makes 45.1-61.3 [61.-66.3] %, respectively, of the chi-square test value.
F
The tail of the lower case f is correlated well by the Erlang distribution. Combining F with f worsened the distribution of the lower case f. The upper case F is described by the lognormal distribution, but f and [f + F] by the exponential distribution. The tail of the distribution is longer (18 [18] occurrences against 9.5 [9.4] expected over 334 [319], for f or [f + F], respectively). It makes 36.2 [21.3] % of the chi-square test value. The tail of the (f + F) part is more regular; the chi-square test value for it is 0.254 over 17.
G
The distribution of the capital G is correlated with the Weibull distribution rather well. It improves the distribution of the lower case g, which is the exponential one or the negative binomial one. The distribution of this letter is distorted by too few doubled gg [Gg] (4 [4] occurrences against 17.7 [20.6] expected). This makes 32.9 [42.2] % of the total, very high chi-square test value. The chi-square test value for g is 0.485 over 23, 0.515 over 24 and 0.412 over 25; for [g + G] it is 0.172 over 22, 0.408 over 23 and 0.304 over 24.
H
The distribution of the capital H is correlated with the lognormal distribution. The occurrences in the parts range from 10 to 25. The frequency of h and [h + H] made it necessary to split them for the evaluation.
The two way sample analysis shows that the third part of the lower case h is different:


Part       2       3       4       5       6
1      0.256   0.061   0.242   0.376   0.478
2             *0.001   0.982   0.794   0.659
3                     *0.001  *0.002  *0.005
4                              0.774   0.639
5                                      0.857

The third sixth differs significantly from all other parts, except the first one but here the similarity is low, too.
The tails fit with the lognormal distribution (except the second part, which is more Weibull-like), also differently:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1        17      0.417     18      0.468     19      0.138
2        27      0.013     28      0.021
WE       25      0.027     26      0.115     27      0.149
3        31      0.021     32      0.054     33      0.021
4        35      0.099     36      0.271
5        21      0.028     22      0.211     23      0.048
6        21      0.028     22      0.211     23      0.048

The combined [h + H] gives somewhat different results. The two way sample analysis shows that the parts of [h + H] are different, too:


Part       2       3       4       5       6
1      0.234  *0.038   0.285   0.321   0.610
2             *0.000   0.877   0.832   0.463
3                     *0.000  *0.000  *0.004
4                              0.950   0.552
5                                      0.602

The third sixth differs significantly from all other parts.
The tails fit with the lognormal distribution (except the second part, which is more Weibull-like, and the fourth part, which is of the Erlang type with the parameter alpha = 2), also differently:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1         8      0.333      9      0.924     10      0.755
2, WE    31      0.099     32      0.259
3        11      0.001     12      0.049     13      0.063
4, ER    27      0.214     28      0.431     29      0.246
5         9      0.168     10      0.813     11      0.256
6        21      0.013     22      0.109     23      0.063

Including H improved the fit of the fifth sixth.
I
The distribution of the capital I is correlated with the Weibull distribution rather well; the only greater disturbance is a surplus of counts within distances 760-920 (19 occurrences against 13.1 expected), which contributes 33.7 % of the chi-square test value. The occurrences in the parts range from 24 to 61.
The frequency of the lower case i made splitting necessary. The parts are correlated with the exponential distribution (parts 1, 2, and 6 also with the Erlang distribution), and they pass the two way sample analysis, as follows:

Part       2       3       4       5       6
1      0.605   0.106   0.063   0.432  *0.046
2              0.272   0.182   0.785   0.132
3                      0.826   0.416   0.642
4                              0.297   0.791
5                                      0.219

Only the sixth part differs significantly from the first one.
The tails fit with the exponential (Erlang) distribution also differently:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1        14      0.104     15      0.386     16      0.316
1, ER    14      0.076     15      0.319     16      0.255
2         7      0.178      8      0.511      9      0.311
2, ER     7      0.102      8      0.436      9      0.249
3        20      0.041     21      0.530     22      0.495
4        19      0.161     20      0.270     21      0.141
5        21      0.087     22      0.161     23      0.035
6         1      0.438
6, ER     1      0.343

The last sixth fits well from the beginning.
Including I changed the results of the two way sample analysis dramatically, as follows:

Part       2       3       4       5       6
1      0.428   0.204  *0.034   0.551  *0.008
2              0.636   0.187   0.856   0.057
3                      0.396   0.519   0.144
4                              0.141   0.509
5                                     *0.042

The sixth part differs significantly from the first and the fifth ones, the fourth part differs significantly from the first one.
The tails fit with the exponential distribution (parts 1, 2, and 6 also with the Erlang distribution) also differently:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1        24      0.497     25      0.772     26      0.242
2         9      0.275     10      0.275     11      0.131
2, ER     -          -     10      0.216     11      0.096
3        20      0.064     21      0.850     22      0.427
3, ER    20      0.045     21      0.800     22      0.357
4         9      0.016     10      0.210     11      0.063
5        33      0.134     32      0.265     33      0.146
6         1      0.208
6, ER     1      0.147

The last sixth fits well from the beginning. The shape of the fifth part varies between the lognormal and the Weibull distribution, too. The fit of this letter is distorted by no occurrences of doubled Ii against 40.4-47.8 expected. This makes 46.2-63.4 % of the chi-square test value.
J
There are fewer j than J, due to the frequent names John and Jesus. The distribution of the letter is the lognormal one.
K
The Weibull distribution of the lower case k is worse correlated than that of both cases [k + K]. There are too few repetitions within distances 328-400 (27 occurrences against 35.7 expected). This makes 28.8 % of the chi-square test value.
L
The occurrences of L are correlated with the exponential distribution, somewhat worse with the Erlang distribution (the chi-square test value is 0.585). It is rather interesting that both the exponential and the Erlang distributions give the same observed and expected frequencies; the two fits differ only in their degrees of freedom.
The frequency of l and [l + L] made splitting necessary. The parts are correlated with the Weibull distribution. The fit is distorted by many doubled ll [Ll]. This makes about two thirds of the total chi-square test value. Despite this, some parts fit from the start, but poorly; see the table:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1         1      0.073
2        46      0.117     47      0.173     48      0.186
3         1      0.024
4         1      0.022
5        29      0.299     30      0.517     31      0.299
6         1      0.334

The parts pass the two way sample analysis, as follows:

Part       2       3       4       5       6
1      0.101  *0.011   0.692   0.379   0.096
2              0.331   0.207  *0.011   0.980
3                     *0.028  *0.001   0.344
4                              0.197   0.199
5                                     *0.010

The third part differs significantly from the first, fourth and fifth ones, the second from the fifth one, and it is very close to the sixth one.
Including L changed the results; see the table:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1         1      0.202
2        48      0.179
3         1      0
4         1      0
5        26      0.172     27      0.171     28      0.176
6         1      0.306

The parts pass the two way sample analysis, as follows:

Part       2       3       4       5       6
1      0.104  *0.007   0.647   0.365   0.091
2              0.265   0.235  *0.010   0.939
3                     *0.024  *0.000   0.302
4                              0.167   0.210
5                                     *0.008

The third part differs significantly from the first, fourth and fifth ones, the second from the fifth one, and it is very close to the sixth one. Moreover, the fifth and sixth parts are different. The sixth part differs significantly from the first and the fifth ones, the fourth part differs significantly from the first one.
The tails fit with the exponential distribution also differently:

Part    Cut  Chisquare    Cut  Chisquare    Cut  Chisquare
1        24      0.497     25      0.772     26      0.242
2         9      0.275     10      0.275     11      0.131
3        20      0.064     21      0.850     22      0.427
4         9      0.016     10      0.210     11      0.063
5        33      0.134     32      0.265     33      0.146
6         1      0.208

M
The lower case m is correlated excellently with the Erlang distribution (the chi-square test value over 47 is 0.953). The upper case M is correlated poorly using the Weibull distribution, and the combined [m + M] are correlated well with the Erlang distribution. There is a peak of distances 33-58 (583 occurrences against 525.5 expected). This makes 47.5 % of the chi-square test value.
N
The distribution of n and (n + N) was divided into six parts, which were different. The two way sample analysis shows following results:

The lower case n:

Part       2       3       4       5       6
1     *0.047  *0.011  *0.003  *0.002   0.226
2              0.560   0.327   0.273   0.422
3                      0.712   0.617   0.167
4                              0.883   0.069
5                                      0.057

[n + N]:

Part       2       3       4       5       6
1      0.085  *0.028  *0.008  *0.004   0.240
2              0.630   0.382   0.254   0.557
3                      0.710   0.514   0.283
4                              0.763   0.135
5                                      0.080

Both sets are alike; including N (3 to 15 occurrences in each part) did not change the results of the two way sample analysis dramatically. Moreover, all the corresponding sixths of n against [n + N] are alike according to the two way sample analysis (nx against [n + N]x: x = 1 0.668, x = 2 0.856, x = 3 0.940, x = 4 0.936, x = 5 0.826, x = 6 0.682).
The shape of distribution of the parts is different:
n:

Part        Cut  Chisquare
1     WE      1      0.001
2     EX     24      0.130
3     LN      1      0.009
4     EX             0
5     EX     20      0.092
6     LN             0

[n + N]:

Part        Cut  Chisquare
1     LN     15      0.027
2     LN      1      0.073
3     LN      1      0.082
4     EX             0
5     EX     25      0.058
6     EX     25      0.050

The fluctuations from the ideal shape are in all parts on different places, no regularities can be observed.
O
The distribution of O can be correlated also with the lognormal distribution (the chi-square test value 0.198), and the Weibull distribution (the chi-square test value 0.127).
The distribution of o and (o + O) was again divided into six parts, which were different, again, as the two way sample analysis results show:
The lower case o:

Part       2       3       4       5       6
1      0.607  *0.002   0.281   0.301  *0.000
2             *0.008   0.561   0.579  *0.001
3                     *0.045   0.052   0.448
4                              0.995  *0.007
5                                     *0.009

The third and sixth parts are different from the other ones.
[o + O]:

Part       2       3       4       5       6
1      0.577  *0.002   0.283   0.282  *0.000
2             *0.009   0.595   0.581  *0.001
3                     *0.041   0.053   0.434
4                              0.969  *0.006
5                                     *0.009

Including O (5 to 8 occurrences in each part) did not change the situation; the third and sixth parts are different from the other ones, again.
All the corresponding sixths of o against [o + O] are alike according to the two way sample analysis (ox against [o + O]x: x = 1 0.856, x = 2 0.887, x = 3 0.878, x = 4 0.853, x = 5 0.896, x = 6 0.902).
The shape of distribution of the parts is different, as the fits of tails show:
o:

Part        Cut  Chisquare        Cut  Chisquare
1     EX     19      0.320  LN     19      0.215
2     LN     28      0.128
3     WE      1      0.209
4     EX     15      0.149
5     EX     10      0.660  ER     10      0.597
6     EX     10      0.057

[o + O]:

Part        Cut  Chisquare
1     EX     19      0.328
2     WE      1      0.039
3     EX     12      0.201
4     EX     12      0.388
5     EX     10      0.754
      ER     10      0.692
6     EX     10      0.040

When the exponential distribution is applied, there is a shortage of doubled oo [Oo] (25-39 occurrences against 66.5-61.4 expected, which makes 37.6-70 %) [25-39 occurrences against 67.4-62.0 expected, which makes 37.1-69.3 %]. This makes 8.6-34.7 [8.9-38.1] % of the chi-square test value.
P
The upper case P is also correlated using the Weibull distribution (the chi-square test value = 0.496). The Weibull distribution of this letter gives only a minor opportunity for commenting. The tail of [p + P] is longer over 654 (11 occurrences against 6.4 expected), which contributes 29.1 % of the chi-square test value.
Q
Few occurrences for a commentary.
R
The distribution of r and [r + R] was divided into six parts.
The two way sample analysis shows that the parts of the lower case r are different:

Part       2       3       4       5       6
1     *0.012  *0.000      *0   0.225  *0.000
2              0.187  *0.020  *0.020  *0.036
3                      0.379  *0.011   0.478
4                             *0.003   0.878
5                                     *0.001

The first part differs significantly from all other parts, except the fifth one, the second sixth differs significantly from all other parts, except the third one, the third sixth differs significantly from all other parts, except the second one, the last sixth differs significantly from all other parts, except the third and fourth ones.
There are always fewer doubled rr than correspond to the negative binomial form (3-17 occurrences against 23.4-36.7 expected), which makes 29.3-41.3 % of the chi-square test value.
The fit is poor. The chi-square test values: 1. part = 0, 2. part over 16 = 0.838, 3. part over 16 = 0.292, 4. part over 28 = 0.684, 5. part = 0.053, 6. part over 20 = 0.205. Here, the Erlang distribution gives over 21 the chi-square test value = 0.183.
The combined [r + R] gives somewhat different results. The two way sample analysis shows that the parts of [r + R] are different, too:

Part       2       3       4       5       6
1      *.005  *0.000  *0.000   0.175  *0.000
2              0.187  *0.022   0.151  *0.039
3                      0.394  *0.008   0.495
4                             *0.000   0.879
5                                     *0.000

The first part differs significantly from all other parts, except the fifth one, the second sixth differs significantly from all other parts, except the third one, the third sixth differs significantly from all other parts, except the second one, the last sixth differs significantly from all other parts, except the third and fourth ones. There are always fewer doubled Rr than correspond to the negative binomial form (3-17 occurrences against 23.5-33.3 expected), which makes 29.1-41.4 % of the chi-square test value.
The fit is poor. The chi-square test values: 1. part = 0.026 (the lognormal form), 2. part over 16 = 0.839, 3. part over 15 = 0.380, 4. part over 28 = 0.719, 5. part = 0.038, 6. part over 20 = 0.073.
S
The Weibull distribution of the capital S is distorted mostly by a peak within distances 246-518 (38 occurrences against 29.6 expected). This makes 38.5 % of the chi-square test value. The distribution of the lower case s and [s + S] was divided into six parts. The two way sample analysis shows how the parts of the lower case s are different, as follows:

Part       2       3       4       5       6
1     *0.008  *0.000  *0.005  *0.000  *0.001
2              0.293   0.938   0.263   0.609
3                      0.313   0.903   0.601
4                              0.279   0.652
5                                      0.537

Only the first part differs significantly from all other parts.
Except for the first part, there are always fewer doubled ss than correspond to the negative binomial form (19-31 occurrences against 46.2-50.5 expected), which makes 20.8-65.8 % of the chi-square test value.
The fit is very different. The chi-square test values: 1. part = 0.818, 2. part = 0.462, 3. part over 20 = 0.100, 4. part = 0.087 (over 10 = 0.562), 5. part = 0.007 (over 12 = 0.672). Here, the Erlang distribution gives over 12 the chi-square test value = 0.672, too. 6. part the chi-square test value = 0.017.
The combined [s + S] gives somewhat different results. The two way sample analysis shows how the parts of [s + S] are different, too:

Part       2       3       4       5       6
1      *.010  *0.000  *0.003  *0.000  *0.001
2              0.289   0.703   0.286   0.483
3                      0.948   0.960   0.745
4                              0.485   0.742
5                                      0.719

The first part differs significantly from all other parts. The second half parts are very alike.
Except for the first part, there are always fewer doubled Ss [ss] than correspond to the negative binomial form (19-30 occurrences against 49.9-52.9 expected), which makes 27.1-60.8 % of the chi-square test value.
The fit is rather different. The first part correlates excellently with the negative binomial distribution (the chi-square test value 0.930). The fit of other parts: 2. part over 3 = 0.710, 3. part over 30 = 0.530, 4. part 0.054, over 12 = 0.721, 5. part over 2 = 0.123. Here, the Erlang distribution gives over 20 the chi-square test value = 0.204. 6. part 0.106, over 9 = 0.851.
T
The distribution of the capital T has the Weibull shape. There is a shortage of distances 760-1080 (15 occurrences against 27 expected). This makes 48 % of the chi-square test value.
The distribution of the lower case t, as well as of both cases [t + T], is divided into six parts.
The two way sample analysis shows that the parts of the lower case t are very similar:

Part       2       3       4       5       6
1      0.845   0.066   0.677   0.835   0.997
2              0.112   0.834   0.689   0.840
3                      0.155  *0.038   0.478
4                              0.529   0.670
5                                      0.836

Only the third part differs significantly from the fifth one.
There are always fewer doubled tt than expected from the negative binomial form (6-14 occurrences against 111-125.7 expected), which makes 58.7-76.6 % of the chi-square test value.
Only the tails fit. The chi-square test values: 1. part = 0.086 over 18, 2. part over 20 = 0.245, 3. part over 15 = 0.552, 4. part over 23 = 0.505, 5. part = 0.053, 6. part over 10 = 0.106.
The combined [t + T] gives analogous results. The two-sample analysis shows that the parts of [t + T] are rather similar, too:

Part       2       3       4       5       6
1      0.844   0.078   0.722   0.800   0.782
2              0.058   0.590   0.961   0.643
3                      0.168  *0.045   0.137
4                              0.545   0.932
5                                      0.596

The third part differs from all other parts, but the difference is significant only in the case of the fifth one.
There are always fewer doubled Tt [tt] than expected from the negative binomial form (6-19 occurrences against 118.3-133 expected), which makes 59.1-74.5 % of the chi-square test value.
The fit is poor. The chi-square test values: 1. part over 17 = 0.010, 2. part over 20 = 0.228, 3. part 0, 4. part over 15 = 0.316, 5. part over 10 = 0.022, 6. part over 16 = 0.184.

U
There are no doubled uu or Uu (0 [0] occurrences against 57.0 [57.2] expected). This makes 73.2 [57.8] % of the chi-square test value of the exponential distribution. When the lower limit is set to 30 for the u set, the chi-square test value improves to 0.705. The Erlang distribution gives over 42 the chi-square test value = 0.110.
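The counting of doubled letters and the effect of a lower limit can be sketched as follows, using the definition of distances from the Introduction (differences of the position indexes of identical symbols). This is a minimal illustration and not Rádl's program. The function names are hypothetical, and the choice that "over 30" means strictly greater than the limit is an assumption. The sample string is a single verse and serves only as a small input.

```python
def distances(text, symbols):
    """Differences between successive 1-based positions of the given symbols.

    Every character of the text is counted, including spaces and punctuation.
    """
    positions = [i for i, ch in enumerate(text, start=1) if ch in symbols]
    return [b - a for a, b in zip(positions, positions[1:])]


def count_doubled(dists):
    """Number of distances equal to 1, i.e. immediately repeated symbols."""
    return sum(1 for d in dists if d == 1)


def tail(dists, lower_limit):
    """Keep only the distances strictly above lower_limit (assumed convention)."""
    return [d for d in dists if d > lower_limit]


sample = "Blessed are the pure in heart: for they shall see God."

d_s = distances(sample, "sS")   # both cases combined, as in [s + S]
d_u = distances(sample, "uU")   # a single u gives no distance at all

print("distances of s:", d_s)
print("doubled ss:", count_doubled(d_s))
print("doubled uu:", count_doubled(d_u))
print("tail of s over 30:", tail(d_s, 30))
```

Passing a one-character string such as "u" restricts the count to one case, while "uU" corresponds to the combined set.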
V
The lognormal fit of both v and [v + V] is good, with only minor fluctuations. Both sets have a shorter tail over distances 1000 [1100] (4 [1] occurrences against 8 [5.7] expected). This contributes 51.7 [32.9] % of the chi-square test value.
W
The exponential distribution of the upper case W gives an acceptable fit. There is a peak of the distances 64-427 (53 occurrences against 41.9 expected). This difference makes 30.6 % of the chi-square test value.
The exponential distribution of w [w + W] gives a fair fit over 14 [29] (the chi-square test value 0.464 [0.404]). There are no doubled ww [Ww] (50.6 [50] % of the chi-square test value). There is a peak of the distances 18-34 (435 [490] occurrences against 376.3 [415.1] expected). This makes 13.5 [17.4] % of the chi-square test value. Here, the Erlang distribution is applicable, too. It gives for w the chi-square test value = 0.088, and for [w + W] the chi-square test value = 0.336 over 29.
X
No comment is necessary.
Y
The lognormal distribution of the lower case y, as well as of [y + Y], would give a good fit, except that the tail of the distribution of this letter is shorter than expected (6 [6] occurrences over 420 [420] against 21.3 [20.8] expected). This makes 41.6 [41.7] % of the chi-square test value. There is a long peak within distances 197-252 (60 [60] occurrences against 43.2 [42.9] expected). This makes 31.1 [34] % of the chi-square test value.
Z
No comment is necessary.
Discussion
The insufficient capacity of the software used for long lists forced the splitting of the most frequent signs. The splitting was made before determining the distances. Surprisingly, the obtained parts are not always comparable, since the split parts contain different numbers of signs. This leads to different mean distances between them.
The following table shows the statistics of the 11 split letters. The upper triangle is for the lower case letters and gives the number of significant differences. The lower triangle is for both cases combined and gives the change in the number of significant differences against the lower case.

Part   1   2   3   4   5   6
1          4   7   4   2   4
2     -1       2   1   2   3
3      2   2       4   6   2
4      1  -1   2       1   1
5      1  -1   0   0       3
6      1   1  -1   1   3

The consistency of the parts for both cases combined is somewhat worse: in total there are 10 more significant differences. The greatest difference exists between the first part and the third one.
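The tally in the upper triangle can be sketched as a count of significant pairs over the individual letter tables. The example below uses only the p-values of the lower case s and the lower case t reported in this section, so it reproduces two of the eleven letters, not the full table. The 0.05 level and all names are assumptions; the source only marks significant values with an asterisk.

```python
ALPHA = 0.05  # assumed significance level

# Upper-triangle p-values for parts 1-6, keyed by (row part, column part),
# taken from the lower case s and lower case t tables of this section.
P_S = {
    (1, 2): 0.008, (1, 3): 0.000, (1, 4): 0.005, (1, 5): 0.000, (1, 6): 0.001,
    (2, 3): 0.293, (2, 4): 0.938, (2, 5): 0.263, (2, 6): 0.609,
    (3, 4): 0.313, (3, 5): 0.903, (3, 6): 0.601,
    (4, 5): 0.279, (4, 6): 0.652,
    (5, 6): 0.537,
}
P_T = {
    (1, 2): 0.845, (1, 3): 0.066, (1, 4): 0.677, (1, 5): 0.835, (1, 6): 0.997,
    (2, 3): 0.112, (2, 4): 0.834, (2, 5): 0.689, (2, 6): 0.840,
    (3, 4): 0.155, (3, 5): 0.038, (3, 6): 0.478,
    (4, 5): 0.529, (4, 6): 0.670,
    (5, 6): 0.836,
}


def count_significant(tables, alpha=ALPHA):
    """For each pair of parts, count the letters whose p-value is below alpha."""
    counts = {}
    for table in tables:
        for pair, p in table.items():
            counts[pair] = counts.get(pair, 0) + (p < alpha)
    return counts


counts = count_significant([P_S, P_T])

# Print the upper triangle in the same layout as the p-value tables.
for i in range(1, 6):
    row = " ".join(f"{counts[(i, j)]:>2}" if j > i else "  " for j in range(2, 7))
    print(i, row)
```

Running the same count over the combined-case tables and subtracting the lower case counts would give the entries of the lower triangle.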
Without a stylistic analysis, one can only speculate about what causes the observed differences.
Some distributions of distances between consonants are highly regular, especially their tails, if the low distances inside words are pooled. They are described with a different precision with five distributions: exponential, Weibull, Erlang, lognormal and negative binomial. Sometimes it is rather difficult to decide which distribution gives the better fit.
If the results are compared with the published analysis (18) of Shakespeare's Sonnets, many differences can be observed.
We begin with the letter frequencies. When the frequencies are adjusted to the sizes of both texts, the use of some consonants is the same (ratio Sonnets/Gospel):
[b + B] = 1.000, [f + F] = 1.004, [t + T] = 1.008, [p + P] = 1.089. The scale goes to [v + V] = 1.326, [x + X] = 2.182 and finally to [q + Q] = 4.120, where the use in the Sonnets is quite different. On the other hand, the Gospel is characterized by a higher frequency only of [n + N] = 0.904, [h + H] = 0.800, [d + D] = 0.786, [z + Z] = 0.427, and [j + J] = 0.238. In the last case the ratio of the upper case J is 0.011 due to the high frequency of proper names such as Jesus and John.
The vowels as a whole are used practically equally (total ratio 0.998): [a + A] = 0.818 and [e + E] = 0.965 are used more in the Gospel, whereas the other vowels are more exploited in the Sonnets, [i + I] = 1.083, [o + O] = 1.103, [u + U] = 1.131, and [y + Y] = 1.267.
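A size-adjusted frequency ratio of this kind can be sketched as follows. The source does not specify how the sizes of the texts were measured, so this sketch assumes that the size is the number of letters and that both cases are merged, as in the bracketed sets. The function names and the two short strings are hypothetical placeholders, not the Sonnets or the Gospel.

```python
from collections import Counter


def letter_share(text):
    """Share of each letter (cases merged) among all letters of the text."""
    letters = [ch.lower() for ch in text if ch.isalpha()]
    total = len(letters)
    return {ch: n / total for ch, n in Counter(letters).items()}


def frequency_ratio(text_a, text_b, letter):
    """Size-adjusted ratio of the use of a letter in text_a versus text_b."""
    share_a = letter_share(text_a).get(letter, 0.0)
    share_b = letter_share(text_b).get(letter, 0.0)
    if share_b == 0.0:
        raise ValueError(f"letter {letter!r} does not occur in the second text")
    return share_a / share_b


# Placeholder texts of different lengths (illustration only).
text_a = "Aa bb"
text_b = "a bbb bbb a"

for letter in "ab":
    print(letter, round(frequency_ratio(text_a, text_b, letter), 3))
```

A ratio above 1 then means that the letter is relatively more frequent in the first text, which corresponds to the Sonnets/Gospel convention used above.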
REFERENCES
STATGRAPHICS, Statistical Graphics Corporation.
Kunz, M.; Rádl, Z. Distribution of Distances in Information Strings, J. Chem. Inform. Comput. Sci., 1998, 38, 374-378.