Distance Analysis of Latin Texts. Titi Livi Ab Urbe Condita Liber 1
Milan Kunz, April, 2003
Abstract
Distances between identical symbols, including punctuation marks, in "Titi Livi Ab Urbe Condita. Liber 1" are described, with varying precision, by five distributions: exponential, Erlang, Weibull, lognormal, and negative binomial. The correlations are sometimes highly significant.
Introduction
This is a continuation of the study of the statistical properties of distances between identical symbols in different languages. The same technique is used as before; see the other papers in this section.
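The basic measurement, the distance between consecutive occurrences of one symbol, can be sketched in a few lines of Python. This is only an illustration, not the program used for the study: the function name `distances` and the sample string are assumptions, and a distance of 1 is taken to mean an immediately repeated symbol.

```python
def distances(text, symbol):
    """Return the gaps between consecutive occurrences of `symbol` in `text`.

    A distance of 1 means the symbol is doubled (for example "ll").
    """
    positions = [i for i, ch in enumerate(text) if ch == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]


# Arbitrary Latin sample string, used only for illustration.
sample = "facturusne operae pretium sim"
print(distances(sample, "e"))
```

The resulting list of distances is what the distributions in the following sections are fitted to.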
Results
Punctuation marks
The distances between points (full stops) determine the length of sentences. There are 785 points.
Their distribution is of the Erlang type, a = 2, b = 0.013208. Chi-square = 12.3399, significance level 0.5000 with 13 degrees of freedom. There is a shortage of short sentences up to distance 34 (46 occurrences against 59.2 expected). This alone makes 23.9 % of the chi-square test value. There is another shortage of points between distances 299-364 (26 occurrences against 38.2 expected), which makes another 31.8 % of the chi-square test value.
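The expected count and the chi-square share quoted for the short sentences can be checked approximately. The sketch below assumes that the Erlang distribution with a = 2 is evaluated as a continuous cumulative distribution and that 785 points give 784 distances; both are assumptions, and the helper names are hypothetical.

```python
import math


def erlang2_cdf(x, b):
    """Cumulative distribution of the Erlang distribution with shape 2 and rate b."""
    return 1.0 - math.exp(-b * x) * (1.0 + b * x)


def chi_square_share(observed, expected, total_chi2):
    """Percentage of the total chi-square statistic contributed by one class."""
    return 100.0 * (observed - expected) ** 2 / expected / total_chi2


n_distances = 785 - 1   # assumption: 785 points give 784 distances
b = 0.013208            # rate parameter reported in the text

# Expected number of distances up to 34 under the fitted Erlang model.
expected_short = n_distances * erlang2_cdf(34, b)
print(round(expected_short, 1))

# Share of the class "up to 34" using the reported counts 46 and 59.2.
share_short = chi_square_share(46, 59.2, 12.3399)
print(round(share_short, 1))
```

Under these assumptions the expected count comes out close to the reported 59.2, and the share reproduces the reported 23.9 %; small differences can come from the class boundaries, which the text does not specify.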
The other punctuation mark, the semicolon (303 occurrences), is modeled poorly (the chi-square test value 0.0353) by the exponential distribution. There exists a shortage of semicolons between distances 280-400 (27 occurrences against 39.1 expected). This makes 19.4 % of the chi-square test value. The surplus of semicolons between distances 880-1000 (16 occurrences against 8.4 expected) makes 35.4 % of the chi-square test value.
The distances between consecutive colons (double points) follow the Weibull distribution (the chi-square test value 0.4458). The surplus of colons between distances 500-900 (22 occurrences against 15.9 expected) makes 49.2 % of the chi-square test value.
The distribution of distances between consecutive commas is exponential; the tail over 40 gives an almost perfect fit with the significance level 0.9796. There are too few commas up to distance 25 (280 occurrences against 333.7 expected). This makes 43.6 % of the chi-square test value. Then, up to distance 49, there are too many commas (308 occurrences against 242.4 expected). This makes 38.9 % of the chi-square test value.
The spacebar
The distances between consecutive spacebars (spaces) that are greater than 1 give the number of words whose length equals the distance minus one. There are 16985 spacebars. The results of the tests are tabulated below. Pooling the frequencies of the shorter distances improved the fit in some cases, since the counts below the pooling limit are scattered and their differences can balance out.
Table 1 The number of words of different length
Distance (word length + 1) | Number | Type of distribution, chi-square test value |
2 | 45 | EX, 0.1713 |
3 | 1767 | NB, 0, 0.6226 over 10 |
4 | 1545 | NB, 0.5899 |
5 | 1841 | NB, 0.7491 |
6 | 2473 | NB, 0.4966 |
7 | 2517 | first half: 1262, NB, 0.6683; second half: 1254, NB, 0.2420 |
8 | 2179 | NB, 0.5295 |
9 | 1655 | EX, 0, 0.5389 over 35 |
10 | 1161 | NB, 0.7843 |
11 | 857 | NB, 0.6745 |
12 | 500 | NB, 0.1432 |
13 | 240 | EX, 0.4140 |
14 | 105 | WE, 0.6906 |
15 | 62 | LN, 0.3996 |
16 | 20 | EX, 0.0602 |
17 | 0 | |
18 | 1 |
The most frequent word lengths follow the negative binomial distribution, which is also the most frequent type: it fits 9 cases. The exponential distribution fits four distances, the Weibull distribution fits distance 14, and the lognormal distribution fits distance 15.
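The relation between spacebar distances and word lengths can be illustrated as follows. This is a minimal sketch with a hypothetical function name; it assumes that punctuation attached to a word is counted as part of the word, which the text does not state.

```python
from collections import Counter


def word_length_counts(text):
    """Count word lengths as (distance between consecutive spaces) - 1.

    Distances of 1 (doubled spaces) contain no word and are skipped.
    """
    positions = [i for i, ch in enumerate(text) if ch == " "]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return Counter(gap - 1 for gap in gaps if gap > 1)


# Leading and trailing spaces delimit the first and last word.
counts = word_length_counts(" ab urbe condita ")
print(sorted(counts.items()))
```

Only words enclosed by two spaces are counted in this sketch, so a text should be padded with a space at both ends if its first and last words matter.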
Notes to some outstanding results:
1-letter words repeat more often than expected within distances 253-370 (11 occurrences against 6.2 expected). This makes 75.7 % of the chi-square test value.
2-letter words. The negative binomial distribution shape is disturbed by the shortage of short distances up to 3 (379 occurrences against 496.1 expected). This makes 45.8 % of the chi-square test value. The surplus of distances 4-6 (442 occurrences against 356.8 expected) makes 31.4 % of the chi-square test value.
3-letter words. The shortage of distances 11-13 (117 occurrences against 111.2 expected) makes 46.3 % of the chi-square test value.
4-letter words. No great deviation from the expected values.
5-letter words repeat immediately less often than expected (323 occurrences against 360.2 expected). This makes 26.7.7 % of the chi-square test value. The shortage of distances 29-30 (4 occurrences against 8.1 expected) makes 14.6 % of the chi-square test value.
6-letter words are the most frequent. It was necessary to divide the set into two halves before testing. According to the two way sample analysis, the halves differ.
The correlation of the first half is disturbed by the shortage of distances 18-19 (11 occurrences against 20.8 expected), which makes 23.6 % of the chi-square test value. The surplus of distances 23-24 (15 occurrences against 9 expected) makes 19.4 % of the chi-square test value.
The correlation of the second half is disturbed by the peak of distances 7-8 (157 occurrences against 132 expected), which makes 24.3 % of the chi-square test value. The shortage of distances 20-22 (9 occurrences against 17.9 expected) makes 22.6 % of the chi-square test value.
7-letter words repeat less often than expected within distances 28-30 (11 occurrences against 18 expected). This makes 18.1 % of the chi-square test value. Then they repeat more often than expected within distances 35-39 (15 occurrences against 10.1 expected). This makes 15.9 % of the chi-square test value.
8-letter words are correlated poorly due to their shortage within distances 11-14 (171 occurrences against 220.9 expected). This makes 17.7 % of the chi-square test value. The immediately following surplus of distances 15-17 (120 occurrences against 78.4 expected) makes 29.9 % of the chi-square test value.
9-letter words are correlated rather well. They repeat too often within distances 5-9 (286 occurrences against 260.9 expected). This makes 24.9 % of the chi-square test value. The shortage within distances 19-22 (65 occurrences against 80 expected) makes 29.1 % of the chi-square test value.
10-letter words repeat too often at distances over 89 (16 occurrences against 9 expected). This makes 41.9 % of the chi-square test value.
11-letter words repeat less often than expected within distances 60-77.1 (60 occurrences against 77.1 expected). This makes 23.7 % of the chi-square test value. The shortage within distances 115-134 (2 occurrences against 7.4 expected) makes 24.9 % of the chi-square test value.
Longer words need no special comment.
Distances between individual letters
The results for all letters are presented in the form of the table, where the frequencies of all symbols are given and the significance of the performed chi-square tests. Then the commentaries to all symbols of the alphabet are given. The values in the square brackets show the corresponding values of the combined lower and upper cases.
Table 2 Survey of results
Notes:
EX = exponential distribution
WE = Weibull distribution
LN = lognormal distribution
NB = negative binomial distribution
Statistic = XX, the chi-square test value
Symbol | Small | Capital | Both | Ratio C/B % |
a | 8378, LN, NB | | | 2.77 |
b | 1584, EX, 0.2605 | | | 1.31 |
c | 3665, NB | | | 3.30 |
d | 2835, EX | | | 1.32 |
e | 11367, NB, LN | | | 0.87 |
f | 1007, NB, 0.2520 | | | 5.44 |
g | 1207, NB, 0.2010 | | | 1.71 |
h | 398, WE, 0.5031 | | | 17.52 |
i | 11097, NB | | | 1.61 |
j | 0 | 0 | 0 | - |
k | 0 | 0 | 0 | - |
l | 2849, NB | 121, WE, 0.0016 | 2970, NB | 4.07 |
m | 5775, NB | 61, WE | 5836, NB | 1.04 |
n | 6019, NB | 79, WE, 0.8162 | 6098, NB | 1.29 |
o | 5082, NB, EX | 7 | 5089, NB, LN | 0.14 |
p | 2723, NB | 99, WE, 0.0990 | 2822, NB | 3.51 |
q | 1633, WE, 0.5848 | 56, EX, 0.4619 | 1689, EX, 0.9387 over 80 | 3.32 |
r | 6391, NB | 256, LN, 0.3145 | 6647, NB | 3.83 |
s | 6922, NB | 153, LN, 0.2388 | 7075, NB | 2.16 |
t | 7615, NB | 235, LN, 0.2695 | 7850, NB | 2.99 |
u | 8882, NB | 0 | 8882, NB | 0 |
v | 937, EX, 0.1191 | 63, EX, 0.1892 | 1000, EX, 0.9726 | 6.3 |
w | 0 | 0 | 0 | - |
x | 462, EX, 0.1239 | 0 | 462, EX, 0.1239 | 0 |
y | 14, no test | 0 | 14, no test | 0 |
z | 2 | 0 | 2 | 0 |
The last column gives the ratios of capital letters to all occurrences. Since capital letters are used at the beginning of sentences and of proper names, it can be concluded that no proper name and no sentence starts with U, in contrast with H, where capitals make up 17.52 % of all occurrences.
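The ratio in the last column is the count of capital letters divided by the count of both cases, expressed in percent. A minimal sketch, with a hypothetical function name, using the counts given in the table for l and v:

```python
def capital_ratio(capital, both):
    """Percentage of capital letters among all occurrences of a letter."""
    return 100.0 * capital / both if both else 0.0


ratio_l = capital_ratio(121, 2970)   # counts for L and [l + L] from Table 2
ratio_v = capital_ratio(63, 1000)    # counts for V and [v + V] from Table 2
print(round(ratio_l, 2), round(ratio_v, 2))
```

Letters that do not occur at all are reported with "-" in the table; the sketch returns 0.0 for them only to avoid a division by zero.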
The results of statistical tests can be tabulated:
Result | Lower case | Upper case | Combined |
No test | 4 | 9 | 3 |
The negative binomial distribution | 14 | 0 | 13 |
The exponential distribution | 4 | 2 | 6 |
The Weibull distribution | 2 | 10 | 2 |
The lognormal distribution | 1 | 4 | 1 |
For the upper case, only 16 letters give results; the Weibull distribution is the most frequent, followed by the lognormal distribution. For the lower case and for the combined cases, the negative binomial distribution is the most frequent, then the exponential distribution and the Weibull distribution; the lognormal distribution fits only 1 case. The chi-square test values are sometimes practically zero, and the significance of the chi-square test increases only when the shorter distances are pooled, so that the test starts at a greater lowest distance. The commentaries on the individual letters follow.
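One possible reading of the pooling (the "over N" entries in the tables) is that all distances up to N are merged into a single class before the chi-square statistic is computed. The sketch below illustrates that reading only; the function name and the counts are invented for the example and are not taken from the text.

```python
def pooled_chi_square(observed, expected, threshold):
    """Chi-square statistic with all distances <= threshold pooled into one class.

    `observed` and `expected` map a distance to its observed / expected count.
    A threshold of 0 pools nothing.
    """
    low_obs = sum(v for d, v in observed.items() if d <= threshold)
    low_exp = sum(v for d, v in expected.items() if d <= threshold)
    stat = (low_obs - low_exp) ** 2 / low_exp if low_exp > 0 else 0.0
    for d, e in expected.items():
        if d > threshold and e > 0:
            stat += (observed.get(d, 0) - e) ** 2 / e
    return stat


# Hypothetical counts: almost no doubled symbols, a surplus right after.
obs = {1: 1, 2: 30, 3: 29, 4: 20}
exp = {1: 20.0, 2: 20.0, 3: 20.0, 4: 20.0}
print(pooled_chi_square(obs, exp, 0))   # every distance is its own class
print(pooled_chi_square(obs, exp, 3))   # distances 1-3 pooled into one class
```

In this invented example the shortage at distance 1 and the surplus at distances 2-3 cancel once they are pooled, which is how pooling can turn a practically zero significance into a good fit of the tail.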
A
The frequency of the upper case A allowed a separate test. The good fit with the Weibull distribution (the chi-square test value 0.434) is worsened by too few repetitions within distances 318-634 (39 occurrences against 48.3 expected), which makes 46.9 % of the chi-square test value.
The distribution of distances between the lower case a can be modeled by the lognormal distribution. There are no doubled aa but then the lower case a repeats too often within short distances.
The set was divided into four parts. The two way sample analysis shows how the parts of the lower case differ:
Part | 2 | 3 | 4 |
1 | *0.048 | *0.010 | *0 |
2 | | 0.554 | 0.068 |
3 | | | 0.219 |
Note: The asterisk shows the significant difference between tested parts.
The first fourth differs significantly from all other parts. The difference increases.
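The text does not say which test the two way sample analysis uses. As a stand-in, the sketch below splits a list of distances into consecutive parts and compares two parts with the two-sample Kolmogorov-Smirnov test, using the usual asymptotic approximation for the significance. All function names are hypothetical, and the test itself is an assumption, not the method of the study.

```python
import math
from bisect import bisect_right


def split_parts(values, k):
    """Cut a sequence of distances into k consecutive parts of near-equal size."""
    n = len(values)
    bounds = [i * n // k for i in range(k + 1)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(k)]


def ks_statistic(a, b):
    """Largest difference between the empirical distribution functions of a and b."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in points)


def ks_pvalue(d, n, m):
    """Asymptotic significance of the statistic d for sample sizes n and m."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:          # the series converges slowly here; the value is ~1
        return 1.0
    total = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return min(max(total, 0.0), 1.0)


# Invented distances: short gaps at first, longer gaps later.
gaps = [2, 3, 2, 4, 3, 2, 3, 2] * 10 + [9, 12, 8, 15, 11, 10, 14, 9] * 10
first, last = split_parts(gaps, 2)
d = ks_statistic(first, last)
print(d, ks_pvalue(d, len(first), len(last)) < 0.05)
```

A small significance value plays the role of the asterisk in the tables: it marks two parts whose distance distributions differ.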
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Range | Observed | Expected | % of chisquare |
1 | 2-3 | 308 | 195.2 | 32.9 |
4-7 | 510 | 642.5 | 14.5 | |
26-28 | 71 | 39.9 | 12.2 | |
2 | 2-4 | 388 | 305.4 | 26.6 |
3 | 2-5 | 563 | 445.3 | 29.3 |
6-13 | 738 | 876.2 | 20.6 | |
4 | 41-43 | 34 | 17 | 19.2 |
over 85 | 0 | 13.5 | 15.1 |
The distances between both cases [a + A] are fitted poorly by the lognormal distribution and by the negative binomial distribution. There are no doubled Aa or aa, but then the lower case a repeats too often within short distances.
The set was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.052 | *0.002 | *0 |
2 | | 0.304 | *0.035 |
3 | | | 0.282 |
The first fourth differs significantly from the third and fourth parts, and the second from the fourth. The difference increases.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Distribution | Range | Observed | Expected | % of chisquare |
1 | LN | 2-3 | 332 | 214.2 | 35.5 |
4-5 | 268 | 350.1 | 10.6 | ||
26-28 | 65 | 338.5 | 10.0 | ||
2 | NB, 0, 0.8994 over 12 | 1 | 1 | 157.9 | 77.7 |
3 | NB, 0, 0.4487 over 12 | 1 | 1 | 153.6 | 75.3 |
2-13 | 1345 | 1478.2 | 13.5 | ||
4 | LN | 41-43 | 32 | 17.1 | 15.3 |
over 85 | 0 | 13.3 | 11.1 |
Adding A improved the fit.
B
The distribution of distances between consecutive occurrences of this letter is exponential. There are too few b within distances 179-210 (37 occurrences against 49.8 expected). This contributes 22.3 % of the chi-square test value. At (b + B), this deformation lies within distances 172-202 (29 occurrences against 52.7 expected). This contributes 46.9 % of the chi-square test value. Including B improved the fit.
C
The distribution of distances of the upper case of this letter is described by the Weibull distribution. There are too few C within short distances till 80 (26 occurrences against 20.9 expected). This contributes 33.5 % of the chi-square test value.
The set of c, as well as [c + C] was divided into two parts.
The distribution of distances of the first part of c is described by the negative binomial distribution (the chi-square test value is 0.5105 over 20, 0.8758 over 40) or by the exponential distribution (the chi-square test value is 0.6806 over 20). According to both distributions, there are too many c within distances 84-90 [86-94] (35 [44] occurrences against 35 [30.4] expected). This contributes 23.6 [13.3] % of the chi-square test value.
The distribution of distances of the second part of c is described by the negative binomial distribution (the chi-square test value is 0.4667 over 40) or by the exponential distribution (the chi-square test value is 0.5343 over 40). There are too many c within distances 13-35 (711 occurrences against 642.2 expected). This contributes 28.0 % of the chi-square test value. Too many c within distances 116-127 (24 occurrences against 15.4 expected) make 18.0 % of the chi-square test value.
Both parts are similar (test value 0.885).
The distribution of distances of the first part of [c + C] is described by the negative binomial distribution (the chi-square test value is 0.4058 over 20). [c + C] repeat less often than expected up to distance 9 (405 occurrences against 477.5 expected). This contributes 28.4 % of the chi-square test value. Then there are too many [c + C] within distances 10-34 (884 occurrences against 783.4 expected). This contributes 30.8 % of the chi-square test value. There are too many [c + C] within distances 111-119 (23 occurrences against 13.6 expected). This contributes 15.5 % of the chi-square test value.
The distribution of distances of the second part of [c + C] is described by the exponential distribution (the chi-square test value is 0.3212 over 40). [c + C] repeat less often than expected till distance 9 (422 occurrences against 495.1 expected). This contributes 30.4 % of the chi-square test value. There are too many [c + C] within distances 86-94 (46 occurrences against 29.3 expected). This contributes 15.8 % of the chi-square test value.
Both parts are similar (test value 0.921).
D
The Weibull distribution of the upper case D needs no commentary.
Here the exponential distribution and the negative binomial are applicable in case of d as well as [d + D].
Both sets were divided into two parts.
The distributions of distances of both parts of d are described by the exponential distribution (the chi-square test value is 0.2141, and 0.3536, respectively).
In the first part, the greatest disturbance is due to too many d within distances 193-217 (12 occurrences against 6.6 expected). This contributes 22.3 % of the chi-square test value.
The distribution of distances of the second part of d has no single great deviation from the expected values.
Both parts are similar (test value 0.758).
The distributions of distances of both parts of [d + D] are described by the exponential distribution (the chi-square test value is 0.2139, and 0.1200 over 10, respectively).
In the first part, the greatest disturbance is due to too many [d + D] within distances 156-167 (15 occurrences against 9 expected). This contributes 21.2 % of the chi-square test value.
In the second part, the greatest disturbance is due to too many [d + D] within distances 61-72 (106 occurrences against 83.7 expected). This contributes 22.0 % of the chi-square test value.
Both parts are similar (test value 0.673).
E
The distribution of distances between upper case E is exponential. There are too many E within distances 2101-2900 (12 occurrences against 8.3 expected). This contributes 64.6 % of the chi-square test value.
The distribution of distances between the lower case e can be modeled by the negative binomial distribution. There are no doubled ee (or only 1) but then the lower case e repeats too often within short distances.
The set of e distances was divided into five parts. The two way sample analysis shows how the parts of the lower case differ:
Part | 2 | 3 | 4 | 5 |
1 | 0.152 | *0.003 | 0.820 | 0.118 |
2 | | 0.128 | 0.230 | 0.905 |
3 | | | *0.007 | 0.157 |
4 | | | | 0.183 |
The first fifth differs significantly from the third one, and the third one from the fourth one.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Type | Chi-square | Range | Observed | Expected | % of chisquare |
1 | NB | 0.2340 over 20 | 1 | 1 | 225.1 | 75.0 |
2-3 | 477 | 385.8 | 7.2 | |||
2 | NB | 0.6630 over 12 | 1 | 1 | 216.7 | 74.6 |
2-3 | 500 | 373.4 | 14.9 | |||
3 | LN | 0 | 1 | 0 | 17.1 | 19.9 |
16-17 | 75 | 122.8 | 21.6 | |||
18-29 | 305 | 229.5 | 29.8 | |||
4 | NB | 0.4540 over 13 | 1 | 1 | 223.4 | 71.1 |
2-3 | 531 | 383.1 | 18.3 | |||
5 | LN | 0.1030 over 25 | 1 | 0 | 13.4 | 18.2 |
21-23 | 92 | 59.2 | 24.6 |
Parts 3 and 5 have tails too long to fit the negative binomial distribution.
The set of [e + E] distances was divided into five parts. The two way sample analysis shows how the parts of the combined cases differ:
Part | 2 | 3 | 4 | 5 |
1 | 0.197 | *0.003 | 0.784 | 0.107 |
2 | | 0.093 | 0.310 | 0.746 |
3 | | | *0.007 | 0.178 |
4 | | | | 0.181 |
The first fifth differs significantly from the third one, and the third one from the fourth one.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Type | Chi-square | Range | Observed | Expected | % of chisquare |
1 | NB | 0.1494 over 14 | 1 | 2 | 231.4 | 75.3 |
2-3 | 489 | 396 | 7.2 | |||
2 | NB | 0.6970 over 12 | 1 | 2 | 223.8 | 74.9 |
2-3 | 513 | 385 | 14.5 | |||
3 | LN | 0.0913 over 23 | 1 | 0 | 17.7 | 18.6 |
16-17 | 77 | 124.2 | 18.9 | |||
18-26 | 271 | 197 | 30.2 | |||
4 | NB | 0.4830 over 12 | 1 | 1 | 229.5 | 71.7 |
2-3 | 544 | 393.1 | 18.3 | |||
5 | LN | 0.0.638 over 25 | 1 | 0 | 13.4 | 19.8 |
16-26 | 344 | 292.6 | 33.8 |
Parts 3 and 5 have tails too long to fit the negative binomial distribution.
F
The distribution of F is correlated well with the lognormal distribution.
The distributions of the lower case f and of [f + F] are correlated with the negative binomial distribution. The doubled ff are too many (28 occurrences against 8.5 expected). This contributes 76.6 % of the chi-square test value. [f + F] is correlated rather well; the chi-square test value is 0.8555.
G
The distributions of g and of [g + G] are correlated with the negative binomial distribution. g and [g + G] repeat less often than expected up to distance 30 (279 [287] occurrences against 318.7 [329.1] expected). This contributes 27.6 [26.7] % of the chi-square test value. There are too few distances 248-278 (16 [15] occurrences against 26.3 [25.9] expected). This contributes 22.3 [22.9] % of the chi-square test value.
H
The distribution of this letter is correlated with the Weibull distribution. For the lower case h, there are too few distances 262-376 (43 occurrences against 54.7 expected) and too many distances 724-838 (17 occurrences against 11.2 expected). These contribute 33.9 % and 40.9 % of the chi-square test value, respectively. Combining with the capital H improved the fit; it is worsened only by too many distances 789-937 (12 occurrences against 8.6 expected), which contributes 31.6 % of the chi-square test value.
I
The distribution of the capital I is correlated with the Weibull distribution. The greatest disturbance is a surplus of counts within distances 235-452 (54 occurrences against 37.1 expected) which contributes 54.4 % of the chi-square test value. Then there are too few distances 670-886 (11 occurrences against 20 expected). This contributes 28.5 % of the chi-square test value.
The negative binomial distribution is applicable in case of i as well as [i + I].
The set of i distances was divided into five parts. The two way sample analysis shows how the parts of the lower case differ:
Part | 2 | 3 | 4 | 5 |
1 | 0.609 | 0.071 | *0.001 | *0.001 |
2 | | 0.197 | *0.007 | *0.007 |
3 | | | 0.161 | 0.157 |
4 | | | | 0.183997 |
The first fifth differs significantly from the last two, as does the second one.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
1 | 0.0504 over 21 | 1 | 32 | 197.2 | 70.6 |
16-17 | 132 | 92.6 | 8.5 | ||
2 | 0.8128 over 9 | 1 | 31 | 201.2 | 71.2 |
2-9 | 1260 | 1075.2 | 17.3 | ||
3 | 0.4758 over 10 | 1 | 31 | 208.7 | 66.9 |
6-9 | 578 | 443.4 | 19.6 | ||
4 | 0.2455 over 10 | 1 | 37 | 216.6 | 75.0 |
2-12 | 1554 | 1359.7 | 14.6 | ||
5 | 0.1453 over 21 | 1 | 45 | 214.1 | 65.9 |
2-18 | 1863 | 1640.3 | 18.2 |
The set of [i + I] distances was divided into only four parts. The two way sample analysis shows how the parts of the combined cases differ:
Part | 2 | 3 | 4 |
1 | 0.832 | 0.077 | *0.006 |
2 | | 0.120 | *0.011 |
3 | | | 0.318 |
The fit worsens consecutively, the last quarter differs significantly from the first one. The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
1 | 0.0640 over 18 | 1 | 30 | 204.6 | 70.0 |
 | | 2-9 | 1237 | 1089.9 | 10.1 |
 | | 16-17 | | | |
2 | 0.4578 over 8 | 1 | | | |
 | | 6-9 | 1289 | 1099.7 | 17.1 |
3 | 0.4782 over 8 | 1 | 33 | 216.2 | 69.3 |
 | | 6-9 | 583 | 453 | |
4 | 0.2310 over 10 | 1 | 37 | 221.8 | 74.5 |
 | | 2-12 | 1587 | 1385.1 | 15.0 |
Combining both cases improved the fit.
J and K
These letters are not used in the text.
L
The occurrences of capital L are correlated poorly by the Weibull distribution.
The frequencies of l and [l + L] are correlated with the negative binomial distribution.
The sets of l and [l + L] distances were divided into two parts. The two way sample analysis shows that the parts of both cases differ significantly.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
l1 | 0.8949 over 20 | 1 | 188 | 36.2 | 92.4 |
2-12 | 226 | 342.2 | 5.7 | ||
l2 | 0.2431 over 20 | 1 | 188 | 32.3 | 91.6 |
2-12 | 197 | 310.4 | 5.1 | ||
[l + L]1 | 0.8246 over 20 | 1 | 190 | 39 | 92.7 |
2-12 | 256 | 367 | 5.3 | ||
[l + L]2 | 0.1228 over 20 | 1 | 186 | 35.4 | 88.7 |
2-12 | 211 | 337.6 | 6.6 |
M
The occurrences of capital M are correlated poorly by the Weibull distribution.
The frequencies of m and [m + M] are correlated with the negative binomial distribution.
The sets of m and [m + M] distances were divided into three parts. The two way sample analysis shows how the parts of the lower case differ:
Part | 2 | 3 |
1 | *0.026 | *0.036 |
2 | | 0.871 |
The first third differs significantly from the last two.
The two way sample analysis shows how the parts of the combined cases [m + M] differ:
Part | 2 | 3 |
1 | *0.038 | *0.024 |
2 | | 0.873 |
The fit worsens consecutively.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
m1 | 0.9773 over 13 | 1 | 15 | 98.0 | 71.9 |
5-13 | 581 | 506.7 | 11.2 | ||
m2 | 0.7837 over 10 | 1 | 21 | 91.2 | 65.8 |
2-14 | 963 | 857.8 | 20.2 | ||
m3 | 0.9909 over 12 | 1 | 24 | 91.9 | 66.6 |
6-14 | 608 | 536.7 | 13.2 | ||
[m + M]1 | 0.4490 over 9 | 1 | 15 | 100.1 | 70.7 |
6-13 | 591 | 515.1 | 11.1 | ||
[m + M]2 | 0.8589 over 10 | 1 | 21 | 93.7 | 65.4 |
8-14 | 478 | 402.2 | 16.5 | ||
[m + M]3 | 0.9861 over 12 | 1 | 24 | 93.2 | 66.2 |
6-14 | 615 | 543.4 | 12.7 |
N
The occurrences of capital N are correlated well by the Weibull distribution.
The frequencies of n and [n + N] are correlated with the negative binomial distribution.
The sets of n and [n + N] distances were divided into three parts. The two way sample analysis shows how the parts of the lower case differ:
Part | 2 | 3 |
1 | *0.046 | 0.271 |
2 | | 0.384 |
The first third differs significantly from the second one.
The two way sample analysis shows how the parts of both cases differ:
Part | 2 | 3 |
1 | *0.038 | 0.163 |
2 | | 0.519 |
The first third differs significantly from second one.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
n1 | 0 | 1 | 13 | 104.4 | 52.5 |
5-25 | 1185 | 1007.3 | 23.1 | ||
n2 | 0.6723 over 25 | 1 | 10 | 99.2 | 58.0 |
6-10 | 436 | 348.8 | 15.8 | ||
n3 | 0.6711 over 15 | 1 | 5 | 101.4 | 62.3 |
6-18 | 916 | 759.1 | 23.2 | ||
[n + N]1 | 0.4490 over 9 | 1 | 13 | 107.5 | 53.7 |
6-25 | 1198 | 1025.5 | 22.0 | ||
[n + N]2 | 0.3575 over 24 | | | | 57.5 |
6-10 | 443 | 357.1 | 14.6 | ||
16-30 | 592 | 506.6 | 10.5 | ||
[n + N]3 | 0.7519 over 15 | 1 | 5 | 103.5 | 64.1 |
6-18 | 925 | 771.6 | 22.3 |
O
The distributions of o and [o + O] are correlated poorly with the negative binomial distribution. In some parts the tails, that is, the longer distances between consecutive occurrences, are more frequent than expected; there another distribution performs better.
The sets of o and [o + O] distances were divided into three parts. The two way sample analysis shows how the parts of the lower case differ:
Part | 2 | 3 |
1 | *0.002 | 0.947 |
2 | | *0.001 |
The second third differs significantly from the first and third ones.
The two way sample analysis shows how the parts of the combined cases [o + O] differ:
Part | 2 | 3 |
1 | *0.002 | 0.948 |
2 | | *0.002 |
The second third differs significantly from the first and third ones.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Type | Chi-square | Range | Observed | Expected | % of chisquare |
| NB | 0.9521 over 26 | 1 | 2 | 69.9 | 78.6 |
10-26 | 666 | 591.3 | 12.0 | |||
| EX | 0 | 1-2 | 58 | 149.4 | 47.4 |
8-19 | 601 | 487.8 | 22.3 | |||
| EX | 0.8249 over 20 | 1 | 0 | 68.4 | 63.9 |
6-21 | 631 | 531.1 | 19.5 | |||
| NB | 0.9567 over 26 | 1 | 2 | 70.2 | 78.4 |
10-26 | 668 | 592.4 | 12.4 | |||
| LN | 0 | 30-52 | 314 | 238 | 39.9 |
>162 | 6 | 26.6 | 25.0 | |||
| NB | 0.4789 over 14 | 1 | 1 | 70.2 | 73.5 |
8-14 | 378 | 322.8 | 10.2 |
P
The upper case P is correlated poorly by the Weibull distribution; p and [p + P] are correlated with the negative binomial distribution. The sets of p and [p + P] distances were divided into two parts. The two way sample analysis shows that the parts are only weakly similar, with test values 0.112 for p and 0.059 for [p + P].
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
| 0.0978 | 1 | 49 | 32.3 | 40.9 |
0.7550 over 10 | 2-15 | 349 | 379.6 | 11.6 | |
| 0.1903 | 61-72 | 65 | 83.2 | 17.4 |
0.7308 over 9 | 73-107 | 173 | 146.4 | 22.8 | |
| 0.1291 | 1 | 49 | 34.9 | 28.4 |
| 0.1505 | 61-72 | 62 | 84.8 | 26.8 |
0.5285 over 9 | 84-107 | 108 | 87.3 | 22.9 |
Q
The consonant q is used only in connection with u as qu.
The upper case Q correlates using the exponential distribution. The fit is worsened by too few repeating within distances 1668 till 2500 (13 occurrences against 8.3 expected). It makes 71.1 % of the chi-square test value.
The distribution of q is correlated with the exponential distribution or with the Weibull distribution. The fit is almost equal:
Distribution | over 30 | over 80 |
The exponential distribution | 0.0972 | 0.9160 |
The Weibull distribution | 0.1860 | 0.8980 |
There are too few repetitions within distances 193 to 209 (17 occurrences against 24.4 expected). This makes 13.8 % of the chi-square test value. Immediately afterwards, too many repetitions follow within distances 210-226 (26 occurrences against 18.9 expected). This slight distortion makes 16.7 % of the chi-square test value.
The distribution of [q + Q] is correlated with the exponential distribution. There are no Qq (0 occurrence against 23.8 expected). This contributes 53.1 % of the chi-square test value. Another 23.1 % makes the surplus of distances 53-87 (364 occurrences against 309.3 expected).
R
The upper case R is correlated with the lognormal distribution. The fit is worsened by too many repeating within distances 1233 till 1540 (10 occurrences against 6.5 expected). It makes 41.1 % of the chi-square test value.
The distribution of r is correlated with the negative binomial distribution. The set was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.280 | 0.081 | *0.010 |
2 | | 0.515 | 0.149 |
3 | | | 0.432 |
The fit worsens consecutively, the last quarter differs significantly from the first one. The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
1 | 0.2677 over 31 | 1-6 | 303 | 433.9 | 50.8 |
 | | 7-31 | 1014 | 850.5 | 33.5 |
2 | 0.5570 over 28 | 1-6 | | | |
 | | 7-26 | 920 | 768.8 | 33.7 |
3 | 0.2855 over 30 | 1-6 | 327 | 456.5 | 50.3 |
 | | 7-26 | 932 | 770.3 | |
4 | 0.2376 over 27 | 1-6 | 338 | 464.7 | 53.9 |
 | | 7-26 | 929 | 771.6 | 34.6 |
The distribution of [r + R] is correlated with the negative binomial distribution. The set was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.450 | 0.233 | 0.082 |
2 | | 0.671 | 0.335 |
3 | | | 0.588 |
The combining of both cases decreased the differences of parts. The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
1 | 0.3133 over 23 | 1-6 | 333 | 472.8 | 52.2 |
 | | 7-31 | 1064 | | |
2 | 0.1400 over 20 | 1-6 | | | |
 | | 7-26 | 1071 | 906.6 | 33.7 |
3 | 0.5028 over 26 | 1-6 | 347 | 488.6 | 52.7 |
 | | 7-26 | 983 | 805.4 | |
4 | 0 | 1-6 | 367 | 495.7 | 54.9 |
 | | 7-26 | 970 | 808.2 | 34.1 |
S
The upper case S is correlated with the lognormal distribution.
The distribution of the lower case s and [s + S] is described poorly by the negative binomial distribution.
The set of s was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.555 | 0.995 | 0.388 |
2 | | 0.565 | 0.143 |
3 | | | 0.391 |
The first and the third parts are very similar. The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
1 | 0.0061 | 1 | 133 | 100.2 | 29.4 |
 | 0.9167 over 10 | 2-6 | 282 | 345.8 | 32.2 |
2 | 0.0199 | 2-5 | | | |
 | 0.4318 over 9 | 6-9 | 313 | 269.4 | 20.1 |
3 | 0 | 1 | 151 | 101.0 | 27.1 |
 | 0.0791 over 24 | 2-5 | 297 | 348.5 | |
4 | 0 | 1 | 160 | 103.7 | 37.0 |
 | 0.1611 over 10 | 2-5 | 276 | 356.3 | 21.9 |
The distribution of [s + S] is correlated with the negative binomial distribution. The set was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.285 | 0.888 | 0.450 |
2 | | 0.233 | 0.069 |
3 | | | 0.547 |
The first and the third parts are similar. The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chisquare |
1 | 0.0079 | 1 | 134 | 106.9 | 19.3 |
 | 0.9331 over 10 | 2-5 | 298 | | |
 | | 6-14 | 616 | 560 | 15.7 |
2 | 0.0075 | 2-5 | 313 | 356.7 | 13.9 |
 | 0.3233 over 10 | 6-9 | 322 | | |
 | | 39-47 | | | |
3 | 0.1916 over 23 | 1 | | | |
 | | 2-5 | 309 | 371.5 | 13.4 |
 | | 6-13 | 587 | 516.9 | 13.4 |
4 | 0 | 1 | 148 | 103.1 | 31.0 |
 | | 2-5 | 283 | 353.1 | 22.1 |
T
The distribution of the capital T has the lognormal shape. The chi-square test value of the tail over 400 is 0.8053. There are too many distances 167-392 (60 occurrences against 45.8 expected). This contributes 49.9 % of the chi-square test value.
The distribution of the lower case t, as well as of both cases [t + T], is negative binomial.
The set of t was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.930 | 0.072 | *0.006 |
2 | | 0.087 | *0.008 |
3 | | | 0.352 |
The first and the second parts are very similar. The similarity deteriorates consecutively.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chi-square |
1 | 0 | 1-4 | 267 | 458 | 59.5 |
 | 0.2735 over 21 | 5-14 | 918 | 721 | 29.5 |
2 | 0 | 1-4 | | | |
 | 0.1605 over 17 | 5-20 | 1159 | 963.9 | 25.5 |
3 | 0 | 1-4 | 259 | 435.5 | 63.9 |
 | 0.1849 over 28 | 5-20 | 1145 | 947.5 | |
4 | 0 | 1-5 | 332 | 515.3 | 57.6 |
 | | 6-26 | 1242 | 1016.8 | 28.6 |
The set of [t + T] was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.815 | 0.121 | 0.052 |
2 | | 0.190 | 0.088 |
3 | | | 0.685 |
The differences between parts increase step by step. The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chi-square |
1 | 0 | 1-4 | 293 | 483.5 | 61.2 |
 | 0.2478 over 21 | 5-14 | 945 | | |
2 | 0 | 1-4 | 313 | 479.6 | 63.0 |
 | 0.1314 over 15 | 5-17 | 1072 | | |
3 | 0 | 1-4 | | | |
 | 0.0735 over 16 | 6-20 | 1188 | 989 | 21.3 |
4 | 0 | 1-5 | 356 | 548.1 | 57.6 |
 | 0.1378 over 19 | 6-22 | 1182 | 940.1 | 30.2 |
U
There is no capital U, so the two sets, u and [u + U], are identical. The set of [u + U] was divided into five parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 | 5 |
1 | *0.036 | *0.028 | *0.238 | *0.002 |
2 | | 0.923 | | 0.269 |
3 | | | | 0.311 |
4 | | | | 0.364 |
The first part differs significantly from the others, and it is most dissimilar to the fifth part. The similarity deteriorates progressively.
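The paper does not name the test behind the "two way sample analysis", so the following sketch substitutes a different, well-known technique: the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the empirical distribution functions of two samples of distances. It shows only how a triangular table of pairwise comparisons between parts arises; it does not reproduce the significance levels quoted above. The names `ks_statistic` and `pairwise_table` are hypothetical.

```python
from itertools import combinations


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def pairwise_table(parts):
    """D for every pair of parts, keyed by 1-based part numbers (row < column)."""
    return {
        (i + 1, j + 1): ks_statistic(parts[i], parts[j])
        for i, j in combinations(range(len(parts)), 2)
    }


# Invented distance lists, for illustration only.
parts = [[1, 4, 6, 9], [2, 4, 7, 9], [5, 12, 20, 31]]
for pair, d in pairwise_table(parts).items():
    print(pair, round(d, 3))
```

With five parts the dictionary holds ten pairs, which matches the number of cells in the upper triangle of the table above.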
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chi-square |
1 | 0 | 1-3 | 189 | 348.6 | 62.0 |
 | 0.4691 over 10 | 4-9 | 628 | 504.4 | 20.6 |
2 | 0 | 1-3 | | | |
 | | 4-9 | 654 | 526.1 | 16.8 |
3 | 0 | 1-3 | 192 | 371.7 | 62.9 |
 | 0.4208 over 11 | 4-10 | 768 | 592.4 | |
4 | 0 | 1 | 6 | 134.2 | 70.5 |
 | 0.2392 over 10 | 6-26 | 606 | 787.8 | 18.1 |
5 | 0 | 1 | 9 | 138.3 | 72.8 |
 | | 2-13 | 1121 | 1021 | 14.5 |
Since a considerable share of the occurrences of u belongs to the combination qu, these occurrences were eliminated and only the remaining uses of u were analyzed.
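The elimination of qu can be sketched as a simple filter on the positions of u. This is an illustration under stated assumptions, not the procedure actually used: it is assumed that a u counts as part of qu when it directly follows q, that case is ignored, and that distances are still measured in the original text. The function names are hypothetical.

```python
def u_positions_outside_qu(text):
    """Positions of 'u' that are not immediately preceded by 'q'."""
    lowered = text.lower()
    return [
        i
        for i, ch in enumerate(lowered)
        if ch == "u" and (i == 0 or lowered[i - 1] != "q")
    ]


def distances_between(positions):
    """Differences between consecutive positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


sample = "quoque urbs aequum"
kept = u_positions_outside_qu(sample)
print(kept)                     # positions of the remaining u
print(distances_between(kept))  # distances between them
```

In the sample, the u of "quoque" and the first u of "aequum" are dropped, while the u of "urbs" and the second u of "aequum" are kept.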
The set of these u was divided into four parts. The two way sample analysis shows how the parts differ:
Part | 2 | 3 | 4 |
1 | 0.056 | 0.397 | 0.169 |
2 | | 0.285 | |
3 | | | |
The first part does not differ significantly from the others. These results are hardly comparable with the results for all u, since the number of parts is different.
The most important disturbances from the shape of the distribution in all parts are tabulated:
Part | Chi-square | Range | Observed | Expected | % of chi-square |
1 | 0 | 1 | 8 | 104.8 | 72.2 |
 | 0.3419 over 20 | 6-22 | 960 | 837.1 | 15.3 |
2 | 0 | 1-3 | | | |
 | 0.8566 over 11 | 4-12 | 773 | 648.8 | 20.0 |
3 | 0 | 1 | 6 | 107.1 | 67.1 |
 | 0.1437 over 18 | 6-19 | 899 | 749 | |
4 | 0 | 1 | 11 | 109.4 | 74.8 |
 | 0.3193 over 15 | 7-18 | 745 | 642.9 | 16.0 |
V
The exponential distribution is applicable for all three sets.
The capital V occurs too often up to distance 527 (23 occurrences against 15.7 expected). This contributes 56.2 % of the chi-square test value.
The fit over 20 is almost perfect: the chi-square test value over 20 is 0.9801 for the v set and 0.9726 for the [v + V] set. The absence of vv (no occurrences against 7.4 expected) contributes 34.1 % of the chi-square test value. Too few v in the range 404-440 (2 occurrences against 9.7 expected) make 28.2 % of the chi-square test value.
The upper case needs no commentary.
W
No occurrence.
X
The exponential distribution gives a good fit. Too few x up to distance 22 (25 occurrences against 38.3 expected) contribute 32.9 % of the chi-square test value.
Y and Z
Too few occurrences.
Discussion
The limited capacity of the software used for long lists forced the sets of the most frequent symbols to be split into parts. The splitting showed that the use of words changes within the studied book: different words are used at its end than at its beginning.
Some distributions of distances between consonants are highly regular, especially in their tails, if the short distances inside words are pooled. They are described with varying precision by four distributions: negative binomial, exponential, Weibull, and lognormal. Sometimes it is rather difficult to decide which distribution gives the better fit.
Compared with the English and German texts studied before, Latin uses only 21 letters: j, k, and w are not used, and y and z occur only in a few words of foreign origin.
The most frequent fit was obtained with the negative binomial distribution. This distribution can be expected if no biases occur. For upper case letters, the Weibull distribution fits the long distances between scarce occurrences.