
Distance Analysis of Latin Texts. Titi Livi Ab Urbe Condita Liber 1
Milan Kunz, April, 2003
Abstract

Distances between identical symbols and between punctuation marks in "Titi Livi Ab Urbe Condita. Liber 1" are described, with varying precision, by five distributions: exponential, Erlang, Weibull, lognormal and negative binomial. In some cases the fits are very good, with high significance levels of the chi-square test.

Introduction

This paper continues the study of the statistical properties of distances between identical symbols in different languages. The same technique is used as before; see the other papers in this section.
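The central quantity of this series is the distance between consecutive occurrences of the same symbol. A minimal Python sketch, assuming that the distance is the difference of character positions (so that a doubled letter such as ff has distance 1) and that combined cases such as [a + A] are obtained by folding the text to lower case; the function names are hypothetical:

```python
from collections import Counter


def symbol_distances(text: str, symbol: str, fold_case: bool = False) -> list[int]:
    """Distances between consecutive occurrences of `symbol` in `text`.

    The distance is the difference of character positions, so two adjacent
    identical symbols (a doubled letter) give distance 1. With fold_case=True
    upper and lower case are merged, e.g. [a + A].
    """
    if fold_case:
        text, symbol = text.lower(), symbol.lower()
    positions = [i for i, ch in enumerate(text) if ch == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_histogram(text: str, symbol: str, fold_case: bool = False) -> Counter:
    """Empirical frequency of each distance, the input to the distribution fits."""
    return Counter(symbol_distances(text, symbol, fold_case))


if __name__ == "__main__":
    print(symbol_distances("arma virumque cano", "a"))    # -> [3, 12]
    print(symbol_distances("Arma", "a", fold_case=True))  # -> [3]
```

The histogram of these distances is what the distributions in the following sections are fitted to.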

Results
Punctuation marks

The distances between consecutive full stops determine the lengths of sentences. There are 785 full stops.

Their distribution is of the Erlang type, with a = 2 and b = 0.013208. The chi-square value is 12.3399, which corresponds to a significance level of 0.5000 at 13 degrees of freedom. There is a shortage of short sentences up to distance 34 (46 occurrences against 59.2 expected), which alone accounts for 23.9 % of the chi-square value. A second shortage of full stops occurs at distances 299-364 (26 occurrences against 38.2 expected), accounting for another 31.8 % of the chi-square value.
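These numbers can be reproduced approximately from the reported parameters. For shape a = 2 and rate b, the Erlang distribution function is F(x) = 1 - exp(-bx)(1 + bx), its mean a/b is about 151 text positions here, and each class contributes (O - E)^2/E to the Pearson chi-square value. The sketch below assumes that b is a rate parameter and uses the 785 full stops as the sample size; it gives about 59 expected sentences up to distance 34 and a 23.9 % share for the first shortage, close to the reported figures. The function names are illustrative only.

```python
import math


def erlang_cdf(x: float, shape: int, rate: float) -> float:
    """CDF of the Erlang distribution with integer shape and a rate parameter."""
    if x <= 0:
        return 0.0
    bx = rate * x
    # 1 - exp(-bx) * sum_{n=0}^{shape-1} (bx)^n / n!
    partial = sum(bx ** n / math.factorial(n) for n in range(shape))
    return 1.0 - math.exp(-bx) * partial


def expected_count(n_total: int, lo: float, hi: float, shape: int, rate: float) -> float:
    """Expected number of distances falling into the class (lo, hi]."""
    return n_total * (erlang_cdf(hi, shape, rate) - erlang_cdf(lo, shape, rate))


def chi_square_term(observed: float, expected: float) -> float:
    """Contribution of one class to the Pearson chi-square value."""
    return (observed - expected) ** 2 / expected


if __name__ == "__main__":
    a, b = 2, 0.013208  # Erlang parameters reported for full stops
    n = 785             # full stops in the text, used here as the sample size
    print(f"mean distance: {a / b:.1f}")
    print(f"expected up to distance 34: {expected_count(n, 0, 34, a, b):.1f}")
    share = chi_square_term(46, 59.2) / 12.3399
    print(f"share of the chi-square value: {100 * share:.1f} %")
```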

The semicolon (303 occurrences) is modeled poorly by the exponential distribution (significance level 0.0353). There is a shortage of semicolons at distances 280-400 (27 occurrences against 39.1 expected), which accounts for 19.4 % of the chi-square value. The surplus of semicolons at distances 880-1000 (16 occurrences against 8.4 expected) accounts for 35.4 % of the chi-square value.

The distances between consecutive colons follow the Weibull distribution (significance level 0.4458). The surplus of colons at distances 500-900 (22 occurrences against 15.9 expected) accounts for 49.2 % of the chi-square value.

The distances between consecutive commas are exponentially distributed; the tail over 40 gives an almost perfect fit, with significance level 0.9796. There are too few commas up to distance 25 (280 occurrences against 333.7 expected), which accounts for 43.6 % of the chi-square value. From there up to distance 49 there are too many commas (308 occurrences against 242.4 expected), accounting for 38.9 % of the chi-square value.

Spaces

A distance d > 1 between consecutive spaces corresponds to a word of length d - 1, so the distances between spaces give the number of words of each length. There are 16985 spaces. The test results are given in Table 1. In some cases, pooling the frequencies of the shorter distances improved the fit, because below the pooling limit the counts are scattered and their deviations can partly cancel.

Table 1 The number of words of different lengths (distance between consecutive spaces = word length + 1)

Distance	Word length	Number of words	Distribution, significance level
2	1	45	EX, 0.1713
3	2	1767	NB, 0; 0.6226 over 10
4	3	1545	NB, 0.5899
5	4	1841	NB, 0.7491
6	5	2473	NB, 0.4966
7	6	2517	NB; halves of 1262 and 1254 words: 0.6683 and 0.2420
8	7	2179	NB, 0.5295
9	8	1655	EX, 0; 0.5389 over 35
10	9	1161	NB, 0.7843
11	10	857	NB, 0.6745
12	11	500	NB, 0.1432
13	12	240	EX, 0.4140
14	13	105	WE, 0.6906
15	14	62	LN, 0.3996
16	15	20	EX, 0.0602
17	16	0	-
18	17	1	-
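The link between Table 1 and the spaces can be sketched as follows. The sketch assumes that the text has been reduced to letters and spaces (punctuation stripped) and is padded with one space at each end so that the first and last words are delimited; `word_length_counts` is a hypothetical helper, not the software used in the study.

```python
from collections import Counter


def word_length_counts(text: str) -> Counter:
    """Count words by length from distances between consecutive spaces.

    A distance d > 1 between two consecutive spaces encloses a word of
    length d - 1; distance 1 (two adjacent spaces) encloses no word.
    """
    padded = f" {text} "  # assumption: pad so that every word is enclosed by spaces
    spaces = [i for i, ch in enumerate(padded) if ch == " "]
    counts = Counter()
    for left, right in zip(spaces, spaces[1:]):
        d = right - left
        if d > 1:
            counts[d - 1] += 1
    return counts


if __name__ == "__main__":
    print(word_length_counts("arma virumque cano"))
```

The "Distance" column of Table 1 corresponds to the keys of this counter plus one.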

The most frequent word lengths follow the negative binomial distribution, which is also the most frequently applicable distribution overall (9 cases). The Weibull distribution applies at distance 14, the lognormal distribution at distance 15, and the exponential distribution at four distances.
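The report does not state how the negative binomial parameters were estimated. One simple possibility is the method of moments, sketched here under two assumptions: the parametrization P(K = k) = Γ(k + r)/(Γ(r) k!) p^r (1 - p)^k for k = 0, 1, 2, ..., and a shift K = d - 1 so that the smallest distance d = 1 maps to k = 0. The method requires the variance of the (shifted) distances to exceed their mean.

```python
import math
from statistics import mean, pvariance


def fit_negative_binomial(values: list[int]) -> tuple[float, float]:
    """Method-of-moments estimates (r, p) for a negative binomial on k = 0, 1, 2, ...

    Uses mean = r(1 - p)/p and variance = r(1 - p)/p**2, hence
    p = mean/variance and r = mean**2/(variance - mean).
    """
    m = mean(values)
    v = pvariance(values)
    if v <= m:
        raise ValueError("negative binomial needs variance > mean")
    return m * m / (v - m), m / v


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(K = k) = Gamma(k + r) / (Gamma(r) k!) * p**r * (1 - p)**k, real r > 0."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


def expected_nb_counts(distances: list[int], max_distance: int) -> list[float]:
    """Expected counts for distances 1..max_distance, assuming K = distance - 1."""
    shifted = [d - 1 for d in distances]
    r, p = fit_negative_binomial(shifted)
    n = len(distances)
    return [n * nb_pmf(d - 1, r, p) for d in range(1, max_distance + 1)]
```

A maximum-likelihood fit would generally give somewhat different parameters; the moment estimates are used here only because they are short and closed-form.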

Notes on some notable results:

1 letter words repeat more often than expected at distances 253-370 (11 occurrences against 6.2 expected). This accounts for 75.7 % of the chi-square value.

2 letter words. The shape of the negative binomial distribution is disturbed by a shortage of short distances up to 3 (379 occurrences against 496.1 expected), which accounts for 45.8 % of the chi-square value. The surplus at distances 4-6 (442 occurrences against 356.8 expected) accounts for 31.4 % of the chi-square value.

3 letter words. The shortage at distances 11-13 (117 occurrences against 111.2 expected) accounts for 46.3 % of the chi-square value.

4 letter words. No large deviations from the expected values.

5 letter words repeat immediately less often than expected (323 occurrences against 360.2 expected), which accounts for 26.7 % of the chi-square value. The shortage at distances 29-30 (4 occurrences against 8.1 expected) accounts for 14.6 % of the chi-square value.

6 letter words are the most frequent. Before testing, the set had to be divided into two halves, which differ according to the two-way sample analysis.

The fit of the first half is disturbed by a shortage at distances 18-19 (11 occurrences against 20.8 expected), which accounts for 23.6 % of the chi-square value, and by a surplus at distances 23-24 (15 occurrences against 9 expected), which accounts for 19.4 %.

The fit of the second half is disturbed by a peak at distances 7-8 (157 occurrences against 132 expected), which accounts for 24.3 % of the chi-square value, and by a shortage at distances 20-22 (9 occurrences against 17.9 expected), which accounts for 22.6 %.

7 letter words repeat less often than expected at distances 28-30 (11 occurrences against 18 expected), which accounts for 18.1 % of the chi-square value, and more often than expected at distances 35-39 (15 occurrences against 10.1 expected), which accounts for 15.9 %.

8 letter words are fitted poorly because of their shortage at distances 11-14 (171 occurrences against 220.9 expected), which accounts for 17.7 % of the chi-square value. The immediately following surplus at distances 15-17 (120 occurrences against 78.4 expected) accounts for 29.9 %.

9 letter words are fitted rather well. They repeat too often at distances 5-9 (286 occurrences against 260.9 expected), which accounts for 24.9 % of the chi-square value. The shortage at distances 19-22 (65 occurrences against 80 expected) accounts for 29.1 %.

10 letter words repeat too often at distances over 89 (16 occurrences against 9 expected), which accounts for 41.9 % of the chi-square value.

11 letter words repeat less often than expected at distances 60-77.1 (60 occurrences against 77.1 expected), which accounts for 23.7 % of the chi-square value. The shortage at distances 115-134 (2 occurrences against 7.4 expected) accounts for 24.9 %.

Longer words need no special comment.

Distances between individual letters

Table 2 summarizes the results for all letters: the frequency of each symbol and the significance levels of the chi-square tests performed. Comments on the individual letters of the alphabet follow. Values in square brackets refer to the combined lower and upper cases.

Table 2 Survey of results

Notes:

EX = exponential distribution

WE = Weibull distribution

LN = lognormal distribution

NB = negative binomial distribution

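Numbers after a distribution abbreviation are significance levels of the chi-square test; "over N" means that all distances up to N were pooled into one class before testing.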

Letter	Lower case	Upper case	Both cases	Capital/both, %
a	8378, LN, NB	239, WE, 0.4341	8617, LN, NB	2.77
b	1584, EX, 0.2605	21, no test	1605, EX, 0.9796 over 40	1.31
c	3665, NB	125, WE, 0.5943	3790, NB, EX	3.30
d	2835, EX	38, WE, 0.2942	2873, EX, NB	1.32
e	11367, NB, LN	100, EX, 0.8694	11467, NB, LN	0.87
f	1007, NB, 0.2520	58, LN, 0.6581	1065, NB, 0.8555	5.44
g	1207, NB, 0.2010	21, no test	1228, NB, 0.1261	1.71
h	398, WE, 0.5031	85, WE, 0.6098	483, WE, 0.9393	17.52
i	11097, NB	182, WE, 0.0282	11279, NB	1.61
j	0	0	0	-
k	0	0	0	-
l	2849, NB	121, WE, 0.0016	2970, NB	4.07
m	5775, NB	61, WE	5836, NB	1.04
n	6019, NB	79, WE, 0.8162	6098, NB	1.29
o	5082, NB, EX	7	5089, NB, LN	0.14
p	2723, NB	99, WE, 0.0990	2822, NB	3.51
q	1633, WE, 0.5848	56, EX, 0.4619	1689, EX, 0.9387 over 80	3.32
r	6391, NB	256, LN, 0.3145	6647, NB	3.83
s	6922, NB	153, LN, 0.2388	7075, NB	2.16
t	7615, NB	235, LN, 0.2695	7850, NB	2.99
u	8882, NB	0	8882, NB	0
v	937, EX, 0.1191	63, EX, 0.1892	1000, EX, 0.9726	6.3
w	0	0	0	-
x	462, EX, 0.1239	0	462, EX, 0.1239	0
y	14, no test	0	14, no test	0
z	2	0	2	0
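The numbers reported as significance levels appear to be upper-tail probabilities of the chi-square distribution: the sentence-length test (chi-square value 12.3399 at 13 degrees of freedom) is reported with significance level 0.5000, and 12.34 is indeed the median of the chi-square distribution with 13 degrees of freedom. A self-contained sketch of this conversion, using the standard series and continued-fraction evaluations of the regularized incomplete gamma function; it is not the software used in the study, and the function names are hypothetical.

```python
import math


def _series_p(s: float, x: float, eps: float = 1e-14, max_iter: int = 10_000) -> float:
    """Regularized lower incomplete gamma P(s, x) by its power series (x < s + 1)."""
    term = total = 1.0 / s
    for n in range(1, max_iter):
        term *= x / (s + n)
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + s * math.log(x) - math.lgamma(s))


def _continued_fraction_q(s: float, x: float, eps: float = 1e-14, max_iter: int = 10_000) -> float:
    """Regularized upper incomplete gamma Q(s, x) by Lentz's continued fraction (x >= s + 1)."""
    tiny = 1e-300
    b = x + 1.0 - s
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(-x + s * math.log(x) - math.lgamma(s)) * h


def chi_square_significance(statistic: float, dof: int) -> float:
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(dof)."""
    if statistic <= 0:
        return 1.0
    s, x = dof / 2.0, statistic / 2.0
    if x < s + 1.0:
        return 1.0 - _series_p(s, x)
    return _continued_fraction_q(s, x)


if __name__ == "__main__":
    print(f"{chi_square_significance(12.3399, 13):.4f}")  # close to the reported 0.5000
```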

The last column gives the share of capital letters in all occurrences of each letter. Since capital letters are used at the beginning of sentences and of proper names, it can be concluded that no proper name and no sentence starts with U, whereas capitals make up 17.52 % of all occurrences of H.

The results of the statistical tests are summarized as follows:

Result	Lower case	Upper case	Combined
No test	4	9	3
The negative binomial distribution	14	0	13
The exponential distribution	4	2	6
The Weibull distribution	2	10	2
The lognormal distribution	1	4	1

For the upper case, only 16 letters give results; the Weibull distribution is the most frequent, followed by the lognormal distribution. For the lower case and the combined cases, the negative binomial distribution is the most frequent, followed by the exponential and the Weibull distributions, while the lognormal distribution fits only one case. The significance levels of the chi-square tests are sometimes practically zero; they increase only when the shorter distances are pooled, so that the lowest class boundary moves to greater distances. Comments on the individual letters follow.
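Pooling the short distances, marked "over N" in the tables, merges all classes up to distance N into a single class before the chi-square value is computed. This removes the scattered short-distance classes (for example the missing doubled letters) from the test and lowers the number of degrees of freedom. A minimal sketch with an exponential fit, assuming classes given by their upper edges and a rate estimated elsewhere; the class counts in the usage lines are made-up toy numbers, not data from the text.

```python
import math


def exponential_expected(n: int, upper_edges: list[float], rate: float) -> list[float]:
    """Expected counts in classes (previous edge, edge]; the last edge may be math.inf."""
    expected, lower = [], 0.0
    for upper in upper_edges:
        expected.append(n * (math.exp(-rate * lower) - math.exp(-rate * upper)))
        lower = upper
    return expected


def pool_up_to(observed, expected, upper_edges, limit):
    """Merge all classes whose upper edge is <= limit into one leading class."""
    keep = [u > limit for u in upper_edges]
    pooled_obs = sum(o for o, k in zip(observed, keep) if not k)
    pooled_exp = sum(e for e, k in zip(expected, keep) if not k)
    obs = [o for o, k in zip(observed, keep) if k]
    expd = [e for e, k in zip(expected, keep) if k]
    if pooled_exp > 0:
        obs.insert(0, pooled_obs)
        expd.insert(0, pooled_exp)
    return obs, expd


def chi_square(observed, expected, n_params: int = 1):
    """Pearson chi-square value and degrees of freedom (classes - 1 - fitted parameters)."""
    value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return value, len(observed) - 1 - n_params


if __name__ == "__main__":
    edges = [1, 2, 5, math.inf]  # toy classes: 1, 2, 3-5, over 5
    obs, expd = pool_up_to([5, 7, 10, 3], [8.0, 6.0, 9.0, 4.0], edges, limit=2)
    print(obs, expd, chi_square(obs, expd))
```

The significance level is then read from the chi-square distribution with the reduced number of degrees of freedom.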

The frequency of the capital A allowed a separate test. The good fit with the Weibull distribution (significance level 0.434) is worsened by too few repetitions at distances 318-634 (39 occurrences against 48.3 expected), which accounts for 46.9 % of the chi-square value.

The distances between occurrences of the lower case a can be modeled by the lognormal distribution. There are no doubled aa, but at short distances a then repeats too often.

The set was divided into four parts. The two-way sample analysis shows how the parts of the lower case differ (significance levels):

Part	2	3	4
1	*0.048	*0.010	*0
2		0.554	0.068
3			0.219

Note: An asterisk marks a significant difference between the tested parts.

The first quarter differs significantly from all other parts, and the difference increases toward the later parts.
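The report does not specify the two-way sample analysis. One plausible reading, sketched here purely as an assumption, splits the sequence of distances in text order into k consecutive parts, bins each part into the same distance classes and compares every pair of parts with a chi-square test of homogeneity on the resulting 2 × m table. To keep the sketch short, the significance level comes from the Wilson-Hilferty normal approximation to the chi-square distribution. The output has the same upper-triangular shape as the tables above; all names are hypothetical.

```python
import math


def split_into_parts(values: list, k: int) -> list[list]:
    """Split a sequence (in text order) into k consecutive parts of nearly equal size."""
    n = len(values)
    bounds = [i * n // k for i in range(k + 1)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(k)]


def class_counts(distances: list[int], upper_edges: list[float]) -> list[int]:
    """Count distances in classes (previous edge, edge]; the last edge may be math.inf."""
    counts = [0] * len(upper_edges)
    for d in distances:
        for j, upper in enumerate(upper_edges):
            if d <= upper:
                counts[j] += 1
                break
    return counts


def homogeneity_test(a_counts: list[int], b_counts: list[int]) -> tuple[float, int, float]:
    """Chi-square homogeneity test of two binned samples.

    Returns (chi-square value, degrees of freedom, approximate significance level).
    """
    cols = [(a, b) for a, b in zip(a_counts, b_counts) if a + b > 0]
    n_a = sum(a for a, _ in cols)
    n_b = sum(b for _, b in cols)
    if n_a == 0 or n_b == 0 or len(cols) < 2:
        raise ValueError("need two non-empty samples spread over at least two classes")
    n = n_a + n_b
    value = 0.0
    for a, b in cols:
        for observed, row_total in ((a, n_a), (b, n_b)):
            expected = row_total * (a + b) / n
            value += (observed - expected) ** 2 / expected
    dof = len(cols) - 1
    # Wilson-Hilferty: (X/dof)**(1/3) is roughly normal with mean 1 - 2/(9 dof).
    z = ((value / dof) ** (1 / 3) - (1 - 2 / (9 * dof))) / math.sqrt(2 / (9 * dof))
    return value, dof, 0.5 * math.erfc(z / math.sqrt(2))


def pairwise_significance(distances: list[int], k: int, upper_edges: list[float]) -> dict:
    """Approximate significance level for every pair (i, j), i < j, of the k parts."""
    binned = [class_counts(part, upper_edges) for part in split_into_parts(distances, k)]
    return {
        (i + 1, j + 1): homogeneity_test(binned[i], binned[j])[2]
        for i in range(k)
        for j in range(i + 1, k)
    }
```

A value below 0.05 in the resulting dictionary would correspond to an asterisk in the tables.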

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Range	Observed	Expected	% of chi-square
1	2-3	308	195.2	32.9
	4-7	510	642.5	14.5
	26-28	71	39.9	12.2
2	2-4	388	305.4	26.6
3	2-5	563	445.3	29.3
	6-13	738	876.2	20.6
4	41-43	34	17	19.2
	over 85	0	13.5	15.1

The distances for both cases [a + A] are fitted poorly by the lognormal and by the negative binomial distributions. There are no doubled Aa or aa, but at short distances a then repeats too often.

The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.052	*0.002	*0
2		0.304	*0.035
3			0.282

The first quarter differs significantly from the third and fourth parts, and the second from the fourth. The difference increases toward the later parts.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Distribution, significance level	Range	Observed	Expected	% of chi-square
1	LN	2-3	332	214.2	35.5
		4-5	268	350.1	10.6
		26-28	65	38.5	10.0
2	NB, 0, 0.8994 over 12	1	1	157.9	77.7
3	NB, 0, 0.4487 over 12	1	1	153.6	75.3
		2-13	1345	1478.2	13.5
4	LN	41-43	32	17.1	15.3
		over 85	0	13.3	11.1

Adding A improved the fit.

The distances between consecutive occurrences of b are exponentially distributed. There are too few b at distances 179-210 (37 occurrences against 49.8 expected), which contributes 22.3 % of the chi-square value. For [b + B], this deformation lies at distances 172-202 (29 occurrences against 52.7 expected) and contributes 46.9 % of the chi-square value. Including B improved the fit.

The distances between occurrences of the capital C are described by the Weibull distribution. There are too many C at short distances up to 80 (26 occurrences against 20.9 expected), which contributes 33.5 % of the chi-square value.

The sets of c and of [c + C] were each divided into two parts.

The distances in the first part of c are described by the negative binomial distribution (significance level 0.5105 over 20, 0.8758 over 40) or by the exponential distribution (significance level 0.6806 over 20). According to both distributions, there are too many c at distances 84-90 [86-94] (35 [44] occurrences against 35 [30.4] expected), which contributes 23.6 [13.3] % of the chi-square value.

The distances in the second part of c are described by the negative binomial distribution (significance level 0.4667 over 40) or by the exponential distribution (significance level 0.5343 over 40). There are too many c at distances 13-35 (711 occurrences against 642.2 expected), which contributes 28.0 % of the chi-square value. Too many c at distances 116-127 (24 occurrences against 15.4 expected) contribute 18.0 % of the chi-square value.

Both parts are similar (significance level 0.885).

The distances in the first part of [c + C] are described by the negative binomial distribution (significance level 0.4058 over 20). [c + C] repeats less often than expected up to distance 9 (405 occurrences against 477.5 expected), which contributes 28.4 % of the chi-square value. At distances 10-34 there are too many [c + C] (884 occurrences against 783.4 expected), contributing 30.8 % of the chi-square value, and again at distances 111-119 (23 occurrences against 13.6 expected), contributing 15.5 %.

The distances in the second part of [c + C] are described by the exponential distribution (significance level 0.3212 over 40). [c + C] repeats less often than expected up to distance 9 (422 occurrences against 495.1 expected), which contributes 30.4 % of the chi-square value. There are too many [c + C] at distances 86-94 (46 occurrences against 29.3 expected), contributing 15.8 % of the chi-square value.

Both parts are similar (significance level 0.921).

The Weibull fit for the capital D needs no comment.

For d as well as for [d + D], both the exponential and the negative binomial distributions are applicable.

Both sets were divided into two parts.

The distances in both parts of d are described by the exponential distribution (significance levels 0.2141 and 0.3536, respectively).

In the first part, the largest deviation is due to too many d at distances 193-217 (12 occurrences against 6.6 expected), which contributes 22.3 % of the chi-square value.

The second part of d shows no single large deviation from the expected values.

Both parts are similar (significance level 0.758).

The distances in both parts of [d + D] are described by the exponential distribution (significance levels 0.2139 and 0.1200 over 10, respectively).

In the first part, the largest deviation is due to too many [d + D] at distances 156-167 (15 occurrences against 9 expected), which contributes 21.2 % of the chi-square value.

In the second part, the largest deviation is due to too many [d + D] at distances 61-72 (106 occurrences against 83.7 expected), which contributes 22.0 % of the chi-square value.

Both parts are similar (significance level 0.673).

The distances between occurrences of the capital E are exponentially distributed. There are too many E at distances 2101-2900 (12 occurrences against 8.3 expected), which contributes 64.6 % of the chi-square value.

The distances between occurrences of the lower case e can be modeled by the negative binomial distribution. There are no doubled ee (or only one), but at short distances e then repeats too often.

The set of e distances was divided into five parts. The two-way sample analysis shows how the parts of the lower case differ:

Part	2	3	4	5
1	0.152	*0.003	0.820	0.118
2		0.128	0.230	0.905
3			*0.007	0.157
4				0.183

The first fifth differs significantly from the third, and the third from the fourth.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Type	Significance level	Range	Observed	Expected	% of chi-square
1	NB	0.2340 over 20	1	1	225.1	75.0
			2-3	477	385.8	7.2
2	NB	0.6630 over 12	1	1	216.7	74.6
			2-3	500	373.4	14.9
3	LN	0	1	0	17.1	19.9
			16-17	75	122.8	21.6
			18-29	305	229.5	29.8
4	NB	0.4540 over 13	1	1	223.4	71.1
			2-3	531	383.1	18.3
5	LN	0.1030 over 25	1	0	13.4	18.2
			21-23	92	59.2	24.6

The third and fifth parts have tails too long to be fitted by the negative binomial distribution.

The set of [e + E] distances was divided into five parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4	5
1	0.197	*0.003	0.784	0.107
2		0.093	0.310	0.746
3			*0.007	0.178
4				0.181

The first fifth differs significantly from the third, and the third from the fourth.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Type	Significance level	Range	Observed	Expected	% of chi-square
1	NB	0.1494 over 14	1	2	231.4	75.3
			2-3	489	396	7.2
2	NB	0.6970 over 12	1	2	223.8	74.9
			2-3	513	385	14.5
3	LN	0.0913 over 23	1	0	17.7	18.6
			16-17	77	124.2	18.9
			18-26	271	197	30.2
4	NB	0.4830 over 12	1	1	229.5	71.7
			2-3	544	393.1	18.3
5	LN	0.0.638 over 25	1	0	13.4	19.8
			16-26	344	292.6	33.8

The third and fifth parts have tails too long to be fitted by the negative binomial distribution.

The distances between occurrences of F are fitted well by the lognormal distribution.

The distances for the lower case f and for [f + F] follow the negative binomial distribution. There are too many doubled ff (28 occurrences against 8.5 expected), which contributes 76.6 % of the chi-square value. [f + F] is fitted rather well, with significance level 0.8555.

The distances for g and for [g + G] follow the negative binomial distribution. g and [g + G] repeat less often than expected up to distance 30 (279 [287] occurrences against 318.7 [329.1] expected), which contributes 27.6 [26.7] % of the chi-square value. There are too few distances 248-278 (16 [15] occurrences against 26.3 [25.9] expected), which contributes 22.3 [22.9] % of the chi-square value.

The distances for h, H and [h + H] are fitted by the Weibull distribution. For the lower case h, there are too few distances 262-376 (43 occurrences against 54.7 expected) and too many distances 724-838 (17 occurrences against 11.2 expected), which contribute 33.9 % and 40.9 % of the chi-square value, respectively. Combining with the capital H improved the fit; it is worsened only by too many distances 789-937 (12 occurrences against 8.6 expected), which contributes 31.6 % of the chi-square value.

The distances between occurrences of the capital I are fitted by the Weibull distribution. The largest deviation is a surplus at distances 235-452 (54 occurrences against 37.1 expected), which contributes 54.4 % of the chi-square value. There are also too few distances 670-886 (11 occurrences against 20 expected), contributing 28.5 % of the chi-square value.

The negative binomial distribution is applicable to i as well as to [i + I].

The set of i distances was divided into five parts. The two-way sample analysis shows how the parts of the lower case differ:

Part	2	3	4	5
1	0.609	0.071	*0.001	*0.001
2		0.197	*0.007	*0.007
3			0.161	0.157
4				0.183997

The first and second fifths differ significantly from the last two.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0.0504 over 21	1	32	197.2	70.6
		16-17	132	92.6	8.5
2	0.8128 over 9	1	31	201.2	71.2
		2-9	1260	1075.2	17.3
3	0.4758 over 10	1	31	208.7	66.9
		6-9	578	443.4	19.6
4	0.2455 over 10	1	37	216.6	75.0
		2-12	1554	1359.7	14.6
5	0.1453 over 21	1	45	214.1	65.9
		2-18	1863	1640.3	18.2

The set of [i + I] distances was divided into only four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.832	0.077	*0.006
2		0.120	*0.011
3			0.318

The similarity decreases from part to part; the last quarter differs significantly from the first and the second. The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0.0640 over 18	1	30	204.6	70.0
		2-9	1237	1089.9	10.1
		16-17	136	92.9	9.5
2	0.4578 over 8	1	30	206.9	73.7
		6-9	1289	1099.7	17.1
3	0.4782 over 8	1	33	216.2	69.3
		6-9	583	453	17.4
4	0.2310 over 10	1	37	221.8	74.5
		2-12	1587	1385.1	15.0

Combining both cases improved the fit.

J and K

These letters are not used in the text.

The distances between occurrences of the capital L are fitted poorly by the Weibull distribution.

The distances for l and for [l + L] follow the negative binomial distribution.

The sets of l and [l + L] distances were divided into two parts. The two-way sample analysis shows that the two parts differ significantly in both sets.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
l1	0.8949 over 20	1	188	36.2	92.4
		2-12	226	342.2	5.7
l2	0.2431 over 20	1	188	32.3	91.6
		2-12	197	310.4	5.1
[l + L]1	0.8246 over 20	1	190	39	92.7
		2-12	256	367	5.3
[l + L]2	0.1228 over 20	1	186	35.4	88.7
		2-12	211	337.6	6.6

The distances between occurrences of the capital M are fitted poorly by the Weibull distribution.

The distances for m and for [m + M] follow the negative binomial distribution.

The sets of m and [m + M] distances were divided into three parts. The two-way sample analysis shows how the parts of the lower case differ:

Part	2	3
1	*0.026	*0.036
2		0.871

The first third differs significantly from the last two.

The two-way sample analysis shows how the parts of [m + M] differ:

Part	2	3
1	*0.038	*0.024
2		0.873

Again, the first third differs significantly from the last two, while these two are very similar.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
m1	0.9773 over 13	1	15	98.0	71.9
		5-13	581	506.7	11.2
m2	0.7837 over 10	1	21	91.2	65.8
		2-14	963	857.8	20.2
m3	0.9909 over 12	1	24	91.9	66.6
		6-14	608	536.7	13.2
[m + M]1	0.4490 over 9	1	15	100.1	70.7
		6-13	591	515.1	11.1
[m + M]2	0.8589 over 10	1	21	93.7	65.4
		8-14	478	402.2	16.5
[m + M]3	0.9861 over 12	1	24	93.2	66.2
		6-14	615	543.4	12.7

The distances between occurrences of the capital N are fitted well by the Weibull distribution.

The distances for n and for [n + N] follow the negative binomial distribution.

The sets of n and [n + N] distances were divided into three parts. The two-way sample analysis shows how the parts of the lower case differ:

Part	2	3
1	*0.046	0.271
2		0.384

The first third differs significantly from the second.

The two-way sample analysis shows how the parts of [n + N] differ:

Part	2	3
1	*0.038	0.163
2		0.519

The first third differs significantly from the second.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
n1	0	1	13	104.4	52.5
		5-25	1185	1007.3	23.1
n2	0.6723 over 25	1	10	99.2	58.0
		6-10	436	348.8	15.8
n3	0.6711 over 15	1	5	101.4	62.3
		6-18	916	759.1	23.2
[n + N]1	0.4490 over 9	1	13	107.5	53.7
		6-25	1198	1025.5	22.0
[n + N]2	0.3575 over 24	1	11	102.1	57.5
		6-10	443	357.1	14.6
		16-30	592	506.6	10.5
[n + N]3	0.7519 over 15	1	5	103.5	64.1
		6-18	925	771.6	22.3

The distances for o and for [o + O] are fitted poorly by the negative binomial distribution. In some parts the tails (long distances between consecutive occurrences) are more frequent than expected, and other distributions then perform better.

The sets of o and [o + O] distances were divided into three parts. The two-way sample analysis shows how the parts of the lower case differ:

Part	2	3
1	*0.002	0.947
2		*0.001

The second third differs significantly from the first and third ones.

The two-way sample analysis shows how the parts of [o + O] differ:

Part	2	3
1	*0.002	0.948
2		*0.002

The second third differs significantly from the first and third ones.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Type	Significance level	Range	Observed	Expected	% of chi-square
o1	NB	0.9521 over 26	1	2	69.9	78.6
			10-26	666	591.3	12.0
o2	EX	0	1-2	58	149.4	47.4
			8-19	601	487.8	22.3
o3	EX	0.8249 over 20	1	0	68.4	63.9
			6-21	631	531.1	19.5
[o + O]1	NB	0.9567 over 26	1	2	70.2	78.4
			10-26	668	592.4	12.4
[o + O]2	LN	0	30-52	314	238	39.9
			>162	6	26.6	25.0
[o + O]3	NB	0.4789 over 14	1	1	70.2	73.5
			8-14	378	322.8	10.2

The distances between occurrences of the capital P are fitted poorly by the Weibull distribution, while p and [p + P] follow the negative binomial distribution. The sets of p and [p + P] distances were divided into two parts. According to the two-way sample analysis, the parts are only marginally comparable (significance levels 0.112 [0.059]).

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
p1	0.0978	1	49	32.3	40.9
	0.7550 over 10	2-15	349	379.6	11.6
p2	0.1903	61-72	65	83.2	17.4
	0.7308 over 9	73-107	173	146.4	22.8
[p + P]1	0.1291	1	49	34.9	28.4
[p + P]2	0.1505	61-72	62	84.8	26.8
	0.5285 over 9	84-107	108	87.3	22.9

The consonant q occurs only in the combination qu.

The distances between occurrences of the capital Q are fitted by the exponential distribution. The fit is worsened by too many repetitions at distances 1668-2500 (13 occurrences against 8.3 expected), which contributes 71.1 % of the chi-square value.

The distances between occurrences of q can be fitted by the exponential or by the Weibull distribution, with almost equal quality (significance levels):

Distribution	over 30	over 80
The exponential distribution	0.0972	0.9160
The Weibull distribution	0.1860	0.8980
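Comparisons like this rest on expected class counts computed from the distribution functions. A sketch of the three continuous candidates used in this study, with hypothetical function names and assumed parameter conventions (rate for the exponential, shape and scale for the Weibull, mean and standard deviation of log x for the lognormal). One possible reason why the exponential and Weibull fits for q are nearly equal is that a Weibull distribution with shape 1 is exactly the exponential distribution, so a fitted shape near 1 gives nearly the same expected counts; the fitted parameters are not reported, so this remains a conjecture.

```python
import math


def exponential_cdf(x: float, rate: float) -> float:
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def weibull_cdf(x: float, shape: float, scale: float) -> float:
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    # Phi((ln x - mu) / sigma), written with the complementary error function
    if x <= 0:
        return 0.0
    return 0.5 * math.erfc(-(math.log(x) - mu) / (sigma * math.sqrt(2.0)))


def expected_counts(n: int, upper_edges: list[float], cdf) -> list[float]:
    """Expected counts in classes (previous edge, edge] for a fitted CDF."""
    counts, lower = [], 0.0
    for upper in upper_edges:
        counts.append(n * (cdf(upper) - cdf(lower)))
        lower = upper
    return counts


def chi_square(observed: list[float], expected: list[float]) -> float:
    """Pearson chi-square value of observed against expected class counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

With the same observed counts and the same number of fitted parameters, the candidate with the lower chi-square value gives the higher significance level.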

There are too few repetitions at distances 193-209 (17 occurrences against 24.4 expected), which contributes 13.8 % of the chi-square value. Immediately afterwards, there are too many repetitions at distances 210-226 (26 occurrences against 18.9 expected); this slight distortion contributes 16.7 % of the chi-square value.

The distances for [q + Q] follow the exponential distribution. There are no Qq (0 occurrences against 23.8 expected), which contributes 53.1 % of the chi-square value. Another 23.1 % comes from the surplus at distances 53-87 (364 occurrences against 309.3 expected).

The distances between occurrences of the capital R are fitted by the lognormal distribution. The fit is worsened by too many repetitions at distances 1233-1540 (10 occurrences against 6.5 expected), which contributes 41.1 % of the chi-square value.

The distances for r follow the negative binomial distribution. The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.280	0.081	*0.010
2		0.515	0.149
3			0.432

The similarity decreases from part to part; the last quarter differs significantly from the first. The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0.2677 over 31	1-6	303	433.9	50.8
		7-31	1014	850.5	33.5
2	0.5570 over 28	1-6	327	449.8	51.2
		7-26	920	768.8	33.7
3	0.2855 over 30	1-6	327	456.5	50.3
		7-26	932	770.3	35.2
4	0.2376 over 27	1-6	338	464.7	53.9
		7-26	929	771.6	34.6

The distances for [r + R] follow the negative binomial distribution. The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.450	0.233	0.082
2		0.671	0.335
3			0.588

Combining both cases reduced the differences between the parts. The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0.3133 over 23	1-6	333	472.8	52.2
		7-31	1064	902.2	32.6
2	0.1400 over 20	1-6	354	485.3	51.0
		7-26	1071	906.6	33.7
3	0.5028 over 26	1-6	347	488.6	52.7
		7-26	983	805.4	28.9
4	0	1-6	367	495.7	54.9
		7-26	970	808.2	34.1

The distances between occurrences of the capital S are fitted by the lognormal distribution.

The distances for the lower case s and for [s + S] are described poorly by the negative binomial distribution.

The set of s was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.555	0.995	0.388
2		0.565	0.143
3			0.391

The first and the third parts are very similar. The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0.0061	1	133	100.2	29.4
	0.9167 over 10	2-6	282	345.8	32.2
2	0.0199	2-5	296	340.6	16.7
	0.4318 over 9	6-9	313	269.4	20.1
3	0	1	151	101.0	27.1
	0.0791 over 24	2-5	297	348.5	9.3
4	0	1	160	103.7	37.0
	0.1611 over 10	2-5	276	356.3	21.9

The distances for [s + S] follow the negative binomial distribution. The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.285	0.888	0.450
2		0.233	0.069
3			0.547

The first and the third parts are similar. The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0.0079	1	134	106.9	19.3
	0.9331 over 10	2-5	298	367.4	36.8
		6-14	616	560	15.7
2	0.0075	2-5	313	356.7	13.9
	0.3233 over 10	6-9	322	281.4	15.2
		39-47	54	78.8	18.9
3	0.1916 over 23	1	156	108.1	26.9
		2-5	309	371.5	13.4
		6-13	587	516.9	13.4
4	0	1	148	103.1	31.0
		2-5	283	353.1	22.1

The distances between occurrences of the capital T have a lognormal shape. The significance level for the tail over 400 is 0.8053. There are too many distances 167-392 (60 occurrences against 45.8 expected), which contributes 49.9 % of the chi-square value.

The distances for the lower case t, as well as for [t + T], follow the negative binomial distribution.

The set of t was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.930	0.072	*0.006
2		0.087	*0.008
3			0.352

The first and the second parts are very similar, and the similarity decreases from part to part.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0	1-4	267	458	59.5
	0.2735 over 21	5-14	918	721	29.5
2	0	1-4	289	455.3	62.2
	0.1605 over 17	5-20	1159	963.9	25.5
3	0	1-4	259	435.5	63.9
	0.1849 over 28	5-20	1145	947.5	22.6
4	0	1-5	332	515.3	57.6
		6-26	1242	1016.8	28.6

The set of [t + T] was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.815	0.121	0.052
2		0.190	0.088
3			0.685

The differences between the parts increase step by step. The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0	1-4	293	483.5	61.2
	0.2478 over 21	5-14	945	752.7	28.0
2	0	1-4	313	479.6	63.0
	0.1314 over 15	5-17	1072	889.1	24.1
3	0	1-4	287	463.1	64.6
	0.0735 over 16	6-20	1188	989	21.3
4	0	1-5	356	548.1	57.6
	0.1378 over 19	6-22	1182	940.1	30.2

There is no capital U, so the sets u and [u + U] are identical. The set of [u + U] was divided into five parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4	5
1	*0.036	*0.028	*0.238	*0.002
2		0.923	0.851	0.269
3			0.927	0.311
4				0.364

The first part differs significantly from the other ones, most strongly from the last fifth. The similarity decreases from part to part.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0	1-3	189	348.6	62.0
	0.4691 over 10	4-9	628	504.4	20.6
2	0	1-3	213	371.2	62.9
		4-9	654	526.1	16.8
3	0	1-3	192	371.7	62.9
	0.4208 over 11	4-10	768	592.4	26.5
4	0	1	6	134.2	70.5
	0.2392 over 10	6-26	606	787.8	18.1
5	0	1	9	138.3	72.8
		2-13	1121	1021	14.5

Since a considerable share of the occurrences of u belongs to qu, these occurrences were removed and only the remaining uses of u were analyzed.

The set of these u was divided into four parts. The two-way sample analysis shows how the parts differ:

Part	2	3	4
1	0.056	0.397	0.169
2		0.285	0.623
3			0.582

Unlike for all u, the first part does not differ significantly from the other ones. These results can hardly be compared with those for all u, since the number of parts differs.

The most important deviations from the shape of the distribution in the individual parts are tabulated below:

Part	Significance level	Range	Observed	Expected	% of chi-square
1	0	1	8	104.8	72.2
	0.3419 over 20	6-22	960	837.1	15.3
2	0	1-3	213	371.2	62.9
	0.85.66 over 11	4-12	773	648.8	20.0
3	0	1	6	107.1	67.1
	0.1437 over 18	6-19	899	749	21.7
4	0	1	11	109.4	74.8
	0.3193 over 15	7-18	745	642.9	16.0

The exponential distribution is applicable to all three sets: v, V and [v + V].

The capital V occurs too often up to distance 527 (23 occurrences against 15.7 expected), which contributes 56.2 % of the chi-square value.

The fit over 20 is almost perfect: the significance level is 0.9801 over 20 for v and 0.9726 for [v + V]. The absence of vv (0 occurrences against 7.4 expected) contributes 34.1 % of the chi-square value. Too few v at distances 404-440 (2 occurrences against 9.7 expected) contribute 28.2 % of the chi-square value.

The upper case needs no commentary.

W does not occur in the text.

For x, the exponential distribution gives a good fit. Too few x up to distance 22 (25 occurrences against 38.3 expected) contribute 32.9 % of the chi-square value.

Y and Z

Too few occurrences.

Discussion

Because the software used could not handle long lists, the sets of the most frequent symbols had to be split. The splitting showed that word usage changes within the book: different words are used at its end than at its beginning.

Some distributions of distances between consonants are highly regular, especially in their tails, when the short distances within words are pooled. They are described, with varying precision, by four distributions: negative binomial, exponential, Weibull and lognormal. Sometimes it is rather difficult to decide which distribution gives the better fit.

Compared with the English and German texts studied before, Latin uses only 21 letters: j, k and w are not used, and y and z occur only in a few words of foreign origin.

The negative binomial distribution gave the most frequent fit. This distribution can be expected if no biases occur. For the upper case, the Weibull distribution fits the long distances between rare occurrences.