Distance Analysis of Latin Texts. Titi Livi Ab Urbe Condita Liber 1

Milan Kunz, April, 2003

Abstract

Distances between identical symbols and between punctuation marks in "Titi Livi Ab Urbe Condita. Liber 1" are described, with varying precision, by five distributions: exponential, Erlang, Weibull, lognormal and negative binomial. The fits are sometimes highly significant.

Introduction

This paper continues the study of the statistical properties of distances between identical symbols in different languages. The technique is the same as before; see the other papers in this section.
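The basic quantity throughout is the distance between consecutive occurrences of the same symbol. The Python sketch below shows one way to obtain it. It assumes that distances are counted in character positions over the raw text (spaces and punctuation included), so that a doubled letter such as ff gives distance 1; the function name `symbol_distances` is illustrative and not taken from the author's program.

```python
def symbol_distances(text: str, symbols: str) -> list[int]:
    """Distances between consecutive occurrences of any character in `symbols`.

    Passing "aA" treats both cases as one symbol, as in the [a + A] sets.
    A distance of 1 means the symbol is doubled (e.g. "ff").
    """
    positions = [i for i, ch in enumerate(text) if ch in symbols]
    # Differences of neighbouring positions; n occurrences give n - 1 distances.
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    print(symbol_distances("effugit", "f"))   # the doubled ff
    print(symbol_distances("Aut aut", "aA"))  # A at position 0, a at position 4
```

Combining cases is just a matter of passing both characters, which is how the bracketed [x + X] values in the tables can be reproduced under this assumption.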

Results

Punctuation marks

The distances between consecutive full stops determine the lengths of sentences. There are 785 full stops.

Their distribution is of the Erlang type with a = 2 and b = 0.013208 (chi-square = 12.3399, significance level 0.5000 with 13 degrees of freedom). There is a shortage of short sentences up to distance 34 (46 occurrences against 59.2 expected), which alone makes 23.9 % of the chi-square statistic. Another shortage of full stops between distances 299-364 (26 occurrences against 38.2 expected) makes another 31.8 % of the chi-square statistic.
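The percentages quoted here and throughout are each class's share of the chi-square statistic, (O - E)^2 / E divided by the total. The sketch below assumes that the Erlang parameters are a shape a = 2 and a rate b (the parametrization is not stated) and that the expected count of a class is N times the increment of the continuous CDF. With N = 784 distances between the 785 full stops it gives about 59 expected sentences up to distance 34, close to the reported 59.2, and it reproduces the 23.9 % share. The function names are illustrative.

```python
import math


def erlang_cdf(x: float, shape: int, rate: float) -> float:
    """CDF of the Erlang distribution with integer shape and rate parameter."""
    if x <= 0:
        return 0.0
    bx = rate * x
    # F(x) = 1 - exp(-bx) * sum_{n < shape} (bx)^n / n!
    tail = sum(bx**n / math.factorial(n) for n in range(shape))
    return 1.0 - math.exp(-bx) * tail


def expected_count(n_total: int, lo: float, hi: float, shape: int, rate: float) -> float:
    """Expected number of distances in the class (lo, hi]."""
    return n_total * (erlang_cdf(hi, shape, rate) - erlang_cdf(lo, shape, rate))


def chi2_share(observed: float, expected: float, chi2_total: float) -> float:
    """Percentage of the whole chi-square statistic contributed by one class."""
    return 100.0 * (observed - expected) ** 2 / expected / chi2_total


if __name__ == "__main__":
    # Sentences up to distance 34, a = 2, b = 0.013208, 784 distances assumed.
    print(round(expected_count(784, 0, 34, 2, 0.013208), 1))
    # Reported class: 46 observed, 59.2 expected, total chi-square 12.3399.
    print(round(chi2_share(46, 59.2, 12.3399), 1))
```

The small difference between about 58.9 and 59.2 expected sentences may come from how the original program discretized the classes, which is not described.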

The semicolon (303 occurrences) is modeled poorly by the exponential distribution (significance level 0.0353). There is a shortage of semicolons between distances 280-400 (27 occurrences against 39.1 expected), which makes 19.4 % of the chi-square statistic. The surplus of semicolons between distances 880-1000 (16 occurrences against 8.4 expected) makes 35.4 % of the chi-square statistic.

The distances between consecutive colons follow the Weibull distribution (significance level 0.4458). The surplus of colons between distances 500-900 (22 occurrences against 15.9 expected) makes 49.2 % of the chi-square statistic.

The distribution of distances between consecutive commas is exponential; the tail over 40 gives an almost perfect fit with the significance level 0.9796. There are too few commas up to distance 25 (280 occurrences against 333.7 expected), which makes 43.6 % of the chi-square statistic. From there up to distance 49 there are too many commas (308 occurrences against 242.4 expected), which makes 38.9 % of the chi-square statistic.
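A phrase such as "the tail over 40" is read here, following the remark on cumulating the frequencies of shorter distances, as a test in which all classes up to distance 40 are merged into one class before the chi-square statistic is computed. The sketch below shows that pooling step and takes the significance level from SciPy's `scipy.stats.chi2.sf`. The class layout, the name `pooled_chi2` and the degrees-of-freedom rule (classes minus one minus the number of fitted parameters) are assumptions, not details of the original program.

```python
import numpy as np
from scipy.stats import chi2


def pooled_chi2(upper_bounds, observed, expected, limit, n_params):
    """Chi-square test after merging every class whose upper bound is <= limit.

    upper_bounds: upper distance of each class, increasing.
    observed, expected: counts per class, same length as upper_bounds.
    n_params: number of distribution parameters estimated from the data.
    Returns (statistic, degrees of freedom, significance level).
    """
    bounds = np.asarray(upper_bounds)
    obs = np.asarray(observed, dtype=float)
    expd = np.asarray(expected, dtype=float)
    head = bounds <= limit
    if head.any():
        # One pooled class for the short distances, then the unchanged tail.
        obs = np.concatenate(([obs[head].sum()], obs[~head]))
        expd = np.concatenate(([expd[head].sum()], expd[~head]))
    stat = float(((obs - expd) ** 2 / expd).sum())
    dof = len(obs) - 1 - n_params
    return stat, dof, float(chi2.sf(stat, dof))


if __name__ == "__main__":
    bounds = [1, 2, 3, 4, 5, 6]
    observed = [5, 15, 10, 10, 10, 10]
    expected = [10] * 6
    print(pooled_chi2(bounds, observed, expected, limit=0, n_params=1))
    print(pooled_chi2(bounds, observed, expected, limit=2, n_params=1))
```

In the toy example the first two classes deviate in opposite directions; pooling them lets the deviations cancel, so the statistic drops from 5.0 to 0.0 and the significance level rises to 1. This is the effect the text describes for scattered short distances.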

The spacebar

Each distance greater than 1 between consecutive spaces encloses one word whose length is this distance minus one, so the counts of these distances give the numbers of words of each length. There are 16985 spaces. The results of the tests are tabulated below. In some cases, cumulating the frequencies of the shorter distances into one class improved the fit, since below that limit the counts are scattered and their differences can balance out.
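The relation between spaces and word lengths can be sketched as follows. The sketch assumes that the text has been reduced to words separated by spaces, with punctuation removed before counting (the text does not say how punctuation was treated), and it treats the start and end of the text as if they were spaces. `word_length_counts` is a hypothetical helper.

```python
from collections import Counter


def word_length_counts(text: str) -> Counter:
    """Count words by length from the distances between consecutive spaces.

    A distance d > 1 between neighbouring spaces encloses one word of
    length d - 1; a distance of 1 (a double space) encloses no word.
    """
    # Pad with spaces so that the first and last words are also enclosed.
    spaces = [i for i, ch in enumerate(" " + text + " ") if ch == " "]
    counts = Counter()
    for earlier, later in zip(spaces, spaces[1:]):
        d = later - earlier
        if d > 1:
            counts[d - 1] += 1
    return counts


if __name__ == "__main__":
    print(word_length_counts("iam primum omnium satis constat"))
```

The keys of the counter are word lengths; adding 1 to each key gives the spacebar distance used as the first column of Table 1.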

Table 1 The number of words of different length

Distance (word length + 1)  Words  Distribution, significance level
2                           45     EX, 0.1713
3                           1767   NB, 0; 0.6226 over 10
4                           1545   NB, 0.5899
5                           1841   NB, 0.7491
6                           2473   NB, 0.4966
7                           2517   NB, first half (1262) 0.6683; second half (1254) 0.2420
8                           2179   NB, 0.5295
9                           1655   EX, 0; 0.5389 over 35
10                          1161   NB, 0.7843
11                          857    NB, 0.6745
12                          500    NB, 0.1432
13                          240    EX, 0.4140
14                          105    WE, 0.6906
15                          62     LN, 0.3996
16                          20     EX, 0.0602
17                          0      -
18                          1      -

Words of the most frequent lengths are distributed according to the negative binomial distribution, which is also the most frequently applicable distribution overall: it fits 9 cases. The Weibull distribution is applicable at distance 14, the lognormal distribution at distance 15, and the exponential distribution at four distances.
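The text does not state how the negative binomial parameters were estimated. As one simple possibility, the sketch below fits a negative binomial shifted to start at distance 1 by the method of moments and uses SciPy's `scipy.stats.nbinom` (whose `n` parameter may be non-integer) for the expected class counts. The shift, the estimator and the function names are assumptions.

```python
import numpy as np
from scipy.stats import nbinom


def fit_shifted_nbinom(distances):
    """Method-of-moments negative binomial for distances >= 1.

    The distances are shifted by one so that the support starts at 0,
    then n = m^2 / (v - m) and p = m / v, which requires v > m.
    """
    x = np.asarray(distances, dtype=float) - 1.0
    m, v = x.mean(), x.var(ddof=1)
    if v <= m:
        raise ValueError("no overdispersion: negative binomial not applicable")
    return m * m / (v - m), m / v


def nbinom_expected(distances, max_distance):
    """Expected counts for distances 1..max_distance under the fitted model."""
    n, p = fit_shifted_nbinom(distances)
    k = np.arange(1, max_distance + 1)
    # loc=1 shifts the support back so that distance 1 is the first class.
    return len(distances) * nbinom.pmf(k, n, p, loc=1)
```

By construction the fitted model reproduces the sample mean and variance of the distances; the expected counts can then be compared with the observed ones by a chi-square test as in the other sketches.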

Notes to some outstanding results:

1-letter words repeat more often than expected within distances 253-370 (11 occurrences against 6.2 expected). This makes 75.7 % of the chi-square statistic.

2-letter words. The shape of the negative binomial distribution is disturbed by the shortage of short distances up to 3 (379 occurrences against 496.1 expected), which makes 45.8 % of the chi-square statistic. The surplus of distances 4-6 (442 occurrences against 356.8 expected) makes 31.4 % of the chi-square statistic.

3-letter words. The shortage of distances 11-13 (117 occurrences against 111.2 expected) makes 46.3 % of the chi-square statistic.

4-letter words. No great deviation from the expected values.

5-letter words repeat immediately less often than expected (323 occurrences against 360.2 expected). This makes 26.7 % of the chi-square statistic. The shortage of distances 29-30 (4 occurrences against 8.1 expected) makes 14.6 % of the chi-square statistic.

6-letter words are the most frequent. Before testing, it was necessary to divide the set into two halves; according to the two-way sample analysis, the halves are different.
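The "two-way sample analysis" is not specified further. One plausible reading, used here only as an illustration, is a chi-square test of homogeneity on a 2 x k table of class counts from two consecutive parts of the distance sequence. The sketch splits the sequence into consecutive parts and applies `scipy.stats.chi2_contingency` to every pair; the class edges and all function names are hypothetical. Presumably the asterisked entries in the later comparison tables correspond to p-values below 0.05.

```python
import numpy as np
from scipy.stats import chi2_contingency


def split_parts(distances, n_parts):
    """Split the distance sequence, in text order, into n_parts consecutive parts."""
    return np.array_split(np.asarray(distances), n_parts)


def compare_parts(part_a, part_b, edges):
    """p-value of a chi-square homogeneity test between two parts.

    edges: class boundaries for np.histogram; classes that are empty in
    both parts are dropped so that every expected count is positive.
    """
    counts_a, _ = np.histogram(part_a, bins=edges)
    counts_b, _ = np.histogram(part_b, bins=edges)
    table = np.vstack([counts_a, counts_b])
    table = table[:, table.sum(axis=0) > 0]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


def comparison_matrix(distances, n_parts, edges):
    """p-values {(i, j): p} for all pairs of parts, numbered from 1."""
    parts = split_parts(distances, n_parts)
    return {
        (i + 1, j + 1): compare_parts(parts[i], parts[j], edges)
        for i in range(n_parts)
        for j in range(i + 1, n_parts)
    }
```

The returned dictionary has the same upper-triangular layout as the part-comparison tables in the letter sections.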

The fit of the first half is disturbed by the shortage of distances 18-19 (11 occurrences against 20.8 expected), which makes 23.6 % of the chi-square statistic. The surplus of distances 23-24 (15 occurrences against 9 expected) makes 19.4 % of the chi-square statistic.

The fit of the second half is disturbed by the peak of distances 7-8 (157 occurrences against 132 expected), which makes 24.3 % of the chi-square statistic. The shortage of distances 20-22 (9 occurrences against 17.9 expected) makes 22.6 % of the chi-square statistic.

7-letter words repeat less often than expected within distances 28-30 (11 occurrences against 18 expected). This makes 18.1 % of the chi-square statistic. Then they repeat more often than expected within distances 35-39 (15 occurrences against 10.1 expected). This makes 15.9 % of the chi-square statistic.

8-letter words are fitted poorly because of their shortage within distances 11-14 (171 occurrences against 220.9 expected), which makes 17.7 % of the chi-square statistic. The immediately following surplus of distances 15-17 (120 occurrences against 78.4 expected) makes 29.9 % of the chi-square statistic.

9-letter words are fitted rather well. They repeat too often within distances 5-9 (286 occurrences against 260.9 expected), which makes 24.9 % of the chi-square statistic. The shortage within distances 19-22 (65 occurrences against 80 expected) makes 29.1 % of the chi-square statistic.

10-letter words repeat too often at distances over 89 (16 occurrences against 9 expected). This makes 41.9 % of the chi-square statistic.

11-letter words repeat less often than expected within distances 60-77.1 (60 occurrences against 77.1 expected). This makes 23.7 % of the chi-square statistic. The shortage within distances 115-134 (2 occurrences against 7.4 expected) makes 24.9 % of the chi-square statistic.

Longer words need no special comments.

Distances between individual letters

The results for all letters are presented in a table giving the frequencies of all symbols and the significance levels of the chi-square tests performed. Commentaries on the individual letters of the alphabet follow. Values in square brackets give the corresponding values for the combined lower and upper cases.

Table 2 Survey of results

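Notes:
EX = exponential distribution
WE = Weibull distribution
LN = lognormal distribution
NB = negative binomial distribution
The number following the distribution code is the significance level of the chi-square test; "over N" marks a test in which the frequencies of distances up to N were cumulated into one class.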

Symbol  Small               Capital             Both cases                  Ratio C/B %
a       8378, LN, NB        239, WE, 0.4341     8617, LN, NB                2.77
b       1584, EX, 0.2605    21, no test         1605, EX, 0.9796 over 40    1.31
c       3665, NB            125, WE, 0.5943     3790, NB, EX                3.30
d       2835, EX            38, WE, 0.2942      2873, EX, NB                1.32
e       11367, NB, LN       100, EX, 0.8694     11467, NB, LN               0.87
f       1007, NB, 0.2520    58, LN, 0.6581      1065, NB, 0.8555            5.44
g       1207, NB, 0.2010    21, no test         1228, NB, 0.1261            1.71
h       398, WE, 0.5031     85, WE, 0.6098      483, WE, 0.9393             17.52
i       11097, NB           182, WE, 0.0282     11279, NB                   1.61
j       0                   0                   0                           -
k       0                   0                   0                           -
l       2849, NB            121, WE, 0.0016     2970, NB                    4.07
m       5775, NB            61, WE              5836, NB                    1.04
n       6019, NB            79, WE, 0.8162      6098, NB                    1.29
o       5082, NB, EX        7                   5089, NB, LN                0.14
p       2723, NB            99, WE, 0.0990      2822, NB                    3.51
q       1633, WE, 0.5848    56, EX, 0.4619      1689, EX, 0.9387 over 80    3.32
r       6391, NB            256, LN, 0.3145     6647, NB                    3.83
s       6922, NB            153, LN, 0.2388     7075, NB                    2.16
t       7615, NB            235, LN, 0.2695     7850, NB                    2.99
u       8882, NB            0                   8882, NB                    0
v       937, EX, 0.1191     63, EX, 0.1892      1000, EX, 0.9726            6.3
w       0                   0                   0                           -
x       462, EX, 0.1239     0                   462, EX, 0.1239             0
y       14, no test         0                   14, no test                 0
z       2                   0                   2                           0

The last column gives the percentage of capital letters among all occurrences of the letter. Since capital letters are used at the beginning of sentences and of proper names, it can be concluded that no proper name and no sentence starts with U, in contrast with H, where capitals make 17.52 % of all occurrences.
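The last column of Table 2 is simple arithmetic: capitals divided by the sum of both cases, times 100. A small Python check against two rows of the table (the function name is illustrative):

```python
def capital_ratio(capital: int, small: int) -> float:
    """Capital occurrences as a percentage of both cases together."""
    total = capital + small
    return 100.0 * capital / total if total else 0.0


if __name__ == "__main__":
    # (letter, small, capital) taken from Table 2.
    for letter, small, capital in [("a", 8378, 239), ("b", 1584, 21)]:
        print(letter, round(capital_ratio(capital, small), 2))
```

For a this gives 2.77 and for b 1.31, matching the table.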

The results of statistical tests can be tabulated:

Distribution        Lower case  Upper case  Combined
No test             4           9           3
Negative binomial   14          0           13
Exponential         4           2           6
Weibull             2           10          2
Lognormal           1           4           1

For the upper case, only 16 letters give results. The Weibull distribution is the most frequent, followed by the lognormal distribution. For the lower case and for the combined cases, the negative binomial distribution is the most frequent, followed by the exponential and the Weibull distributions; the lognormal distribution fits only one case. The significance levels are sometimes practically zero; only moving the lowest tested distance upward, by pooling the shorter distances into one class, increases the significance of the chi-square tests. The commentaries on the individual letters follow.
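The choice among the continuous candidates can be illustrated with SciPy: fit each model with the origin fixed at zero, compute expected counts in a set of classes, and compare the chi-square significance levels. This is a sketch under assumptions (maximum-likelihood fits via `.fit(..., floc=0)`, classes at empirical quantiles, degrees of freedom reduced by the number of free parameters); the original estimation and classing procedure is not described, and the negative binomial would be handled separately as a discrete model.

```python
import numpy as np
from scipy import stats

CANDIDATES = {
    "EX": stats.expon,
    "WE": stats.weibull_min,
    "LN": stats.lognorm,
}


def compare_fits(distances, n_classes=10):
    """Chi-square significance level of each candidate model, keyed EX/WE/LN."""
    x = np.asarray(distances, dtype=float)
    # Class edges at empirical quantiles; duplicates caused by ties are removed.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_classes + 1)))
    edges[0] = 0.0
    observed, _ = np.histogram(x, bins=edges)
    results = {}
    for code, dist in CANDIDATES.items():
        params = dist.fit(x, floc=0)
        frozen = dist(*params)
        probs = np.diff(frozen.cdf(edges))
        probs[-1] = frozen.sf(edges[-2])  # the last class is open-ended
        expected = len(x) * probs
        stat = ((observed - expected) ** 2 / expected).sum()
        n_free = len(params) - 1  # loc is fixed at 0
        dof = len(observed) - 1 - n_free
        results[code] = float(stats.chi2.sf(stat, dof)) if dof > 0 else float("nan")
    return results
```

The code with the highest significance level would then be reported, which mirrors how the tables name one or two distributions per letter.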

A

The frequency of the upper case A allowed a separate test. The good fit with the Weibull distribution (significance level 0.434) is worsened by too few repetitions within distances 318-634 (39 occurrences against 48.3 expected), which make 46.9 % of the chi-square statistic.

The distribution of distances between the lower case a can be modeled by the lognormal distribution. There is no doubled aa, but otherwise a repeats too often within short distances.

The set was divided into four parts. The two-way sample analysis shows how the parts of the lower case differ:

Part  2        3        4
1     *0.048   *0.010   *0
2              0.554    0.068
3                       0.219

Note: An asterisk marks a significant difference between the tested parts.

The first quarter differs significantly from all other parts, and the difference increases toward the later parts.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Range    Observed  Expected  % of chi-square
1     2-3      308       195.2     32.9
      4-7      510       642.5     14.5
      26-28    71        39.9      12.2
2     2-4      388       305.4     26.6
3     2-5      563       445.3     29.3
      6-13     738       876.2     20.6
4     41-43    34        17        19.2
      over 85  0         13.5      15.1
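Tables such as the one above list the classes with the largest contributions to the chi-square statistic. A small sketch of that selection, assuming per-class observed and expected counts are available; the helper name and the `top` cutoff are illustrative.

```python
def largest_disturbances(ranges, observed, expected, top=3):
    """Return (range, observed, expected, % of chi-square) for the top classes."""
    terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    total = sum(terms)
    # Largest contribution first; equal contributions keep their original order.
    order = sorted(range(len(terms)), key=lambda i: terms[i], reverse=True)
    return [
        (ranges[i], observed[i], expected[i], round(100.0 * terms[i] / total, 1))
        for i in order[:top]
    ]
```

For example, classes "1", "2-3" and "4-5" with 10, 20 and 30 observed against 20 expected each give two rows with 50.0 % each, while the well-fitted middle class is not listed.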

The distances between both cases [a + A] are fitted poorly by the lognormal distribution and by the negative binomial distribution. There is no doubled Aa or aa, but otherwise a repeats too often within short distances.

The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.052    *0.002   *0
2              0.304    *0.035
3                       0.282

The first quarter differs from all other parts (from the second one only marginally, 0.052), and the difference increases toward the later parts.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Type  Significance       Range    Observed  Expected  % of chi-square
1     LN                       2-3      332       214.2     35.5
                               4-5      268       350.1     10.6
                               26-28    65        38.5      10.0
2     NB    0; 0.8994 over 12  1        1         157.9     77.7
3     NB    0; 0.4487 over 12  1        1         153.6     75.3
                               2-13     1345      1478.2    13.5
4     LN                       41-43    32        17.1      15.3
                               over 85  0         13.3      11.1

Adding A improved the fit.

B

The distribution of distances between consecutive occurrences of b is exponential. There are too few b within distances 179-210 (37 occurrences against 49.8 expected), which contributes 22.3 % of the chi-square statistic. For [b + B], this deformation lies within distances 172-202 (29 occurrences against 52.7 expected) and contributes 46.9 % of the chi-square statistic. Including B improved the fit.

C

The distribution of distances of the upper case C is described by the Weibull distribution. There are too many C within short distances up to 80 (26 occurrences against 20.9 expected), which contributes 33.5 % of the chi-square statistic.

The sets of c and of [c + C] were each divided into two parts.

The distribution of distances of the first part of c is described by the negative binomial distribution (significance level 0.5105 over 20, 0.8758 over 40) or by the exponential distribution (significance level 0.6806 over 20). According to both distributions, there are too many c within distances 84-90 [86-94] (35 [44] occurrences against 35 [30.4] expected). This contributes 23.6 [13.3] % of the chi-square statistic.

The distribution of distances of the second part of c is described by the negative binomial distribution (significance level 0.4667 over 40) or by the exponential distribution (significance level 0.5343 over 40). There are too many c within distances 13-35 (711 occurrences against 642.2 expected), which contributes 28.0 % of the chi-square statistic. Too many c within distances 116-127 (24 occurrences against 15.4 expected) make 18.0 % of the chi-square statistic.

Both parts are similar (test value 0.885).

The distribution of distances of the first part of [c + C] is described by the negative binomial distribution (significance level 0.4058 over 20). [c + C] repeat less often than expected up to distance 9 (405 occurrences against 477.5 expected), which contributes 28.4 % of the chi-square statistic. Then there are too many [c + C] within distances 10-34 (884 occurrences against 783.4 expected), which contributes 30.8 % of the chi-square statistic. There are too many [c + C] within distances 111-119 (23 occurrences against 13.6 expected), which contributes 15.5 % of the chi-square statistic.

The distribution of distances of the second part of [c + C] is described by the exponential distribution (significance level 0.3212 over 40). [c + C] repeat less often than expected up to distance 9 (422 occurrences against 495.1 expected), which contributes 30.4 % of the chi-square statistic. There are too many [c + C] within distances 86-94 (46 occurrences against 29.3 expected), which contributes 15.8 % of the chi-square statistic.

Both parts are similar (test value 0.921).

D

The fit of the upper case D by the Weibull distribution needs no commentary.

For d as well as for [d + D], both the exponential and the negative binomial distributions are applicable.

Both sets were divided into two parts.

The distributions of distances of both parts of d are described by the exponential distribution (significance levels 0.2141 and 0.3536, respectively).

In the first part, the greatest disturbance is due to too many d within distances 193-217 (12 occurrences against 6.6 expected), which contributes 22.3 % of the chi-square statistic.

The distribution of distances of the second part of d has no single great deviation from the expected values.

Both parts are similar (test value 0.758).

The distributions of distances of both parts of [d + D] are described by the exponential distribution (significance levels 0.2139 and 0.1200 over 10, respectively).

In the first part, the greatest disturbance is due to too many [d + D] within distances 156-167 (15 occurrences against 9 expected), which contributes 21.2 % of the chi-square statistic.

In the second part, the greatest disturbance is due to too many [d + D] within distances 61-72 (106 occurrences against 83.7 expected), which contributes 22.0 % of the chi-square statistic.

Both parts are similar (test value 0.673).

E

The distribution of distances between the upper case E is exponential. There are too many E within distances 2101-2900 (12 occurrences against 8.3 expected), which contributes 64.6 % of the chi-square statistic.

The distribution of distances between the lower case e can be modeled by the negative binomial distribution. There is practically no doubled ee (at most one in each part), but otherwise e repeats too often within short distances.

The set of e distances was divided into five parts. The two-way sample analysis shows how the parts of the lower case differ:

Part  2        3        4        5
1     0.152    *0.003   0.820    0.118
2              0.128    0.230    0.905
3                       *0.007   0.157
4                                0.183

The first fifth differs significantly from the third one, and the third one from the fourth one.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Type  Significance       Range    Observed  Expected  % of chi-square
1     NB    0.2340 over 20     1        1         225.1     75.0
                               2-3      477       385.8     7.2
2     NB    0.6630 over 12     1        1         216.7     74.6
                               2-3      500       373.4     14.9
3     LN    0                  1        0         17.1      19.9
                               16-17    75        122.8     21.6
                               18-29    305       229.5     29.8
4     NB    0.4540 over 13     1        1         223.4     71.1
                               2-3      531       383.1     18.3
5     LN    0.1030 over 25     1        0         13.4      18.2
                               21-23    92        59.2      24.6

The 3rd and 5th parts have tails too long to be fitted by the negative binomial distribution.

The set of [e + E] distances was divided into five parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4        5
1     0.197    *0.003   0.784    0.107
2              0.093    0.310    0.746
3                       *0.007   0.178
4                                0.181

The first fifth differs significantly from the third one, and the third one from the fourth one.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Type  Significance       Range    Observed  Expected  % of chi-square
1     NB    0.1494 over 14     1        2         231.4     75.3
                               2-3      489       396       7.2
2     NB    0.6970 over 12     1        2         223.8     74.9
                               2-3      513       385       14.5
3     LN    0.0913 over 23     1        0         17.7      18.6
                               16-17    77        124.2     18.9
                               18-26    271       197       30.2
4     NB    0.4830 over 12     1        1         229.5     71.7
                               2-3      544       393.1     18.3
5     LN    0.0638 over 25     1        0         13.4      19.8
                               16-26    344       292.6     33.8

The 3rd and 5th parts have tails too long to be fitted by the negative binomial distribution.

F

The distribution of F is fitted well by the lognormal distribution.

The distributions of the lower case f and of [f + F] are fitted by the negative binomial distribution. There are too many doubled ff (28 occurrences against 8.5 expected), which contributes 76.6 % of the chi-square statistic. [f + F] is fitted rather well, with the significance level 0.8555.

G

The distances of g and of [g + G] are fitted by the negative binomial distribution. g and [g + G] repeat less often than expected up to distance 30 (279 [287] occurrences against 318.7 [329.1] expected), which contributes 27.6 [26.7] % of the chi-square statistic. There are too few distances 248-278 (16 [15] occurrences against 26.3 [25.9] expected), which contributes 22.3 [22.9] % of the chi-square statistic.

H

The distribution of this letter is fitted by the Weibull distribution. For the lower case h, there are too few distances 262-376 (43 occurrences against 54.7 expected) and too many distances 724-838 (17 occurrences against 11.2 expected); these contribute 33.9 % and 40.9 % of the chi-square statistic, respectively. Combining with the capital H improved the fit; it is worsened only by too many distances 789-937 (12 occurrences against 8.6 expected), which contributes 31.6 % of the chi-square statistic.

I

The distribution of the capital I is fitted by the Weibull distribution. The greatest disturbance is a surplus of counts within distances 235-452 (54 occurrences against 37.1 expected), which contributes 54.4 % of the chi-square statistic. Then there are too few distances 670-886 (11 occurrences against 20 expected), which contributes 28.5 % of the chi-square statistic.

The negative binomial distribution is applicable for i as well as for [i + I].

The set of i distances was divided into five parts. The two-way sample analysis shows how the parts of the lower case differ:

Part  2        3        4        5
1     0.609    0.071    *0.001   *0.001
2              0.197    *0.007   *0.007
3                       0.161    0.157
4                                0.184

The first and the second fifths differ significantly from the two last ones.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0.0504 over 21     1        32        197.2     70.6
                         16-17    132       92.6      8.5
2     0.8128 over 9      1        31        201.2     71.2
                         2-9      1260      1075.2    17.3
3     0.4758 over 10     1        31        208.7     66.9
                         6-9      578       443.4     19.6
4     0.2455 over 10     1        37        216.6     75.0
                         2-12     1554      1359.7    14.6
5     0.1453 over 21     1        45        214.1     65.9
                         2-18     1863      1640.3    18.2

The set of [i + I] distances was divided into only four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.832    0.077    *0.006
2              0.120    *0.011
3                       0.318

The similarity decreases consecutively; the last quarter differs significantly from the first and the second ones. The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0.0640 over 18     1        30        204.6     70.0
                         2-9      1237      1089.9    10.1
                         16-17    136       92.9      9.5
2     0.4578 over 8      1        30        206.9     73.7
                         6-9      1289      1099.7    17.1
3     0.4782 over 8      1        33        216.2     69.3
                         6-9      583       453       17.4
4     0.2310 over 10     1        37        221.8     74.5
                         2-12     1587      1385.1    15.0

Combining both cases improved the fit.

J and K

These letters are not used in the text.

L

The occurrences of the capital L are fitted poorly by the Weibull distribution.

The distance frequencies of l and [l + L] are fitted by the negative binomial distribution.

The sets of l and [l + L] distances were divided into two parts. The two-way sample analysis shows that the parts differ significantly in both cases.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part      Significance       Range    Observed  Expected  % of chi-square
l1        0.8949 over 20     1        188       36.2      92.4
                             2-12     226       342.2     5.7
l2        0.2431 over 20     1        188       32.3      91.6
                             2-12     197       310.4     5.1
[l + L]1  0.8246 over 20     1        190       39        92.7
                             2-12     256       367       5.3
[l + L]2  0.1228 over 20     1        186       35.4      88.7
                             2-12     211       337.6     6.6

M

The occurrences of the capital M are fitted poorly by the Weibull distribution.

The distance frequencies of m and [m + M] are fitted by the negative binomial distribution.

The sets of m and [m + M] distances were divided into three parts. The two-way sample analysis shows how the parts of the lower case differ:

Part  2        3
1     *0.026   *0.036
2              0.871

The first third differs significantly from the two last ones.

The two-way sample analysis shows how the parts of [m + M] differ:

Part  2        3
1     *0.038   *0.024
2              0.873

The similarity to the first third decreases consecutively.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part      Significance       Range    Observed  Expected  % of chi-square
m1        0.9773 over 13     1        15        98.0      71.9
                             5-13     581       506.7     11.2
m2        0.7837 over 10     1        21        91.2      65.8
                             2-14     963       857.8     20.2
m3        0.9909 over 12     1        24        91.9      66.6
                             6-14     608       536.7     13.2
[m + M]1  0.4490 over 9      1        15        100.1     70.7
                             6-13     591       515.1     11.1
[m + M]2  0.8589 over 10     1        21        93.7      65.4
                             8-14     478       402.2     16.5
[m + M]3  0.9861 over 12     1        24        93.2      66.2
                             6-14     615       543.4     12.7

N

The occurrences of the capital N are fitted well by the Weibull distribution.

The distance frequencies of n and [n + N] are fitted by the negative binomial distribution.

The sets of n and [n + N] distances were divided into three parts. The two-way sample analysis shows how the parts of the lower case differ:

Part  2        3
1     *0.046   0.271
2              0.384

The first third differs significantly from the second one.

The two-way sample analysis shows how the parts of [n + N] differ:

Part  2        3
1     *0.038   0.163
2              0.519

The first third differs significantly from the second one.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part      Significance       Range    Observed  Expected  % of chi-square
n1        0                  1        13        104.4     52.5
                             5-25     1185      1007.3    23.1
n2        0.6723 over 25     1        10        99.2      58.0
                             6-10     436       348.8     15.8
n3        0.6711 over 15     1        5         101.4     62.3
                             6-18     916       759.1     23.2
[n + N]1  0.4490 over 9      1        13        107.5     53.7
                             6-25     1198      1025.5    22.0
[n + N]2  0.3575 over 24     1        11        102.1     57.5
                             6-10     443       357.1     14.6
                             16-30    592       506.6     10.5
[n + N]3  0.7519 over 15     1        5         103.5     64.1
                             6-18     925       771.6     22.3

O

The distributions of o and [o + O] are fitted poorly by the negative binomial distribution. In some parts, the tails (longer distances between consecutive occurrences) are more frequent than expected; there, other distributions perform better.

The sets of o and [o + O] distances were divided into three parts. The two-way sample analysis shows how the parts of the lower case differ:

Part  2        3
1     *0.002   0.947
2              *0.001

The second third differs significantly from the first and third ones.

The two-way sample analysis shows how the parts of [o + O] differ:

Part  2        3
1     *0.002   0.948
2              *0.002

The second third differs significantly from the first and third ones.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part      Type  Significance       Range    Observed  Expected  % of chi-square
o1        NB    0.9521 over 26     1        2         69.9      78.6
                                   10-26    666       591.3     12.0
o2        EX    0                  1-2      58        149.4     47.4
                                   8-19     601       487.8     22.3
o3        EX    0.8249 over 20     1        0         68.4      63.9
                                   6-21     631       531.1     19.5
[o + O]1  NB    0.9567 over 26     1        2         70.2      78.4
                                   10-26    668       592.4     12.4
[o + O]2  LN    0                  30-52    314       238       39.9
                                   over 162 6         26.6      25.0
[o + O]3  NB    0.4789 over 14     1        1         70.2      73.5
                                   8-14     378       322.8     10.2

P

The upper case P can be fitted only poorly by the Weibull distribution; p and [p + P] are fitted by the negative binomial distribution. The sets of p and [p + P] distances were divided into two parts. The two-way sample analysis shows that the two parts are poorly comparable, with test values 0.112 [0.059].

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part      Significance       Range    Observed  Expected  % of chi-square
p1        0.0978             1        49        32.3      40.9
          0.7550 over 10     2-15     349       379.6     11.6
p2        0.1903             61-72    65        83.2      17.4
          0.7308 over 9      73-107   173       146.4     22.8
[p + P]1  0.1291             1        49        34.9      28.4
[p + P]2  0.1505             61-72    62        84.8      26.8
          0.5285 over 9      84-107   108       87.3      22.9

Q

The consonant q is used only in connection with u as qu.

The upper case Q is fitted by the exponential distribution. The fit is worsened by too many repetitions within distances 1668-2500 (13 occurrences against 8.3 expected), which make 71.1 % of the chi-square statistic.

The distribution of q is fitted almost equally well by the exponential distribution and by the Weibull distribution (significance levels):

Distribution  over 30  over 80
Exponential   0.0972   0.9160
Weibull       0.1860   0.8980

There are too few repetitions within distances 193-209 (17 occurrences against 24.4 expected), which make 13.8 % of the chi-square statistic. Immediately after, too many repetitions follow within distances 210-226 (26 occurrences against 18.9 expected). This slight distortion makes 16.7 % of the chi-square statistic.

The distribution of [q + Q] is fitted by the exponential distribution. There is no doubled [q + Q] (0 occurrences against 23.8 expected), which contributes 53.1 % of the chi-square statistic. Another 23.1 % comes from the surplus of distances 53-87 (364 occurrences against 309.3 expected).

R

The upper case R is fitted by the lognormal distribution. The fit is worsened by too many repetitions within distances 1233-1540 (10 occurrences against 6.5 expected), which make 41.1 % of the chi-square statistic.

The distribution of r is fitted by the negative binomial distribution. The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.280    0.081    *0.010
2              0.515    0.149
3                       0.432

The similarity decreases consecutively; the last quarter differs significantly from the first one. The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0.2677 over 31     1-6      303       433.9     50.8
                         7-31     1014      850.5     33.5
2     0.5570 over 28     1-6      327       449.8     51.2
                         7-26     920       768.8     33.7
3     0.2855 over 30     1-6      327       456.5     50.3
                         7-26     932       770.3     35.2
4     0.2376 over 27     1-6      338       464.7     53.9
                         7-26     929       771.6     34.6

The distribution of [r + R] is fitted by the negative binomial distribution. The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.450    0.233    0.082
2              0.671    0.335
3                       0.588

Combining both cases decreased the differences between the parts. The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0.3133 over 23     1-6      333       472.8     52.2
                         7-31     1064      902.2     32.6
2     0.1400 over 20     1-6      354       485.3     51.0
                         7-26     1071      906.6     33.7
3     0.5028 over 26     1-6      347       488.6     52.7
                         7-26     983       805.4     28.9
4     0                  1-6      367       495.7     54.9
                         7-26     970       808.2     34.1

S

The upper case S is fitted by the lognormal distribution.

The distribution of the lower case s and of [s + S] is described poorly by the negative binomial distribution.

The set of s was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.555    0.995    0.388
2              0.565    0.143
3                       0.391

The first and the third parts are very similar. The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0.0061             1        133       100.2     29.4
      0.9167 over 10     2-6      282       345.8     32.2
2     0.0199             2-5      296       340.6     16.7
      0.4318 over 9      6-9      313       269.4     20.1
3     0                  1        151       101.0     27.1
      0.0791 over 24     2-5      297       348.5     9.3
4     0                  1        160       103.7     37.0
      0.1611 over 10     2-5      276       356.3     21.9

The distribution of [s + S] is fitted by the negative binomial distribution. The set was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.285    0.888    0.450
2              0.233    0.069
3                       0.547

The first and the third parts are similar. The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0.0079             1        134       106.9     19.3
      0.9331 over 10     2-5      298       367.4     36.8
                         6-14     616       560       15.7
2     0.0075             2-5      313       356.7     13.9
      0.3233 over 10     6-9      322       281.4     15.2
                         39-47    54        78.8      18.9
3     0.1916 over 23     1        156       108.1     26.9
                         2-5      309       371.5     13.4
                         6-13     587       516.9     13.4
4     0                  1        148       103.1     31.0
                         2-5      283       353.1     22.1

T

The distribution of the capital T has the lognormal shape. The significance level of the tail over 400 is 0.8053. There are too many distances 167-392 (60 occurrences against 45.8 expected), which contributes 49.9 % of the chi-square statistic.

The distances of the lower case t, as well as of both cases [t + T], follow the negative binomial distribution.

The set of t was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2        3        4
1     0.930    0.072    *0.006
2              0.087    *0.008
3                       0.352

The first and the second parts are very similar. The similarity deteriorates consecutively.

The most important disturbances from the shape of the distribution in all parts are tabulated:

Part  Significance       Range    Observed  Expected  % of chi-square
1     0                  1-4      267       458       59.5
      0.2735 over 21     5-14     918       721       29.5
2     0                  1-4      289       455.3     62.2
      0.1605 over 17     5-20     1159      963.9     25.5
3     0                  1-4      259       435.5     63.9
      0.1849 over 28     5-20     1145      947.5     22.6
4     0                  1-5      332       515.3     57.6
                         6-26     1242      1016.8    28.6

The set of [t + T] was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2      3      4
1     0.815  0.121  0.052
2            0.190  0.088
3                   0.685

The differences between parts grow step by step with their separation in the text. The most important deviations from the shape of the distribution in each part are tabulated:

Part  Chi-square      Range  Observed  Expected  % of chi-square
1     0               1-4    293       483.5     61.2
      0.2478 over 21  5-14   945       752.7     28.0
2     0               1-4    313       479.6     63.0
      0.1314 over 15  5-17   1072      889.1     24.1
3     0               1-4    287       463.1     64.6
      0.0735 over 16  6-20   1188      989       21.3
4     0               1-5    356       548.1     57.6
      0.1378 over 19  6-22   1182      940.1     30.2

U

There is no capital U, so the sets u and [u + U] are identical. The set of [u + U] was divided into five parts. The two-way sample analysis shows how the parts differ:

Part  2       3       4       5
1     *0.036  *0.028  *0.238  *0.002
2             0.923   0.851   0.269
3                     0.927   0.311
4                             0.364

The first part differs significantly from all the others, and the last (fifth) part is also clearly dissimilar from parts 2-4. Among the remaining parts, the similarity decreases step by step.

The most important deviations from the shape of the distribution in each part are tabulated:

Part  Chi-square      Range  Observed  Expected  % of chi-square
1     0               1-3    189       348.6     62.0
      0.4691 over 10  4-9    628       504.4     20.6
2     0               1-3    213       371.2     62.9
                      4-9    654       526.1     16.8
3     0               1-3    192       371.7     62.9
      0.4208 over 11  4-10   768       592.4     26.5
4     0               1      6         134.2     70.5
      0.2392 over 10  6-26   606       787.8     18.1
5     0               1      9         138.3     72.8
                      2-13   1121      1021      14.5

Since a considerable share of the occurrences of u belongs to qu, these occurrences were removed and only the remaining uses of u were analyzed.
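A minimal sketch of the distance extraction with the qu exclusion, assuming distances are differences of character positions in the text as given (whether spaces and punctuation were counted is not stated here). `symbol_distances` and its `skip_after` parameter are hypothetical.

```python
def symbol_distances(text, symbols, skip_after=""):
    """Distances between consecutive occurrences of any character in `symbols`.

    Positions are plain character indices in `text` (an assumption: spaces and
    punctuation are counted). An occurrence directly preceded by a character in
    `skip_after` is ignored, e.g. skip_after="qQ" drops the u of qu.
    """
    positions = [
        i
        for i, ch in enumerate(text)
        if ch in symbols and not (i > 0 and text[i - 1] in skip_after)
    ]
    # Differences between neighbouring positions; adjacent letters give distance 1.
    return [later - earlier for earlier, later in zip(positions, positions[1:])]
```

For example, `symbol_distances("quod ubi utque", "u")` gives `[4, 4, 3]`, while `symbol_distances("quod ubi utque", "u", skip_after="qQ")` gives `[4]`. To count letters only, the non-letters can be stripped first, e.g. with `"".join(c for c in text if c.isalpha())`.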

The set of the remaining u was divided into four parts. The two-way sample analysis shows how the parts differ:

Part  2      3      4
1     0.056  0.397  0.169
2            0.285  0.623
3                   0.582

The first part does not differ significantly from the others. These results are hard to compare with those for all u, because the number of parts differs.

The most important deviations from the shape of the distribution in each part are tabulated:

Part  Chi-square      Range  Observed  Expected  % of chi-square
1     0               1      8         104.8     72.2
      0.3419 over 20  6-22   960       837.1     15.3
2     0               1-3    213       371.2     62.9
      0.85.66 over 11 4-12   773       648.8     20.0
3     0               1      6         107.1     67.1
      0.1437 over 18  6-19   899       749       21.7
4     0               1      11        109.4     74.8
      0.3193 over 15  7-18   745       642.9     16.0

V

The exponential distribution fits all three sets: V, v and [v + V].

Capital V occurs too often at distances up to 527 (23 occurrences against 15.7 expected). This contributes 56.2 % of the chi-square test value.

The fit of the tail over distance 20 is almost perfect: the chi-square test value is 0.9801 for the v set and 0.9726 for the [v + V] set. There is no vv (distance 1), against 7.4 occurrences expected, which contributes 34.1 % of the chi-square test value. Too few v at distances 404-440 (2 occurrences against 9.7 expected) make up 28.2 % of the chi-square test value.
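Values such as 0.9801 and 0.9726 read as significance levels (upper-tail probabilities of the chi-square statistic), not as raw statistics: the sentence-length fit in the punctuation section, chi-square 12.3399 with 13 degrees of freedom, is reported at significance 0.5000. A minimal stdlib sketch of a tail test "over N" follows; how the author counted degrees of freedom is not stated, so `n_params` is an assumption, and `chi2_sf` and `tail_fit` are hypothetical names.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with df degrees of freedom."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n)).
    term = total = 1.0 / a
    n = 1
    while abs(term) > 1e-15 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def tail_fit(observed, expected, n_params=0):
    """Chi-square statistic, degrees of freedom and significance for binned counts.

    Pass only the bins beyond the cut-off (e.g. distances over 20) to test
    the tail alone; n_params is the number of fitted parameters (an assumption
    about how the degrees of freedom were counted).
    """
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_params
    return stat, df, chi2_sf(stat, df)
```

With this function, `chi2_sf(12.3399, 13)` comes out at about 0.50, matching the significance quoted for the sentence lengths.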

The upper case needs no commentary.

W

No occurrence.

X

The exponential distribution gives a good fit. The shortage of x at distances up to 22 (25 occurrences against 38.3 expected) contributes 32.9 % of the chi-square test value.

Y and Z

Too few occurrences.

Discussion

The limited capacity of the software used for long lists forced the distance lists of the most frequent symbols to be split into parts. The splitting showed that word usage changes within the book: different words are used at its end than at its beginning.
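How the parts were cut is not described. Since the comparison is between the beginning and the end of the book, the sketch below assumes that each part is a consecutive stretch of the distance list, of nearly equal length. `split_into_parts` is a hypothetical helper.

```python
def split_into_parts(values, k):
    """Split a list into k consecutive parts of nearly equal length, preserving text order."""
    if k < 1:
        raise ValueError("k must be positive")
    n = len(values)
    # Integer boundaries 0 = b0 <= b1 <= ... <= bk = n.
    bounds = [i * n // k for i in range(k + 1)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(k)]
```

Keeping the parts in text order is what allows a drift between the first and the last part to show up in the pairwise comparisons.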

Some distributions of distances between consonants are highly regular, especially in their tails, if the short distances inside words are pooled. They are described with varying precision by four distributions: negative binomial, exponential, Weibull, and lognormal. Sometimes it is rather difficult to decide which distribution fits better.
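For reference, the standard textbook forms of the four distributions are given below, with d the distance between identical symbols. The parameter names are not the paper's own (the Erlang fit of sentence lengths, for instance, quotes a and b), and writing the negative binomial shifted so that it starts at d = 1 is an assumption. The Weibull with k = 1 reduces to the exponential, and the negative binomial with r = 1 reduces to the geometric distribution, its discrete counterpart. These nested cases are one reason why two families can fit almost equally well.

```latex
\begin{aligned}
\text{exponential:}\quad & f(d) = b\,e^{-b d}, \\
\text{Weibull:}\quad & f(d) = \frac{k}{\lambda}\left(\frac{d}{\lambda}\right)^{k-1} e^{-(d/\lambda)^{k}}, \\
\text{lognormal:}\quad & f(d) = \frac{1}{d\,\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(\ln d - \mu)^{2}}{2\sigma^{2}}\right), \\
\text{negative binomial:}\quad & P(D = d) = \binom{d + r - 2}{d - 1} p^{r} (1 - p)^{d - 1}, \qquad d = 1, 2, \dots
\end{aligned}
```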

Compared with the English and German texts studied before, Latin uses only 21 letters: j, k and w are not used, and y and z occur only in a few words of foreign origin.

The negative binomial distribution fitted most often. This distribution can be expected if no biases occur. For upper-case letters, the Weibull distribution describes the long distances between scarce occurrences.