Distances between numerals in the number e

Milan Kunz

Abstract

The distribution of distances between identical numerals in the number e was studied in transcriptions of this number into different number bases. The distributions are mostly negative binomial, but some numerals show rather significant deviations.

Introduction

The number e = 2.718281828..., the base of the natural logarithms, is obtained as the infinite sum of inverse factorials. The computing power of personal computers has made it possible to calculate this number to many places.
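The sum of inverse factorials can be evaluated with exact integer arithmetic. The following is only a minimal sketch, not the program of [1]; the function name e_digits and the number of guard digits are assumptions made for illustration.

```python
def e_digits(n_places: int) -> str:
    """Return e truncated to n_places decimal places, e.g. '2.71828'."""
    guard = 10                          # extra working digits (assumed sufficient)
    scale = 10 ** (n_places + guard)    # fixed-point representation of 1
    total = 0
    term = scale                        # scaled 1/0!
    k = 0
    while term > 0:                     # stop when 1/k! falls below the precision
        total += term
        k += 1
        term //= k                      # scaled 1/k!, truncated
    digits = str(total // 10 ** guard)  # drop the guard digits
    return digits[0] + "." + digits[1:]


print(e_digits(20))
```

The guard digits absorb the truncation error of the integer divisions; for very long expansions a larger guard may be advisable.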

Ventluka [1] published a program for calculating the number e and calculated it to 130000 places. This gave me the opportunity to study the distances between identical numerals, in the same way as the distances between identical letters in Czech and English texts and between identical coding triplets in DNA were studied earlier [2].
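The distance between identical numerals can be sketched as the difference between the positions of consecutive occurrences, so that two adjacent identical numerals have the distance 1. The function name distances is hypothetical and chosen only for this illustration.

```python
def distances(sequence, symbol):
    """Distances between consecutive occurrences of symbol in sequence."""
    positions = [i for i, s in enumerate(sequence) if s == symbol]
    # Difference of neighbouring positions; adjacent symbols give 1.
    return [b - a for a, b in zip(positions, positions[1:])]


# First ten decimal places of e
print(distances("7182818284", "8"))
```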

We are accustomed to the decimal number system. To make sure that the results are not an artifact of this convention, my friend Rádl wrote an algorithm for transforming the number into all bases from 2 to 16.
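Rádl's algorithm is not described here. One common way to carry out such a transformation, shown only as a sketch under that caveat, is to multiply the fractional part repeatedly by the new base and to take the integer part as the next numeral. The name fraction_to_base and the rule for the number of output numerals are assumptions.

```python
import math

NUMERALS = "0123456789ABCDEF"


def fraction_to_base(decimal_fraction: str, base: int) -> str:
    """Convert the decimal places of a number (given as a string) to another base."""
    n = len(decimal_fraction)
    denom = 10 ** n
    numer = int(decimal_fraction)       # the fraction is numer / denom
    # n decimal places determine only about n*log(10)/log(base) numerals;
    # the last of them may still be affected by the truncated input.
    count = int(n * math.log(10) / math.log(base))
    out = []
    for _ in range(count):
        numer *= base
        numeral, numer = divmod(numer, denom)   # integer part is the next numeral
        out.append(NUMERALS[numeral])
    return "".join(out)


print(fraction_to_base("718281828459045", 16))
```

Exact integer arithmetic avoids the rounding errors that floating-point multiplication would introduce after a few dozen numerals.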

The results

Preliminary tests with the STATGRAPHIC program gave the best fits for the negative binomial distribution, so only this distribution was used. The program cannot handle enough data, so only the first 10000 digits were evaluated, and even fewer in the binary notation, since STATGRAPHIC was not able to work with the 11529 bit file. The file size decreased to 7379 bits for base 16. Moreover, the STATGRAPHIC results are not reproducible: calculations made at different times gave somewhat different results. The results below must therefore be considered only as relative.

The fit was evaluated by the chi-square test, which gave very high significance levels.
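For independent, equally probable numerals in the base b, the distance d between consecutive identical numerals follows the geometric law P(d = k) = (1 - p)^(k-1) p with p = 1/b, which is the negative binomial distribution with the parameter r = 1. The sketch below computes a chi-square statistic against this idealized law. It does not reproduce the STATGRAPHIC procedure, which presumably estimates the parameters from the data; the function name and the pooling of the tail are assumptions.

```python
from collections import Counter


def geometric_chi_square(dists, base, max_bin):
    """Chi-square statistic of distances against a geometric law with p = 1/base.

    Distances 1 .. max_bin-1 have their own bins; distances >= max_bin are pooled.
    """
    p = 1.0 / base
    n = len(dists)
    counts = Counter(min(d, max_bin) for d in dists)
    stat = 0.0
    for k in range(1, max_bin + 1):
        if k < max_bin:
            prob = (1 - p) ** (k - 1) * p      # P(d = k)
        else:
            prob = (1 - p) ** (max_bin - 1)    # tail P(d >= max_bin)
        expected = n * prob
        stat += (counts.get(k, 0) - expected) ** 2 / expected
    # No parameter is estimated here, so d.f. = number of bins - 1.
    return stat, max_bin - 1


stat, dof = geometric_chi_square([1] * 50 + [2] * 25 + [3] * 25, base=2, max_bin=3)
print(stat, dof)
```

The significance level is then the upper tail probability of the chi-square distribution with the given degrees of freedom, which a statistical package or table provides.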

Examples of the worst (numeral 6) and the best (numeral 7) fit in the decimal notation follow:

Decimal number e, distances between the numerals 6. Chi-square test

Lower         Upper      Observed    Expected
limit         limit      frequency   frequency   Chi-square
at or below    2.516     235         220.0       1.0207
 2.516         5.240     253         248.8       0.0720
 5.240         7.964      98         124.3       5.5791
 7.964        10.689     132         140.6       0.5246
10.689        13.413     101          99.8       0.0138
13.413        16.137      86          70.9       3.2247
16.137        18.861      41          35.4       0.8765
18.861        21.585      39          40.1       0.0279
21.585        24.310      31          28.4       0.2299
24.310        27.034      14          20.2       1.9010
27.034        29.758       6          10.1       1.6607
29.758        32.482       7          11.4       1.7067
32.482        35.206      13           8.1       2.9575
35.206        40.655      10           8.6       0.2173

Chisquare = 20.4604 with 14 d.f. Sig. level = 0.116281

Decimal number e, distances between the numerals 7. Chi-square test

Lower         Upper      Observed    Expected
limit         limit      frequency   frequency   Chi-square
at or below    1.000     106         101.5       0.1998214
 1.000         4.097     247         247.1       0.0000688
 4.097         7.194     175         179.7       0.1220749
 7.194        10.290     128         130.6       0.0535168
10.290        13.387      93          95.0       0.0416361
13.387        16.484      70          69.1       0.0126758
16.484        19.581      49          50.2       0.0294113
19.581        22.677      46          36.5       2.4664300
22.677        25.774      28          26.5       0.0796353
25.774        28.871      20          19.3       0.0253087
28.871        31.968      13          14.0       0.0761013
31.968        35.065       8          12.9       1.8878681
35.065        38.161       6           6.7       0.0674840
38.161        44.355       8           8.4       0.0169562

Chisquare = 5.11795 with 13 d.f. Sig. level = 0.972504

In the first case, there were about 26 fewer distances of 6 or 7 between consecutive numerals 6 than expected, which worsened the chi-square. In the second case, there were about 10 more distances of 20-22 between consecutive numerals 7 than expected, which also worsened the chi-square, but the fit was nevertheless almost perfect.
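The contribution of each class in the tables above is the usual term (observed - expected)^2 / expected. The short sketch below recomputes these terms for the numeral 7 from the printed frequencies. Because the expected frequencies are printed rounded to one decimal place, the recomputed terms agree with the printed ones only approximately, and their sum can differ slightly from the reported total.

```python
# (observed, expected) frequencies for the numeral 7, copied from the table
rows = [
    (106, 101.5), (247, 247.1), (175, 179.7), (128, 130.6), (93, 95.0),
    (70, 69.1), (49, 50.2), (46, 36.5), (28, 26.5), (20, 19.3),
    (13, 14.0), (8, 12.9), (6, 6.7), (8, 8.4),
]

contributions = [(o - e) ** 2 / e for o, e in rows]
total = sum(contributions)

# Index of the class that contributes most to the chi-square
largest = max(range(len(rows)), key=lambda i: contributions[i])

print(round(total, 2), rows[largest])
```

The largest term comes from the class with 46 observed against 36.5 expected distances, that is, the surplus of the distances 20-22.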

The results of the calculations for the number bases 2 to 12 are given in the following table:

Table 1

Chisquare tests

To save space, the chi-square significance level is given to three decimal places, truncated without rounding and without the leading "0.".

        Numerals
Base    0    1    2    3    4    5    6    7    8    9    10   11
2       113  047
3       874  572  212
4       440  102  244  693
5       799  517  273  835  837
6       992  763  337  818  354  682
7       587  590  468  445  763  440  282
8       125  378  837  457  126  138  766  772
9       298  755  846  465  043  236  468  760  958
10      736  650  818  831  417  156  116  972  793  895
11      568  959  892  263  438  667  601  078  337  554  660
12      256  318  682  781  657  236  110  346  781  960  318  828

The worst fit was obtained with the binary transcription of the number e, the best ones with the bases 6 and 5. At the base 6, the numeral 0 gave an almost perfect fit with the expected values.

The base 6 numeral sequence of the number e can be compared with the results of consecutive throws of a regular triangular bipyramid, which has six faces, whereas the binary transcription of the number e would be obtained by consecutive tosses of a coin only exceptionally.
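The comparison with throws of a six-faced body can be illustrated by a small simulation. This is only a sketch with a pseudorandom generator and an arbitrarily chosen seed and sample size; it is not part of the reported calculations. For equally probable faces, the mean distance between identical results should be close to 6 and about one sixth of the distances should be equal to 1.

```python
import random


def distances(sequence, symbol):
    """Distances between consecutive occurrences of symbol in sequence."""
    positions = [i for i, s in enumerate(sequence) if s == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]


rng = random.Random(2718)                      # arbitrary seed, for repeatability only
throws = [rng.randrange(6) for _ in range(60000)]

d = distances(throws, 0)
mean_distance = sum(d) / len(d)
share_adjacent = d.count(1) / len(d)           # share of the distance 1

print(round(mean_distance, 2), round(share_adjacent, 3))
```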

The decimal transcription was good, too. Here a rather poor fit appeared for the numeral 4. The chi-square value of 0.043 means that the obtained values are practically accidental. The observed fluctuations are rather difficult to explain.

The results can be compared with those for the number pi, whose first 1000 digits I found on the internet, on the page of a man who tried to memorize them.

The negative binomial distribution gave the following results:

Digit   Chisquare
0       0.7293
1       0.4265
2       0.3709
3       0.2319
4       0.3439
5       0.9466
6       0.0705
7       0.5321
8       0.6977
9       0.8144

The binary transcription was again worse: the chi-square test gave 0.4170 for the numeral 0 and 0.2580 for the numeral 1.

The secondary distances

The distances between the distances between consecutive symbols, that is, the second differences, could also be interesting. Unfortunately, the program I am using cannot work with words, only with single symbols. I can thus study only strings in which the secondary distances are rather short, as in binary strings, where only a few distances of 10-14 occurred, actually too few to study their distribution. But the shorter distances gave rather interesting results.
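One possible reading of the secondary distances, consistent with the table titles used in this section, is the following: the primary distances between consecutive identical symbols form a new sequence, and the secondary distances are the distances between consecutive occurrences of one chosen primary distance in that sequence. The sketch below assumes this reading; the function names are hypothetical.

```python
def distances_between(sequence, target):
    """Distances between consecutive occurrences of target in sequence."""
    positions = [i for i, x in enumerate(sequence) if x == target]
    return [b - a for a, b in zip(positions, positions[1:])]


def secondary_distances(sequence, symbol, primary):
    """Distances between consecutive occurrences of the primary distance
    `primary` in the sequence of distances between the symbols `symbol`."""
    first = distances_between(sequence, symbol)    # primary distances
    return distances_between(first, primary)       # secondary distances


bits = "0010010001001"
print(distances_between(bits, "0"))       # primary distances between consecutive 0
print(secondary_distances(bits, "0", 1))  # secondary distances of the distance 1
```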

An example of a very good fit:

Chi-square test of the distribution of the distance 1 between consecutive 0

Lower         Upper      Observed    Expected
limit         limit      frequency   frequency   Chi-square
at or below   1.500      1110        1093.5      0.2499
1.500         2.500       528         554.4      1.2563
2.500         3.500       287         281.1      0.1248
3.500         4.500       141         142.5      0.0159
4.500         5.500        80          72.3      0.8310
5.500         6.500        36          36.6      0.0109
6.500         7.500        17          18.6      0.1331
7.500         8.500        10           9.4      0.0362
above 8.500                 9           9.7      0.0483

Chisquare = 2.70644 with 7 d.f. Sig. level = 0.910767

An example of a poor fit:

Chi-square test of the distribution of the distance 2 between consecutive 0

Lower         Upper      Observed    Expected
limit         limit      frequency   frequency   Chi-square
at or below    1.500     347         320.7       2.1487
 1.500         2.500     215         235.1       1.7162
 2.500         3.500     154         172.3       1.9442
 3.500         4.500     125         126.3       0.0131
 4.500         5.500     117          92.6       6.4536
 5.500         6.500      69          67.8       0.0198
 6.500         7.500      42          49.7       1.1993
 7.500         8.500      23          36.4       4.9588
 8.500         9.500      26          26.7       0.0189
 9.500        10.500      23          19.6       0.5986
10.500        11.500      15          14.3       0.0296
11.500        12.500      14          10.5       1.1539
12.500        13.500      10           7.7       0.6816
13.500        14.500       7           5.6       0.3229

Chisquare = 21.9729 with 14 d.f. Sig. level = 0.0791739

The results are tabulated below:

Binary transcription of the number e. Chi-square test

            Secondary distances between consecutive
Distance    0         1
1           0.9107    0.3012
2           0.0791    0.0611
3           0.0946    0.5909
4           0.1600    0.7317
5           0.5216    0.7317
6           0.0189    0.2757

The character of the distribution changed at the distance 6 between consecutive 0 and 1: a better fit was obtained with the lognormal distribution, which gave the chi-square test results 0.2465 and 0.8297, respectively. Longer distances were too few to give meaningful results. Since the secondary distances are then necessarily rather large numbers, the distribution must be of another type than the negative binomial distribution. It should be noted that the distribution of the distance 4 between consecutive 0 is either negative binomial (chi-square 0.1600) or exponential (chi-square 0.1708); in both cases a peak of the distances 29-31 appears (observed frequency 15 against expected 8.1).

Conclusion

Generally, the primary distances are distributed more stochastically than the distances in information strings. At the secondary distances, however, the phenomena observed in [2] appear. It would be interesting to study longer sequences and perhaps also the tertiary distances.

Literature

1. J. Ventluka, CHIP, CD-ROM, 1999.

2. M. Kunz, Z. Rádl: 'Distribution of Distances in Information Strings,' J. Chem. Inform. Comput. Sci., 38, 374-378.