Distances between numerals in the number e
Milan Kunz
Abstract
The distribution of distances between identical numerals in the number e was studied in transcriptions of this number into different number bases. The distributions are mostly negative binomial, but some numerals show rather significant deviations.
Introduction
The number e = 2.718281828..., the base of the natural logarithms, is obtained as the infinite sum of inverse factorials. The computing capacity of PCs has made it possible to calculate this number to many places.
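The sum of inverse factorials can be evaluated with integer arithmetic only. The following is a minimal sketch, not the program of [1]; the function name `e_digits` and the number of guard digits are assumptions made for illustration.

```python
def e_digits(n_places: int) -> str:
    """Return e truncated to n_places decimal places, as a string."""
    guard = 10                       # extra digits that absorb truncation errors
    scale = 10 ** (n_places + guard)
    total = 0
    term = scale                     # scale / 0!
    k = 0
    while term > 0:                  # stop when scale / k! falls below one unit
        total += term
        k += 1
        term //= k                   # scale / k! from scale / (k-1)!
    digits = str(total // 10 ** guard)
    return digits[0] + "." + digits[1:]


print(e_digits(20))
```

Each integer division discards less than one unit of the last guard digit, so the guard digits keep the printed places correct unless the digits that follow happen to be a long run of zeros.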
The publication of Ventluka [1], who published a program for calculating the number e and calculated it to 130000 places, gave me an opportunity to study the distances between identical numerals, in the same way as the distances between identical letters were studied in the Czech and English languages and the distances between identical coding triplets in DNA [2].
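The distances studied here can be sketched as differences between the positions of consecutive occurrences of one numeral. The function name `distances` is hypothetical, and the convention that two adjacent identical numerals have the distance 1 is an assumption, chosen because the tables below start with the class "at or below 1".

```python
def distances(sequence: str, symbol: str) -> list[int]:
    """Distances between consecutive occurrences of symbol in sequence."""
    positions = [i for i, s in enumerate(sequence) if s == symbol]
    # Adjacent identical symbols give the distance 1.
    return [b - a for a, b in zip(positions, positions[1:])]


first_places = "71828182845904523536"   # first 20 decimal places of e
print(distances(first_places, "8"))     # distances between the numerals 8
print(distances(first_places, "2"))     # distances between the numerals 2
```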
We are accustomed to the decadic number system. To be sure that the results are not an artifact of this custom, my friend Rádl wrote an algorithm for transforming the numbers into all bases from 2 to 16.
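Rádl's algorithm is not reproduced here. One possible way to transcribe the decimal places into another base is repeated multiplication of the fraction by the base, sketched below with exact integer arithmetic; the function name `fraction_to_base` is an assumption.

```python
def fraction_to_base(decimal_digits: str, base: int, n_out: int) -> list[int]:
    """Transcribe the fraction 0.decimal_digits into n_out numerals of the base."""
    denom = 10 ** len(decimal_digits)
    num = int(decimal_digits)           # the fraction is num / denom
    out = []
    for _ in range(n_out):
        num *= base
        digit, num = divmod(num, denom) # integer part is the next numeral
        out.append(digit)
    return out


# First binary places of e from its first 15 decimal places.
print(fraction_to_base("718281828459045", 2, 8))
```

Since the decimal input is truncated, only about len(decimal_digits) * log(10) / log(base) output numerals are reliable. Numerals of bases above 10 are returned as integers, not as letters.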
The results
Preliminary tests with the STATGRAPHIC program gave the best fit for the negative binomial distribution, so only this distribution was used. The program cannot handle large data sets, so only the first 10000 digits were evaluated, and even fewer in the binary notation, since the STATGRAPHIC program was not able to work with the 11529 bit file. The size of the files decreased to 7379 bits for base 16. Moreover, the STATGRAPHIC program is not reproducible: calculations at different times gave somewhat different results. The results below must therefore be considered as relative only.
The fit was evaluated by the chisquare test, which gave very high significance levels.
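For orientation, if the numerals of base b were independent and equally probable, the distances between identical numerals would follow the geometric distribution with p = 1/b, which is the negative binomial distribution with r = 1. The sketch below computes the expected frequencies of this idealized model only; it does not reproduce the parameters fitted by STATGRAPHIC, and the function name is hypothetical.

```python
def expected_distance_counts(base: int, n_distances: int, max_distance: int) -> list[float]:
    """Expected counts of the distances 1..max_distance for random numerals."""
    p = 1.0 / base                      # probability of one given numeral
    # P(distance = d) = p * (1 - p) ** (d - 1), the geometric distribution
    return [n_distances * p * (1.0 - p) ** (d - 1)
            for d in range(1, max_distance + 1)]


counts = expected_distance_counts(base=10, n_distances=1000, max_distance=5)
print([round(c, 1) for c in counts])
```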
Examples of the worst (numeral 6) and the best (numeral 7) result in the decadic notation follow:
Decadic number e, distances between the numeral 6. Chisquare test
Lower Limit | Upper Limit | Observed Frequency | Expected Frequency | Chisquare
at or below | 2.516 | 235 | 220.0 | 1.0207
2.516 | 5.240 | 253 | 248.8 | .0720
5.240 | 7.964 | 98 | 124.3 | 5.5791
7.964 | 10.689 | 132 | 140.6 | .5246
10.689 | 13.413 | 101 | 99.8 | .0138
13.413 | 16.137 | 86 | 70.9 | 3.2247
16.137 | 18.861 | 41 | 35.4 | .8765
18.861 | 21.585 | 39 | 40.1 | .0279
21.585 | 24.310 | 31 | 28.4 | .2299
24.310 | 27.034 | 14 | 20.2 | 1.9010
27.034 | 29.758 | 6 | 10.1 | 1.6607
29.758 | 32.482 | 7 | 11.4 | 1.7067
32.482 | 35.206 | 13 | 8.1 | 2.9575
35.206 | 40.655 | 10 | 8.6 | .2173
Chisquare = 20.4604 with 14 d.f. Sig. level = 0.116281
Decadic number e, distances between the numeral 7. Chisquare Test
Lower Limit | Upper Limit | Observed Frequency | Expected Frequency | Chisquare
at or below | 1.000 | 106 | 101.5 | .1998214
1.000 | 4.097 | 247 | 247.1 | .0000688
4.097 | 7.194 | 175 | 179.7 | .1220749
7.194 | 10.290 | 128 | 130.6 | .0535168
10.290 | 13.387 | 93 | 95.0 | .0416361
13.387 | 16.484 | 70 | 69.1 | .0126758
16.484 | 19.581 | 49 | 50.2 | .0294113
19.581 | 22.677 | 46 | 36.5 | 2.4664300
22.677 | 25.774 | 28 | 26.5 | .0796353
25.774 | 28.871 | 20 | 19.3 | .0253087
28.871 | 31.968 | 13 | 14.0 | .0761013
31.968 | 35.065 | 8 | 12.9 | 1.8878681
35.065 | 38.161 | 6 | 6.7 | .0674840
38.161 | 44.355 | 8 | 8.4 | .0169562
Chisquare = 5.11795 with 13 d.f. Sig. level = 0.972504
There were 26 fewer distances of 6 or 7 between consecutive numerals 6 than expected (98 observed against 124.3), which worsened the chisquare in the first case. There were about 10 more distances of 20-22 between consecutive numerals 7 than expected (46 observed against 36.5), which worsened the chisquare in the second case, but the fit was nevertheless almost perfect.
In the following table, the results of calculations for number bases 2 to 12 are tabulated:
Table 1
Chisquare tests. The chisquare significance level is given without the leading zero, truncated to 3 decimal places without rounding, to save space.
Numerals
Base | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
2 | 113 | 047 | | | | | | | | | |
3 | 874 | 572 | 212 | | | | | | | | |
4 | 440 | 102 | 244 | 693 | | | | | | | |
5 | 799 | 517 | 273 | 835 | 837 | | | | | | |
6 | 992 | 763 | 337 | 818 | 354 | 682 | | | | | |
7 | 587 | 590 | 468 | 445 | 763 | 440 | 282 | | | | |
8 | 125 | 378 | 837 | 457 | 126 | 138 | 766 | 772 | | | |
9 | 298 | 755 | 846 | 465 | 043 | 236 | 468 | 760 | 958 | | |
10 | 736 | 650 | 818 | 831 | 417 | 156 | 116 | 972 | 793 | 895 | |
11 | 568 | 959 | 892 | 263 | 438 | 667 | 601 | 078 | 337 | 554 | 660 |
12 | 256 | 318 | 682 | 781 | 657 | 236 | 110 | 346 | 781 | 960 | 318 | 828
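The three-place entries of Table 1 appear to correspond to the significance levels of the detailed tests: 0.116281 for the decadic numeral 6 is listed as 116 and 0.972504 for the numeral 7 as 972. A minimal sketch of this truncation, with a hypothetical function name, could be:

```python
from decimal import Decimal, ROUND_DOWN


def table_entry(sig_level: str) -> str:
    """Truncate a significance level to 3 places and drop the leading '0.'."""
    truncated = Decimal(sig_level).quantize(Decimal("0.001"), rounding=ROUND_DOWN)
    return str(truncated)[2:]           # "0.116" becomes "116"


print(table_entry("0.116281"))          # decadic numeral 6
print(table_entry("0.972504"))          # decadic numeral 7
```

The level is passed as a string so that the truncation is done on the decimal places as printed, not on a binary floating point approximation.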
The worst fit was obtained with the binary transcription of the number e, the best one with the bases 6 and 5. In base 6, the numeral 0 gave an almost perfect fit with the expected values (0.992).
In base 6, the numeral sequence of the number e can be compared with the results of consecutive throws of a regular triangular bipyramid, which has six faces, whereas the binary transcription of the number e would be obtained by consecutive tosses of a coin only exceptionally.
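The comparison with throwing can be tried out by simulation. The sketch below throws an ideal body with a given number of faces using Python's pseudorandom generator and returns the mean distance between consecutive occurrences of one face, which should be close to the number of faces. The function name, the seed and the number of throws are assumptions for illustration, not data of this study.

```python
import random


def simulate_mean_distance(faces: int, n_throws: int, seed: int = 0) -> float:
    """Mean distance between consecutive occurrences of face 0 in random throws."""
    rng = random.Random(seed)           # seeded, so the run is repeatable
    throws = [rng.randrange(faces) for _ in range(n_throws)]
    positions = [i for i, t in enumerate(throws) if t == 0]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


print(simulate_mean_distance(6, 60000)) # triangular bipyramid, close to 6
print(simulate_mean_distance(2, 60000)) # coin, close to 2
```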
The decadic transcription was good, too. A rather poor fit appeared for the numeral 4: the significance level 0.043 (the base 9 entry for the numeral 4 in Table 1; the decadic entry is 0.417) means that the obtained values are practically accidental. The observed fluctuations are rather difficult to explain.
The results can be compared with the number pi, whose first 1000 digits I obtained on the internet from the page of a man who tried to memorize them.
The negative binomial distribution gave the following results:
Digit | Chisquare
0 | 0.7293
1 | 0.4265
2 | 0.3709
3 | 0.2319
4 | 0.3439
5 | 0.9466
6 | 0.0705
7 | 0.5321
8 | 0.6977
9 | 0.8144
The binary transcription was again worse: the chisquare test gave 0.4170 for 0 and 0.2580 for 1.
The distances between the distances between consecutive symbols, that is the second differences, could be interesting. Unfortunately, the program I am using cannot work with words, only with symbols. I can therefore study only strings in which the secondary distances are rather short, as in binary strings, where only a few distances of 10-14 occurred, actually too few to study their distribution. But shorter distances gave rather interesting results.
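One reading of the secondary distances, which is an assumption of this sketch, is the following: the primary distances between consecutive occurrences of a symbol form a new sequence, and the secondary distances are the distances between consecutive occurrences of one chosen value in that sequence. The function names are hypothetical.

```python
def gaps(sequence, item) -> list[int]:
    """Distances between consecutive occurrences of item in sequence."""
    positions = [i for i, s in enumerate(sequence) if s == item]
    return [b - a for a, b in zip(positions, positions[1:])]


def secondary_distances(bits: str, symbol: str, value: int) -> list[int]:
    """Distances between consecutive primary distances equal to value."""
    primary = gaps(bits, symbol)        # e.g. [1, 2, 2, 1, 3]
    return gaps(primary, value)


bits = "0010100110"
print(gaps(bits, "0"))                  # primary distances between the zeros
print(secondary_distances(bits, "0", 1))
print(secondary_distances(bits, "0", 2))
```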
An example of a very good fit:
Chisquare test of the distribution of the distance 1 between consecutive 0
Lower Limit | Upper Limit | Observed Frequency | Expected Frequency | Chisquare
at or below | 1.500 | 1110 | 1093.5 | .2499
1.500 | 2.500 | 528 | 554.4 | 1.2563
2.500 | 3.500 | 287 | 281.1 | .1248
3.500 | 4.500 | 141 | 142.5 | .0159
4.500 | 5.500 | 80 | 72.3 | .8310
5.500 | 6.500 | 36 | 36.6 | .0109
6.500 | 7.500 | 17 | 18.6 | .1331
7.500 | 8.500 | 10 | 9.4 | .0362
above | 8.500 | 9 | 9.7 | .0483
Chisquare = 2.70644 with 7 d.f. Sig. level = 0.910767
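The chisquare value of this table can be recomputed from the observed and expected frequencies as the sum of (observed - expected)^2 / expected over all classes. The sketch below uses the rounded expected frequencies printed above, so the sum agrees with the reported 2.70644 only approximately; the function name is hypothetical.

```python
def chisquare_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson chisquare statistic: sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Frequencies copied from the table above (expected values are rounded).
observed = [1110, 528, 287, 141, 80, 36, 17, 10, 9]
expected = [1093.5, 554.4, 281.1, 142.5, 72.3, 36.6, 18.6, 9.4, 9.7]

print(round(chisquare_statistic(observed, expected), 2))
```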
An example of a poor fit:
Test of the distribution of the distance 2 between consecutive 0
Lower Limit | Upper Limit | Observed Frequency | Expected Frequency | Chisquare
at or below | 1.500 | 347 | 320.7 | 2.1487
1.500 | 2.500 | 215 | 235.1 | 1.7162
2.500 | 3.500 | 154 | 172.3 | 1.9442
3.500 | 4.500 | 125 | 126.3 | .0131
4.500 | 5.500 | 117 | 92.6 | 6.4536
5.500 | 6.500 | 69 | 67.8 | .0198
6.500 | 7.500 | 42 | 49.7 | 1.1993
7.500 | 8.500 | 23 | 36.4 | 4.9588
8.500 | 9.500 | 26 | 26.7 | .0189
9.500 | 10.500 | 23 | 19.6 | .5986
10.500 | 11.500 | 15 | 14.3 | .0296
11.500 | 12.500 | 14 | 10.5 | 1.1539
12.500 | 13.500 | 10 | 7.7 | .6816
13.500 | 14.500 | 7 | 5.6 | .3229
Chisquare = 21.9729 with 14 d.f. Sig. level = 0.0791739
The results are tabulated below:
Binary transcription of the number e. Chisquare test
Secondary distances between consecutive | 0 | 1
1 | 0.9107 | 0.3012
2 | 0.0791 | 0.0611
3 | 0.0946 | 0.5909
4 | 0.1600 | 0.7317
5 | 0.5216 | 0.7317
6 | 0.0189 | 0.2757
The character of the distribution changed at the distance 6 between consecutive 0 and 1: a better fit was obtained with the lognormal distribution, which gave chisquare test results of 0.2465 and 0.8297, respectively. Longer distances were too few to give meaningful results. Since the secondary distances are then necessarily rather large, the distribution must be of another type than the negative binomial. It should be noted that the distribution of the distance 4 between consecutive 0 is either negative binomial (chisquare 0.1600) or exponential (chisquare 0.1708); in both cases there appears a peak at distances 29-31 (observed frequency 15 against expected 8.1).
Conclusion
Generally, the primary distances are more stochastic than the distances in information strings. At the secondary distances, however, there appear phenomena observed in [2]. It would be interesting to study longer sequences and perhaps the tertiary distances.
Literature
1. J. Ventluka, CHIP, CD-ROM, 1999.
2. M. Kunz, Z. Rádl: 'Distribution of Distances in Information Strings,' J. Chem. Inform. Comput. Sci., 38, 374-378.