Some speculations about the emergence of RNA and DNA sequences

Milan Kunz

20 April 2002

Introduction

Triplets of the four ribonucleotides in their polymer, RNA, which code for amino acids in proteins, emerged in the twentieth century as a new enigma, replacing the solved ones of physics and chemistry.

We are reading long sequences of an unknown language, and we try to understand them as we once did the hieroglyphs of ancient Egypt or the cuneiform writings of Babylon.

Many approaches have been tried: frequencies of individual amino acids, and occurrences of pairs, triplets, and generally n-tuples of nucleotides or amino acids.

I tried to investigate an inverse approach: to study the distances between consecutive occurrences of the same item in information strings (see distrib.tex). In DNA, four types of distributions of distances between consecutive triplets appear: lognormal, negative binomial, exponential, and Weibull. They fit the observed values better or worse, with deviations forming peaks and valleys against the expected values.

A similar situation was observed in texts written in English and Czech. The fit was usually better for longer distances, where the effects of some frequently used longer words vanished.

DNA can thus be compared with a language, and it is justified to speak of a language of Nature. Conversely, since our brain is a product of DNA, it is not surprising that its own product, language, is made on a similar template.

When we speak or write, we do not think about repeating vowels, except in rhymes, and we only try not to repeat words too often. Thus the distances arise quite spontaneously. But a question remains: how are the observed distances formed?

Distances between numerals in irrational numbers

The irrational number e is the result of an algorithm: it is the infinite sum of 1/n! over n = 0, 1, 2, and so on. Its decimal expansion is infinite, so the length available depends only on the means of computing it.
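
The series can be evaluated with integer arithmetic only. The following Python sketch is one possible way to obtain the digits, not necessarily the method used for the study; the function name `e_digits` and the number of guard digits are assumptions made for illustration.

```python
def e_digits(n_digits: int) -> str:
    """Return e truncated to n_digits decimal places, e.g. '2.718...'.

    Sums the series 1/0! + 1/1! + 1/2! + ... using scaled integers.
    """
    guard = 10                        # assumed number of extra digits that absorb truncation errors
    scale = 10 ** (n_digits + guard)  # every term is stored as an integer multiple of 1/scale
    total = 0
    term = scale                      # scaled value of 1/0!
    n = 0
    while term > 0:                   # stop when 1/n! falls below the working precision
        total += term
        n += 1
        term //= n                    # scaled value of 1/n!, truncated
    digits = str(total // 10 ** guard)  # drop the guard digits
    return digits[0] + "." + digits[1:]


print(e_digits(20))  # prints 2.71828182845904523536
```

The digit string returned here is the kind of sequence whose distances between consecutive equal digits are then counted.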

The study of the distributions of distances between consecutive occurrences of a digit has shown (see e.htm) that the distribution is always negative binomial, as if it were obtained as a result of consecutive trials with coins, regular tetrahedrons, or cubes. The observed deviations take the form of unexpectedly high or low occurrences of some distances, but the basic shape of the negative binomial distribution is always maintained.
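
The comparison with trials can be sketched in Python as follows. For independent trials in which a symbol has probability p, the distance d to its next occurrence has probability p(1-p)^(d-1), which is the negative binomial distribution for a single success. The helper names `gap_counts` and `geometric_expected`, the seed, and the number of throws are assumptions made for illustration; the throws simulate a fair cube, not the digits of e.

```python
import random
from collections import Counter


def gap_counts(sequence, symbol):
    """Count the distances between consecutive occurrences of symbol."""
    counts = Counter()
    last = None
    for position, item in enumerate(sequence):
        if item == symbol:
            if last is not None:
                counts[position - last] += 1
            last = position
    return counts


def geometric_expected(p, d, n_gaps):
    """Expected number of gaps of length d among n_gaps independent gaps."""
    return n_gaps * p * (1 - p) ** (d - 1)


rng = random.Random(2002)  # fixed seed, chosen arbitrarily
faces = 6                  # a cube; use 2 for a coin or 4 for a tetrahedron
throws = [rng.randrange(faces) for _ in range(60000)]

observed = gap_counts(throws, 0)
n_gaps = sum(observed.values())
for d in range(1, 11):
    expected = geometric_expected(1 / faces, d, n_gaps)
    print(d, observed[d], round(expected, 1))
```

Replacing `throws` with a list of the decimal digits of e and setting p to 1/10 would give the corresponding table for e.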

Thus the distance distributions observed in DNA, which are not all negative binomial, are not produced directly by a simple stochastic or algorithmic process. An explanation suggests itself.

The observed distances between consecutive symbols again form a string of numbers. It is thus possible to study the distribution of distances between distances, the secondary distances. The preliminary results (only short distances gave enough data for statistical tests) have shown the negative binomial distribution for distances 1-5, but at distance 6 other shapes appeared, lognormal and exponential. This gives a clue to the explanation of the observed DNA distance distributions: they are not the primary ones.

As an example, we take a string of primary distances observed in a binary string and form from it the secondary distances, that is, the distances between consecutive occurrences of the same primary distance value:

Primary           4  4  3  1  2  2  2  1  1  6  1  1  1  3  1  1  2
Distances of 1    -  -  -  4  -  -  -  4  1  -  2  1  1  -  2  1  -
Distances of 2    -  -  -  -  5  1  1  -  -  -  -  -  -  -  -  -  9
Distances of 3    -  -  3  -  -  -  -  -  -  -  -  -  - 10  -  -  -
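
The construction of one row of secondary distances can be sketched in Python. This is a minimal illustration under an assumed convention: positions are counted from 1 and the first occurrence of a value is measured from the start of the string, which is what the row for the value 1 in the example suggests. The function name `distances_of` is hypothetical.

```python
def distances_of(values, target):
    """Return, for each position, the distance of target from its previous
    occurrence, or None where the value at that position is not target.

    Assumed convention: positions start at 1 and the first occurrence
    is measured from the start of the string.
    """
    row = []
    last = 0
    for position, value in enumerate(values, start=1):
        if value == target:
            row.append(position - last)
            last = position
        else:
            row.append(None)
    return row


# Primary distances taken from the example in the text.
primary = [4, 4, 3, 1, 2, 2, 2, 1, 1, 6, 1, 1, 1, 3, 1, 1, 2]

row_1 = distances_of(primary, 1)
print(" ".join("-" if d is None else str(d) for d in row_1))
# prints: - - - 4 - - - 4 1 - 2 1 1 - 2 1 -
```

The values that are not None form the secondary distance string for that symbol, and their distribution can then be examined in the same way as that of the primary distances.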

Since longer distances are scarcer in the primary negative binomial distribution, their secondary distances are greater numbers, and the distribution of these values takes another shape. This leads to the following hypothesis:

Life appeared through the formation of RNA-DNA on a template whose reactive sites had a negative binomial distribution of distances. On this catalyst a primitive RNA-DNA was formed with a somewhat different form of distance distribution. In the course of further development these forms became the main ones.

This hypothesis can be tested in two ways:

More primitive forms of life should have distributions of distances in their RNA-DNA that are closer to the negative binomial than those of higher forms.

Possible templates for the formation of primitive RNA-DNA, for example silicates, should have a negative binomial distribution of distances between reactive sites.