One from my many attempts to publish the controversial subject revived by the scanner.
SOME PROBLEMS OF ENTROPIC MEASURES
Milan Kunz
Chemopetrol, Research Institute of Macromolecular Chemistry
Tkalcovská 2, 656 49 Brno, Czechoslovakia
Boltzmann Hn and Shannon Hm entropies are interpreted as symmetries of spherical orbits in vector spaces. The sum s of Hn and Hm, used for counting cyclic permutations and trees, do not distinguish finer structures. It is shown that there exists yet another class of entropies of negative distributions which can be employed to measure the permutation entropy.
INTRODUCTION
There exist thousands of publications about the entropy function
H = -
and only recently a new scientific revolution was announced which should lead to the abdication of Boltzmann‘s entropy Hn (1).
It was generally accepted that Boltzmann had never given an exact proof of his H theorem (2), and so, when Shannon 40 years ago published his theory of communication (3) which introduced function Hm as the third postulate, the mathematical elegance of his work led to the hope that the new information theory can replace the controversial hypothesis (4).
There are two forms of the H theorem. One is that (1) is a forever growing function. The weaker form is that (1) is a statistical form of thermodynamical entropy S. This was proofed by practice but Boltzmann‘s explanation of the relation of H and S was not understood, because Boltzmann himself did not take it seriously.
In 1877 (5), he pronounced the quantum hypothesis.
In an isolated system of the ideal gas, there are m quanta of energy partitioned between n molecules. Supposing m>n, there are necessarily multiplicities of energy levels mk of molecules and thus the system is described by the following sums
n =
Sj 1j = Sk nk (2)the index j goes from 1 till n, the index k from 1 till m. Since m can be greater then n, many nk must be zero.
m =
The generally accepted relation between Boltzmann Hn and Shannon Hm entropies or otherwise, between thermodynamical and informational entropy is the negentropy principle (6). According to it, entropy and information are complementary. Information should decrease entropy. This principle is based on nonphysical arguments connected with the incomplete analysis of the Maxwell‘s demon, which decreases entropy by sorting molecules. It was overlooked that the process can be reversed and then demon increases the entropy by the identical method as it was decreased. Between demon‘s information and entropy of the system does not exist any one to one relation postulated by the negentropy hypothesis (7).
Shannon defined entropy Hm with the logarithm to base 2. This definition need not be connected with Boltzmann‘s Hn at all.
For indexing of m objects by the regular code, mlog2m binary digits are needed at least. If the set of m objects is divided by some information into n subsets, then only
S nkmk log2mk binary digits are sufficient. The difference gives the result which limit is (1) and thus Hm is the direct measure of information (8).But Hm has also another meaning analogical to Hn.
Identity (3) counts undistinguishable objects. If these objects are ordered into a string as symbols into a text, they are indexed by their natural order and thus they are distinguishable.
It is a different problem and indeed both Hn and Hm are calculated differently. Shannon has given a simple example how his function Hm should be calculated. 4 symbols with probabilities pa = 1/2, pb = 1/4, pc = pd = 1/8 determine the share of symbols in an infinite string
Hm = -0.5 log2 0.5 - 0.25 log2 0.25 -2 x 0.125 log2 0.125 = 1.75.
If we want to calculate Hn, then we must compare n symbols with n particles and their repeating with their energy. We have
P(0.5)= 1/4, P(0.25)= 1/4 , P(0.125)= 1/2.
Hn = -2 x 0.25 log2 0.25 -0.5 log2 0.5 = 1.5
If there are two possibilities how to calculate entropy according to equation (1), the question arose which of them is right or if both can be applied, how they are related.
TWO ENTROPIES Hn AND Hm
Recently it has been shown that both entropies are additive. The proof corroborating this statement is elementary.
Two operations over a word (a string of symbols) NT, the substitution e.g. abaced
® rirfuk, and the permutation e.g. abaced ® daceab are joined, if we write the word in the transposed form as a column and symbols j as the unit vectorsej (01, ...., 1j, ... 0n).
The word N has a form of a naive matrix, whose elements are {0, 1}. It is a one-sided stochastical matrix giving NJ = J, where J is unit vector-column.
In each row of naive matrix N is only one unit symbol. There are two possibilities how to interpret these matrices. The unit vectors ej are translations in the n dimensional lattice and the naive matrix N is a path going from the beginning of the nonnegative lattice to the final point determined by the row vector JTN, where T is the symbol of transposing. All naive matrices with m rows and n columns N (m, n) form the vertices of a plane simplex. This projection stresses the constant sum m, it expresses the law of the conservation of energy.
The system is represented by a n-dimensional vector NTN as a point in the phase space of energy. Because energy of the isolated system is constant, all representation points lie on a plane orthogonal to the unit diagonal vector In . Vectors NTN are hypotenuses of right triangles with (m/n) In as the leg. The n dimensional plane is a (n - 1) dimensional body and we use for it the term the plane simplex. The n dimensional space is the sum of plane simplexes, the n dimensional plane complex.
All representation points available for the system with m quanta of energy and n ideal gas particles are counted by the identity above (4). The left hand side of (4) counts all possible partitions of m undistinguishable objects into n boxes.
The summation on the right side of (4) is made over all partitions of number m into at most n parts. The partitions form on the plane simplex spherical orbits, because the length of vector M does not depend on permutations of its elements. The volume of partition orbits is determined by polynomial coefficients.
Molecules of the ideal gas exchange their energy by collisions. lf the energy is not transposed by a collision of two particles, the system goes on another orbit. At large systems with many simultaneous collisions, we can suppose that fluctuations produced by individual collisions eliminate and that the system remains on one orbit or on a small band of orbits. It is possible to say that it rotates on its n dimensional orbit. Boltzmann connected entropy S with the logarithmic measure of the volume of partition orbits which for n going to infinity is approximated by (1).
The alternative interpretation of naive matrices N (m, n) is the cube simplex in m dimensional space. The cube has edges < o, n - 1> and j-th index ej determines the distance (j - 1) from the beginning on axe i. This maps each string of a plane simplex as a point with natural coordinates (zero including) in a cube. The cube is obtained from the plane complex by truncating all points with values mk greater than (n - 1).
The permutations are formalized by unit permutation matrices Pm acting on a matrix M from the left and substitutions are formalized by unit permutation matrices Pn acting on a matrix M from the right. Jointly we have PmMPn or using the symmetric group Sn substitutions and permutations are equivalent operations forming the group SmM(m,n)Sn containing all possible partition orbits of M. If m = n, then one of these orbits is the symmetrical group Sn itself, with partition 1n.
The identity counting all naive matrices N (m, n) with m rows and n columns or corresponding points of the m dimensional cube is
nm =
The sum is made over all partitions of the number m into n dimensional vectors which can have zero elements. Terms n!/
Õ 0k!n(0) must be calculated.If matrices N are projected into the space of columns using the unit projection operator J or into the space of rows, using the transposed unit operator JT‚ a part of the original symmetry is lost.
Both symmetries may be formally separated by defining the corresponding quadratical forms PnNTNPn, and PmNNTPm
The diagonal matrices NTN (or projections JTN) are counted by identity (4). It counts all partition orbits in the cube simplex.
Other quadratical forms NNT are permuted block-diagonal matrices with blocks JmjJmjT. They are counted by Bell polynomials
S
n = S m!/Õ nk!mk!n(k) = S S(m, n) (5)where (n-n0) = 1 n0 = const, and S(m,n) are Stirling numbers of the second kind (10). There are m! possible symmetrical permutations of a quadratical matrix NNT. The permutations inside blacks JmjJmjT are ineffective, which gives mk!nk terms of the denominator, also permutations as blocks as the same size are ineffective, which gives nk! terms. The partial sums of Bell polynomials with constant number as the nonzero elements are known to give Stirling numbers of the second kind. Using the definition of Stirling numbers of the second kind and the falling factorial in the form n!/no!, we obtain from (5) the identity (7), without terms m0!n0 = 1 .This offers its additional proof.
The polynomial coefficient for m permutations is known as the Brillouin measure. Its logarithm gives Hm similarly as Hn was obtained from the polynomial coefficient for n permutations.
There is an open question how to interpret Hm in physics. The interpretation of Hn is clear. If the representation point of an isolated system can go through all points as the plane simplex, its average state is given by the left hand side as identity (4), it is described by the Bose-Einstein distribution. If the system is large and it is not spliced into independent ensembles, we can observe it on the largest orbit or its neighbor orbits for sufficient long time. Then its entropy is given by a term of the right hand side as (4).
If the temperature of the system is T= 0, there is only one element of symmetry, the identity and Hm = 0. At higher temperatures, the system moves in the phase space to higher planes with more and longer orbits. There are not taken into account changes of the volume V, and thus Hn cannot be replaced by any other function, as Guy tried (1). We can suppose that n molecules are moving on a lattice with v cells and thus there are (v!/n!(v-n)!) possible additional arrangements of the system at the constant volume V. If we want to consider the effect of the volume on entropy of the ideal gas, we must add another term to Hn, Hm could be applied only at the nonideal gas which molecules are not pointlike.
There is possible to distinguish, at least partly, energy increments of different parts of molecules as it will be shown later. Usually (1) is applied and no distinction is made between Hn and Hm.
There is an exception. The principle of increasing mixing character (11-14) is based explicitly on Hm. According to it, the time development of a Gibbs (microcanonical) ensemble of isolated systems proceeds in such a way that the mixing character increases monotonically. There are some partition orbits which are not comparable by their mixing character (e.g. 3,3,0 and 4,1,1). It is not possible to go from one orbit on the other by one collision of two particles, there are necessary at least two collisions and the principle of the increasing mixing order excludes accessibility of both orbits, it deforms the ideal space by its restrictions.
But the vector space as defined till now is not yet complete. It does not contain e.g. sums and differences of naive matrices N. They form incidence matrices G of graphs. For graphs many special functions (16, 17) are defined. They are known are as information indices. Both functions Hn and Hm were found distinct from them on a special class of graphs, star forests.
OVERESTIMATING SUMS OF ENTROPIES
If we use sum of entropies (Hn + Hm) for counting permutation matrices Pn, which are a special class of naive matrices N, we obtain the correct result
n!/n1!.n!/1!n = n!
because n1 = n.
But in this case we can distinguish also the permutations according to their cycle structure and we can count them using the cycle index
S n!/Õ nk!mk!n(k) = n!
where the sum is made over all partitions of n into cycles of mk length,
S mk = n.Similarly, formula (4) can be used for counting indexed trees. The Pruefer algorithm (17) connects all strings of (n-2) symbols from a set of n symbols with the indexed trees with n vertices. For these strings we use (5) as
n(n-2) =
where mk are valences of the vertices.
If we apply this formula e.g. for octanes, we must remember that carbons and hydrogens can not be permuted, even after a correction we get all isomers on one orbit.
A somewhat better result is obtained, if we consider only the carbon atoms. In such a case 7 partitions are derived:
42 16, 4 3 2 15, 4 23 14, 33 15, 32 22 14, 3 24 13, 26 12.
Also in this case, however, some isomers are counted together, e.g. 2-methyl heptanes, 3-methyl heptanes, 4-methyl heptanes and 3-ethyl hexanes on the partition orbit 3 24 13.
These results can be explained by the fact that using (6) for counting trees, we do not distinguish the structure of the off-diagonal elements of the adjacency matrix and their symmetry, which is the symmetry of a quadratical form GTG.
The known methods for counting trees (17) are better because they determine the product of symmetries of trees more exactly. Hn and Hm are not universal measures and the range of their possible applications is limited.
ENTROPlES OF Negative DlSTRlBUTlONS
The intuitive concept of mixing is connected with shuffling of cards. The term of mixing entropy was used for ordering of partitions somewhat unfortunately. Two partitions are compared according to the shape of their distribution functions which are in practice changed by an operation of grinding.
We have an intuitive notion, what it means that a pack of cards is good mixed. We can describe this shuffling by permutations. But such mixing has no effect on both entropies Hn and Hm until the partition of the cards is not changed and thus we need something else as a measure of mixing.
There are several methods how to measure the extent of mixing. In a binomial string we can measure distances between symbols of one kind and use them as the measure of mixing. These distances are known as the negative binomial distribution, which is correlated with the polynomial distribution (18).
lt has been shown that in a string N there exists a set of negative binomial distributions, forming the negative polynomial distribution (19). E.g. for a word bacaabda.. we have 4 different binomial distributions
a 0a0aa00a 2 2 1 3
b b0000b00 1 5
c 00c00000 3
d 000000d0 7
This set can be used to measure mixing of symbols in a word. Or, if we write the word in the matrix form, we can take it as nm dimensional vector (elements {0, 1}, and determine its negative binomial distribution. The problem is, that we can write its string using the column indexing (a11.. am1, a12... ...amn) or the row indexing (a11 ..a1n, a21... amn) or any other suitable indexing. The obtained measures are not identical and it is difficult to choose the right one. It is possible that the best measure could be obtained, if results at different indexing method were averaged.
The averaging is nothing usual. The calculation of the function Hn is made for the system in one orbit. But we have mentioned that we can use the left hand side of identity (4), if system goes trough all possible states. This is an averaging too. We can not exclude an averaging a priori for computational difficulties.
The concept of the nm dimensional vector solves the problem of transmissions between partitions orbits, which are impossible if only symmetries SmM Sn are allowed. The full symmetry of the nm vector is given by the symmetric group Smn. The symmetries obtained by permutations of rows and columns of a matrix SmM Sn are then its subgroups.
For example, the group S9 for vector elements 06 13 reduces in (9!/3!6!) orderings which are arranged into 6 subgroups S3M S3
| 1 | 0 | 0 |
1 | 0 | 0 | |
1 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 0 |
1 | 0 | 0 | 1 | 0 | 0 | |
0 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Vector values in a string of the negative distribution are differences of indexes. For matrices with {0,1} elements there are not any zero differences. Such strings are thus in their turn the differences (Moebius inversions) of strings which may have zeroes as their elements. The strings of negative distributions with zeroes can be obtained from matrices, the elements of which are integers. If we count the distances between the individual quanta inside the matrix elements mij as zeroes, e.g. for a vector (0, 1, 0, 3, 1 ..) we get its negative as (2, 2, 0, 0, 1, ..).
The entropies of negative distributions can be calculated in the same manner as for originals. These entropies correspond to Shannon entropies of conditional probabilities (3) of symbol sequences. They can give a better measure of local properties of strings.
E. g. for two extreme cases of binary strings, we have Markov matrices of probability transitions
00000 . ... 11111
1 | 0 |
0 | 1 |
01010101 ....
0 | 1 |
1 | 0 |
determining their properties. But for the matrix
0.5 | 0.5 |
0.5 | 0.5 |
there are many corresponding strings, e.g.
000 ..111... 101010
00110011....
and we need an additional hypothesis, for example ergodic, or conditional probabilities of higher orders to describe exactly the string.
DISCUSSION
The existence of entropies of negative distributions shows how complicated even the most elementary structures such as words are. Moreover, there remains a problem: how are strings N imbedded into geometrical space? For example, which restrictions hold as to the positions of symbols in a string. If a string N is a text, than the symbols must form words and only specific combinations of symbols are allowed, depending on a language used. Moreover, only such combinations of words are meaningful, which form messages. In physics the strings must represent physical particles.
The unit vectors ej in naive matrices N with unrestricted permutations represent quanta which may be loose as molecules in a gas mixture. Their chemical identity does not depend on their position, it is given by indices j. If particles must be compact, as e. g. in the case of atoms in a molecule, where all imbedding have to remain connected, then the distances in a string are determined by a method used for the scanning of the lattice. If a system is imbedded in a box, then the greatest allowed distance between symbols depends on the box sides, it is the surface of one side of the box. The number of allowed words is restricted by imbedding dimensionless topological objects into geometrical space (20). Till now, the entropy of such combinatorial systems was solved as a sum of topological and metrical entropies (21).
The naive matrices N were introduced with a naive hope that the unit vectors ej are the basic elements of space and their strings, visualized as vector additions, are the simplest structures. Apart from this, matrices of their negative distributions have more elements and behind them negative distributions of the second order are, etc. The negative distributions show that naive matrices N are the differences of matrices with natural numbers (including zero).
This situation is well comparable with the existence of quarks in elementary particle physics which are behind elementary particles in which classical atoms disintegrated. The existence of several entropies corresponds to the fact that entropy Hn expresses only the partition of thermal energy of the ideal gas with pointlike molecules.
The temperature reflects only the deviation of motion of molecules against local means and there remains an open problem, how different contributions are combined into the unified measure outside the realm of the classical thermodynamics and if such measure exists at all. There were made many attempt to find it.
At the end it should be mentioned that the treatment shown above has its philosophical aspects.
At first, it is the distinguishability of elementary particles. Particles are distinguished by their indexes, but their permutations can be ineffective, they need not have any effects on macroscopic properties of systems.
In (4), the molecules are distinguished by j indexes and thus particles of Maxwell -Boltzmann statistics are distinguishable (there are undistinguished energy quanta) (22, 23).
The distinguishability is lost by mathematical operations as JTN, NTN, and N NT are. But this is given by properties of these operations and not by some special properties of objects forming words N. If we know only JTN or NTN, and there is no unique inverse J-T or N-T, we cannot conclude that N does not exist, too.
REFERENCES
1. A.G. Guy, Appl. Phys. Commun. 7, 217 (1987).
2. M. Kac, In: J. Mehra Ed. The Physicist’s Conception of Nature, D. Riedel, Dordtecht 1973, p. 560.
3. C. Shannon, Bell System Techn. J. 27, 379, 623 (1948).
4. E.T. Jaynes, Phys. Rev. 106, 620 (1954).
5. L. Boltzmann, Wien. Ber. 76, 373 (1877).
6. L. Brillouin, Science and Information Theory, Academic Press, New York, 1956, p. 152.
7. M. Kunz, MATCH, in press.
8. M. Kunz, Coll. Czech. Chem. Comm. 51, 1856 (1986).
9. M. Kunz, Information Processing Management 20, 519 (1984).
10. E.A. Bender, J.R. Goldman, The American Math. Monthly 82, 789 (1975).
11. E. Ruch, Theor. Chim. Acta (Berl.) 38, 167 (1975).
12. E. Ruch, A. Mead, Theor. Chim. Acta 41, 95 (1976).
13. E. Ruch, R. Schranner, T.H. Seligman, J. Chem. Phys. 96, 386 (1976).
14. E. Ruch, B. Lesche, J. Chem. Phys. 69, 393 (1978).
15. A. Sabljiè, N. Trinajstiè, Acta Pharm, Jug. 31, 189 (1981).
16. A.T. Balaban, J. Motoc, O. Bonchev, O. Mekenyan, Topics Curr. Chem. 114, 21 (1983).
17. F. Harary, E.M. Palmer , Graphical Enumeration, New York, Academic Press, 1973.
18. A. Renyi, Wahrscheinlichkeitsrechnung, Berlin, Deutscher Verlag der Wissenschaften, 1962.
19. M. Kunz, Scientometrics 11, 159 (1987).
20. N.J. Turro, Ang. Chem. 98, 872 (1986).
21. M. Gordon, W.B. Temple, J. Chem. Soc. A 1970, 729.
22. A. Bach, Phys Lett. A 125, 447 (1987).
23. A. Bach, International Conference on Statistics in Science Luino (Italy). May 26-28, 1988.