Boltzmann and Shannon Entropies are Additive Measures of Symmetry

Milan Kunz, Jurkovičova 13, 63800 Brno, The Czech Republic, email: kunzmilan@seznam.cz.

The Boltzmann and Shannon entropy functions H are logarithms of two different polynomial coefficients, determined by cyclic permutations over two indices. They measure the symmetry of strings of unit vectors. They are additive if both indices are explicit. There exist other similar functions, for example for the distribution of distances between objects of the same kind (the entropy of mixing).

INTRODUCTION

The statement from an abstract in CA1: "Boltzmann entropy is an information entropy", is typical for the state of the art. It is generally believed that the Shannon entropy function Hm is more sophisticated and therefore better defined than the Boltzmann entropy function Hn. Both functions measure related but nevertheless different properties. They can even be additive.

One can speculate who the Jack with a Lantern was who changed the great enigma connected with entropy into a still greater error. Its consequences spread from mathematics over physics, cosmology2, biology, and the social sciences to philosophy.

Changing one word, such a small correction, could have been enough, and J. von Neumann could not have said to Shannon3: "You should call it entropy, for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one knows what entropy really is, so in a debate you will always have the advantage."

The word which could have removed this uncertainty is the word symmetry. It should have been used by Boltzmann4 when he explained his idea of what the entropy of the ideal gas is. He had another problem, too: his proof needed a quantum hypothesis, 30 years ahead of Planck. He had no faith in his thoughts and abandoned them for conventional mathematics. Instead of the term "symmetry", he used the word "probability", which says nothing. Since this probability is determined by the symmetry, he was correct, but the basic idea of his proof of the H theorem was not recognized and remained obscure (Kac5 called it "a demonstration").

SYMMETRY

First, the notion of symmetry and its measure must be clarified. If we take regular pentagons, hexagons and generally n-gons (and of course regular triangles and squares), they have an ever-growing number of symmetry elements (mirror planes) and of symmetry elements of higher order (rotation axes). The circle has the highest symmetry in two dimensions, the sphere in three dimensions.

The lowest possible symmetry has only one symmetry element, the identity.

The logarithm of one is zero. The physical entropy of any object at absolute zero temperature is zero, since it can exist in only one state. As the temperature grows, the number of possible states grows: new elements of symmetry appear or their orders increase. To call these increasing possible orderings disorder leads to confusion.

ORBITS IN MULTIDIMENSIONAL SPACES

In his lecture about the importance of mathematics for physics, S. Weinberg6 mentioned the case of the Ramanujan-Hardy equation, which determines the number of partitions of a number m into n parts7. Hardy thought that this equation would never have physical applications; Weinberg found an application for this part of number theory in the theory of elementary particles. Weinberg had completely forgotten that even before Hardy the partitions had a more practical use: they were the basis of Boltzmann's proof of his H theorem.

Boltzmann4 gave an example partitioning 7 quanta of energy among 7 particles. All possible cases can be written in a table. The numbers represent the quanta of energy mk; the upper indices ^i count the number nk of particles with the same energy mk. The theory of partitions does not count zero parts; they are considered nonexistent. Therefore, the columns of the table are arranged according to the number of nonzero parts, and zeroes and their frequencies are omitted. This is in fact the difference from the whole plane simplex; it is important whether zeroes are counted or not. The rows are ordered according to the size of the largest part, which is dominant and determines the length of the vector row:

Table 1

The diagram of the normal seven dimensional plane

Dimension   1       2       3       4       5       6       7
Size 7      7
     6              61
     5              52      511
     4              43      421     4111
     3                      331,    3211    31^4
                            322
     2                              2221    221^3   21^5
     1                                                      1^7

This is the diagram of the normal (m = n) seven dimensional plane orthogonal to the diagonal unit vector I7, its cross-section (for details see 24). The columns represent consecutively: vertices, points on lines, two dimensional bodies (surfaces in three dimensions), and so on. Each partition of the number seven represents one orbit (Boltzmann used the term "complexion"), and so the diagram counts them. Another arrangement of the diagram was introduced by Ruch8, who stressed only nearest neighbor relations.

All points lying on an orbit have the same distance from the origin of the coordinate system. They are obtained by cyclic permutations of the partition vector from its ordered canonical form. Therefore the orbits are spherical.

The physical interpretation is, for example, a closed isolated system with constant energy. At the beginning, all kinetic energy is concentrated in one particle. After a big bang {state (7, 0, 0, 0, 0, 0, 0)}, this moving particle collides in its flight with six other stationary particles with zero energy, and its energy dissipates to them. Similarly, if all particles have equal energies, some particles receive more energy as the result of collisions while others lose it. The system goes spontaneously to the orbit with the largest symmetry defined by the group of cyclic permutations, (3, 2, 1, 1, 0, 0, 0). It remains there if not disturbed from outside. Boltzmann allowed fluctuations to neighboring orbits with lower symmetry.

The number of points on each orbit is determined by the polynomial coefficient for n permutations, and the function Hn is just the logarithm of this number

Hn = ln(n!/Π nk!) ≅ −n Σ (nk/n) ln(nk/n)     (1)

There is no doubt that the factorials of large numbers are approximated quite satisfactorily using the Stirling formula. The problem is that the system must be very large; it must contain a huge number of particles to obtain a sufficient number of simultaneous collisions to keep it on one orbit9. This is fulfilled if particles are counted in moles and energy in quanta of the Boltzmann constant k.
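
To illustrate how slowly the Stirling form of (1) converges for small systems, the following minimal sketch (in Python; the helper names ln_polynomial_coefficient and stirling_form are only illustrative) compares the exact logarithm of the polynomial coefficient for n permutations with its Stirling approximation for the orbit (3, 2, 1, 1, 0, 0, 0), scaled up by increasing factors:

```python
import math

def ln_polynomial_coefficient(counts):
    # ln(n!/prod nk!), computed with lgamma so that very large counts are no problem
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def stirling_form(counts):
    # -n * sum (nk/n) ln(nk/n), the Stirling approximation of the expression above
    n = sum(counts)
    return -n * sum((c / n) * math.log(c / n) for c in counts if c > 0)

# orbit (3, 2, 1, 1, 0, 0, 0): one particle with 3 quanta, one with 2,
# two with 1 and three with 0, i.e. nk = (1, 1, 2, 3)
for scale in (1, 10, 1000, 10**6):
    nk = [1 * scale, 1 * scale, 2 * scale, 3 * scale]
    exact, approx = ln_polynomial_coefficient(nk), stirling_form(nk)
    print(scale, round(exact, 2), round(approx, 2), round(approx / exact, 4))
```

For the seven-particle example the two values still differ substantially; the ratio approaches one only as the counts grow toward macroscopic numbers.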

The polynomial coefficient for n permutations has its maximum if all terms in the divisor are as small as possible; this can be realized only for specific ratios m/n. The minimum is reached when the divisor contains a single term: then all parts are equal, nk = n, and the entropy is zero (temperature does not measure the whole energy of the particles, for example the energy connected with the rotation of the laboratory with the Earth, but only its thermal part).

Another question is whether the function Hn really corresponds to the entropy of real gases. We will see later that additional terms must be added, but the function Hn itself was defined by Boltzmann without any uncertainty as the measure of the symmetry of a natural vector. Unfortunately, he used the term "probability" instead of "symmetry", which was not in vogue then.

The polynomial coefficients for n permutations can be placed into the diagram at the places of the corresponding orbits. The column sums and the total sum are well known combinatorial identities for the distribution of m indistinguishable objects into n distinguishable cells24.
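
A short sketch (Python, standard library only; the helper names are illustrative) checks these statements on the seven-dimensional example: it enumerates the partitions of Table 1, computes the polynomial coefficient for n permutations of each orbit, and compares the total with the number of distributions of m indistinguishable objects into n distinguishable cells, C(n + m − 1, m):

```python
from math import comb, factorial
from collections import Counter

def partitions(m, largest=None):
    # partitions of m as non-increasing tuples of positive parts (the rows of Table 1)
    largest = m if largest is None else largest
    if m == 0:
        yield ()
    for first in range(min(m, largest), 0, -1):
        for rest in partitions(m - first, first):
            yield (first,) + rest

def orbit_size(p, n):
    # polynomial coefficient for n permutations: n!/prod nk!, with the zero parts counted
    nk = Counter(list(p) + [0] * (n - len(p)))
    size = factorial(n)
    for multiplicity in nk.values():
        size //= factorial(multiplicity)
    return size

m = n = 7
sizes = {p: orbit_size(p, n) for p in partitions(m)}
print(max(sizes, key=sizes.get), max(sizes.values()))   # (3, 2, 1, 1): 420 points, the largest orbit
print(sum(sizes.values()), comb(n + m - 1, m))          # 1716 = 1716 points on the whole plane
```

The largest orbit is indeed the one on which the dissipating system of the earlier example settles, (3, 2, 1, 1, 0, 0, 0) with 420 of the 1716 points of the plane.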

INFORMATION ENTROPY

Shannon10 built his theory of communication using axioms, which need not be explained here. Moreover, he used the binary logarithm, and thus the Hm function has a specific interpretation11.

To index a set of m objects by a regular code (symbols 0 and 1), we need at least m log2 m digits, for example 000, 001, 010, 011, 100, 101, 110, 111 for 8 items. If these objects are classified into n groups with the index j, say aaaabbcd (4 groups), we need only Σ mj log2 mj digits, in our example 10 digits: a00, a01, a10, a11, b0, b1, c, d. The difference (24 − 10), divided by the number of objects m = 8, is the measure of information we have about the set, relative to one object. The fractions mj/m, obtained after manipulations with the formula, are interpreted as probabilities pj, similarly to the fractions nk/n in Hn.
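
The arithmetic of this paragraph fits into a few lines; the following sketch (Python; the function name is only illustrative) computes the number of binary digits saved per object for the example string aaaabbcd:

```python
from collections import Counter
from math import log2

def information_per_object(message):
    # (m*log2(m) - sum mj*log2(mj)) / m : binary digits saved per object by the classification
    m = len(message)
    group_sizes = Counter(message).values()
    saved = m * log2(m) - sum(mj * log2(mj) for mj in group_sizes)
    return saved / m

print(information_per_object("aaaabbcd"))   # (24 - 10) / 8 = 1.75 bits per object
```

The result, 1.75 bits, equals −Σ (mj/m) log2(mj/m) for the fractions 4/8, 2/8, 1/8, 1/8, which is how the probabilities pj enter the formula.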

Since the regular code can be visualized as a binary decision tree, each 0 representing an edge going left, each 1 an edge going right, and each indexed object forming a leaf of the tree, the information entropy can be explained even to children as a game.

The given interpretation is only an approximation; the needed number of digits is greater than the binary logarithm if the mj are not powers of 2. A greater fault is that it does not exactly match the notion we have of information: we do not index symbols in texts.

Moreover, this elementary explanation is bound to the choice of the base of the logarithm. If we change the base and define the function Hm in natural logarithms, to make it comparable with the function Hn, we must search for another interpretation.

We have shown that Boltzmann connected the function Hn with cyclic permutations of the partition vector. The cyclic permutations demand some ordering of the permuted objects, that is, recognizing individual particles.

The symbols in a message are indexed by their natural order. A message is not a sum of the symbols used, but their product. Since "cat" is a different word than "act", the ordering of symbols is important, and the multiplication is noncommutative. The symbols repeat in texts; their counts are known as frequencies. This suggests an analogy with photons, which transfer information too and whose energy is determined by their frequencies.

With messages, there are two possible kinds of permutations: either permutations of the order of symbols in a string, e.g. when aaaabbcd permutes to babacada, or substitutions of symbols, when one symbol is replaced by another, e.g. when ddddccab is obtained from aaaabbcd.

Both kinds of transformations can be done as one formal mathematical operation if we write a string of symbols as a naive matrix12,13 N, where the rows are indexed by the position in the string, the columns are indexed by the alphabetical index, and empty cells represent zeroes (here for the permuted string babacada):

Index   a   b   c   d
1           1
2       1
3           1
4       1
5               1
6       1
7                   1
8       1
Sums    4   2   1   1

Both symmetry operations are performed here by multiplying the matrix N by unit permutation matrices Pm from the left and by unit permutation matrices Pn from the right. The left hand symmetry can be eliminated either by finding the quadratic form P^T N^T N P, or by transforming the matrix N into the row J^T N, where J^T is the unit vector-row. The quadratic form P^T N^T N P is a diagonal matrix; the symmetry operations must act on it from both sides. The quadratic form P^T N^T N P or the row J^T N are the vectors Boltzmann considered. Despite their simpler forms, they are derived from the basic matrix by a mathematical operation which has no inverse, and thus they are not the basic stones of the space.
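
A minimal sketch of these operations (Python with numpy; the string is taken from the table above, the helper name and the random permutations are only illustrative) builds the naive matrix of babacada, shows that N^T N is diagonal and that J^T N is the row of column sums, and checks that permutations from the left drop out of the quadratic form while permutations from the right only permute it:

```python
import numpy as np

alphabet = "abcd"

def naive_matrix(string):
    # one row per position in the string, one column per symbol, a single 1 in every row
    N = np.zeros((len(string), len(alphabet)), dtype=int)
    for i, ch in enumerate(string):
        N[i, alphabet.index(ch)] = 1
    return N

N = naive_matrix("babacada")
J = np.ones(N.shape[0], dtype=int)

print(N.T @ N)   # diagonal matrix with the column sums 4, 2, 1, 1 on the diagonal
print(J @ N)     # the row J^T N = (4, 2, 1, 1)

# Pm permutes rows (reorders the symbols in the string), Pn permutes columns (substitutes symbols)
rng = np.random.default_rng(0)
Pm = np.eye(N.shape[0], dtype=int)[rng.permutation(N.shape[0])]
Pn = np.eye(N.shape[1], dtype=int)[rng.permutation(N.shape[1])]
M = Pm @ N @ Pn
print(np.array_equal(M.T @ M, Pn.T @ (N.T @ N) @ Pn))   # True: the left-hand symmetry is eliminated
```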

The number of vector strings leading to each point on the partition orbit is determined by the polynomial coefficient for m permutations

m!/Π mj! = m!/Π (mk!)^nk     (2)

The function Hm can again be just the logarithm of this number, obtained using the Stirling formula, since natural and binary logarithms differ only by a constant factor. Here the precision is worse than for the function Hn, but acceptable. The function Hm measures the number of messages which can be formed from a given set of symbols. But for each string the Boltzmann function Hn is defined too; therefore, the total number of strings of length m going to an n dimensional plane, counted over all partition orbits, is

Σ (n!/Π nk!)(m!/Π (mk!)^nk) = n^m     (3)

This improved form of the Newton polynomial formula can be found in textbooks14. Partial sums of the polynomial coefficients are well known combinatorial identities for the distribution of m distinguishable objects into n distinguishable cells. This is somewhat confusing, since all typed letters are alike and mutual permutations of symbols of one kind are ineffective, as the coefficient for m permutations shows. Each item gets its second index from its position in the matrix (the text); a third index (or some specific features) is necessary for distinguishing items further.
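
Identity (3) can be checked directly on a small case. The sketch below (Python; the small alphabet and string length are chosen only to keep the brute force feasible, and the helper names are illustrative) groups all n^m strings by their partition of frequencies and compares each group with the product of the two polynomial coefficients:

```python
from itertools import product
from collections import Counter
from math import factorial

def points_on_orbit(freqs, n):
    # n!/prod nk!, with the zero frequencies counted among the nk
    nk = Counter(list(freqs) + [0] * (n - len(freqs)))
    c = factorial(n)
    for multiplicity in nk.values():
        c //= factorial(multiplicity)
    return c

def strings_per_point(freqs):
    # m!/prod (mk!)^nk, which equals m!/prod mj! taken over the nonzero frequencies
    c = factorial(sum(freqs))
    for mj in freqs:
        c //= factorial(mj)
    return c

n, m = 3, 5   # small enough to enumerate all n**m strings
orbits = Counter(tuple(sorted(Counter(s).values(), reverse=True))
                 for s in product(range(n), repeat=m))

for freqs, count in orbits.items():
    assert count == points_on_orbit(freqs, n) * strings_per_point(freqs)
print(sum(orbits.values()), n ** m)   # both 243: formula (3) holds orbit by orbit
```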

If we start from formula (3), the logarithm of the product is a sum; therefore, both measures H are additive. In our example, the vector of frequencies mk is (4, 2, 1, 1) and n0 = 0, n1 = 2, n2 = 1, n3 = 0, n4 = 1, etc., so Hn = log(4!/2!1!^2) = log 12 and Hm = log(8!/4!2!1!^2) = log 840.
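
For the example string, the additivity amounts to the following small calculation (Python; natural logarithms are used, which changes nothing about the additivity):

```python
from math import factorial, log

Hn = log(factorial(4) // (factorial(2) * factorial(1) * factorial(1)))                  # log 12
Hm = log(factorial(8) // (factorial(4) * factorial(2) * factorial(1) * factorial(1)))   # log 840
print(Hn + Hm, log(12 * 840))   # equal: the sum is the logarithm of one term of formula (3)
```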

Hm has its maximum if all mk are equal; then Hn is minimal. If all mk = 1, the polynomial coefficient for m permutations reduces to m! and is maximal. It can be expressed by the cycle index characterizing the group of cyclic permutations Sn or Sm, respectively.

The existence of two measures of symmetry explains the observed redundancy of natural languages. It is true that if all symbols are used equally often, the greatest number of messages can be formulated from the given set. But many of them are alike, and it would be difficult to distinguish them.

It is better to explain this difficulty by speaking about words instead of symbols. In a message without redundancy, all words would have equal frequencies, there would be no key words, and we could not recognize what is being spoken about.

SYMMETRY OF GRAPHS

With ideal gases, all energy is concentrated in a point. With real gases, the energy is dispersed, but only within a molecule; its quanta cannot be spread throughout the whole matrix as symbols are in a text. The ideal gas can be formally considered as a quadratic form P^T N^T N P, where P^T is the transposed permutation matrix for n permutations and N^T N is the diagonal matrix. Its molecules are represented as points on the vector axes. A real molecule forms a blot in the matrix representing the system.

Molecules are isomorphic with graphs mapping their structures. A graph is described by an incidence matrix. This matrix is either a difference of two naive matrices (oriented graph) or their sum (unoriented graph). The symmetry of graphs is determined similarly as for naive matrices, by multiplying the incidence matrix by unit permutation matrices from the left and from the right. This leads to wreath products of the cyclic groups15 and to rather complicated formulas transforming the group of cyclic permutations Sn into graph groups. In specific cases, both functions Hn and Hm, together with additive terms, can be exploited for enumeration11.
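
The statement about incidence matrices can be illustrated on the smallest oriented cycle (a sketch in Python with numpy; the triangle and the helper are only an illustration): the string of arc tails and the string of arc heads form two naive matrices, and their difference and sum give the oriented and the unoriented incidence matrices.

```python
import numpy as np

def naive(string, alphabet):
    # naive matrix of a string over a given alphabet: one 1 per row
    N = np.zeros((len(string), len(alphabet)), dtype=int)
    for i, s in enumerate(string):
        N[i, alphabet.index(s)] = 1
    return N

vertices = [0, 1, 2]
tails, heads = [0, 1, 2], [1, 2, 0]   # the oriented triangle with arcs 0->1, 1->2, 2->0

S = naive(heads, vertices) - naive(tails, vertices)   # incidence matrix of the oriented graph
G = naive(heads, vertices) + naive(tails, vertices)   # incidence matrix of the unoriented graph
print(S)
print(G)
```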

For chemical compounds, the relation of entropy with symmetry was stated by many authors many times before [e.g. 16-18]. They calculated its contribution to the experimental entropy. It seems that their contributions remain unrecognized.

The problem is complicated by the existence of the entropy of mixing19,20. Consider now that the string aaaabbcd represents 8 molecules of 4 different kinds (another embodiment of this kind of entropy is the sorting of hot and cool molecules inside the system), arranged linearly in a tube. The entropy of such a mixture depends on the mixing of molecules inside the system. If the original arrangement permutes to babacada, its entropy must change, at least if we scale the size of the system to moles. Neither Hn nor Hm measures this effect15,16, since they remain constant over orbits. We need another measure, evaluating distances between symbols of one kind in the string.

The distances between letters of one kind in this paper were one when the letters were still stored as in the printer's typecase. This corresponds to the canonical state of a string of symbols. In texts, these distances can be described by the negative binomial, lognormal or Weibull distributions, as well as in other information strings such as RNA molecules21. The Hn and Hm values of the distances can be calculated separately for all symbols, as before.
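
A few lines (Python; the helper name is illustrative) show the suggested measure on the earlier example: the canonical string aaaabbcd has unit distances between letters of one kind, while the permuted string babacada, with the same Hn and Hm, does not.

```python
def distances(string, symbol):
    # gaps between consecutive occurrences of one symbol in the string
    positions = [i for i, ch in enumerate(string) if ch == symbol]
    return [b - a for a, b in zip(positions, positions[1:])]

for s in ("aaaabbcd", "babacada"):
    print(s, {ch: distances(s, ch) for ch in "ab"})
# aaaabbcd: a -> [1, 1, 1], b -> [1]   (the canonical, unmixed arrangement)
# babacada: a -> [2, 2, 2], b -> [2]   (the same frequencies, but mixed)
```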

DISCUSSION

It is possible to replace the factorials in the polynomial coefficients by gamma functions (this is necessary for particles with spin), to use Euler functions, and to replace sums by integrals. But such modifications would not improve understanding; the definitions would only become more abstract and complicated.

I dislike using the sophisticated terms usually employed when discussing entropy. My arguments are based on the notion of the natural Euclidean (Hilbert) space and its properties. The space is not an ordered set of points, but the strings N of unit vectors ej leading to points. The coordinates of points are obtained from these strings either as quadratic forms N^T N or as unit projections J^T N. These operations show that noncommutative algebra is basic and commutative algebra is derived.

I used the adjective "naive" for the matrices N. Some suitable adjectives, such as primitive or elementary, were already taken. The other reason was that I considered my proof of the relations of the H functions naive and humble.

Both the Hn and Hm functions are employed as they were proposed by their authors: Boltzmann's exactly, and Shannon's in its spirit and form.

The entropy of the ideal gas is comparable with the Boltzmann Hn function, but it is not possible to say how many terms are involved in real systems. Yet it is plausible that entropy is a measure of symmetry, meaning a measure of the number of symmetry elements and of their orders in the system.

Physical entropy is a logarithmic measure of the amount of energy needed to increase the temperature. On this molecular level, the temperature is the integrating factor.

Outside physics, we can calculate the functions H on many levels. But we do not know whether some integrating factors exist, nor how to define them.

If we interpret the spontaneous growth of entropy as the spontaneous growth of symmetry in the Universe, then we do not need the term negentropy for living organisms22,23. They merely have a greater number of symmetry elements of higher order, a greater complexity, than non-living things. The increase of symmetry is a spontaneous process. Elementary particles form atoms, atoms form molecules, molecules form structures such as crystals or living cells. Living cells assemble into organisms, organisms into societies. In each step, new symmetry elements are added to the old ones. These views were expressed by many authors before.

The apparent disorder is only unrecognized symmetry24. Accepting that the Hn and Hm functions are approximate logarithmic measures of the symmetry of strings of unit vectors ej, their purpose and relation are clearly defined. The disadvantage of my proposal is that the functions then become too simple for a debate.

REFERENCES

(1) Chen E. B., Boltzmann Entropy, Relative Entropy and Related Quantities in Thermodynamic Space, J. Chem. Phys., 1995, 102, 7169-79; CA 122: 299958.

(2) Hawking S., A Brief History of Time, Bantam Books, Toronto, many editions.

(3) Tribus M., McIrvine E. C., Energy and Information, Scientific American, 1971, 225, 3, 179.

(4) Boltzmann L., Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung, Wiener Berichte 1877, 76, 373. [I am not sure about this identification, since I had the paper as a separate without a nearer identification].

(5) Kac M. in Mehra J., Ed. The Physicist's Conception of Nature, Reidel, Dordrecht, 1973, p.560.

(6) Weinberg S., Mathematics, the Unifying Thread in Science, Notices AMS, 1986, 33, 716.

(7) Andrews G. E., The Theory of Partitions, Addison-Wesley Publ. Comp., Reading, MA, 1976.

(8) Ruch E., The Diagram Lattice as Structural Principle, Theor. Chim. Acta (Berl.) 1975, 38, 167-183.

(9) Kunz M.: How to Distinguish Distinguishability: Physical and Combinatorial Definitions, Physics Letters 1989, A 135, 421-424.

(10) Shannon C. E., A Mathematical Theory of Communication, Bell System Technical Journal, 1948, 27, 379, 623.

(11) Kunz M.: Entropies and Information Indices of Star Forests, Coll. Czech. Chem. Commun., 1986, 51, 1856-1863.

(12) Kunz M., Information Processing in Linear Vector Space, Information Processing and Management, 1984, 20, 519-524.

(13) Kunz M., About Metrics of Bibliometrics, J. Chem. Inform. Comput. Sci., 1993, 33, 193-196.

(14) Feller W., An Introduction to Probability Theory and its Applications, J. Wiley, New York, 1970, Chapter 10.4.

(15) Harary F., Palmer E. M., Graphical Enumeration, Academic Press, New York, 1973.

(16) Gordon M., Temple W. B., Chemical Combinatorics. Part I. Chemical Kinetics, Graph Theory and Combinatorial Entropy, J. Chem. Soc (A), Inorg. Phys. Theor., 1970, 729.

(17) Dannenfelser R. M., Surendran N., Yalkowsky S. H., Molecular Symmetry and Related Properties, SAR, QSAR Environ. Res., 1993, 1, 273.

(18) Lin S. K., Correlation of Entropy with Similarity and Symmetry, J. Chem. Inform. Comput. Sci., 1996, 36, 367-376.

(19) Kunz M., Time Spectra of Patent Information, Scientometrics, 1987, 11, 163.

(20) Ruch E., Lesche B., Information Extent and Information Distance, J. Chem. Phys. 1978, 69, 393-401.

(21) Kunz M., Rádl Z., Distribution of Distances in Information Strings, J. Chem. Inform. Comput. Sci., 1998, 38, 374-378.

(22) Schroedinger E., What Is Life?, Cambridge University Press, Cambridge, 1944.

(23) Kunz M., A Note about the Negentropy Principle, MATCH, 1988, 23, 3.

(24) Tonnelat J., Conformity of Evolution towards Complexity from Thermodynamic Conclusions, Arch. Int. Physiol. Biochim. 1986, 94, C11.

(25) Kunz M., Matrix Combinatorics and Algebra, www://mujweb.cz/veda/kunzmilan.