January 14, 1991. This is the date on which I submitted this paper to the Journal of Chemical Education. It was rejected. I have now scanned the rejected manuscript and only corrected it.

ABC of Entropy

MILAN KUNZ

Research Institute of Macromolecular Chemistry, 656 49 Brno, Czechoslovakia

There exist thousands of publications about entropy, and despite this Peres (1) and Weinberg (2) could not agree on its definition in their recent polemic about the application of the second law of thermodynamics in quantum mechanics.

Several papers devoted to the problem appeared recently in this Journal (3-6).

Actually, the problem must be divided into two parts: the problem of the mathematical formula

H = -\sum_k p_k \log p_k                                           /1/

and the problem of the physical entropy S. We try to clarify the problem of the function H on an elementary level. The only difficulty to be overcome is the notion of n-dimensional space, but it can be explained with the alphabet.

Our explanation follows Boltzmann (7) and Shannon (8) exactly, except that the infinite model is abandoned and virtual symbols (not occurring in an actual text) are allowed.

A. On the plane of constant energy

Boltzmann's example was simply an isolated system of 7 particles with 7 quanta of energy. We could start with the 3-dimensional phase space, in which the coordinates of all particles are registered at one moment, take two consecutive moments and find their difference. We would have 6 × 7 = 42 coordinates. Instead we use a 7-dimensional Euclidean space in which all vectors are orthogonal and draw all possible combinations having the constant sum \sum m_j = 7. Only points whose coordinates are natural numbers (zero included) appear as possible solutions of the distribution problem. They lie on the plane orthogonal to the positive cone and form a simplex. The differences of our 7-dimensional phase space are isomorphic with this simplex.

The problem was simplified, but not for us, because we are 3-dimensional animals and our ability to understand higher-dimensional spaces is limited. We must first analyze such a simplex. To do so, we divide the simplex into orbits:

             6 zeros   5 zeros   4 zeros   3 zeros   2 zeros   1 zero    0 zeros
largest 7    7000000
largest 6              6100000
largest 5              5200000   5110000
largest 4              4300000   4210000   4111000
largest 3                        3310000   3211000   3111100
                                 3220000
largest 2                                  2221000   2211100   2111110
largest 1                                                                1111111

Fig. 1. The orbit partition: the surface simplex (7,7) in 7-dimensional vector space. The distance between neighboring orbits is due to the exchange of 2 vectors only. The orbits are ordered into columns according to the number of zero vectors and into rows according to the size of the longest vector. Such orbit diagrams can be formed by full induction for all m and n (it is the problem of the partition of the number m into n parts). If m < n or m > n, the diagram is truncated; some orbits cannot be realized.
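As an illustration, the orbits of Fig. 1 can be generated by listing all partitions of m quanta into at most n parts. The following is a minimal Python sketch (not part of the original manuscript; the function names are my own):

    # Sketch: enumerate the orbits of the simplex (m, n) of Fig. 1,
    # i.e. the partitions of m quanta into at most n parts, padded with zeros.

    def partitions(m, largest=None):
        """Yield the partitions of m as non-increasing tuples of positive integers."""
        if largest is None:
            largest = m
        if m == 0:
            yield ()
            return
        for first in range(min(m, largest), 0, -1):
            for rest in partitions(m - first, first):
                yield (first,) + rest

    def orbits(m, n):
        """Partitions of m into at most n parts, written as n-dimensional points."""
        return [p + (0,) * (n - len(p)) for p in partitions(m) if len(p) <= n]

    if __name__ == "__main__":
        for orbit in orbits(7, 7):
            print("".join(str(x) for x in orbit))
        print("number of orbits:", len(orbits(7, 7)))   # 15 orbits for m = n = 7

For m = n = 7 this reproduces the 15 orbits shown in Fig. 1.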

In the ideal gas system (9) with its collisions, a collision can be symmetrical, e.g. a5 + b2 → a2 + b5: two vectors exchange their positions and the system remains on the same orbit. But a collision can be asymmetrical, e.g. a5 + b2 → a3 + b4: the system changes its orbit, since the resulting partition is different.

As the result of a series of asymmetrical collisions, the system can return to its initial orbit. If a system has infinitely many particles, collisions occur simultaneously, and the system remains on one orbit or on a narrow band of orbits. Because neighboring particles meet in collisions, we may suppose that the corresponding vectors are indexed as neighbors, too, and we can imagine the ideal gas system rotating on its equilibrium orbit. The state of the system is determined by the symmetry of this orbit.
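A minimal simulation may help to visualize this. The following Python sketch (my own illustration; the names orbit and collide are assumed, and a "collision" here simply transfers one quantum, a simplification of the general exchanges discussed above) follows Boltzmann's 7-particle, 7-quantum system and prints its orbit after each collision:

    import random

    def orbit(state):
        """The orbit of a state is its partition: coordinates sorted in decreasing order."""
        return tuple(sorted(state, reverse=True))

    def collide(state):
        """One random collision: particle i gives one quantum to particle j."""
        state = list(state)
        i = random.choice([k for k, e in enumerate(state) if e > 0])
        j = random.choice([k for k in range(len(state)) if k != i])
        state[i] -= 1
        state[j] += 1
        return state

    state = [7, 0, 0, 0, 0, 0, 0]          # Boltzmann's example: 7 particles, 7 quanta
    for step in range(10):
        state = collide(state)
        print(step, state, "orbit:", orbit(state), "sum:", sum(state))

The sum of the coordinates stays constant at 7, while the orbit changes only after asymmetrical collisions.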

If we multiply an n-dimensional row vector with natural coordinates (zero included) by the unit permutation matrices P representing the group of cyclic permutations S_n, we get

n!/\prod_k n_k!                                             /2/

different points on one orbit, provided that

\sum_{k \ge 0} n_k = n,      \sum_{k \ge 0} n_k m_k = m     /3/

Boltzmann proposed the logarithm of this index (the n-polynomial coefficient) as the entropy measure H.
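For example, for the orbit 3211000 of Fig. 1 the index /2/ and its logarithm can be evaluated directly (a Python sketch; orbit_index is my own name):

    from math import factorial, log, prod
    from collections import Counter

    def orbit_index(orbit):
        """Number of different points on an orbit, eq. /2/: n!/prod(n_k!),
        where n_k is the number of vectors carrying k quanta."""
        n_k = Counter(orbit)        # for 3211000: three 0s, two 1s, one 2, one 3
        return factorial(len(orbit)) // prod(factorial(c) for c in n_k.values())

    orbit = (3, 2, 1, 1, 0, 0, 0)
    print(orbit_index(orbit))                # 420 points on this orbit
    print(log(orbit_index(orbit)))           # Boltzmann's measure H = ln 420, about 6.04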

B. There was a word at the beginning

The application of the H function in information theory is an example of sloppiness.

Shannon needed a name for the function H_m = -\sum_k p_k \log_2 p_k.

J. von Neumann suggested "entropy" to him: "In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one knows what entropy really is, so in a debate you will always have the advantage" (10). Shannon used the binary logarithm and an alphabet with few letters, for which the Stirling formula was not applicable.

But a more important objection is that the probabilities in the two applications are different. The ideal gas system can be described by a linear vector, e.g. 3, 2, 1, 1, 0, ... or a3, b2, c1, d1, e0, ... The elements of the vector are exponents corresponding to the energy which the particles have; we need not know the positions of the particles. But such a vector gives only the statistics; it does not contain the information which is inside words. For example, bacadaab and abdacaba are different words with the same statistics. In the case of the ideal gas, it is the actual positions of all the particles that are lost in the difference.

This information is exactly the difference between a path in the space, represented by a string of vectors, and a position in the space, given by the vector sum.

At first we give a straightforward interpretation of the information entropy with the binary logarithm (11). In a message we have m symbols (Shannon worked with m = 7). For their indexing with the regular binary code using the digits {0, 1} we need at least m log_2 m digits. If we have some information about the symbols, meaning that they form n subsets of an alphabet, then \sum_i m_i \log_2 m_i digits are sufficient for their indexing. The difference, divided by the number of symbols m, gives the information entropy H_2 (Fig. 2).

Fig. 2. Graph interpretation of the information entropy. The entropy is given by the difference of the numbers of edges in the two coding graphs, divided by the number of vertices. If no information about 8 objects is available, we need at least 24 decisions (left 0, right 1). If the objects are indexed as a, a, a, a, b, c, d, e, only 8 such indexing steps are necessary. The difference, divided by the number of objects, is the measure of the information which we have about the indexing.
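The counting of digits in Fig. 2 can be checked with a few lines of Python (my own sketch; the example message and the name coding_entropy are only illustrative):

    from math import log2

    def coding_entropy(symbols):
        """H_2 as the difference of the two coding lengths, divided by the message length m."""
        m = len(symbols)
        no_info = m * log2(m)                                # indexing m unknown objects
        counts = {}
        for s in symbols:
            counts[s] = counts.get(s, 0) + 1
        with_info = sum(c * log2(c) for c in counts.values())   # indexing inside the subsets
        return (no_info - with_info) / m

    message = "aaaabcde"
    print(coding_entropy(message))                           # 2.0 bits per symbol
    # The same value from the Shannon form -sum p_k log2 p_k:
    print(-sum((c / 8) * log2(c / 8) for c in (4, 1, 1, 1, 1)))   # 2.0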

But for information entropy based on natural logarithms, we have another interpretation.

Words are vectors of information, and we interpret them as vector strings (12). For this purpose we identify the j-th symbol of the alphabet with the unit vector e_j and write a word in the form of a column of vectors. The word then has the form of a naive matrix N, which has exactly one unit in each row. Now we have two symmetry operations acting on the matrix, P_m N P_n.

If we multiply the matrix by a unit permutation matrix P_n from the right, we permute the columns of the matrix (this corresponds to substitutions of symbols). If we multiply the matrix by a unit permutation matrix P_m from the left, we permute the rows, that is, the order of the symbols. In both cases the position vector remains on its orbit.

The two symmetry operations are separated if we form the quadratic forms

P_n N^T N P_n   and   P_m N N^T P_m.

Finding the quadratic form P_n N^T N P_n, we get from a text its statistics: we know only how many symbols of each kind were in the text, but we do not know their order. Finding P_m N N^T P_m, we know the pattern, but we do not know which symbols were used. The formal mathematical operation corresponds to abstraction in thinking.
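The two quadratic forms can be demonstrated on the two words mentioned above, bacadaab and abdacaba (a Python/NumPy sketch of my own; naive_matrix is an assumed name):

    import numpy as np

    ALPHABET = "abcd"

    def naive_matrix(word):
        """Naive matrix N: one row per symbol of the word, a single unit
        in the column of the corresponding letter of the alphabet."""
        N = np.zeros((len(word), len(ALPHABET)), dtype=int)
        for i, ch in enumerate(word):
            N[i, ALPHABET.index(ch)] = 1
        return N

    N1 = naive_matrix("bacadaab")
    N2 = naive_matrix("abdacaba")

    # N^T N gives the statistics (a diagonal matrix of symbol counts): identical.
    print(np.array_equal(N1.T @ N1, N2.T @ N2))      # True  -- the same orbit
    # N N^T gives the pattern (which positions carry the same symbol): different.
    print(np.array_equal(N1 @ N1.T, N2 @ N2.T))      # False -- different words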

The statistics of N is known in combinatorics as the distribution of m distinguishable things into n distinguishable boxes and is determined by the sum of the products of two polynomial coefficients (9).

\sum (n!/\prod_k n_k!)\,(m!/\prod_k (m_k!)^{n_k}) = n^m                /4/

The sum is made over all orbits fulfilling /3/.

The statistics of P_n N^T N P_n is known as the distribution of m indistinguishable things into n distinguishable boxes, and it is given by the sum of the n-polynomial coefficients of all orbits

\sum (n!/\prod_k n_k!) = (m+n-1)!/[m!\,(n-1)!]                /5/
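Both identities /4/ and /5/ can be verified numerically for small m and n; the following Python sketch (my own check, reusing the partition generator sketched above) does so for m = n = 7:

    from math import comb, factorial, prod
    from collections import Counter

    def orbits(m, n):
        """All partitions of m into at most n parts, padded with zeros to length n."""
        def parts(rest, largest):
            if rest == 0:
                yield ()
                return
            for first in range(min(rest, largest), 0, -1):
                for tail in parts(rest - first, first):
                    yield (first,) + tail
        return [p + (0,) * (n - len(p)) for p in parts(m, m) if len(p) <= n]

    def n_coefficient(orbit):
        """n!/prod(n_k!): the number of points on the orbit, eq. /2/."""
        n_k = Counter(orbit)
        return factorial(len(orbit)) // prod(factorial(c) for c in n_k.values())

    def m_coefficient(orbit):
        """m!/prod((m_k!)^n_k): the number of strings with the occupancies of the orbit."""
        return factorial(sum(orbit)) // prod(factorial(part) for part in orbit)

    m, n = 7, 7
    print(sum(n_coefficient(o) * m_coefficient(o) for o in orbits(m, n)) == n ** m)   # /4/: True
    print(sum(n_coefficient(o) for o in orbits(m, n)) == comb(m + n - 1, m))          # /5/: True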

Now we see the difference between the two entropies. The n-polynomial coefficient was used by Boltzmann; its logarithmic measure gives the H_n function. Similarly, the m-polynomial coefficient gives the H_m function, which corresponds to the Shannon entropy measure with the binary logarithm. In every case H_m is different from H_n and cannot replace it. Thus all attempts to replace the Boltzmann H_n function with entropies derived from the theory of information (13, 14) were based on an incomplete analysis.

The formalism of the symmetry of the quadratic forms also shows that the dispute about the supposed indistinguishability of microparticles was based on misunderstandings. According to classical combinatorics, the boxes (vectors) in N^T N are distinguishable and we can map each case onto the plane simplex and count them. But according to quantum physics, if a system can be permuted, its particles should be indistinguishable. Actually they are only equivalent.

It could be argued that there is a difference between the polynomial coefficients and equation /1/, but simple calculations show how fast both results converge, even for small n and m.
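A small calculation of this kind might look as follows (my own Python sketch): keeping the proportions of the orbit 3211000 and letting the system grow, (1/n) ln(n!/prod n_k!) approaches -sum p_k ln p_k.

    from math import factorial, log, prod

    def boltzmann_per_particle(counts):
        """(1/n) * ln(n!/prod(n_k!)) for the given multiplicities n_k."""
        n = sum(counts)
        return log(factorial(n) // prod(factorial(c) for c in counts)) / n

    def shannon(counts):
        """-sum p_k ln p_k for the same multiplicities."""
        n = sum(counts)
        return -sum((c / n) * log(c / n) for c in counts)

    # Keep the proportions 3 : 2 : 1 : 1 and let the system grow.
    for scale in (1, 10, 100):
        counts = [3 * scale, 2 * scale, scale, scale]
        print(sum(counts), round(boltzmann_per_particle(counts), 4), round(shannon(counts), 4))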

C. The mixing entropy

Boltzmann and Shannon both used the ergodic hypothesis; they supposed that their elements are well mixed. But it is impossible, using either measure H_n or H_m, to measure the mixing entropy which exists in physics.

If we have two binary strings large enough

1 1 1 1 1 1 0 0 0 0 0 0 /a/

1 0 1 0 0 1 0 0 1 0 1 1 /b/

to form a book with two volumes, then we can distinguish /a/ and /b/ macroscopically, just as we distinguish between two states of a system consisting of two compartments, one full of molecules and the other empty in one state, and mixed in the other. But both strings lie in the m-dimensional vector space on the same orbit, which crosses the unit cube, or they lead to the same point in the 2-dimensional space (12); that is, both strings have identical entropies H_n and H_m.

We can interpret them in two ways. Let us suppose that m is the volume of the system, the number of cells which can be occupied by a molecule (then the value is 1) or can be empty (then 0). The indexing of the cells can be microscopically observable; that is, /a/ is a compressed gas system and /b/ is the expanded system. Or 0 and 1 are just two kinds of molecules and we follow their mixing. The entropy H is just the logarithmic measure of the symmetry index given by the polynomial coefficient. This index is the same for all points on the orbit, and it does not express the entropy of mixing, which should be connected with the position on the orbit.

This can be shown on a simple example. Let us suppose that we have two isolated ideal gas systems A and B, with numbers of molecules n_A and n_B and entropies H_A and H_B, and further that the system A is hotter than B and, moreover, that both distributions are "iced": there is no exchange of energy between particles. The entropy of the system formed from the two isolated systems is H_{A+B} = H_A + H_B.

If we join the two systems by removing the separating wall, the number of particles in the system will be n = n_A + n_B and its entropy

H_{AB} = H_A + H_B + \ln[n!/n_A!\,(n-n_A)!]

This binomial coefficient is just the number of all permutations of the n_A particles with the n_B particles without regard to their energy; it corresponds to the changes due to diffusion, there being no collisions of particles. The term \ln[n!/n_A!\,(n-n_A)!] seems to be the mixing entropy we looked for, but it is not, because it characterizes only the new orbit and not the position of the system on that orbit.
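Numerically the term is easy to evaluate; for instance (a short Python sketch of my own, with arbitrarily chosen n_A and n_B):

    from math import comb, log

    n_A, n_B = 7, 5
    n = n_A + n_B
    # ln(n!/(n_A!(n - n_A)!)): the term added on joining the two iced systems.
    print(log(comb(n, n_A)))      # ln 792, about 6.67; it labels the new orbit, not the position on it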

When we postulated the existence of the two subsystems A and B, we introduced an indexing into the n-dimensional vector space. Because removing the separating wall has no effect on the physical space, we cannot change the indexing of the topological space, and the indices in both original parts must remain macroscopically distinguishable.

Boltzmann discussed the effects of collisions on the distribution of energy only and did not analyze the effect of the positions of particles on the equilibrium. He introduced the ergodic hypothesis and neglected the possibility that the entropy can depend not only on the partition, but also on the ordering of the vector-particles on the partition orbit. If the points of the partition orbits are not macroscopically equivalent, then the Boltzmann formula does not include any term for the mixing entropy.

Even the Shannon information entropy H_m measures the possibilities of m only. If the frequencies of the symbols are constant, then the H_m value remains constant for all permutations of the symbols. If there are two physical processes affecting the entropy, the exchange of energy between particles by their collisions and the change of their positions due to diffusion, we need also two different mathematical measures isomorphic to the two operations. Essentially there are two possibilities. We can split the partition orbits into suborbits, as e.g. the partition 1^n, corresponding to the group of cyclic permutations S_n, can be split into n subgroups according to their cyclic structure.

But it is doubtful whether this gives the right result; it gives estimates that are too low. By splitting the system into the two subsystems A and B, we introduced the binomial distribution. The orderings in its string are measured by the negative binomial distribution, which counts the lengths of sequences of one kind. This distribution is again a polynomial distribution, whose probability is measured by the polynomial coefficient. It means that the mixing entropy is H again; only the measured values are not the values m_k but the distances between vectors of one kind.

Using the cyclic indexing (n+1) = 1, we get for /a/ and /b/

111117 /a/

233211 /b/

We obtain partitions again, and we can use both measures H_n and H_m once more.
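The partitions 111117 and 233211 are obtained by a simple count of the cyclic gaps between the units (a Python sketch of my own):

    def cyclic_gaps(string, symbol="1"):
        """Cyclic distances between consecutive occurrences of one symbol,
        with the indexing closed by (n+1) = 1."""
        n = len(string)
        positions = [i for i, ch in enumerate(string) if ch == symbol]
        return [(positions[(k + 1) % len(positions)] - positions[k]) % n or n
                for k in range(len(positions))]

    a = "111111000000"
    b = "101001001011"
    print(cyclic_gaps(a))     # [1, 1, 1, 1, 1, 7]  -> the partition 111117 of /a/
    print(cyclic_gaps(b))     # [2, 3, 3, 2, 1, 1]  -> the partition 233211 of /b/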

For usual texts the problem is somewhat more complicated: we have many symbols, and instead of the binomial distribution we deal with the polynomial distribution. But the problem can be simplified, since the polynomial distribution is a set of binomial distributions. For example, the string

abcdabaa

can be written as

a 0 0 0 a 0 a a

0 b 0 0 0 b 0 0

0 0 c 0 0 0 0 0

0 0 0 d 0 0 0 0

We can now calculate four negative binomial distributions. Alternatively, we can use directly the naive matrix N, which is a binary vector in mn-dimensional space, and apply the negative binomial distribution to it.
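For the string abcdabaa the four gap distributions come out as follows (the same kind of Python sketch as above; the symbol-wise split corresponds to the binary rows written above):

    def cyclic_gaps(string, symbol):
        """Cyclic distances between consecutive occurrences of one symbol."""
        n = len(string)
        positions = [i for i, ch in enumerate(string) if ch == symbol]
        return [(positions[(k + 1) % len(positions)] - positions[k]) % n or n
                for k in range(len(positions))]

    word = "abcdabaa"
    for symbol in "abcd":
        print(symbol, cyclic_gaps(word, symbol))
    # a [4, 2, 1, 1]   b [4, 4]   c [8]   d [8]   -- each list sums to m = 8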

Using this method for the physical world, the mixing entropy measures the distances between molecules of the same kind (or having the same energy), that is, how they are distributed inside their matrix.

D. Symmetry of molecules and information indices

Until now we have worked only with symbols or with ideal molecules without any structure. But reality is more complicated. If we have n objects and m relations between them, we need for their description the incidence matrix of a graph, in whose rows at least two unit symbols must appear.

The entropic measures /1/ were introduced on graphs by Rashewsky (15), Mowshowitz (16) and others. They are known as topological information indices (17). They measure different graph invariants. Using the naive matrices in the (m+n)-dimensional dual space, it was possible to show that all of them are different from the entropies defined by Boltzmann and Shannon.

The matrix

( 0     N )
( N^T   0 )

is the adjacency matrix A of a special kind of graphs, the star forests. On them the difference between H_n, H_m and the other topological information indices is obvious (11). Topological information indices measure symmetries of graphs, which are given by the number of possible different configurations of a graph with labeled and unlabeled vertices.
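For the word abcdabaa, for example, the block matrix can be assembled explicitly (a NumPy sketch of my own; naive_matrix is an assumed name), and it is the adjacency matrix of a star forest whose star centres are the letters a, b, c, d:

    import numpy as np

    ALPHABET = "abcd"

    def naive_matrix(word):
        """Naive matrix N of a word over the alphabet."""
        N = np.zeros((len(word), len(ALPHABET)), dtype=int)
        for i, ch in enumerate(word):
            N[i, ALPHABET.index(ch)] = 1
        return N

    N = naive_matrix("abcdabaa")
    m, n = N.shape
    # Adjacency matrix of the star forest in the (m + n)-dimensional dual space:
    # row vertices are symbol occurrences, column vertices are letters of the alphabet.
    A = np.block([[np.zeros((m, m), dtype=int), N],
                  [N.T, np.zeros((n, n), dtype=int)]])
    print(A.shape, np.array_equal(A, A.T))       # (12, 12) True -- a symmetric 0/1 matrix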

This symmetry was connected with the physical entropy of molecules by Gordon and Temple (18). We thus have a fourth set of entropic measures H connected with entropy, and we still do not know what the entropy S really is.

E. The maximal entropy

Boltzmann found that the maximum of H_n was achieved if the energy distribution corresponded to the Maxwell distribution. It was then exponential, and the distribution of velocities was normal. The proof was based on the use of Lagrange multipliers.

But Shannon found the maximal H_m straightforwardly: H_m is maximal if all p_k are equal. The difference between the maximal H_m and the observed information entropy was called the redundancy. All observed information distributions have rather high redundancy; they are not optimal for communication.

This fact is easily explained. If all symbols have the same frequency in the text, there exists the maximal number of possible strings obtainable by mere permutations. But the given orbit contracts to one point and its H_n = 0. If the frequencies of the symbols are different, then different combinations of symbols give more different messages; H_m may be lower, but the sum H_n + H_m is higher. Natural languages do not maximize H_m but the sum of the entropies, and this explains the observed redundancy easily.

If we think in terms of words (molecules of information formed from their atoms, the symbols), we easily distinguish the theme spoken about if not all words occur uniformly. If we use Shannon's direct approach for H_n, we find that H_n should be maximal if each n_k = 1. It means that the maximum of the function H_n is not achieved at the Maxwell distribution but at the linear distribution. Boltzmann's proof was based on a mathematical manoeuvre, but why does nature accept it?

The condition n_k = 1 means that each particle should be in its own energy state, which is possible only if the total number of quanta is greater than the sum of the arithmetic series:

m > n(n+1)/2.

The mean energy m_j of one particle should then be greater than half the number of particles in the system:

m_j \ge (n+1)/2.

Comparing the Boltzmann constant k in the form k = 2.08 × 10^10 h/grad, where h is the Planck constant, with the Avogadro number N = 6.023 × 10^23, we see that the linear distribution could be achieved in the normal range of temperatures only for very small systems with fewer than 10^-13 mol of particles, or for molar-size systems at temperatures higher than 10^13 K, supposing that no other entropy were effective.

F. Perspectives

The elementary analysis of the mathematical entropy measures H showed that they can be explained consistently on an elementary level. They are connected with the notion of symmetry. Its elements seem trivial, but complications grow quickly in wreath products. It is a long way from a string of symbols to the superstrings which appear in contemporary physics. The physical phase space is not as plain as the ideal Euclidean space, and it will be a long time before we decipher all the letters of the alphabet of the strange language in which Nature is written.

Literature Cited

1. Peres A., Phys. Rev. Lett. 1989, 63, 1114.

2. Weinberg S., Phys. Rev. Lett. 1989, 63, 1115.

3. Bowen L.H., J. Chem. Educ. 1988, 65, 50.

4. Pojman, J., J. Chem. Educ. 1990, 67, 200.

5. Clugston M.J., J. Chem. Educ. 1990, 67, 203.

6. Swanson R. M., J. Chem. Educ. 1990, 67, 206.

7. Boltzmann L., Wien. Ber. 1877, 76, 373.

8. Shannon C., Bell System Techn. J. 1948, 27, 623.

9. Kunz M., Physics Letters A 1989, 135, 421.

10. Shaw D., Davis C. H., J. Am. Soc. Inform. Sci. 1983, 34, 67.

11. Kunz M., Coll. Czech. Chem. Commun. 1986, 51, 1856.

12. Kunz M., Information Processing and Management 1984, 20, 519.

13. Jaynes E.T., Phys. Rev. 1957, 106, 620.

14. Brillouin L., Science and Information Theory, Academic Press, New York, 1956.

15. Rashewsky N., Bull. Math. Biophys. 1955, 17, 229.

16. Mowshowitz A., Bull. Math. Biophys. 1968, 30, 175, 225, 387, 533.

17. Balaban A.T., Motoc I., Bonchev D., Mekenyan O., Topics Curr. Chem. 1983, 114, 21.

18. Gordon M., Temple W.B., J. Chem. Soc. 1970, 729.