Lila M. Gatlin, Information Theory and the Living System, 1972

My old notebooks have many references to this book, now out of print, but my memory of it has all but disappeared. I obtained a copy through the Philadelphia Free Library.

The key chapter is the second chapter, Entropy Is the Measure. She gives a quick definition ("Information: the capacity to store and transmit meaning or knowledge") and then asks how we calculate this capacity. Entropy, she says, turns out to be the measure of this capacity: it measures the degree of randomness of a system. In a table, she lists "The Elements of the Entropy Concept":

| Higher Entropy | Lower Entropy |
| --- | --- |
| Random | Nonrandom |
| Disorganized | Organized |
| Disordered | Ordered |
| Mixed | Separated |
| Equiprobable events | D1 (divergence from equiprobability) |
| Independent events | D2 (divergence from independence) |
| Configurational variety | Restricted arrangements |
| Freedom of choice | Constraint |
| Uncertainty | Reliability |
| Higher error probability | Fidelity |
| Potential information | Stored information |

Generally, this boils down to a "fundamental, quantitative principle": The maximum entropy state is characterized by equiprobable, independent events.
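
To see the principle in numbers, here is a minimal Python sketch of my own (not from the book), using the Shannon entropy formula H = -Σ p(i) log2 p(i) that Gatlin derives later in the chapter, with K = 1:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Divergence from equiprobability (D1): a skewed coin has lower entropy than a fair one.
print(entropy([0.5, 0.5]))  # 1.0 bit  (equiprobable: the maximum for two outcomes)
print(entropy([0.9, 0.1]))  # ~0.47 bits

# Divergence from independence (D2): correlated pairs have lower joint entropy.
independent_pairs = [0.25, 0.25, 0.25, 0.25]  # p(xy) = p(x) * p(y) for two fair coins
correlated_pairs  = [0.40, 0.10, 0.10, 0.40]  # same fair marginals, but the coins tend to agree
print(entropy(independent_pairs))  # 2.0 bits (the maximum for four outcomes)
print(entropy(correlated_pairs))   # ~1.72 bits
```

The equiprobable, independent cases sit at the maxima; any divergence from either condition lowers the entropy, which is exactly the D1/D2 split in the table above.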

Gatlin then delivers a tour de force of scientific writing. In a sequence of 15 formulae, she goes from a simple intuitive definition of Entropy to an advanced mathematical formulation, and, amazingly, with some work, I am able to understand each step. Here is a shortened version of the logical chain, each formula followed by her explanation:

Formula: S = KW
Explanation: "where S is Entropy, W is the thermodynamic probability, and K is an arbitrary constant. This definition is not yet complete or even correct." In other words, the Entropy (capacity to store information) is greater in a 100-page book than a 1-page book.

Formula: S(a) + S(b) = S(ab)
Explanation: "It is intuitively reasonable that entropy should [also] have the additive property, that is, the entropy of system A plus the entropy of system B should be equal to the entropy of the composite system AB."

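This is also why the first definition is "not yet complete or even correct." A quick check of my own, assuming the arrangements of two independent systems multiply, so that W(ab) = W(a)·W(b): with S = KW, additivity fails.

```latex
\[
S(ab) = K\,W(ab) = K\,W(a)\,W(b) \neq K\,W(a) + K\,W(b) = S(a) + S(b)
\]
```
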
Formulae: S(a) = K log W(a); S(b) = K log W(b); S(ab) = K log W(ab)
Explanation: "It is not immediately obvious how this [making Entropy additive] can be done.... Boltzmann solved this problem in the late eighteen hundreds. When numbers expressed in the same base are multiplicative, their exponents are additive."

Formulae: p(a) = 1/W(a), or W(a) = 1/p(a); S = K log 1/p(a); S = -K log p(a)
Explanation: Here comes the contribution of Claude Shannon's study of communication channels. Shannon introduces probabilities (p) in place of the thermodynamic probability (W); when all W(a) elementary arrangements are equally likely, each one has probability p(a) = 1/W(a). The last step took me a long time to figure out: K log 1/p(a) = -K log p(a), because log(1/p) = log 1 - log p and the log of 1 is 0.

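A concrete instance of this step (my own numbers, taking K = 1 and base-2 logarithms): a system with W = 8 equally likely arrangements.

```latex
\begin{align*}
p(a) &= \frac{1}{W(a)} = \frac{1}{8} \\
S    &= -K \log_2 p(a) = -K \log_2 \tfrac{1}{8} = 3K
\end{align*}
```

With K = 1 that is 3 bits: the three yes-or-no choices needed to single out one of the eight arrangements.
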
Formula: H = -K Σ p(i) log p(i), summed over all outcomes i
Explanation: H is new. It is the expected value of a numerical-valued random phenomenon, "which is the sum over all possible outcomes of the probability of each individual outcome multiplied by the numerical value of that individual outcome." Assuming the logarithm is to base 2, the unit is bits. "It expresses the entropy of a system in terms of probabilities and may [be] used even when all the microstates or elementary arrangements of the system are not equiprobable."
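
A worked example of my own (K = 1, log base 2), for a three-outcome system with unequal probabilities 1/2, 1/4, 1/4, showing that H still applies when the arrangements are not equiprobable:

```latex
\begin{align*}
H &= -\left( \tfrac{1}{2}\log_2\tfrac{1}{2}
           + \tfrac{1}{4}\log_2\tfrac{1}{4}
           + \tfrac{1}{4}\log_2\tfrac{1}{4} \right) \\
  &= \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{4}(2)
   = 1.5 \text{ bits}
\end{align*}
```

For comparison, three equiprobable outcomes would give log2 3 ≈ 1.58 bits, the maximum; the skewed distribution stores less, just as the principle above predicts.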

Gatlin explains: "This is a remarkable accomplishment; it takes the concept of entropy out of the restricted thermodynamic settings in which it arose historically and lifts it to the higher domain of general probability theory."

I'm actually not sure I read much beyond that second chapter, but it influenced me a lot 20 years ago.