’H (p (x, y)) ’ p (x, y) log p (x) ’

= p (x, y) log p (y)

xy xy

’H (p (x, y)) + H (p (x)) + H (p (y)) ,

=

where the second equality of (1.3) was used. The result follows directly from equation (1.4).

5. It is easily derived from subadditivity (property 4) and the de¬nition of conditional entropy (1.8).

6. First note that by the de¬nition of the joint entropy (1.7) and some algebra H (X, Y, Z) + H (Y ) ’

H (X, Y )’H (Y, Z) = x,y,z p (x, y, z) log p(x,y)p(y,z) . Then using the fact that log x ¤ x’1 for all positive

p(x,y,z)p(y) ln 2

x and equality achieved if and only if x = 1, the following can be concluded

p (x, y) p (y, z) p (x, y, z) p (x, y) p (y, z)

¤ ’1

p (x, y, z) log

p (x, y, z) p (y) ln 2 p (x, y, z) p (y)

xyz xyz

1 p (y) p (y) 1

’

= = 0,

ln 2 p (y) ln 2

y

p(x,y)p(y,z)

= 1 ” p (z|y) = p (z|x, y) ” X ’ Y ’ Z is a Markov chain, q.e.d.

with equality if and only if p(x,y,z)p(y)

7. From strong subadditivity (property 6) it straightforward that H (X, Y, Z) ’ H (Y, Z) ¤ H (X, Y ) ’ H (Y )

and from the de¬nition of conditional entropy (1.26c) the result is obvious.

8. First the result is proven for n = 2 using the de¬nition of conditional entropy (1.26c)

H (X1 , X2 |Y ) = H (X1 , X2 , Y ) ’ H (Y ) = H (X1 , X2, Y ) ’ H (X1 , Y ) + H (X1 , Y ) ’ H (Y )

H (X2 |Y, X1) + H (X1 |Y ) .

=

Now induction is going to be used to prove it for every n. Assume that the result holds for n, then

using the one for n = 2, H (X1 , . . . , Xn+1|Y ) = H (X2 , . . . , Xn+1|Y, X1 ) + H (X1 |Y ) , and applying the

inductive hypothesis to the ¬rst term on the right hand side gives

n+1 n+1

H (X1 , . . . , Xn+1|Y ) = H (Xi , . . . , Xn+1|Y, Xi’1) + H (X1 |Y ) = H (Xi , . . . , Xn+1|Y, Xi’1) ,

i=2 i=1

q.e.d.

4

9. The concavity of Shannon entropy, will be deduced by the concavity of von Neumann™s entropy in 1.38.

However here is going to be demonstrated that binary entropy (Hbin (p) H (p, 1 ’ p)) is strictly concave,

Hbin (px1 + (1 ’ p) x2) ≥ pHbin (x1) + (1 ’ p) Hbin (x2) ,

where 0 ¤ p, x1, x2 ¤ 1 and equality holds for the trivial cases x1 = x2 , or p = 0, or p = 1. This is easily

proved by using the fact that the logarithmic function is increasing and ’p (1 ’ x) ≥ ’ (1 ’ px) , hence

Hbin (px1 + (1 ’ p) x2 ) ’ (px1 + (1 ’ p) x2) log (px1 + (1 ’ p) x2)

’ [1 ’ (px1 + (1 ’ p) x2)] log [1 ’ (px1 + (1 ’ p) x2 )]

≥ ’px1 log x1 ’ (1 ’ p) x2 log x2 ’ p (1 ’ x1) log (1 ’ px1)

’ (1 ’ p) (1 ’ x2) log [1 ’ (1 ’ p) x2]

pHbin (x1) + (1 ’ p) Hbin (x2 ) .

=

The strictness of concave property is seen by noting that only in the trivial cases inequalities such as

log px1 ¤ log (px1 + (1 ’ p) x2) could be equalities. Finally concerning the binary entropy it is obvious that

d2

because dp Hbin (p) = ’ log p’1+log (1 ’ p)+1 = 0 ” p = 1 , and for p = 0, 1, dp2 Hbin (p) = ’ 1 ’ 1’p < 0,

d 1

2 p

1

the maximum is reached at p = 2 .

Some additional notes on the properties of Shannon entropy

Concluding the properties of Shannon entropy, it should be noted that the mutual information is not always

subadditive or superadditive. One counterexample for the ¬rst is the case where X and Y are independent

identically distributed random variables taking the values 0 or 1 with half probability. Let Z = X • Y, where

• the modulo 2 addition, then H (X, Y : Z) = 1 and further calculating H (X : Z) + H (Y : Z) = 0, that is

H (X, Y : Z) H (X : Z) + H (Y : Z) .

The counterexample concerning the second case is the case of a random variable X 1 taking values 0 or 1

with half probabilities and X2 Y1 Y2 X1 . Then H (X1 : Y1 ) + H (X2 : Y2 ) = 2 and in addition to this

H (X1 , X2 : Y1, Y2 ) = 1, which means that

H (X1 : Y1 ) + H (X2 : Y2) H (X1 , X2 : Y1 , Y2) .

1.2 Quantum physics and information theory

Quantum theory is another very important area of physics, which is used to describe the elementary particles that

make up our world. The laws and the intuition of quantum theory are totally di¬erent from the classical case.

To be more speci¬c quantum theory is considered as counter intuitive, or quoting Richard Feynman, ”nobody

really understands quantum mechanics”. However quantum physics o¬ers new phenomena and properties which

can change peoples view for information. These properties are going to be investigated in this section.

1.2.1 Basic features of quantum mechanics relevant to information theory

Mathematically quantum entities are represented by Hilbert space vectors |ψ , usually normalized ψ|ψ = 1.

Quantum systems evolve unitarily, that is, if a system is initially in a state |ψ 1 , it becomes later another state

|ψ 2 after a unitary operation

U |ψ 1 = |ψ 2 . (1.10)

Unitary operations are reversible, since U U † = 1 and previous states can be reconstructed by |ψ 1 = U † |ψ2 .

What is important about such operations is that because ψ 2 |ψ2 = ψ1 | U † U |ψ1 = ψ1 | I |ψ 1 = 1, normal-

ization, which soon will be interpreted as probability, is conserved. The measurement of the properties of these

objects is described by a collection of operators {Mm } . The quantum object will found to be in the m-th state

with probability

†

p (m) = ψ| Mm Mm |ψ , (1.11)

5

and after this measurement it is going to be in de¬nite state, possibly di¬erent from the starting one

Mm

|ψ . (1.12)

†

|ψ

ψ| Mm Mm

Of course m should be viewed as the measurement outcome, hence information extracted from a physical system.

This way information content can be assigned to each state of a quantum object. As a practical example, in

each energy state of an atom one can map four numbers or in each polarization state of a photon one can

map two numbers say 0 and 1. The last case is similar to bits of classical information theory, but because

a photon is a quantum entity they are named qubits (quantum bits). It should be stressed here that the

†

measurement operators should satisfy the completeness relation m Mm Mm = I which results m p (m) =

† †

m ψ| Mm Mm |ψ = 1 as instructed by probability theory. However this implies that ψ| M m Mm |ψ = 1 and

looking at equation (1.12) one understands that measurement is an irreversible operation.

What is very interesting about quantum entities is that they can either be in a de¬nite state or in a

superposition of states! Mathematically this is written

|ψ = cs |s .

s

Using the language of quantum theory |ψ is in state |s with probability c— cs , and because of normalization

s

the total probability of measuring |ψ is

c— cs = 1.

ψ|ψ = (1.13)

s

s

States |s are usually orthonormal to each other, hence

ψ|s = cs. (1.14)

Although being simultaneously in many states sounds weird, quantum information can be very powerful in

computing. Suppose some quantum computer takes as input quantum objects, which are in a superposition

of multiple states, then the output is going to be quantum objects which of course are going to be in multiple

states too. This way one can have many calculations done only by one computational step! However careful

extraction of results is needed [13, 20, 21, 22, 23], because quantum measurement has as outcome only one

answer from the superposition of multiple states, as equation (1.12) instructs, and further information is lost.

Then one can have incredible results, like for example calculate discrete logarithms and factorize numbers in

√

polynomial time [20, 21], or search an unsorted database of N objects with only N iterations [22, 23]!

In contrast to classical information which under perfect conditions can be known up to a desired accuracy,

quantum information is sometimes ambiguous. This is because one cannot distinguish non-orthogonal states

reliably. Assuming for a while that such a distinction is possible for two non-orthogonal states |ψ 1 and |ψ 2

and a collection of measurement operators {Mm } . Then according to this assumption some of the measuring

operators give reliable information whether the measured quantity is |ψ 1 or |ψ2 , and collecting them together

the following two distinguishing POVM elements can be de¬ned

†

Ei Mm Mm , i = 1, 2.

Mm measuring i

The assumption that these states can be reliably distinguished, is expressed mathematically

ψi | Ei |ψi = 1, i = 1, 2. (1.15)

Since i=1,2 Ei = I it follows that i=1,2 ψ 1| Ei |ψ1 = 1. Because E1 operator reliable measures the ¬rst

state, then ψ1 | E1 |ψ1 = 1, hence the other term must be

ψ1 | E2 |ψ1 = 0. (1.16)

Suppose |ψ 2 is decomposed in |ψ 1 and and orthogonal state to |ψ 1 , say |ψ ⊥ ; then |ψ 2 = ± |ψ1 + β |ψ ⊥ .

2 2

Of course |±| + |β| = 1 and |β| < 1 because |ψ 1 and |ψ 2 are not orthogonal. Using the last decomposition

2 2

and (1.16) ψ 2 | E2 |ψ2 = |β| ψ⊥ | E2 |ψ⊥ ¤ |β| < 1 which is in contradiction with (1.15).

6

In what concerns the results of the last paragraph it should be additionally mentioned that information gain

implies disturbance. Let |ψ and |φ be non-orthogonal states, without loss of generality assume that a unitary

process is used to obtain information with the aid of an ancilla system |u . Assuming that such a process does

not disturb the system, then in both cases, one obtains

|ψ — |u ’ |ψ — |v ,

|φ — |u ’ |φ — |v .

Then one would like |v and |v to be di¬erent, in order to acquire information about the states. However since

the inner products are preserved under unitary transformations, v|v ψ|φ = u|u ψ|φ ’ v|v = u|u = 1,

and hence are identical. Thus distinguishing between |ψ and |φ must inevitably disturb at least one of these

states.

However at least theoretically there is always a way of distinguishing orthonormal states. Suppose |i

|i i| plus the operator

are orthonormal, then it is straightforward to de¬ne the set of operators Mi

I ’ i=0 |i i|, which satisfy the completeness relation. Now if the state |i is prepared then p (i) =

M0

i| Mi† Mi |i = 1, thus they are reliably distinguished.

Another very surprising result is the prohibition of copying arbitrary quantum states. This is known as

no-cloning theorem [24, 25] and it can be very easily proven. Suppose it is possible to have a quantum photo-

copying machine, which will have as input a quantum white paper |w and a state to be copied. The quantum

photocopying machine should be realized by a unitary transformation U and if somebody tries to photocopy

two states |ψ and |φ , it should very naturally work as follows

U {|w — |ψ } = |ψ — |ψ ,

U {|w — |φ } = |φ — |φ .

2

Now taking the inner product of these relations ψ|φ = ψ|φ , thus ψ|φ = 0 or ψ|φ = 1, hence |ψ = |φ

or |ψ and |φ are orthogonal. This means that cloning is allowed only for orthogonal states! Thus at least

somebody can construct a device, by quantum circuits, to copy orthogonal states. For example if |ψ and |φ

are orthogonal then there exists a unitary transformation U such that U |ψ = |0 and U |φ = |1 . Then by

applying the FANOUT quantum gate, which maps the input to |00 if the input qubit was |0 and to |11 if the

it was |1 , and further applying U † — U † to the tensor product of outcoming qubits, then either the state |ψ

is will be copied and ¬nally get |ψψ , or |φ and get |φφ .

The basic information relevant features of quantum mechanics, analyzed in this subsection, are summarized

in Figure 1.2.

Basic features of quantum mechanics

1. Reversibility of quantum operations

2. Irreversibility of measurement

3. Probabilistic outcomes

4. Superposition of states

5. Distinguishability of orthogonal states by operators

6. Non-distinguishability of non-orthogonal states

7. Information gain implies disturbance

8. Non-cloning of non-orthogonal states

Figure 1.2: A summary of basic information relative features, of quantum mechanics.

1.2.2 The language of quantum information theory: quantum thermodynamics

As in the case of classical information theory thermodynamics should be studied for abstracting quantum

information. As it was already stated in the last paragraphs, quantum mechanics have some sort of built-in

probabilities. However this is not enough. The probabilistic features of quantum states are di¬erent from that

of classical thermodynamics. The di¬erence is demonstrated by taking two totally independent facts A and B.

7

classical

Then the probability of both of them occurring would be PAB = P (A) + P (B) . In contrast, in quantum

mechanics the wave functions should be added, and by calculating the inner product probabilities are obtained.

More analytically

quantum

( ψ A | + ψ B |) (|ψA + |ψB ) = ( ψA |ψA + ψ B |ψB ) + 2Re ψ — |ψB

PAB = (1.17)

A

in general

P (A) + P (B) + 2Re ψ — |ψB classical

= = PAB .

A

Moreover if somebody decides to encode some information with quantum objects then he is going to be interested

with probabilities of occurrence of the alphabet, exactly the same way it was done in section 1.1, and he would

never like to mess up with the probabilities already found in quantum entities. For this reason a quantum version

of thermodynamics is needed. In the sequel the name thermodynamical probabilities is used to distinguish the

statistical mixture of several quantum states, from the quantum probabilities occurring by observation of a

quantum state.

The basic tool of quantum thermodynamics is the density matrix and its simplest case is when the quantum

state occurs with thermodynamical probability p = 1. Then by (1.14) t|ψ ψ|s = c — cs and therefore

t

|ψ ψ| ,

ρ

is the natural matrix generalization of a quantum vector state. This was the de¬nition of a density matrix of a

pure state, in contradiction to mixture of states where several states occur with probabilities 0 < p < 1. Each

element of this matrix is

t|ψ ψ|s = c— cs.

ρts t

and by equation (1.13) where the normalization of vector was de¬ned, density matrix is correspondingly nor-

malized by

c— cs =

trρ = ρss = s|ψ ψ|s = 1. (1.18)

s

s s s

Moreover it is straightforward that unitary evolution (1.10) is described by

U |ψ ψ| U † = U ρU † , (1.19)

the probability for measuring the m-th state (1.11) is given by

† † † †

p (m) = ψ| Mm Mm |ψ = ψ| Mm |s s| Mm |ψ = ψ| Mm |s s| Mm |ψ = tr Mm ρMm , (1.20)

s s

and the state after measurement (1.12) is obtained by

†

Mm Mm 1 †

|ψ ψ| = Mm ρMm . (1.21)

†

† † tr Mm ρMm

|ψ |ψ

ψ| Mm Mm ψ| Mm Mm

Suppose now there is a collection of quantum states |ψ i occurring with thermodynamical probabilities pi ,

then the mixed density operator is simply de¬ned

pi |ψ i ψi | .

ρ (1.22)

i

This construction is precisely what was expected in the beginning of this subsection, because the thermody-

namical probabilities are real numbers and can be added, contrasting to the situation of quantum probabilities

for a state vector (1.17). The generalization of normalization (1.18), unitary evolution (1.19), probability of

measurement (1.20) and measurement (1.21) for the mixed density matrix are

pitr (|ψ i ψ i|) = 1,

trρ =

i

pi |ψi ψ i | = U ρU † ,

i

†

p (m) = tr Mm ρMm ,

1 †

Mm ρMm .

†

tr Mm ρMm

8

The above results are pretty obvious for the ¬rst, the second and the fourth, but some algebra of probabilities

is needed for the third (refer to [1, p.99] for details). The unitary evolution of the system can be generalized by

quantum operations

†

E (ρ) = Ei ρEi ,

i

†

where i Ei Ei = I. Quantum operations are sometimes referred in the literature [2] as superoperators.

Quantum thermodynamics, just described can be viewed as a transition from classical to quantum thermo-

dynamics. Suppose an ensemble of n particles in an equilibrium is given. For this ensemble assume that the

i-th particle is located at xi , has velocity vi and energy Ei (xi , vi) . Classical thermodynamics says that for the

i-th particle, there is a probability

1

e’βEi ,

pi = (1.23)

’βEj

je

1

to be in this state, where β = kB T , with kB Boltzmann™s constant and T the temperature. Now if the particles

where described by quantum mechanics, then if the i-th would have an eigenenergy Ei, given by the solution

of the Schr¨dinger equation H |ψ i = Ei |ψ i , where H is the Hamiltonian of the system. Now with the help of

o

equation (1.23) the density matrix as was de¬ned in (1.22) can be written as

1 1

|ψ i e’βEi ψi | = e’βH ,

|ψi pi ψi | =

ρ= ’βEj tre’βH

je

i i

which is a generalization of classical thermodynamical probability (1.23). This transition is illustrated in Figure

1.3.

Statistical behavior of classical particles:

Statistical behavior of quantum particles:

having velocity vi , at xi , and energy Ei (vi , xi) ,

at state |ψ i , given by H |ψ i = Ei |ψ i

e’βEi

with probability pi = e’βEi

’βEj

je with probability pi = tre’βH

x2

p2 , • p2, |ψ 2

“ v2

v

n

pn, ←’ •

pn, |ψ n

xn

Quantum

v3

p3, |ψ 3

•

p3 , Mechanics

x3

’’ ’ ’’

’’’’

v1

p1 , • p1 , |ψ1

x1

Figure 1.3: Transition from classical to quantum thermodynamics.

1.2.3 Von Neumann entropy and relevant measures of quantum information the-

ory

Willing to describe quantum information, one could use quantum version of entropy, and in order to justify its

mathematical de¬nition, recall how Shannon entropy was given in equation (1.2), and assume that a density

matrix ρ is diagonalized by ρ = x »x |x x| . Then naturally quantum entropy of ρ is de¬ned

’

S (ρ) »x log »x . (1.24)

x

Translating this into the mathematical formalism developed in the last subsection, the von Neumann entropy

is de¬ned

’tr (ρ log ρ) .

S (ρ) (1.25)