
Quantum Information Theory
and
Applications to Quantum Cryptography




By

Nikolaos P. Papadakos
B.Sc. in Mathematics, University of Athens, 2001




Supervisor
Professor Abbas Edalat




Individual Study Option,
on the First Term of the Academic year 2001-2002,
for the M.Sc. in Advanced Computing




Department of Computing
Imperial College of Science, Technology and Medicine
University of London, United Kingdom
Contents

List of Figures iii

Acknowledgements iv

About this Individual Study Option v

1 Physics and information theory 1
1.1 Classical physics and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Shannon entropy and relevant measures of classical information theory . . . . . . . . . . . 2
1.1.2 Basic properties of Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Quantum physics and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Basic features of quantum mechanics relevant to information theory . . . . . . . . . . . . 5
1.2.2 The language of quantum information theory: quantum thermodynamics . . . . . . . . . 7
1.2.3 Von Neumann entropy and relevant measures of quantum information theory . . . . . . . 9
1.2.4 A great discrepancy between classical and quantum information theory: quantum entan-
glement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.5 Basic properties of von Neumann entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.6 Measurement in quantum physics and information theory . . . . . . . . . . . . . . . . . . 14

2 Some important results of information theory 16
2.1 Accessible information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Accessible classical information: Fano's inequality . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Accessible quantum information: quantum Fano inequality and the Holevo bound . . . . 17
2.2 Data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Classical data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Quantum data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Real-life applications of information theory 20
3.1 Data compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.1 Shannon's noiseless channel coding theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.2 Schumacher's quantum noiseless channel coding theorem . . . . . . . . . . . . . . . . . . . 21
3.2 Information over noisy channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Error correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Classical information over noisy classical channels . . . . . . . . . . . . . . . . . . . . . . 24
3.2.3 Classical information over noisy quantum channels . . . . . . . . . . . . . . . . . . . . . . 25
3.2.4 Quantum information over noisy quantum channels . . . . . . . . . . . . . . . . . . . . . . 26

4 Practical quantum cryptography 27
4.1 Theoretical principles of quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.1 The BB84 quantum key distribution protocol . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1.2 Information reconciliation and privacy amplification . . . . . . . . . . . . . . . . . . . . . 29
4.1.3 Privacy and security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1.4 A provable secure version of the BB84 protocol . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 A commercial quantum cryptographic device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.1 Experimental quantum key distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Commercial quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


Summary 33

Appendices 35

A Omitted proofs 35
A.1 Special diagonal transformation of normal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 35
A.2 Projective measurements are equivalent to an addition of unitary operations . . . . . . . . . . . . 36

B Distance measures of information 37
B.1 Classical information distance: Hamming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
B.2 Quantum information distance: Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Bibliography 38




List of Figures

1.1 Relationships between different entropies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 A summary of basic information-related features of quantum mechanics. . . . . . . . . . . . . . . 7
1.3 Transition from classical to quantum thermodynamics. . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Quantum data compression. The compression operation C^n compresses a quantum source ρ stored
in n log d qubits into nS(ρ) qubits. The source is accurately recovered via the decompression
operation D^n. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 The noisy channel coding problem for classical messages. It is required that every one of the 2^{nR}
possible messages should be sent uncorrupted through the channel with high probability. . . . . . 25

4.1 Alice and Bob communicating with the fear of Eve stealing information. . . . . . . . . . . . . . . 27
4.2 The setup of BB84 quantum key distribution protocol. The quantum line is used for the distri-
bution of the key, while the classical line for discussing security topics and the transmission of
encrypted messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Typical system for quantum cryptography using polarization coding (LD: laser diode, BS: beam-
splitter, F: neutral density filter, PBS: polarizing beam splitter, λ/2: half waveplate, APD:
avalanche photodiode). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
S.1 Summary of classical and quantum information theory. . . . . . . . . . . . . . . . . . . . . . . . . 33
S.2 Summary of quantum cryptography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34




Acknowledgements

I am very pleased to thank my supervisor Professor Abbas Edalat, who with his important comments not only
helped me improve the presentation of this Individual Study Option, but, by suggesting that I solve all
64 exercises of the last two chapters of [1, chap.11,12], contributed dramatically to my understanding and
strengthened my self-confidence in the field. I am indebted to him for his effort in transforming the present
material into something more than just an Individual Study Option.




About this Individual Study Option

In this Individual Study Option the concepts of classical and quantum information theory are presented, with
some real-life applications, such as data compression, information transmission over noisy channels and quantum
cryptography.

Abstract
In each chapter the following topics are covered:

1. The relation of information theory with physics is studied and, by using the notion of Shannon's entropy,
measures of information are introduced. Then their basic properties are proven. Moreover, important
information-related features of quantum physics, such as the impossibility of distinguishing or cloning arbitrary
states, are studied. In order to obtain a quantum measure of information, an introduction to quantum
thermodynamics is given with a special focus on the explanation of the utility of the density matrix. After this,
von Neumann entropy and its related measures are defined. In this context a major discrepancy between
classical and quantum information theory is presented: quantum entanglement. The basic properties of
von Neumann entropy are proven and an information theoretic interpretation of quantum measurement
is given.
2. The amount of accessible information is bounded by Fano's inequality in the classical case, and by its
quantum analogue and Holevo's bound in the quantum case. In addition to this, classical and quantum
data processing are discussed.
3. Furthermore, for real-life applications, data compression is studied via Shannon's classical and Schumacher's
quantum noiseless channel coding theorems. Another application is the transmission of classical information
over noisy channels. For this, a summary of classical and quantum error correction is given, and then
Shannon's classical noisy channel and the Holevo-Schumacher-Westmoreland quantum noisy channel coding
theorems are studied. The present state of the transmission of quantum information over quantum channels
is summarized.
4. A practical application of the aforementioned is presented: quantum cryptography. In this context the
BB84, a quantum key distribution protocol, is demonstrated and its security is discussed. The current
experimental status of quantum key distribution is summarized and the possibility of a commercial device
realizing quantum cryptography is presented.

Special viewpoints emphasized
The above material is closely connected with the last two chapters of [1, chap.11,12], and error correction is a
summary of the results given in chapter 10 of [1]. From this book Figures 1.1, 3.1, 3.2 and S.1 were extracted.
The treatment of information theory was also much influenced by [2], and some results given therein, like for
example equation (3.1), were extended to (3.2). Of equal impact was [3] concerning error correction, and [3, 4]
concerning quantum cryptography. Note that Figure 4.2 was extracted from [4]. Of course some notions which
were vital to this Individual Study Option were taken from almost all the chapters of [1].
However, there were some topics that are not well explained or not sufficiently emphasized there. The most
important tool of quantum information theory, the density matrix, is misleadingly described in [1, p.98]: "the
density matrix [...] is mathematically equivalent to the state vector approach, but it provides a much more
convenient language for thinking about commonly encountered scenarios in quantum mechanics". The density
matrix is not equivalent to the state vector approach and is much more than just a convenient language. The
former is a tool for quantum thermodynamics, a field where it is impossible to use the latter; rephrasing, in
quantum thermodynamics the density matrix is the only possible language. In order to deepen the understanding
of this important tool, subsection 1.2.2 is devoted to it, and is close to the viewpoint of [5, p.295-307], a classic
textbook on quantum mechanics.
In this Individual Study Option some topics are presented from a different perspective. There is an attempt to
emphasize the discrepancies between classical and quantum information theory, the greatest of which is perhaps
quantum entanglement. A very special demonstration of this difference is given in subsection 1.2.4. It should
also be noted that a particular information theoretic meaning of measurement in physics is presented in
subsection 1.2.6.
Concerning the different views presented in this text, nothing could be more beneficial than the 64 exercises
of the last two chapters of [1, chap.11,12]. As an example, a delicate presentation of measures of information is
given on page 1, due to exercise 11.2 [1, p.501], and subsection 1.2.4 on quantum entanglement was inspired by
exercise 11.14 [1, p.514]. On some occasions special properties were proven in order to solve the exercises. Such
a property is the preservation of quantum entropy by unitary operations (property 2 of von Neumann entropy,
on page 11), which was needed to solve, for example, exercises 11.19 and 11.20 [1, p.517,518]. For these last two
exercises some special lemmas were proven in appendices A.1 and A.2. It should be noted that only about half
of the above mentioned exercises of [1, chap.11,12] are presented here.

Notation
Finally, concerning the mathematical notation involved, the symbol ≜ should be interpreted as "defined by", and the
symbol ≡ as "to be identified with".




Chapter 1

Physics and information theory

Since its invention, computer science [6] was considered a branch of mathematics, in contrast to information
theory [7, 8], which was viewed in terms of its physical realization; quoting Rolf Landauer [9], "information is
physical". The last decades changed the landscape, and both computers and information are now mostly approached
through their physical implementations [10, 11, 12, 13]. This view is not only more natural, but in the case of quantum
laws it gives very exciting results and sometimes an intriguing view of what information is or can be. Such an
understanding could never have been inspired by mathematics alone. Moreover, there is a possibility that the inverse
relation between physics and information also exists [14, 15]; quoting Steane [3], one could find new methods of
studying physics by "the ways that nature allows, and prevents, information to be expressed and manipulated,
rather than [the ways it allows] particles to move". Such a program is still in its infancy; however, one relevant
application is presented in subsection 1.2.6. In this chapter the well established information theory based on
classical physics is presented in section 1.1, and the corresponding up-to-date known results for quantum physics
are analyzed in section 1.2.


1.1 Classical physics and information theory
In classical physics all entities have definite properties which can be known up to a desired accuracy. This fact
suggests a simple pattern for storing and transmitting information: assign information content to each of the
properties a physical object can have. For example, storage can be realized by writing on paper, where information
lies in each letter, or on a magnetic disk, where information, in binary digits (bits), is represented by each of the
two spin states a magnetic dipole can have. Concerning transmission, speech is one example, where each
sound corresponds to a piece of information; a second example is an electronic signal on a wire, where each state
of the electric current is related to some piece of information. Unfortunately, in everyday life such simple patterns
are uneconomical and unreliable. This is because communication is realized by physical entities which are
imperfect, and hence can be influenced by environmental noise, resulting in information distortion.
Concerning the economical transmission of information, one can see that the naive pattern of information
assignment presented in the last paragraph is not always an optimal choice. This is because a message, in
the English language for example, contains symbols with different occurrence frequencies. Looking for example
at this text, one immediately notes that the occurrence probability of the letter a, p_a, is much greater than that
of the exclamation mark, p_!. According to the naive assignment, English language symbols are encoded into
codewords of identical length l, so the average space devoted to the letter a is l·p_a and to the exclamation mark
l·p_!, and since p_a > p_! a lot of space is wasted on the letter a. In order to show how an encoding scheme can be
economical, consider a four letter alphabet A, B, C, D with occurrence probabilities p_A = 3/4, p_B = 1/8,
p_C = p_D = 1/16, and the following assignment of bits: A → 0, B → 10, C → 110, and D → 111. A message of
n symbols, using this encoding, needs on average n (p_A + 2p_B + 3p_C + 3p_D) = n (3/4 + 2·1/8 + 3·1/16 + 3·1/16)
= 11n/8 bits instead of the 2n which would be needed if each letter were simply mapped to a two-bit codeword.
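To make the arithmetic of this example concrete, the following short Python sketch (added here purely as an
illustration, not part of the cited material; all names are chosen ad hoc) computes the average codeword length of
the variable-length code and compares it with the two-bit fixed-length encoding and with the Shannon entropy of
the source, defined in the next subsection, which is the theoretical lower bound.

    import math

    # Occurrence probabilities of the four-letter alphabet of the example above.
    p = {"A": 3/4, "B": 1/8, "C": 1/16, "D": 1/16}

    # Variable-length prefix code with codeword lengths 1, 2, 3, 3.
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}

    # Average number of bits per symbol for the variable-length code: 11/8 = 1.375.
    avg_len = sum(p[s] * len(code[s]) for s in p)

    # Shannon entropy of the source (see equation (1.2) below), roughly 1.19 bits.
    entropy = -sum(q * math.log2(q) for q in p.values())

    print(avg_len, entropy, 2.0)   # 1.375  ~1.186  2.0 (bits per symbol)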
The topics discussed above give rise to the most important information theoretic
question: what are the minimal resources needed to communicate reliably? An answer to this question can be
given by abstractly quantifying information in relation to the physical resources needed to carry it. Motivated
by the four letter alphabet example demonstrated above, probabilities are going to be used for such an
abstraction. One now defines a function H, quantifying a piece of information I, which should exhibit the
following reasonable properties:


1. H (I) is a function only of the probability of occurrence p of information I, thus H (I) ≡ H (p) .
2. H is a smooth function.
3. The resources needed for two independent pieces of information with individual probabilities p, q > 0 are the sum
of the resources needed for each alone, or in mathematical language H(pq) = H(p) + H(q) .

The second and third properties imply that H is a logarithmic function, and by setting q = 1 in the third it
is immediate to see that H(1) = 0. Hence H(p) = k log_a p, where k and a are constants to be determined (refer
to the comments after equations (1.2) and (1.5)). This means that the average amount of resources needed when
one of a mutually exclusive set of possibilities with probabilities p_1, . . . , p_n occurs is

H(p_1, . . . , p_n) = k Σ_i p_i log_a p_i .                                                  (1.1)

It should be noted that probability is not the only way of quantifying information [14, 15, 16, 17, 18, 19].

1.1.1 Shannon entropy and relevant measures of classical information theory
The function H found in (1.1) is known in physics as entropy, and measures the disorder of a specific statistical
system. Of course one interesting physical system is an n-bit computer memory, and if all the possible cases of
data entry are described by a random variable X with probabilities p_1, . . . , p_{2^n}, then the computer's memory
should have an entropy given by

H(X) ≡ H(p_1, . . . , p_{2^n}) ≜ −Σ_x p_x log p_x .                                          (1.2)

Here a modified version of (1.1) is used, with log ≡ log_2 and k ≜ −1, an assignment to be verified after equation
(1.5). Equation (1.2) is known in information theory as Shannon's entropy [7]. There are two complementary
ways of understanding Shannon's entropy: it can be considered either as a measure of the uncertainty before
learning the value of a random variable, or as a measure of the information gained after learning it.
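As a small illustration (a sketch added for this presentation, not taken from the cited sources), definition (1.2)
translates directly into a few lines of Python; the example values anticipate the bounds (1.5) and (1.6) derived below.

    import math

    def shannon_entropy(probs):
        """Shannon entropy H(X) = -sum_x p_x log2 p_x of a probability distribution, in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([1.0]))        # 0.0: a definite state carries no uncertainty (cf. (1.6))
    print(shannon_entropy([1/8] * 8))    # 3.0: a uniform 3-bit memory attains the maximum (cf. (1.5))
    print(shannon_entropy([0.9, 0.1]))   # ~0.469: a biased bit lies strictly in between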
The Shannon entropy gives rise to other important measures of information. One such is the relative entropy,
which is defined by

H(p_x||q_x) ≜ Σ_x p_x log (p_x / q_x) = −H(X) − Σ_x p_x log q_x ,                            (1.3)

and is a measure of distance between two probability distributions. This is because it can be proven that

H(p_x||q_x) ≥ 0 ,                                                                            (1.4)
H(p_x||q_x) = 0 ⇔ ∀x : p_x = q_x .

Of course it is not a metric because, as one can check, H(p_x||q_x) = H(q_x||p_x) does not always hold. The relative
entropy is often useful not in itself, but because it helps in deriving important results. One such result is obtained
using the last equality in (1.3) together with (1.4): in a memory of n bits,

H(X) ≤ log 2^n = n ,                                                                         (1.5)
H(X) = log 2^n = n ⇔ ∀i : p_i = 1/2^n ,
which justifies the selection of k = −1 in (1.2), because in the optimal case and in the absence of noise, the maximum
physical resources needed to transmit or to store an n-bit word should not exceed n. One can also see from
(1.2) that

H(X) ≥ 0 ,                                                                                   (1.6)
H(X) = 0 ⇔ system X is in a definite state (p = 1) .
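Both the asymmetry of the relative entropy and the bound (1.5) are easy to check numerically; the following
Python sketch (illustrative only, with arbitrarily chosen distributions) does so.

    import math

    def entropy(p):
        """Shannon entropy (1.2), in bits."""
        return -sum(x * math.log2(x) for x in p if x > 0)

    def rel_entropy(p, q):
        """Relative entropy H(p||q) of equation (1.3), in bits."""
        return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

    p, q = [0.75, 0.25], [0.5, 0.5]
    print(rel_entropy(p, q))   # ~0.189, non-negative as stated in (1.4)
    print(rel_entropy(q, p))   # ~0.208, differs from H(p||q): not a metric

    # Bound (1.5) for a 2-bit memory: H(X) <= 2, with equality only for the uniform distribution.
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75
    print(entropy([0.25] * 4))                  # 2.0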

Other important results are deduced from the relative entropy, and concern useful entropic quantities such as
the joint entropy, the entropy of X conditional on knowing Y, and the common or mutual information of X and
Y. These entropies are correspondingly defined by the following intuitive relations


H(X, Y) ≜ −Σ_xy p_xy log p_xy ,                                                              (1.7)

H(X|Y) ≜ H(X, Y) − H(Y) ,                                                                    (1.8)

H(X : Y) ≜ H(X) + H(Y) − H(X, Y) = H(X) − H(X|Y) ,                                           (1.9)

and can be represented in the 'entropy Venn diagram' as shown in Figure 1.1.
Figure 1.1: Relationships between different entropies.
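The definitions (1.7)-(1.9) can likewise be evaluated for a concrete joint distribution. The Python sketch below
(an added illustration; the joint distribution p(x, y) is chosen arbitrarily) computes the joint, conditional and
mutual entropies and reproduces the relations pictured in the Venn diagram.

    import math

    def H(probs):
        """Shannon entropy in bits, ignoring zero-probability outcomes."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A joint distribution p(x, y) over two binary variables.
    pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
    px = {x: sum(v for (a, _), v in pxy.items() if a == x) for x in (0, 1)}   # marginal of X
    py = {y: sum(v for (_, b), v in pxy.items() if b == y) for y in (0, 1)}   # marginal of Y

    H_XY = H(pxy.values())                             # joint entropy (1.7)
    H_X_given_Y = H_XY - H(py.values())                # conditional entropy (1.8)
    mutual = H(px.values()) + H(py.values()) - H_XY    # mutual information (1.9)

    print(H_XY, H_X_given_Y, mutual)                   # ~1.722, ~0.722, ~0.278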


1.1.2 Basic properties of Shannon entropy
It is worth mentioning here the basic properties of Shannon entropy:

1. H(X, Y) = H(Y, X) ,  H(X : Y) = H(Y : X) .
2. H(Y|X) ≥ 0 and thus, by the second equality of (1.9), H(X : Y) ≤ H(Y), with equality if and only if Y
is a function of X, Y = f(X) .
3. H(X) ≤ H(X, Y) , with equality if and only if Y is a function of X.
4. Subadditivity: H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent
random variables.
5. H(Y|X) ≤ H(Y) and thus by the second equality of (1.9) H(X : Y) ≥ 0, with equality in each if and
only if X and Y are independent variables.
6. Strong subadditivity: H(X, Y, Z) + H(Y) ≤ H(X, Y) + H(Y, Z) , with equality if and only if Z →
Y → X forms a Markov chain (see the numerical check after this list).
7. Conditioning reduces entropy: H(X|Y, Z) ≤ H(X|Y) .
8. Chaining rule for conditional entropies: Let X_1, . . . , X_n and Y be any set of random variables; then
H(X_1, . . . , X_n |Y) = Σ_{i=1}^{n} H(X_i |Y, X_1, . . . , X_{i−1}) .
9. Concavity of the entropy: Suppose p_i > 0 with Σ_i p_i = 1 and the q_i are probability distributions; then
H(Σ_i p_i q_i) ≥ Σ_i p_i H(q_i) , with equality if and only if all the q_i are identical.
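A quick numerical sanity check of some of these properties, for instance subadditivity (property 4) and strong
subadditivity (property 6), can be performed on a randomly generated three-variable distribution; the Python
sketch below is added purely for illustration.

    import math
    import random

    def H(joint):
        """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
        return -sum(p * math.log2(p) for p in joint.values() if p > 0)

    def marginal(joint, keep):
        """Marginalize a joint distribution over tuples, keeping the given coordinates."""
        out = {}
        for outcome, prob in joint.items():
            key = tuple(outcome[i] for i in keep)
            out[key] = out.get(key, 0.0) + prob
        return out

    random.seed(0)
    weights = [random.random() for _ in range(8)]
    total = sum(weights)
    # Random joint distribution p(x, y, z) over three bits.
    pxyz = {(x, y, z): weights[4 * x + 2 * y + z] / total
            for x in (0, 1) for y in (0, 1) for z in (0, 1)}

    pxy, pyz = marginal(pxyz, (0, 1)), marginal(pxyz, (1, 2))
    px, py = marginal(pxyz, (0,)), marginal(pxyz, (1,))

    print(H(pxy) <= H(px) + H(py))              # subadditivity (property 4)
    print(H(pxyz) + H(py) <= H(pxy) + H(pyz))   # strong subadditivity (property 6)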

The various relationships between entropies may mostly be derived from the 'entropy Venn diagram' shown
in Figure 1.1. Such figures are not completely reliable as a guide to properties of entropy, but they provide a
useful mnemonic for remembering the various definitions and properties of entropy. The proofs of the above
mentioned properties follow.
Proof


1. Obvious from the definitions of joint entropy (1.7) and mutual information (1.9).
2. Based on the definition of conditional entropy (1.8) (with the roles of X and Y interchanged),

H(Y|X) ≜ H(X, Y) − H(X) = −Σ_xy p(x, y) log p(x, y) + Σ_x p(x) log p(x)
= −Σ_xy p(x, y) log p(x, y) + Σ_xy p(x, y) log p(x) = −Σ_xy p(x, y) log [p(x, y)/p(x)]
= −H(p(x, y)||p(x)) ,

and since p(x, y) ≤ p(x) for all x, y, every term of the sum Σ_xy p(x, y) log [p(x, y)/p(x)] is non-positive, so
H(Y|X) ≥ 0, with equality if and only if p(x, y) equals either 0 or p(x) for every pair (x, y), that is Y = f(X).

3. It follows immediately from property 2 and the definition of conditional entropy (1.8).
4. Following the steps of the calculation below,

H(p(x, y)||p(x)p(y)) = −H(p(x, y)) − Σ_xy p(x, y) log [p(x) p(y)]
