# Finite Fields for Computer Scientists and Engineers (Robert J. McEliece, Springer)

In computer science applications, the operations are simplified for finite fields of characteristic 2, also called GF(2^n) Galois fields, making these fields especially popular choices for applications. Multiplication in such a finite field is multiplication modulo an irreducible reducing polynomial used to define the finite field. Rijndael uses the characteristic-2 finite field with 256 elements, which can also be called the Galois field GF(2^8).

It employs the reducing polynomial x^8 + x^4 + x^3 + x + 1 for multiplication. Multiplication in this particular finite field can also be done using a modified version of the "peasant's algorithm". Each polynomial is represented using the same binary notation as above; eight bits is sufficient because only degrees 0 to 7 are possible in the terms of each reduced polynomial. This algorithm uses three variables (in the computer-programming sense), each holding an eight-bit representation: the two operands a and b, and an accumulator p for the product.

The invariant is that p, plus the field product of the current a and b, equals the original product; this is obviously true when the algorithm starts, and each step preserves it. When the algorithm terminates, a or b will be zero, so p will contain the product. This algorithm generalizes easily to multiplication over other fields of characteristic 2, by changing the lengths of a, b, and p and the value 0x1b appropriately. See also the Itoh–Tsujii inversion algorithm. The multiplicative inverse of an element a of a finite field can be calculated in a number of different ways, for example by the extended Euclidean algorithm, or by raising a to the power q − 2 (where q is the field size, since a^(q−1) = 1 for any non-zero a).
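As a concrete illustration, here is a minimal sketch of the carry-less "peasant's algorithm" multiplication described above, plus an inverse computed by raising to the power 254 = 2^8 − 2. The function names are mine, not from any particular library; the constant 0x1B is the low byte of the reducing polynomial x^8 + x^4 + x^3 + x + 1.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) as used by Rijndael/AES,
    modulo the irreducible polynomial x^8 + x^4 + x^3 + x + 1,
    using the carry-less 'peasant's algorithm' described above."""
    p = 0                    # accumulates the product
    for _ in range(8):
        if b & 1:            # low bit of b set: add (XOR) a into the product
            p ^= a
        carry = a & 0x80     # will a overflow degree 7 when doubled?
        a = (a << 1) & 0xFF  # multiply a by x
        if carry:
            a ^= 0x1B        # reduce modulo x^8 + x^4 + x^3 + x + 1
        b >>= 1              # move on to the next bit of b
    return p

def gf256_inv(a: int) -> int:
    """Multiplicative inverse via a^254 (one of the methods mentioned above)."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, a)
    return r
```

The pair {57}·{83} = {c1} is the worked example in the AES specification, which makes a handy sanity check.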

When developing algorithms for Galois field computation on small Galois fields, a common performance optimization approach is to find a generator g and use the identity a·b = g^(log_g a + log_g b), which reduces multiplication to table lookups. This exploits the property that the multiplicative group of every finite field is cyclic, i.e. contains generators. A necessary but not sufficient condition for a reducing polynomial to make the element x a generator is that it be irreducible; polynomials with the stronger property are called primitive.
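For GF(2^8) with the Rijndael polynomial, 0x03 (the polynomial x + 1) is a generator, so its powers can be tabulated once and each multiplication done with a few lookups. A sketch under those assumptions (the `EXP`/`LOG` table names are illustrative):

```python
def _xtime(a: int) -> int:
    """Multiply by x in GF(2^8), reducing by 0x1b on overflow past degree 7."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

# Tabulate powers of the generator g = 0x03 (i.e. x + 1):
# multiplying by 0x03 is "times x, then XOR in the original".
EXP = [0] * 255   # EXP[i] = g^i
LOG = [0] * 256   # LOG[g^i] = i  (LOG[0] is unused)
p = 1
for i in range(255):
    EXP[i] = p
    LOG[p] = i
    p = _xtime(p) ^ p

def gf256_mul_table(a: int, b: int) -> int:
    """a*b via the identity a*b = g^(log_g a + log_g b)."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

def gf256_inv_table(a: int) -> int:
    """a^-1 = g^(255 - log_g a), since the generator has order 255."""
    return EXP[(255 - LOG[a]) % 255]
```

Note the explicit zero check: 0 has no discrete logarithm, which is why the tables cover only the 255 non-zero elements.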

Here, the order of the generator, g, is the number of non-zero elements of the field.

I'm going to decide right here what I think it was, what it most likely was. And this is the best way to do it: just take the sign. This here is a real-valued weight, often called the reliability, which is helpful to retain. That's the main point of what I'm going to talk to you about. If this is 0 -- that means what we received was right on the boundary between the positive and negative side -- then the reliability is 0. In likelihood terms, that means it's equally likely that what was sent was a plus 1 or a minus 1.

So in some sense, we get no information when the reliability is 0; it fails to discriminate between plus 1 and minus 1. The larger this is, the further we get out here. Or symmetrically, the further we get out here, the more certain we are of whether a plus or a minus was sent in that particular symbol.

OK. And it actually turns out, if you work out the Gaussian numbers, that beta k is, up to scale, the log likelihood ratio of the more likely versus the less likely symbol. So it has a minimum of 0. The bigger it is, the more reliable. So it's natural to call it the reliability. An awful lot of traditional algebraic decoding neglects this channel down here. It just says, OK, let's take these hard decisions and try to decode them. If you've previously had a coding class, then that's probably what they did.
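For BPSK (inputs ±alpha) over AWGN with noise variance sigma², the log-likelihood ratio works out to 2·alpha·r/sigma², so the sign of r carries the hard decision and |r| carries the reliability up to a fixed scale. A minimal sketch of that split (the function name and default parameters are mine):

```python
def soft_decision(r: float, alpha: float = 1.0, sigma2: float = 1.0):
    """Split a received real value r into (hard decision, reliability, LLR).

    For BPSK (+alpha / -alpha) over AWGN with noise variance sigma2,
    LLR = ln p(r|+alpha)/p(r|-alpha) = 2*alpha*r/sigma2, so the sign of r
    is the hard decision and |r| is the reliability up to scale 2*alpha/sigma2.
    """
    hard = 1 if r >= 0 else -1
    reliability = abs(r)
    llr = 2.0 * alpha * r / sigma2
    return hard, reliability, llr
```

At r = 0 the reliability (and the LLR) is 0: the two symbols are equally likely, exactly the no-information case described above.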

Or they assumed a context in which this was never available in the first place. And my only point here, my main point, is that that's a very bad thing to do. How can we evaluate that? Suppose we just take this top channel and don't look at the reliability. Then basically the channel model becomes a binary symmetric channel.

We have bits in, bits out, and we have probability 1 - p that if we send a 0 we get a 0. And since it's symmetric, the same probability that if we send a 1 we get a 1, and a probability p of making an error. So we get the traditional memoryless binary symmetric channel model. We can compute the capacity of this model and compare it to the capacity of the additive white Gaussian noise channel -- this model.

And depending on the signal-to-noise ratio, from a capacity calculation, we find that there is of the order of 2 or 3 dB loss. Well, as we go on in this course, we're going to find that 1 dB is a very worthwhile coding gain. So to throw away 2 or 3 dB just by throwing away this information is a bad thing to do.
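That capacity comparison can be sketched numerically: hard decisions turn the channel into a BSC with crossover p = Q(sqrt(2·Es/N0)), while the unquantized binary-input AWGN capacity follows from C = 1 − E[log2(1 + e^(−LLR))] with the LLR from above. The code below is my own rough sketch, not from the course; comparing the SNR each needs to reach a given rate exhibits the dB penalty of hard decisions.

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def h2(p: float) -> float:
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*math.log2(p) - (1-p)*math.log2(1-p)

def capacity_hard(snr: float) -> float:
    """BSC capacity after hard decisions; snr = Es/N0 (linear)."""
    return 1.0 - h2(Q(math.sqrt(2.0 * snr)))

def capacity_soft(snr: float, n: int = 4000, lim: float = 8.0) -> float:
    """Binary-input AWGN capacity, C = 1 - E[log2(1 + exp(-2y/sigma^2))]
    with y ~ N(1, sigma^2), by a simple Riemann sum over the noise."""
    sigma2 = 1.0 / (2.0 * snr)
    sigma = math.sqrt(sigma2)
    h = 2.0 * lim / n
    acc = 0.0
    for i in range(n + 1):
        z = -lim + i * h
        w = h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        t = -2.0 * (1.0 + sigma * z) / sigma2   # = -LLR at this noise sample
        # numerically stable log2(1 + e^t)
        l = t / math.log(2.0) if t > 30 else math.log1p(math.exp(t)) / math.log(2.0)
        acc += w * l
    return 1.0 - acc
```

Bisecting each curve for, say, C = 1/2 and taking 10·log10 of the SNR ratio gives a gap consistent with the "2 or 3 dB" figure quoted in the lecture.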

I can make this point in a different way by simply looking at the optimum decision rule. Let me take a very simple code, just the repetition code, whose two code words are 0,0 and 1,1. This is the code where you either send (plus alpha, plus alpha) or (minus alpha, minus alpha).

And what's the decision region? It's obviously this 45 degree line. And what is the squared distance to the decision boundary? It's 2 alpha squared, right? So basically, the probability of making an error is the probability that the noise variable has a magnitude in this dimension, in this direction, greater than the square root of 2 alpha squared. You've been through all those calculations several times now. So this is if I keep the reliability info -- in other words, if I keep the full received signal. If I discard reliability, then basically what can I get out?

There are only four possible things I can see in two transmissions through this channel. They're all binary two-tuples. And what does that mean? That means if my actual r is in this quadrant, then I'm going to make a hard decision of 0 and 0 both times. So I'm going to say I'm in this quadrant. And likewise, I'm basically going to decide which of these four quadrants I'm in, and that's all the information the decoder is going to have.

So here I am at this point here. Now the decoder simply knows, via these two bits, which quadrant you're in. Now, it has to decide which of these two code words was sent. What's its maximum likelihood decision rule given just this information? Clearly, if you land in this quadrant, you decide 0, 0. And if you land down here, in this quadrant, you decide 1, 1.

And what am I going to do here or here? Actually, here the evidence is totally balanced. I have nothing that tells me whether to go one way or another. I could make an arbitrary decision in the hope of minimizing my error probability. Flip a coin. But whichever decision I make, what I'm going to find is that there's a probability of error that -- there's a certain noise, namely the noise that takes me from here to this decision boundary that is going to cause an error.

Or if I decide the other way, it would be the one that goes from here to here. So I'm going to be stuck. There are going to be certain noise variables of length, now, merely alpha, or square distance alpha squared, that are going to cause me to make a decision error, ultimately. Regardless of how I set up this final block here. Whatever rule I give to this decoder, it's only going to take, worst case, a noise of squared magnitude alpha squared to cause an error. I shouldn't have thrown this away. That's the elementary point I'm trying to make here.

I mean, we can put it that way. And this is what shows up in our union-bound estimate -- we do this kind of minimum-distance decoding. But if I throw away this important information, then my effective minimum squared distance is only alpha squared. That is the bottom line here. In dB terms, how much of a cost is that? I've cost myself a factor of 2 in noise margin, so a 3 dB loss. Because of lack of time, I won't go through the argument in the notes, which shows that, in fact, exactly the same thing occurs whenever the minimum Hamming distance of the code that I started from is even: you lose precisely 3 dB.
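The geometric argument above can be checked by simulation: decode the length-2 repetition code both ways and compare block-error rates. This is my own sketch (function name and parameters assumed, not from the course); the hard-decision decoder breaks opposite-sign ties with a coin flip, as in the lecture.

```python
import math
import random

def simulate(snr_db: float, hard: bool, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo block-error rate for the length-2 repetition code
    {(+a,+a), (-a,-a)} over AWGN, always transmitting (+a,+a).

    hard=True quantizes each received value to its sign before deciding;
    hard=False is the ML soft decision, the sign of r1 + r2.
    """
    rng = random.Random(seed)
    alpha = 1.0
    sigma = alpha / math.sqrt(2.0 * 10.0 ** (snr_db / 10.0))  # Es/N0 = snr
    errors = 0
    for _ in range(n):
        r1 = alpha + rng.gauss(0.0, sigma)
        r2 = alpha + rng.gauss(0.0, sigma)
        if hard:
            s = (1 if r1 >= 0 else -1) + (1 if r2 >= 0 else -1)
            if s < 0 or (s == 0 and rng.random() < 0.5):
                errors += 1   # ties broken by coin flip, as in the lecture
        else:
            if r1 + r2 < 0:
                errors += 1
    return errors / n
```

At a moderate SNR the hard-decision error rate comes out roughly an order of magnitude worse, reflecting the drop in effective squared distance from 2 alpha squared to alpha squared.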


It's not quite as clean an argument when the minimum Hamming distance is odd. Then you lose up to 3 dB, and it goes to 3 dB as the distance increases. But there's a pretty elementary geometric argument that hard decisions, again, cost you 3 dB loss, which is consistent with what the capacity calculation gives you. Well, obviously the reason we did this is we were trying to simplify things. What would be the first step to unsimplify things and get some of this loss back?

Let me suggest that what this amounts to is a two-level quantization of the received real number r_k. What's the next number higher than 2? How about a three-level quantization here? So what we're talking about is a highly quantized magnitude where, instead of just making a decision boundary here -- which is effectively what we did for two-level quantization -- we make a null zone here between some threshold plus t and some threshold minus t. And we'll say that in this region the received symbol is unreliable. That's called an erasure. So we make a quantized magnitude of r_k -- it doesn't fit so neatly here.

Let me just say this quantizer is going to give me out one of three levels: minus 1, 0, or plus 1. The quantized magnitude is then either 0 or 1 -- I don't want the minus 1 there.
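The three-level quantizer just described can be sketched as follows (names mine); it returns both the quantized symbol and the 0-or-fixed reliability:

```python
def quantize3(r: float, t: float):
    """Three-level quantization of a received value r with null zone (-t, t).

    Returns (symbol, reliability): symbol is +1, -1, or 0 (an erasure),
    and the reliability is either a fixed value (1) or 0.
    """
    if r >= t:
        return +1, 1
    if r <= -t:
        return -1, 1
    return 0, 0   # erasure: the symbol is declared unreliable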


The reliability is either a fixed reliability, or it's 0. So now I've got a channel model that looks like this. It's called the binary erasure channel with errors. All transitions are possible. I can send a 0 or a 1. I can receive a 0, a 1, or a question mark -- this is called an erasure. And this is 1 minus p minus q, and this is p and this is q, symmetrically, for some p and q, which you can evaluate. And let's choose this threshold t to optimize things and get the maximum capacity.

Now if you do the capacity calculation, you'll find there's only a 1 to 1.5 dB loss, again depending on the signal-to-noise ratio. So in effect, you've already bought back, just with this very simple method, half of the loss that you inflicted on yourself by making hard decisions.
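That capacity calculation can be sketched as follows. For this symmetric errors-and-erasures channel a uniform input is optimal, giving C = H(q, (1−q)/2, (1−q)/2) − H(1−p−q, p, q); with signal ±alpha and thresholds ±t, p = Q((alpha+t)/sigma) and q = Q((alpha−t)/sigma) − p, and one can sweep t for the maximum. An illustrative sketch (alpha normalized to 1, names mine, not from the course):

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def entropy(*ps: float) -> float:
    """Entropy in bits of a probability vector (zero terms contribute 0)."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def bsec_capacity(p: float, q: float) -> float:
    """Binary symmetric errors-and-erasures channel capacity:
    correct w.p. 1-p-q, flipped w.p. p, erased w.p. q."""
    return entropy(q, (1 - q) / 2, (1 - q) / 2) - entropy(1 - p - q, p, q)

def pq_from_threshold(snr: float, t: float):
    """Error/erasure probabilities for thresholds +-t, signal +-1, Es/N0 = snr."""
    sigma = math.sqrt(1.0 / (2.0 * snr))
    p = Q((1.0 + t) / sigma)
    q = Q((1.0 - t) / sigma) - p
    return p, q

def best_threshold(snr: float):
    """Sweep t in [0, 1) and return (best capacity, best t)."""
    return max((bsec_capacity(*pq_from_threshold(snr, k / 100.0)), k / 100.0)
               for k in range(100))
```

Setting t = 0 recovers the plain BSC from hard decisions, so comparing `best_threshold` against the t = 0 capacity shows directly how much the null zone buys back.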

So making erasures is a good first step towards correcting this problem. And let's think about it again for this simple code. Suppose I establish these thresholds at plus t and minus t in each dimension. So now when I consider the problem that the decoder has, I have nine possible regions: three possible quantizer outputs in each of the two dimensions.

And so what's my decision rule going to be now? For this 0, 0 decision, I'm going to include, of course, this region. But now if I land in this region, that means one of the received symbols was erased, but the other one gave me a definite indication. So the weight of evidence still goes up here.

And similarly, like so. So my decision region definitely for 0, 0 is this region, and my decision region definitely for 1, 1 is the symmetric region. And, of course, I still have three regions where I really can't say anything. This is two erasures. This is 0, 1 -- two conflicting pieces of evidence. This is 1, 0 -- two conflicting pieces of evidence. So I still have to flip a coin out in here. But how much I improve things is measured by the minimum size of noise it takes to cause a decision error.

And that's going to be either this length or this length. You see that? It's pretty clear, intuitively. So the game here is that we first of all choose t so that these two lengths are the same. That's done by adjusting t: you see these go in opposite directions as t is raised or lowered, so we find the t such that these two are the same. Having equalized them, we find some value for the effective minimum squared distance, which is between alpha squared and 2 alpha squared.

As I remember, this somehow gains about 1 dB. The moral is about the same: you can get about half of your loss back by using erasures. And again, this holds for any code which has an even minimum Hamming distance, if you accept this method of analysis. So a first step, even if you want to stay in the binary world, is to allow yourself to use erasures as well. That will claw back half of the loss that you might have incurred by using hard decisions. And, of course, even better would be to use a more highly quantized reliability and somehow have a decoding algorithm that can use soft decisions.

Soft decisions are decisions that have reliability metrics attached to them. And in the early days of coding, people evaluated capacity. They evaluated this sort of thing. And it was pretty generally agreed that eight-level quantization was going to be practically good enough. Nowadays you typically go up to 64 levels. You might go up to much higher numbers if you have other things to worry about, like timing recovery and so forth. But if you're purely in a synchronized, symbol-by-symbol transmission, then three or four bits of reliability information are going to be enough.

So from an engineering point of view, that's a good way to go. Now, of course, you're going to need a decoding algorithm that can use soft decisions. So as we go along, I'm going to talk about errors-only decoding algorithms, which are not much good here, because correcting errors only is a very bad thing to do on this channel; errors-and-erasures correcting decoding algorithms, which are fairly easy in the algebraic coding context; and then finally soft-decision decoding algorithms, which are the kind we really want on this channel.

We really want to use the soft decisions. We can't afford these huge losses. OK, so there's more said on this in the last part of chapter six, but that's all the time I want to spend on it in class. Does anyone have any questions? But think about it from an information-theoretic point of view. Can you actually get more information by introducing a one-to-one transformation, whether it's linear or nonlinear, whatever? That's going to imply decisions that are not symbol-by-symbol decisions. That's going to imply that you need to save r1 and r2 in order even to decide where you are, and then go through some kind of quantization on the plane. And once you've got r1 and r2 and you're trying to draw crazy boundaries, I suggest it's going to be simpler to just compute the Euclidean distance to each of these points, which results in this decision region. I don't see the rationale for it yet, but there have been a million innovations in this business, and you might have a good idea. Any other comments? I really appreciate comments that go anywhere.

I like to address them now while they're ripe. So let's go on to chapters seven and eight. Chapters seven and eight are closely related. This is really my bow to algebraic coding theory. I expect that if any of you have had a course in coding before, it was primarily on algebraic coding theory. How many of you have had a course in coding theory before? One, two, three. Not so many. Was it on algebraic coding theory? Well, you're a ringer then. Or was it on this stuff?

I don't know that book. But it talks about sophisticated soft-decision type coding, and so forth. What was the course you took? Well, let me just make a broad-brush comment, which may or may not be supported by your previous exposure to coding theory. It's that for many, many years, when people said coding theory, they meant algebraic coding theory: the coding theory of block codes that are constructed by algebraic techniques, like Reed-Muller codes, Reed-Solomon codes, BCH codes, cyclic codes. And if you said, I want to learn some coding theory, and asked your graduate student to go buy a textbook for you, it was probably going to be a textbook on algebraic coding theory.

Meanwhile, there was a bunch of us who were off actually trying to construct codes for real channels, concerned with how complex this really is to implement and what kind of performance we can really get. And we hardly ever used algebraic codes. Initially, we used convolutional codes with various kinds of decoding algorithms: threshold decoding, sequential decoding, the Viterbi algorithm. And these got more and more elaborate, and then finally the big step was to capacity-approaching codes.

Which now people are starting to call modern coding theory. Long, random-like codes that have an iterative decoding algorithm. And that's what we'll get to towards the end of this course. So from an engineer's point of view, to some extent, all this work on (n, k, d) and algebraic decoding algorithms and so forth was a massive distraction.


It really missed the point. Sometimes because it assumed hard decisions, or, after assuming hard decisions, it assumed bounded-distance decoding algorithms. But it was all -- from Shannon's perspective, it was too deterministic, too constructed, too structured. You want more random elements in your coding scheme. However, two things: A, this theory is a beautiful theory, both the mathematical theory of finite fields and the coding theory, particularly of Reed-Solomon codes, which are uniquely the greatest accomplishment of algebraic coding theory, and which have proved to be very useful.

And B, Reed-Solomon codes are something that, as engineers, ought to be part of your tool kit, and you will find them very useful in a variety of situations. So the objective of this part of the course, within its proportion of the overall scheme of the course, is to give you exposure to this lovely theory.

You can learn all about it just by reading a book or two -- probably less than one book. And to give you some exposure to Reed-Solomon codes, which have been used in practice. For more than 20 years, the deep-space standard was a concatenated code.


That means a sequence of two codes, of which the inner code was a convolutional code decoded by the Viterbi algorithm, which we'll talk about after this. And then the outer code, the code that cleans up the errors made by the inner code, was a Reed-Solomon code of length 255 over the finite field with 256 elements. And that's the dynamite combination that dominated the power-limited coding world for at least 20 years.

So you ought to know, as a minimum, about Reed-Solomon codes. That's my objective in going through these next two chapters. However, because these basically have not been on the winning path to the Shannon limit, I'm trying to limit it to three weeks max. And therefore, the presentation is going to be a lot faster than what you're used to.

I'll probably do as much in these three weeks as most people do in a term in an entire algebraic coding theory course. That's an exaggeration -- people do a lot more -- but I try to at least hit all the main points that you need to know. So I apologize in advance if it seems that now we're on a very fast moving train. But you ought to know this stuff.