Most recent change of Random

Edit made on November 11, 2012 by ColinWright at 17:38:30

Deleted text in red / Inserted text in green

WM
HEADERS_END
[[[>30
THIS IS WEAK !!! - OVER TO OTHER CONTRIBUTORS: Other topics to cover on the same subject:
* Monte Carlo methods,
* pseudorandom numbers,
** and their generators
* ERNIE,
* tests for randomness
]]]
Randomness is an important and useful concept in Mathematics
which is, unfortunately, very difficult to define.

If a sequence of numbers is chosen from a set, the sequence
is said to be random if each number, at every choice, is equally
likely to be selected.

In this case, the process determines whether the numbers are random.

However, random number generators in computers and calculators
fail this definition absolutely, because the process chosen to
generate the sequence of numbers is deterministic. If you know
the method chosen by the designers of the device, you can
predict the numbers exactly.
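
To make the determinism concrete, here is a minimal sketch (in Python) of a linear congruential generator, the textbook kind of deterministic generator meant here; the seed and constants are illustrative only, not those of any particular device.

    def lcg(seed, n, a=1103515245, c=12345, m=2**31):
        """A linear congruential generator: the classic deterministic
        'random number generator' found in computers and calculators."""
        values = []
        x = seed
        for _ in range(n):
            x = (a * x + c) % m
            values.append(x)
        return values

    # Anyone who knows the method and the seed reproduces the output exactly.
    device_output = lcg(seed=42, n=5)
    attacker_copy = lcg(seed=42, n=5)
    assert device_output == attacker_copy   # prediction is exact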
---------------------------------------------

Randomness.

Preamble.

I am very pleased to read the above lines from the previous writer on this very difficult subject. It is the very first publication in print that I have seen anywhere that corroborates my own belief that random per se essentially means "equal probability". It is a contextual thing, however, with two hats to it that one must wear at different times: wear the 'haphazard' hat if you are counting the splurges of radiation from a piece of Welsh granite, say, or playing Ludo with your children on the kitchen table, and your 'scientific random' hat when equal probability is required in serious algorithmic scientific implementations.

I feel vindicated having read these lines, because for several years now I have been fighting single-handed, on a lone front, against some highly partisan colleagues who have railed against my views on randomness.

I disagree with the writer, however, that being deterministic is sufficient to preclude a set from being random. In fact 'random', as I define it, is highly deterministic, but in a particular context. What he describes as randomly generated numbers are really *pseudo-random* sets that have to be accepted as having some repeats, which of course makes the elements not equally probable and therefore something less than random.

Digressing.

I purchased a book from Amazon on randomness thinking "this is it - I am going to know all about randomness when I have read this", but what a let-down. The author, who was on a very comfortable research bursary from a noted university in America, spent almost all of her time saying the same thing over and over again. She gave hundreds of examples of instances of randomness that never said anything new, especially on how to acquire randomness in a set that is required to have that property.

She was pushing against an open door, i.e. wrongly reinforcing her private belief that haphazard alone is randomness. She was preaching this so much that I was inwardly saying "okay, I give in - I'll come quietly". She obviously did not know her randomness and was, quite honestly, labouring a wrong understanding that she thought covered all aspects of randomness. I am sure she meant well, but that was not what I wanted. I wanted to know how I could acquire randomness in general and, in particular, how to acquire randomness in cryptography.

I was also open to learning how to acquire and use the property of randomness anywhere in science, no matter what the application might be.


--------------------------------------------

Randomness Applied to Cryptography - An Enigma for the Understanding

Let me zoom in as quickly as possible on randomness in cryptography. Imagine, if you will, that you are looking over the shoulder of a cryptanalyst at work. He has illegally intercepted a batch of ciphertext and is going through it thoroughly, trying to decipher the long string of ciphertext item by item. He knows the encryption and decryption algorithms very well, but does not know the unique keyset that will enable him to decipher each item. Let us say that he guesses the type and the range of the keys as the basis for a 'brute force' computer program. Brute forcing means exhaustively testing every candidate key against specimen data and flagging any result that looks probable.

He has now arrived at a point where he must select a key from 'n' possible keys in order to complete the illegal decryption of the item of ciphertext in hand. His problem is that each element of the true set of possibilities has equal probability of being the right one, and there is total uncertainty in his mind about which is the right one, even if he could identify it as belonging to the set. He can only guess which might be the right one. It is totally impossible for him to set his brute force program to flag a result with any degree of certainty.

Applying this brute force to each item of the full set of ciphertext in turn, the same uncertainty exists in every case, so that finally the best he can do is to say that the true plaintext message is any one of 'n' possible and equally likely plaintext messages. He might as well just write an essay on what the message text might be: total failure, in other words. He failed because he could not flag a result with any degree of certainty; he experienced total uncertainty in the attempt to crack the secrecy of the ciphertext that he intercepted.

What made it impossible for him to succeed is the equal probability of the elements of the true key set, which prevented him from setting a value to the flag of the brute force program. That is randomness in cryptography.
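
A toy sketch in Python of that predicament, assuming for illustration a single-byte XOR cipher (not any cipher discussed here): every candidate key produces a candidate plaintext, and with all keys equally probable, nothing distinguishes the right one.

    def decrypt(ciphertext, key):
        """Toy decryption: XOR each ciphertext byte with a one-byte key."""
        return bytes(b ^ key for b in ciphertext)

    ciphertext = bytes([0x17])   # one intercepted item

    # Brute force over all 256 equally likely keys:
    candidates = {key: decrypt(ciphertext, key) for key in range(256)}

    # The attacker now holds 256 equally plausible decryptions and has
    # no basis on which to set the flag -- total uncertainty.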

I like to call this "scientific" randomness to distinguish it quite distinctly from anything else. A truly random key set translates as total uncertainty to cryptanalysts. The whole object in acquiring a random key set is making sure that there is equal probability between the elements: each element must have equal probability of being the next one to be called by whatever retrieval system uses the key set. The retrieval system in question is most likely to be the computer program of an encryption/decryption scheme that the entities of a secure communications system are using.
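
As an illustrative sketch of such a retrieval system (the keyset values here are invented): each call selects uniformly, so every element is equally likely to be the next one used.

    import secrets

    keyset = [101, 257, 313, 401, 499]   # toy keyset, no repeats

    # secrets.choice draws uniformly, so each element has probability
    # 1/len(keyset) of being the next one called.
    next_key = secrets.choice(keyset)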

The chicken or the egg?

Which comes first, i.e. which one dictates the other, is my demonstration question here, because there is a lot of wrong thinking in some quarters that random data sets can be procured and stored independently, in batch mode, for general use later in a multitude of situations ("I'll keep that on a handy file for whenever I need it" style). This is completely wrong. The data of a keyset must be validated by the algorithm that is going to use it, and a keyset can only be compiled by the designer of the encryption/decryption algorithm after he has perfected the latter. It is not possible to create a random key set independently of the algorithm that is later going to use it; this is a widely held misunderstanding.

It is the encryption algorithm design that always comes first. Only once that design is clear to the designer does it become apparent what data can be used in it. The designer is then able to collect enough suitable data to form the keyset. The designer makes sure that, in addition to setting the bounds and type of the data, there are no repeats of elements in the key set, so that each element has equal probability and the set of keys is then random by the definition. The person can then key in the required keyset with deliberate, unhurried ease. There is no mechanical generator required, no intellectual blindfolding to ensure unbiased thinking, just plain keyboarding and nothing more.

This latter definition requires some contingency explanation, as follows: an entire random module of n elements (say) may be multiplied up to form a larger key set of n x m elements (that is, n multiplied by m) and still be random, because the separate elements are still equal in number and therefore have the same probability as before.
The operative term at all times in any description of randomness is equal probability.

Keysets do not have to be generated sporadically to be random; on the contrary, they must be keyed in studiously. Bounds may be set by calculation, and other conditions stated about the 'type' (as in object-oriented programming) of the data. Clearly, then, it is the encryption algorithm that validates the key set, not the other way round. It is quite wrong to try to create a key set beforehand in isolation from the algorithm that is going to use it later.
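
A minimal sketch of the construction just described, with hypothetical bounds standing in for whatever a real cipher design would dictate: the algorithm's constraints on type and bounds validate the candidates, and repeats are sifted out so the surviving elements are all distinct.

    def build_keyset(candidates, lower, upper):
        """Keep only integer values inside the bounds dictated by the
        cipher design, and sift out repeats so every element is distinct."""
        keyset = []
        seen = set()
        for value in candidates:
            if isinstance(value, int) and lower <= value <= upper and value not in seen:
                seen.add(value)
                keyset.append(value)
        return keyset

    # Hypothetical bounds, for illustration only:
    keys = build_keyset([5, 12, 5, 999, 42, 12, 7], lower=0, upper=255)
    # keys == [5, 12, 42, 7] -- repeats and out-of-range values removed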

There is no mystery about randomness. It is not necessary to demonstrate impartiality, intellectual blindfolding, absence of bias in the selection process, spontaneity or anything else for randomness; just equal probability is all that is needed and nothing else.

Note: if a random keyset has 'n' elements then it has n! (n factorial) permutations of arrangement of its elements, and each and every one of those permutations is a distinct random set in itself. This means that a keyset belonging to a computer-driven encryption program may be scrambled in n! ways to give the cryptographer/user an enormous number of fresh keysets, each one of which is a random keyset in its own right. I make heavy use of this fact in my own work.
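
The count is easy to check for a small keyset (the five elements below are arbitrary): a set of n distinct elements has exactly n! arrangements.

    import math
    from itertools import permutations

    keyset = [3, 1, 4, 5, 9]                 # n = 5 distinct elements

    arrangements = list(permutations(keyset))
    assert len(arrangements) == math.factorial(5)   # 5! = 120
    # Each of the 120 arrangements is itself a keyset of the same
    # distinct elements, hence (by the definition above) random.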

Comment.

Cryptography-orientated people should see "equal probability" as the scientific meaning of randomness, which has nothing to do with haphazard randomness. I call it 'scientific' randomness especially to labour the difference. It is a binary property of a keyset: you either have it altogether or not at all. There are no degrees of randomness like 'almost' random or 'nearly' random or 'less' random, least of all the detestable term 'pseudo' random.

Colloquial Randomness.

This is the general understanding of randomness, and it must be kept well clear in our minds from what is usable in cryptography, because this randomness definitely is not. This is the chaotic randomness associated with haphazard events like lightning strikes and earthquakes; it has variable probability and keeps popping up uninvited in our daily lives. It can range from the benign kitchen-table randomness of a board game to devastating earthquakes and tsunamis. Sometimes it is a nuisance, but at other times we buy into it deliberately so as to have a romantic chance of some event materialising, like winning the national lottery. This latter is a pleasant dalliance with risk; it has romance about it, as opposed to the scientific randomness of cryptography described above. Long may it live, because without it life could be very dull, but it has no place in cryptography.

Haphazardly collected data is not workable in mathematics in any way that differs from studiously collected data. There is no mystical property imparted to the data by being haphazard, adhering intrinsically to it right through all mathematical functions, but people seem to think there is. Haphazardly collected data is useless in cryptography unless the 'equal probability' caveat is met, and that is not very likely to be the case if it is haphazardly collected.

Reminder.

So, if you are sitting in front of your computer designing a cipher algorithm, then put on your "scientific" randomness hat to remind you of the difference.

If, on the other hand, you are rolling dice in some board game on the kitchen table with your four-year-old, then it's the "colloquial" randomness hat (maybe).

The difference is very easy to see, but if you are reading the vast array of misinformed statements on internet newsgroups you need to be especially single-minded about what is applicable to what and where it may be used. Randomness is the most misunderstood topic in cryptography today. The application to cryptography and game theory is a special one in my view. Haphazard and random are quite rightly taken to be synonymous, equivalent states in our everyday conversations, but this is not so in cryptography or in mathematics.

Recapping.

A Random Keyset.

A key-set is random if and only if there is equal probability of each element of the set being the next one to be called in any retrieval system. A retrieval system in this context is the encryption/decryption calling program of a cryptographic cipher at run time of the cipher program.

PS - data collected by a haphazard collection process will be random if and only if the accidental repeats are sifted out as the process goes on.

Example.

Suppose I stop one hundred people on a pavement and ask them to immediately verbalise the first integer that comes into their heads. Clearly, the collection process is spontaneous, unrehearsed, unbiased and totally haphazard. If there are any repeats of the integers in the collected set, however, the set will not be random. Haphazard and random are taken to be synonymous terms in everyday colloquial English, but not in mathematics or cryptography.

They are quite different things.
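
A small simulation of the survey above (modelling the answers, purely as an assumption, as draws from a limited range so that repeats are likely), with the repeats sifted out as the collection goes on, per the PS above:

    import random

    # Stand-in for 100 pavement answers; real answers are not modelled,
    # this range is assumed just to make repeats likely.
    responses = [random.randint(1, 50) for _ in range(100)]

    # Sift out accidental repeats as the process goes on:
    sifted = []
    for r in responses:
        if r not in sifted:
            sifted.append(r)

    # 'responses' almost certainly contains repeats and so fails the
    # equal-probability test; 'sifted' contains distinct elements only.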

-----------------------------------------------

Furthermore.

Copy of a post to a newsgroup on cryptography.

It Has to be Contextual Randomness.

It is not right to try to create a set of random data on its own, for general all-round later use, by collecting haphazard data today, say, on the premise that because it is haphazard and therefore unbiased it will always be random no matter how it is used later. That haphazard data just won't work in mathematics, because it has no regular structure enabling any reliable inductive use; and even if it did, it still might not be suitable for other reasons, of type say, that are dictated by the algorithm of a particular cipher.

Random data, either as single keys or as elements of large key-sets, has to be in context with the cipher algorithm that uses it, in addition to the abiding condition that the elements of a set must all have equal probability of being the next one to be called in some unbiased retrieval system that is part of a cipher algorithm. That is a basic starting requirement, but it may not be the only one.

To achieve a random set of data that is usable in a particular algorithm, it is best to let the newly designed *algorithm itself* be the generator per se of the set of random keys. The form and bounds of a suitable set of random keys then become obvious from the theory, and they can be laid down and collected.

Clearly, one cannot create random keys in isolation from the algorithm that is going to use them by creating something that purports to be an all-purpose, all-round general batch of random data that you can keep handy on file and pull out in future, dusting it down for any old use that requires it; in many cases the form and range of the data will not fit the need of the cipher algorithm.

Randomness translates to the 'uncertainty' that it creates for an adversary who has intercepted the ciphertext by electronic theft and is busy trying to decrypt it, item by item.

Usually a modicum of scrambling needs to be given to an array of newly collected random keys, as a basic need, just to pre-empt an adversary guessing that the set has the consecutive order of, say, a subset of positive integers (something it often has), when the keys are otherwise suitable for use by the cipher by being contextually random.
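
A sketch of that scrambling step, using a Fisher-Yates shuffle on an assumed keyset of consecutive integers (Python's random module is used here purely for illustration; a cipher designer would want a cryptographically strong source):

    import random

    # A contextually suitable keyset that happens to be in the guessable
    # consecutive order warned about above.
    keyset = list(range(100, 200))

    random.shuffle(keyset)   # Fisher-Yates shuffle, in place

    # Same distinct elements, so still equally probable under uniform
    # retrieval, but no longer in consecutive, guessable order.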

PS_2.

I have good reason to believe that a leading London research group specialising in information security also uses this definition of randomness, i.e. equal probability.

It is my personal judgement that this 'equal probability' is always contingent on non-repetition of the elements of data in the set in question.