Prime Generator Algorithm

The primgen program simply generates prime numbers until you get bored and stop it (likely), the maximum integer size is reached (unlikely), or the program runs out of memory (much more likely than the last one). It is based on the sieve method, and is quite fast.

To present the algorithm used by pgen, we will start with the simple-minded prime generation method. It is based on a double loop like this: this:

for n := 2 to infinity do for d := 2 to n / 2 do if d divides n then proceed to the next n print n

Now, we will try to improve the efficiency. First, we observe that the inner loop needs only to check the d values which are themselves prime. Since it has already identified these values, we need only remember them. So now we have:

primes := { } for n := 2 to infinity do for each d in primes do if d divides n then proceed to the next n print n primes := primes + {n}

This version is faster because the inner loop performs fewer iterations. It is possible to make it still faster, however, by using a better data structure for primes For this, first off, we observe that any given prime number eliminates a series of its multiples in increasing order. For instance, as n increases, the prime number 2 will eliminate 4, 6, 8, 10, …, in that order. The prime number 7 eliminates 14, 21, 28, and so forth. This means that, for any prime, we can predict which n value will be eliminated next. This leads to the notion of an eliminator object. It is a pair of numbers, a prime and one of its multiples. The multiple is the next n value which it will eliminate. It also has a method advance which moves it to its next victim. That simply means adding the prime to the multiple to create the next multiple. We keep a collection of these. For each n value, we check to see if a member of our eliminator collection eliminates n. If so, we know that that n is not prime, and we advance the eliminator. The algorithm becomes:

eliminators := { } for n := 2 to infinity do if eliminators contains some element e = (p, n) then advance e to (p, n + p) if there are any other e′ = (p′, n), advance them, too. proceed to the next n print n eliminators := eliminators + {(n, 2n)}

Here's what that looks like:

n eliminators Action
2 Print n
3 (2,4) Print n
4 (2,4), (3,6) Eliminate n
5 (2,6), (3,6) Print n
6 (2,6), (3,6), (5,10) Eliminate n
7 (2,8), (3,9), (5,10) Print n
8 (2,8), (3,9), (5,10), (7,14) Eliminate n
9 (2,10), (3,9), (5,10), (7,14) Eliminate n
10 (2,10), (3,12), (5,10), (7,14) Eliminate n
11 (2,12), (3,12), (5,15), (7,14) Print n
12 (2,12), (3,12), (5,15), (7,14), (11, 22) Eliminate n
. . .

The supposed speed improvement in this version comes from replacing the loop through primes with an if test on the contents of eliminators. But this isn't a real improvement if the test requires us to just loop through eliminators instead. To avoid that, we need a data structure which can quickly answer the question “does eliminators contains some eliminator e = (p, n)”. First observe that, at each step, if there is an eliminator for n in eliminators, it is always one with the smallest multiple. Therefore we organize eliminators as a heap. A heap is a data structure from which it is efficient to find and remove the smallest (or largest) item.

I use the vector class from the C++ standard libraries to hold eliminators. The standard libraries also provide the methods to order the contents of a vector as a heap.


Source: This algorithm is hardly original. The Sieve of Eratosthenes is a classical approach. This particular implementation was suggested by a data structures book I have by Augenstein and Tenenbaum, which is nearly as ancient. (The application language is PL/I, which might give you clue, if you've heard of PL/I.) My implementation is rather different from theirs (better, I think), but omits some optimizations which they suggest.