Array Location Arithmetic

Arrays are typically laid out contiguously in memory. The basic parameters are:

α	The starting address of the array.
e	The size of an array element.
lb,ub	The lower and upper array bounds.
a[i]	The subscript expression being evaluated.

So, for a one-dimensional array:

[Figure: the array a[lb..ub] in memory; element a[i] lies i−lb slots past the start.]

The address of the location a[i] is given by:

addr(a[i])=α+e(i−lb)=(α−e⋅lb)+e⋅i
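
As a minimal sketch of the formula above, the Python functions below compute the address in both forms. The names addr_1d and addr_1d_folded are illustrative only, and the sample array (a[5..9] of 2-byte elements at address 1000) is an arbitrary assumption, not something from the text.

```python
def addr_1d(alpha, e, lb, i):
    """Address of a[i]: skip (i - lb) elements of size e past the start."""
    return alpha + e * (i - lb)


def addr_1d_folded(alpha, e, lb, i):
    """Same address, with the subscript-independent part folded into one constant."""
    base = alpha - e * lb  # depends only on the declaration, not on i
    return base + e * i


# Hypothetical array a[5..9] of 2-byte elements starting at address 1000.
for i in range(5, 10):
    assert addr_1d(1000, 2, 5, i) == addr_1d_folded(1000, 2, 5, i)

print(addr_1d(1000, 2, 5, 7))  # 1004
```

The second form is the one worth noticing: the parenthesized constant never changes, so only one multiply and one add depend on the subscript.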

This can be generalized for a two-dimensional array. Note that the first cell of each row is located in memory just after the last cell of the row above it.

[Figure: the array a[lb(1)..ub(1), lb(2)..ub(2)], with ub(1)−lb(1)+1 rows and ub(2)−lb(2)+1 columns; element a[i,j] lies i−lb(1) rows down and j−lb(2) slots across.]

So the address of a[i,j] will be given by starting from α and first skipping the rows above it, then the cells to its left.

addr(a[i,j])=α+e(ub(2)−lb(2)+1)(i−lb(1))+e(j−lb(2))
=(α−e⋅ub(2)⋅lb(1)+e⋅lb(2)⋅lb(1)−e⋅lb(1)−e⋅lb(2))+i(e⋅ub(2)−e⋅lb(2)+e)+e⋅j

Languages which create multi-dimensional arrays as shown here typically require that the bounds be constants. That means that the address formulas can be reduced at compile time to a linear computation of the subscripts (the sum of a constant and a constant multiple of each subscript).
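
The reduction to a linear form can be sketched as follows. This is only an illustration of the idea, not how any particular compiler does it; the function names and the sample array (a[2..4, 0..3] of 4-byte elements at address 2000) are assumptions made for the example.

```python
def addr_2d(alpha, e, lb1, lb2, ub2, i, j):
    """Row-major address of a[i,j]: skip whole rows, then cells within the row."""
    row_len = ub2 - lb2 + 1
    return alpha + e * row_len * (i - lb1) + e * (j - lb2)


def fold_2d(alpha, e, lb1, lb2, ub2):
    """Precompute (c0, ci, cj) so that addr(a[i,j]) == c0 + ci*i + cj*j."""
    ci = e * (ub2 - lb2 + 1)          # bytes per row
    cj = e                            # bytes per element
    c0 = alpha - ci * lb1 - cj * lb2  # everything that does not depend on i or j
    return c0, ci, cj


# Hypothetical array a[2..4, 0..3] of 4-byte elements starting at address 2000.
c0, ci, cj = fold_2d(2000, 4, 2, 0, 3)
for i in range(2, 5):
    for j in range(0, 4):
        assert addr_2d(2000, 4, 2, 0, 3, i, j) == c0 + ci * i + cj * j

print(c0, ci, cj)  # 1968 16 4
```

Note that ub(1) never appears: the number of rows affects how much storage the array needs, but not where any element is.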

In languages like C and Java, where the lower bound is zero, and the upper bound is the size less 1, we can substitute 0 for each lb and s−1 for each ub and get:

addr(a[i])=α+ei

addr(a[i,j])=α+s(2)⋅e⋅i+e⋅j
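
One way to see the zero-based formula at work is to ask Python's built-in memoryview for the strides of a two-dimensional view over a flat buffer; these strides are the multipliers of i and j. This is a sketch under assumptions: the 3 by 5 shape is arbitrary, addr_2d_zero_based is an illustrative name, and α is taken as 0 so that addresses become offsets into the buffer.

```python
import struct


def addr_2d_zero_based(alpha, e, s2, i, j):
    """Zero-based, row-major address: alpha + s2*e*i + e*j."""
    return alpha + s2 * e * i + e * j


e = struct.calcsize("i")   # size of a native C int on this machine
s1, s2 = 3, 5              # arbitrary sample shape: 3 rows, 5 columns
buf = bytearray(s1 * s2 * e)
view = memoryview(buf).cast("i", shape=[s1, s2])

# The strides reported by the view are the coefficients of i and j.
assert view.strides == (s2 * e, e)

# Store 42 at the byte offset the formula gives for [1, 2] (alpha = 0),
# then read it back through the two-dimensional view.
struct.pack_into("i", buf, addr_2d_zero_based(0, e, s2, 1, 2), 42)
assert view[1, 2] == 42
```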

For instance, suppose an array is declared joe: array[1..10] of integer; the compiler places it in memory at location 25000, and integers are four bytes. This yields the formula addr(joe[i])=25000+4(i−1)=24996+4i. If the language allowed slicing, the slice joe[3:6] would have a similar formula, except that α is moved to position 3 of the larger array and the lower bound becomes 3, giving addr(joe[i])=(25000+4⋅(3−1))+4(i−3)=24996+4i, which is boringly the same.
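
The arithmetic for joe can be checked with a few lines of Python. The function names are illustrative, and the sketch assumes the slice joe[3:6] covers subscripts 3 through 6 inclusive.

```python
ALPHA, E, LB = 25000, 4, 1   # joe: array[1..10] of integer, 4-byte integers


def addr_joe(i):
    """Address of joe[i] in the full array."""
    return ALPHA + E * (i - LB)


def addr_slice(i):
    """Address of joe[i] seen through the slice joe[3:6]."""
    slice_alpha = ALPHA + E * (3 - LB)   # address of joe[3], i.e. 25008
    return slice_alpha + E * (i - 3)     # the slice's lower bound is 3


# Both routes reach the same location, and both reduce to 24996 + 4i.
for i in range(3, 7):
    assert addr_joe(i) == addr_slice(i) == 24996 + 4 * i

print(addr_joe(1), addr_joe(10))  # 25000 25036
```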

Now, suppose we had mike: array[1..10,-1..5] of double; the compiler places it in memory at location 50000, and doubles are eight bytes. The resulting mess is

addr(mike[i,j])=50000+8(5−(−1)+1)(i−1)+8(j−(−1))=49952+56i+8j

If the language allowed the programmer to send the second row as a parameter, something like mike[2,*], it could be accessed as a normal one-dimensional array, with α equal to the location of mike[2,-1]. If it allowed sending a column, say mike[*,3], that could also be accessed with the one-dimensional formula, this time with α being the location of mike[1,3] (the first element of the column) and e being the size of a row, here 56 bytes.
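
A short Python sketch can confirm that both a row and a column of mike behave like one-dimensional arrays. The helper names are illustrative, and the column's base is taken to be its first element, mike[1,3], since the first subscript runs from 1 to 10.

```python
ALPHA, E = 50000, 8                   # mike: array[1..10, -1..5] of double
LB1, UB1, LB2, UB2 = 1, 10, -1, 5
ROW_BYTES = E * (UB2 - LB2 + 1)       # 7 doubles per row = 56 bytes


def addr_mike(i, j):
    """Full two-dimensional formula for mike[i,j]."""
    return ALPHA + ROW_BYTES * (i - LB1) + E * (j - LB2)


def addr_1d(alpha, e, lb, i):
    """One-dimensional formula: alpha + e*(i - lb)."""
    return alpha + e * (i - lb)


# The folded form from the text.
for i in range(LB1, UB1 + 1):
    for j in range(LB2, UB2 + 1):
        assert addr_mike(i, j) == 49952 + 56 * i + 8 * j

# Row mike[2,*]: base is mike[2,-1], element size is one double.
row_alpha = addr_mike(2, LB2)
for j in range(LB2, UB2 + 1):
    assert addr_1d(row_alpha, E, LB2, j) == addr_mike(2, j)

# Column mike[*,3]: base is mike[1,3], "element size" is a whole row.
col_alpha = addr_mike(LB1, 3)
for i in range(LB1, UB1 + 1):
    assert addr_1d(col_alpha, ROW_BYTES, LB1, i) == addr_mike(i, 3)

print(row_alpha, col_alpha)  # 50056 50032
```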