Array Location Arithmetic

Arrays are typically laid out contiguously in memory. The basic parameters are:

α: The starting address of the array.
e: The size of an array element.
lb, ub: The lower and upper array bounds.
a[i]: The subscript expression being evaluated.

So, for a one-dimensional array,

[Figure: the array a[lb..ub] starts at address α and holds elements of size e; the element a[i] lies i − lb slots past α.]

The address of the location a[i] is given by:

$$\mathrm{addr}(a[i]) = \alpha + e\,(i - \mathit{lb}) = (\alpha - e\,\mathit{lb}) + e\,i$$
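The two forms of the one-dimensional formula can be compared with a minimal sketch. The function names and the sample values (address 1000, 4-byte elements, bounds 5..9) are assumptions made up for illustration, not part of the text.

```python
def addr_1d(alpha, e, lb, i):
    """Address of a[i] using the direct form alpha + e*(i - lb)."""
    return alpha + e * (i - lb)


def addr_1d_folded(alpha, e, lb):
    """Precompute the constant (alpha - e*lb) and return a function of i only."""
    base = alpha - e * lb  # where a[0] would sit if the array had such an element
    return lambda i: base + e * i


# Hypothetical values chosen only for illustration.
alpha, e, lb, ub = 1000, 4, 5, 9
folded = addr_1d_folded(alpha, e, lb)
for i in range(lb, ub + 1):
    assert addr_1d(alpha, e, lb, i) == folded(i)

print(addr_1d(alpha, e, lb, 7))  # 1008
```

The folded form does one multiplication and one addition per access, because the subtraction of lb has been absorbed into the constant.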

This can be generalized for a two-dimensional array. Note that the first cell of each row is located in memory just after the last cell of the row above it.

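[Figure: the array a[lb(1)..ub(1), lb(2)..ub(2)] starts at address α and holds elements of size e. It has ub(1) − lb(1) + 1 rows and ub(2) − lb(2) + 1 columns; the element a[i,j] lies i − lb(1) rows down and j − lb(2) slots across.]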

So the address of a[i,j] is found by starting from α, first skipping the rows above it, then the cells to its left.

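$$
\begin{aligned}
\mathrm{addr}(a[i,j]) &= \alpha + e\,(\mathit{ub}_2 - \mathit{lb}_2 + 1)(i - \mathit{lb}_1) + e\,(j - \mathit{lb}_2) \\
&= (\alpha - e\,\mathit{ub}_2\,\mathit{lb}_1 + e\,\mathit{lb}_2\,\mathit{lb}_1 - e\,\mathit{lb}_1 - e\,\mathit{lb}_2) + i\,(e\,\mathit{ub}_2 - e\,\mathit{lb}_2 + e) + e\,j
\end{aligned}
$$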
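As a rough check of the row-major formula, the sketch below computes the address of every cell of a small array and confirms that visiting the cells row by row walks through memory one element at a time. The function name and the sample array (a[2..4, 0..2] of 2-byte elements at address 100) are hypothetical.

```python
def addr_2d(alpha, e, lb1, lb2, ub2, i, j):
    """Row-major address of a[i, j]: skip whole rows, then cells within the row.

    The upper bound of the first dimension is not needed to locate a cell.
    """
    row_len = ub2 - lb2 + 1  # number of columns, i.e. cells per row
    return alpha + e * row_len * (i - lb1) + e * (j - lb2)


# Hypothetical array a[2..4, 0..2] of 2-byte elements starting at address 100.
alpha, e, lb1, ub1, lb2, ub2 = 100, 2, 2, 4, 0, 2
addresses = [addr_2d(alpha, e, lb1, lb2, ub2, i, j)
             for i in range(lb1, ub1 + 1)
             for j in range(lb2, ub2 + 1)]

# Row-by-row traversal should step through memory e bytes at a time.
assert addresses == list(range(alpha, alpha + e * len(addresses), e))
```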

Languages which create multi-dimensional arrays as shown here typically require that the bounds be constants. That means that the address formulas can be reduced at compile time to a linear computation on the subscripts (the sum of a constant and a constant multiple of each subscript).
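The compile-time reduction can be sketched as a function that turns constant bounds into three coefficients. This only illustrates the idea and is not how any particular compiler does it; the name fold_2d and the sample declaration are assumptions.

```python
def fold_2d(alpha, e, lb1, lb2, ub2):
    """Return (c0, ci, cj) such that addr(a[i, j]) == c0 + ci*i + cj*j."""
    ci = e * (ub2 - lb2 + 1)          # bytes per row
    cj = e                            # bytes per element
    c0 = alpha - ci * lb1 - cj * lb2  # everything that does not depend on i or j
    return c0, ci, cj


# Hypothetical declaration: a[1..3, 1..4] of 4-byte elements at address 2000.
c0, ci, cj = fold_2d(2000, 4, 1, 1, 4)
print(c0, ci, cj)  # 1980 16 4

# The first element lands on alpha, as it should.
assert c0 + ci * 1 + cj * 1 == 2000
```

At run time each access then costs two multiplications and two additions, whatever the bounds were.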

In languages like C and Java, where the lower bound is zero and the upper bound is the size less 1, we can substitute 0 for each lb and s − 1 for each ub, where s is the size of that dimension, and get:

$$
\begin{aligned}
\mathrm{addr}(a[i]) &= \alpha + e\,i \\
\mathrm{addr}(a[i,j]) &= \alpha + s_2\,e\,i + e\,j
\end{aligned}
$$
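The zero-based formula is what one writes by hand when storing a two-dimensional array in a single flat block, much like a C `int a[3][4]`. The sketch below uses a Python list as the block, with α = 0 and e = 1 so that addresses become list indices. The names ROWS, COLS and flat_index are assumptions for illustration.

```python
ROWS, COLS = 3, 4           # the sizes of the two dimensions
flat = [0] * (ROWS * COLS)  # one contiguous block of ROWS * COLS cells


def flat_index(i, j):
    """Zero-based row-major index: the address formula with alpha = 0, e = 1."""
    return COLS * i + j


# Store 10*i + j in cell (i, j) so the contents reveal the layout.
for i in range(ROWS):
    for j in range(COLS):
        flat[flat_index(i, j)] = 10 * i + j

print(flat[flat_index(2, 1)])  # 21
```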

For instance, suppose an array is declared joe: array[1..10] of integer;, the compiler places it in memory at location 25000, and integers are four bytes. This yields the formula $\mathrm{addr}(\mathit{joe}[i]) = 25000 + 4(i - 1) = 24996 + 4i$. If the language allowed slicing, the slice joe[3:6] would have a similar formula, except that α moves to position 3 of the larger array, giving $\mathrm{addr}(\mathit{joe}[i]) = (25000 + 4(3 - 1)) + 4(i - 3) = 24996 + 4i$, which is boringly similar.
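The joe example can be checked numerically. The sketch assumes the slice joe[3:6] keeps the original subscripts 3 through 6 inclusive, as the slice formula in the text suggests; the function names are hypothetical. The folded constant is 25000 − 4 = 24996.

```python
ALPHA, E, LB = 25000, 4, 1  # joe: array[1..10] of integer, 4-byte integers


def addr_joe(i):
    """Address of joe[i] in the full array."""
    return ALPHA + E * (i - LB)


# The slice starts at the address of joe[3] and has lower bound 3.
slice_alpha, slice_lb = addr_joe(3), 3


def addr_slice(i):
    """Address of element i of the slice, using the slice's own alpha and lb."""
    return slice_alpha + E * (i - slice_lb)


# Both formulas fold to the same linear expression, 24996 + 4*i.
for i in range(3, 7):
    assert addr_joe(i) == addr_slice(i) == 24996 + 4 * i

print(addr_joe(1), addr_joe(10))  # 25000 25036
```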

Now, suppose we had mike: array[1..10,-1..5] of double;, the compiler places it in memory at location 50000, and doubles are eight bytes. The resulting mess is

$$\mathrm{addr}(\mathit{mike}[i,j]) = 50000 + 8\,(5 - (-1) + 1)(i - 1) + 8\,(j - (-1)) = 49952 + 56i + 8j$$

If the language allowed the programmer to send the second row as a parameter, something like mike[2,*], it could be accessed as a normal one-dimensional array, with α equal to the location of mike[2,-1]. If it allowed sending a column, say mike[*,3], that could also be accessed with the one-dimensional formula, this time with α being the location of mike[1,3] and e being the size of a row (56 bytes here).
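The row and column views of mike can be sketched by reusing the one-dimensional formula with a different α and element size. The function names are assumptions for illustration. The column view takes the address of mike[1,3] as its base, since 1 is the lower bound of the first subscript.

```python
ALPHA, E = 50000, 8              # mike: array[1..10, -1..5] of double
LB1, LB2, UB2 = 1, -1, 5
ROW_BYTES = E * (UB2 - LB2 + 1)  # 7 doubles per row = 56 bytes


def addr_mike(i, j):
    """Two-dimensional row-major address of mike[i, j]."""
    return ALPHA + ROW_BYTES * (i - LB1) + E * (j - LB2)


def addr_1d(alpha, e, lb, k):
    """Generic one-dimensional address formula."""
    return alpha + e * (k - lb)


# Row mike[2,*]: alpha is the address of mike[2,-1], element size is one double.
row_alpha = addr_mike(2, -1)
for j in range(-1, 6):
    assert addr_1d(row_alpha, E, LB2, j) == addr_mike(2, j)

# Column mike[*,3]: alpha is the address of mike[1,3], "element size" is a whole row.
col_alpha = addr_mike(1, 3)
for i in range(1, 11):
    assert addr_1d(col_alpha, ROW_BYTES, LB1, i) == addr_mike(i, 3)

print(row_alpha, col_alpha)  # 50056 50032
```

A column is therefore an ordinary one-dimensional array whose elements happen to be a row apart, which is the idea behind strided array views.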