High Level Languages:

Indexed addressing and arrays

1 Reference values and Indirection

Section 6.3 in the textbook has the title "Indexed Addressing and Arrays", but it is actually about a broader topic: implementing indirection. At the assembly language level, indirection occurs when you retrieve from one memory location the address of another location whose value is the thing you really want to get or set. At a higher language level, indirection is a fundamental programming technique. With indirect reference, the same statement in a subroutine can be made to apply its effects to different data items at different moments.

In Java (which is our example high-level language) the situation is simply stated. There are two kinds of type: value types and object types. The value types are boolean, byte, char, short, int, long, float, and double; all other types including arrays and String are object types. Data of object types are called objects and data of value types are called values.

The other fact about Java (which unfortunately we have to try to ignore) is that objects, including arrays and strings, are always allocated "dynamically" (using the operator new), and never preallocated by the compilation process. In the course (and on the Pep/6) we have no interest in runtime facilities or storage managers &endash; those are for later courses. Since we want to discuss implementation of strings and arrays at the assembler level, my examples will always show arrays and strings used in such a way as to be allocatable statically or on the stack as "locals".

2 Arrays in assembly language

An array is a collection of variables, all of the same type, which you access by specifying a subscript (also called an index) which identifies one of the variables in the collection. The subscript type is a subrange of the integers, or some other set of scalar values which can be implemented as integers. There is one element in the array for every valid value in the subscript range. In many languages, Java included, the array also stores its length (the number of elements) and this can then be determined at run time (i.e. by indirect reference).

An array is implemented at the assembly language level by a block of memory in which the variables of the array are stored, contiguously, one right after the other. When the length value stored as well, it's generally stored just before the beginning of the block, so that it can be easily found.

short [] anArray = new short [N];

The above allocates an array with elements numbered from 0 to N-1, of type short. The array variable anArray is made to refer to it. If at the moment of executing this statement the value of N is 5, then the following allocation will take place in the memory. (The addresses start at 0100 only for example.) The contents of the array elements will not be initialized until explicitly done so.

Array Element  Address  Contents 
length  0100 
anArray [0]  0102 
anArray [1]  0104 
anArray [2]  0106 
anArray [3]  0108 
anArray [4]  010A 

The variable anArray itself may be anywhere else in memory: it is not the array, it's an indirect reference to the array. The value of the variable is set to h#0102. (In fact, that's the value returned by new, which is then assigned to the variable in the usual way by this statement.) The subscript value for an element in the array determines how far from the beginning of this block of memory you need to look to find the storage where that element's data is to be found.

In general, to find the location of the array element with index i in the block of memory implementing the array, you need to multiply the index i by the size of each array element in bytes, and add that product to the address of the base of the array to get the address of the element:

address of A[i] = address of base of A in memory + 
                  (i * size of each element)

3 Indexed addressing mode

The PEP/6's indexed addressing mode retrieves the value of the operand based on the values specified in base register B and index register X:
operand = Mem[B + X ]
 
The indexed addressing mode does not require a presence of an operand specifier and therefore all instructions with this mode are unary!!!
indexed addressing mode (,x)
Exampleof an instruction using indexed addressing mode:
LOADA ,x
This will load the accumulator from the memory location indicated by the sum of registers B and X.

 

4 Implementing Static Arrays in PEP/6 Assembler

On the PEP/6, an array allocation like:
Type [] anArray = new Type [numberOfElements];
can be implemented statically with the assembly language
anArray:  .addrss anArrayA
          .word d#numberOfElements
anArrayA: .block d#size
where size = numberOfElements * size of Type. Note that this can only handle Java allocations for which numberOfElements is known statically (at compile time). The .ADDRSS pseudo-operation is pretty much the same as .WORD: it allocates two bytes (one word) of storage in the program and places a 16-bit initial value there. The difference is that for .WORD, the initial value is literal (e.g. d#34) and for .ADDRSS, the initial value is symbolic (e.g. anArrayA).

Another way to initialize anArray would be at run time: this would be appropriate if for example anArray was meant to refer (indirectly!) to more than one array (at different times, of course). Then you could write

               LOADA    anArrayA,i
               STOREA   anArray,d
at the beginning of the main program or wherever else you wanted to assign the (reference) value of anArray.

For example, the statement

short [] x = new short [6];
could be statically implemented like this. (The initial value stored at x was discovered by preassembling, but it's better (more symbolic) to do it at run time as above.)
0100       x:      .addrss xAddr
0102               .word d#6
0104       xAddr:  .block d#12
0120 .....
because there are six elements requiring two bytes each.

The reason that the value in the index register is multiplied by 2 when calculating the address of the operand for some instructions but not for others is that this facilitates use of indexed addressing mode for accessing elements of arrays where the elements are word-sized. If the elements in an array are byte-sized, you will use the four byte-oriented instructions to access them (and you will not multiply the value in X), while if the elements are word-sized, you will be using word-oriented instructions to access the elements.

5 Accessing Arrays

The PEP/6's indexed addressing mode is designed to make it easy to address elements in arrays. The general idea is that you will load the address of the beginning of the array into the B register, and compute in the X register the displacement from that starting point. The value of this displacement depends on which element of the array you wish to access and on the size of each element. Recall that array indices start at 0. For arrays with one byte long elements (byte), to access the i-th element (that is, the one with index i-1) compute the index value i-1 in register X. For arrays with two byte long elements (short, and also arrays of addresses!) to access the i-th element, compute value of (i-1)*2 in the register X. The multiply is always done by ASLX.

A reference to a 3rd element of the array totals:

totals [2]
is thus rendered by the PEP/6 assembly language:
loadb totals,d          ; load base of array into base register
loadx d#4,i             ; 4 = 2(for 3rd element)* 2(for integer size)
loada ,x
(Note: In Java, as we are seeing, the 'array identifier' variable is an indirect referenceto the array. That's why we allocate it separately and initialize it to the address of the actual array. Furthermore, when the base register is loaded for the array, we use direct addressing, because we want the base register to contain the address of the array, and that's the value of the array identifier.

(But in some languages, array identifiers are sometimes constants, that is, direct references to arrays. A statically allocated array of that kind might be assembled with immediate loading of the base register, like this:

loadb totalsAdd,i       ; load base of array into base register
loadx d#4,i             ; 4 = 2(for 3rd element)* 2(for integer size)
loada ,x
where here the address of the array is 'known' as an immediate constant. Few languages present this situation for all arrays all the time (for example, any array passed by reference must be accessed indirectly) but in C++ (Warford's example language) global arrays are allocated statically in this way. (Warford's previous version of out textbook used to use Pascal as the high-level language, and there again, global static array identifiers are constant (direct) references.) Hence, in examples on pages 230 (Program 6.13) and 236 (Program 6.15) the arrays are implemented without indirection.)

Here is a simple Java program segment that calculates a total sum of all the values stored in an array. This program segment assumes that the array is initialized with values. (example_18_java)

short [] arr = new short [12];
...
static short sum;
static short ind;
...
sum = 0;
 
for (ind = 0; ind < arr. length; ind ++)
    sum = sum + arr [ind];
 
Io. decimalOutput (sum);
The above segment could be translated into Pep/6 as follows (bits missing)
          BR main                ; branch to start of program
length:   .EQUATE d#-2           ; offset of length from array address
...
arr:     .WORD   h#0000          ; array identifier variable
...
         .WORD   d#12            ; length of some array in memory
arr1Add: .BLOCK  d#48            ; the actual array
...
ind:     .WORD   d#0             ; index variable
sum:     .WORD   d#0             ; sum variable
... 
main:    ...
         LOADA   arr1Add,i
         STOREA  arr,d           ; arr now refers to array 1.
         ...
         LOADB   arr,d           ; start accessing arr
         LOADA   d#0,i           ; for (ind = 0;
         STOREA  ind,d
toploop: LOADX   length,i        ; prepare to access arr.length  
         COMPA   ,x              ; compare ind to arr.length
         BRGE    endloop         ; branch out if ind > arr. length
         LOADX   ind,d           ; prepare to access arr[ind]
         ASLX                    ; multiply index by 2 to get offset
         LOADA   ,x              ; load arr [ind]
         ADDA    sum,d           ; add to sum
         STOREA  sum,d           ; store back to sum
         LOADA   ind,d           ; ind = ind + 1
         ADDA    d#1,i
         STOREA  ind,d
         BR      toploop         ; loop again
endloop: DECO    sum,d
         ...

6 Other uses of Indirection at the Assembler level

Having seen how to implement arrays, we can go back to the idea of indirection as a general technique. Indirection occurs when you can change at run time the effective address of an operand, that is, the actual address of the operand can be different at different times. (Effective address is the address that you get after all the operand calculations take place just before instruction executes).

One way to implement indirection (the so-called "impure" way) is to store the operand specifier of an instruction which is then later executed. Warford gives and example of this (p 226) in order to change which character a CHARO instruction outputs. The self-modifying programs technique is not used in modern computer systems because (a) it prevents a body of code from being used by more than one process or user, and (b) it requires the code memory to be modifiable during user execution, which is not secure; normally, stored instructions are temporarily marked 'read only' before their execution.

Indirection is instead implemented using an indirect addressing mode. With indirect addressing, an instruction's operand specifier specifies a register or a location in memory which contains the address of the operand that the instruction will actually read or change. Using such an addressing mode, you can change the location of an instruction's operands at runtime by changing the address that is stored in the register or location being "dereferenced".

PEP/6 does not have indirect addressing as such, but the indexed addressing mode provides indirection. If you put 0 in the index register, this means that the operand is the value in memory at the address stored in the base register.

Parameter passing by reference and case statements can be implemented on the PEP/6 using variations on this technique. In addition, the implementation of indirect variables, which are required for all Java object types (including arrays and strings), uses indirection.

6.1 Assigning reference variables

The previous example showed an array being allocated and later operated on (indirectly!) In the following example, reference reassignment is used to show that array variables are indirect references. (example_19_java)
short n;
short [] arr1, arr2;
arr1 = new short [15];
...
arr2 = arr1;
arr1 [n] = 12;
arr2 [n] = arr2 [n] + 1;
Io. decimalOutput (arr1 [n]);      // outputs 13
The assembler code can reveal why the comment is true:  (example_19_assembl)
...
n:        .WORD d#0          ; index variable
arr1:     .WORD h#0000       ; array reference variable
arr2:     .WORD h#0000       ; another array reference variable
...
          .WORD d#15         ; length of the array
arrAddr:  .BLOCK d#30        ; storage for 15 short integers
...
          LOADA  arrAddr,i   ; arr1 = new short [15];
          STOREA arr1,d
...
          LOADA  arr1,d      ; arr2 = arr1; 
          STOREA add2,d      ; this is not copying the array, only the reference!
          LOADB  arr1,d      ; start accessing arr1
          LOADX  n,d         ; get index
          ASLX               ; shift because element size is 2
          LOADA  d#12,i
          STOREA ,x          ; arr1 [n] = 12;
;
          LOADB  arr2,d      ; start accessing arr2 (same index)
          ADDA   d#1,i
          STOREA ,x          ; arr2 [n] = arr2 [n] + 1;
;
          LOADB  arr1,d      ; start accessing arr1 (same index again)
          DECO   ,x          ; output arr [n].
The assignment arr2 = arr1 is a reference assignment and it has the effect that the addresses in arr1 and arr2 are identical. Therefore, the operations of assigning to 12 and adding one refer (indirectly) to the same array, and hence to the same element [n]. The final value is 13.

6.2 Subroutine Parameters: Pass by Reference

When a subroutine is called, the caller supplies an actual parameter and the called routine declares a corresponding formal parameter. The connection between an actual parameter and the corresponding formal, during a subroutine call, is parameter passing. In programming language theory we distinguish pass by reference from pass by value when passing parameters. Pass by value means that the formal parameter is assigned the value which is the actual parameter. Pass by reference means that during execution of the subroutine the formal parameter refers to the actual parameter.

In Java, in keeping with the object/value distinction, an actual parameter which is a value is passed by value and an actual parameter which is an object is passed by reference. In the following situation:\

static void doSomething (Type param)
      {....}
...
Type  arg;
...
doSomething (arg);
If Type is a value type, then the actual parameter (the value of arg) is assigned to param at the beginning of the subroutine. Further operations on param have no effect on arg at all. On the other hand, if Type is an object type, then the actual parameter (the object arg refers to) is passed by reference, and at the beginning of the subroutine param is made to refer to that same object. Method calls on param can affect that object, and any effects are visible through arg after the subroutine returns.

Here is an example of call by reference, in which an array object is passed to a method which modifies it. The situation is very similar to the previous example, except that the array object is passed to another method.(example_20.java)

static void addToAll (short [] vals, short m)
{
    m = m + 2;
    for (short n = 0; n < vals. length; ++ n)
        vals [n] = (short) (vals [n] + m);
}
 
static short [] arr = new short [5];
static short extra;
 
static void tester ()
{
    for (short n = 0; n < arr. length; ++ n)
        arr [n] = (short) (n + 1);
    extra = 8;
    addToAll (arr, extra);
    Io. decimalOut (arr [4]);
    Io. charOut ('\n');
    Io. decimalOut (extra);
}
In the above program, the call to addToAll(arr,extra) causes 10 to be added to each element of the array, which was passed by reference. Therefore the first output is 15. But the assignment to m in the subroutine has no effect on extra, which was passed by value. Therefore the second output is 8.

To implement pass by reference, the caller passes the address of the actual parameter on the stack, and then the callee puts that address into the base register and uses indexed addressing to get at the contents of the actual parameter. Note that for Java, the array variable contains the address of the actual parameter (because of indirect reference for object types) and so the procedure call convention at the assembler level is exactly the same for call by reference as for call by value. This is true only because of the way Java works; in other languages value variables can be passed by reference, and the called routine can change their value. In such languages, the address of the parameter must be calculated and placed on the stack.

Here is a translation of the above program into Pep/6 assembler. Note that the subroutine treats its arguments like variables which happen to be on the stack.
(example_20_assembl)

          LOADA   short5,i  ; arr = new short [5]
          STOREA  arr,d
          JSR     tester    ; invoke tester
          STOP              ; and then just stop
;
addToAll: ADDSP   alloc1,i  ; static void addToAll (
vals:     .EQUATE d#6       ; short [] vals,
m:        .EQUATE d#4       ; short m) {
n:        .EQUATE d#0       ; offset of short n
alloc1:   .EQUATE d#-2      ; add to SP to allocate 1 variable
deall1:   .EQUATE d#2       ; add to SP to deallocate 1 variable
          LOADA   m,s       ; m = m + 2;
          ADDA    d#2,i
          STOREA  m,s
          LOADB   vals,s    ; start accessing vals
          LOADA   d#0,i
          STOREA  n,s       ; for (n=0;
loop1:    LOADX   d#-2,i    ;    n < vals. length;
          COMPA   ,x
          BRGE    end1
          LOADX   n,s       ; vals [n] = vals [n] + m;
          ASLX
          LOADA   ,x
          ADDA    m,s
          STOREA  ,x
          LOADA   n,s       ;            ++ n)
          ADDA    d#1,i
          STOREA  n,s
          BR      loop1
end1:     ADDSP   deall1,i
          RTS               ; }
;
arr:      .WORD   h#0000    ; static short [] arr
;
          .WORD   d#5       ; new short [5]
short5:   .BLOCK  d#10
;
extra:    .WORD   d#0       ; static short extra;
;
tester:   ADDSP   alloc1,i  ; static void tester () {
          LOADB   arr,d     ; start accessing arr (indirectly!)
          LOADA   d#0,i     ; for (n=0;
          STOREA  n,s
loop2:    LOADX   d#-2,i    ;    n < vals. length;
          COMPA   ,x
          BRGE    end2
          LOADX   n,s       ; arr [n] = n + 1;
          ASLX
          LOADA   n,s
          ADDA    d#1,i
          STOREA  ,x
          LOADA   n,s       ;            ++ n)
          ADDA    d#1,i
          STOREA  n,s
          BR      loop2
end2:     LOADA   d#8,i     ; extra = 8;
          STOREA  extra,d
; call addToAll:
          LOADA   arr,d     ; push address of array (value of arr)
          STOREA  d#-2,s
          LOADA   extra,d   ; push value of extra
          STOREA  d#-4,s
          ADDSP   d#-4,i    ; adjust stack
          JSR     addToAll
          ADDSP   d#4,i     ; pop stack
          LOADB   arr,d     ; Io. decimalOut (arr [4]);
          LOADX   d#4,i
          ASLX
          DECO    ,x
          CHARO   d#13,i    ; Io. charOut ('\n');
          DECO    extra,d   ; Io. decimalOut (extra);
          ADDSP   deall1,i
          RTS               ; }
          .END