In Java (which is our example high-level language) the situation is simply stated. There are two kinds of type: value types and object types. The value types are boolean, byte, char, short, int, long, float, and double; all other types including arrays and String are object types. Data of object types are called objects and data of value types are called values.
An array is implemented at the assembly language level by a block of memory in which the variables of the array are stored, contiguously, one right after the other. When the length value stored as well, it's generally stored just before the beginning of the block, so that it can be easily found.
short [] anArray = new short [N];
The above allocates an array with elements numbered from 0 to N-1, of type short. The array variable anArray is made to refer to it. If at the moment of executing this statement the value of N is 5, then the following allocation will take place in the memory. (The addresses start at 0100 only for example.) The contents of the array elements will not be initialized until explicitly done so.
Array Element | Address | Contents |
length | 0100 | 5 |
anArray [0] | 0102 | ? |
anArray [1] | 0104 | ? |
anArray [2] | 0106 | ? |
anArray [3] | 0108 | ? |
anArray [4] | 010A | ? |
The variable anArray itself may be anywhere else in memory: it is not the array, it's an indirect reference to the array. The value of the variable is set to h#0102. (In fact, that's the value returned by new, which is then assigned to the variable in the usual way by this statement.) The subscript value for an element in the array determines how far from the beginning of this block of memory you need to look to find the storage where that element's data is to be found.
In general, to find the location of the array element with index i in the block of memory implementing the array, you need to multiply the index i by the size of each array element in bytes, and add that product to the address of the base of the array to get the address of the element:
address of A[i] = address of base of A in memory + (i * size of each element)
The indexed addressing mode does not require a presence of an operand specifier and therefore all instructions with this mode are unary!!!operand = Mem[B + X ]
This will load the accumulator from the memory location indicated by the sum of registers B and X.LOADA ,x
Type [] anArray = new Type [numberOfElements];can be implemented statically with the assembly language
where size = numberOfElements * size of Type. Note that this can only handle Java allocations for which numberOfElements is known statically (at compile time). The .ADDRSS pseudo-operation is pretty much the same as .WORD: it allocates two bytes (one word) of storage in the program and places a 16-bit initial value there. The difference is that for .WORD, the initial value is literal (e.g. d#34) and for .ADDRSS, the initial value is symbolic (e.g. anArrayA).anArray: .addrss anArrayA .word d#numberOfElements anArrayA: .block d#size
Another way to initialize anArray would be at run time: this would be appropriate if for example anArray was meant to refer (indirectly!) to more than one array (at different times, of course). Then you could write
at the beginning of the main program or wherever else you wanted to assign the (reference) value of anArray.LOADA anArrayA,i STOREA anArray,d
For example, the statement
could be statically implemented like this. (The initial value stored at x was discovered by preassembling, but it's better (more symbolic) to do it at run time as above.)short [] x = new short [6];
because there are six elements requiring two bytes each.0100 x: .addrss xAddr 0102 .word d#6 0104 xAddr: .block d#12 0120 .....
The reason that the value in the index register is multiplied by 2 when calculating the address of the operand for some instructions but not for others is that this facilitates use of indexed addressing mode for accessing elements of arrays where the elements are word-sized. If the elements in an array are byte-sized, you will use the four byte-oriented instructions to access them (and you will not multiply the value in X), while if the elements are word-sized, you will be using word-oriented instructions to access the elements.
A reference to a 3rd element of the array totals:
is thus rendered by the PEP/6 assembly language:totals [2]
(Note: In Java, as we are seeing, the 'array identifier' variable is an indirect referenceto the array. That's why we allocate it separately and initialize it to the address of the actual array. Furthermore, when the base register is loaded for the array, we use direct addressing, because we want the base register to contain the address of the array, and that's the value of the array identifier.loadb totals,d ; load base of array into base register loadx d#4,i ; 4 = 2(for 3rd element)* 2(for integer size) loada ,x
(But in some languages, array identifiers are sometimes constants, that is, direct references to arrays. A statically allocated array of that kind might be assembled with immediate loading of the base register, like this:
where here the address of the array is 'known' as an immediate constant. Few languages present this situation for all arrays all the time (for example, any array passed by reference must be accessed indirectly) but in C++ (Warford's example language) global arrays are allocated statically in this way. (Warford's previous version of out textbook used to use Pascal as the high-level language, and there again, global static array identifiers are constant (direct) references.) Hence, in examples on pages 230 (Program 6.13) and 236 (Program 6.15) the arrays are implemented without indirection.)loadb totalsAdd,i ; load base of array into base register loadx d#4,i ; 4 = 2(for 3rd element)* 2(for integer size) loada ,x
Here is a simple Java program segment that calculates a total sum of all the values stored in an array. This program segment assumes that the array is initialized with values. (example_18_java)
The above segment could be translated into Pep/6 as follows (bits missing)short [] arr = new short [12]; ... static short sum; static short ind; ... sum = 0; for (ind = 0; ind < arr. length; ind ++) sum = sum + arr [ind]; Io. decimalOutput (sum);
BR main ; branch to start of program length: .EQUATE d#-2 ; offset of length from array address ... arr: .WORD h#0000 ; array identifier variable ... .WORD d#12 ; length of some array in memory arr1Add: .BLOCK d#48 ; the actual array ... ind: .WORD d#0 ; index variable sum: .WORD d#0 ; sum variable ... main: ... LOADA arr1Add,i STOREA arr,d ; arr now refers to array 1. ... LOADB arr,d ; start accessing arr LOADA d#0,i ; for (ind = 0; STOREA ind,d toploop: LOADX length,i ; prepare to access arr.length COMPA ,x ; compare ind to arr.length BRGE endloop ; branch out if ind > arr. length LOADX ind,d ; prepare to access arr[ind] ASLX ; multiply index by 2 to get offset LOADA ,x ; load arr [ind] ADDA sum,d ; add to sum STOREA sum,d ; store back to sum LOADA ind,d ; ind = ind + 1 ADDA d#1,i STOREA ind,d BR toploop ; loop again endloop: DECO sum,d ...
One way to implement indirection (the so-called "impure" way) is to store the operand specifier of an instruction which is then later executed. Warford gives and example of this (p 226) in order to change which character a CHARO instruction outputs. The self-modifying programs technique is not used in modern computer systems because (a) it prevents a body of code from being used by more than one process or user, and (b) it requires the code memory to be modifiable during user execution, which is not secure; normally, stored instructions are temporarily marked 'read only' before their execution.
Indirection is instead implemented using an indirect addressing mode. With indirect addressing, an instruction's operand specifier specifies a register or a location in memory which contains the address of the operand that the instruction will actually read or change. Using such an addressing mode, you can change the location of an instruction's operands at runtime by changing the address that is stored in the register or location being "dereferenced".
PEP/6 does not have indirect addressing as such, but the indexed addressing mode provides indirection. If you put 0 in the index register, this means that the operand is the value in memory at the address stored in the base register.
Parameter passing by reference and case statements can be implemented on the PEP/6 using variations on this technique. In addition, the implementation of indirect variables, which are required for all Java object types (including arrays and strings), uses indirection.
The assembler code can reveal why the comment is true: (example_19_assembl)short n; short [] arr1, arr2; arr1 = new short [15]; ... arr2 = arr1; arr1 [n] = 12; arr2 [n] = arr2 [n] + 1; Io. decimalOutput (arr1 [n]); // outputs 13
The assignment arr2 = arr1 is a reference assignment and it has the effect that the addresses in arr1 and arr2 are identical. Therefore, the operations of assigning to 12 and adding one refer (indirectly) to the same array, and hence to the same element [n]. The final value is 13.... n: .WORD d#0 ; index variable arr1: .WORD h#0000 ; array reference variable arr2: .WORD h#0000 ; another array reference variable ... .WORD d#15 ; length of the array arrAddr: .BLOCK d#30 ; storage for 15 short integers ... LOADA arrAddr,i ; arr1 = new short [15]; STOREA arr1,d ... LOADA arr1,d ; arr2 = arr1; STOREA add2,d ; this is not copying the array, only the reference! LOADB arr1,d ; start accessing arr1 LOADX n,d ; get index ASLX ; shift because element size is 2 LOADA d#12,i STOREA ,x ; arr1 [n] = 12; ; LOADB arr2,d ; start accessing arr2 (same index) ADDA d#1,i STOREA ,x ; arr2 [n] = arr2 [n] + 1; ; LOADB arr1,d ; start accessing arr1 (same index again) DECO ,x ; output arr [n].
In Java, in keeping with the object/value distinction, an actual parameter which is a value is passed by value and an actual parameter which is an object is passed by reference. In the following situation:\
If Type is a value type, then the actual parameter (the value of arg) is assigned to param at the beginning of the subroutine. Further operations on param have no effect on arg at all. On the other hand, if Type is an object type, then the actual parameter (the object arg refers to) is passed by reference, and at the beginning of the subroutine param is made to refer to that same object. Method calls on param can affect that object, and any effects are visible through arg after the subroutine returns.static void doSomething (Type param) {....} ... Type arg; ... doSomething (arg);
Here is an example of call by reference, in which an array object is passed to a method which modifies it. The situation is very similar to the previous example, except that the array object is passed to another method.(example_20.java)
In the above program, the call to addToAll(arr,extra) causes 10 to be added to each element of the array, which was passed by reference. Therefore the first output is 15. But the assignment to m in the subroutine has no effect on extra, which was passed by value. Therefore the second output is 8.static void addToAll (short [] vals, short m) { m = m + 2; for (short n = 0; n < vals. length; ++ n) vals [n] = (short) (vals [n] + m); } static short [] arr = new short [5]; static short extra; static void tester () { for (short n = 0; n < arr. length; ++ n) arr [n] = (short) (n + 1); extra = 8; addToAll (arr, extra); Io. decimalOut (arr [4]); Io. charOut ('\n'); Io. decimalOut (extra); }
To implement pass by reference, the caller passes the address of the actual parameter on the stack, and then the callee puts that address into the base register and uses indexed addressing to get at the contents of the actual parameter. Note that for Java, the array variable contains the address of the actual parameter (because of indirect reference for object types) and so the procedure call convention at the assembler level is exactly the same for call by reference as for call by value. This is true only because of the way Java works; in other languages value variables can be passed by reference, and the called routine can change their value. In such languages, the address of the parameter must be calculated and placed on the stack.
Here is a translation of the above program into Pep/6 assembler. Note
that the subroutine treats its arguments like variables which happen to
be on the stack.
(example_20_assembl)
LOADA short5,i ; arr = new short [5] STOREA arr,d JSR tester ; invoke tester STOP ; and then just stop ; addToAll: ADDSP alloc1,i ; static void addToAll ( vals: .EQUATE d#6 ; short [] vals, m: .EQUATE d#4 ; short m) { n: .EQUATE d#0 ; offset of short n alloc1: .EQUATE d#-2 ; add to SP to allocate 1 variable deall1: .EQUATE d#2 ; add to SP to deallocate 1 variable LOADA m,s ; m = m + 2; ADDA d#2,i STOREA m,s LOADB vals,s ; start accessing vals LOADA d#0,i STOREA n,s ; for (n=0; loop1: LOADX d#-2,i ; n < vals. length; COMPA ,x BRGE end1 LOADX n,s ; vals [n] = vals [n] + m; ASLX LOADA ,x ADDA m,s STOREA ,x LOADA n,s ; ++ n) ADDA d#1,i STOREA n,s BR loop1 end1: ADDSP deall1,i RTS ; } ; arr: .WORD h#0000 ; static short [] arr ; .WORD d#5 ; new short [5] short5: .BLOCK d#10 ; extra: .WORD d#0 ; static short extra; ; tester: ADDSP alloc1,i ; static void tester () { LOADB arr,d ; start accessing arr (indirectly!) LOADA d#0,i ; for (n=0; STOREA n,s loop2: LOADX d#-2,i ; n < vals. length; COMPA ,x BRGE end2 LOADX n,s ; arr [n] = n + 1; ASLX LOADA n,s ADDA d#1,i STOREA ,x LOADA n,s ; ++ n) ADDA d#1,i STOREA n,s BR loop2 end2: LOADA d#8,i ; extra = 8; STOREA extra,d ; call addToAll: LOADA arr,d ; push address of array (value of arr) STOREA d#-2,s LOADA extra,d ; push value of extra STOREA d#-4,s ADDSP d#-4,i ; adjust stack JSR addToAll ADDSP d#4,i ; pop stack LOADB arr,d ; Io. decimalOut (arr [4]); LOADX d#4,i ASLX DECO ,x CHARO d#13,i ; Io. charOut ('\n'); DECO extra,d ; Io. decimalOut (extra); ADDSP deall1,i RTS ; } .END