Learn about the ATMCADD MI instruction and the PowerPC technology behind it.
As one of the shared storage synchronization MI instructions, the Atomic Add (ATMCADD) instruction and its siblings, the Atomic And (ATMCAND) and Atomic Or (ATMCOR) MI instructions, were introduced to IBM i at V5R3 to support atomic update operations on 4-byte or 8-byte storage shared between two or more threads. The concept of atomicity of storage operations is discussed in the Atomicity section in the Machine Interface Architecture Introduction.
This article introduces the details of the working mechanism of the ATMCADD instruction and its siblings.
User Code Generated for ATMCADD
The ATMCADD MI instruction operates on two signed binary values of the same length. ATMCADD atomically increments the signed binary value addressed by operand 1 (which is a space pointer) by the signed binary value specified by operand 2, and returns the original value addressed by operand 1. ATMCADD is available to user programs only as system built-in functions (SYSBIFs); in other words, it can be used only by ILE programs. Two SYSBIFs are provided for the ATMCADD MI instruction, _ATMCADD4 and _ATMCADD8, which operate on 4-byte signed binary values and 8-byte signed binary values, respectively. The ILE RPG prototypes of these SYSBIFs can be found in mih-stgsync.rpgleinc as follows:
     /**
      * @BIF _ATMCADD4 (Atomic Add (ATMCADD))
      * @return Original value of sum_addend
      */
     d atmcadd4        pr            10i 0 extproc('_ATMCADD4')
     d   sum_addend                  10i 0
     d   augend                      10i 0 value

     /**
      * @BIF _ATMCADD8 (Atomic Add (ATMCADD))
      * @return Original value of sum_addend
      */
     d atmcadd8        pr            20i 0 extproc('_ATMCADD8')
     d   sum_addend                  20i 0
     d   augend                      20i 0 value
The ILE C prototypes of the _ATMCADD4 and _ATMCADD8 SYSBIFs are the following:
# pragma linkage(_ATMCADD4, builtin)
long _ATMCADD4(long*, long);

# pragma linkage(_ATMCADD8, builtin)
long long _ATMCADD8(long long*, long long);
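For readers without access to an IBM i system, the fetch-and-add semantics described above (the stored value is incremented and the original value is returned) can be approximated in portable C11. This is only a sketch: the names atmcadd4_sketch and atmcadd8_sketch are hypothetical, and atomic_fetch_add from <stdatomic.h> merely stands in for the SYSBIFs; it is not how ILE C exposes _ATMCADD4 or _ATMCADD8.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical portable stand-in for _ATMCADD4: atomically adds augend
 * to the 4-byte value at sum_addend and returns the value that was
 * stored there before the addition. */
int32_t atmcadd4_sketch(_Atomic int32_t *sum_addend, int32_t augend)
{
    return atomic_fetch_add(sum_addend, augend);
}

/* Hypothetical portable stand-in for _ATMCADD8 (8-byte operands). */
int64_t atmcadd8_sketch(_Atomic int64_t *sum_addend, int64_t augend)
{
    return atomic_fetch_add(sum_addend, augend);
}
```

For instance, with a counter n initialized to 40, atmcadd8_sketch(&n, 2) returns 40 and leaves 42 in n.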
Now let's look at the user code generated for the ATMCADD instruction in a compiled program. Let's take the _ATMCADD8 SYSBIF, for example. Compile an ILE C program containing the following tiny ILE C procedure iadd():
# pragma linkage(_ATMCADD8, builtin)
long long _ATMCADD8(long long*, long long);

long long iadd(long long *num, long long diff) {
    _ATMCADD8(num, diff);  /* Statement 1 */
    return *num;
}
Find the RISC instructions generated for the first statement of procedure iadd() in the System Service Tools (SST) dump of the program containing iadd(). At V5R4, the user code generated for statement 1 of iadd(), which invokes _ATMCADD8, might look like the following:
RISC INSTRUCTIONS (iadd)

| Location | Object text | Source statement | Statement numbers | Description |
|---|---|---|---|---|
| 000028 | E9860030 | LD 12,0X30(6) | 1 | When statement 1 is executed, General Purpose Register (GPR) 6 (aka r6) addresses the parameter area of procedure iadd(). The format of the procedure parameter area is described in the documentation of the _NPMPARMLISTADDR SYSBIF in the |
| | E1060026 | LQ 8,0X20(6),6 | | The Load Quadword (LQ) instruction tries to load the 16-byte space pointer num from offset hex 20 from the start of the procedure parameter area into r8 and r9. |
| 000030 | | SELRI 3,9,0,41 | | If the previous LQ instruction loads the space pointer successfully, the SELRI instruction copies r9 (the 8-byte address portion of space pointer num) to r3, which acts as operand 1 of _ATMCADD8. Otherwise, the immediate value hex 0000000000000000 is placed in r3. |
| 000034 | 61840000 | ORI 4,12,0 | | Copy r12 (the value of BIN(8) diff) to r4, which acts as operand 2 of _ATMCADD8. |
| 000038 | 4B801DC3 | BLA 0X3801DC0 | | Invoke the implementing Licensed Internal Code (LIC) routine of the _ATMCADD8 SYSBIF. [2] |
| | | ORI 10,3,0 | | Copy the return value of _ATMCADD8 (stored in r3) to r10. |
Notes
[1] Note that _NPMPARMLISTADDR, instead of what appears in the documentation of the SYSBIF, is the correct name of the NPM Procedure Parameter List Address (NPM_PARMLIST_ADDR) SYSBIF. This is important if you want your programs that invoke the SYSBIF to get compiled.
[2] A BLA-form Branch (b) instruction branches to the absolute address specified by the target address operand and places the next instruction address (NIA) into the link register, so that the called routine can return to the calling code via a Branch Conditional to Link Register (bclr) instruction. The BLA 0X3801DC0 instruction branches to the address (FFFFFFFFFF 801DC0) of the implementing LIC routine of _ATMCADD8 (in LIC module #cfgrbla), which stores the return value in r3 on return.
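To see where the address FFFFFFFFFF 801DC0 in note [2] comes from, the object text 4B801DC3 can be decoded by hand. The following sketch assumes the standard I-form layout of the PowerPC branch instruction (6-bit primary opcode, 24-bit LI field, AA bit, LK bit); the function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Primary opcode: the top 6 bits of the instruction word (18 for b/ba/bl/bla). */
unsigned insn_opcode(uint32_t insn) { return insn >> 26; }

/* AA bit: 1 means the target is an absolute address. */
unsigned insn_aa(uint32_t insn) { return (insn >> 1) & 1u; }

/* LK bit: 1 means the next instruction address is saved in the link register. */
unsigned insn_lk(uint32_t insn) { return insn & 1u; }

/* Target of an absolute (AA=1) I-form branch: the 24-bit LI field is
 * concatenated with two zero bits and sign-extended to 64 bits. */
uint64_t bla_target(uint32_t insn)
{
    uint64_t li = insn & 0x03FFFFFCu;      /* LI || 0b00, 26 bits wide */
    if (li & 0x02000000u)                  /* sign bit of the 26-bit value */
        li |= 0xFFFFFFFFFC000000ull;       /* sign-extend to 64 bits */
    return li;
}
```

Applied to 0x4B801DC3, the helpers yield opcode 18 with AA=1 and LK=1, and bla_target() yields hex FFFFFFFFFF801DC0, which matches the address quoted in the note.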
The PowerPC Load and Reserve Instructions and Store Conditional Instructions
Before we turn to the code of the implementing LIC routines of the ATMCADD MI instruction, let's briefly review the load and reserve instructions and the store conditional instructions. Section
The concept behind the use of the lwarx, ldarx, stwcx., and stdcx. instructions is that a processor may load a semaphore from memory, compute a result based on the value of the semaphore and conditionally store it back to the same location. If the store was successful, the sequence of the instructions from the read of the semaphore to the store that updated semaphore appear to have been executed atomically (that is, no other processor or mechanism modified the semaphore location between the read and the update), thus providing the equivalent of a real atomic operation.
The lwarx instruction must be paired with a stwcx. instruction, and the ldarx instruction with a stdcx. instruction, with the same effective address (EA) specified by both instructions of the pair. The only exception is that an unpaired stwcx. or stdcx. instruction on any (scratch) EA can be used to clear any reservation held by the processor. The conditional store is performed based upon the existence of a reservation established by the preceding lwarx or ldarx instruction. Note that at most one reservation exists at a time on any processor. If the reservation exists when the store is executed, the store is performed and the EQ bit of CR field 0 (CR0) is set. If the reservation does not exist when the store is executed, the target memory location is not modified and the EQ bit of CR0 is cleared.
The reservation held by the processor is cleared if any of the following events occurs:
- The processor holding the reservation executes another load and reserve instruction; this clears the first reservation and establishes a new one.
- The processor holding the reservation executes a store conditional instruction to any address.
- Another processor executes any store instruction to the address associated with the reservation.
- Any mechanism, other than the processor holding the reservation, stores to the address associated with the reservation.
Therefore, a sequence of the instructions from a load and reserve instruction to a paired successful store conditional instruction is equivalent to a real atomic storage operation (at the address associated with the reservation established by the load and reserve instruction).
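The reservation rules listed above can be made concrete with a toy, single-threaded model. This is a sketch under simplifying assumptions, not a description of real hardware: the type reservation_t and all three functions are hypothetical, a single reservation_t stands for the one reservation a processor can hold, and the model does not check that the store conditional uses the same address as the load and reserve.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy reservation state of one processor (illustrative only). */
typedef struct {
    bool     valid;   /* does the processor hold a reservation? */
    int64_t *addr;    /* address associated with the reservation */
} reservation_t;

/* Models ldarx: load the value and establish a reservation,
 * replacing any reservation that was held before. */
int64_t load_reserve(reservation_t *r, int64_t *ea)
{
    r->valid = true;
    r->addr  = ea;
    return *ea;
}

/* Models stdcx.: the store is performed only if a reservation exists.
 * The reservation is cleared either way. The return value plays the
 * role of the EQ bit of CR0. */
bool store_conditional(reservation_t *r, int64_t *ea, int64_t value)
{
    bool performed = r->valid;
    if (performed)
        *ea = value;
    r->valid = false;
    return performed;
}

/* Models a store by another processor or mechanism: it clears the
 * reservation when it hits the reserved address. */
void foreign_store(reservation_t *r, int64_t *ea, int64_t value)
{
    *ea = value;
    if (r->valid && r->addr == ea)
        r->valid = false;
}
```

In this model, a load_reserve() followed directly by store_conditional() succeeds, while an intervening foreign_store() to the same address makes the store conditional fail and leaves the foreign value in place.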
Also note that the lwarx/stwcx. and ldarx/stdcx. instructions require the EA to be aligned to 4-byte and 8-byte boundaries, respectively. Like the majority of Load/Store Indexed PowerPC instructions (e.g., stdx rS,rA,rB), the EA for all four of these instructions is the sum of the content of rA (or zero if rA=0) and the content of rB, aka (rA|0)+(rB). After reading the next section, you will see that this is the reason that the ATMCADD MI instruction requires its first operand to be aligned based on the length of its operands.
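The EA computation and the alignment requirement can be written down in a few lines of C. The sketch below is illustrative only; the register file is modeled as a plain array and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* EA = (rA|0) + (rB): register number 0 in the rA position means the
 * literal value zero rather than the content of GPR 0. */
uint64_t effective_address(const uint64_t gpr[32], unsigned ra, unsigned rb)
{
    uint64_t base = (ra == 0) ? 0 : gpr[ra];
    return base + gpr[rb];
}

/* True if ea is a multiple of width: use width 4 for lwarx/stwcx.
 * and width 8 for ldarx/stdcx. */
bool ea_is_aligned(uint64_t ea, unsigned width)
{
    return (ea % width) == 0;
}
```

For an instruction such as ldarx 5,0,3, the rA field is 0, so the EA is simply the content of r3, and that address must satisfy ea_is_aligned(ea, 8).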
Analyze the Implementing LIC Routines of ATMCADD
At V5R4, the implementing LIC routines of _ATMCADD4 and _ATMCADD8 are at addresses FFFFFFFFFF 801DA0 and FFFFFFFFFF 801DC0, respectively. The LIC module containing these routines is called #cfgrbla. At V5R4, the disassembled PowerPC instructions of the implementing LIC routine of _ATMCADD8 are like so:
Implementing LIC Routine of _ATMCADD8 (FFFFFFFFFF 801DC0)

| Location | Object Text | Source Statement | Description |
|---|---|---|---|
| 1DC0 | 7CA | ldarx 5,0,3 | The effective address (EA) is the content of r3, i.e., the address of the first operand of _ATMCADD8 passed by user code. The ldarx instruction loads operand 1 of _ATMCADD8 into r5 and establishes a reservation for use by a Store Doubleword Conditional Indexed (stdcx.) instruction. The address of operand 1 of _ATMCADD8 is associated with the reservation. |
| 1DC4 | | add 0,4,5 | The sum of the value of operand 1 (in r5) and operand 2 (passed by user code via r4) of _ATMCADD8 is placed into r0. |
| 1DC8 | | stdcx. 0,0,3 | Store the sum to the address of operand 1 of _ATMCADD8 if the reservation established by the previous ldarx instruction at that address has not been cleared, for example by another processor executing a store instruction to the same address. Whether or not the conditional store is performed, any reservation held by the processor is cleared. The EQ bit of CR0 (CR[2]) reflects whether the store was performed: CR[2] is set if the store is performed and cleared if it is not. |
| 1DCC | | bc 6,2,-0xc | If the previous conditional store isn't performed, the Branch Conditional (bc) instruction branches back to the ldarx 5,0,3 instruction to retry the sequence of instructions from ldarx 5,0,3 to stdcx. 0,0,3. |
| 1DD0 | 38650000 | addi 3,5,0 | Upon a successful store, the original value of operand 1 of _ATMCADD8 is copied to r3 as the return value of _ATMCADD8. |
| 1DD4 | 4E800020 | bclr 20,0,0 | The BO field with the value 20 means branch always. The Branch Conditional to Link Register (bclr) instruction (bclr 20,0,0) returns to the user code that invoked the implementing LIC routine of _ATMCADD8 via a bla instruction. |
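The shape of this routine (load, compute, conditionally store, retry on failure, return the original value) can be mirrored in portable C11 with a compare-and-exchange loop. This is an analogy rather than a translation: a compare-and-exchange succeeds when the value is unchanged, whereas a reservation is lost on any store to the address. The function name atomic_add_retry is hypothetical.

```c
#include <stdatomic.h>

/* Retry loop in the spirit of the ldarx/add/stdcx./bc sequence. */
long long atomic_add_retry(_Atomic long long *target, long long augend)
{
    /* Counterpart of ldarx: read the current value. */
    long long original = atomic_load(target);

    /* Counterpart of add + stdcx. + bc: try to publish original + augend.
     * On failure, atomic_compare_exchange_weak refreshes `original` with
     * the value currently stored, and the loop tries again. */
    while (!atomic_compare_exchange_weak(target, &original,
                                         original + augend)) {
        /* nothing to do here; the loop condition retries */
    }

    /* Counterpart of addi 3,5,0: hand back the original value. */
    return original;
}
```

As with _ATMCADD8, the caller receives the value that was stored before the addition; a counter holding 5 yields 5 from atomic_add_retry(&c, 3) and holds 8 afterwards.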
Storage Synchronization Related Consideration When Using the ATMCADD MI Instruction
As documented by the MI documentation in the
Within a thread, to enforce the ordering of update operations made by ATMCADD, ATMCAND, and ATMCOR to multiple shared storage locations, you can separate two update operations with a Synchronize Shared Storage Accesses (SYNCSTG) MI instruction, which guarantees that the first update operation completes no later than the second update operation starts. For example, consider the following scenario:
- An array of counters, expected to be accessed by any thread or MI process within the system, is stored in a user space (*USRSPC) object called CTRARA.
- An ILE RPG program called ATMC03 accepts an array of counter indices (ctrinx), atomically increments each counter specified by ctrinx, and returns an array of the incremented counter values.
In this scenario, the ATMCADD MI instruction can be used to make sure each counter is incremented atomically, and the SYNCSTG MI instruction can be used to enforce the ordering of the update operations to multiple counters. The following is the example source code (atmc03.rpgle) of ILE RPG program ATMC03:
     /**
      * @file atmc03.rpgle
      *
      * Atomically increment each of one or more counters shared by
      * multiple threads.
      * @pre Create a hex 1934 space named CTRARA:
      *      CALL PGM(QUSCRTUS) +
      *        PARM('CTRARA *CURLIB' 'USRARA' X'
      *        X'00' *USE 'Counter Space')
      */

     h dftactgrp(*no)

     d atmc03          pr                  extpgm('ATMC03')
     d   ctrinx                       5u 0 dim(16) options(*varsize)
     d   numinx                       5u 0
     d   incctr                      20i 0 dim(16) options(*varsize)

     /**
      * @BIF _SYNCSTG (Synchronize Shared Storage Accesses (SYNCSTG))
      */
     d syncstg         pr                  extproc('_SYNCSTG')
     d   action                      10u 0 value
     /**
      * @BIF _ATMCADD8 (Atomic Add (ATMCADD))
      */
     d atmcadd8        pr            20i 0 extproc('_ATMCADD8')
     d   sum_addend                  20i 0
     d   augend                      20i 0 value
      *
      * RSLVSP instruction template (34 bytes): 2-byte object
      * type/subtype, 30-byte object name, 2-byte required authorization
     d rslvsp_tmpl     ds                  qualified
     d   obj_type                     2a
     d   obj_name                    30a
      * required authorization
     d   auth                         2a   inz(x'0000')
     /**
      * @BIF _RSLVSP2 (Resolve System Pointer (RSLVSP))
      */
     d rslvsp2         pr                  extproc('_RSLVSP2')
     d   obj                           *   procptr
     d   opt                         34a
     /**
      * @BIF _SETSPPFP (Set Space Pointer from Pointer (SETSPPFP))
      */
     d setsppfp        pr              *   extproc('_SETSPPFP')
     d   src_ptr                       *   value procptr

      * System pointer to hex 1934 space object CTRARA
     d spc@            s               *   procptr
     d counter         s             20i 0 based(spp@)
     d                                     dim(16)
     d one             s             20i 0 inz(1)
     d n               s              5u 0

     d atmc03          pi
     d   ctrinx                       5u 0 dim(16) options(*varsize)
     d   numinx                       5u 0
     d   incctr                      20i 0 dim(16) options(*varsize)

      /free
          // Resolve a SYSPTR *USRSPC *LIBL/CTRARA
          rslvsp_tmpl.obj_type = x'1934';
          rslvsp_tmpl.obj_name = 'CTRARA';
          rslvsp2(spc@ : rslvsp_tmpl);
          spp@ = setsppfp(spc@);

          for n = 1 to numinx;
              // Increase @var counter by 1 atomically
              atmcadd8(counter(ctrinx(n)) : one);

              // Enforce ordering of shared storage operations
              syncstg(0);

              // Display the increased value of @var counter
              dsply ctrinx(n) '' counter(ctrinx(n));
          endfor;

          *inlr = *on;
      /end-free
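The same pattern, an atomic increment followed by a storage synchronization step before the next counter is touched, can be sketched in portable C11. Everything here is an assumption made for illustration: the static array counters stands in for the CTRARA user space, increment_counters is a hypothetical name, the relaxed fetch-and-add plays the role of ATMCADD, and atomic_thread_fence(memory_order_seq_cst) plays the role of SYNCSTG. The sketch does not claim that these C11 primitives have exactly the semantics of the MI instructions.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NUM_COUNTERS 16

/* Hypothetical in-process stand-in for the CTRARA user space. */
static _Atomic long long counters[NUM_COUNTERS];

/* Increments each counter selected by ctrinx (1-based, as in the RPG
 * program) and stores the incremented values in incctr. */
void increment_counters(const unsigned short *ctrinx, size_t numinx,
                        long long *incctr)
{
    for (size_t n = 0; n < numinx; n++) {
        _Atomic long long *ctr = &counters[ctrinx[n] - 1];

        /* Atomic update of one counter; relaxed ordering means the
         * update is atomic but imposes no ordering by itself. */
        long long original =
            atomic_fetch_add_explicit(ctr, 1, memory_order_relaxed);

        /* Full fence before the next counter is updated. */
        atomic_thread_fence(memory_order_seq_cst);

        incctr[n] = original + 1;
    }
}
```

Unlike the RPG listing, which displays each value with dsply, this sketch writes the incremented values to incctr, deriving each one from the value returned by the atomic add instead of re-reading the counter.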