Anyone have any ideas? I’m surprised that the compiler emits a “useless” instruction at -O3.
At the end of the day the black box is literally doing:
asm!("" : : "r"(&mut dummy) : "memory" : "volatile");
so 'mov rcx, rsp' is simply storing the address of dummy (which at this point lives at the top of the stack) into rcx (the register chosen to satisfy the 'r' constraint) in order to pass it to the inline asm. The address doesn't change and rcx is not clobbered, so the mov is loop invariant and can be hoisted out of the loop. But the mov cannot be removed entirely, because the compiler doesn't look inside the asm, and as far as the compiler knows, rcx might be used by the asm body.
The same reasoning applies to the store. As far as the compiler is concerned, the address of dummy has escaped, and the "memory" clobber means the asm may read any memory it can reach, so stores into dummy cannot be optimized away either. Hence 'mov dword ptr [rsp], 3' appears in the final generated code.
If the compiler knew that the address was not used, then it would also be able to remove the store to dummy, as it would be able to prove that it is never read from.
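For anyone who wants to reproduce this with the current inline asm syntax, here is a rough sketch of an equivalent black box. It is not the library's actual implementation: the names opaque_i32 and run_loop are made up for illustration, the type is fixed to i32 to match the dword store above, and it assumes a target where asm! is stable (x86_64, aarch64, etc.). In the new syntax an asm block is already treated as having side effects and as possibly touching memory unless you opt out with pure or nomem, so nothing extra is needed to mirror "memory" and "volatile".

```rust
use std::arch::asm;

/// Hypothetical stand-in for the black box, written with the current `asm!` syntax.
/// The asm body is only a comment, but the compiler cannot see that: it must
/// assume the body reads the register operand and the memory it points to.
fn opaque_i32(mut dummy: i32) -> i32 {
    unsafe {
        asm!(
            // The operand is mentioned in an assembler comment so that it
            // counts as used by the template; no instruction is emitted.
            "/* {0} */",
            // Pass the address of `dummy` in a general purpose register,
            // like the "r" constraint in the old syntax.
            in(reg) &mut dummy as *mut i32,
            // No `nomem` or `pure` here, so the asm is assumed to read and
            // write memory and to have side effects.
            options(nostack, preserves_flags),
        );
    }
    dummy
}

/// Hypothetical loop similar to the one discussed: each iteration has to
/// materialize the value 3 in memory before the asm block runs.
fn run_loop(iterations: u32) -> i32 {
    let mut total = 0;
    for _ in 0..iterations {
        total += opaque_i32(3);
    }
    total
}
```

Looking at the optimized assembly for run_loop should show something similar to what is described above: a store of 3 to the stack slot and a register holding that slot's address, though the exact registers and the amount of hoisting will depend on the compiler version.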