Instruction-Level Instrumentation
Introduction
This section describes general considerations that apply when adding instrumentation instructions to existing code. We aim to cover both x86_64 and aarch64.
Instruction Insertion
Egalito allows instructions to be added at any point within existing basic
blocks. Instructions are usually added with the ChunkMutator class. Note
that when inserting instructions before an existing instruction, an important
consideration is whether any incoming jumps to that existing instruction should
run the instrumentation or not. For example, instrumentation intended to be
executed once upon function entry (like stack XOR) should not be targeted by
jumps like a loop back to the first instruction; for this case, use
insertBefore. Conversely, any instrumentation tied to individual
instructions should use insertBeforeJumpTo so that incoming jumps will not
skip the instrumentation.
Be warned that insertBeforeJumpTo works by swapping the
InstructionSemantics inside individual Instructions. This is so that any
incoming jump references will continue to refer to the first instruction in the
block. This means that any Instruction * pointer to the insert point will
actually point at the newly inserted instruction (now the first in the block)
after a call to insertBeforeJumpTo. It may be better to use insertion
functions that take an array of instructions if multiple insertions are needed,
to avoid this situation.
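The pointer behaviour described above can be illustrated with a small self-contained C++ model. This is only a sketch: ToySemantic, ToyInstruction, ToyBlock and toyInsertBeforeJumpTo are hypothetical stand-ins invented for this example, not Egalito's classes or the real ChunkMutator implementation.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for an instruction and its semantic payload.
struct ToySemantic {
    std::string text;  // e.g. "cmp %rax, %rbx"
};

struct ToyInstruction {
    std::unique_ptr<ToySemantic> semantic;
};

struct ToyBlock {
    std::vector<std::unique_ptr<ToyInstruction>> instructions;
};

// Make `added` execute before `point` while keeping the `point` object first
// in program order, so jumps that reference `point` also run the new code.
// Returns the object that now carries the original semantic, or nullptr if
// `point` is not in the block.
ToyInstruction *toyInsertBeforeJumpTo(ToyBlock &block, ToyInstruction *point,
                                      std::unique_ptr<ToyInstruction> added) {
    for (std::size_t i = 0; i < block.instructions.size(); i++) {
        if (block.instructions[i].get() != point) continue;
        // Swap payloads: `point` now holds the new semantic and `added`
        // holds the original one.
        std::swap(point->semantic, added->semantic);
        ToyInstruction *original = added.get();
        block.instructions.insert(
            block.instructions.begin() + static_cast<std::ptrdiff_t>(i) + 1,
            std::move(added));
        return original;
    }
    return nullptr;
}
```

In this model, a caller that kept `point` and wants to keep working with the original instruction has to switch to the returned pointer, which mirrors the caveat about Instruction * pointers.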
Egalito also provides other functions in ChunkMutator like insertAfter,
append, etc. Basic blocks are not split automatically based on user
modifications. For this, see the SplitBasicBlocks pass. Also, if enough
instructions are inserted and short jump instructions with a 1-byte
displacement are used on x86_64, the jumps may no longer be able to reach
the target blocks. Run the PromoteJumpsPass to deal with this.
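The reachability condition for a short jump can be sketched as follows. fitsInRel8 is a hypothetical helper written for this example and is not part of Egalito; it relies only on the fact that x86_64 relative jumps encode a signed displacement measured from the end of the jump instruction.

```cpp
#include <cstdint>

// Hypothetical helper: does the displacement from the end of a jump
// instruction to its target fit in a signed 8-bit field?
bool fitsInRel8(std::uint64_t jumpEndAddress, std::uint64_t targetAddress) {
    // Unsigned subtraction wraps, and the cast reinterprets the result as a
    // signed distance (negative for backward jumps).
    std::int64_t displacement =
        static_cast<std::int64_t>(targetAddress - jumpEndAddress);
    return displacement >= INT8_MIN && displacement <= INT8_MAX;
}
```

A jump for which this returns false after instrumentation has been inserted needs the longer 4-byte displacement form.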
Simple Call Instrumentation
The architecture calling convention specifies what happens to each register at
function call boundaries. Callee-saved registers must be preserved across the
function call, and the calling function may depend on this. But caller-saved
registers are allowed to be overwritten by the called function. If a
caller-saved register is not used as a parameter, it is essentially safe to
clobber the value of the register and use it for instrumentation code. On
x86_64, we frequently use %r10 and %r11 for this purpose. For example,
we could place the following code before a call or at the beginning of a
function to implement stack XOR, and not worry about saving and restoring
%r11:
mov %fs:0x28, %r11 ; load the per-thread canary value
xor %r11, (%rsp) ; XOR it into the value at the top of the stack
Unfortunately, saving and restoring registers on x86_64 is complicated by leaf functions. If a function does not call any other functions, the compiler may opt to not create a stack frame and instead use negative offsets from the stack pointer, accessing the 128 bytes below the stack pointer that are guaranteed by the System V ABI to always be available. This is called the red zone, and Egalito has an analysis to determine if a given function uses it.
When a function is using the red zone, an inserted push instruction will overwrite existing data. One alternative is to use a thread-local storage location to spill an existing register. Another alternative is to use data-flow analysis to see if any registers are unused. On aarch64, we implemented a register reservation pass which can reserve a register for use by instrumentation code throughout a function. If necessary, the pass will expand the stack frame and spill other uses of registers. This code could be ported to x86_64, but TLS accesses work well on that platform.
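The choice between a stack spill and a TLS spill can be sketched as a small generator of AT&T-syntax lines. This is a minimal illustration under assumptions: wrapWithSpill is a hypothetical helper, and tlsSlot is a hypothetical %fs-relative offset that is assumed to be reserved for the instrumentation; Egalito's own passes may be structured differently.

```cpp
#include <string>
#include <vector>

// Wrap `body` with a save and restore of `reg` (for example "%r11").
// When the function uses the red zone, a push would overwrite live data,
// so the register is spilled to the (hypothetical) TLS slot instead.
// Assumes `body` itself does not touch the stack or the TLS slot.
std::vector<std::string> wrapWithSpill(bool usesRedZone,
                                       const std::string &reg,
                                       const std::string &tlsSlot,
                                       const std::vector<std::string> &body) {
    std::vector<std::string> out;
    if (usesRedZone) {
        out.push_back("mov " + reg + ", %fs:" + tlsSlot);
    } else {
        out.push_back("push " + reg);
    }
    out.insert(out.end(), body.begin(), body.end());
    if (usesRedZone) {
        out.push_back("mov %fs:" + tlsSlot + ", " + reg);
    } else {
        out.push_back("pop " + reg);
    }
    return out;
}
```

The red zone flag would come from an analysis such as the one mentioned above; here it is simply passed in by the caller.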
Besides the standard x86 callee-saved registers, the XMM registers like
%xmm0 are also supposed to be preserved across call boundaries. This is not
a problem if the instrumentation code does not use XMM registers, but XMM
registers cannot be stored on the stack with aligned moves (such as movaps)
unless it is 16-byte aligned; a segfault will occur otherwise. Some functions
in libc such as memcpy implementations do put XMM registers on the stack.
This boils down to needing to preserve the 16-byte stack alignment that the
compiler would have ensured at call boundaries: push an even number of 8-byte
quadwords.
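The "even number of quadwords" rule amounts to a one-line calculation. alignmentPadding is a hypothetical helper for illustration only.

```cpp
#include <cstddef>

// Extra bytes to subtract from %rsp after pushing `pushedQuadwords` 8-byte
// values, so that the total stack adjustment is a multiple of 16 and the
// alignment the compiler established is left unchanged.
std::size_t alignmentPadding(std::size_t pushedQuadwords) {
    return (pushedQuadwords % 2 == 0) ? 0 : 8;
}
```

For example, instrumentation that pushes three registers before a call would need 8 more bytes of padding (or a fourth push) to keep the stack aligned.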
See the InstrumentCallsPass for an example of how to handle some of these
issues, and InstrumentInstructionPass for a more sophisticated version.
Instrumentation at Jumps
x86_64 uses jump instructions in many situations:
- to target basic blocks within a function (these are often jumps with a 1-byte displacement, which may need to be promoted to a 4-byte displacement with PromoteJumpsPass);
- for tail recursion, i.e. calling another function and having it inherit the current stack frame;
- for jump tables, using an indirect jump; and
- even for indirect tail recursion, though this appears only in hand-coded assembly e.g. libc low-level lock functions.
Egalito classifies each jump as internal (within a function) or external (tail recursion), and additionally identifies all jump table invocations. This is important because many types of instrumentation do not wish to operate on all kinds of jumps.
For internal jumps and jump table invocations, the architecture calling convention does not help identify any usable registers. Thus, it is usually necessary to spill an existing register to the stack (unless analysis finds an unused register). This is subject to the same red zone caveats mentioned above.
Indirect Calls/Jumps
For indirect calls and jumps, it may be desirable to perform some checks on the final target address. This is relatively easy on RISC architectures, but on x86_64, it is not so uncommon to see control flow instructions with complex memory operands such as:
jmpq *(%rax,%rbx,8) ; typical PIC jump table
callq *0x40(%rax) ; typical C++ vtable
We provide passes that transform such instructions to move the target into
%r11, insert instrumentation to examine it, and then call or jump through
%r11.
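The shape of this transformation can be sketched as a text-level rewrite. rewriteIndirect is a hypothetical helper for illustration, not one of Egalito's passes (which operate on instruction objects rather than strings), and it assumes %r11 is free at the rewritten instruction.

```cpp
#include <string>
#include <vector>

// `mnemonic` is "callq" or "jmpq"; `memOperand` is the memory operand
// without the leading '*', e.g. "0x40(%rax)". `checkLines` is the caller's
// own instrumentation, which may inspect %r11 but must not change it.
std::vector<std::string> rewriteIndirect(
        const std::string &mnemonic, const std::string &memOperand,
        const std::vector<std::string> &checkLines) {
    std::vector<std::string> out;
    // Load the final target once, so the check and the transfer agree on it.
    out.push_back("mov " + memOperand + ", %r11");
    out.insert(out.end(), checkLines.begin(), checkLines.end());
    out.push_back(mnemonic + " *%r11");
    return out;
}
```

Loading the target into a register first also means the memory operand is read only once, so the checked address and the address actually used cannot differ.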
Instrumentation at Arbitrary Points
When adding instrumentation at arbitrary points in basic blocks, all the same
caveats apply. No registers can be assumed to be available; the red zone may
get in the way of spilling existing registers. The stack may not be 16-byte
aligned, particularly in the middle of a leaf function. Furthermore, the code
may rely on the rflags register to be preserved for conditionals. We
provide an InstrumentInstructionPass that allows a call instruction to be
inserted at any arbitrary point, taking care of most of these details.
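One commonly used way to combine these precautions is sketched below as a generator of AT&T-syntax lines. This is an assumption-laden illustration, not a description of what InstrumentInstructionPass emits: wrapHookCall is a hypothetical helper, stepping over the red zone with lea is a general technique that the text above does not name, and dynamic 16-byte stack alignment is deliberately left out.

```cpp
#include <string>
#include <vector>

// Emit a call to `hook` that preserves rflags and the registers listed in
// `savedRegs`, without touching the 128-byte red zone.
// Not handled here: re-aligning the stack to 16 bytes before the call.
std::vector<std::string> wrapHookCall(
        const std::string &hook, const std::vector<std::string> &savedRegs) {
    std::vector<std::string> out;
    // lea adjusts %rsp without modifying rflags (unlike sub/add).
    out.push_back("lea -0x80(%rsp), %rsp");
    out.push_back("pushfq");
    for (const auto &reg : savedRegs) {
        out.push_back("push " + reg);
    }
    out.push_back("call " + hook);
    // Restore in reverse order of the pushes.
    for (auto it = savedRegs.rbegin(); it != savedRegs.rend(); ++it) {
        out.push_back("pop " + *it);
    }
    out.push_back("popfq");
    out.push_back("lea 0x80(%rsp), %rsp");
    return out;
}
```

The list passed as savedRegs would normally cover every caller-saved register that may be live at the insertion point, since no register can be assumed free there.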
See also pass/endbrenforce.cpp, pass/syscallsandbox.cpp,
pass/retpoline.cpp, and pass/instrumentinstr.cpp for some inspiration.