Security Research

Spectre, Meltdown, and the Persistent Timing Oracle Problem in Software TEEs

Why speculative execution vulnerabilities represent a structural problem for software-only TEE implementations, and what constant-time hardware arithmetic changes about the threat model.


Spectre and Meltdown were disclosed in January 2018. Seven years later, the underlying architectural problem they exposed — speculative execution creating observable microarchitectural state — remains essentially unsolved in general-purpose processors. The mitigations are real, but they are software-applied constraints layered over a hardware design that was never built to be isolation-safe. For software-based trusted execution environments, this is not a temporary inconvenience. It's a structural contradiction.

Why Speculative Execution and Isolation Are Fundamentally Incompatible

Modern out-of-order processors execute instructions speculatively: they guess which way a conditional branch will go and begin executing the predicted path before the branch condition is actually resolved. When the prediction is wrong, the speculative results are discarded. But "discarded" in CPU architecture means only that the architectural state (registers and committed memory) is rolled back. The microarchitectural state, including cache line occupancy, TLB entries, and branch predictor state, is not. An attacker who can prime or flush cache state before a speculative execution window and measure it afterward can extract information about the data that was accessed speculatively.
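The asymmetry between what is rolled back and what is not can be shown with a deliberately simplified Python model. This is a toy simulation, not an exploit. The class and method names and the 64-byte probe stride are hypothetical. The branch predictor is modeled as permanently trained toward "in bounds", and a Python set stands in for the data cache.

```python
PROBE_STRIDE = 64  # hypothetical: one 64-byte cache line per possible byte value


class ToyCore:
    """Toy model: architectural results roll back, cache contents do not."""

    def __init__(self, memory: bytes, bound: int):
        self.memory = memory            # public array followed by secret bytes
        self.bound = bound              # length of the public part
        self.cached_lines = set()       # microarchitectural state (line addresses)
        self.predict_in_bounds = True   # predictor assumed already trained

    def victim_read(self, index: int) -> int:
        if self.predict_in_bounds:
            # Speculative window: the load runs before the bounds check resolves,
            # and a dependent access pulls one probe line into the cache.
            value = self.memory[index]
            self.cached_lines.add(value * PROBE_STRIDE)
        if index < self.bound:
            return self.memory[index]   # committed, architecturally visible result
        return 0  # misprediction: the loaded value is discarded, the cache line is not


def recover_byte(core: ToyCore, index: int) -> int:
    core.cached_lines.clear()           # attacker flushes the probe array
    core.victim_read(index)             # victim runs with an out-of-bounds index
    # Attacker checks which probe line is now cached (timed in a real attack).
    return next(b for b in range(256) if b * PROBE_STRIDE in core.cached_lines)
```

In the model, `victim_read(16)` on a 16-byte public array returns 0 architecturally. Even so, `recover_byte` reads the secret byte at index 16 from cache residue alone.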

This is the core of every Spectre variant. The attack does not rely on a bug in software; it exploits a property of the hardware's performance optimization. There is no patch that removes speculative execution from a general-purpose CPU — that would destroy the performance gains that made out-of-order processors useful in the first place. The mitigations (serializing instructions, microcode patches to suppress certain speculation patterns, retpoline for indirect branches) reduce the attack surface but cannot eliminate it.

Software TEEs, in the sense used here, are isolation mechanisms built into a general-purpose CPU, such as Intel SGX, Intel TDX, and AMD SEV. They run their protected code on the same CPU cores that handle untrusted code. The TEE boundary is enforced by privilege-level controls and memory encryption. But trusted and untrusted execution share the same speculative execution engine, the same caches, and the same branch predictors. The hardware has no general mechanism for recognizing that a speculative access is crossing an isolation boundary and suppressing it.

The Timing Oracle: What an Attacker Can Actually Learn

Timing oracles are the mechanism that converts microarchitectural state into data leakage. A cache timing attack works by measuring the difference between a cache hit (roughly 4–6 CPU cycles to access L1) and a cache miss (roughly 200–300 cycles to access DRAM). Suppose an attacker can arrange for a speculative execution path inside a TEE to access memory at an address derived from a secret value, and then measure which cache lines are hot or cold afterward. The attacker can then recover the secret a few bits at a time, or a full byte per round when the secret indexes a 256-entry probe array.
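Decoding the oracle is a thresholding problem. The sketch below simulates Flush+Reload-style measurements over a 256-line probe array, using the hit and miss latencies quoted above, and recovers a byte by voting across rounds. The function names, the 100-cycle threshold, and the noise model are assumptions for illustration. Real attacks calibrate the threshold on the target machine.

```python
import random

HIT_CYCLES = (4, 6)       # L1 hit latency range quoted above
MISS_CYCLES = (200, 300)  # DRAM miss latency range quoted above
THRESHOLD = 100           # assumption: calibrated per machine in practice


def simulate_round(secret_byte: int, rng: random.Random, noise_lines: int = 2) -> list[int]:
    """One hypothetical measurement round: the line indexed by the secret is hot,
    plus a few random lines that are hot for unrelated reasons (noise)."""
    hot = {secret_byte} | {rng.randrange(256) for _ in range(noise_lines)}
    return [rng.randint(*HIT_CYCLES) if line in hot else rng.randint(*MISS_CYCLES)
            for line in range(256)]


def decode_byte(rounds: list[list[int]]) -> int:
    """Count hits per line across rounds. The secret's line is hot every round,
    while noise lines vary, so the line with the most hits wins."""
    hits = [0] * 256
    for latencies in rounds:
        for line, cycles in enumerate(latencies):
            if cycles < THRESHOLD:
                hits[line] += 1
    return max(range(256), key=hits.__getitem__)
```

Voting across rounds is what makes the channel practical despite noise. A single round may show several hot lines, but only the secret-dependent one is hot consistently.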

The academic literature has produced increasingly sophisticated variants. The Microarchitectural Data Sampling (MDS) class, which includes RIDL (Rogue In-Flight Data Load), Fallout, and ZombieLoad, targets data in flight through internal CPU buffers rather than the cache: the Line Fill Buffer, the Load Ports, and the Store Buffer. None of these buffers is architecturally visible to software. Their contents can, however, be transiently forwarded to an attacker's loads and then read out through a cache timing channel.

What makes these attacks particularly problematic for software TEEs is that the trust model of a TEE explicitly assumes the untrusted host OS and hypervisor are adversarial. But the timing channels are not mediated by the OS or hypervisor — they exist at the hardware level, below the software isolation boundary. An untrusted VM running on the same physical core as a TEE-protected workload can, under the right conditions, extract key material through these channels without ever violating an access control check.

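We tracked 43 distinct Spectre/MDS variant publications in the research literature between 2018 and 2024. A few specific variants were later closed in silicon. Newer Intel parts, for example, advertise immunity to Meltdown and MDS through the RDCL_NO and MDS_NO bits of IA32_ARCH_CAPABILITIES. None of those changes removed the shared speculative machinery that the whole class depends on, however, and most mitigations remain microcode or software constraints. The attack surface persists.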

Why Constant-Time Hardware Arithmetic Changes the Threat Model

The hardware security approach to this problem is fundamentally different: remove the timing variation, not the speculative execution. If a cryptographic operation takes exactly the same number of cycles regardless of its inputs, there is no timing signal to measure.

This is what "constant-time" means in hardware cryptographic accelerators. The AES and ECDSA implementations in the Bastionchip on-chip enclave are written in synthesizable RTL with explicit constraints: no data-dependent branch paths, no data-dependent memory access patterns, fixed pipeline depths for all input values. The hardware does not speculate on cryptographic operations — it executes them on a fixed schedule in a physically isolated pipeline that has no shared-register paths with the general-purpose host CPU.
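The "fixed schedule" property can be illustrated in software with a Montgomery ladder. This is a standard way to make scalar multiplication perform the same sequence of operations for every bit of the secret. In this simplified sketch, the ladder computes modular exponentiation instead of elliptic-curve point multiplication. The sketch shows the operation schedule only. The function names and the optional `trace` parameter are hypothetical, this is not Bastionchip's RTL, and Python's big-integer arithmetic is itself not constant-time.

```python
def cswap(bit: int, a: int, b: int) -> tuple[int, int]:
    """Branch-free conditional swap: returns (b, a) if bit == 1, else (a, b)."""
    mask = -bit              # 0 -> 0, 1 -> -1 (all ones in two's complement)
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def ladder_pow(base: int, scalar: int, modulus: int, nbits: int, trace=None) -> int:
    """Montgomery ladder computing base**scalar % modulus (scalar < 2**nbits).
    Every one of the nbits iterations performs the same swap, multiply, square,
    swap sequence, whatever the scalar's bits are. Invariant: r1 == r0 * base."""
    r0, r1 = 1, base % modulus
    for i in reversed(range(nbits)):
        bit = (scalar >> i) & 1
        r0, r1 = cswap(bit, r0, r1)
        r1 = (r0 * r1) % modulus
        r0 = (r0 * r0) % modulus
        r0, r1 = cswap(bit, r0, r1)
        if trace is not None:
            trace.append(("cswap", "mul", "sqr", "cswap"))  # record the fixed schedule
    return r0
```

Software commonly protects elliptic-curve scalar multiplication with the same structure, using point addition and doubling in place of multiply and square. The paragraph's point is that in hardware, the pipeline itself fixes the schedule, so it does not depend on coding discipline.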

ECDSA P-384 signing completes in 18 microseconds. Constant. Regardless of the scalar value being multiplied, regardless of the key handle, regardless of what the host CPU is doing simultaneously. There is no timing signal to measure because there is no timing variation to measure.
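An evaluator can test a "constant regardless of input" claim rather than take it on faith. One common methodology, used by the dudect tool, collects timings for two input classes (typically a fixed input versus random inputs) and computes Welch's t-statistic. A |t| above roughly 4.5, the cutoff conventionally used in TVLA-style leakage assessment, indicates a detectable difference. The sketch below computes the statistic. The function names are hypothetical, and the sample values in the usage are invented for illustration, not Bastionchip measurements.

```python
from math import sqrt
from statistics import fmean, variance

T_THRESHOLD = 4.5  # conventional TVLA-style cutoff; an assumption for this sketch


def welch_t(class_a: list[float], class_b: list[float]) -> float:
    """Welch's t-statistic for two timing samples with possibly unequal variances."""
    va, vb = variance(class_a), variance(class_b)
    se = sqrt(va / len(class_a) + vb / len(class_b))
    if se == 0:
        # Both samples have zero spread: any difference in means is a clear signal.
        return 0.0 if fmean(class_a) == fmean(class_b) else float("inf")
    return (fmean(class_a) - fmean(class_b)) / se


def leaks(class_a: list[float], class_b: list[float]) -> bool:
    """True if the two input classes show a detectable timing difference."""
    return abs(welch_t(class_a, class_b)) > T_THRESHOLD
```

For example, `leaks([18.0] * 1000, [18.0] * 1000)` is `False`, while two classes whose means differ by 3 units with small spread are flagged. Passing the test shows only that no difference was detectable at that sample size. It is evidence, not a proof.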

This is not achievable in software on a general-purpose processor. Software constant-time code (avoiding secret-dependent branches and memory indices, using bitmasked conditionals) removes the most direct timing signals, but it only controls what the program does architecturally. It cannot control what the processor does underneath. Some instructions have operand-dependent latency on some cores. Data memory-dependent prefetchers can act on secret values sitting in memory. A mispredicted branch can speculatively execute a secret-dependent path that the constant-time source never takes. A hardware accelerator with a fixed execution pipeline can guarantee constant timing in a way that software cannot.
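The source-level discipline the paragraph refers to looks like the Python below. It contrasts an early-exit comparison with the accumulate-then-test pattern and shows a masked select. The function names are hypothetical. The code illustrates the coding pattern only. CPython itself makes no timing guarantees, so production Python code should use `hmac.compare_digest`. As argued above, even a faithful C translation cannot control the operand-dependent latency, prefetching, or speculation underneath it.

```python
def early_exit_equal(a: bytes, b: bytes) -> bool:
    """Variable-time: returns at the first mismatch, so running time reveals
    how many leading bytes of a guess are correct."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def ct_equal(a: bytes, b: bytes) -> bool:
    """Constant-time pattern: examines every byte and never branches on content."""
    if len(a) != len(b):      # lengths are treated as public
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y         # accumulate any difference without branching
    return diff == 0


def ct_select(bit: int, if_one: int, if_zero: int, width: int = 64) -> int:
    """Masked select of width-bit values: if_one when bit == 1, else if_zero."""
    full = (1 << width) - 1
    mask = -bit & full        # all ones when bit == 1, all zeros when bit == 0
    return (if_one & mask) | (if_zero & ~mask & full)
```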

The Isolation Boundary Difference

Beyond timing, hardware enclaves provide a physically separate execution environment. When cryptographic operations execute inside the Bastionchip on-chip enclave, they share no L1/L2 cache with the host CPU, no branch predictor state, and no execution ports. The on-chip memory hierarchy is physically distinct, so there is no shared microarchitectural state for an observer outside the boundary to probe.

This is categorically different from a software TEE on a shared-cache multi-core CPU. Intel TDX and AMD SEV-SNP encrypt and integrity-protect guest memory so that a compromised hypervisor cannot read or silently tamper with it. Both, however, run on the same CPU cores that also run untrusted code. These memory protections address direct access to guest memory while leaving the speculative execution timing surface intact.

A hardware co-processor approach, with a separate silicon die for cryptographic operations, addresses the timing oracle problem by eliminating the shared execution environment. Only three things cross the boundary. The first is the input, meaning the plaintext or message plus a key handle, delivered over PCIe. The second is the output, a ciphertext or signature. The third is the time the operation takes, which a fixed-latency pipeline makes independent of the data. The intermediate microarchitectural state never appears on any bus or shared structure that an untrusted observer can measure.
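This boundary can be expressed as an interface in which key material is addressed only by handle. The sketch below is a hypothetical host-side model, not Bastionchip's API, and the class and method names are invented. HMAC-SHA384 from the Python standard library stands in for ECDSA P-384 signing, because the standard library has no ECDSA. The point is the shape of the interface: a handle and a message go in, a tag comes out, and no method returns key bytes.

```python
import hashlib
import hmac
import secrets


class HypotheticalCoprocessor:
    """Models a device whose key store never leaves the device boundary."""

    def __init__(self):
        self._keys = {}  # handle -> key bytes; stands in for on-device storage

    def generate_key(self) -> int:
        handle = len(self._keys) + 1
        self._keys[handle] = secrets.token_bytes(48)  # 384-bit key created "on device"
        return handle                                  # only the handle leaves

    def sign(self, handle: int, message: bytes) -> bytes:
        # Crossing the boundary: (handle, message) in, tag out.
        # HMAC-SHA384 is a stand-in for ECDSA P-384 in this sketch.
        return hmac.new(self._keys[handle], message, hashlib.sha384).digest()

    def verify(self, handle: int, message: bytes, tag: bytes) -> bool:
        return hmac.compare_digest(self.sign(handle, message), tag)
```

Python cannot actually enforce the boundary, because `_keys` is private only by convention. The model captures the interface contract, not the isolation. The host still observes how long each `sign` call takes, which is why the fixed-latency property matters even with physical separation.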

What This Means for TEE Architecture Decisions

The practical implication is that software-only TEEs are appropriate for workload isolation (preventing a compromised host from directly reading workload memory) but insufficient for cryptographic key isolation against an adversary with access to the same physical host. For key derivation, private key storage, and bulk encryption operations, the timing oracle surface in software TEEs represents a residual risk that current microcode and software mitigations do not fully close.

Teams building confidential compute infrastructure should distinguish between these two threat models:

  • Memory confidentiality against a compromised hypervisor: AMD SEV-SNP and Intel TDX address this effectively.
  • Cryptographic key isolation against a co-located adversary with timing measurement capability: This requires hardware separation — either dedicated physical hosts or a cryptographic co-processor with an isolated execution environment.

The research community will continue to find new variants of speculative execution attacks as long as general-purpose CPUs optimize for performance using speculation. Our view is that the correct architectural response is to move key material out of the speculative execution environment entirely, not to add more software constraints on top of hardware that was never designed with isolation as a primary goal.