
Why Physical Unclonable Functions Beat Factory Key Injection

An examination of PUF entropy sources in custom silicon and why they eliminate the factory provisioning attack surface entirely.

January 28, 2025 — Bastionchip Engineering

[Figure: Microscopic view of silicon PUF cell array structures on a chip die]

Every hardware security architecture eventually confronts the same foundational question: where does the first secret come from? For decades, the dominant answer was factory key injection — a provisioning station writes a unique key into fuse memory or one-time programmable storage during manufacturing, and that key becomes the device's root identity. It works, until it doesn't. The provisioning station itself is an attack surface. The key transport from generation to write is an attack surface. The fuse readback vulnerability is an attack surface. Physical Unclonable Functions offer a structurally different answer: derive the root key from manufacturing variation in the silicon itself, so there is no key to inject and therefore no provisioning attack surface to protect.

How PUFs Harvest Entropy from Silicon Physics

A PUF exploits the fact that every CMOS fabrication run produces statistically unique transistor threshold voltages, gate oxide thicknesses, and interconnect resistances — even across nominally identical circuit structures. The three dominant PUF topologies each harvest this variation differently.

SRAM PUFs read the power-on state of an SRAM array. Each bistable cell races to its preferred state at power-on, and that preference is determined by local threshold voltage mismatch between the two cross-coupled inverters. At 28 nm, inter-device Hamming distance typically runs 45–50% — close to the theoretical maximum of 50% for a good entropy source — while intra-device Hamming distance (stability across temperature and voltage) runs in the 1–5% range, depending on process node and operating conditions.
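The two Hamming-distance metrics above can be computed directly from readouts. The following is a minimal sketch on simulated data, not measured silicon: `simulate_device`, `noisy_readout`, the 4096-bit array size and the 3% flip rate are illustrative assumptions.

```python
import random

def fractional_hd(a, b):
    """Fraction of bit positions where two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def simulate_device(n_bits, rng):
    """Hypothetical device: each cell has a fixed random power-on preference."""
    return [rng.getrandbits(1) for _ in range(n_bits)]

def noisy_readout(reference, ber, rng):
    """Re-read a device, flipping each bit independently with probability ber."""
    return [bit ^ (rng.random() < ber) for bit in reference]

rng = random.Random(1)
dev_a = simulate_device(4096, rng)
dev_b = simulate_device(4096, rng)

# Inter-device: two different chips. Intra-device: the same chip read twice.
inter = fractional_hd(dev_a, dev_b)
intra = fractional_hd(dev_a, noisy_readout(dev_a, 0.03, rng))
print(f"inter-device HD: {inter:.3f}, intra-device HD: {intra:.3f}")
```

On real silicon the same `fractional_hd` function would be applied to captured power-on dumps, with intra-device distance taken against an enrollment reference at each temperature and voltage corner.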

Arbiter PUFs route a rising-edge signal through two symmetrical delay paths and observe which path wins the race to a D flip-flop. The delay difference is set by manufacturing variation in the routing metals and gates. A 64-stage arbiter PUF on TSMC 40 nm takes a 64-bit challenge and produces one response bit per challenge, so a multi-bit response is assembled by applying many challenges. These responses are not directly usable as keys, because the structure is highly susceptible to machine-learning modeling attacks. Linear arbiter PUFs in particular can be cloned from around 10,000 challenge-response pairs (CRPs) using standard logistic regression. This is a hard constraint that limits arbiter PUF deployment to authentication protocols with restricted CRP exposure, not key derivation.
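To make the modeling attack concrete, here is a toy sketch using the additive delay model that is commonly used to describe linear arbiter PUFs. Everything in it is an assumption for illustration: the PUF is simulated with Gaussian delay weights, the size is shrunk to 16 stages and 2,000 training CRPs so it runs quickly in pure Python, and the logistic regression is a plain stochastic-gradient loop, not a tuned attack.

```python
import math
import random

def parity_features(challenge):
    """Map challenge bits to the parity features of the additive delay model."""
    feats = [1.0] * (len(challenge) + 1)   # last entry is the constant bias term
    prod = 1.0
    for i in range(len(challenge) - 1, -1, -1):
        prod *= 1.0 - 2.0 * challenge[i]
        feats[i] = prod
    return feats

def make_puf(n_stages, rng):
    """Hypothetical PUF instance: one Gaussian delay weight per feature."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]

def respond(weights, challenge):
    """Arbiter output: the sign of the accumulated delay difference."""
    total = sum(w * f for w, f in zip(weights, parity_features(challenge)))
    return 1 if total > 0 else 0

def train_logistic(samples, n_features, epochs=30, lr=0.05):
    """Plain stochastic-gradient logistic regression on (features, label) pairs."""
    model = [0.0] * n_features
    for _ in range(epochs):
        for feats, label in samples:
            z = sum(m * f for m, f in zip(model, feats))
            z = max(-30.0, min(30.0, z))          # keep exp() well-behaved
            pred = 1.0 / (1.0 + math.exp(-z))
            step = lr * (label - pred)
            for i, f in enumerate(feats):
                model[i] += step * f
    return model

rng = random.Random(7)
N_STAGES = 16                      # toy size chosen for speed
puf = make_puf(N_STAGES, rng)

def random_crps(count):
    """Collect (features, response) pairs for random challenges."""
    out = []
    for _ in range(count):
        c = [rng.getrandbits(1) for _ in range(N_STAGES)]
        out.append((parity_features(c), respond(puf, c)))
    return out

train, test = random_crps(2000), random_crps(500)
model = train_logistic(train, N_STAGES + 1)
hits = sum((sum(m * f for m, f in zip(model, feats)) > 0) == bool(label)
           for feats, label in test)
accuracy = hits / len(test)
print(f"clone accuracy on unseen challenges: {accuracy:.3f}")
```

The point of the sketch is structural: in this model the response is a linear threshold function of the parity features, so any linear classifier that sees enough CRPs learns it. That is why restricting CRP exposure matters more than adding stages.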

Ring Oscillator PUFs compare the frequencies of identically-laid-out ring oscillators. Frequency differences encode manufacturing variation and tend to be more temperature-stable than SRAM PUFs at the cost of requiring a dedicated oscillator array on die, which adds area and power. For key derivation on a dedicated security ASIC — where area budget is available and long key stability matters — ring oscillator designs are worth the tradeoff.
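As a rough illustration of the comparison step, the sketch below derives one bit from each disjoint pair of oscillators. The nominal frequency, the spread and the pairing scheme are hypothetical; real designs differ in how they pair or group oscillators.

```python
import random

def ro_response(frequencies_hz):
    """One bit per disjoint oscillator pair: 1 if the first runs faster."""
    pairs = zip(frequencies_hz[0::2], frequencies_hz[1::2])
    return [1 if f_a > f_b else 0 for f_a, f_b in pairs]

rng = random.Random(3)
NOMINAL_HZ = 200e6   # hypothetical nominal oscillator frequency
SIGMA_HZ = 1e6       # hypothetical manufacturing spread per oscillator
freqs = [rng.gauss(NOMINAL_HZ, SIGMA_HZ) for _ in range(32)]
bits = ro_response(freqs)

# A purely common-mode shift (every oscillator slowed by 2%) leaves
# every pairwise comparison, and therefore every bit, unchanged.
shifted = [f * 0.98 for f in freqs]
print(bits == ro_response(shifted))
```

Because each bit depends only on which oscillator of a pair is faster, drift that affects both oscillators equally cancels out. Real drift is not perfectly common-mode, so pairs whose frequencies are very close can still flip and are often excluded at enrollment.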

The Reproducibility Problem and Fuzzy Extractors

The critical gap between "PUF response" and "cryptographic key" is reproducibility. A raw SRAM PUF bit-error rate (BER) of 3% means that a 256-bit key reconstruction fails with near-certainty without error correction: the probability of all 256 bits being correct simultaneously is (0.97)^256 ≈ 0.00041, or roughly one success in 2,400 attempts. This is where fuzzy extractors become load-bearing infrastructure.
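The arithmetic is easy to reproduce for other error rates. This short sketch assumes independent bit errors, which is a simplification; the function names are illustrative.

```python
def p_all_bits_correct(ber, n_bits):
    """Probability that every one of n_bits reads back correctly."""
    return (1.0 - ber) ** n_bits

def expected_bit_errors(ber, n_bits):
    """Mean number of flipped bits per readout."""
    return ber * n_bits

for ber in (0.01, 0.03, 0.05):
    p = p_all_bits_correct(ber, 256)
    print(f"BER {ber:.0%}: P(clean 256-bit read) = {p:.2e}, "
          f"mean errors = {expected_bit_errors(ber, 256):.1f}")
```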

A fuzzy extractor has two phases. Gen(w) takes the noisy PUF response w and outputs a stable key R and a public helper data string P. Rep(w', P) reconstructs R from any w' close enough to w. The helper data leaks information about the PUF response, and how much it leaks is the key security question. BCH codes with enough redundancy can tolerate BER up to 12–15%, but in the common code-offset construction the helper data can leak up to n − k bits per n-bit block, that is, as many bits as the code has redundancy. The code parameters therefore have to be chosen so that the response still holds enough min-entropy for the key after that leakage is subtracted.
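The Gen/Rep pair can be sketched with the code-offset construction. To keep the example short and self-contained, a 7-fold repetition code stands in for BCH and SHA-256 stands in for a proper randomness extractor; both substitutions, and all function names, are assumptions made for illustration. A real design would also draw the enrollment randomness from a cryptographic RNG instead of Python's `random`.

```python
import hashlib
import random

REP = 7  # repetition factor; a simple stand-in for a BCH code

def _encode(bits):
    """Repetition encoding: each message bit is repeated REP times."""
    return [b for b in bits for _ in range(REP)]

def _decode(bits):
    """Majority vote over each group of REP bits."""
    return [int(sum(bits[i:i + REP]) > REP // 2) for i in range(0, len(bits), REP)]

def _xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def _extract(bits):
    """Stand-in extractor: hash the recovered response into a key."""
    return hashlib.sha256(bytes(bits)).digest()

def gen(w, rng):
    """Enrollment: return (key R, helper data P) for PUF response w."""
    if len(w) % REP:
        raise ValueError("response length must be a multiple of REP")
    message = [rng.getrandbits(1) for _ in range(len(w) // REP)]
    helper = _xor(w, _encode(message))       # P = w XOR random codeword
    return _extract(w), helper

def rep(w_noisy, helper):
    """Reconstruction: recover R from a noisy re-reading w' and P."""
    message = _decode(_xor(w_noisy, helper))  # noisy codeword -> message
    w_recovered = _xor(helper, _encode(message))
    return _extract(w_recovered)

rng = random.Random(11)
w = [rng.getrandbits(1) for _ in range(REP * 128)]
key, helper = gen(w, rng)

w_noisy = list(w)
for i in range(0, len(w_noisy), REP):
    w_noisy[i] ^= 1          # one flipped bit in every 7-bit group
print(rep(w_noisy, helper) == key)
```

The repetition code corrects up to three flips per seven-bit group, so the reconstruction above succeeds. It also shows the cost: six of every seven helper bits are redundancy, which is the leakage trade-off that motivates more efficient codes such as BCH.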

NIST SP 800-90B defines the formal entropy estimation framework that any entropy source in a cryptographic module must satisfy: Section 5 covers testing the IID (independent and identically distributed) assumption, and Section 6 covers min-entropy estimation. For a PUF-as-entropy-source, the relevant tests are the IID battery, primarily the permutation tests and the chi-square tests, and the non-IID estimators, including the Multi Most-Common-in-Window (MultiMCW) and Lag predictors, which are often among the more stringent. The estimators do not pass or fail a source; the non-IID track reports the lowest estimate across all of them. A well-designed SRAM PUF at 28 nm typically yields non-IID min-entropy estimates in the range of 0.85–0.95 bits per bit of raw response. Anything below 0.7 bits per bit is a red flag that either the process node is too mature (variation is tightening) or there is systematic bias in the cell layout.
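For a feel of what an estimator computes, the sketch below implements the Most Common Value estimate, the simplest of the SP 800-90B estimators. It is offered only as an illustration and is not a substitute for the full test suite, which runs many estimators and takes the lowest result.

```python
import math
from collections import Counter

def mcv_min_entropy(samples):
    """Most Common Value estimate: min-entropy per sample, in bits."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    p_hat = Counter(samples).most_common(1)[0][1] / n
    # Upper end of a 99% confidence interval on the most-common-value probability.
    p_upper = min(1.0, p_hat + 2.576 * math.sqrt(p_hat * (1.0 - p_hat) / (n - 1)))
    return -math.log2(p_upper)

balanced = [0, 1] * 500          # 1,000 perfectly balanced bits
biased = [1] * 700 + [0] * 300   # 70% ones
print(f"balanced: {mcv_min_entropy(balanced):.3f} bits/bit")
print(f"biased:   {mcv_min_entropy(biased):.3f} bits/bit")
```

The confidence-interval term means that even a perfectly balanced sample scores below 1.0 when the sample is small, which is one reason characterization runs collect large raw datasets.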

Process Node Dependencies: Not All Fabs Are Equal

This is where PUF design gets inconvenient. SRAM PUF behavior is fundamentally a function of transistor threshold voltage mismatch, which is exactly the kind of process variation that fabs have spent decades reducing, because it hurts yield and speed-binning. At 7 nm and below, the random dopant fluctuation (RDF) that SRAM PUFs rely on becomes smaller relative to systematic lithography effects. Some groups have reported that SRAM PUF intra-device BER actually improves at advanced nodes (less temperature drift), but inter-device Hamming distance can compress toward 42–45% in tight-process nodes, which reduces available entropy.

The practical implication for a security ASIC targeting TSMC N28 or similar: characterize the PUF across at least three wafer lots before committing to helper data code parameters. The BER distribution across lots can shift by 1–2 percentage points, which is enough to invalidate a BCH code dimensioned on a single-lot sample. Tape-out of an SRAM PUF without a dedicated silicon characterization run is an underestimated risk — not because the PUF won't work, but because the error correction parameters will be wrong.
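The sensitivity of code dimensioning to a BER shift can be estimated with a binomial tail. The sketch below assumes independent bit errors, a 511-bit block and a per-block failure target of 1e-6; all three are illustrative assumptions, not values from a real qualification.

```python
import math

def log_binom_pmf(n, k, p):
    """Log of the binomial probability of exactly k errors in n bits."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def block_failure_prob(n, t, ber):
    """P(more than t bit errors in an n-bit block), i.i.d. errors assumed."""
    return sum(math.exp(log_binom_pmf(n, k, ber)) for k in range(t + 1, n + 1))

def min_correctable_errors(n, ber, target):
    """Smallest t whose block failure probability is at or below target."""
    for t in range(n + 1):
        if block_failure_prob(n, t, ber) <= target:
            return t
    return n

for ber in (0.03, 0.05):   # a nominal lot and one shifted by two points
    t = min_correctable_errors(511, ber, 1e-6)
    print(f"BER {ber:.0%}: need t >= {t} for a 511-bit block")
```

Running the same calculation against the worst BER observed across lots and corners, not the single-lot mean, is what keeps the chosen code from being under-dimensioned.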

Temperature stability deserves explicit attention. A PUF key used in a server HSM running at 70°C ambient must reconstruct correctly. Most SRAM PUFs exhibit BER degradation of 0.5–1.5% per 10°C increase above room temperature; a design characterized only at 25°C will surprise teams during thermal stress testing at 85°C. Correct practice: run SP 800-90B characterization at both temperature extremes of the target operating range and dimension error correction for worst-case BER, not nominal.

Why Factory Key Injection Remains Persistent (and Where It Still Makes Sense)

We're not saying factory key injection is categorically wrong. For low-security consumer devices where the provisioning station is inside a trusted manufacturing environment, the threat model may not justify the added area and complexity of a PUF plus fuzzy extractor. OTP fuse injection is well-understood, has a 20-year production history, and pairs naturally with secure boot ROM that simply reads the fuse array at reset.

The problem emerges at scale and at higher security levels. A provisioning station that writes unique keys to 50,000 devices per day is a high-value target. A compromised provisioning key escrow means the keys of all 50,000 devices in that lot are exposed. FIPS 140-3 Level 3 physical security requirements create pressure to protect the provisioning infrastructure itself, and that infrastructure is off the die, outside the evaluatable security boundary. PUFs solve the escrow problem entirely: the key never exists until the device is powered on, and it exists only inside the security boundary.

Integration Scenario: ASIC Security Block at 28 nm

Consider a custom security ASIC targeting the server HSM market, fabbed at TSMC N28HP. The design includes a 4 kbit SRAM PUF array, sufficient for 256-bit root key extraction with margin, alongside a BCH(511,493) error correction block clocked separately from the main crypto engine to avoid power-analysis side channels. BCH(511,493) corrects up to two bit errors per 511-bit codeword, so it only suits a raw BER at its input far below the 3% figure discussed earlier. On first enrollment at power-on in a clean-room environment, the fuzzy extractor runs Gen(), computes the stable key material, and stores the helper data in non-volatile memory. The raw PUF response never persists in any register or bus.

During the SP 800-90B qualification run for this design, the non-IID LZ78Y estimator returned min-entropy of 0.91 bits per sample across a 200-device sample set. The MultiMCW estimator, typically the most pessimistic for SRAM PUFs due to the spatial correlation between adjacent cells, returned 0.87 bits per sample. That supports a claim of roughly 220 bits of min-entropy per 256 bits of raw response after accounting for correlation. On its own that falls short of a 256-bit security strength target, so the target is met by extracting the key from more than 256 raw bits: the 4 kbit array holds roughly 3,500 bits of min-entropy at that rate, which leaves a comfortable margin once the entropy given up to the BCH helper data is subtracted.
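The entropy budget for this scenario can be worked through with the figures quoted above. The sketch assumes the whole array is split into 511-bit codewords, that the per-bit estimate applies uniformly across the array, and that helper-data leakage is bounded by n − k bits per codeword, as in the code-offset construction. These are simplifying assumptions for illustration.

```python
RAW_BITS = 4096          # 4 kbit SRAM PUF array (from the text)
H_PER_BIT = 0.87         # MultiMCW estimate quoted above
N, K = 511, 493          # BCH(511,493) from the text
TARGET_BITS = 256        # security strength target

blocks = RAW_BITS // N                 # whole codewords that fit in the array
used_bits = blocks * N
raw_entropy = used_bits * H_PER_BIT    # min-entropy in the bits actually used
max_leak = blocks * (N - K)            # at most n - k bits leaked per codeword
residual = raw_entropy - max_leak
per_256 = 256 * H_PER_BIT              # entropy in a single 256-bit window

print(f"codewords: {blocks}, bits used: {used_bits}")
print(f"raw min-entropy: {raw_entropy:.2f} bits, helper leakage <= {max_leak} bits")
print(f"residual: {residual:.2f} bits (target {TARGET_BITS})")
print(f"single 256-bit window: {per_256:.2f} bits")
```

Under these assumptions the residual is well above the target, while a single 256-bit window of raw response is not, which is why the key is compressed out of a larger portion of the array.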

One point that trips up teams during evaluation: the SP 800-90B tests must be run on raw source data, not post-processed data. If the fuzzy extractor or any conditioning function is applied before running the estimators, the results are invalid for NIST purposes. The test input must be the raw SRAM power-up bits, captured before error correction or any other conditioning.

The Counter-Argument: Physical PUF Attacks

PUFs are not invulnerable. SRAM PUF cells can be read by a sufficiently motivated attacker with nano-probing equipment under focused ion beam (FIB) access, which is exactly the threat class that active tamper meshes are designed to prevent. If the mesh zeroizes the device before FIB access reaches the SRAM array, the PUF response is unrecoverable. But if the mesh fails, a PUF response is no better protected than a fuse key. The security argument for PUFs is primarily about attack surface during manufacturing and distribution, not about physical invulnerability at the chip level.

The key design principle is defense in depth: PUF entropy source eliminates the factory provisioning attack surface; active tamper mesh eliminates the post-deployment physical attack surface; both together define the actual security boundary. Either alone is insufficient for a device targeting FIPS 140-3 Level 3 or Common Criteria EAL 4+ evaluation.

Teams evaluating PUF IP for integration should budget for the SP 800-90B characterization run as a distinct silicon bring-up milestone — typically 2–3 weeks of measurement time across temperature and voltage corners on a 50–100 device sample set. The cost is non-trivial, but it is the only rigorous path to a defensible entropy source claim in a CMVP submission.