Computers have seen an eventful past century. Originally a profession involving long, tedious hours of manual calculation by rigidly following instructions, computing has been mechanized, automated and developed into the myriad devices nearly inescapable in modern life. Over the decades, mechanical gears gave way to thermionic valves, which gave way to discrete transistors, in turn giving way to integrated circuits, all the way to modern VLSI chips featuring billions of transistors whose features are measured in widths of atoms. This long evolutionary journey has been driven by a constant pressure to deliver more—more capacity, more performance, more computing per unit cost. In short: people want fast computers.

At first this need was easily fulfilled. Rapid developments in materials and manufacturing technology made semiconductors smaller, faster, cheaper and more efficient many times over every few years, yet demand showed no sign of slowing down. Indeed, this rapid advancement was consistent enough to become the expected norm, to the extent that the so-called “Moore’s law”—an observation of this trend by Gordon Moore, then director of R&D at Fairchild Semiconductor—has been enshrined in our collective consciousness ever since. This boom could not last: technology began hitting fundamental limits and development slowed in the face of ever larger hurdles, while demand kept up its ever-accelerating pace.

Focus shifted away from brute-force improvements and towards smarter, more efficient use of on-die resources. Cache hierarchies were built to bridge the growing performance gap between processor and DRAM. Pipelined and out-of-order designs became common, achieving better utilization of execution units in every clock cycle. Finally, multi-core designs offered more total computation per unit of power than a single fast core, leading to their widespread adoption. In the interest of preserving compatibility, however, many of these changes remained invisible to software. The architecture did not adapt to reflect them, continuing to present simple in-order CPUs with a uniform, flat memory address space.

These advancements in microarchitecture came with the inevitable, though perhaps unintended, consequence of shared resources, be it memory shared between multiple CPU cores or logic units shared between execution threads. A shared resource naturally behaves differently from an exclusively held one, a difference that software remained largely unaware of, thanks to the deliberate focus on compatibility in the design of these advancements. This ignorance led to a widening gap between reality and the assumptions programs make about the underlying hardware and its behavior.

Furthermore, resource sharing blurs the boundaries between domains, allowing parties to measurably interfere with each other’s execution. In the more benign cases, this manifests as an unexpected execution bottleneck—a mere performance nuisance. On the other end of the spectrum, a skilled attacker can exploit hidden microarchitectural behavior to their advantage, opening the door to an entirely new class of vulnerabilities. Understanding precisely how an attacker’s level of skill and knowledge of the microarchitecture affects the feasibility and effectiveness of exploits constitutes a key motivation for this work. Microarchitectural exploits, by using hardware as an unmitigated attack vector, undermine the traditional threat model of system security.
This has led to a fruitful avenue for security research, with extensive work compromising the confidentiality, integrity and availability of systems both old and new. Initial interest [33, 48, 72, 90, 121] focused on memory caches and their potential to expose information about a target program. Execution leaves measurable traces in the cache, and sensitive execution can leave sensitive traces. An attacker can use the cache state they share with the victim program as a side channel to read these traces and infer its secrets, compromising the confidentiality of an otherwise sound system. Memory caches were not the only microarchitectural components to be targeted, however; more recent developments exploit the out-of-order nature of modern high-performance CPUs [19, 38, 63, 70, 96, 100]. Out-of-order execution designs use internal state that, akin to caches, can leak sensitive information to the outside.

On the opposite end of the memory hierarchy, DRAM saw its own chain of developments. Driven by the same constant demand for more performance, clock rates increased and cells shrank down to nanometers, holding less and less charge with each generation. A smaller charge is naturally more prone to unwanted interference, posing various reliability issues for the manufacture of DRAM. One issue of particular note is disturbance errors: errors that occur when heightened activity in neighboring cells leaks charge from a victim cell, leading to a “bit flip”. While the industry was fully aware of these reliability issues, the common wisdom at the time held that DRAM errors were largely random, uncontrollable occurrences, uniform in their distribution, and therefore able to be mitigated by classic error correction. Malicious exploitation of disturbance errors was not considered a serious attack vector.

This view changed after Rowhammer [62]—a controlled, targeted form of inducing disturbance errors—was demonstrated on commercially available DDR3 DRAM. End-to-end attacks soon followed, targeting various sensitive data structures such as page tables [102, 114, 115, 119], cryptographic keys [97], and object pointers [14, 32]. Attacks also moved towards less privileged environments, from native userspace code [102] to JavaScript [14, 41] to sandboxed mobile applications [114], even crossing over to GPUs, ostensibly sandboxed through WebGL [32].
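To make the mechanism concrete, the sketch below shows the core of a Rowhammer-style access loop in the spirit of the original demonstration [62]: two addresses are read repeatedly while cache flushes force every read to reach DRAM. This is a minimal illustration under stated assumptions, not a complete attack: the function and variable names are hypothetical, the two pointers are assumed to map to different rows of the same DRAM bank, and a real exploit additionally needs ways to find such address pairs and to place sensitive data in vulnerable rows.

```c
#include <stdint.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (x86 SSE2 intrinsics) */

/*
 * Minimal sketch of a Rowhammer access loop, assuming an x86-64 CPU and two
 * hypothetical addresses mapping to different rows of the same DRAM bank.
 * Repeatedly activating the two rows while flushing the cache keeps the
 * accesses hitting DRAM, which can disturb (flip bits in) neighboring rows.
 */
static void hammer(volatile uint8_t *row_a, volatile uint8_t *row_b,
                   unsigned long iterations)
{
    for (unsigned long i = 0; i < iterations; i++) {
        (void)*row_a;                      /* activate row A */
        (void)*row_b;                      /* activate row B */
        _mm_clflush((const void *)row_a);  /* evict so the next read goes to DRAM */
        _mm_clflush((const void *)row_b);
        _mm_mfence();                      /* order the flushes before the next round */
    }
}
```

The cache-based side channels discussed above rely on the same primitives in reverse: rather than flushing to force DRAM activations, an attacker flushes a shared line and later times a reload to learn whether the victim touched it.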