Rosetta 2 Internals: The Magic Behind Apple Silicon

Date: 2025-11-24
Tags: macOS, ARM64, Emulation, Low Level
Author: Wissam Ztaoui


Introduction

When Apple announced the transition from Intel (x86_64) to Apple Silicon (ARM64), skeptics predicted a performance disaster for legacy apps. Instead, Rosetta 2 delivered near-native performance. How? It’s not just an emulator; it’s a marvel of hardware-software co-design.


1. AOT vs. JIT Translation

Traditional emulators (like QEMU) use Just-In-Time (JIT) translation: they translate code block by block as it runs. The translation cost is therefore paid during execution, and paid again on every launch.

Rosetta’s Secret: Ahead-of-Time (AOT)

Rosetta 2 translates as much of the executable as it can before the code runs, either at installation time or on first launch.

  1. You launch Photoshop_Intel.app.
  2. Rosetta parses the Mach-O binary.
  3. It translates the x86_64 instructions it finds into equivalent ARM64 instructions.
  4. It signs the new binary and caches it.
  5. Subsequent launches run the translated ARM64 code directly.

Note: Rosetta still falls back to JIT translation for applications that generate code at runtime (like JVMs or JavaScript engines), because that code does not exist yet when the AOT pass runs.
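The launch flow above boils down to "translate once, reuse afterwards". Here is a minimal Python sketch of that idea. Everything in it is hypothetical: the `TranslationCache` class, the `fake_translate` stand-in, and the choice to key the cache by a SHA-256 of the binary are assumptions made for illustration, not a description of how Rosetta actually stores its translations.

```python
import hashlib


class TranslationCache:
    """Toy model of an ahead-of-time translation cache keyed by binary contents."""

    def __init__(self, translate):
        self._translate = translate  # callable: x86_64 bytes -> "ARM64" bytes
        self._cache = {}
        self.translations = 0        # how many times we actually translated

    def launch(self, x86_binary: bytes) -> bytes:
        # Assumption: identify a binary by a hash of its contents.
        key = hashlib.sha256(x86_binary).hexdigest()
        if key not in self._cache:
            # First launch: pay the translation cost once and keep the result.
            self._cache[key] = self._translate(x86_binary)
            self.translations += 1
        # Later launches: return the cached translation directly.
        return self._cache[key]


def fake_translate(x86_binary: bytes) -> bytes:
    """Stand-in for a real translator; it only tags the input bytes."""
    return b"ARM64:" + x86_binary


cache = TranslationCache(fake_translate)
first = cache.launch(b"\x55\x48\x89\xe5")   # first launch translates
second = cache.launch(b"\x55\x48\x89\xe5")  # second launch hits the cache
```

After both launches `cache.translations` is still 1, which is the whole point of AOT: the expensive step does not repeat.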


2. The Memory Ordering Nightmare (TSO)

This is the most “genius” part of the M1 chip.

x86: Total Store Ordering (TSO)

x86 guarantees that a core’s memory writes become visible to other cores in program order. If a thread performs Store A and then Store B, any core that sees B is guaranteed to also see A.

ARM: Weak Memory Ordering

ARM is “weakly ordered”. Unless the code asks for ordering with barriers or acquire/release instructions, the CPU can reorder reads and writes for performance. Store A followed by Store B might result in another core seeing B before A.
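To make the difference concrete, here is a toy Python enumeration of the classic "message passing" test: one thread writes `data` and then `flag`, another reads `flag` and then `data`. The model is a deliberate simplification and an assumption of this sketch: "ordered" keeps each thread’s operations in program order, while "weak" lets each thread’s operations happen in any order. Real TSO also lets a store be delayed past a later load, which does not matter for this particular test.

```python
from itertools import permutations

# Writer publishes data, then raises a flag. Reader checks the flag, then reads data.
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag"), ("load", "data")]


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(allow_reordering):
    """Return every (flag, data) pair the reader can observe under the toy model."""
    if allow_reordering:
        writer_orders = list(permutations(WRITER))
        reader_orders = list(permutations(READER))
    else:
        writer_orders = [tuple(WRITER)]
        reader_orders = [tuple(READER)]

    seen = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(list(w), list(r)):
                memory = {"data": 0, "flag": 0}
                observed = {}
                for op in schedule:
                    if op[0] == "store":
                        memory[op[1]] = op[2]
                    else:
                        observed[op[1]] = memory[op[1]]
                seen.add((observed["flag"], observed["data"]))
    return seen


print(sorted(outcomes(allow_reordering=False)))  # [(0, 0), (0, 1), (1, 1)]
print(sorted(outcomes(allow_reordering=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The interesting outcome is `(1, 0)`: the reader sees the flag but stale data. It never appears in the ordered model and does appear in the weak one, which is exactly the behavior x86 code was never written to survive.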

The Problem

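Emulating TSO purely in software on a weakly ordered CPU means constraining almost every load and store, for example by inserting memory barriers (DMB instructions) around them. This kills performance: losses of up to 40%, depending on how memory-heavy the workload is.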

Apple’s Solution: Hardware TSO

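Apple added a custom hardware mode to the M1/M2/M3 chips. When a core runs a thread that belongs to a Rosetta process, it is switched into TSO Mode and enforces x86-style memory ordering in hardware, so the translated code needs no extra barriers. Native ARM64 processes keep the faster weak ordering.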
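A rough way to see why this matters is to count instructions. The Python sketch below is a toy lowering pass, not Rosetta’s translator: the `lower` function and the abstract `load`/`store` operations are hypothetical, and putting a `dmb ish` after every memory access is the conservative scheme described above, assumed here for illustration.

```python
# Abstract memory operations and the ARM64 mnemonic each one lowers to.
MEMORY_OPS = {"load": "ldr", "store": "str"}


def lower(ops, hardware_tso):
    """Lower abstract ops to ARM64-style mnemonics, adding barriers if needed."""
    out = []
    for op in ops:
        if op in MEMORY_OPS:
            out.append(MEMORY_OPS[op])
            if not hardware_tso:
                # Software-only TSO: fence every memory access.
                out.append("dmb ish")
        else:
            # Non-memory instructions pass through unchanged.
            out.append(op)
    return out


block = ["load", "add", "store", "load", "store"]
with_barriers = lower(block, hardware_tso=False)
without_barriers = lower(block, hardware_tso=True)

print(len(with_barriers), len(without_barriers))  # 9 5
```

Even in this tiny block the barrier version is almost twice as long, and barriers are far from free at runtime. With hardware TSO the translator can emit plain loads and stores.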


3. Register Mapping

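x86_64 has 16 general-purpose registers and ARM64 has 31, which makes mapping easy: every x86 register can live in its own ARM register, with registers left over for the translator’s own use.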
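As a quick illustration, a one-to-one mapping fits with room to spare. The specific assignment below (`rax` to `x0`, `rcx` to `x1`, and so on) is made up for this sketch and is not Rosetta’s actual mapping; a real translator also has to steer clear of registers the platform reserves.

```python
# The 16 x86_64 general-purpose registers, in their encoding order.
X86_GPRS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]

# The 31 ARM64 general-purpose registers, x0 through x30.
ARM_GPRS = [f"x{n}" for n in range(31)]

# Hypothetical fixed assignment: pair the registers up in list order.
REG_MAP = dict(zip(X86_GPRS, ARM_GPRS))

# Whatever is left over can serve as scratch space for the translator.
SPARE = ARM_GPRS[len(X86_GPRS):]

print(len(REG_MAP), len(SPARE))  # 16 15
```

Because the mapping is fixed, translated code never has to spill guest registers to memory just to emulate an instruction.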

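Flags are harder. Almost every x86 arithmetic instruction updates the status flags (ZF, SF, CF, OF) as a side effect, and ARM’s own condition flags do not always mean the same thing (after a subtraction, for example, the carry flag has the opposite sense). Rosetta has to keep the x86 view of the flags available to the translated code, and doing that after every instruction is expensive. So Rosetta performs Flag Elision: it analyzes the code to see whether the flags an instruction produces are actually read before they are overwritten. If not, it skips the calculation.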
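Flag elision is a classic liveness analysis, and a minimal Python sketch fits in a few lines. This is my own simplified version, not Rosetta’s algorithm: it treats the flags as a single unit (real x86 instructions write different subsets), it handles one straight-line block only, and the `flags_needed` name and tuple layout are hypothetical.

```python
def flags_needed(block, live_out=True):
    """For each instruction, say whether its flag results must be computed.

    block is a list of (name, reads_flags, writes_flags) tuples.
    live_out says whether the flags may be read after the block ends;
    True is the conservative choice when we do not know.
    """
    needed = [False] * len(block)
    live = live_out
    # Walk backwards: the flags are "live" if a later instruction reads them
    # before another instruction overwrites them.
    for i in range(len(block) - 1, -1, -1):
        _name, reads, writes = block[i]
        if writes:
            needed[i] = live  # only worth computing if someone reads them
            live = False      # this write hides all earlier writes
        if reads:
            live = True       # this instruction needs flags from before it
    return needed


block = [
    ("add", False, True),   # writes flags, but sub overwrites them
    ("sub", False, True),   # writes flags, but cmp overwrites them
    ("cmp", False, True),   # writes flags that jne reads
    ("jne", True, False),   # reads flags
]

print(flags_needed(block))  # [False, False, True, False]
```

Only the `cmp` has to produce real flag values here. The `add` and `sub` can be translated as plain arithmetic, which is where the savings come from.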


4. Handling 4KB Pages

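macOS on Apple Silicon uses 16KB memory pages for native processes, while x86_64 software was written for 4KB pages. If an app relies on mmap with 4KB alignment (say, a fixed mapping at an address that is a multiple of 4KB but not of 16KB), it would break on a strictly 16KB system. Solution: the M1’s MMU supports 4KB pages as well as 16KB pages, so 4KB page granularity can be enforced for specific processes, maintaining compatibility at the MMU level.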
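The arithmetic behind the problem is short enough to show. The helper names below are mine, and the address `0x5000` is just an example value.

```python
PAGE_4K = 4 * 1024    # 0x1000, the page size x86_64 software expects
PAGE_16K = 16 * 1024  # 0x4000, the page size of native Apple Silicon processes


def is_page_aligned(addr, page_size):
    """True if addr sits exactly on a page boundary."""
    return addr % page_size == 0


def round_up(addr, page_size):
    """Smallest page boundary that is greater than or equal to addr."""
    return (addr + page_size - 1) // page_size * page_size


addr = 0x5000  # a perfectly normal page boundary from an x86 program's view

print(is_page_aligned(addr, PAGE_4K))   # True
print(is_page_aligned(addr, PAGE_16K))  # False
print(hex(round_up(addr, PAGE_16K)))    # 0x8000
```

An x86 program that insists on mapping something at exactly `0x5000` cannot be satisfied with 16KB pages, and quietly moving the mapping to `0x8000` would break any code that hardcoded the original address.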


Conclusion

Rosetta 2 isn’t just software. It’s a feature of the M-series chips. By modifying the silicon to accommodate the old architecture (TSO, 4KB pages), Apple solved the hardest problems of emulation in hardware, leaving the software to do what it does best: static translation.

