Java Memory Model (JMM)

2024-02-01

cs-base

java

doc

meeting

multi-prog

The JMM (Java Memory Model) primarily defines when a write to a shared variable by one thread becomes visible to another thread that reads that variable.

To understand the JMM thoroughly, we first need to look at the CPU cache model and instruction reordering.

Starting from CPU cache models#

Why do we need a CPU cache? Consider the caches we use in backend systems (Redis, for example): they exist because the application processes data much faster than it can fetch that data from a conventional relational database. The CPU cache exists for the same reason one level down: the CPU processes data much faster than main memory can supply it.

We can even think of memory as a cache for external storage; during program execution we copy data from external storage into memory, and since memory is much faster than external storage, this speeds up processing.

Summary: the CPU cache holds copies of memory data to bridge the speed gap between the CPU and main memory; main memory in turn holds copies of disk data to hide slow disk access.

Modern CPUs typically have three levels of cache, called L1, L2, L3 cache. Some CPUs may also have an L4 cache.

How the CPU cache works: data is first copied from main memory into the cache; the CPU reads it from the cache when needed and, once the computation finishes, writes the result back to main memory. This creates a cache inconsistency problem. Suppose two threads each execute i++ once, starting from i = 1. If both read i = 1 from their own cache, each computes 2 and writes 2 back to main memory, so i ends up as 2 although the correct result is 3.
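The lost update can be replayed deterministically. The sketch below is only a simulation: plain local variables stand in for main memory and the two caches, and the names `LostUpdateDemo`, `lostUpdate`, and `serialUpdate` are made up for this illustration. It is not real multithreaded code.

```java
// Deterministic simulation of the lost update described above.
// Local variables stand in for main memory and two per-core caches.
final class LostUpdateDemo {

    /** Replays the interleaving where both "threads" read before either writes back. */
    static int lostUpdate(int initial) {
        int mainMemory = initial;

        int cacheA = mainMemory;   // thread A copies i into its cache
        int cacheB = mainMemory;   // thread B copies the same value

        cacheA = cacheA + 1;       // A computes i + 1
        cacheB = cacheB + 1;       // B computes i + 1 from the same starting value

        mainMemory = cacheA;       // A writes back
        mainMemory = cacheB;       // B overwrites A's result with an equal value
        return mainMemory;
    }

    /** Replays the serial order, where B reads only after A has written back. */
    static int serialUpdate(int initial) {
        int mainMemory = initial;
        int cacheA = mainMemory;   // A reads
        mainMemory = cacheA + 1;   // A writes back
        int cacheB = mainMemory;   // B reads A's result
        mainMemory = cacheB + 1;   // B writes back
        return mainMemory;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + lostUpdate(1));  // prints "interleaved: 2"
        System.out.println("serial: " + serialUpdate(1));     // prints "serial: 3"
    }
}
```

The only difference between the two methods is the point at which the second read happens, which is exactly what real threads cannot control without synchronization.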

To solve this inconsistency problem, CPUs use a cache coherence protocol (for example, the MESI protocol) or other means. A cache coherence protocol is the set of rules that CPU caches must follow when they interact with main memory. Different CPUs may use different cache coherence protocols.

Our programs run on top of an operating system, which hides the low-level hardware details and virtualizes hardware resources. As a result, the operating system has to deal with the same inconsistency problem.

The operating system addresses it by defining a set of rules called a memory model. Windows and Linux each have their own memory model.

Instruction reordering#

After discussing the CPU cache model, let’s look at another important concept: instruction reordering.

To improve execution speed/performance, computers may reorder instructions when executing code.

What is instruction reordering? Simply put, the system does not necessarily execute code exactly in the order you wrote.

There are two common situations of instruction reordering:

Compiler optimization reordering: the compiler (including the JVM, JIT compilers, etc.) rearranges the order of statements without changing the semantics of a single-threaded program.
Instruction-level parallel reordering: modern processors use instruction-level parallelism (ILP) to overlap the execution of multiple instructions. If there is no data dependency, the processor can change the execution order of the machine instructions that correspond to the statements.

Additionally, the memory system may also “reorder,” but not in the strict sense of real reordering. In the JMM this is manifested as possible inconsistencies between the main memory and local memory, which can lead to issues when programs run across multiple threads.

Java source code goes through a process of compiler optimization reordering → instruction-level parallelism reordering → memory system reordering, eventually becoming the executable instruction sequence for the operating system.

Instruction reordering can preserve serial semantics, but there is no obligation to preserve semantics across multiple threads, so in multithreading, instruction reordering may cause some issues.

Reordering by compilers and reordering by processors are restricted in different ways. For compilers, the JMM forbids specific kinds of compiler reordering. For processors, memory barriers (also called memory fences) are inserted to forbid specific kinds of processor reordering. Instruction-level parallel reordering and memory system reordering both count as processor-level reordering.

A memory barrier is a CPU instruction that prevents the processor from reordering memory operations across it (it acts like a barrier), which guarantees ordering. To achieve this, it also forces the processor to synchronize its cache with main memory and to drain its invalidate queue before later reads and writes proceed, which guarantees the visibility of variables.

JMM (Java Memory Model)#

What is the JMM? Why do we need the JMM?#

Java was one of the first programming languages to attempt to provide a memory model. Because the early memory model had flaws (for example, it could easily weaken compiler optimizations), Java 5 introduced a new memory model, specified in JSR-133: Java Memory Model and Thread Specification.

Generally, programming languages can reuse the OS memory model directly. However, different operating systems have different memory models. If you reuse the OS memory model directly, the same code might not run on a different OS. Java is cross-platform and thus needs its own memory model to shield system differences.

This is just one reason JMM exists. In fact, for Java, you can think of JMM as a set of specifications defined for concurrent programming. Besides abstracting the relationship between threads and main memory, it also prescribes which concurrency-related principles and rules must be followed in the transformation from Java source code to CPU-executable instructions, with the main goal of simplifying multithreaded programming and enhancing portability.

Why follow these concurrency-related principles and specifications? Because in concurrent programming, designs like CPU multi-level caches and instruction reordering can cause execution issues. For example, the instruction reordering mentioned above may cause problems in multithreaded programs; to address this, the JMM abstracts the happens-before principle (which will be described in detail later).

In short, the JMM defines a set of rules to address these problems, allowing developers to use these rules to develop multithreaded programs more easily. For Java developers, you don’t need to understand the underlying principles; just use some concurrency-related keywords and classes (such as volatile, synchronized, various Locks) to develop thread-safe programs.

How does the JMM abstract the relationship between threads and main memory?#

The Java Memory Model (JMM) abstracts the relationship between threads and main memory. For example, variables shared between threads must reside in main memory.

Before JDK 1.2, the Java memory model implementation always read variables from main memory (that is, shared memory), so no special care was needed. Under the current Java memory model, a thread can keep a variable in local memory (for example, in machine registers) rather than reading and writing it directly in main memory. One thread may therefore modify a variable in main memory while another thread keeps using the copy held in its registers, which leads to inconsistent data.

This is very similar to the CPU cache model we discussed above.

What is main memory? What is local memory?

Main memory: all object instances created by threads are stored in main memory, regardless of whether the instance is referenced by a member variable or a local variable. Class information, constants, and static variables are also stored in main memory. To run faster, the VM and the hardware may keep working memory in registers and caches.
Local memory: Each thread has a private local memory that stores copies of shared variables for that thread’s reads/writes. Each thread can only operate on its own local memory and cannot directly access other threads’ local memory. If threads need to communicate, they must go through main memory. Local memory is an abstract concept in the JMM; it does not physically exist. It encompasses caches, write buffers, registers, and other hardware and compiler optimizations.

The abstract diagram of the Java Memory Model is as follows:

From the diagram above, if thread 1 and thread 2 want to communicate, they must go through the following two steps:

Thread 1 synchronizes the value of the modified shared variable copy from local memory back to main memory.
Thread 2 reads the corresponding shared variable value from main memory.

That is to say, the JMM provides visibility guarantees for shared variables.

However, in multithreading, operating on a shared variable in main memory can potentially cause thread-safety issues. For example:

Thread 1 and Thread 2 operate on the same shared variable, one performs a modification, the other reads.
Thread 2 might read the value before Thread 1’s modification or after; it’s not certain, because both threads first copy the shared variable from main memory into their working memories.

Regarding the specific interaction protocol between main memory and working memory, i.e., how a variable is copied from main memory to working memory and how it is synchronized back to main memory, the Java Memory Model defines eight synchronization operations (understand them; no need to memorize):

lock: applies to a variable in main memory and marks it as exclusively owned by one thread.
unlock: applies to a variable in main memory and releases its locked state; only a released variable can be locked by another thread.
read: applies to a variable in main memory; it transfers the variable's value from main memory to the thread's working memory so that the subsequent load can use it.
load: applies to a variable in working memory; it places the value obtained by read into the working-memory copy of the variable.
use: applies to a variable in working memory; it passes the variable's value to the execution engine. The VM performs this operation whenever it encounters an instruction that uses the variable's value.
assign: applies to a variable in working memory; it assigns a value received from the execution engine to the working-memory variable. The VM performs this operation whenever it encounters a bytecode instruction that assigns to the variable.
store: applies to a variable in working memory; it transfers the variable's value to main memory so that the subsequent write can use it.
write: applies to a variable in main memory; it places the value obtained by store into the main-memory variable.
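As a rough illustration, the six data-moving operations can be traced for a single `i = i + 1`. This is a toy model under stated assumptions: the `MAIN` map and the per-call `working` map are hypothetical stand-ins for main memory and working memory, not JVM APIs, and lock/unlock are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of read/load/use/assign/store/write for "i = i + 1".
// MAIN and "working" are illustrative stand-ins, not real JVM structures.
final class SyncOperationsSketch {
    static final Map<String, Integer> MAIN = new HashMap<>();

    static int incrementThroughWorkingMemory(String name) {
        Map<String, Integer> working = new HashMap<>();

        int transferred = MAIN.get(name);   // read: value leaves main memory
        working.put(name, transferred);     // load: value lands in the working copy

        int operand = working.get(name);    // use: working copy -> execution engine
        int result = operand + 1;           // the execution engine computes i + 1

        working.put(name, result);          // assign: engine result -> working copy

        int outgoing = working.get(name);   // store: value leaves working memory
        MAIN.put(name, outgoing);           // write: value lands in main memory
        return MAIN.get(name);
    }

    public static void main(String[] args) {
        MAIN.put("i", 1);
        System.out.println(incrementThroughWorkingMemory("i")); // prints 2
    }
}
```

Nothing in this sequence stops another thread from running its own read and load between this thread's load and write, which is the window in which stale values are observed.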

In addition to these eight synchronization operations, the following synchronization rules are specified to ensure the correct execution of these synchronization operations (understand them; no need to memorize):

A thread is not allowed to synchronize data from its working memory back to main memory without any reason (without any assign operation).
A new variable can only be "born" in main memory; working memory may not directly use a variable that has not been initialized by load or assign. In other words, before use or store is performed on a variable, the corresponding load or assign must already have been performed.
A variable can be locked by only one thread at the same moment, but a lock operation can be repeated by the same thread multiple times; after performing lock multiple times, only the same number of unlock operations will unlock the variable.
If you perform a lock on a variable, the value of this variable in working memory will be cleared; before the execution engine uses this variable, you need to re-execute load or assign to initialize the variable’s value.
If a variable has not been locked beforehand, unlock may not be performed on it, and a thread may not unlock a variable that is locked by another thread.

What is the difference between Java memory regions and the JMM?#

Java memory regions and the memory model are two completely different things:

JVM memory structure relates to the runtime areas of the Java Virtual Machine and defines how the JVM partitions and stores program data at runtime; for example, the heap is primarily used to hold object instances.
Java Memory Model relates to Java’s concurrency programming; it abstracts the relationship between threads and main memory and defines the rules and principles to follow when converting from Java source code to CPU-executable instructions, with the aim of simplifying multithreaded programming and improving portability.

What is happens-before?#

The concept of happens-before originated in Leslie Lamport’s 1978 paper “Time, Clocks and the Ordering of Events in a Distributed System”. In this paper, Lamport introduced the concept of logical clocks, which became the first logical clock algorithm. In distributed environments, a set of rules defines the evolution of logical clocks, allowing the ordering of events in a distributed system to be determined by the logical clocks. Logical clocks do not measure time per se; they only distinguish the order of events; in essence, they define a happens-before relationship.

The background of the happens-before concept’s birth mentioned above is not the focus; a quick understanding will do.

JSR 133 introduces the concept of happens-before to describe memory visibility between two operations.

Why is the happens-before principle needed? It was introduced to strike a balance between programmers on one side and compilers and processors on the other. Programmers want a strong memory model that is easy to understand and program against, so that following a fixed set of rules is enough. Compilers and processors want a weak memory model with few constraints, so that they can optimize as aggressively as possible. The design idea behind happens-before is simple:

To minimize constraints on compilers and processors as much as possible, as long as the program’s execution result does not change (single-threaded programs and correctly executed multithreaded programs), compilers and processors can reorder as they please.
For reorderings that would change the program's execution result, the JMM requires compilers and processors to prohibit them.

The following diagram is from the book The Art of Java Concurrency Programming, illustrating the JMM design philosophy.

After understanding the design idea of the happens-before principle, let’s look at JSR-133’s definition of happens-before:

If one operation happens-before another operation, then the result of the first operation is visible to the second operation, and the first operation is ordered before the second.
If there is a happens-before relationship between two operations, it does not mean that the Java platform’s specific implementation must execute them in the exact order specified by happens-before. If the result after reordering is the same as the result when executed according to the happens-before relationship, the JMM also allows such reordering.

Consider the following code:

```java
int userNum = getUserNum();   // 1
int teacherNum = getTeacherNum();   // 2
int totalNum = userNum + teacherNum;  // 3
```

1 happens-before 2
2 happens-before 3
1 happens-before 3

Although 1 happens-before 2, swapping 1 and 2 does not affect the result of the code, so the JMM allows the compiler and the processor to reorder them. Statement 3, however, depends on the results of both, so 1 and 2 must execute before 3; that is, both 1 and 2 happen-before 3.

The meaning of the happens-before principle is not just about one operation occurring before another; more accurately, it expresses that the result of the preceding operation is visible to the following operation, regardless of whether they are in the same thread.

For example: Operation 1 happens-before Operation 2; even if Operation 1 and Operation 2 are not in the same thread, the JMM will guarantee that the result of Operation 1 is visible to Operation 2.

What are the common happens-before rules?#

There are eight happens-before rules in total. That is not many, but the five listed below are the ones to focus on. Memorizing all of them is not worth the effort, since they are soon forgotten; look them up when needed.

Program order rule: within a single thread, an operation that comes earlier in code order happens-before an operation that comes later;
Unlock rule: an unlock of a lock happens-before every subsequent lock of that same lock;
Volatile variable rule: a write to a volatile variable happens-before every subsequent read of that same volatile variable. In short, the result of a write to a volatile variable is visible to any operation that comes after it;
Transitivity rule: if A happens-before B, and B happens-before C, then A happens-before C;
Thread start rule: a call to start() on a Thread object happens-before every action in the started thread.

If two operations do not satisfy any of the above happens-before rules, there is no ordering guarantee, and the JVM may reorder these two operations.
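The rules compose. The sketch below, with the made-up class name `StartJoinDemo`, combines the program order rule, the thread start rule, and transitivity. It also relies on the join rule, which is one of the three rules not listed above: all actions in a thread happen-before another thread's successful return from join() on that thread. No field here is volatile and no lock is taken.

```java
// Sketch: program order + thread start rule + transitivity, then the join rule.
final class StartJoinDemo {
    static int config;   // plain field: no volatile, no lock
    static int result;   // plain field

    static int run() {
        config = 21;                          // (1) written before start()
        Thread worker = new Thread(() -> {
            result = config * 2;              // (2) guaranteed to read 21
        });
        worker.start();                       // (1) hb start(), start() hb (2)
        try {
            worker.join();                    // (2) hb the return of join()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;                        // (3) guaranteed to read 42
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

If the read at (3) were moved before the call to join(), no rule would connect it to the write at (2), and the value read would no longer be guaranteed.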

What is the relationship between happens-before and the JMM?#

The relationship between happens-before and the JMM can be very well explained with a diagram from The Art of Java Concurrency Programming.

Three important properties of concurrent programming#

Atomicity#

An operation or a group of operations must either all complete and not be interrupted by any factor, or none of them execute.

In Java, atomicity can be achieved with synchronized, various Locks, and atomic classes.

synchronized and various Locks guarantee that at any moment only one thread can access the code block, thus providing atomicity. Atomic classes use CAS (compare-and-swap) operations (and may also use volatile or final keywords) to guarantee atomic operations.
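A minimal sketch of both approaches, assuming four threads that each increment a counter 10,000 times. The class name `AtomicityDemo` and the constants are illustrative; `AtomicInteger.incrementAndGet()` is the standard CAS-based increment from `java.util.concurrent.atomic`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two ways to make "count++" atomic: a synchronized method and an AtomicInteger.
final class AtomicityDemo {
    private static final int THREADS = 4;
    private static final int INCREMENTS = 10_000;

    private int lockedCount;                                    // guarded by "this"
    private final AtomicInteger casCount = new AtomicInteger(); // updated via CAS

    private synchronized void incrementLocked() { lockedCount++; }
    private synchronized int lockedCount() { return lockedCount; }

    /** Returns {total from the synchronized counter, total from the AtomicInteger}. */
    static int[] run() {
        AtomicityDemo demo = new AtomicityDemo();
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < INCREMENTS; n++) {
                    demo.incrementLocked();
                    demo.casCount.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { demo.lockedCount(), demo.casCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run();
        System.out.println(totals[0] + " " + totals[1]); // prints "40000 40000"
    }
}
```

A plain `int` field incremented with `count++` and no lock could finish below 40,000, because the read, add, and write steps of different threads can interleave.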

Visibility#

When one thread modifies a shared variable, other threads can immediately see the updated value.

In Java, visibility can be achieved with synchronized, volatile, and various Locks.

If we declare a variable as volatile, we tell the JVM that the variable is shared and may change at any time, so every use of it must read the value from main memory.
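A minimal sketch of a volatile flag used to publish a value to another thread. The names `VisibilityDemo`, `ready`, and `payload` are illustrative; `Thread.onSpinWait()` requires Java 9 or later.

```java
// A volatile flag makes the writer's update visible to a spinning reader.
final class VisibilityDemo {
    private static volatile boolean ready;  // volatile: the reader must observe the write
    private static int payload;             // plain field, published by the volatile write

    static int run() {
        ready = false;
        payload = 0;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.onSpinWait();        // busy-wait hint to the runtime
            }
            seen[0] = payload;              // the payload write happens-before this read
        });
        reader.start();

        payload = 42;                       // ordinary write
        ready = true;                       // volatile write publishes payload as well

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }
}
```

Without volatile on `ready`, the JMM gives no guarantee that the reader ever sees the update, so the loop may spin indefinitely on some JVMs.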

Ordering#

Because of instruction reordering, the execution order of code may not be the same as the order in which it was written.

We mentioned when discussing reordering:

Instruction reordering can preserve serial semantics, but there’s no obligation to preserve semantics across multithreading, so in multithreaded contexts, instruction reordering may cause problems.

In Java, the volatile keyword can prevent instruction reordering optimizations.
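A well-known case where this matters is double-checked locking. The sketch below uses a hypothetical `LazySingleton` class; the field `value` is deliberately non-final so that the example relies on volatile alone rather than on final-field semantics.

```java
// Double-checked locking: volatile keeps "publish the reference" from being
// reordered ahead of "finish running the constructor".
final class LazySingleton {
    private static volatile LazySingleton instance;
    private int value;                       // non-final on purpose, see lead-in

    private LazySingleton() {
        this.value = 42;
    }

    static LazySingleton getInstance() {
        LazySingleton local = instance;      // first check, no lock taken
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;            // second check, under the lock
                if (local == null) {
                    local = new LazySingleton();
                    instance = local;        // volatile write publishes the object
                }
            }
        }
        return local;
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // prints true
        System.out.println(getInstance().value());          // prints 42
    }
}
```

If `instance` were not volatile, a thread on the lock-free fast path could obtain a non-null reference to an object whose constructor writes are not yet visible to it.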

7136 字

19 分钟

Java JMMメモリモデル

2024-02-01

cs-base

java

doc

meeting

multi-prog

JMM(Java メモリモデル)#

JMM(Java 内存模型)は、共有変数に対して、別のスレッドがその共有変数へ書き込みを行った後、その変数がそのスレッドに対してどのように可視であるかを定義します。

JMM（Java メモリモデル）を徹底的に理解するには、まず**CPU キャッシュモデルと命令再配置（リオーダリング）**から始める必要があります！

CPU キャッシュモデルから始める#

なぜCPUの高速キャッシュを持つのか？ Webサイトのバックエンドシステムで使うキャッシュ（例えば Redis）を例に挙げると、プログラムの処理速度と従来の関係型データベースのアクセス速度の差を解消するためです。CPUキャッシュは、CPUの処理速度とメモリの処理速度の不一致を解消するためのものです。

私たちはしばしばメモリを外部記憶装置の高速キャッシュとして見ることができます。プログラム実行時に外部記憶のデータをメモリにコピーします。メモリの処理速度は外部記憶より遥かに速いため、処理速度が向上します。

結論：CPUキャッシュはメモリデータをキャッシュしてCPUの処理速度とメモリの不一致を解消するため、メモリキャッシュはハードディスクデータをキャッシュしてディスクアクセス速度の遅さを解消するためのものです。

現代の CPU Cache は通常3層に分かれており、それぞれL1、L2、L3 Cache と呼ばれます。中には L4 Cache を持つ CPU もあります。

CPU Cache の動作原理： 最初にデータを CPU Cache にコピーします。CPU がデータを必要とするときは CPU Cache から直接読み取り、演算が終わったら演算結果を Main Memory に書き戻します。しかし、これにはメモリキャッシュの不整合性の問題が生じます。例えば、私が i++ を実行した場合、2つのスレッドが同時に実行すると、両方のスレッドが CPU Cache から i=1 を読み取り、両方が i++ を実行して Main Memory へ書き戻すと i=2 になってしまい、正しい結果は i=3 になることもあります。

CPU はこのメモリキャッシュ不整合問題を解決するため、キャッシュ一貫性プロトコル（例えば MESI プロトコル）などを用います。 このキャッシュ一貫性プロトコルは、CPU の高速キャッシュと主メモリの間のやり取りを行う際に従うべき原則・規約を指します。CPU によって使用されるキャッシュ一貫性プロトコルは通常異なります。

私たちのプログラムは OS の上で動作しており、OS は下位ハードウェアの操作の詳細を隠蔽し、さまざまなハードウェア資源を仮想化します。したがって、OS も同様にメモリキャッシュの不整合問題を解決する必要があります。

OS は Memory Model（Memory Model）によって一連の規範を定義してこの問題を解決します。Windows でも Linux でも、それぞれ特有のメモリモデルがあります。

命令再配置#

CPU キャッシュモデルの話を終えたら、次に重要な概念である**命令再配置（リオーダリング）**を見ていきます。

実行速度/性能を向上させるため、コンピュータはプログラムコードを実行する際に命令を再配置します。

命令再配置とは何か？ 簡単に言えば、コードを実行する際、書いた順序どおりに逐次実行されるとは限りません。

よくある命令再配置には次の2つがあります：

コンパイラ最適化再配置：コンパイラ（JVM、JIT コンパイラなどを含む）は、単一スレッドのプログラムの意味を変更しない前提のもと、文の実行順序を再配置します。
命令並列再配置：現代のプロセッサは命令レベル並列性（ILP）を用いて複数の命令を重複実行します。データ依存性がなければ、文に対応する機械命令の実行順序を変更できます。

また、メモリシステムにも「再配置」が発生しますが、それは厳密には再配置とは言えません。JMM では、主メモリとローカルメモリの内容が一致しない可能性があり、これがマルチスレッドの実行に問題を生じさせます。

Java のソースコードはコンパイラ最適化再配置 → 命令並列再配置 → メモリシステム再配置の順に経て、最終的に OS が実行可能な命令列へと変換されます。

命令再配置はシリアル意味論の一貫性を保証しますが、マルチスレッド間の意味論が必ずしも一貫することを保証する義務はありません。 したがって、マルチスレッドでは命令再配置が問題を引き起こすことがあります。

コンパイラとプロセッサの命令再配置の扱いは異なります。コンパイラの場合、特定のタイプの再配置を禁止することで再配置を抑制します。プロセッサの場合、Memory Barrier（メモリバリア、時には Memory Fence）を挿入して特定の種類のプロセッサ再配置を禁止します。命令並列再配置とメモリシステム再配置は、いずれもプロセサレベルの命令再配置に該当します。

Memory Barrier（メモリ障壁、時には Memory Fence）とは、CPU の命令で、プロセッサの命令の再配置を禁止し、命令の実行の有順序性を保証します。さらに、屏障の効果を得るために、書き込み/読み取りの前に主メモリの値を高速キャッシュへ書き込み、無効なキューをクリアして、変数の可視性を保証します。

JMM(Java Memory Model)#

何が JMM か？なぜ JMM が必要か？#

Java はメモリモデルを提供することを試みた最も早いプログラミング言語です。初期のメモリモデルには欠陥があり（特にコンパイラの最適化を弱くする要因になりやすい点など）、Java5 以降、Java は新しいメモリモデル《JSR-133：Java Memory Model and Thread Specification》を採用しました。

一般には、プログラミング言語はOSレイヤのメモリモデルを直接再利用することもできます。しかし、OSごとにメモリモデルが異なる場合があり、同じコードが別のOSで動作しなくなる可能性があります。Java はクロスプラットフォーム言語であるため、システム差を吸収するためのメモリモデルを自ら提供します。

これは JMM が存在する理由の一つです。実際には、Java にとって JMM は、並列プログラミングに関連する一連の規範を定義するものと見なせます。スレッドと主内存の関係を抽象化するだけでなく、Java のソースコードから CPU が実行可能な命令へと変換される過程で従うべき並行性関連の原則・規範を定め、その主な目的はマルチスレッドプログラミングを簡素化し、プログラムの移植性を高めることです。

なぜこれらの並行関連の原則と規範を守るのか？ これは、並行プログラミングの下で、CPU の多段キャッシュや命令再配置といった設計がプログラムの実行に問題を引き起こす可能性があるためです。前述の命令再配置がマルチスレッドプログラムの実行を不適切にする可能性があるため、JMM は happens-before 原則（後述で詳説します）を抽象化してこの問題を解決します。

JMM は要するに、これらの問題を解決するための規範を定義しており、開発者はこの規範を活用してマルチスレッドプログラムをより容易に開発できます。Java の開発者にとっては、底層の原理を理解する必要はなく、直接、volatile、synchronized、さまざまな Lock など、並行性に関連するキーワードやクラスを使用して、並行安全なプログラムを開発できます。

JMM はどのようにスレッドと主内存の関係を抽象化するのか？#

Java メモリモデル（JMM） は、スレッドと主内存の関係を抽象化します。例えば、スレッド間で共有変数は主内存に格納されるべきです。

JDK1.2以前は、Java のメモリモデルの実装は常に「主存（共有メモリ）」から変数を読み取るだけで、特別な配慮は必要ありませんでした。しかし、現在の Java メモリモデルでは、スレッドは変数を「ローカルメモリ」（例えば機械のレジスタ）に保存することができ、主内存での直接の読み書きではありません。これにより、あるスレッドが主内存で変数の値を変更したとしても、別のスレッドがまだ自分のレジスタ内のコピーを使い続け、データの不整合を招く可能性があります。

これは前述の CPU キャッシュモデルと非常に似ています。

主内存とは何か？ローカル内存とは何か？

主内存：すべてのスレッドが作成するインスタンスオブジェクトは主内存に格納されます。メンバー変数、ローカル変数、クラス情報、定数、静的変数などは主内存に格納されます。高速化のため、仮想機械とハードウェアは作業メモリをレジスタや高速キャッシュに優先的に格納することがあります。
ローカル内存：各スレッドには私有のローカルメモリがあり、ここにはそのスレッドが共有変数を読み書きするコピーが格納されます。各スレッドは自分のローカル内存内の変数しか操作できず、他のスレッドのローカル内存には直接アクセスできません。スレッド間の通信が必要な場合は主内存を介します。ローカル内存は JMM が抽象的に示した概念で、実在しません。キャッシュ、書き込みバッファ、レジスタ、その他のハードウェア・コンパイラ最適化を含みます。

Java メモリモデルの抽象図は以下のとおりです：

上の図を見ると、スレッド1とスレッド2の間で通信を行う場合、以下の2つのステップを経る必要があります：

スレッド1 がローカルメモリで修正した共有変数のコピーの値を主内存に同期します。
スレッド2 が主メモリから対応する共有変数の値を読み取ります。

すなわち、JMM は共有変数の可視性を保証します。

ただし、マルチスレッドでは、主内存の共有変数を操作するとスレッドセーフの問題を引き起こすことがあります。例えば：

スレッド1 とスレッド2 が同じ共有変数を別々に操作し、1つは値を変更し、もう1つは読み取る。
スレッド2 が読み取る値が、スレッド1 が変更する前の値なのか、変更後の値なのかは不確実です。どちらも起こり得ます。なぜなら、スレッド1とスレッド2はともに、共有変数を主内存から自分のワークメモリにコピーしているからです。

主内存とワークメモリの具体的な相互作用プロトコル、すなわち変数を主内存からワークメモリへコピーし、ワークメモリから主内存へ同期する実装の詳細について、Java メモリモデルは以下の8つの同期操作を定義します（理解しておけば十分で、丸暗記は不要です）：

lock（ロック）：主内存上の変数に作用し、それをスレッドの独占変数としてマークします。
unlock（アンロック）：主内存上の変数に作用し、その変数のロック状態を解除します。ロックが解除された変数は他のスレッドによってロックされることができます。
read（読み取り）：主内存上の変数に作用し、その変数の値を主内存からスレッドのワークメモリへ転送して、後続の load 操作で使用します。
load（ロード）：read 操作で主内存から得た変数値を、ワークメモリ内の変数のコピーに置きます。
use（使用）：ワークメモリ内の変数の値を実行エンジンに渡します。仮想マシンが変数を使用する命令に遭遇するたびにこの操作を行います。
assign（代入）：ワークメモリの変数に作用し、実行エンジンから受け取った値をワークメモリの変数へ代入します。仮想マシンが変数へ値を代入するバイトコード指令に遭遇するたびにこの操作を実行します。
store（ストア）：ワークメモリの変数に作用し、ワークメモリの変数の値を主内存へ転送して、後続の write 操作で使用します。
write（書き込み）：主内存上の変数に作用し、store 操作でワークメモリから得た値を主内存の変数へ格納します。

この8つの同期操作に加え、これらの同期操作を正しく実行することを保証する以下の同期規則が規定されています（理解しておけば十分で、丸暗記は不要です）：

あるスレッドが、根拠なく（assign 操作が発生したことがない状態で）ワークメモリから主内 memory へデータを同期することは許されません。
新しい変数は主内メモリでのみ「誕生」し、ワークメモリ内で初期化されていない（load または assign されていない）変数を直接使用することは許されません。つまり、use と store を実行する前に assign と load を先に実行する必要があります。
同じ時刻に、1つの変数に対しては1つのスレッドのみが lock 操作を行えますが、lock 操作は同一スレッドで複数回実行可能です。複数回 lock した場合、同じ回数だけ unlock を実行するときにのみ変数のロックが解かれます。
もしある変数に lock 操作を実行すると、その変数のワークメモリにある値はクリアされます。実行エンジンがこの変数を使用する前に、再度 load か assign 操作を行って値を初期化する必要があります。
事前に lock 操作でロックされていない変数に対して unlock を実行することは許されません。また、他のスレッドによりロックされている変数を unlock することも許されません。

Java 内存区域と JMM の違いは？#

Java メモリ区域とメモリモデルは、全く別の2つの概念です。

JVM 内のメモリ構造は、Java 仮想マシンの実行時エリアに関連し、JVM が実行時にデータをどのように領域分割して格納するかを定義します。たとえば、ヒープはオブジェクトのインスタンスを主に格納します。
Java メモリモデルは Java の並行プログラミングと関連し、スレッドと主内 Memory の関係を抽象化します。例えば、スレッド間の共有変数は主内 Memory に格納されるべきと規定し、Java のソースコードから CPU が実行可能な命令へと変換される過程で従うべき並行性関連の原則・規範を定め、主な目的はマルチスレッドプログラミングを簡素化し、プログラムの可搬性を高めることです。

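What is happens-before?#

The concept of happens-before originates from Leslie Lamport's 1978 paper "Time, Clocks, and the Ordering of Events in a Distributed System". In that paper Lamport proposed the concept of a logical clock, which became the first logical clock algorithm. In a distributed environment, a set of rules defines how the logical clock advances, and the clock is then used to determine the order of events in the system. A logical clock does not measure time itself; it only distinguishes the order in which events occur, which in essence defines the happens-before relation.

The background above is not essential; a rough understanding is enough.

JSR-133 introduces the happens-before concept to describe memory visibility between two operations.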

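Why is the happens-before principle needed? It exists to balance the needs of programmers against those of compilers and processors. Programmers want a strong memory model that is easy to understand, so they can write code against fixed rules. Compilers and processors want a weak memory model with as few constraints as possible, so they can maximize performance. The design idea behind happens-before is very simple:

Constrain compilers and processors as little as possible: as long as the program's execution result does not change, they may apply whatever reordering optimizations they like.
Reorderings that would change the execution result are prohibited by the JMM.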

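The figure below is an example from the book "Java Concurrency in Practice".

With the design idea of happens-before in mind, JSR-133 defines happens-before as follows:

If one operation happens-before another, the result of the first operation is visible to the second, and the first operation is ordered before the second.
A happens-before relation between two operations does not mean that a concrete implementation of the Java platform must execute them in the order happens-before specifies. If the result after reordering is the same as the result of executing in happens-before order, the JMM permits that reordering.

Consider the following code: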

int userNum = getUserNum();   // 1
int teacherNum = getTeacherNum();   // 2
int totalNum = userNum + teacherNum;  // 3

1 happens-before 2
2 happens-before 3
1 happens-before 3

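Although 1 happens-before 2, as long as reordering 1 and 2 does not affect the result of the code, the JMM allows the compiler and processor to perform that reordering. However, 1 and 2 must both execute before 3; that is, 1 and 2 each happen-before 3.

What the happens-before principle expresses is not that one operation occurs before another in time, but that the result of the first is visible to the second, regardless of whether the two operations are in the same thread.

For example, if operation 1 happens-before operation 2, the result of operation 1 is visible to operation 2 even when the two operations run in different threads.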

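What are the common happens-before rules? Share your understanding.#

There are eight happens-before rules; focus on the five below. Memorizing all of them is hard and they are quickly forgotten, so look up the rest when you need them.

Program order rule: within a single thread, an operation that comes earlier in code order happens-before an operation that comes later.
Unlock rule: an unlock happens-before a subsequent lock of the same lock.
volatile variable rule: a write to a volatile variable happens-before every subsequent read of that volatile variable. Put simply, the result of a write to a volatile variable is visible to any operation that follows it.
Transitivity rule: if A happens-before B and B happens-before C, then A happens-before C.
Thread start rule: a call to start() on a Thread object happens-before every action in the started thread.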

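If two operations do not satisfy any happens-before rule, and no happens-before relation can be derived for them, there is no ordering guarantee between them and the JVM may reorder them.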
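
The rules are easier to see when they are chained together. The following is a minimal sketch, with made-up names (HappensBeforeDemo, data, ready, publishAndRead), of how the program order rule, the volatile variable rule, and transitivity combine so that a plain field written before a volatile write is visible to a thread that has observed that write.

```java
// Hypothetical demo class for illustration only.
class HappensBeforeDemo {
    static int data = 0;                   // plain, non-volatile field
    static volatile boolean ready = false; // volatile flag used to publish data

    static int publishAndRead() {
        data = 0;
        ready = false;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {          // (3) volatile read, repeated until it sees true
                Thread.onSpinWait();
            }
            seen[0] = data;           // (4) plain read, after (3) in program order
        });
        // Thread start rule: everything above happens-before the reader's actions.
        reader.start();

        data = 42;    // (1) plain write
        ready = true; // (2) volatile write, after (1) in program order

        // (1) hb (2) by program order, (2) hb (3) by the volatile rule,
        // (3) hb (4) by program order, so (1) hb (4) by transitivity.
        try {
            // join() also establishes happens-before (one of the rules not
            // listed above), which makes reading seen[0] here safe.
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead());
    }
}
```

If ready were not volatile, none of the rules would order (1) before (4), and the reader would be allowed to see a stale value of data or to spin forever.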

What is the relationship between happens-before and the JMM?#

The relationship between happens-before and the JMM is fully captured by the figure below.

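Three important properties of concurrent programming, revisited#

Atomicity#

An operation, or a group of operations, either executes completely without being interrupted partway through, or does not execute at all.

In Java, atomicity can be achieved with synchronized, the various Lock implementations, and the atomic classes.

synchronized and the Lock implementations guarantee that only one thread at a time can enter the guarded block of code, and thereby guarantee atomicity. The atomic classes use CAS (compare-and-swap) operations to guarantee atomic updates (in some cases together with the volatile or final keyword).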
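
As a minimal sketch of both approaches, the hypothetical AtomicityDemo class below increments one counter under synchronized and another through AtomicInteger from several threads at once. The class, field, and method names are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class for illustration only.
class AtomicityDemo {
    private int syncCount = 0;                                  // guarded by "this"
    private final AtomicInteger atomicCount = new AtomicInteger();

    // Only one thread at a time can run this read-modify-write.
    synchronized void incrementSync() {
        syncCount++;
    }

    synchronized int getSyncCount() {
        return syncCount;
    }

    // incrementAndGet performs an atomic increment (CAS-based) without a lock.
    void incrementAtomic() {
        atomicCount.incrementAndGet();
    }

    // Returns {synchronized counter, atomic counter} after all threads finish.
    static int[] run(int threads, int perThread) {
        AtomicityDemo demo = new AtomicityDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.incrementSync();
                    demo.incrementAtomic();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { demo.getSyncCount(), demo.atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

Both counters end at threads × perThread. A plain, unsynchronized `count++` gives no such guarantee, because the read, the addition, and the write are separate steps that can interleave between threads.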

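Visibility#

When one thread modifies a shared variable, other threads can immediately see the latest value.

In Java, visibility can be achieved with synchronized, volatile, and the various Lock implementations.

Declaring a variable volatile tells the JVM that the variable is shared and that its value may change frequently, so every use of the variable reads it from main memory.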
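
A common use of this guarantee is a stop flag. The sketch below uses hypothetical names (VisibilityDemo, stopRequested, workerStops) and assumes that a generous timeout is enough for the worker thread to observe the write.

```java
// Hypothetical demo class for illustration only.
class VisibilityDemo {
    // volatile: a write by one thread is visible to later reads in other threads.
    private volatile boolean stopRequested = false;

    // Starts a worker that spins until the flag is set, then requests a stop.
    // Returns true if the worker terminated within the timeout.
    static boolean workerStops(long timeoutMillis) {
        VisibilityDemo demo = new VisibilityDemo();
        Thread worker = new Thread(() -> {
            while (!demo.stopRequested) { // re-reads the flag on every iteration
                Thread.onSpinWait();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();

        demo.stopRequested = true; // volatile write, visible to the worker

        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(workerStops(5_000));
    }
}
```

Without volatile, the JIT compiler would be allowed to hoist the read of stopRequested out of the loop, and the worker might never terminate.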

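Ordering#

Because of instruction reordering, the order in which code executes is not necessarily the order in which it was written.

As noted earlier in the discussion of instruction reordering:

Instruction reordering guarantees consistent serial semantics, but it has no obligation to keep semantics consistent across multiple threads.

In Java, the volatile keyword prohibits instruction reordering optimizations.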
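
The best-known place where this matters is double-checked locking. The following is a minimal sketch of that idiom, using a made-up LazySingleton class rather than code from the article. The volatile modifier on the instance field is what prevents the write of the reference from being reordered before the constructor has finished initializing the object.

```java
// Hypothetical class for illustration only.
class LazySingleton {
    // volatile forbids reordering "publish the reference" ahead of
    // "finish running the constructor".
    private static volatile LazySingleton instance;

    private int value; // deliberately not final, so volatile carries the guarantee

    private LazySingleton() {
        value = 42;
    }

    static LazySingleton getInstance() {
        LazySingleton local = instance;          // first check, no lock
        if (local == null) {
            synchronized (LazySingleton.class) {
                local = instance;                // second check, under the lock
                if (local == null) {
                    local = new LazySingleton();
                    instance = local;            // volatile write publishes the object
                }
            }
        }
        return local;
    }

    int getValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance());
        System.out.println(getInstance().getValue());
    }
}
```

Without volatile, another thread could pass the first null check, receive a reference to an object whose constructor has not yet completed, and read a default value from its fields.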

Java JMM Memory Model

https://dreaife.tokyo/posts/java-jmm-memory/

Author

dreaife

Published

2024-02-01

License

CC BY-NC-SA 4.0

Some information may be outdated
