The visibility problem

On multiprocessor systems (now common in the form of multicore processors, that is, multiple CPUs on a single chip), visibility rather than atomicity is much more of an issue than on single-processor systems. Changes made by one task, even if they’re atomic in the sense of not being interruptible, might not be visible to other tasks (the changes might be temporarily stored in a local processor cache, for example), so different tasks will have a different view of the application’s state. The synchronization mechanism, on the other hand, forces changes by one task on a multiprocessor system to be visible across the application. Without synchronization, it’s indeterminate when changes become visible.

The volatile keyword also ensures visibility across the application. If you declare a field to be volatile, this means that as soon as a write occurs for that field, all reads will see the change. This is true even if local caches are involved—volatile fields are immediately written through to main memory, and reads occur from main memory.
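The visibility guarantee can be sketched with a stop flag. This is a minimal illustration, not taken from the original text: the class name StopFlagDemo and the method workerStopsWithin are hypothetical. With the field declared volatile, the worker is guaranteed to see the write; without volatile, the loop might spin forever on some JVMs.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; all names here are illustrative assumptions.
class StopFlagDemo {
    // volatile: a write by one thread becomes visible to later reads in other threads.
    // If volatile were removed, the worker below might never observe the change.
    private static volatile boolean stop = false;

    // Starts a spinning worker, sets the flag, and reports whether the worker
    // terminated within the given timeout.
    static boolean workerStopsWithin(long timeoutMillis) {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait until the flag change becomes visible
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if the worker never stops
        worker.start();
        try {
            TimeUnit.MILLISECONDS.sleep(50); // give the worker time to start spinning
            stop = true;                     // volatile write
            worker.join(timeoutMillis);      // wait for the worker to notice
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            stop = true;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsWithin(2000));
    }
}
```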

The difference between visibility and atomicity

It’s important to understand that atomicity and volatility are distinct concepts. An atomic operation on a non-volatile field will not necessarily be flushed to main memory, and so another task that reads that field will not necessarily see the new value. If multiple tasks are accessing a field, that field should be volatile; otherwise, the field should only be accessed via synchronization. Synchronization also causes flushing to main memory, so if a field is completely guarded by synchronized methods or blocks, it is not necessary to make it volatile.
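To make the distinction concrete, here is a small sketch (the class CounterDemo and its methods are hypothetical names, not from the original text). One counter is volatile and incremented without locking; the other is a plain field guarded entirely by synchronized methods. The guarded counter always ends at the exact total, while the volatile counter can lose increments because visibility alone does not make ++ atomic.

```java
// Hypothetical demo class; all names here are illustrative assumptions.
class CounterDemo {
    private volatile int volatileCount = 0; // visible to all threads, but ++ is not atomic
    private int guardedCount = 0;           // non-volatile, only touched under synchronized

    void incrementVolatile() { volatileCount++; }            // read, add, write: can interleave
    synchronized void incrementGuarded() { guardedCount++; } // atomic and visible via the lock
    int volatileValue() { return volatileCount; }
    synchronized int guardedValue() { return guardedCount; }

    // Runs `threads` threads, each incrementing both counters `perThread` times.
    // Returns { volatile counter, guarded counter }.
    static int[] run(int threads, int perThread) {
        CounterDemo c = new CounterDemo();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    c.incrementVolatile();
                    c.incrementGuarded();
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { c.volatileValue(), c.guardedValue() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 100_000);
        System.out.println("volatile counter: " + r[0] + " (may be less than 400000)");
        System.out.println("guarded counter:  " + r[1]);
    }
}
```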

When volatile can be used

It’s typically only safe to use volatile instead of synchronized if the class has only one mutable field. Again, your first choice should be to use the synchronized keyword—that’s the safest approach, and trying to do anything else is risky.
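One way to see why the "only one mutable field" rule matters is a class with two related fields. The sketch below is an illustration under assumptions: the class NumberRange and the invariant lower <= upper are made up for this example. If both fields were volatile and the setters unsynchronized, two tasks could call setLower(8) and setUpper(3) at the same moment, both checks could pass against the old values, and the range would end up as (8, 3). Guarding the check and the write with synchronized closes that gap.

```java
// Hypothetical example class; the invariant lower <= upper is an assumption of this sketch.
class NumberRange {
    private int lower = 0;
    private int upper = 10;

    // The check and the write happen under one lock, so no other task can
    // change upper between them.
    synchronized void setLower(int value) {
        if (value > upper) {
            throw new IllegalArgumentException("lower " + value + " > upper " + upper);
        }
        lower = value;
    }

    synchronized void setUpper(int value) {
        if (value < lower) {
            throw new IllegalArgumentException("upper " + value + " < lower " + lower);
        }
        upper = value;
    }

    // Reads both fields under the same lock, so it always sees a consistent pair.
    synchronized boolean contains(int x) {
        return x >= lower && x <= upper;
    }

    public static void main(String[] args) {
        NumberRange r = new NumberRange();
        r.setLower(5);
        System.out.println(r.contains(4)); // 4 is now below the lower bound
        System.out.println(r.contains(7));
    }
}
```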

What qualifies as an atomic operation? Assignment and returning the value in a field will usually be atomic. However, in C++ even compound operations such as i++ or i += 3 might be atomic. In Java they are not, as the bytecode for Atom.java below shows.

How volatile is implemented in the source code

➜ cat Atom.java
public class Atom {
	volatile int i;
	void f1() {i++;}
	void f2() {i+=3;}
}

➜ javap -v Atom
Classfile /Users/nuc/tmp/Atom.class
  Last modified 2022-8-5; size 316 bytes
  MD5 checksum 2ccff8833c7994578e8f4e409da841a0
  Compiled from "Atom.java"
public class Atom
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#15         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#16         // Atom.i:I
   #3 = Class              #17            // Atom
   #4 = Class              #18            // java/lang/Object
   #5 = Utf8               i
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               f1
  #12 = Utf8               f2
  #13 = Utf8               SourceFile
  #14 = Utf8               Atom.java
  #15 = NameAndType        #7:#8          // "<init>":()V
  #16 = NameAndType        #5:#6          // i:I
  #17 = Utf8               Atom
  #18 = Utf8               java/lang/Object
{
  volatile int i;
    descriptor: I
    flags: ACC_VOLATILE

  public Atom();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 1: 0

  void f1();
    descriptor: ()V
    flags:
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: dup
         2: getfield      #2                  // Field i:I
         5: iconst_1
         6: iadd
         7: putfield      #2                  // Field i:I
        10: return
      LineNumberTable:
        line 3: 0

  void f2();
    descriptor: ()V
    flags:
    Code:
      stack=3, locals=1, args_size=1
         0: aload_0
         1: dup
         2: getfield      #2                  // Field i:I
         5: iconst_3
         6: iadd
         7: putfield      #2                  // Field i:I
        10: return
      LineNumberTable:
        line 4: 0
}
SourceFile: "Atom.java"

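As you can see, the volatile field is flagged ACC_VOLATILE and nothing else changes: no extra operations are inserted at the bytecode level. Both i++ and i+=3 still compile to the same three core steps, getfield, iadd and putfield, so neither is a single atomic operation.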
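The consequence of the three-step get/add/put sequence can be replayed deterministically on a single thread. The following sketch (LostUpdateSketch and replayInterleaving are hypothetical names) writes out by hand one possible interleaving of two tasks that each execute i++. The local variables a and b stand in for each task's operand stack.

```java
// Hypothetical demo class; it replays one interleaving by hand rather than using real threads.
class LostUpdateSketch {
    static volatile int i = 0;

    // Simulates tasks A and B each running i++, with B reading before A writes back.
    static int replayInterleaving() {
        i = 0;
        int a = i;   // task A: getfield (reads 0)
        int b = i;   // task B: getfield (also reads 0, A has not written yet)
        i = a + 1;   // task A: iadd + putfield (writes 1)
        i = b + 1;   // task B: iadd + putfield (writes 1 again, A's update is lost)
        return i;
    }

    public static void main(String[] args) {
        // Two increments were "executed", but the field only advanced by one.
        System.out.println(replayInterleaving()); // prints 1
    }
}
```

The field is volatile throughout, so every read sees the latest write. The update is lost anyway, because nothing stops the second read from happening between the first task's read and write.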

Now look at how HotSpot implements putfield:

//src/cpu/x86/vm/templateTable_x86_64.cpp
void TemplateTable::putfield(int byte_no) {
  putfield_or_static(byte_no, false);
}

void TemplateTable::putfield_or_static(int byte_no, bool is_static) {
  transition(vtos, vtos);
...
volatile_barrier(Assembler::Membar_mask_bits(Assembler::StoreLoad |
                                               Assembler::StoreStore));

void TemplateTable::volatile_barrier(Assembler::Membar_mask_bits order_constraint ) {
  // Helper function to insert a is-volatile test and memory barrier
  if( !os::is_MP() ) return;    // Not needed on single CPU
  __ membar(order_constraint);
}

// Serializes memory and blows flags
  void membar(Membar_mask_bits order_constraint) {
    if (os::is_MP()) {
      // We only have to handle StoreLoad
      if (order_constraint & StoreLoad) {
        // All usable chips support "locked" instructions which suffice
        // as barriers, and are much faster than the alternative of
        // using cpuid instruction. We use here a locked add [esp],0.
        // This is conveniently otherwise a no-op except for blowing
        // flags.
        // Any change to this code may need to revisit other places in
        // the code where this idiom is used, in particular the
        // orderAccess code.
        lock();
        addl(Address(rsp, 0), 0);// Assert the lock# signal here
      }
    }
  }

In the end, on this x86-64 port, volatile is implemented by emitting a lock addl instruction when the field is written (putfield).
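The StoreLoad constraint in the barrier mask above can be observed from Java with the classic store-buffering test. This is a sketch under assumptions: the class StoreLoadSketch and its fields are hypothetical names. Each thread writes one volatile field and then reads the other. Because volatile accesses are totally ordered under the Java memory model, at least one thread must see the other's write, so the outcome r1 == 0 and r2 == 0 should never occur. With plain (non-volatile) x and y, that outcome is permitted.

```java
// Hypothetical litmus test; all names here are illustrative assumptions.
class StoreLoadSketch {
    static volatile int x, y; // volatile: each store is ordered before the following load
    static int r1, r2;        // results, read by the main thread only after join()

    // Runs the test `rounds` times and counts how often both threads read 0.
    static int countBothZero(int rounds) {
        int bothZero = 0;
        for (int n = 0; n < rounds; n++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; }); // store x, then load y
            Thread b = new Thread(() -> { y = 1; r2 = x; }); // store y, then load x
            a.start();
            b.start();
            join(a);
            join(b);
            if (r1 == 0 && r2 == 0) {
                bothZero++; // would mean a load overtook the preceding store
            }
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(2000));
    }
}
```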

About the lock instruction
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 2:
Causes the processor’s LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.

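About addl(Address(rsp, 0), 0);
This adds 0 to the memory word at the address held in rsp (the top of the stack), not to the rsp register itself. The stored value does not change, so apart from modifying the flags the instruction is a no-op. It exists only to carry the lock prefix.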

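Taken together: when a field marked volatile is updated, the write is accompanied by a locked instruction that acts as a memory barrier, so the update becomes visible to other CPUs before the writing thread performs any later reads. Note that the lock prefix applies to the dummy addl on the stack, not to the field update itself. It gives visibility and ordering, but it does not make a compound operation such as i++ atomic.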