Java Collections Knowledge

2024-01-26

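Collections Overview#

Java Collections Overview#

Java collections, also called containers, derive from two root interfaces: Collection, which stores individual elements, and Map, which stores key-value pairs. Collection has three main subinterfaces: List, Set, and Queue.

What are the differences among List, Set, Queue, and Map?#

List (the go-to choice when order matters): elements are ordered and may repeat.
Set (all about uniqueness): elements may not repeat.
Queue (a ticket machine for waiting in line): elements are ordered by a specific queuing rule and may repeat.
Map (the expert at lookup by key): stores key-value pairs, much like the mathematical function y = f(x), where "x" is the key and "y" is the value. Keys are unordered and unique; values are unordered and may repeat. Each key maps to at most one value.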

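Summary of the Underlying Data Structures in the Collections Framework#

List#

ArrayList: an Object[] array.
Vector: an Object[] array.
LinkedList: a doubly linked list (a circular linked list in JDK 1.6 and earlier; JDK 1.7 removed the circularity).

Set#

HashSet (unordered, unique): built on HashMap, which stores the elements internally.
LinkedHashSet: a subclass of HashSet, implemented internally with LinkedHashMap.
TreeSet (ordered, unique): a red-black tree (a self-balancing binary search tree).

Queue#

PriorityQueue: a min-heap stored in an Object[] array.
DelayQueue: backed by a PriorityQueue.
ArrayDeque: a resizable array used as a double-ended queue.

Map#

HashMap: before JDK 1.8, HashMap consisted of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist mainly to resolve hash collisions (separate chaining). From JDK 1.8 on, collision handling changed considerably: when a chain grows longer than the threshold (8 by default), it is converted to a red-black tree to reduce lookup time. Before converting, HashMap checks the array length; if it is less than 64, it resizes the array instead of converting to a tree.
LinkedHashMap: LinkedHashMap extends HashMap, so underneath it is still a chained hash structure made of an array and linked lists or red-black trees. On top of that structure it adds a doubly linked list that preserves the insertion order of the key-value pairs. By manipulating that list it also implements access-order logic.
Hashtable: an array plus linked lists. The array is the main body of the Hashtable, and the linked lists exist mainly to resolve hash collisions.
TreeMap: a red-black tree (a self-balancing binary search tree).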

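How to Choose a Collection?#

We choose a collection mainly by its characteristics.

When we need to look up a value by key, we pick an implementation of the Map interface: TreeMap when sorting is required, HashMap when it is not, and ConcurrentHashMap when thread safety is required.
When we only need to store element values, we pick an implementation of the Collection interface: a Set implementation such as TreeSet or HashSet when elements must be unique, otherwise a List implementation such as ArrayList or LinkedList. From there we choose according to the characteristics of each implementation.

Why Use Collections?#

When we need to store a group of values of the same type, an array is one of the most common and basic containers. But arrays have shortcomings for storing objects, because in real development the data types vary widely and the number of elements is not known in advance. This is where Java collections come in. Compared with arrays, collections offer more flexible ways to store multiple objects: the classes and interfaces in the Java Collections Framework can hold objects of different types and in varying quantities, and they support a wide range of operations. Their advantages over arrays are variable size, support for generics, and built-in algorithms. In short, Java collections make data storage and processing more flexible and adapt better to the varied data needs of modern software development.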

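List#

What Are the Differences Between ArrayList and Array?#

ArrayList is built on a dynamic array internally and is more flexible to use than an Array (a static array):

ArrayList grows automatically as elements are added (and can be shrunk explicitly with trimToSize()), whereas an Array cannot change its length once it is created.
ArrayList lets you use generics to ensure type safety; an Array does not.
ArrayList can store only objects. Primitive values must be stored through their wrapper classes (such as Integer and Double). An Array can store primitive values directly as well as objects.
ArrayList supports common operations such as insertion, removal, and traversal, and provides a rich API, for example add() and remove(). An Array is just a fixed-length sequence whose elements are accessed by index; it cannot add or remove elements dynamically.
An ArrayList can be created without specifying a size, whereas an Array's size must be given when it is created.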

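What Are the Differences Between ArrayList and Vector?#

ArrayList is the main implementation of List. It stores its elements in an Object[], suits workloads dominated by lookups, and is not thread-safe.
Vector is the legacy implementation of List. It also stores its elements in an Object[] and is thread-safe.

What Are the Differences Between Vector and Stack?#

Vector and Stack are both thread-safe; both synchronize with the synchronized keyword.
Stack extends Vector and is a last-in, first-out stack, whereas Vector is a list.

As Java concurrency has evolved, Vector and Stack have become obsolete. When you need safe multithreaded access, prefer the concurrent collection classes (for example ConcurrentHashMap or CopyOnWriteArrayList) or implement thread safety yourself.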

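Can ArrayList Store null Values?#

ArrayList can store any kind of object, including null. Adding null to an ArrayList is not recommended, though: a null element carries no meaning and makes the code harder to maintain, for example when a forgotten null check leads to a NullPointerException.

ArrayList<String> listOfStrings = new ArrayList<>();
listOfStrings.add(null);
listOfStrings.add("java");
System.out.println(listOfStrings); // prints [null, java]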

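Time Complexity of Insertion and Removal in ArrayList?#

Insertion:
- At the head: every element has to shift one position toward the tail, so the time complexity is O(n).
- At the tail: while the ArrayList still has spare capacity, appending is O(1), because the element is simply written at the end of the array. When the capacity is exhausted and the list must grow, an O(n) copy of the old array into a new, larger array is performed first, followed by the O(1) append.
- At a given index: every element after the target position shifts one position toward the tail, and then the new element is written into the gap. On average n/2 elements move, so the time complexity is O(n).
Removal:
- At the head: every remaining element has to shift one position toward the head, so the time complexity is O(n).
- At the tail: when the removed element is the last one, the time complexity is O(1).
- At a given index: every element after the target shifts one position toward the head to fill the gap. On average n/2 elements move, so the time complexity is O(n).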

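Time Complexity of Insertion and Removal in LinkedList?#

At the head: only the head node's pointers need to change, so the time complexity is O(1).
At the tail: only the tail node's pointers need to change, so the time complexity is O(1).
At a given index: the list must first be walked to the target position, and then the pointers of the affected nodes are updated. On average about n/2 nodes are visited, so the time complexity is O(n).

A simple example: to delete the node holding 9, we first traverse the list to find that node, and then update the pointers of the neighboring nodes.

Why Can't LinkedList Implement the RandomAccess Interface?#

RandomAccess is a marker interface indicating that the implementing class supports random access (that is, elements can be reached quickly by index). LinkedList is backed by a linked list whose nodes are not contiguous in memory and can only be located by following pointers. It does not support fast random access, so it cannot implement RandomAccess.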

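What Are the Differences Between ArrayList and LinkedList?#

Thread safety: neither ArrayList nor LinkedList is synchronized, so neither is thread-safe.
Underlying data structure: ArrayList uses an Object array; LinkedList uses a doubly linked list (a circular linked list in JDK 1.6 and earlier; JDK 1.7 removed the circularity. Note the difference between a doubly linked list and a circular doubly linked list).
Whether insertion and removal depend on element position:
- ArrayList stores elements in an array, so the cost of insertion and removal depends on position. For example, add(E e) appends the element to the end of the list, which is O(1). Inserting or removing at a given index i (add(int index, E element)) is O(n), because the (n - i) elements from position i onward must each shift one slot backward or forward.
- LinkedList stores elements in a linked list, so insertion and removal at the head or tail (add(E e), addFirst(E e), addLast(E e), removeFirst(), removeLast()) are O(1) regardless of position. Inserting or removing at a given index i (add(int index, E element), remove(Object o), remove(int index)) is O(n), because the list must first be walked to the target position.
Fast random access: LinkedList does not support efficient random access, whereas ArrayList (which implements RandomAccess) does. Fast random access means retrieving an element quickly by its index (the get(int index) method).
Memory footprint: ArrayList wastes space mainly through the spare capacity it reserves at the end of its array, whereas LinkedList spends more space per element than ArrayList (each node stores its successor and predecessor references in addition to the data).

In practice, projects rarely use LinkedList. Nearly every scenario that calls for LinkedList can use ArrayList instead, usually with better performance.

Also, don't reflexively assume that LinkedList, being a linked list, is the best fit for workloads heavy on insertion and removal. As noted above, LinkedList is close to O(1) only for insertion and removal at the head or tail; everywhere else the average cost is O(n).

Supplement: doubly linked lists and circular doubly linked lists

Doubly linked list: each node holds two pointers, prev pointing to the previous node and next pointing to the following node.

Circular doubly linked list: the last node's next points to head, and head's prev points to the last node, forming a ring.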

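Supplement: the RandomAccess interface

public interface RandomAccess {}

Looking at the source, the RandomAccess interface declares nothing at all. In my view it is simply a marker. A marker of what? That the implementing class supports random access. The binarySearch() method checks whether the list passed in is an instance of RandomAccess: if it is, indexedBinarySearch() is called; if not, iteratorBinarySearch() is called.

public static <T>
int binarySearch(List<? extends Comparable<? super T>> list, T key) {
    if (list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD)
        return Collections.indexedBinarySearch(list, key);
    else
        return Collections.iteratorBinarySearch(list, key);
}

ArrayList implements RandomAccess and LinkedList does not. Why? I think it comes down to the underlying data structure: ArrayList is backed by an array and LinkedList by a linked list. Arrays naturally support random access in O(1) time, which is why it is called fast random access. A linked list has to be walked to a position before the element there can be read, which takes O(n) time, so it does not support fast random access. Implementing RandomAccess merely declares that ArrayList has this capability. The interface is only a marker; ArrayList does not gain fast random access by implementing it.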
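The marker check described above can be reproduced in ordinary code. The following is a minimal sketch, not the JDK's own logic: the class name RandomAccessDemo and the sum method are hypothetical, chosen only to show how a caller might pick a traversal style based on `instanceof RandomAccess`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessDemo {
    // Sums a list, choosing the traversal style from the marker interface.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Index-based access is O(1) per element for array-backed lists.
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i);
            }
        } else {
            // On a linked list get(i) is O(n), so iterate sequentially instead.
            for (int value : list) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(new ArrayList<>(data)));   // 10
        System.out.println(sum(new LinkedList<>(data)));  // 10
        System.out.println(new ArrayList<Integer>() instanceof RandomAccess);  // true
        System.out.println(new LinkedList<Integer>() instanceof RandomAccess); // false
    }
}
```

Both branches return the same result; only the access pattern differs, which is the whole point of the marker.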

Describe ArrayList's Growth Mechanism#

Start with the ArrayList Constructors#

ArrayList can be initialized in three ways. The constructor source (JDK 8) is as follows:

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

/**
 * Default (no-arg) constructor: builds an empty list with an initial capacity of 10.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

/**
 * Constructor with an initial capacity chosen by the caller.
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) { // initial capacity greater than 0
        // allocate an array of size initialCapacity
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) { // initial capacity equal to 0
        // use an empty array
        this.elementData = EMPTY_ELEMENTDATA;
    } else { // initial capacity less than 0: throw
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}

/**
 * Builds a list containing the elements of the given collection, in the order
 * returned by the collection's iterator.
 * Throws NullPointerException if the given collection is null.
 */
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

When an ArrayList is created with the no-arg constructor, it is actually initialized with an empty array. Capacity is allocated only when an element is really added: adding the first element grows the array to capacity 10.

Note: in JDK 6, creating an ArrayList with the no-arg constructor immediately allocated an Object[] elementData of length 10.

ArrayList Growth, Step by Step#

The analysis here uses an ArrayList created with the no-arg constructor. The add method:

/**
 * Appends the specified element to the end of this list.
 */
public boolean add(E e) {
    // before adding the element, call ensureCapacityInternal
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // adding an element to an ArrayList boils down to assigning into the array
    elementData[size++] = e;
    return true;
}

Note: JDK 11 removed the ensureCapacityInternal() and ensureExplicitCapacity() methods.

The source of ensureCapacityInternal is as follows:

// Computes the required capacity from the requested minimum and the current backing array.
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // If the backing array is the default empty array (the initial state),
    // return the larger of the default capacity and the requested minimum.
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    // Otherwise return the requested minimum as is.
    return minCapacity;
}

// Ensures the internal capacity is at least the requested minimum.
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

// Decides whether the array needs to grow.
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // Is the current array too small to hold minCapacity elements?
    if (minCapacity - elementData.length > 0)
        // grow the array
        grow(minCapacity);
}

Let's walk through it carefully:

When the 1st element is added, elementData.length is 0 (the list is still empty). ensureCapacityInternal() calls calculateCapacity(), which returns the larger of DEFAULT_CAPACITY and 1, so minCapacity becomes 10. The condition minCapacity - elementData.length > 0 holds, so grow(minCapacity) is called.
When the 2nd element is added, minCapacity is 2, and elementData.length (the capacity) has already grown to 10 after the first add. The condition minCapacity - elementData.length > 0 does not hold, so grow(minCapacity) is not called.
This continues until the 11th element is added: minCapacity (11) is then greater than elementData.length (10), and grow is called to enlarge the array.

The grow method

/**
 * The maximum array size to allocate.
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

/**
 * The core method of ArrayList growth.
 */
private void grow(int minCapacity) {
    // oldCapacity is the old capacity, newCapacity the new one
    int oldCapacity = elementData.length;
    // Shifting oldCapacity right by one bit is equivalent to oldCapacity / 2.
    // The shift is a cheap way to halve the value; the whole expression sets
    // the new capacity to 1.5 times the old capacity.
    int newCapacity = oldCapacity + (oldCapacity >> 1);

    // If the new capacity is still below the required minimum,
    // use the required minimum as the new capacity.
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;

    // If the new capacity exceeds MAX_ARRAY_SIZE, call hugeCapacity() to compare
    // minCapacity with MAX_ARRAY_SIZE: if minCapacity is larger, the new capacity is
    // Integer.MAX_VALUE; otherwise it is MAX_ARRAY_SIZE, i.e. Integer.MAX_VALUE - 8.
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);

    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Because int newCapacity = oldCapacity + (oldCapacity >> 1), every growth step makes the ArrayList's capacity about 1.5 times the previous one (exactly 1.5 times when oldCapacity is even, slightly less when it is odd).

Let's explore grow() again through examples:

When the 1st element is added, oldCapacity is 0, so the first if condition holds and newCapacity = minCapacity (10). The second if condition does not hold, because newCapacity is not greater than MAX_ARRAY_SIZE, so hugeCapacity is not called. The array capacity becomes 10, add returns true, and size becomes 1.
When the 11th element is added and grow is called, newCapacity is 15, which is greater than minCapacity (11), so the first if condition does not hold. The new capacity does not exceed the maximum array size, so hugeCapacity is not called. The array capacity grows to 15, add returns true, and size becomes 11.
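The walkthrough above can be replayed without touching ArrayList's private fields. The sketch below is an illustration only: GrowthDemo, nextCapacity, and capacities are hypothetical names, the rule is copied from the JDK 8 source quoted in this section for a list created with the no-arg constructor, and the MAX_ARRAY_SIZE / hugeCapacity branch is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthDemo {
    static final int DEFAULT_CAPACITY = 10;

    // Mirrors calculateCapacity() + grow() from JDK 8 (overflow handling omitted).
    static int nextCapacity(int oldCapacity, int minCapacity) {
        if (oldCapacity == 0) {
            // First add on a no-arg list: at least the default capacity.
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        int newCapacity = oldCapacity + (oldCapacity >> 1); // about 1.5x
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        return newCapacity;
    }

    // Records each capacity reached while adding `count` elements one by one.
    static List<Integer> capacities(int count) {
        List<Integer> seen = new ArrayList<>();
        int capacity = 0;
        for (int size = 0; size < count; size++) {
            if (size + 1 > capacity) { // same test as ensureExplicitCapacity()
                capacity = nextCapacity(capacity, size + 1);
                seen.add(capacity);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(capacities(40)); // [10, 15, 22, 33, 49]
    }
}
```

The sequence shows both cases from the text: 10 to 15 is exactly 1.5 times, while 15 to 22 is slightly less because 15 is odd.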

One more point that is important but easy to overlook:

In Java, the length field applies to arrays: if you have declared an array and want to know its length, use length.
In Java, the length() method applies to strings: to find the length of a string, call length().
In Java, the size() method applies to generic collections: to find out how many elements a collection holds, call size().

The hugeCapacity() method

From the grow() source above we know that if the new capacity exceeds MAX_ARRAY_SIZE, hugeCapacity() is called to compare minCapacity with MAX_ARRAY_SIZE. If minCapacity is larger, the new capacity is Integer.MAX_VALUE; otherwise it is MAX_ARRAY_SIZE, i.e. Integer.MAX_VALUE - 8.

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    // Compare minCapacity with MAX_ARRAY_SIZE:
    // if minCapacity is larger, use Integer.MAX_VALUE as the new array size;
    // if MAX_ARRAY_SIZE is larger, use MAX_ARRAY_SIZE as the new array size.
    // MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

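Set#

Differences Between Comparable and Comparator#

Comparable and Comparator are both Java interfaces used for ordering. They play an important role in comparing and sorting objects of a class:

Comparable comes from the java.lang package and declares a compareTo(T o) method used for ordering.
Comparator comes from the java.util package and declares a compare(T o1, T o2) method used for ordering.

When a collection needs a custom ordering, we implement compareTo() or compare(). When one collection needs two orderings, for example sorting Song objects once by title and once by artist, we can either implement compareTo() for one ordering and write our own Comparator for the other, or write two Comparators, one for titles and one for artists. The second option means we can only use the two-argument version of Collections.sort().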
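To make the Song scenario concrete, here is a minimal sketch of the first option: a natural ordering through Comparable plus one external Comparator. The Song class, its fields, the BY_ARTIST constant, and the sample data are all hypothetical; the text does not define them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class Song implements Comparable<Song> {
    final String title;
    final String artist;

    Song(String title, String artist) {
        this.title = title;
        this.artist = artist;
    }

    // Natural ordering: by title.
    @Override
    public int compareTo(Song other) {
        return title.compareTo(other.title);
    }

    // A second ordering supplied from outside the natural ordering: by artist.
    static final Comparator<Song> BY_ARTIST = Comparator.comparing((Song s) -> s.artist);

    static List<String> titles(List<Song> songs) {
        List<String> out = new ArrayList<>();
        for (Song s : songs) {
            out.add(s.title);
        }
        return out;
    }

    static List<Song> sample() {
        return new ArrayList<>(Arrays.asList(
                new Song("Cedar", "Ana"),
                new Song("Apple", "Zoe"),
                new Song("Birch", "Max")));
    }

    public static void main(String[] args) {
        List<Song> songs = sample();
        Collections.sort(songs);             // one-argument form uses compareTo()
        System.out.println(titles(songs));   // [Apple, Birch, Cedar]
        Collections.sort(songs, BY_ARTIST);  // two-argument form uses the Comparator
        System.out.println(titles(songs));   // [Cedar, Birch, Apple]
    }
}
```

A class can have only one compareTo(), which is why any additional ordering has to live in a Comparator.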

Custom Ordering with Comparator#

ArrayList<Integer> arrayList = new ArrayList<Integer>();
arrayList.add(-1);
arrayList.add(3);
arrayList.add(3);
arrayList.add(-5);
arrayList.add(7);
arrayList.add(4);
arrayList.add(-9);
arrayList.add(-7);
System.out.println("Original list:");
System.out.println(arrayList);
// void reverse(List list): reverses the list
Collections.reverse(arrayList);
System.out.println("Collections.reverse(arrayList):");
System.out.println(arrayList);

// void sort(List list): sorts in ascending natural order
Collections.sort(arrayList);
System.out.println("Collections.sort(arrayList):");
System.out.println(arrayList);
// custom ordering: descending
Collections.sort(arrayList, new Comparator<Integer>() {
    @Override
    public int compare(Integer o1, Integer o2) {
        return o2.compareTo(o1);
    }
});
System.out.println("After custom sort:");
System.out.println(arrayList);

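What Do "Unordered" and "No Duplicates" Mean?#

Unordered does not mean random. It means that elements are not placed in the underlying array in index order as they are added; their positions are determined by their hash values.
No duplicates means that an element is added only if equals() returns false when it is compared with the existing elements. For this to work, the element class must override both equals() and hashCode().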
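The effect of overriding equals() and hashCode() is easy to see with a small experiment. This is a sketch under assumptions: Point and RawPoint are hypothetical classes invented for the comparison, not types mentioned in the text.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class UniquenessDemo {
    // Hypothetical value class that overrides both equals() and hashCode().
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Same fields, but it keeps the identity-based defaults inherited from Object.
    static final class RawPoint {
        final int x;
        final int y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int distinctPoints() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // rejected: equal to the element already present
        return set.size();
    }

    static int distinctRawPoints() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // accepted: default equals() compares identity
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints());    // 1
        System.out.println(distinctRawPoints()); // 2
    }
}
```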

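Compare HashSet, LinkedHashSet, and TreeSet#

HashSet, LinkedHashSet, and TreeSet all implement the Set interface, all guarantee element uniqueness, and none of them is thread-safe.
The main difference among them is the underlying data structure. HashSet is backed by a hash table (built on HashMap). LinkedHashSet is backed by a linked list plus a hash table, and elements come out in the order they went in (FIFO). TreeSet is backed by a red-black tree; its elements are sorted, using either natural ordering or a custom ordering.
The different data structures lead to different use cases. HashSet fits cases where insertion and retrieval order does not matter, LinkedHashSet fits cases where elements must come out in insertion order (FIFO), and TreeSet fits cases that need a custom sort order for the elements.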
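The ordering behavior of the three sets can be checked with a few lines. The sketch below uses a hypothetical SetOrderDemo class and a small sample input; HashSet's iteration order is intentionally not printed, because it is unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // LinkedHashSet: duplicates dropped, insertion order kept.
    static List<Integer> insertionOrder(List<Integer> input) {
        Set<Integer> set = new LinkedHashSet<>(input);
        return new ArrayList<>(set);
    }

    // TreeSet: duplicates dropped, elements sorted by natural ordering.
    static List<Integer> sortedOrder(List<Integer> input) {
        Set<Integer> set = new TreeSet<>(input);
        return new ArrayList<>(set);
    }

    // HashSet: duplicates dropped, no ordering guarantee, so only count them.
    static int distinctCount(List<Integer> input) {
        Set<Integer> set = new HashSet<>(input);
        return set.size();
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 3, 2);
        System.out.println(insertionOrder(input)); // [3, 1, 2]
        System.out.println(sortedOrder(input));    // [1, 2, 3]
        System.out.println(distinctCount(input));  // 3
    }
}
```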

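Queue#

Differences Between Queue and Deque#

Queue is a single-ended queue: elements are inserted at one end and removed from the other, and implementations generally follow the first-in, first-out (FIFO) rule.

Queue extends the Collection interface. Its methods fall into two groups according to how they behave when an operation fails because of capacity limits: one group throws an exception, the other returns a special value.

Queue interface	Throws exception	Returns special value
Insert at tail	add(E e)	offer(E e)
Remove head	remove()	poll()
Examine head	element()	peek()

Deque is a double-ended queue: elements can be inserted or removed at both ends.

Deque extends the Queue interface and adds methods for inserting and removing at both the head and the tail, again in two groups according to how failures are handled:

Deque interface	Throws exception	Returns special value
Insert at head	addFirst(E e)	offerFirst(E e)
Insert at tail	addLast(E e)	offerLast(E e)
Remove head	removeFirst()	pollFirst()
Remove tail	removeLast()	pollLast()
Examine head	getFirst()	peekFirst()
Examine tail	getLast()	peekLast()

Deque also provides other methods such as push() and pop(), which can be used to emulate a stack.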
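A short sketch can illustrate both points: push()/pop() turning a Deque into a stack, and the difference between the exception-throwing and special-value method families. DequeDemo and its two methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

class DequeDemo {
    // push() and pop() both work on the head, so the deque behaves as a LIFO stack.
    static String reverse(String text) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            stack.push(c);
        }
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop());
        }
        return sb.toString();
    }

    // On an empty deque pollFirst() returns null, while removeFirst() throws.
    static boolean removeFirstThrowsWhenEmpty() {
        Deque<String> empty = new ArrayDeque<>();
        if (empty.pollFirst() != null) {
            return false;
        }
        try {
            empty.removeFirst();
            return false;
        } catch (NoSuchElementException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc"));               // cba
        System.out.println(removeFirstThrowsWhenEmpty()); // true
    }
}
```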

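Differences Between ArrayDeque and LinkedList#

ArrayDeque and LinkedList both implement the Deque interface, and both can serve as queues. So how do they differ?

ArrayDeque is implemented with a resizable array and two index pointers, whereas LinkedList is implemented with a linked list.
ArrayDeque does not allow null elements; LinkedList does.
ArrayDeque was introduced in JDK 1.6, whereas LinkedList has existed since JDK 1.2.
ArrayDeque may have to grow its array on insertion, but the amortized cost of insertion is still O(1). LinkedList never needs to grow, but every insertion allocates a new node on the heap, so its amortized performance is slower.

From a performance standpoint, ArrayDeque is a better choice than LinkedList for implementing a queue. ArrayDeque can also be used to implement a stack.

Describe PriorityQueue#

PriorityQueue was introduced in JDK 1.5. It differs from an ordinary Queue in that the order in which elements leave the queue depends on priority: the element with the highest priority always leaves first.

PriorityQueue is implemented as a binary heap, stored in a resizable array.
PriorityQueue sifts elements up and down the heap, so inserting an element and removing the top of the heap both take O(log n) time.
PriorityQueue is not thread-safe, and it does not accept null or non-comparable objects.
PriorityQueue is a min-heap by default, but it accepts a Comparator as a constructor argument to customize element priority.

In interviews, PriorityQueue tends to show up in hand-written algorithm problems. Typical examples include heap sort, finding the K-th largest number, and traversing weighted graphs, so you should be fluent in using it.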
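As an illustration of the K-th largest problem mentioned above, here is one common approach: keep a min-heap of size k. This is a sketch, not the only solution; TopKDemo and its method names are hypothetical, and kthLargest assumes 1 <= k <= nums.length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKDemo {
    // Keeps the k largest values seen so far; the heap top is the k-th largest.
    // Assumes 1 <= k <= nums.length.
    static int kthLargest(int[] nums, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int n : nums) {
            minHeap.offer(n);           // O(log k) insertion
            if (minHeap.size() > k) {
                minHeap.poll();         // drop the smallest of the k + 1 values
            }
        }
        return minHeap.peek();
    }

    // Passing a Comparator turns the default min-heap into a max-heap.
    static List<Integer> drainDescending(List<Integer> values) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.addAll(values);
        List<Integer> out = new ArrayList<>();
        while (!maxHeap.isEmpty()) {
            out.add(maxHeap.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(kthLargest(new int[] {3, 2, 1, 5, 6, 4}, 2)); // 5
        System.out.println(drainDescending(java.util.Arrays.asList(3, 1, 2))); // [3, 2, 1]
    }
}
```

Because the heap never holds more than k + 1 elements, the whole scan costs O(n log k) rather than the O(n log n) of a full sort.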

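What Is BlockingQueue?#

BlockingQueue is an interface that extends Queue. It is called blocking because a take blocks while the queue is empty until an element arrives, and a put blocks while the queue is full until there is room for the new element.

public interface BlockingQueue<E> extends Queue<E> {
  // ...
}

BlockingQueue is commonly used in the producer-consumer pattern: producer threads add data to the queue, and consumer threads take data from the queue and process it.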
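A minimal producer-consumer sketch shows the blocking put() and take() calls in action. The class name ProducerConsumerDemo, the POISON end-of-stream marker, and the capacity of 2 are assumptions made for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerDemo {
    static final int POISON = -1; // hypothetical end-of-stream marker

    // One producer thread puts 1..count; the calling thread consumes and sums.
    static int produceAndConsume(int count) {
        // A small bound forces the producer to block whenever the queue is full.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    queue.put(i);   // blocks while the queue is full
                }
                queue.put(POISON);  // tell the consumer to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        try {
            while (true) {
                int item = queue.take(); // blocks while the queue is empty
                if (item == POISON) {
                    break;
                }
                sum += item;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(produceAndConsume(100)); // 5050
    }
}
```

Neither thread polls or sleeps; the queue itself coordinates the hand-off.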

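What Are the Implementations of BlockingQueue?#

The commonly used blocking queue implementations in Java are:

ArrayBlockingQueue: a bounded blocking queue backed by an array. Its capacity must be specified at creation, and it supports both fair and non-fair lock access.
LinkedBlockingQueue: an optionally bounded blocking queue backed by a singly linked list. A capacity may be specified at creation; if it is not, the default is Integer.MAX_VALUE. Unlike ArrayBlockingQueue, it supports only non-fair lock access.
PriorityBlockingQueue: an unbounded blocking queue with priority ordering. Elements must implement Comparable, or a Comparator must be passed to the constructor, and null elements cannot be inserted.
SynchronousQueue: a blocking queue that stores no elements. Every insert must wait for a matching remove, and every remove must wait for an insert. SynchronousQueue is therefore typically used to hand data directly from one thread to another.
DelayQueue: a delay queue whose elements can be taken only after their specified delay has expired.

What Are the Differences Between ArrayBlockingQueue and LinkedBlockingQueue?#

ArrayBlockingQueue and LinkedBlockingQueue are two commonly used blocking queue implementations in the Java concurrency package, and both are thread-safe. They differ in the following ways:

Underlying implementation: ArrayBlockingQueue is backed by an array, whereas LinkedBlockingQueue is backed by a linked list.
Boundedness: ArrayBlockingQueue is bounded, and its capacity must be specified at creation. LinkedBlockingQueue can be created without a capacity, in which case it defaults to Integer.MAX_VALUE and is effectively unbounded. A capacity can also be specified to make it bounded.
Lock separation: ArrayBlockingQueue uses a single lock shared by producers and consumers. LinkedBlockingQueue uses separate locks, putLock for producers and takeLock for consumers, which prevents producer and consumer threads from contending for the same lock.
Memory footprint: ArrayBlockingQueue allocates its array up front, whereas LinkedBlockingQueue allocates list nodes dynamically. ArrayBlockingQueue therefore occupies memory from the moment it is created, often more than is actually used, while LinkedBlockingQueue's memory use grows gradually as elements are added.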

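Map (Important)#

Differences Between HashMap and Hashtable#

Thread safety: HashMap is not thread-safe; Hashtable is thread-safe because nearly all of its methods are marked synchronized. (If you need thread safety, use ConcurrentHashMap.)
Efficiency: because it does no synchronization, HashMap is somewhat more efficient than Hashtable. Hashtable is essentially obsolete; do not use it in new code.
Support for null keys and null values: HashMap can store null keys and values, but only one null key; any number of values may be null. Hashtable allows neither null keys nor null values and throws NullPointerException otherwise.
Initial capacity and growth:
- If no initial capacity is specified at creation, Hashtable starts at 11 and grows to 2n+1 on each resize. HashMap starts at 16 and doubles on each resize.
- If an initial capacity is specified at creation, Hashtable uses that size directly, whereas HashMap rounds it up to a power of two.
Underlying data structure: from JDK 1.8 on, HashMap handles hash collisions quite differently. When a chain grows longer than the threshold (8 by default), it is converted to a red-black tree to reduce lookup time (before converting, HashMap checks the array length and, if it is less than 64, resizes the array instead). Hashtable has no such mechanism.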
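The two capacity rules in the list above can be sketched as plain arithmetic. The code below is modeled on the JDK 8 tableSizeFor bit trick and on Hashtable's 2n+1 growth; CapacityDemo and nextHashtableCapacity are hypothetical names, and this is an illustration rather than a copy of the library source.

```java
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Rounds a requested capacity up to the nearest power of two,
    // which is what HashMap does with a caller-supplied initial capacity.
    static int tableSizeFor(int cap) {
        int n = cap - 1;   // so that exact powers of two are not doubled
        n |= n >>> 1;      // smear the highest set bit into all lower bits
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    // Hashtable growth as described in the text: 2n + 1.
    static int nextHashtableCapacity(int oldCapacity) {
        return (oldCapacity << 1) + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(17));          // 32
        System.out.println(tableSizeFor(16));          // 16
        System.out.println(nextHashtableCapacity(11)); // 23
    }
}
```

HashMap capacities therefore stay powers of two (16, 32, 64, ...), while Hashtable capacities starting from the default stay odd (11, 23, 47, ...).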

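Differences Between HashMap and HashSet#

If you have read the HashSet source, you know that HashSet is built on HashMap. (The HashSet source is very short: apart from clone(), writeObject(), and readObject(), which HashSet has to implement itself, every method simply calls into HashMap.)

HashMap	HashSet
Implements the Map interface	Implements the Set interface
Stores key-value pairs	Stores objects only
Elements are added with put()	Elements are added with add()
Uses the key to compute the hash code	Uses the member object to compute the hash code; two objects may share a hash code, so equals() is used to decide whether they are equal

Differences Between HashMap and TreeMap#

TreeMap and HashMap both extend AbstractMap, but note that TreeMap also implements the NavigableMap and SortedMap interfaces.

Implementing NavigableMap gives TreeMap the ability to search among its entries.

Implementing SortedMap gives TreeMap the ability to keep its entries sorted by key. The default is ascending key order, but a custom comparator can be specified.

In summary, compared with HashMap, TreeMap adds the ability to sort its entries by key and to search among them.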

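How Does HashSet Check for Duplicates?#

When you add an object to a HashSet, the HashSet first computes the object's hash code to decide where it goes, and compares it with the hash codes of the objects already stored. If no hash code matches, the HashSet assumes the object is not a duplicate. If an object with the same hash code is found, equals() is called to check whether the two objects are really the same. If they are, the HashSet does not let the add succeed.

In JDK 1.8, HashSet's add() simply calls HashMap's put() and inspects the return value to determine whether the element was already present. Here is the relevant HashSet source:

// Returns: true if this set did not already contain the specified element
public boolean add(E e) {
        return map.put(e, PRESENT)==null;
}

In other words, in JDK 1.8 HashSet always calls put() directly, whether or not the element already exists; the return value of add() merely tells us whether an equal element was present before the call.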

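The Underlying Implementation of HashMap#

Before JDK 1.8#

Before JDK 1.8, HashMap was an array combined with linked lists, that is, a chained hash table. HashMap passes the key's hashCode through a perturbation function to obtain the hash value, then uses (n - 1) & hash to determine where the element is stored (n is the length of the array). If that slot already holds an element, HashMap checks whether that element's hash value and key match those of the element being stored. If they match, the value is overwritten; if not, the collision is resolved by chaining.

The perturbation function is HashMap's hash method. Its purpose is to guard against poorly implemented hashCode() methods; in other words, applying it reduces collisions.

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

JDK 1.8 HashMap#

The JDK 1.8 hash method is simpler than the JDK 1.7 one, but the principle is unchanged.

static final int hash(Object key) {
      int h;
      // key.hashCode(): returns the hash code
      // ^: bitwise XOR
      // >>>: unsigned right shift; the sign bit is ignored and vacated bits are filled with 0
      return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
  }

Compared with the JDK 1.8 hash method, the JDK 1.7 version is slightly slower, since it perturbs the value four times.

"Chaining" means combining linked lists with an array: create an array of linked lists, where each slot of the array is a list. When a hash collision occurs, the colliding entry is simply appended to the list in that slot.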

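After JDK 1.8#

Compared with earlier versions, JDK 1.8 changed collision handling considerably: when a chain grows longer than the threshold (8 by default), it is converted to a red-black tree to reduce lookup time. Before converting, HashMap checks the array length; if it is less than 64, it resizes the array instead of converting to a tree.

TreeMap, TreeSet, and HashMap from JDK 1.8 on all use red-black trees underneath. Red-black trees exist to fix a weakness of plain binary search trees, which can degenerate into a linear structure in some cases.

Let's look at the source to see how HashMap converts a linked list into a red-black tree.

The putVal method contains the check that triggers the conversion.

When the chain length exceeds 8, the treeifyBin (convert to red-black tree) logic runs.

// walk the chain
for (int binCount = 0; ; ++binCount) {
    // reached the last node of the chain
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        // the bin already held TREEIFY_THRESHOLD (8) nodes before this insert
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            // attempt the conversion (it does not always convert to a red-black tree)
            treeifyBin(tab, hash);
        break;
    }
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

The treeifyBin method decides whether the conversion to a red-black tree really happens.

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // check whether the current array length is less than 64
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // if it is, resize the array first
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // otherwise convert the list into a red-black tree

        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

Before converting a list into a red-black tree, HashMap checks the array length; if it is less than 64, it resizes the array first instead of converting.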

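Why Is the Length of a HashMap a Power of Two?#

To make HashMap access efficient, collisions should be kept to a minimum, which means spreading the data as evenly as possible. As mentioned above, hash values range from -2147483648 to 2147483647, a mapping space of roughly 4 billion values. As long as the hash function distributes keys evenly and sparsely, collisions are rare in ordinary applications. The problem is that an array of 4 billion slots does not fit in memory, so the hash value cannot be used directly. It must first be reduced modulo the array length, and the remainder is the storage position, that is, the array index. This index is computed as "(n - 1) & hash" (n is the array length).

How should this computation be designed?

The first idea is the % remainder operation. But here is the key point: when the divisor is a power of two, the remainder (%) operation is equivalent to a bitwise AND (&) with the divisor minus one (that is, hash % length == hash & (length - 1) holds provided length is 2 to the n-th power). The bitwise & is also cheaper to compute than %, and this explains why the length of a HashMap is a power of two.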
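The equivalence between the mask and the remainder can be verified directly. In the sketch below, IndexDemo and its methods are hypothetical names; Math.floorMod is used as the remainder so that the comparison also holds for negative hash values, where Java's % operator would return a negative result.

```java
class IndexDemo {
    // The index computation quoted in the text: (n - 1) & hash.
    static int indexByMask(int hash, int n) {
        return (n - 1) & hash;
    }

    // Remainder-based index; floorMod always yields a value in [0, n).
    static int indexByMod(int hash, int n) {
        return Math.floorMod(hash, n);
    }

    // True when both computations agree for every sample hash value.
    static boolean agree(int n, int[] hashes) {
        for (int h : hashes) {
            if (indexByMask(h, n) != indexByMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 13, 16, 31, 12345, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        System.out.println(agree(16, samples)); // true: 16 is a power of two
        System.out.println(agree(10, samples)); // false: 10 is not
        System.out.println(indexByMask(13, 10) + " vs " + indexByMod(13, 10)); // 9 vs 3
    }
}
```

With a length of 10 the mask 9 (binary 1001) can only ever produce the indexes 0, 1, 8, and 9, so most slots would never be used; a power-of-two length avoids that.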

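Infinite Loops Caused by Multithreaded Use of HashMap#

In JDK 1.7 and earlier, resizing a HashMap under multithreading could lead to an infinite loop. When a bucket holding several elements is being resized and multiple threads operate on its list at the same time, head insertion can leave nodes pointing at the wrong successors and form a circular list, so that a subsequent lookup loops forever.

To fix this, JDK 1.8 switched HashMap from head insertion to tail insertion, which avoids reversing the list: new nodes always go at the end, so no cycle forms. Using HashMap from multiple threads is still not recommended, because data can still be overwritten. In concurrent code, use ConcurrentHashMap.

Why Is HashMap Not Thread-Safe?#

In JDK 1.7 and earlier, resizing a HashMap under multithreading can cause infinite loops and data loss.

Data loss exists in both JDK 1.7 and JDK 1.8; the explanation here uses JDK 1.8.

From JDK 1.8 on, multiple key-value pairs in a HashMap may land in the same bucket, where they are stored as a linked list or a red-black tree. Concurrent put operations from multiple threads are unsafe; specifically, data can be overwritten.

An example:

Threads 1 and 2 call put at the same time and collide (the hash function yields the same insertion index for both).
Threads get CPU time in different time slices. Thread 1 finishes the collision check and is then suspended because its time slice runs out. Thread 2 completes its insertion first.
Thread 1 then gets a time slice again. Because it already performed the collision check, it inserts directly, and the data inserted by thread 2 is overwritten by thread 1.

In another case, two concurrent put operations leave size with the wrong value, which in turn leads to overwritten data:

Thread 1 executes the check if(++size > threshold), reads size as 10, and is suspended because its time slice runs out.
Thread 2 also executes if(++size > threshold), also reads size as 10, inserts its element into the bucket, and updates size to 11.
Thread 1 then gets a time slice again, also places its element in the bucket, and updates size to 11.
Threads 1 and 2 each performed one put, but size increased by only 1, so in effect only one element is recorded as having been added to the HashMap.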

Common Ways to Iterate over a HashMap?#

Iterating over the EntrySet with an Iterator

Iterator<Map.Entry<Integer, String>> iterator = map.entrySet().iterator();
while (iterator.hasNext()) {
    Map.Entry<Integer, String> entry = iterator.next();
    // ...
}

Iterating over the KeySet with an Iterator

Iterator<Integer> iterator = map.keySet().iterator();
while (iterator.hasNext()) {
    Integer key = iterator.next();
    // ...
}

Iterating over the EntrySet with for-each

for (Map.Entry<Integer, String> entry : map.entrySet()) {
    // ...
}

Iterating over the KeySet with for-each

for (Integer key : map.keySet()) {
    // ...
}

Iterating with a lambda expression

map.forEach((key, value) -> {
    // ...
});

Iterating with the Streams API, single-threaded

map.entrySet().stream().forEach((entry) -> {
    // ...
});

Iterating with the Streams API, multithreaded

map.entrySet().parallelStream().forEach((entry) -> {
    // ...
});

When the loop body does no blocking work, parallelStream is the slowest:

Benchmark               Mode  Cnt     Score      Error  Units
Test.entrySet           avgt    5   288.651 ±   10.536  ns/op
Test.keySet             avgt    5   584.594 ±   21.431  ns/op
Test.lambda             avgt    5   221.791 ±   10.198  ns/op
Test.parallelStream     avgt    5  6919.163 ± 1116.139  ns/op

Only after adding the blocking call Thread.sleep(10) does parallelStream become the fastest:

Benchmark               Mode  Cnt           Score          Error  Units
Test.entrySet           avgt    5  1554828440.000 ± 23657748.653  ns/op
Test.keySet             avgt    5  1550612500.000 ±  6474562.858  ns/op
Test.lambda             avgt    5  1551065180.000 ± 19164407.426  ns/op
Test.parallelStream     avgt    5   186345456.667 ±  3210435.590  ns/op

With blocking work parallelStream is the fastest; without it, parallelStream is the slowest.

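Differences Between ConcurrentHashMap and Hashtable#

ConcurrentHashMap and Hashtable differ mainly in how they achieve thread safety.

Underlying data structure: ConcurrentHashMap in JDK 1.7 uses a segmented array plus linked lists. In JDK 1.8 it uses the same structure as HashMap in 1.8: an array plus linked lists or red-black trees. Hashtable, like HashMap before JDK 1.8, uses an array plus linked lists: the array is the main body, and the linked lists exist mainly to resolve hash collisions.
How thread safety is achieved (important):
- In JDK 1.7, ConcurrentHashMap splits the whole bucket array into segments (Segment, lock striping). Each lock guards only part of the data in the container, so threads accessing data in different segments do not contend for a lock, which improves concurrent throughput.
- In JDK 1.8, ConcurrentHashMap drops the Segment concept and uses a Node array plus linked lists plus red-black trees directly, with synchronized and CAS for concurrency control. (synchronized has been heavily optimized since JDK 1.6.) The result looks like an optimized, thread-safe HashMap. The Segment data structure still appears in JDK 1.8, but with simplified attributes, and only for compatibility with older versions.
- Hashtable (a single lock): uses synchronized to ensure thread safety, which is very inefficient. While one thread is in a synchronized method, other threads calling synchronized methods may block or spin. For example, while one thread is adding an element with put, no other thread can call put or even get, so contention grows and efficiency drops.

Below is a comparison of the underlying data structures of the two.

Figure: Hashtable
Figure: ConcurrentHashMap in JDK 1.7

ConcurrentHashMap consists of a Segment array and HashEntry arrays.

Each element of the Segment array contains a HashEntry array, and each slot of a HashEntry array is a linked list.
Figure: ConcurrentHashMap in JDK 1.8

ConcurrentHashMap in JDK 1.8 is no longer a Segment array + HashEntry array + linked list, but a Node array + linked list / red-black tree. Node is used only for the linked-list case; the red-black tree case uses TreeNode. When a collision chain reaches a certain length, the list is converted to a red-black tree.

TreeNode stores a red-black tree node and is wrapped by TreeBin. TreeBin maintains the root of the red-black tree through its root field, because the root may be replaced by one of its former children when the tree rotates. If another thread were to write to the tree at that moment, it would be unsafe. So in ConcurrentHashMap, TreeBin uses its waiter field to track the thread currently using the tree and keep other threads out.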

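How ConcurrentHashMap Achieves Thread Safety / Its Underlying Implementation#

Before JDK 1.8#

The data is split into segments (each one a Segment), and each segment gets its own lock. While one thread holds the lock on one segment, the data in the other segments can still be accessed by other threads.

ConcurrentHashMap consists of a Segment array and HashEntry arrays.

Segment extends ReentrantLock, so a Segment is a reentrant lock and plays the role of the lock. HashEntry stores the key-value data.

static class Segment<K,V> extends ReentrantLock implements Serializable {
}

A ConcurrentHashMap contains one Segment array. The number of Segments cannot change after initialization. The default size of the Segment array is 16, which means that by default up to 16 threads can write concurrently.

A Segment is structured like a HashMap: an array plus linked lists. A Segment contains a HashEntry array, each HashEntry is an element of a linked list, and each Segment guards the elements of its HashEntry array. To modify data in a HashEntry array, a thread must first acquire the lock of the corresponding Segment. In other words, concurrent writes to the same Segment block one another, while writes to different Segments proceed in parallel.

After JDK 1.8#

ConcurrentHashMap drops the Segment lock striping and uses Node + CAS + synchronized to ensure concurrency safety. Its data structure is similar to that of HashMap in 1.8: an array plus linked lists or red-black trees. Java 8 converts a linked list (O(N) lookup) into a red-black tree (O(log(N)) lookup) once the list length exceeds a threshold (8).

In Java 8 the lock granularity is finer: synchronized locks only the head node of the current linked list or red-black tree. As long as hashes do not collide, there is no contention and reads and writes on other Nodes are unaffected, which improves efficiency considerably.

How Do the JDK 1.7 and JDK 1.8 Implementations of ConcurrentHashMap Differ?#

Thread-safety mechanism: JDK 1.7 uses Segment lock striping, where Segment extends ReentrantLock. JDK 1.8 abandons the Segment design and uses Node + CAS + synchronized, with finer lock granularity: synchronized locks only the head node of the current linked list or red-black tree.
Hash collision handling: JDK 1.7 uses chaining; JDK 1.8 uses chaining combined with red-black trees (a list is converted to a red-black tree once its length exceeds a threshold).
Concurrency level: in JDK 1.7 the maximum concurrency is the number of Segments, 16 by default. In JDK 1.8 the maximum concurrency is the size of the Node array, which is higher.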

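Why Can't ConcurrentHashMap Keys and Values Be null?#

ConcurrentHashMap forbids null keys and values mainly to avoid ambiguity. null is a special value meaning no object or no reference. If null were allowed as a key, you could not tell whether that key is present in the ConcurrentHashMap or absent altogether. Likewise, if null were allowed as a value, you could not tell whether the null was really stored in the ConcurrentHashMap or was returned because the key was not found.

Take get as an example. A null result has two possible meanings:

The value is not in the map.
The value itself is null.

This is where the ambiguity comes from.

In a multithreaded environment, one thread may modify the ConcurrentHashMap while another is operating on it, so containsKey(key) cannot reliably tell whether the key-value pair exists, and the ambiguity cannot be resolved.

By contrast, HashMap can store null keys and values, with at most one null key and any number of null values. A null key is looked up at the position for hash value 0. In a single-threaded environment no other thread modifies the HashMap during an operation, so containsKey(key) can be used to check whether the pair exists and act accordingly, and there is no ambiguity.

In other words, under multithreading the presence of a key-value pair cannot be determined reliably (another thread may modify the map), whereas in a single thread it can.

If you really need to represent null in a ConcurrentHashMap, you can use a special static sentinel object in its place:

public static final Object NULL = new Object();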

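Can ConcurrentHashMap Guarantee Atomicity of Compound Operations?#

ConcurrentHashMap is thread-safe, which means that concurrent reads and writes from multiple threads do not corrupt its data, and it does not suffer from the infinite-loop problem of HashMap in JDK 1.7 and earlier. But that does not mean every compound operation on it is atomic. Do not confuse the two!

A compound operation is one made up of several basic operations (such as put, get, remove, and containsKey), for example checking whether a key exists with containsKey(key) and then inserting or updating with put(key, value) depending on the result. Such an operation can be interleaved with other threads while it runs, producing unexpected results.

So how can compound operations on ConcurrentHashMap be made atomic?

ConcurrentHashMap provides several atomic compound operations, such as putIfAbsent, compute, computeIfAbsent, computeIfPresent, and merge. Most of these methods take a function as an argument, compute a new value from the given key and value, and update the map with it.

Locking would also work here, but it is not recommended, because it defeats the purpose of using ConcurrentHashMap. When using ConcurrentHashMap, prefer these atomic compound methods to guarantee atomicity.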
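As an illustration of the atomic compound methods, the sketch below counts events from several threads with merge(). AtomicCountDemo, the "hits" key, and the thread counts are assumptions chosen for the example.

```java
import java.util.concurrent.ConcurrentHashMap;

class AtomicCountDemo {
    // Each of `threads` workers increments the same counter `perThread` times.
    static int countWithMerge(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge() performs the read-modify-write atomically.
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counts.getOrDefault("hits", 0);
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(4, 1000)); // 4000
    }
}
```

Writing the same update as a get() followed by a put() would be a non-atomic compound operation, and increments from different threads could be lost.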

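The Collections Utility Class (Less Important)#

Commonly used methods of the Collections utility class:

Sorting
Searching and replacing
Synchronization control (not recommended; when you need a thread-safe collection, consider the concurrent collections in the JUC package)

Sorting Operations#

void reverse(List list)//reverses the list
void shuffle(List list)//shuffles the list randomly
void sort(List list)//sorts in ascending natural order
void sort(List list, Comparator c)//custom sort; the Comparator controls the ordering
void swap(List list, int i , int j)//swaps the elements at two indexes
void rotate(List list, int distance)//rotates the list. A positive distance moves the last distance elements to the front; a negative distance moves the first distance elements to the back

Searching and Replacing Operations#

int binarySearch(List list, Object key)//binary search on a List; returns the index. The List must be sorted
Object max(Collection coll)//returns the largest element by natural ordering. Compare Object min(Collection coll)
Object max(Collection coll, Comparator c)//returns the largest element by a custom ordering defined by the Comparator. Compare Object min(Collection coll, Comparator c)
void fill(List list, Object obj)//replaces every element of the list with the given element
int frequency(Collection c, Object o)//counts the occurrences of an element
int indexOfSubList(List list, List target)//returns the index of the first occurrence of target in list, or -1 if not found. Compare int lastIndexOfSubList(List source, List target)
boolean replaceAll(List list, Object oldVal, Object newVal)//replaces the old element with the new one

Synchronization Control#

Collections provides several synchronizedXxx() methods that wrap a given collection in a synchronized collection, solving thread-safety problems when multiple threads access the collection concurrently.

We know that HashSet, TreeSet, ArrayList, LinkedList, HashMap, and TreeMap are all not thread-safe. Collections provides several static methods that wrap them in synchronized collections.

It is best not to use the methods below: they are very inefficient. When you need a thread-safe collection, consider the concurrent collections in the JUC package.

The methods are:

synchronizedCollection(Collection<T>  c) //returns a synchronized (thread-safe) collection backed by the given collection
synchronizedList(List<T> list)//returns a synchronized (thread-safe) List backed by the given list
synchronizedMap(Map<K,V> m) //returns a synchronized (thread-safe) Map backed by the given map
synchronizedSet(Set<T> s) //returns a synchronized (thread-safe) set backed by the given set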

Collections Overview#

Java Collections Overview#

Java collections, also called containers, are mainly derived from two core interfaces: one is the Collection interface, mainly used to store a single element; the other is the Map interface, mainly used to store key-value pairs. For the Collection interface, there are three main subinterfaces: List, Set, and Queue.

Differences among List, Set, Queue, and Map#

List (a great helper for preserving order): stored elements are ordered and can be duplicated.
Set (emphasizing uniqueness): stored elements are not allowed to be duplicates.
Queue (implements queuing functionality): determines the order according to a specific queuing rule; stored elements are ordered and can be duplicated.
Map (expert at searching by key): stores data as key-value pairs, similar to the mathematical function y=f(x), where “x” represents the key and “y” the value. Keys are unordered and non-duplicated, values are unordered and can be duplicated; each key maps to at most one value.

Summary of underlying data structures in the Collection Framework#

List#

ArrayList: Object[] array.
Vector: Object[] array.
LinkedList: doubly linked list (before JDK 1.6 it was a circular linked list; JDK 1.7 removed the circularity).

Set#

HashSet (unordered, unique): based on HashMap; elements are stored using a HashMap underneath.
LinkedHashSet: LinkedHashSet is a subclass of HashSet, and internally implemented via LinkedHashMap.
TreeSet (ordered, unique): red-black tree (self-balancing binary search tree).

Queue#

PriorityQueue: implemented as a min-heap using an Object[] array.
DelayQueue: PriorityQueue.
ArrayDeque: expandable dynamic double-ended array.

Map#

HashMap: before JDK 1.8, HashMap was implemented as a combination of an array and linked lists; the array is the main structure, lists resolve hash collisions (chaining). After JDK 1.8, collisions are handled with significant changes: when a chain length exceeds a threshold (default 8), the chain is converted to a red-black tree to reduce search time (if the current array length is less than 64, it may grow first instead of converting to a tree).
LinkedHashMap: LinkedHashMap inherits from HashMap, so its underlying structure remains a chained hash structure composed of an array and lists or trees. Additionally, LinkedHashMap adds a doubly linked list on top to preserve the insertion order of key-value pairs, and also implements access-order logic by manipulating the linked list.
Hashtable: array + linked list; the array forms the main body, the linked list resolves hash collisions.
TreeMap: red-black tree (self-balancing binary search tree).

How to choose a collection?#

We mainly select a suitable collection based on its characteristics.

If you need to access elements by keys, choose the Map interface; for sorting, use TreeMap; if you don’t need sorting, use HashMap; for thread-safety, use ConcurrentHashMap.
If you only need to store element values, choose a collection implementing the Collection interface; for uniqueness, choose a Set implementation such as TreeSet or HashSet; if not, choose a List implementation such as ArrayList or LinkedList, and then pick based on the characteristics of those implementations.

Why use collections?#

When we need to store a set of data of the same type, arrays are one of the most common and basic containers. However, using arrays to store objects has drawbacks because in real development, data types can be diverse and the quantity may be unknown. This is where Java collections come in. Compared to arrays, Java collections provide more flexible and efficient ways to store multiple data objects. The various collection classes and interfaces in the Java Collections Framework can store objects of different types and quantities, and offer a variety of operations. Compared with arrays, the advantages of Java collections include variable size, generic support, and built-in algorithms. In short, Java collections enhance the flexibility of data storage and processing, better meeting the diverse data needs in modern software development, and supporting high-quality code.

List#

Differences between ArrayList and Array (array)?#

ArrayList is implemented on top of a dynamic array, and is more flexible than a static Array:
- ArrayList grows automatically as elements are added (and can be trimmed explicitly with trimToSize()), while an Array cannot change its length once created.
- ArrayList allows you to use generics to ensure type safety; arrays do not.
- ArrayList can store only objects. Primitive types require their wrapper classes (e.g., Integer, Double). Arrays can store primitive types directly as well as objects.
- ArrayList supports insertion, deletion, traversal, and other common operations, with a rich API such as add(), remove(), etc. Arrays are fixed-length and can only be accessed by index; they do not support dynamic addition or removal of elements.
- Creating an ArrayList does not require specifying a size, while arrays require a size at creation.
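
The differences listed above can be sketched in a few lines. This is only an illustration; the class name ArrayVsArrayList and its helper methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayVsArrayList {
    // Builds a small ArrayList, mutates it, and returns the final contents.
    static List<String> listDemo() {
        List<String> list = new ArrayList<>(); // no size needed up front
        list.add("a");
        list.add("b");
        list.add(1, "x");   // insert in the middle: [a, x, b]
        list.remove("a");   // remove by value:      [x, b]
        return list;
    }

    // Arrays have a fixed length; "growing" means copying into a new array.
    static int[] arrayDemo() {
        int[] numbers = new int[2];               // size is mandatory; primitives are allowed
        numbers[0] = 1;
        numbers[1] = 2;
        int[] bigger = Arrays.copyOf(numbers, 3); // new array, the old one is unchanged
        bigger[2] = 3;
        return bigger;
    }

    public static void main(String[] args) {
        System.out.println(listDemo());                   // [x, b]
        System.out.println(Arrays.toString(arrayDemo())); // [1, 2, 3]
    }
}
```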

Differences between ArrayList and Vector?#

ArrayList is the main implementation of List, backed by Object[] and suitable for frequent lookups; it is not thread-safe.
Vector is an older implementation of List, also backed by Object[] and is thread-safe.

Differences between Vector and Stack?#

Vector and Stack are both thread-safe, using synchronized for synchronization.
Stack inherits from Vector and is a LIFO stack, while Vector is a list.

As Java's concurrency utilities evolved, Vector and Stack fell out of favor. They are legacy classes: not formally deprecated, but discouraged in new code. Prefer the concurrent collection classes (e.g., ConcurrentHashMap, CopyOnWriteArrayList) or explicit synchronization when you need safe multithreaded access, and use a Deque such as ArrayDeque where you need a stack.

Can ArrayList contain null values?#

ArrayList can store any type of object, including null values. However, it is not recommended to add null values, as null is meaningless and can make code harder to maintain; for example, forgetting to perform null checks can lead to NullPointerException.

ArrayList<String> listOfStrings = new ArrayList<>();
listOfStrings.add(null);
listOfStrings.add("java");
System.out.println(listOfStrings);

Time complexity of inserting and deleting elements in ArrayList?#

For insertion
- Insertion at the head: requires shifting all elements one position to the right, so O(n).
- Insertion at the tail: when capacity is not reached, O(1); when capacity is full and growth is needed, an O(n) operation copies the old array to a larger one, then O(1) to insert.
- Insertion at a given index: shifts all elements after the target position one place to the right, so O(n) (average n/2 moves).
For deletion
- Deletion at the head: shifts all elements one position to the left, so O(n).
- Deletion at the tail: O(1) when removing the last element.
- Deletion at a given index: shifts elements after the target position to fill the gap, so O(n) (average n/2 moves).
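
To make the shifting cost concrete, here is a minimal sketch of inserting into and removing from a partially filled array. It is not the ArrayList source; ShiftDemo, insertAt, and removeAt are hypothetical names, and the sketch assumes the array already has spare capacity.

```java
import java.util.Arrays;

class ShiftDemo {
    // Inserts value at index in an array whose first `size` slots are in use.
    // Assumes size < data.length, i.e. no resize is needed.
    static void insertAt(int[] data, int size, int index, int value) {
        // Move elements [index, size) one slot to the right: O(size - index) work.
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = value;
    }

    // Removes the element at index; the tail is shifted one slot to the left.
    static void removeAt(int[] data, int size, int index) {
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        data[size - 1] = 0; // clear the now-unused last slot
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 0, 0};
        insertAt(data, 3, 0, 9);                   // head insert moves all 3 elements
        System.out.println(Arrays.toString(data)); // [9, 1, 2, 3, 0]
        removeAt(data, 4, 1);                      // removes the 1, shifts 2 and 3 left
        System.out.println(Arrays.toString(data)); // [9, 2, 3, 0, 0]
    }
}
```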

Time complexity of inserting and deleting elements in LinkedList?#

Insertion/deletion at the head: O(1).
Insertion/deletion at the tail: O(1).
Insertion/deletion at a given position: O(n) because you must traverse to the position first.

Here is a simple example: if we want to delete node 9, we need to traverse the list to locate the node, then adjust the relevant pointers accordingly.

Why can’t LinkedList implement RandomAccess?#

RandomAccess is a marker interface indicating that a class supports fast random access (i.e., constant-time indexed access). Since LinkedList is based on a linked list, with non-contiguous memory and traversal needed to reach a given position, it does not support fast random access and thus cannot implement RandomAccess.

Differences between ArrayList and LinkedList?#

Thread-safety: Both ArrayList and LinkedList are not synchronized; neither is thread-safe.
Underlying data structure: ArrayList uses a plain Object[] array; LinkedList uses a doubly linked list (JDK 1.6 had circular lists, JDK 1.7 removed them. Note the difference between doubly linked list and doubly circular list).
Insert/delete time depending on position:
- ArrayList stores data in an array, so insertion and deletion times depend on position. For example, add(E e) appends to the end by default with O(1). If inserting or deleting at a specific position i (add(int index, E element)), the time is O(n) since elements after i must be shifted.
- LinkedList stores data in a linked list, so head/tail insertions or deletions do not depend on element position (add(E e), addFirst(E e), addLast(E e), removeFirst(), removeLast()) with O(1). If inserting or deleting at a specific position i (add(int index, E element), remove(Object o), remove(int index)), time is O(n) because you must traverse to the position first.
Fast random access support: LinkedList does not support efficient random access, while ArrayList (which implements RandomAccess) does. Fast random access means quickly retrieving elements by index (get(int index)).
Memory usage: ArrayList's overhead comes mainly from the unused capacity reserved at the end of its backing array, while LinkedList's overhead comes from each node, which stores references to the previous and next nodes in addition to the data.

In practice, we typically do not use LinkedList; in scenarios where LinkedList would be used, ArrayList can usually replace it with better performance.

Additionally, do not assume that LinkedList is universally best for insertion/deletion scenarios simply because it is a linked list. As noted above, LinkedList only achieves near O(1) time for head/tail insertions/deletions; for other cases, the average time is O(n).

Supplement: Doubly Linked List and Doubly Circular Linked List

Doubly Linked List: contains two pointers, a prev pointing to the previous node, and a next pointing to the next node.

Doubly Circular Linked List: the last node’s next points to the head, and the head’s prev points to the last node, forming a loop.

Supplement: RandomAccess Interface
public interface RandomAccess {}
Looking at the source, RandomAccess is essentially empty. So, in my view, RandomAccess is just a marker. What does it mark? It marks that the implementing class supports random access. In the binarySearch() method, it checks whether the input list is an instance of RandomAccess; if so, it calls indexedBinarySearch(), otherwise it calls iteratorBinarySearch().
public static <T>
int binarySearch(List<? extends Comparable<? super T>> list, T key) {
    if (list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD)
        return Collections.indexedBinarySearch(list, key);
    else
        return Collections.iteratorBinarySearch(list, key);
}
ArrayList implements RandomAccess, while LinkedList does not. Why? It relates to the underlying data structure. ArrayList is backed by an array, and arrays naturally support random access with O(1) time, hence fast random access. Lists backed by linked lists require traversal, so they do not support fast random access. ArrayList implements RandomAccess to indicate it offers fast random access. The RandomAccess interface is merely a marker and does not by itself guarantee fast random access!
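
Your own code can use the marker the same way binarySearch() does. The sketch below is an illustration only; RandomAccessCheck and its methods are hypothetical names, not JDK API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessCheck {
    // True when the list advertises fast indexed access via the marker interface.
    static boolean supportsFastIndexing(List<?> list) {
        return list instanceof RandomAccess;
    }

    // Sums a list, picking the loop style that suits the implementation.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) { // get(i) is O(1) here
                total += list.get(i);
            }
        } else {
            for (int value : list) {                // iterator avoids O(n) get(i) calls
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(arrayList);
        System.out.println(supportsFastIndexing(arrayList));        // true
        System.out.println(supportsFastIndexing(linkedList));       // false
        System.out.println(sum(arrayList) + " " + sum(linkedList)); // 6 6
    }
}
```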

Talk about ArrayList resizing mechanism#

Start with ArrayList constructors#

ArrayList can be initialized in three ways; the constructor source code (JDK8) is as follows:

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

/**
 * Default (no-arg) constructor: constructs an empty list with an initial capacity of 10.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

/**
 * Constructor with an initial capacity argument (the caller chooses the capacity).
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) { // initial capacity greater than 0
        // create an array of size initialCapacity
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) { // initial capacity equal to 0
        // use the shared empty array
        this.elementData = EMPTY_ELEMENTDATA;
    } else { // initial capacity less than 0: throw an exception
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}

/**
 * Constructs a list containing the elements of the specified collection,
 * in the order they are returned by the collection's iterator.
 * Throws NullPointerException if the specified collection is null.
 */
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

Creating an ArrayList with the no-arg constructor actually initializes with an empty array. Only when elements are added does it allocate capacity. That is, adding the first element expands the capacity to 10.

Supplement: in JDK 6, new ArrayList() directly created an Object[] array of length 10.

Step-by-step analysis of ArrayList resizing#

Here we analyze the add method for an ArrayList created with the no-arg constructor.

/**
 * Appends the specified element to the end of this list.
 */
public boolean add(E e) {
    // Before adding the element, call ensureCapacityInternal
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // Adding an element to an ArrayList is essentially an array assignment
    elementData[size++] = e;
    return true;
}

Note: JDK 11 removed ensureCapacityInternal() and ensureExplicitCapacity() methods.

The source of ensureCapacityInternal is:

// Computes the required capacity from the current backing array and the requested minimum capacity.
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // If the backing array is still the default empty array (initial state),
    // return the larger of the default capacity and the minimum capacity
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    // Otherwise return the minimum capacity as is
    return minCapacity;
}

// Ensures the internal capacity is at least the given minimum capacity.
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

// Decides whether the array needs to grow
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // Check whether the current array can hold minCapacity elements
    if (minCapacity - elementData.length > 0)
        // Call grow() to expand
        grow(minCapacity);
}

Let’s analyze carefully:

When we add the first element, elementData.length is 0 (the backing array is still empty). add() calls ensureCapacityInternal(1), and calculateCapacity() returns Math.max(DEFAULT_CAPACITY, 1), so minCapacity becomes 10. minCapacity - elementData.length > 0 is true, so grow(minCapacity) is called.
When adding the second element, minCapacity is 2, and the capacity was already expanded to 10 by the first add. minCapacity - elementData.length > 0 is false, so grow(minCapacity) is not called.
The same holds for the 3rd through 10th elements. Only when adding the 11th element is minCapacity (11) greater than elementData.length (10), so grow() is called to expand.

grow method

/**
 * The maximum size of array to allocate.
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

/**
 * The core method of ArrayList resizing.
 */
private void grow(int minCapacity) {
    // oldCapacity is the old capacity, newCapacity the new one
    int oldCapacity = elementData.length;
    // Shifting oldCapacity right by one bit is equivalent to oldCapacity / 2,
    // so this expression sets the new capacity to about 1.5 times the old capacity
    int newCapacity = oldCapacity + (oldCapacity >> 1);

    // If the new capacity is still smaller than the minimum required capacity,
    // use the minimum required capacity as the new capacity
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;

    // If the new capacity exceeds MAX_ARRAY_SIZE, call hugeCapacity() to compare minCapacity with MAX_ARRAY_SIZE:
    // if minCapacity is larger, the new capacity is Integer.MAX_VALUE; otherwise it is MAX_ARRAY_SIZE, i.e. Integer.MAX_VALUE - 8
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);

    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

int newCapacity = oldCapacity + (oldCapacity >> 1), so ArrayList expands to about 1.5x capacity each time (even oldCapacity yields exactly 1.5x, odd yields around 1.5x).

Now with examples:

When adding the first element, oldCapacity is 0, so newCapacity is also 0. The first if is true (0 - 10 < 0), so newCapacity = minCapacity (10). The second if does not trigger, so hugeCapacity() is not called. The capacity becomes 10; add() then returns true and size becomes 1.
When adding the 11th element, newCapacity is 15 (10 + 10 / 2), which is greater than minCapacity (11), so the first if does not trigger. The new capacity does not exceed MAX_ARRAY_SIZE, so hugeCapacity() is not called. The capacity becomes 15; add() returns true and size becomes 11.
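
The walkthrough can be replayed with a small simulation of the growth rule. This is a sketch under assumptions: it models only a list created with the no-arg constructor under the JDK 8 logic, it ignores the MAX_ARRAY_SIZE / hugeCapacity() branch, and GrowthSimulation with its methods is a hypothetical name.

```java
class GrowthSimulation {
    static final int DEFAULT_CAPACITY = 10;

    // Mirrors calculateCapacity() + grow() for a list created with new ArrayList().
    static int nextCapacity(int oldCapacity, int minCapacity) {
        if (oldCapacity == 0) {
            // still the default empty array: the first growth jumps to at least 10
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        int newCapacity = oldCapacity + (oldCapacity >> 1); // about 1.5x
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        return newCapacity;
    }

    // Returns the backing-array capacity after appending `count` elements one by one.
    static int capacityAfter(int count) {
        int capacity = 0;
        for (int size = 0; size < count; size++) {
            if (size + 1 > capacity) {            // same check as ensureExplicitCapacity()
                capacity = nextCapacity(capacity, size + 1);
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(1));  // 10
        System.out.println(capacityAfter(10)); // 10
        System.out.println(capacityAfter(11)); // 15
        System.out.println(capacityAfter(16)); // 22
        System.out.println(capacityAfter(23)); // 33
    }
}
```

The odd capacity 15 grows to 22 rather than 22.5, which is the "around 1.5x" case mentioned above.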

A few important notes, easy to miss:

In Java, the length attribute is for arrays; for a declared array, to know its length you use length.
In Java, the length() method is for strings; to know the length of a string, use length().
In Java, the size() method is for generic collections; to see how many elements a collection has, call size()!

hugeCapacity() method

From grow() we know: if the new capacity exceeds MAX_ARRAY_SIZE, hugeCapacity() compares minCapacity and MAX_ARRAY_SIZE; if minCapacity is greater than the maximum, new capacity becomes Integer.MAX_VALUE; otherwise, new capacity becomes MAX_ARRAY_SIZE (Integer.MAX_VALUE - 8).

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    // Compare minCapacity with MAX_ARRAY_SIZE:
    // if minCapacity is larger, use Integer.MAX_VALUE as the new array size;
    // if MAX_ARRAY_SIZE is larger, use MAX_ARRAY_SIZE as the new array size
    // MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

Set#

Differences between Comparable and Comparator#

Comparable and Comparator are both sorting interfaces in Java; they play important roles in comparing and sorting objects:

The Comparable interface comes from java.lang and declares compareTo(T o), which defines a class's natural ordering.
The Comparator interface comes from java.util and declares compare(T o1, T o2), which defines an ordering outside the element class.

Typically, to sort a collection in a custom order you either implement compareTo() in the element class or supply a Comparator with a compare() method. If you need two different orderings for the same collection, such as sorting songs by title and by artist, you can implement compareTo() for one ordering and write a Comparator for the other, or write two Comparators. Using a Comparator means calling the two-argument version of Collections.sort().
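
As a rough illustration of the song scenario, the sketch below gives a hypothetical Song class a natural order by title (Comparable) and adds a separate Comparator for ordering by artist. The class, field names, and sample data are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SongSorting {
    // Hypothetical element type; its natural order is by title.
    static final class Song implements Comparable<Song> {
        final String title;
        final String artist;

        Song(String title, String artist) {
            this.title = title;
            this.artist = artist;
        }

        @Override
        public int compareTo(Song other) {
            return title.compareTo(other.title);
        }
    }

    // A second, external ordering: by artist name.
    static final Comparator<Song> BY_ARTIST = Comparator.comparing((Song s) -> s.artist);

    static List<Song> sample() {
        List<Song> songs = new ArrayList<>();
        songs.add(new Song("Zebra", "Bob"));
        songs.add(new Song("Apple", "Carol"));
        songs.add(new Song("Mango", "Alice"));
        return songs;
    }

    // Joins the titles so the resulting order is easy to print and compare.
    static String titles(List<Song> songs) {
        StringBuilder sb = new StringBuilder();
        for (Song s : songs) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(s.title);
        }
        return sb.toString();
    }

    static String byTitle() {
        List<Song> songs = sample();
        Collections.sort(songs);            // one-argument form uses compareTo()
        return titles(songs);
    }

    static String byArtist() {
        List<Song> songs = sample();
        Collections.sort(songs, BY_ARTIST); // two-argument form uses the Comparator
        return titles(songs);
    }

    public static void main(String[] args) {
        System.out.println(byTitle());  // Apple, Mango, Zebra
        System.out.println(byArtist()); // Mango, Zebra, Apple
    }
}
```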

Custom sorting with Comparator#

ArrayList<Integer> arrayList = new ArrayList<Integer>();
arrayList.add(-1);
arrayList.add(3);
arrayList.add(3);
arrayList.add(-5);
arrayList.add(7);
arrayList.add(4);
arrayList.add(-9);
arrayList.add(-7);
System.out.println("Original array:");
System.out.println(arrayList);
// void reverse(List list): reverse
Collections.reverse(arrayList);
System.out.println("Collections.reverse(arrayList):");
System.out.println(arrayList);

// void sort(List list), sort in natural ascending order
Collections.sort(arrayList);
System.out.println("Collections.sort(arrayList):");
System.out.println(arrayList);
// Custom sorting usage
Collections.sort(arrayList, new Comparator<Integer>() {
    @Override
    public int compare(Integer o1, Integer o2) {
        return o2.compareTo(o1);
    }
});
System.out.println("After custom sort:");
System.out.println(arrayList);

What do unorderedness and non-duplication mean?#

Unorderedness is not the same as randomness; unordered means the data in the underlying array is not added in the order of the array indices but is determined by the hash value of the data.
Non-duplication means that when adding elements, equals() determines whether they are duplicates; you should override both equals() and hashCode() to ensure proper behavior.
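
A short sketch shows why both methods matter. Point and RawPoint are hypothetical classes: the first overrides equals() and hashCode(), the second inherits the identity-based versions from Object.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class UniquenessDemo {
    // Overrides both equals() and hashCode(): equal coordinates mean equal points.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // No overrides: two instances are equal only if they are the same object.
    static final class RawPoint {
        final int x;
        final int y;

        RawPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    static int distinctPoints() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // rejected as a duplicate
        return set.size();
    }

    static int distinctRawPoints() {
        Set<RawPoint> set = new HashSet<>();
        set.add(new RawPoint(1, 2));
        set.add(new RawPoint(1, 2)); // accepted: a different object identity
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints());    // 1
        System.out.println(distinctRawPoints()); // 2
    }
}
```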

Compare HashSet, LinkedHashSet, and TreeSet#

HashSet, LinkedHashSet, and TreeSet are all implementations of the Set interface; they guarantee element uniqueness and are not thread-safe.
The main differences lie in their underlying data structures. HashSet uses a hash table (based on HashMap). LinkedHashSet uses a combination of a linked list and a hash table, with insertion and retrieval order following FIFO. TreeSet uses a red-black tree; elements are ordered, with either natural ordering or custom ordering.
The different data structures lead to different usage scenarios: HashSet for when you don’t need to preserve insertion/removal order, LinkedHashSet for maintaining insertion/removal order, and TreeSet when you need custom element ordering rules.
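
The ordering differences are easy to see by feeding the same input to all three. In this sketch (SetOrdering is a hypothetical name) only the LinkedHashSet and TreeSet orders are printed, because HashSet makes no promise about iteration order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrdering {
    static final List<String> INPUT = Arrays.asList("pear", "apple", "fig", "apple");

    // LinkedHashSet iterates in insertion order; the duplicate "apple" is dropped.
    static String insertionOrder() {
        return new LinkedHashSet<>(INPUT).toString();
    }

    // TreeSet iterates in sorted order (natural String order here).
    static String sortedOrder() {
        return new TreeSet<>(INPUT).toString();
    }

    // HashSet guarantees uniqueness but no particular order, so only the size is checked.
    static int hashSetSize() {
        return new HashSet<>(INPUT).size();
    }

    public static void main(String[] args) {
        System.out.println(insertionOrder()); // [pear, apple, fig]
        System.out.println(sortedOrder());    // [apple, fig, pear]
        System.out.println(hashSetSize());    // 3
    }
}
```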

Queue#

Differences between Queue and Deque#

Queue is a single-ended queue that can only insert at one end and remove from the other, generally following First-In-First-Out (FIFO).

Queue extends Collection. Its methods come in two flavors that differ in how they behave when an operation fails (for example, because a bounded queue is full or the queue is empty): one flavor throws an exception, the other returns a special value (false or null).

Queue interface	Throws exception	Returns special value
Insert at tail	add(E e)	offer(E e)
Delete head	remove()	poll()
Peek head	element()	peek()

Deque is a double-ended queue; elements can be inserted or removed from both ends.

Deque extends the Queue interface, adding methods to operate at both the head and the tail, and these too are categorized by failure handling into two types:

Deque interface	Throws exception	Returns special value
Insert at head	addFirst(E e)	offerFirst(E e)
Insert at tail	addLast(E e)	offerLast(E e)
Delete head	removeFirst()	pollFirst()
Delete tail	removeLast()	pollLast()
Peek head	getFirst()	peekFirst()
Peek tail	getLast()	peekLast()

In fact, Deque also provides push() and pop() and can be used to simulate a stack.
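
The following sketch exercises a few of the methods from the tables, using ArrayDeque as the implementation. DequeDemo and its method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

class DequeDemo {
    // Uses one deque first as a FIFO queue, then as a LIFO stack.
    static String fifoThenLifo() {
        Deque<Integer> deque = new ArrayDeque<>();
        deque.offerLast(1);
        deque.offerLast(2);
        deque.offerLast(3);
        int first = deque.pollFirst(); // queue behavior: removes 1, leaving [2, 3]
        deque.push(9);                 // stack behavior: same as addFirst, giving [9, 2, 3]
        int top = deque.pop();         // removes 9 again
        return first + "," + top + "," + deque;
    }

    // The "special value" flavor returns null on an empty deque.
    static boolean pollOnEmptyReturnsNull() {
        return new ArrayDeque<Integer>().pollFirst() == null;
    }

    // The "throws exception" flavor raises NoSuchElementException instead.
    static boolean removeOnEmptyThrows() {
        try {
            new ArrayDeque<Integer>().removeFirst();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fifoThenLifo());           // 1,9,[2, 3]
        System.out.println(pollOnEmptyReturnsNull()); // true
        System.out.println(removeOnEmptyThrows());    // true
    }
}
```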

Differences between ArrayDeque and LinkedList#

ArrayDeque and LinkedList both implement the Deque interface and both can behave as a queue, but they differ:

ArrayDeque is implemented using a resizable array with head and tail indices; LinkedList uses a linked list.
ArrayDeque does not support storing null values, but LinkedList does.
ArrayDeque was introduced in JDK 1.6, while LinkedList has existed since JDK 1.2.
Insertion in ArrayDeque may incur resizing, but the amortized insertion time remains O(1). Although LinkedList does not require resizing, each insertion allocates new heap space, so its amortized performance is typically slower.

From a performance standpoint, using ArrayDeque for queues is better than LinkedList. Additionally, ArrayDeque can also be used to implement a stack.

Talk about PriorityQueue#

PriorityQueue was introduced in JDK 1.5; its difference from Queue is that its dequeue order depends on priority, i.e., the highest-priority element is dequeued first.

PriorityQueue is implemented using a binary heap data structure, backed by a resizable array.
PriorityQueue uses sift-up and sift-down heap operations to insert an element and to remove the top element in O(log n) time.
PriorityQueue is not thread-safe, and it does not support storing null or non-comparable objects.
PriorityQueue defaults to a min-heap, but you can pass a Comparator in the constructor to customize the priority order.

PriorityQueue is often encountered in interviews in problems such as heap sort, finding the K-th largest number, and graph traversals with weights; thus, you should be proficient in using it.
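
As one example of the interview pattern mentioned above, here is a minimal sketch of "K-th largest" using the default min-heap, plus a max-heap built with a Comparator. The class and method names are hypothetical, and kthLargest assumes 1 <= k <= nums.length.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class PriorityQueueDemo {
    // Keeps only the k largest values seen so far; the heap top is the k-th largest.
    // Assumes 1 <= k <= nums.length.
    static int kthLargest(int[] nums, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(); // min-heap by default
        for (int n : nums) {
            minHeap.offer(n);
            if (minHeap.size() > k) {
                minHeap.poll(); // discard the smallest of the k + 1 candidates
            }
        }
        return minHeap.peek();
    }

    // A Comparator turns the queue into a max-heap. Assumes nums is not empty.
    static int largest(int[] nums) {
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int n : nums) {
            maxHeap.offer(n);
        }
        return maxHeap.peek();
    }

    public static void main(String[] args) {
        int[] nums = {3, 2, 1, 5, 6, 4};
        System.out.println(kthLargest(nums, 2)); // 5
        System.out.println(largest(nums));       // 6
    }
}
```

Keeping the heap at size k makes the whole pass O(n log k) instead of sorting all n elements.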

What is BlockingQueue?#

BlockingQueue (blocking queue) is an interface that extends Queue. It blocks when the queue is empty (waiting for elements) or when the queue is full (waiting for space), depending on the operation.

public interface BlockingQueue<E> extends Queue<E> {
  // ...
}

BlockingQueue is commonly used in producer-consumer patterns, where producers add data to the queue and consumers take data from the queue for processing.
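
A minimal producer-consumer sketch, assuming one producer and one consumer thread and a deliberately small queue so that both put() and take() actually block at times. ProducerConsumerDemo and run are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerDemo {
    // Sends the numbers 1..items through a bounded queue and returns their sum.
    static int run(int items) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2); // tiny bound forces blocking
        int[] sum = new int[1]; // written only by the consumer, read after join()

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    sum[0] += queue.take(); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // 55
    }
}
```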

What are the implementations of BlockingQueue?#

Common blocking queue implementations in Java include:

ArrayBlockingQueue: a bounded blocking queue backed by an array. Capacity must be specified at creation and it supports fair and non-fair locking modes.
LinkedBlockingQueue: an optionally bounded blocking queue backed by a singly linked list. Capacity can be specified at creation; if not, it defaults to Integer.MAX_VALUE. Unlike ArrayBlockingQueue, it only supports non-fair locking.
PriorityBlockingQueue: an unbounded blocking queue that orders by priority. Elements must implement Comparable or a Comparator can be provided; null elements are not allowed.
SynchronousQueue: a queue that does not store elements. Each insert must wait for a corresponding removal, and vice versa; typically used for direct handoffs between threads.
DelayQueue: a delayed queue where elements can only be taken after their specified delay.

Differences between ArrayBlockingQueue and LinkedBlockingQueue#

ArrayBlockingQueue and LinkedBlockingQueue are common blocking queue implementations in Java’s concurrency package; both are thread-safe.

Underlying implementation: ArrayBlockingQueue is based on an array; LinkedBlockingQueue is based on a linked list.
Boundedness: ArrayBlockingQueue is bounded and requires a capacity at creation. LinkedBlockingQueue can be created with or without a capacity bound; by default it is unbounded (Integer.MAX_VALUE), but you can specify a bound to make it bounded.
Lock separation: ArrayBlockingQueue uses a single lock for producers and consumers; LinkedBlockingQueue uses separate locks for put and take, which reduces lock contention between producers and consumers.
Memory usage: ArrayBlockingQueue allocates a fixed array upfront; LinkedBlockingQueue dynamically allocates linked-list nodes as elements are added. This means ArrayBlockingQueue uses a fixed amount of memory at creation, and may allocate more memory than is actually used, whereas LinkedBlockingQueue grows with the number of elements.

Map (Important)#

Differences between HashMap and Hashtable#

Thread safety: HashMap is non-thread-safe; Hashtable is thread-safe because most of its methods are synchronized. (If you need thread-safety, use ConcurrentHashMap.)
Efficiency: Because of synchronization overhead, Hashtable is less efficient than HashMap; Hashtable has largely fallen out of use and should generally be avoided.
Support for null keys and values: HashMap accepts null keys and null values, but there can be at most one null key, while any number of keys may map to null. Hashtable allows neither null keys nor null values and throws NullPointerException if you try.
Initial capacity and growth: if no initial capacity is specified, Hashtable starts at 11 and grows to 2n+1 on each expansion, while HashMap starts at 16 and doubles on each expansion. If an initial capacity is provided, Hashtable uses that size directly, while HashMap rounds it up to the next power of two.
Underlying data structure: In JDK 8 and later, HashMap handles hash collisions with significant improvements: when the chain length exceeds a threshold (default 8), the chain is converted to a red-black tree to reduce search time (if the array length is less than 64, it grows first rather than converting to a tree). Hashtable does not have this mechanism.
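
The null-handling difference is the easiest one to check directly. The sketch below uses only standard Map methods; NullKeyDemo is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // HashMap: a second put with the null key overwrites the first value.
    static String hashMapWithNulls() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "first");
        map.put(null, "second"); // same (null) key, value replaced
        map.put("k", null);      // null values are fine too
        return map.size() + ":" + map.get(null);
    }

    // Hashtable: a null key is rejected with NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "v");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Hashtable: a null value is rejected as well.
    static boolean hashtableRejectsNullValue() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put("k", null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hashMapWithNulls());          // 2:second
        System.out.println(hashtableRejectsNullKey());   // true
        System.out.println(hashtableRejectsNullValue()); // true
    }
}
```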

HashMap and HashSet differences#

If you’ve looked at the HashSet source, you’ll know: HashSet is implemented on top of HashMap. (HashSet’s source is very small because, aside from clone(), writeObject(), and readObject(), every other method delegates to HashMap.)

HashMap	HashSet
Implements Map interface	Implements Set interface
Stores key-value pairs	Stores only objects
Uses put() to add elements to the map	Uses add() to add elements to the Set
HashMap uses the key (Key) to compute hashCode	HashSet uses the member object to compute hashCode; two objects may have the same hashCode, so equals() determines equality

HashMap and TreeMap differences#

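TreeMap and HashMap both extend AbstractMap, but TreeMap also implements NavigableMap, which in turn extends SortedMap.

Implementing NavigableMap gives TreeMap navigation methods that return the closest match for a given key (for example floorKey(), ceilingKey(), lowerKey(), and higherKey()), plus range views such as headMap() and tailMap().
Implementing SortedMap gives TreeMap the ability to keep its entries sorted by key. By default, keys are sorted in ascending natural order, but you can supply a Comparator.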

In short, compared to HashMap, TreeMap mainly adds the ability to sort elements by keys and to search within the map.
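
A small sketch of both abilities, sorting by key and searching within the map. TreeMapDemo and its sample data are invented for illustration; the TreeMap methods used are standard.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TreeMapDemo {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        return map;
    }

    // Keys come back sorted regardless of insertion order.
    static String keysAscending() {
        return sample().keySet().toString();
    }

    // Greatest key <= the argument, or null if there is none.
    static Integer floor(int key) {
        return sample().floorKey(key);
    }

    // Smallest key >= the argument, or null if there is none.
    static Integer ceiling(int key) {
        return sample().ceilingKey(key);
    }

    // View of all entries whose keys are strictly below the argument.
    static String below(int key) {
        return sample().headMap(key).toString();
    }

    // A Comparator passed to the constructor changes the sort order.
    static String keysDescending() {
        TreeMap<Integer, String> map = new TreeMap<>(Comparator.reverseOrder());
        map.putAll(sample());
        return map.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(keysAscending());  // [10, 20, 30]
        System.out.println(floor(25));        // 20
        System.out.println(ceiling(25));      // 30
        System.out.println(below(30));        // {10=a, 20=b}
        System.out.println(keysDescending()); // [30, 20, 10]
    }
}
```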

How does HashSet check for duplicates?#

When you add an object to a HashSet, the set first computes the object’s hashCode to determine the insertion location and compares hashCodes with other elements. If there is no matching hashCode, the set assumes there is no duplicate. If there are objects with matching hashCodes, equals() is used to determine if they are truly the same. If they are the same, the addition fails.

In JDK 8, HashSet’s add() simply calls HashMap’s put() and checks the return value to determine if a duplicate was present. See HashSet’s source:

// Returns: true if this set did not already contain the specified element
public boolean add(E e) {
        return map.put(e, PRESENT)==null;
}

That is, in JDK 8, HashSet's add() calls put() whether or not the element is already present. If an equal element exists, the stored element is kept and only the placeholder value is rewritten; the return value of add() tells you whether the element was absent before the call.
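
The two-step check (hashCode first, then equals) can be observed with the strings "Aa" and "BB", which happen to share the same hashCode but are not equal. DuplicateCheckDemo is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateCheckDemo {
    // "Aa" and "BB" collide: both hash to 2112 under String.hashCode().
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode();
    }

    // Records what add() returns for a new element, a colliding element, and a true duplicate.
    static String addResults() {
        Set<String> set = new HashSet<>();
        boolean first = set.add("Aa");     // new element
        boolean colliding = set.add("BB"); // same hashCode, but equals() is false: still added
        boolean duplicate = set.add("Aa"); // same hashCode and equals() is true: rejected
        return first + "," + colliding + "," + duplicate + "," + set.size();
    }

    public static void main(String[] args) {
        System.out.println(sameHash());   // true
        System.out.println(addResults()); // true,true,false,2
    }
}
```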

HashMap’s underlying implementation#

Before JDK 8#

Before JDK 8, HashMap’s underlying structure was a combination of an array and linked lists — chaining. HashMap computes a hash from the key’s hashCode, uses a perturbation function to get a hash value, then uses (n - 1) & hash to determine the position (n is the array length). If there is an element at that position, it compares the stored key and hash with the new key’s, and if matching, it overwrites; otherwise, it uses chaining to resolve collisions.

The perturbation function is HashMap’s hash method, designed to reduce collisions from poorly implemented hashCode() methods.

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

HashMap in JDK 8#

HashMap’s hash method in JDK 8 is simplified compared to JDK 7, but the principle remains:

static final int hash(Object key) {
    int h;
    // key.hashCode(): returns the hash value (hashCode)
    // ^: bitwise XOR
    // >>>: unsigned right shift, zero-extend
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Compared with the JDK 8 hash method, the JDK 7 hash method is a bit slower, because it perturbs the hash with four shifts instead of one.

“Chaining” means: combining a linked list with an array. You create an array of lists; each slot in the array is a list. When collisions occur, you add the colliding value to the list.

After JDK 8#

HashMap’s collision handling changed significantly: when the linked list length exceeds the threshold (default 8) and the array length is at least 64, the list is converted into a red-black tree to reduce search time (before converting, if the array length is less than 64, it expands first rather than converting).

TreeMap, TreeSet, and HashMap implementations after JDK 8 all use red-black trees in some form. A red-black tree resolves the drawbacks of plain binary search trees, which can degenerate into a linear structure.

We’ll analyze the HashMap’s conversion from linked list to red-black tree with code:

The putVal method determines when to convert a list to a red-black tree. If the linked list length exceeds 8, it triggers treeifyBin (convert to a red-black tree).
In treeifyBin, it checks whether the conversion is necessary. If the array length is less than 64, it will resize rather than convert to a red-black tree.

Why is HashMap’s length a power of two?#

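To make lookups fast and spread entries evenly, HashMap maps each hash value to an array slot. A hash is a 32-bit int, far too large a range to use directly as an array index, so it must be reduced to the range [0, n), where n is the array length. The obvious way is a modulo by n, but when n is a power of two, n - 1 is a mask of low-order one bits and (n - 1) & hash yields the same slot as a non-negative modulo. That is why the array length is always kept as a power of two.

The bitwise AND is cheaper than a modulo operation, which speeds up index calculation.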
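
The equivalence between masking and modulo is easy to verify. In this sketch, spread() copies the JDK 8 perturbation shown earlier, while PowerOfTwoIndex, indexFor, and maskMatchesModulo are hypothetical helper names; Math.floorMod is used as the reference because Java's % operator can return negative values.

```java
class PowerOfTwoIndex {
    // Same perturbation as the JDK 8 hash(): mix the high 16 bits into the low 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Slot computation used by HashMap; n is the table length.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // Checks whether masking agrees with a non-negative modulo for a range of hashes.
    static boolean maskMatchesModulo(int n) {
        for (int h = -1000; h <= 1000; h++) {
            if (indexFor(h, n) != Math.floorMod(h, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(maskMatchesModulo(16)); // true: 16 is a power of two
        System.out.println(maskMatchesModulo(10)); // false: 10 is not
        System.out.println(indexFor(spread("java".hashCode()), 16)); // some slot in [0, 16)
    }
}
```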

HashMap multithreaded operations causing infinite loops#

Before JDK 8, resizing a HashMap in a multithreaded environment could cause an infinite loop. When several elements landed in the same bucket, concurrent resizes used head insertion, which reversed the list order and could leave the linked list with a cycle, so a later lookup would never terminate.

To fix this, JDK 8 switched to tail insertion, which preserves the order of nodes during a resize and so avoids creating a cycle. Nevertheless, HashMap remains unsafe under concurrent access; use ConcurrentHashMap in multithreaded contexts.

Why is HashMap not thread-safe?#

In JDK 7 and earlier, concurrent resizing of a HashMap could lead to infinite loops and data loss.

Data loss can also occur in JDK 1.8. Several key-value pairs may be assigned to the same bucket and stored as a list or a red-black tree; concurrent put operations on that bucket can race and overwrite one another.

Example:

Two threads attempt to put at the same bucket with a hash collision.
Depending on thread scheduling, one thread may be paused after determining a collision, while the other thread inserts.
When the first thread resumes, it inserts again, potentially overwriting the other’s data.

There is also a risk that size increments may be inconsistent when multiple threads perform puts simultaneously.

If you truly need null in a ConcurrentHashMap, you can use a special static empty object as a stand-in for null:

public static final Object NULL = new Object();

Can ConcurrentHashMap guarantee atomicity of composite operations?#

ConcurrentHashMap is thread-safe, meaning it guarantees consistency for concurrent reads and writes and avoids the infinite-loop issue seen in older HashMaps. However, it does not guarantee atomicity for composite operations.

Composite operations are those formed by multiple basic operations (put, get, remove, containsKey, etc.), for example: containsKey(key) followed by put(key, value). Another thread can run between the two calls, leading to unexpected results.

To make composite operations atomic, ConcurrentHashMap provides methods such as putIfAbsent, compute, computeIfAbsent, computeIfPresent, and merge. Apart from putIfAbsent, these accept a function that computes the new value and update the map in a single atomic step. Composite operations can also be made atomic with explicit locking, but that is not recommended; prefer the built-in atomic methods.
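
As a rough sketch of the recommended style, the counter below is updated with merge() from several threads, and putIfAbsent() replaces a containsKey-then-put sequence. AtomicCompositeDemo and its method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

class AtomicCompositeDemo {
    // Each thread increments the same counter; merge() makes read-modify-write atomic.
    static int countWithMerge(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counts.getOrDefault("hits", 0);
    }

    // putIfAbsent() is the atomic form of "if (!containsKey(k)) put(k, v)".
    static String putIfAbsentKeepsFirst() {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        map.putIfAbsent("k", "first");
        map.putIfAbsent("k", "second"); // ignored: the key is already present
        return map.get("k");
    }

    public static void main(String[] args) {
        System.out.println(countWithMerge(4, 1000)); // 4000
        System.out.println(putIfAbsentKeepsFirst()); // first
    }
}
```

Replacing merge() with a separate get() and put() would let two threads read the same old value, so some increments could be lost.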

Collections Utility Class (not essential)#

Collections utility class common methods:

Sorting
Searching and replacing operations
Synchronization control (not recommended; for thread-safe collections, consider using the concurrent collections in the java.util.concurrent package)

Sorting operations#

void reverse(List list) // reverse
void shuffle(List list) // shuffle into random order
void sort(List list) // sort by natural order (ascending)
void sort(List list, Comparator c) // custom sort controlled by the Comparator
void swap(List list, int i, int j) // swap the elements at two indices
void rotate(List list, int distance) // rotate. If distance is positive, move the last distance elements to the front; if negative, move the first -distance elements to the back
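
A short sketch of a few of these calls on a small list; the class SortOpsDemo and the rotated helper are names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortOpsDemo {
    // Returns a rotated copy so the caller's list is left untouched.
    static List<Integer> rotated(List<Integer> src, int distance) {
        List<Integer> copy = new ArrayList<>(src);
        Collections.rotate(copy, distance);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(rotated(list, 2));  // [4, 5, 1, 2, 3]
        System.out.println(rotated(list, -2)); // [3, 4, 5, 1, 2]
        Collections.reverse(list);
        System.out.println(list);              // [5, 4, 3, 2, 1]
        Collections.swap(list, 0, 4);
        System.out.println(list);              // [1, 4, 3, 2, 5]
        Collections.sort(list, Collections.reverseOrder());
        System.out.println(list);              // [5, 4, 3, 2, 1]
    }
}
```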

Searching and replacing operations#

int binarySearch(List list, Object key) // binary search on a List; the list must already be sorted
Object max(Collection coll) // return the maximum element by natural order; min(Collection coll) is the counterpart
Object max(Collection coll, Comparator c) // return the maximum element by a custom order given by the Comparator; min(Collection coll, Comparator c) is the counterpart
void fill(List list, Object obj) // replace all elements in the list with the specified object
int frequency(Collection c, Object o) // count occurrences of an element
int indexOfSubList(List list, List target) // find the first index of target in list; -1 if not found
boolean replaceAll(List list, Object oldVal, Object newVal) // replace all occurrences of oldVal with newVal
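
The following sketch exercises these methods on one list. SearchOpsDemo and sortedIndexOf are illustrative names; note that the list is sorted before binarySearch is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SearchOpsDemo {
    // Sorts a copy first, because binarySearch is only defined on sorted input.
    static int sortedIndexOf(List<Integer> list, int key) {
        List<Integer> copy = new ArrayList<>(list);
        Collections.sort(copy);
        return Collections.binarySearch(copy, key);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(3, 1, 3, 7, 5));
        System.out.println(Collections.max(list));          // 7
        System.out.println(Collections.min(list));          // 1
        System.out.println(Collections.frequency(list, 3)); // 2
        System.out.println(Collections.indexOfSubList(list, Arrays.asList(3, 7))); // 2
        Collections.replaceAll(list, 3, 9);
        System.out.println(list);                           // [9, 1, 9, 7, 5]
        System.out.println(sortedIndexOf(list, 7));         // 2
        System.out.println(sortedIndexOf(list, 4) < 0);     // true (not found)
    }
}
```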

Synchronization control#

Collections provides several synchronizedXxx() methods to wrap a given collection as a thread-safe collection.

We know HashSet, TreeSet, ArrayList, LinkedList, HashMap, TreeMap are not thread-safe. Collections provides several static methods to wrap them into thread-synchronized collections.

Prefer to avoid using these methods in performance-critical contexts. For thread-safe collections, consider using the concurrent collections in the JUC package.

synchronizedCollection(Collection<T> c) // returns a synchronized (thread-safe) collection backed by the specified collection
synchronizedList(List<T> list) // returns a synchronized List backed by the specified list
synchronizedMap(Map<K,V> m) // returns a synchronized Map backed by the specified map
synchronizedSet(Set<T> s) // returns a synchronized Set backed by the specified set
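
For completeness, a minimal sketch of how a wrapper is typically used. Individual calls such as add() are locked by the wrapper, but iteration must be guarded by the caller; the class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedWrapperDemo {
    // Fills a synchronized list from several threads; each add() is locked internally.
    static List<Integer> fill(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list;
    }

    // Iteration is a multi-step operation, so the caller holds the wrapper's lock.
    static int sum(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) {
            for (int v : syncList) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = fill(4, 1000);
        System.out.println(list.size()); // 4000
        System.out.println(sum(list));   // 4000
    }
}
```

Every call goes through one shared lock, which is the main reason these wrappers scale poorly compared with the java.util.concurrent collections.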


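Collections Overview#

Java Collections Overview#

Java collections, also called containers, derive mainly from two interfaces: Collection, which stores single elements, and Map, which stores key-value pairs. Collection has three main subinterfaces: List, Set, and Queue.

Differences among List, Set, Queue, and Map#

List (the go-to for ordered data): elements are ordered and may repeat.
Set (focused on uniqueness): elements may not repeat.
Queue (the ticket machine that implements queuing): order is determined by a specific queuing rule; elements are ordered and may repeat.
Map (the expert at lookup by key): stores key-value pairs, like a mathematical function y = f(x), where "x" is the key and "y" is the value. Keys are unordered and unique, values are unordered and may repeat, and each key maps to at most one value.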

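Summary of underlying data structures in the Collection Framework#

List#

ArrayList: Object[] array.
Vector: Object[] array.
LinkedList: doubly linked list (circular up to JDK 1.6; the cycle was removed in JDK 1.7).

Set#

HashSet (unordered, unique): implemented on top of HashMap, which stores the elements.
LinkedHashSet: a subclass of HashSet, implemented internally with LinkedHashMap.
TreeSet (ordered, unique): red-black tree (a self-balancing sorted binary tree).

Queue#

PriorityQueue: a min-heap stored in an Object[] array.
DelayQueue: backed by a PriorityQueue.
ArrayDeque: a resizable array used as a double-ended queue.

Map#

HashMap: before JDK 1.8, HashMap consisted of an array plus linked lists. The array is the main body, and the linked lists exist to resolve hash collisions (separate chaining). Since JDK 1.8 collision handling has changed significantly: when a list grows longer than the threshold (8 by default), it is converted to a red-black tree to shorten lookups. Before converting, HashMap checks the array length; if it is below 64, the array is resized instead of converting the list to a tree.
LinkedHashMap: LinkedHashMap extends HashMap, so underneath it is still a chained hash structure made of an array plus linked lists or red-black trees. On top of that it maintains a doubly linked list that preserves the insertion order of the entries; by manipulating that list it can also implement access order.
Hashtable: array plus linked lists. The array is the main body, and the linked lists exist to resolve hash collisions.
TreeMap: red-black tree (a self-balancing sorted binary tree).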

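How to choose a collection?#

We choose a collection mainly according to its characteristics.

When elements must be retrieved by key, use an implementation of the Map interface: TreeMap if sorting is needed, HashMap if not, and ConcurrentHashMap if thread safety is required.
When only element values are stored, use an implementation of the Collection interface. If elements must be unique, choose a Set implementation such as TreeSet or HashSet; otherwise choose a List implementation such as ArrayList or LinkedList, and then pick based on the characteristics of the concrete class.

Why use collections?#

When a group of values of the same type must be stored, an array is one of the most common and basic containers. Arrays have shortcomings for storing objects, however, because in real development the data types vary and the number of items is not known in advance. This is where Java collections come in. Compared with arrays, they offer a more flexible and effective way to store multiple objects: the classes and interfaces of the collection framework can hold objects of different types and quantities and provide a variety of operations. Their advantages over arrays are that they are resizable, support generics, and come with built-in algorithms. In short, Java collections make data storage and processing more flexible, adapt better to the varied data needs of modern software, and support writing high-quality code.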

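List#

Differences between ArrayList and Array (array)?#

ArrayList is implemented on top of a dynamic array and is more flexible to use than an Array (a static array):

ArrayList grows automatically as elements are added (and can be shrunk explicitly with trimToSize()), whereas the length of an Array cannot change after creation.
ArrayList lets you use generics to ensure type safety; an Array does not.
ArrayList stores only objects. Primitive values require the corresponding wrapper class (Integer, Double, and so on). An Array can store primitives directly as well as objects.
ArrayList supports the usual operations such as insertion, deletion, and traversal, and offers a rich API such as add() and remove(). An Array has a fixed length and only allows access by index; elements cannot be added or removed dynamically.
An ArrayList does not need a size at creation time, whereas an Array must be given its size when it is created.

Differences between ArrayList and Vector?#

ArrayList is the main implementation of List. It stores elements in an Object[], suits frequent lookups, and is not thread-safe.
Vector is the legacy implementation of List. It also stores elements in an Object[] and is thread-safe.

Differences between Vector and Stack?#

Vector and Stack are both thread-safe and both rely on synchronized for synchronization.
Stack extends Vector and is a last-in, first-out stack, whereas Vector is a list.

As concurrent programming in Java has evolved, Vector and Stack have gradually fallen out of use. It is recommended to use concurrent collections (for example ConcurrentHashMap or CopyOnWriteArrayList) or to implement thread-safe operations manually.

Can ArrayList contain null values?#

An ArrayList can hold objects of any type, including null. Adding null is not recommended, though: a null value carries no meaning, makes the code harder to maintain, and can lead to a NullPointerException if a null check is forgotten.

ArrayList<String> listOfStrings = new ArrayList<>();
listOfStrings.add(null);
listOfStrings.add("java");
System.out.println(listOfStrings);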

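Time complexity of inserting and deleting elements in ArrayList?#

Insertion
- At the head: every element must shift back by one position, so the cost is O(n).
- At the tail: O(1) if the capacity is not exhausted. If the array is full, one O(n) copy into a larger array happens first, and then the element is appended.
- At a given index: all elements after the target position shift back by one, then the new element is stored. On average n/2 elements move, so the cost is O(n).
Deletion
- At the head: every remaining element must shift forward by one position, so the cost is O(n).
- At the tail: O(1).
- At a given index: the elements after the target shift forward to fill the gap. On average n/2 elements move, so the cost is O(n).

Time complexity of inserting and deleting elements in LinkedList?#

Head insertion/deletion: only the head pointer changes, so the cost is O(1).
Tail insertion/deletion: only the tail pointer changes, so the cost is O(1).
Insertion/deletion at a given position: the list must first be traversed to that position before the pointers are updated. On average n/2 nodes are visited, so the cost is O(n).

As a simple example, to delete node 9 the list is first traversed to find that node, and then the pointers of the neighboring nodes are updated.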

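Why can't LinkedList implement RandomAccess?#

RandomAccess is a marker interface: a class that implements it declares that it supports fast random access. LinkedList is backed by a linked list whose nodes are not contiguous in memory, so it cannot locate an element by index quickly and does not support fast random access. That is why it does not implement RandomAccess.

Differences between ArrayList and LinkedList?#

Thread safety: neither ArrayList nor LinkedList is synchronized, so neither is thread-safe.
Underlying data structure: ArrayList uses an Object[]. LinkedList uses a doubly linked list (circular up to JDK 1.6; the cycle was removed in JDK 1.7. Note the difference between a doubly linked list and a circular doubly linked list).
Whether insertion and deletion depend on element position:
- ArrayList stores elements in an array, so the cost of insertion and deletion depends on the position. add(E e) appends at the tail and is O(1), but inserting or deleting at index i is O(n), because the (n - i) elements after position i must shift backward or forward.
- LinkedList stores elements in a linked list, so insertion and deletion at the head or tail are O(1) regardless of position (add(E e), addFirst(E e), addLast(E e), removeFirst(), removeLast()). Insertion or deletion at a given position (add(int index, E element), remove(Object o), remove(int index)) is O(n), because the list must first be traversed to that position.
Fast random access: LinkedList does not support efficient random access, whereas ArrayList does and implements RandomAccess. Fast random access means retrieving an element quickly by its index (the get(int index) method).
Memory footprint: ArrayList wastes space mainly by reserving spare capacity at the end of the array. In LinkedList every element costs more than in ArrayList, because each node stores its successor, its predecessor, and the data.

In our projects we generally do not use LinkedList. Almost every place that seems to call for LinkedList can use ArrayList instead, usually with better performance.

Also, do not assume that LinkedList is the best choice for insertion and deletion just because it is a linked list. As noted above, it is close to O(1) only for insertion and deletion at the head or tail; in every other case the average cost is O(n).

Supplement: doubly linked list and circular doubly linked list

Doubly linked list: each node holds two pointers, prev pointing to the previous node and next pointing to the next node.

Circular doubly linked list: the next of the last node points to head, and the prev of head points to the last node, forming a ring.

Supplement: the RandomAccess interface
public interface RandomAccess {}
The source shows that RandomAccess defines nothing, so in my view it is merely a marker. What does it mark? That the implementing class supports random access. The binarySearch() method checks whether the list argument is an instance of RandomAccess. If it is, indexedBinarySearch() is called; otherwise iteratorBinarySearch() is called.
public static <T>
int binarySearch(List<? extends Comparable<? super T>> list, T key) {
    if (list instanceof RandomAccess || list.size()<BINARYSEARCH_THRESHOLD)
        return Collections.indexedBinarySearch(list, key);
    else
        return Collections.iteratorBinarySearch(list, key);
}
ArrayList implements RandomAccess and LinkedList does not. Why? It comes down to the underlying data structure. ArrayList is backed by an array and LinkedList by a linked list. An array naturally supports random access in O(1), which is what fast random access means. A linked list must be traversed to reach a given position, which costs O(n), so it does not support fast random access. ArrayList implementing RandomAccess indicates that it has fast random access. The interface is only a marker, however: ArrayList is not fast because it implements RandomAccess; it implements RandomAccess because it is fast.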

Talk about ArrayList resizing mechanism#

Start with ArrayList constructors#

ArrayList can be initialized in three ways. The constructor source is as follows (JDK 8):

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

/**
 * Default (no-arg) constructor: constructs an empty list with an initial
 * capacity of 10, allocated lazily on the first add.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

/**
 * Constructor with an initial capacity specified by the caller.
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) { // initial capacity greater than 0
        // create an array of size initialCapacity
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) { // initial capacity equal to 0
        // use the shared empty array
        this.elementData = EMPTY_ELEMENTDATA;
    } else { // negative initial capacity: throw an exception
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    }
}

/**
 * Constructs a list containing the elements of the specified collection,
 * in the order they are returned by the collection's iterator.
 * Throws NullPointerException if the specified collection is null.
 */
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

Creating an ArrayList with the no-arg constructor actually assigns an empty array. Capacity is allocated only when an element is really added: adding the first element expands the capacity to 10.

Supplement: in JDK 6, new ArrayList() with no arguments directly created an Object[] elementData of length 10.

Step-by-step analysis of ArrayList resizing#

We analyze the add method, taking an ArrayList created with the no-arg constructor as the example.

/**
 * Appends the specified element to the end of this list.
 */
public boolean add(E e) {
    // call ensureCapacityInternal before adding the element
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    // adding an element is simply an assignment into the array
    elementData[size++] = e;
    return true;
}

Note: in JDK 11 the ensureCapacityInternal() and ensureExplicitCapacity() methods were removed.

The source of ensureCapacityInternal is as follows:

// Computes the required capacity for the given minimum capacity minCapacity
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    // if the array is still the initial empty array, return the larger of the default capacity and minCapacity
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    // otherwise return minCapacity
    return minCapacity;
}

// Ensures the internal capacity reaches the given minimum capacity
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

// Decides whether the array needs to grow
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // grow if the current array length is smaller than minCapacity
    if (minCapacity - elementData.length > 0)
        // call grow to expand
        grow(minCapacity);
}

In detail:

When the 1st element is added, elementData.length is 0 (the list is still empty). ensureCapacityInternal() runs, so minCapacity becomes 10. minCapacity - elementData.length > 0 holds, and grow(minCapacity) is called.
When the 2nd element is added, minCapacity is 2, but the capacity was already expanded to 10 after the first add, so minCapacity - elementData.length > 0 does not hold and grow(minCapacity) is not called.
When the 11th element is added, minCapacity is 11, which exceeds elementData.length (10), so grow is called to expand the array.

The grow method

/**
 * The maximum size of array to allocate.
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

/**
 * The core method of ArrayList resizing.
 */
private void grow(int minCapacity) {
    // oldCapacity is the old capacity, newCapacity is the new capacity
    int oldCapacity = elementData.length;
    // oldCapacity >> 1 shifts right by one bit, equivalent to oldCapacity / 2
    // (a shift is cheaper than a division), so the new capacity is 1.5 times the old one
    int newCapacity = oldCapacity + (oldCapacity >> 1);

    // if the new capacity is still below the minimum required, use minCapacity
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;

    // if the new capacity exceeds MAX_ARRAY_SIZE, call hugeCapacity() to compare minCapacity with MAX_ARRAY_SIZE
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);

    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

Because int newCapacity = oldCapacity + (oldCapacity >> 1), each expansion makes the capacity about 1.5 times the old one (exactly 1.5 times when oldCapacity is even; when it is odd the result is rounded down).

Walking through grow() with examples:

When the 1st element is added, oldCapacity is 0, so the first if holds and newCapacity becomes minCapacity, which is 10. The second if does not hold because 10 is below MAX_ARRAY_SIZE. The capacity is 10, add returns true, and size becomes 1.
When the 11th element is added, newCapacity is 15, which is larger than minCapacity (11), so the first if does not hold. It is below MAX_ARRAY_SIZE, so hugeCapacity is not called. The capacity becomes 15 and size becomes 11.
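
The growth sequence can be reproduced with a small stand-alone sketch. The helper nextCapacity below only mirrors the arithmetic of grow() for the common case; it is an illustrative function that ignores the MAX_ARRAY_SIZE branch and is not part of the JDK.

```java
class GrowthDemo {
    // Mirrors the JDK 8 growth rule without the overflow handling:
    // 1.5x the old capacity, but never less than the requested minimum.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    public static void main(String[] args) {
        // First add on a default-constructed list: old capacity 0, minimum 10.
        int capacity = nextCapacity(0, 10);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            sb.append(capacity).append(' ');
            // The next growth is triggered by adding one element beyond the capacity.
            capacity = nextCapacity(capacity, capacity + 1);
        }
        System.out.println(sb.toString().trim()); // 10 15 22 33 49 73
    }
}
```

The odd capacities (15 and 33) show the rounding: 15 grows to 22 rather than 22.5.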

A few points worth adding here:

In Java, length is a field of arrays: use length to get the length of a declared array.
In Java, length() is a method of strings: use length() to get the length of a string.
In Java, size() is a method of generic collections: call size() to get the number of elements in a collection.

The hugeCapacity() method

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    // compare minCapacity with MAX_ARRAY_SIZE
    // if minCapacity is larger, use Integer.MAX_VALUE as the new array size,
    // otherwise use MAX_ARRAY_SIZE (which is Integer.MAX_VALUE - 8)
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

Set#

Differences between Comparable and Comparator#

Comparable and Comparator are both Java interfaces used for sorting. They matter whenever objects of a class must be compared and ordered.

Comparable lives in the java.lang package and sorts through its compareTo(Object obj) method.
Comparator lives in the java.util package and sorts through its compare(Object obj1, Object obj2) method.

When a collection needs a custom ordering, we either override compareTo() or supply a Comparator with a compare() method. When one collection needs two orderings, for example sorting song objects once by title and once by artist, we can either override compareTo() and add one custom Comparator, or write two Comparators, one for title order and one for artist order. In the second case only the two-argument version of Collections.sort() can be used.
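
The song example can be sketched as below, using the first option: compareTo() gives the title order and a separate Comparator gives the artist order. The Song class, its fields, and the sample data are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SongSortDemo {
    // Hypothetical value class: natural order is by title.
    static final class Song implements Comparable<Song> {
        final String title;
        final String artist;

        Song(String title, String artist) {
            this.title = title;
            this.artist = artist;
        }

        @Override
        public int compareTo(Song other) {
            return title.compareTo(other.title);
        }
    }

    // Second ordering, supplied from outside the class.
    static final Comparator<Song> BY_ARTIST = Comparator.comparing((Song s) -> s.artist);

    static List<Song> sample() {
        return new ArrayList<>(Arrays.asList(
                new Song("Yesterday", "The Beatles"),
                new Song("Imagine", "John Lennon"),
                new Song("Angie", "The Rolling Stones")));
    }

    static List<String> titles(List<Song> songs) {
        List<String> out = new ArrayList<>();
        for (Song s : songs) {
            out.add(s.title);
        }
        return out;
    }

    static List<String> byTitle() {
        List<Song> songs = sample();
        Collections.sort(songs);            // one-argument sort uses compareTo()
        return titles(songs);
    }

    static List<String> byArtist() {
        List<Song> songs = sample();
        Collections.sort(songs, BY_ARTIST); // two-argument sort uses the Comparator
        return titles(songs);
    }

    public static void main(String[] args) {
        System.out.println(byTitle());  // [Angie, Imagine, Yesterday]
        System.out.println(byArtist()); // [Imagine, Yesterday, Angie]
    }
}
```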

Custom sorting with Comparator#

// Requires java.util.ArrayList, java.util.Collections, and java.util.Comparator.
ArrayList<Integer> arrayList = new ArrayList<Integer>();
arrayList.add(-1);
arrayList.add(3);
arrayList.add(3);
arrayList.add(-5);
arrayList.add(7);
arrayList.add(4);
arrayList.add(-9);
arrayList.add(-7);
System.out.println("Original list:");
System.out.println(arrayList);
// void reverse(List list): reverse the order
Collections.reverse(arrayList);
System.out.println("Collections.reverse(arrayList):");
System.out.println(arrayList);

// void sort(List list): sort ascending by natural order
Collections.sort(arrayList);
System.out.println("Collections.sort(arrayList):");
System.out.println(arrayList);
// custom sort: descending order via a Comparator
Collections.sort(arrayList, new Comparator<Integer>() {
    @Override
    public int compare(Integer o1, Integer o2) {
        return o2.compareTo(o1);
    }
});
System.out.println("After custom sort:");
System.out.println(arrayList);

What do "unordered" and "non-repeatable" mean?#

Unordered does not mean random. It means that data is not stored in the underlying array in index order as it is added; the position is determined by the hash value of the data.
Non-repeatable means that an element can be added only if equals() returns false against every element already present. For this to work, both equals() and hashCode() must be overridden.

Similarities and differences of HashSet, LinkedHashSet, and TreeSet#

HashSet, LinkedHashSet, and TreeSet are all implementations of the Set interface and all guarantee element uniqueness. None of them is thread-safe.
Their main difference is the underlying data structure. HashSet uses a hash table (built on HashMap). LinkedHashSet is built on LinkedHashMap (a hash table plus a linked list) and iterates its elements in insertion order. TreeSet uses a red-black tree (a self-balancing sorted binary tree), so its elements are ordered, either by natural order or by a custom ordering.
Because the data structures differ, so do the use cases. HashSet suits cases where the order of insertion and retrieval does not matter, LinkedHashSet suits cases where insertion order must be preserved, and TreeSet suits cases where elements must follow a custom sorting rule.
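
A small sketch makes the ordering differences visible. The class name, the order helper, and the sample strings are illustrative; HashSet's output is intentionally not annotated because its iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // Adds the same elements (including one duplicate) and reports iteration order.
    static List<String> order(Set<String> set) {
        Collections.addAll(set, "pear", "apple", "fig", "apple");
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(order(new LinkedHashSet<>())); // [pear, apple, fig]
        System.out.println(order(new TreeSet<>()));       // [apple, fig, pear]
        System.out.println(order(new HashSet<>()));       // same 3 elements, order unspecified
    }
}
```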

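Queue#

Differences between Queue and Deque#

Queue is a single-ended queue: elements are inserted at one end and removed from the other. Implementations generally follow the first-in, first-out (FIFO) rule.

Queue extends the Collection interface. Its methods fall into two groups, depending on how they handle an operation that fails because of capacity limits: one group throws an exception, and the other returns a special value.

Queue interface	Throws exception	Returns special value
Insert at tail	add(E e)	offer(E e)
Remove head	remove()	poll()
Examine head	element()	peek()

Deque is a double-ended queue: elements can be inserted and removed at either end.

Deque extends the Queue interface and adds methods for inserting and removing at both the head and the tail. They also fall into two groups by how failure is handled.

Deque interface	Throws exception	Returns special value
Insert at head	addFirst(E e)	offerFirst(E e)
Insert at tail	addLast(E e)	offerLast(E e)
Remove head	removeFirst()	pollFirst()
Remove tail	removeLast()	pollLast()
Examine head	getFirst()	peekFirst()
Examine tail	getLast()	peekLast()

Deque also provides other methods such as push() and pop(), so it can be used to emulate a stack.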
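
As a brief sketch of the stack usage, the helper below reverses a string by pushing and popping characters on an ArrayDeque. The class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeStackDemo {
    // push() and pop() both work on the head of the deque, giving LIFO order.
    static String reverse(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            stack.push(c);
        }
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) {
            sb.append(stack.pop());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc")); // cba
    }
}
```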

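Differences between ArrayDeque and LinkedList#

ArrayDeque and LinkedList both implement the Deque interface and both provide queue functionality. So how do they differ?

ArrayDeque is implemented with a resizable array and two pointers, whereas LinkedList is implemented with a linked list.
ArrayDeque cannot store null, whereas LinkedList can.
ArrayDeque was introduced in JDK 1.6, whereas LinkedList has existed since JDK 1.2.
Inserting into an ArrayDeque may trigger a resize, but the amortized cost of insertion is still O(1). LinkedList never resizes, but it must allocate a new heap object for every insertion, so its average performance tends to be lower.

From a performance point of view, ArrayDeque is a better choice than LinkedList for implementing a queue. ArrayDeque can also be used to implement a stack.

Talk about PriorityQueue#

PriorityQueue was introduced in JDK 1.5. It differs from a plain Queue in that the order in which elements leave the queue depends on their priority: the element with the highest priority always leaves first.

PriorityQueue is implemented as a binary heap, stored in a resizable array.
By sifting heap elements up and down, it inserts an element and removes the heap top in O(log n) time.
PriorityQueue is not thread-safe and does not accept null or non-comparable objects.
PriorityQueue is a min-heap by default, but a Comparator passed to the constructor can customize the priority order.

PriorityQueue comes up often in interview-style algorithm problems. Typical examples are heap sort, finding the K-th largest number, and traversing weighted graphs, so it is worth mastering.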
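
One common pattern, sketched here under the assumption that "top K" means the K largest values, keeps a min-heap of at most K elements. TopKDemo and topK are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopKDemo {
    // Keeps a min-heap of size k: the heap top is always the smallest of the
    // k largest values seen so far, so anything smaller is evicted.
    static List<Integer> topK(int[] nums, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int n : nums) {
            heap.offer(n);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topK(new int[] {5, 1, 9, 3, 7}, 3)); // [9, 7, 5]

        // Passing a Comparator turns the default min-heap into a max-heap.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        maxHeap.offer(5);
        maxHeap.offer(9);
        maxHeap.offer(1);
        System.out.println(maxHeap.peek()); // 9
    }
}
```

Each offer and poll costs O(log k), so the whole scan is O(n log k) rather than the O(n log n) of a full sort.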

What is BlockingQueue?#

BlockingQueue is an interface that extends Queue. It is called blocking because it supports waiting until an element becomes available when the queue is empty, and waiting until space becomes available when the queue is full.

public interface BlockingQueue<E> extends Queue<E> {
  // ...
}

BlockingQueue is commonly used in the producer-consumer model: producer threads add data to the queue, and consumer threads take data from the queue and process it.
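
A minimal producer-consumer sketch follows. The capacity of 2, the use of -1 as an end-of-stream marker, and all names are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumerDemo {
    // Produces 1..items and returns the sum computed by the consumer thread.
    static int run(int items) {
        // A small capacity forces the producer to block while the queue is full.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        int[] sum = new int[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= items; i++) {
                    queue.put(i);  // blocks while the queue is full
                }
                queue.put(-1);     // end-of-stream marker (assumption of this sketch)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int v; (v = queue.take()) != -1; ) { // blocks while the queue is empty
                    sum[0] += v;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // 55
    }
}
```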

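What are the implementations of BlockingQueue?#

The commonly used blocking queue implementations in Java are:

ArrayBlockingQueue: a bounded blocking queue backed by an array. The capacity must be specified at creation, and both fair and non-fair lock access are supported.
LinkedBlockingQueue: an optionally bounded blocking queue backed by a singly linked list. The capacity may be specified at creation; if it is not, it defaults to Integer.MAX_VALUE. Unlike ArrayBlockingQueue, it supports only non-fair lock access.
PriorityBlockingQueue: an unbounded blocking queue with priority ordering. Elements must implement Comparable, or a Comparator must be passed to the constructor. Null elements cannot be inserted.
SynchronousQueue: a synchronous queue that stores no elements. Each insert waits for a matching remove, and each remove waits for a matching insert, so it is typically used to hand data directly from one thread to another.
DelayQueue: a delay queue. An element can be taken from the queue only after its specified delay has expired.

What is the difference between ArrayBlockingQueue and LinkedBlockingQueue?#

ArrayBlockingQueue and LinkedBlockingQueue are the two blocking queue implementations most often used from the Java concurrency package, and both are thread-safe. They differ as follows:

Underlying implementation: ArrayBlockingQueue is backed by an array, LinkedBlockingQueue by a linked list.
Boundedness: ArrayBlockingQueue is bounded and its capacity must be specified at creation. LinkedBlockingQueue does not require a capacity; the default is Integer.MAX_VALUE, which is effectively unbounded. A capacity can still be specified to make it bounded.
Lock separation: ArrayBlockingQueue does not separate its locks; producing and consuming share one lock. LinkedBlockingQueue separates them: producing uses putLock and consuming uses takeLock, which avoids lock contention between producer and consumer threads.
Memory usage: ArrayBlockingQueue allocates its array up front, whereas LinkedBlockingQueue allocates nodes dynamically. ArrayBlockingQueue therefore occupies a fixed amount of memory from creation, often more than is actually needed, while LinkedBlockingQueue's memory use grows gradually with the number of elements.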

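Map (important)#

Differences between HashMap and Hashtable#

Thread safety: HashMap is not thread-safe; Hashtable is, because most of its methods are marked synchronized. (If you need thread safety, use ConcurrentHashMap.)
Efficiency: because of the synchronization overhead, HashMap is more efficient than Hashtable. Hashtable is essentially obsolete and should not be used in new code.
Null keys and values: HashMap accepts null keys and null values, but only one null key is allowed, while any number of entries may have a null value. Hashtable accepts neither; attempting to store one throws NullPointerException.
Initial capacity and growth:
- With no initial capacity specified, Hashtable starts at 11 and grows to 2n+1 on each expansion. HashMap starts at 16 and doubles on each expansion.
- With an initial capacity specified, Hashtable uses that size directly, whereas HashMap rounds it up to a power of 2.
Underlying data structure: since JDK 1.8, HashMap converts a bucket's linked list to a red-black tree once the list grows longer than the threshold (8 by default) in order to shorten lookups. If the array length is below 64 at that point, the array is resized instead of converting the list. Hashtable has no such mechanism.

Differences between HashMap and HashSet#

If you have read the HashSet source, you know that HashSet is built on HashMap. (The HashSet source is very short: apart from clone(), writeObject(), and readObject(), nearly every method simply calls a HashMap method.)

HashMap	HashSet
Implements the Map interface	Implements the Set interface
Stores key-value pairs	Stores only objects
Calls put() to add an entry	Calls add() to add an element to the set
Computes the hashcode from the key	Computes the hashcode from the member object; two objects may share a hashcode, so equals() decides whether they are equal

Differences between HashMap and TreeMap#

Compared with HashMap, TreeMap adds automatic ordering by key.

TreeMap and HashMap both extend AbstractMap, but TreeMap additionally implements the NavigableMap and SortedMap interfaces.

Implementing NavigableMap gives TreeMap the ability to search for and navigate between elements of the map.
Implementing SortedMap gives TreeMap the ability to order its elements by key. The default is ascending key order, but a comparator can be supplied.

In short, compared with HashMap, TreeMap adds the ability to sort elements by key and to search within the map.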
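
The navigation and ordering abilities can be sketched with a few standard TreeMap calls. The class name and sample entries are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TreeMapNavDemo {
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        return map;
    }

    // Keys come back sorted regardless of insertion order.
    static List<Integer> keys() {
        return new ArrayList<>(sample().keySet());
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        System.out.println(keys());                         // [10, 20, 30]
        System.out.println(map.firstKey());                 // 10
        System.out.println(map.floorKey(25));               // 20 (greatest key <= 25)
        System.out.println(map.ceilingKey(25));             // 30 (smallest key >= 25)
        System.out.println(map.headMap(20));                // {10=a} (keys strictly below 20)
        System.out.println(map.descendingMap().keySet());   // [30, 20, 10]
    }
}
```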

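How does HashSet check for duplicates?#

When an object is added to a HashSet, the set first computes the object's hashCode to decide where to store it, and compares it with the hashCode values of the objects already present. If no existing object has the same hashCode, the set concludes that the object is not a duplicate. If an object with the same hashCode is found, equals() is called to check whether the two objects are really the same. If they are, the add does not succeed.

In JDK 1.8, HashSet's add() simply calls HashMap's put() and uses the return value to tell whether an equal element was already present. The HashSet source:

// Returns: true if this set did not already contain the specified element
public boolean add(E e) {
        return map.put(e, PRESENT)==null;
}

In other words, in JDK 1.8 HashSet calls put() unconditionally, whether or not an equal element is already in the set. The return value of add() only tells us whether an equal element existed before the call.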
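
The return value is easy to observe directly. This sketch uses made-up class and method names.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetAddDemo {
    // Adds the same value twice and reports both return values.
    static boolean[] addTwice(String value) {
        Set<String> set = new HashSet<>();
        boolean first = set.add(value);   // no equal element yet
        boolean second = set.add(value);  // an equal element already exists
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] results = addTwice("java");
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
    }
}
```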

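Underlying implementation of HashMap#

Before JDK 1.8#

Before JDK 1.8, HashMap was backed by an array combined with linked lists, that is, chained hashing. HashMap passes the key's hashCode through a perturbation function to obtain the hash, then uses (n - 1) & hash to determine where the element is stored (n is the array length). If an element already exists at that position, its hash and key are compared with those of the element being inserted: if they are equal the value is overwritten, otherwise the collision is resolved by chaining.

The perturbation function is HashMap's hash method. It exists to reduce hash collisions.

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

JDK 1.8 HashMap#

The hash method in JDK 1.8 is simpler than the one in JDK 1.7, but the principle is unchanged.

static final int hash(Object key) {
      int h;
      // key.hashCode(): returns the hash code
      // ^: bitwise exclusive OR
      // >>>: unsigned right shift, ignores the sign bit and fills the vacated bits with 0
      return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
  }

The JDK 1.8 hash method is slightly faster than the 1.7 one, which applies four shifts, but collisions are handled on the same principle.

Separate chaining means combining linked lists with an array: each slot of the array holds a linked list, and when a collision occurs the colliding entry is added to that list.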

After JDK 1.8#

Compared with earlier versions, JDK 1.8 changed hash collision handling significantly: when a linked list grows longer than the threshold (8 by default), the list is converted to a red-black tree to shorten lookups. If the array length is below 64 at that point, the array is resized instead of converting the list.

TreeMap, TreeSet, and HashMap since JDK 1.8 all use red-black trees underneath. A red-black tree addresses a weakness of the plain binary search tree, which can degenerate into a linear structure in some cases.

Let us walk through the source to see how HashMap converts a linked list into a red-black tree.

The putVal method contains the check that triggers the conversion. When the list length exceeds 8, the treeifyBin logic (conversion to a red-black tree) runs.

for (int binCount = 0; ; ++binCount) {
    // reached the last node of the list
    if ((e = p.next) == null) {
        p.next = newNode(hash, key, value, null);
        // the number of nodes has reached TREEIFY_THRESHOLD (8)
        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
            // try to convert to a red-black tree (treeifyBin may resize instead)
            treeifyBin(tab, hash);
        break;
    }
    if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        break;
    p = e;
}

The treeifyBin method decides whether the list is really converted to a red-black tree.

final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // if the current array length is below 64, resize the array instead
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            hd.treeify(tab);
    }
}

Before a list is converted to a red-black tree, the array length is checked; if it is below 64, the array is resized and no conversion takes place.

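Why is the length of a HashMap a power of 2?#

The goal of HashMap is efficient lookup. Hash values range from -2147483648 to 2147483647, roughly 4 billion values, and an array that large cannot be allocated, so the hash cannot be used directly as an array index. It must first be mapped into the range of the array, which HashMap does with (n - 1) & hash, where n is the array length.

The natural way to map a hash into a range is the remainder (%) operation. When the length is a power of 2, hash % length gives the same result as hash & (length - 1), and the bitwise AND is faster than the remainder. That is why the length of a HashMap is always a power of 2.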
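
The equivalence can be checked with a short sketch. The helper names and the sample hash values are illustrative; Math.floorMod is used for the comparison because the % operator in Java returns negative results for negative hashes.

```java
class IndexDemo {
    // The index computation described above.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    // True if the bit mask agrees with the mathematical modulo for every sample hash.
    static boolean matchesModulo(int n) {
        int[] samples = {-17, -1, 0, 1, 2, 17, 12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int hash : samples) {
            if (indexFor(hash, n) != Math.floorMod(hash, n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesModulo(16)); // true: 16 is a power of 2
        System.out.println(matchesModulo(10)); // false: the mask 9 (binary 1001) skips bits
        System.out.println(indexFor(17, 16));  // 1
    }
}
```

With a length of 10 the mask can only ever produce the indexes 0, 1, 8, and 9, so most buckets would stay empty.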

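Infinite loop caused by multi-threaded HashMap operations#

In JDK 1.7 and earlier, resizing a HashMap in a multi-threaded environment could produce an infinite loop. When a bucket holding several elements had to be resized and several threads manipulated its linked list at the same time, head insertion could leave nodes pointing at the wrong place and form a circular list, so that a later lookup would loop forever.

To solve this, JDK 1.8 replaced head insertion with tail insertion, which avoids reversing the list and always places an inserted node at the end. Even so, using HashMap concurrently is not recommended, because problems such as overwritten data can still occur. Use ConcurrentHashMap in concurrent environments.

Why is HashMap not thread-safe?#

In JDK 1.7 and earlier, resizing a HashMap in a multi-threaded environment could cause infinite loops and data loss.

Since JDK 1.8 data loss is still possible. Several key-value pairs can be assigned to the same bucket and stored as a linked list or a red-black tree, and put operations from multiple threads on that bucket can overwrite each other's data.

For example:

Threads 1 and 2 perform put at the same time and hit a hash collision.
Because of time-slice switching, thread 1 is suspended right after the collision check, and thread 2 completes its insert.
Thread 1 then resumes. It has already passed the collision check, so it inserts directly and overwrites the data inserted by thread 2.

Another case is an incorrect size. The ++size update is not atomic, so when several threads put at the same time, two inserts may raise size by only 1 and the map reports fewer entries than it holds.

In most cases, use ConcurrentHashMap in concurrent environments.

Can ConcurrentHashMap guarantee atomicity of composite operations?#

ConcurrentHashMap is thread-safe: it is designed to keep data consistent when multiple threads read and write at the same time. That does not mean every composite operation is atomic. Do not confuse the two.

A composite operation is one made up of several basic operations such as put, get, remove, and containsKey. For example, first checking containsKey(key) and then calling put(key, value) depending on the result. Such a sequence can be interleaved with other threads partway through and may not behave as expected.

To make composite operations atomic, use the atomic methods that ConcurrentHashMap provides, such as putIfAbsent, compute, computeIfAbsent, computeIfPresent, and merge. Most of them accept a function that computes the new value, and the map applies the check and the update as a single atomic step.

Locking can also be used for synchronization in these cases, but it goes against the design of ConcurrentHashMap and is not recommended. Wherever possible, use the atomic composite methods instead.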

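Collections Utility Class (not essential)#

Common methods of the Collections utility class:

Sorting
Searching and replacing operations
Synchronization control (not recommended; when a thread-safe collection is needed, consider the concurrent collections in the JUC package)

Sorting operations#

void reverse(List list) // reverse
void shuffle(List list) // shuffle into random order
void sort(List list) // sort ascending by natural order
void sort(List list, Comparator c) // custom sort; the ordering logic is determined by the Comparator
void swap(List list, int i, int j) // swap the elements at two indices
void rotate(List list, int distance) // rotate. If distance is positive, move the last distance elements to the front; if negative, move the first -distance elements to the back

Searching and replacing operations#

int binarySearch(List list, Object key) // binary search on a List; the list must already be sorted
Object max(Collection coll) // return the maximum element by natural order
Object max(Collection coll, Comparator c) // return the maximum element by a custom order
void fill(List list, Object obj) // replace all elements in the list with the specified element
int frequency(Collection c, Object o) // count occurrences of an element
int indexOfSubList(List list, List target) // return the first index at which target occurs in list; -1 if not found
boolean replaceAll(List list, Object oldVal, Object newVal) // replace the old element with the new one

Synchronization control#

Collections provides several synchronizedXxx() methods that wrap a given collection in a thread-synchronized collection, which addresses thread-safety problems when multiple threads access it concurrently.

We know that HashSet, TreeSet, ArrayList, LinkedList, HashMap, and TreeMap are all not thread-safe. Collections provides several static methods that wrap them in thread-synchronized collections.

The methods below are very inefficient and not recommended. When a thread-safe collection is needed, consider the concurrent collections in the JUC package.

synchronizedCollection(Collection<T> c) // wraps the given collection in a thread-safe Collection
synchronizedList(List<T> list) // wraps the given List in a thread-safe List
synchronizedMap(Map<K,V> m) // wraps the given Map in a thread-safe Map
synchronizedSet(Set<T> s) // wraps the given Set in a thread-safe Set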

Java Collections Knowledge

https://dreaife.tokyo/posts/java-collections-overview/

Author

dreaife

Published

2024-01-26

License

CC BY-NC-SA 4.0
