HashMap源码阅读

发表于 2025-09-11 更新于 2025-09-15 分类于源码阅读

成员变量

HashMap中基本的成员变量:

/* ---------------- Fields -------------- */

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
 *
 * 哈希表，首次使用时初始化，并在必要时扩容。分配时长度总是2的幂。
 * （某些操作允许长度为0，以支持目前不需要的引导机制。）
 */
transient Node<K,V>[] table;

/**
 * Holds cached entrySet(). Note that AbstractMap fields are used
 * for keySet() and values().
 *
 * 缓存entrySet()，注意AbstractMap的字段用于keySet()和values()。
 */
transient Set<Map.Entry<K,V>> entrySet;

/**
 * The number of key-value mappings contained in this map.
 *
 * 当前Map中键值对的数量。
 */
transient int size;

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
 *
 * HashMap结构性修改的次数。结构性修改指改变映射数量或内部结构（如rehash）。
 * 该字段用于让Collection视图的迭代器快速失败（fail-fast）。
 * （参见ConcurrentModificationException）
 */
transient int modCount;

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * 每次需要扩容的阈值（容量*负载因子）。
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
// （Javadoc描述在序列化时有效。如果table数组未分配，该字段保存初始容量，0表示DEFAULT_INITIAL_CAPACITY。）
int threshold;

/**
 * The load factor for the hash table.
 *
 * 哈希表的负载因子。
 *
 * @serial
 */
final float loadFactor;

构造器

无参构造器

这个也是我们最常用的一个构造器，仅仅指定了负载因子，其他的字段都使用默认值。

/**
 * Constructs an empty {@code HashMap} with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

而负载因子是0.75，这个值在时间和空间的权衡之下的一个不错的选择。


/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

公共方法

put

put自身的功能很简单，就是插入或更新一个键值对。

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * 将指定的值与此映射中的指定键关联。
 * 如果该映射先前包含该键的映射，则旧值将被替换。
 *
 * @param key key with which the specified value is to be associated
 *            要关联指定值的键
 * @param value value to be associated with the specified key
 *              要与指定键关联的值
 * @return the previous value associated with {@code key}, or
 *         {@code null} if there was no mapping for {@code key}.
 *         (A {@code null} return can also indicate that the map
 *         previously associated {@code null} with {@code key}.)
 *         返回与key关联的前一个值，如果没有映射则返回null。
 *         （返回null也可能表示该映射之前将null与key关联。）
 */
public V put(K key, V value) {
    // 调用putVal方法，插入或更新键值对
    return putVal(hash(key), key, value, false, true);
}

代码很简单，通过hash(key)方法计算哈希值，然后调用putVal方法插入或更新键值对。

hash

而hash函数是怎么实现的呢?

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 *
 * 计算key的hashCode，并将高位的hash通过异或运算混合到低位。
 * 因为哈希表的长度总是2的幂，只有低位参与寻址，如果高位不同而低位相同会导致哈希冲突。
 * 通过将高位与低位异或，可以让高位的信息参与寻址，减少哈希冲突。
 */
static final int hash(Object key) {
    int h;
    // 如果key为null，返回0，否则返回hashCode与其高16位右移后异或的结果
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

在hashmap不是null键的情况下，调用键对象的hashCode()方法获取哈希值，然后将该哈希值与其高16位右移后的值进行异或运算，得到最终的哈希值。
这样做的目的是为了进行一个hash扰动的过程，将高位的信息也混合到低位中，从而减少哈希冲突。

为什么这么做？主要是考虑了hash寻址的一个实现方式:

HashMap 通过如下方式计算数组的索引：

1	int index = (table.length - 1) & hash;

这里 & 是按位与运算，参加运算的两个数，按二进制位进行“与”运算。
运算规则：只有两个数的二进制同时为1，结果才为1，否则为0。（负数按补码形式参加按位与运算）
翻译一下来说，就是在当 table.length 是 2 的幂时，这个算法等价于取模运算，但效率更高，因为位运算比除法快。
但这也会带来一个问题在通常使用场景下，table数量有限的情况下，table.length-1 的二进制表示为低位全是 1。
例如，table.length=16（二进制 10000），n-1=15（二进制 01111）。在table.length=16 时，只有 hash 的最低 4 位参与索引计算。
这样，按位与实现(table.length - 1) & hash 实际上就是取 hash 的低 log₂(table.length) 位作为哈希表的索引。也就是只保留
hash 的低位部分作为索引，高位被舍弃。因此，只有哈希值的低位部分决定元素放在哈希表的哪个位置。

这样的话，即使 hashCode 的高位分布再好，但低位分布不好，仍然会导致哈希冲突。
所以 HashMap 还会对 hashCode 做扰动（让高16位与低16位异或），让高位的信息也混合到低位，提升分布均匀性。

具体的实现细节在putVal方法中,我们看看putVal的方法:

/**
 * Implements Map.put and related methods.
 * 实现了Map.put及相关方法。
 *
 * @param hash hash for key
 *             键的哈希值
 * @param key the key
 *            键
 * @param value the value to put
 *              要放入的值
 * @param onlyIfAbsent if true, don't change existing value
 *                     如果为true，不覆盖已有值
 * @param evict if false, the table is in creation mode.
 *              如果为false，表示表处于创建模式
 * @return previous value, or null if none
 *         返回旧值，如果没有则返回null
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 如果table为空或长度为0，则进行初始化或扩容
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 计算哈希表下标，如果该位置为空，直接插入新节点
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

尝试逐行解析一下，一开始的四个环境变量，通读整段代码，可以知道分别代表了

tab - 存放hashmap所有元素的哈希表
n - 哈希表的长度

resize

因此，第一次添加元素肯定会走resize ,

整个代码很长，精简一下:

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * 初始化或扩容哈希表（容量加倍）。如果table为null，则根据threshold字段分配初始容量。
 * 否则，由于采用2的幂次扩容，原哈希表中的元素要么保持原索引，要么以2的幂为偏移移动到新表中。
 *
 * @return the table
 *         返回哈希表数组
 */
final Node<K,V>[] resize() {
    // 记录原来的哈希表、哈希表容量与阈值
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // 初始化新的容量与阈值
    int newCap, newThr = 0;
    // 从这里开始就是关于容量的核心计算方式
    // 情况一：容量大于0，哈希表数组已经初始化，进行正常扩容
    if (oldCap > 0) {
        // 情况一：容量大于0，哈希表数组已经初始化，进行正常扩容
        if (oldCap >= MAXIMUM_CAPACITY) {
            // 如果已达最大容量2^30，阈值设为最大，直接返回，不进行容量调整
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // 情况二：容量在默认值16和最大值2^30之间，正常扩容，让容量翻倍
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold 
    }
    else if (oldThr > 0)   // 情况二：数组未初始化，但构造时指定了初始容量，这种情况下，初始容量值被保存在 threshold 变量里
        newCap = oldThr;  // initial capacity was placed in threshold
    else {               // 情况三：数组未初始化，且构造时未指定容量（调用 new HashMap()）
        newCap = DEFAULT_INITIAL_CAPACITY; // 默认初始容量
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); // 默认阈值
    }
    // 处理情况二中的 newThr 未被赋值的问题
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE); // 计算新阈值
    }
    threshold = newThr; // 更新阈值
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; // 新建扩容后的哈希表
    table = newTab; // 更新存储哈希表
    // 原数据不为空，进行数据迁移逻辑
    if (oldTab != null) {
        // 遍历旧的哈希表数组的每一个桶
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                // 置空旧哈希表的该桶，帮助GC
                oldTab[j] = null;
                if (e.next == null)
                    // 情况A: 桶中只有一个节点,直接迁移
                    newTab[e.hash & (newCap - 1)] = e; 
                else if (e instanceof TreeNode)
                    // 情况B: 桶中是红黑树节点，调用红黑树的split方法进行迁移
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); 
                else { 
                // 情况 C: 桶中是链表（最核心的优化点）
                // preserve order 保持链表顺序
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

有个疑问就是int oldThr = threshold;中threshold一开始就是4了

这部分代码比较复杂，整体分为两部分，扩容和数据迁移，
扩容主要处理了三种情况：

情况一（oldCap > 0）：这是最常见的扩容场景，数组已经存在，直接将容量（newCap）和阈值（newThr）都扩大为原来的 2 倍
情况二（oldCap == 0 && oldThr > 0）：数组还未初始化，但构造函数中指定了初始容量，这时直接使用该初始容量作为新的容量
情况三（oldCap == 0 && oldThr == 0）：数组未初始化，且构造函数中也未指定初始容量，这时使用默认的初始容量（16）和默认的负载因子（0.75）计算
阈值（newThr）。
数据迁移部分主要处理了三种情况：
情况A（桶中只有一个节点）：直接将该节点迁移，因为无链表，也能直接确定元素存放位置，
之所以是e.hash & (newCap - 1)，也是一个优化点，避免了取模运算，通过位运算提高效率，并获得了与e.hash % newCap 相同的效果。
具体来说，假设旧容量是16（2^4），新容量是32（2^5），那么对于一个节点e：

旧索引计算：旧容量 - 1 = 15（二进制 01111），索引 = e.hash & 15
新索引计算：新容量 - 1 = 31（二进制 11111），索引 = e.hash & 31
由于新容量是旧容量的两倍，索引计算时多了一位，因此节点要么保持原索引不变（如果该位是0），
要么索引增加旧容量（如果该位是1）。这就是为什么在链表迁移时会用到 e.hash & oldCap 来判断节点的新位置。

情况B（桶中是红黑树节点）：调用红黑树的 split 方法进行迁移
情况C（桶中是链表）：由于容量是 2 的幂次方，扩容后，一个桶中的元素只可能去往两个位置：要么留在原索引位置，要么移动到
原索引 + 老容量的位置，通过位运算将链表节点分为两部分，一部分索引不变，另一部分索引加上旧容量（oldCap），然后分别插入到新表中，保持链表顺序。

核心优化点在于通过(e.hash & oldCap) == 0 决定元素去向。
oldCap 是旧数组的容量，它一定是 2 的幂（例如 16，二进制为 10000），e.hash & oldCap 这个位运算，实际上是在检查 e.hash 在 oldCap
的最高位上是 0 还是 1
如果结果为 0，说明该元素的哈希值在该位上是 0扩容后，它的新索引与旧索引完全相同这些元素被串在一起，形成低位链表（lo-list）
如果结果不为 0，说明该元素的哈希值在该位上是 1扩容后，它的新索引就是旧索引 + oldCap这些元素被串在一起，形成高位链表（hi-list）
通过一次遍历，源码就巧妙地将一个长链表拆分成了两个子链表（loHead 和 hiHead），然后将这两个子链表分别放置到新数组的对应位置，极大地提高了扩容效率。

putVal

回到putVal方法中，这样前面就很清晰了。哈希表不存在，resize方法初始化哈希表，resize完成后，n就有值了，然后计算索引i，如果该位置为空，直接插入新节点。

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // 已经有值，存在哈希冲突
        // 下面分析
        ...
    }

而该位置不为空的话，那就是发生了哈希冲突，需要处理冲突:

Node<K,V> e; K k;
if (p.hash == hash &&
    ((k = p.key) == key || (key != null && key.equals(k))))
    e = p;
else if (p instanceof TreeNode)
    e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
    for (int binCount = 0; ; ++binCount) {
        if ((e = p.next) == null) {
            p.next = newNode(hash, key, value, null);
            if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                treeifyBin(tab, hash);
            break;
        }
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            break;
        p = e;
    }
}
if (e != null) { // existing mapping for key
    V oldValue = e.value;
    if (!onlyIfAbsent || oldValue == null)
        e.value = value;
    afterNodeAccess(e);
    return oldValue;
}

如果该位置不为空，说明发生了哈希冲突，需要处理冲突:

如果该位置的节点hash值与新节点相同，并且key也相同，说明是更新操作，直接更新值
如果该位置的节点是红黑树节点，调用红黑树的插入方法
否则，说明是链表节点，遍历链表：
- 如果找到相同hash和key的节点，说明是更新操作，直接更新值
- 如果遍历到链表末尾，插入新节点
- 如果链表长度超过阈值，调用treeifyBin方法，将链表转换为红黑树
  处理完hash冲突，剩下的就是几行代码了:

++modCount;
if (++size > threshold)
    resize();
afterNodeInsertion(evict);
return null;

最后，更新modCount和size，如果size超过阈值，调用resize进行扩容,以及提供了一个afterNodeInsertion的回调。用来给子类LinkedHashMap使用的
afterNodeInsertion
，方便LinkedHashMap中被覆盖的afterNodeInsertion方法，用来回调移除最早放入Map的对象。可以看到源码中留有对应的钩子:

// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }

而如何判断链表是否需要树化成红黑树呢，是通过binCount >= TREEIFY_THRESHOLD - 1来确定的。
简单来说，如果当前桶（bin）中的节点数量已经达到或超过 TREEIFY_THRESHOLD - 1，则在插入新节点后，桶中的节点数将达到或超过
TREEIFY_THRESHOLD，此时需要将链表结构转换为红黑树结构（treeify）。

1 是因为计数是从0开始的，插入新节点后才会达到阈值。例如，TREEIFY_THRESHOLD 默认是8，当已有7个节点（binCount ==
7）时，再插入一个节点就达到8，需要树化。

treeify

关于树化，是通过以下代码实现的:

final void treeify(Node<K,V>[] tab) {
    TreeNode<K,V> root = null;
    for (TreeNode<K,V> x = this, next; x != null; x = next) {
        next = (TreeNode<K,V>)x.next;
        x.left = x.right = null;
        if (root == null) {
            x.parent = null;
            x.red = false;
            root = x;
        }
        else {
            K k = x.key;
            int h = x.hash;
            Class<?> kc = null;
            for (TreeNode<K,V> p = root;;) {
                int dir, ph;
                K pk = p.key;
                if ((ph = p.hash) > h)
                    dir = -1;
                else if (ph < h)
                    dir = 1;
                else if ((kc == null &&
                          (kc = comparableClassFor(k)) == null) ||
                         (dir = compareComparables(kc, k, pk)) == 0)
                    dir = tieBreakOrder(k, pk);

                TreeNode<K,V> xp = p;
                if ((p = (dir <= 0) ? p.left : p.right) == null) {
                    x.parent = xp;
                    if (dir <= 0)
                        xp.left = x;
                    else
                        xp.right = x;
                    root = balanceInsertion(root, x);
                    break;
                }
            }
        }
    }
    moveRootToFront(tab, root);
}

具体流程如下：

遍历以当前节点为头的链表，将每个节点插入到红黑树中。
第一个节点直接作为根节点，标记为黑色。
后续节点根据 hash 值、可比较性和类名等规则插入到红黑树的合适位置，并保持红黑树的平衡。
最后将红黑树的根节点放到哈希表的桶头。

get

同样的，get本身的逻辑很简单，是委托给内部的一个getNode方法实现的。

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 *
 * 返回指定key对应的value，如果没有该key的映射则返回null。
 * 注意，返回null不一定表示没有该key的映射，也可能是该key被显式映射为null。
 * 可以通过containsKey方法区分这两种情况。
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(key)) == null ? null : e.value;
}

getNode

getNode的实现如下,很简单直接在关键源码处添加注释了:

/**
 * Implements Map.get and related methods.
 *
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
    // 哈希表不为空且长度大于0且计算哈希表下标后获取该位置的第一个节点也不为空
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & (hash = hash(key))]) != null) {
        // 检查第一个节点是否匹配
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // 遍历链表或红黑树
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                // 如果是红黑树节点，调用红黑树的查找方法
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // 否则遍历链表
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}