How does a HashMap work in Java

 

Most Java developers use Maps, and especially HashMaps. A HashMap is a simple yet powerful way to store and retrieve data. But how many developers know how a HashMap works internally? A few days ago, I read a large part of the source code of java.util.HashMap (in Java 7, then Java 8) in order to get a deep understanding of this fundamental data structure. In this post, I’ll explain the implementation of java.util.HashMap, present what’s new in the Java 8 implementation, and talk about performance, memory and known issues when using HashMaps.

 

Internal storage

The Java HashMap class implements the interface Map<K,V>. The main methods of this interface are:

  • V put(K key, V value)
  • V get(Object key)
  • V remove(Object key)
  • boolean containsKey(Object key)

HashMaps use an inner class to store data: the Entry<K,V>. This entry is a simple key-value pair with two extra pieces of data:

  • a reference to another Entry so that a HashMap can store entries like singly linked lists
  • the precomputed hash value of the key. Storing it avoids recomputing the hash every time the HashMap needs it.

Here is a part of the Entry implementation in Java 7:

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
…
}

A HashMap stores data in multiple singly linked lists of entries (also called buckets or bins). All the lists are registered in an array of Entry (Entry<K,V>[] array), and the default capacity of this inner array is 16.

 

[Figure: internal_storage_java_hashmap] The following picture shows the inner storage of a HashMap instance: an array of nullable entries. Each Entry can link to another Entry to form a linked list.

 

All the keys with the same hash value are put in the same linked list (bucket). Keys with different hash values can end up in the same bucket.

When a user calls put(K key, V value) or get(Object key), the function computes the index of the bucket in which the Entry should be. Then, it iterates through the list to look for the Entry that has the same key (using the key’s equals() function).

In the case of get(), the function returns the value associated with the entry (if the entry exists).

In the case of put(K key, V value), if an entry already exists for the key, the function replaces its value; otherwise it creates a new entry (from the key and value arguments) at the head of the singly linked list.
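
For illustration, here is a slightly simplified version of the lookup loop from the Java 7 source (the hash() and indexFor() functions used here are shown a little further below):

// simplified from java.util.HashMap.getEntry() in Java 7
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        // the stored hash is compared first because it's much cheaper than equals()
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}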

 

The index of the bucket (linked list) is generated in 3 steps by the map:

  • It first gets the hashcode of the key.
  • It rehashes the hashcode to protect against a bad hash function of the key that would put all the data in the same index (bucket) of the inner array.
  • It takes the rehashed hashcode and bit-masks it with the length (minus 1) of the array. This operation ensures that the index stays within the bounds of the array. You can see it as a very computationally optimized modulo function.

Here is the Java 7 and Java 8 source code that deals with the index:

// the "rehash" function in JAVA 7 that takes the hashcode of the key
static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
// the "rehash" function in JAVA 8 that directly takes the key
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// the function that returns the index from the rehashed hash
static int indexFor(int h, int length) {
    return h & (length-1);
}

In order to work efficiently, the size of the inner array needs to be a power of 2. Let’s see why.

Imagine the array size is 17; the mask value is going to be 16 (size - 1). The binary representation of 16 is 0…010000, so for any hash value H, the index generated with the bitwise formula “H AND 16” is going to be either 16 or 0. This means that the array of size 17 would only use 2 buckets: the one at index 0 and the one at index 16. Not very efficient…

But if you now take a size that is a power of 2, like 16, the bitwise index formula is “H AND 15”. The binary representation of 15 is 0…001111, so the index formula can output values from 0 to 15 and the array of size 16 is fully used. For example:

  • if H = 952, its binary representation is 0..01110111000, the associated index is 0…01000 = 8
  • if H = 1576, its binary representation is 0..011000101000, the associated index is 0…01000 = 8
  • if H = 12356146, its binary representation is 0..0101111001000101000110010, the associated index is 0…00010 = 2
  • if H = 59843, its binary representation is 0..01110100111000011, the associated index is 0…00011 = 3
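
You can verify these values with a few lines of Java:

// quick check of the "H AND 15" formula with the hash values above
int[] hashes = {952, 1576, 12356146, 59843};
for (int h : hashes) {
    System.out.println(h + " -> " + (h & 15)); // prints 8, 8, 2 and 3
}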

 

This is why the array size is a power of two. This mechanism is transparent for the developer: if they choose a HashMap with a capacity of 37, the Map will automatically pick the next power of 2 after 37 (64) for the size of its inner array.
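
In Java 8, this rounding is done by the tableSizeFor() function. Here is a sketch of it (simplified: the real function also caps the result at MAXIMUM_CAPACITY):

// close to java.util.HashMap.tableSizeFor() in Java 8
static int tableSizeFor(int cap) {
    int n = cap - 1; // so that an exact power of two is returned unchanged
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16; // n now has all bits set below its highest one-bit
    return n + 1;  // tableSizeFor(37) == 64, tableSizeFor(16) == 16
}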

 

Auto resizing

After getting the index, the function (get, put or remove) iterates through the associated linked list to see if there is an existing Entry for the given key. Without modification, this mechanism could lead to performance issues, because the function needs to iterate through the entire list to see if the entry exists. Imagine that the size of the inner array is the default value (16) and you need to store 2 million values. In the best case scenario, each linked list will have a size of 125 000 entries (2 million / 16). So, each get(), remove() and put() will lead to 125 000 iterations/operations. To avoid this, the HashMap can increase the size of its inner array in order to keep the linked lists very short.

When you create a HashMap, you can specify an initial capacity and a loadFactor with the following constructor:

public HashMap(int initialCapacity, float loadFactor)

If you don’t specify arguments, the default initialCapacity is 16 and the default loadFactor is 0.75. The initialCapacity represents the size of the inner array of linked lists.

Each time you add a new key/value to your Map with put(…), the function checks whether it needs to increase the capacity of the inner array. In order to do that, the map stores 2 pieces of data:

  • The size of the map: it represents the number of entries in the HashMap. This value is updated each time an Entry is added or removed.
  • A threshold: it’s equal to (capacity of the inner array) * loadFactor, and it is refreshed after each resize of the inner array.

Before adding a new Entry, put(…) checks whether the size has reached the threshold; if it has, it recreates a new array with double the size. Since the length of the new array has changed, the indexing function (which returns the bitwise operation “hash(key) AND (sizeOfArray - 1)”) changes. So, the resizing of the array creates twice as many buckets (i.e. linked lists) and redistributes all the existing entries into them (the old buckets and the newly created ones).
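
Here is, in substance, how this check looks in the Java 7 source (slightly simplified):

// close to java.util.HashMap.addEntry() in Java 7
void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length); // doubles the capacity and redistributes all entries
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length); // the index may change with the new length
    }
    createEntry(hash, key, value, bucketIndex);
}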

The aim of this resize operation is to decrease the size of the linked lists so that the time cost of the put(), remove() and get() methods stays low. All entries whose keys have the same hash will stay in the same bucket after the resizing. But 2 entries with keys of different hashes that were in the same bucket before might not be in the same bucket after the transformation.

[Figure: resizing_of_java_hashmap]

The picture shows a representation before and after the resizing of the inner array. Before the increase, in order to get Entry E, the map had to iterate through a list of 5 elements. After the resizing, the same get() only iterates through a linked list of 2 elements: the get() is 2 times faster after the resizing!

 

Note: the HashMap only increases the size of the inner array; it doesn’t provide a way to decrease it.

 

Thread Safety

If you already know HashMaps, you know that they are not thread safe. But why? For example, imagine that you have a Writer thread that only puts new data into the Map and a Reader thread that reads data from the Map. Why shouldn’t it work?

Because during the auto-resizing mechanism, if a thread tries to put or get an object, the map might use the old index value and won’t find the new bucket in which the entry now is.

The worst case scenario is when 2 threads put data at the same time and the 2 put() calls resize the Map simultaneously. Since both threads modify the linked lists at the same time, the Map might end up with a loop in one of its linked lists. If you then try to get data from a list with such a loop, the get() will never end.

The Hashtable implementation is a thread-safe implementation that prevents this situation. But since all its CRUD methods are synchronized, this implementation is very slow. For example, if thread 1 calls get(key1), thread 2 calls get(key2) and thread 3 calls get(key3), only one thread at a time will be able to get its value, whereas all 3 of them could have accessed the data at the same time.

A smarter thread-safe implementation has existed since Java 5: the ConcurrentHashMap. Only the buckets are synchronized, so multiple threads can get(), remove() or put() data at the same time as long as it doesn’t imply accessing the same bucket or resizing the inner array. It’s better to use this implementation in a multithreaded application.
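
For example, here is a minimal usage sketch (ConcurrentHashMap implements the same Map interface; its merge() method, available since Java 8, is atomic):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
	public static void main(String[] args) {
		Map<String, Integer> counters = new ConcurrentHashMap<>();
		counters.put("requests", 0);
		// atomic read-modify-write: safe even if other threads touch the same key
		counters.merge("requests", 1, Integer::sum);
		System.out.println(counters.get("requests")); // 1
	}
}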

 

Key immutability

Why are Strings and Integers a good implementation of keys for a HashMap? Mostly because they are immutable! If you choose to create your own key class and don’t make it immutable, you might lose data inside the HashMap.

Look at the following use case:

  • You have a key that has an inner value “1”
  • You put an object in the HashMap with this key
  • The HashMap generates a hash from the hashcode of the Key (so from “1”)
  • The Map stores this hash in the newly created Entry
  • You modify the inner value of the key to “2”
  • The hash value of the key is modified but the HashMap doesn’t know it (because the old hash value is stored)
  • You try to get your object with your modified key
  • The map computes the new hash of your key (so from “2”) to find in which linked list (bucket) the entry is
    • Case 1: Since you modified your key, the map looks for the entry in the wrong bucket and doesn’t find it.
    • Case 2: Luckily, the modified key maps to the same bucket as the old key. The map then iterates through the linked list to find the entry with the same key. But to find the key, the map first compares the hash values and then calls the equals() comparison. Since your modified key doesn’t have the same hash as the old hash value (stored in the entry), the map won’t find the entry in the linked list.

Here is a concrete example in Java. I put 2 key-value pairs in my Map, then I modify the first key and try to get the 2 values. Only the second value is returned; the first value is “lost” in the HashMap:

import java.util.HashMap;
import java.util.Map;

public class MutableKeyTest {

	public static void main(String[] args) {

		class MyKey {
			Integer i;

			public void setI(Integer i) {
				this.i = i;
			}

			public MyKey(Integer i) {
				this.i = i;
			}

			@Override
			public int hashCode() {
				return i;
			}

			@Override
			public boolean equals(Object obj) {
				if (obj instanceof MyKey) {
					return i.equals(((MyKey) obj).i);
				} else
					return false;
			}

		}

		Map<MyKey, String> myMap = new HashMap<>();
		MyKey key1 = new MyKey(1);
		MyKey key2 = new MyKey(2);

		myMap.put(key1, "test " + 1);
		myMap.put(key2, "test " + 2);

		// modifying key1
		key1.setI(3);

		String test1 = myMap.get(key1);
		String test2 = myMap.get(key2);

		System.out.println("test1= " + test1 + " test2=" + test2);

	}

}

The output is: “test1= null test2=test 2”. As expected, the Map wasn’t able to retrieve the string “test 1” with the modified key1.
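
To avoid this pitfall, a safe version of MyKey would be immutable: a final class with a final field and no setter. Here is a minimal sketch:

// an immutable key: its hashcode can never change after construction
final class ImmutableKey {
	private final Integer i;

	public ImmutableKey(Integer i) {
		this.i = i;
	}

	@Override
	public int hashCode() {
		return i;
	}

	@Override
	public boolean equals(Object obj) {
		return (obj instanceof ImmutableKey) && i.equals(((ImmutableKey) obj).i);
	}
}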

 

Java 8 improvements

The inner representation of the HashMap has changed a lot in Java 8. Indeed, the implementation in Java 7 takes about 1k lines of code whereas the implementation in Java 8 takes 2k lines. Most of what I’ve said previously remains true except for the linked lists of entries. In Java 8, you still have an array, but it now stores Nodes, which contain the exact same information as Entries and can therefore also form linked lists:

Here is a part of the Node implementation in Java 8:


static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
…
}

So what’s the big difference compared to Java 7? Well, Nodes can be extended to TreeNodes. A TreeNode is a node of a red-black tree, a structure that stores much more information, so that the map can add, delete or get an element in O(log(n)).

FYI, here is the exhaustive list of the data stored inside a TreeNode:

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
	final int hash; // inherited from Node<K,V>
	final K key; // inherited from Node<K,V>
	V value; // inherited from Node<K,V>
	Node<K,V> next; // inherited from Node<K,V>
	Entry<K,V> before, after; // inherited from LinkedHashMap.Entry<K,V>
	TreeNode<K,V> parent;
	TreeNode<K,V> left;
	TreeNode<K,V> right;
	TreeNode<K,V> prev;
	boolean red;
}

Red-black trees are self-balancing binary search trees. Their inner mechanisms ensure that their depth is always in O(log(n)) despite additions or removals of nodes. The main advantage of these trees is that when many entries end up in the same index (bucket) of the inner table, a search in a tree costs O(log(n)) whereas it would cost O(n) with a linked list.

As you can see, the tree takes considerably more space than the linked list (we’ll talk about that in the next part).

By inheritance, the inner table can contain both Nodes (linked lists) and TreeNodes (red-black trees). Oracle decided to use both data structures with the following rules (the corresponding constants are shown just after):
– If for a given index (bucket) in the inner table there are more than 8 nodes, the linked list is transformed into a red-black tree
– If for a given index (bucket) in the inner table there are 6 or fewer nodes, the tree is transformed back into a linked list
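These rules come from three constants in the Java 8 source (note the third one: a bucket is only turned into a tree if the inner table itself holds at least 64 slots; below that, the map resizes instead):

// constants from java.util.HashMap in Java 8
static final int TREEIFY_THRESHOLD = 8;     // list -> tree when a bucket grows past this
static final int UNTREEIFY_THRESHOLD = 6;   // tree -> list when a bucket shrinks to this
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before treeifying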

[Figure: internal_storage_java8_hashmap]

This picture shows an inner array of a Java 8 HashMap containing both trees (at bucket 0) and linked lists (at buckets 1, 2 and 3). Bucket 0 is a tree because it has more than 8 nodes.

 

Memory overhead

Java 7

The use of a HashMap comes at a cost in terms of memory. In Java 7, a HashMap wraps key-value pairs in Entries. An entry has:

  • a reference to a next entry
  • a precomputed hash (integer)
  • a reference to the key
  • a reference to the value

Moreover, a Java 7 HashMap uses an inner array of Entry. Assuming a Java 7 HashMap contains N elements and its inner array has a capacity CAPACITY, the extra memory cost is approximately:

sizeOf(integer) * N + sizeOf(reference) * (3*N + CAPACITY)

Where:

  • the size of an integer is 4 bytes
  • the size of a reference depends on the JVM/OS/processor but is often 4 bytes.

This means that the overhead is often 16 * N + 4 * CAPACITY bytes.

Reminder: after an auto-resizing of the Map, the CAPACITY of the inner array equals the next power of two after N.
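
As a quick sanity check of the formula (assuming 4-byte references and using the approximation above for CAPACITY):

// back-of-the-envelope overhead of a Java 7 HashMap with 1 million entries
long n = 1_000_000L;
long capacity = 1_048_576L;            // 2^20, the next power of two after n
long overhead = 16 * n + 4 * capacity; // ≈ 20 MB, not counting the keys and values themselves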

Note: since Java 7, the HashMap class is lazily initialized. That means that even if you allocate a HashMap, the inner array of entries (which costs 4 * CAPACITY bytes) won’t be allocated in memory until the first call to put().

Java 8

With the Java 8 implementation, it becomes a little bit more complicated to get the memory usage, because a Node can contain the same data as an Entry, or the same data plus 6 more references and a boolean (if it’s a TreeNode).

If all the nodes are plain Nodes, the memory consumption of the Java 8 HashMap is the same as that of the Java 7 HashMap.

If all the nodes are TreeNodes, the memory consumption of a Java 8 HashMap becomes:

N * sizeOf(integer) + N * sizeOf(boolean) + sizeOf(reference) * (9*N + CAPACITY)

On most standard JVMs (counting the boolean as 4 bytes), this is equal to 44 * N + 4 * CAPACITY bytes.

 

Performance issues

Skewed HashMap vs well balanced HashMap

In the best case scenario, the get() and put() methods have an O(1) time complexity. But if you don’t take care of the hash function of the key, you might end up with very slow put() and get() calls. The good performance of put() and get() depends on the distribution of the data among the different indexes of the inner array (the buckets). If the hash function of your key is ill-designed, you’ll have a skewed distribution (no matter how big the capacity of the inner array is). All the put() and get() calls that use the biggest linked lists will be slow because they’ll need to iterate through the entire lists. In the worst case scenario (if most of the data ends up in the same bucket), you could end up with an O(n) time complexity.
Here is a visual example. The first picture shows a skewed HashMap and the second picture a well balanced one.

[Figure: skewed_java_hashmap]

 

In the case of this skewed HashMap, the get()/put() operations on bucket 0 are costly. Getting Entry K will cost 6 iterations.

[Figure: well_balanced_java_hashmap] In the case of this well balanced HashMap, getting Entry K will only cost 3 iterations. Both HashMaps store the same amount of data and have the same inner array size. The only difference is the hash function (of the key) that distributes the entries among the buckets.

Here is an extreme example in Java where I create a hash function that puts all the data in the same bucket, then I add 2 million elements.

import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class Test {

	public static void main(String[] args) {

		class MyKey {
			Integer i;
			public MyKey(Integer i){
				this.i =i;
			}

			@Override
			public int hashCode() {
				return 1;
			}

			@Override
			public boolean equals(Object obj) {
				// compare the inner values (body elided in the original)
				return (obj instanceof MyKey) && i.equals(((MyKey) obj).i);
			}

		}
		Date begin = new Date();
		Map<MyKey, String> myMap = new HashMap<>(2_500_000, 1);
		for (int i = 0; i < 2_000_000; i++) {
			myMap.put(new MyKey(i), "test " + i);
		}

		Date end = new Date();
		System.out.println("Duration (ms) "+ (end.getTime()-begin.getTime()));
	}
}

On my Core i5-2500K @ 3.6 GHz, it takes more than 45 minutes with Java 8u40 (I stopped the process after 45 minutes).

Now, if I run the same code but this time I use the following hash function:

	@Override
	public int hashCode() {
		int key = 2097152 - 1;
		return key + 2097152 * i;
	}

it takes 46 seconds, which is way better! This hash function has a better distribution than the previous one, so the put() calls are faster.

And if I run the same code with the following hash function, which provides an even better hash distribution:

	@Override
	public int hashCode() {
		return i;
	}

it now takes 2 seconds.

I hope you realize how important the hash function is. If I had run the same tests on Java 7, the results would have been even worse for the first and second cases (since the time complexity of put() is O(n) in Java 7 vs O(log(n)) in Java 8).

When using a HashMap, you need to find a hash function for your keys that spreads the keys into as many buckets as possible. To do so, you need to avoid hash collisions. The String class is a good key because it has a good hash function. Integers are also good because their hashcode is their own value.
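
If your key is made of several fields, a common way to get a reasonable hash function is to delegate to the fields’ own hashcodes, for example with java.util.Objects.hash(). Here is a sketch (the class and field names are just illustrative):

import java.util.Objects;

final class PersonKey {
	private final String firstName;
	private final String lastName;

	public PersonKey(String firstName, String lastName) {
		this.firstName = firstName;
		this.lastName = lastName;
	}

	@Override
	public int hashCode() {
		// combines the fields' hashcodes; Strings already have a good hash function
		return Objects.hash(firstName, lastName);
	}

	@Override
	public boolean equals(Object o) {
		if (!(o instanceof PersonKey)) return false;
		PersonKey p = (PersonKey) o;
		return Objects.equals(firstName, p.firstName) && Objects.equals(lastName, p.lastName);
	}
}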

 

Resizing overhead

If you need to store a lot of data, you should create your HashMap with an initial capacity close to your expected volume.

If you don’t do that, the Map will take the default capacity of 16 with a loadFactor of 0.75. The first 12 put() calls will be very fast, but the 13th (the threshold being 16*0.75 = 12) will recreate a new inner array (with its associated linked lists/trees) with a new capacity of 32. The 14th to 24th will be fast, but the 25th (32*0.75 = 24) will recreate (again) a costly new representation that doubles the size of the inner array. The internal resizing operation will then appear around the 49th, 97th, 193rd, … call of put(). At low volume the full recreation of the inner array is fast, but at high volume it can take seconds to minutes. By initially setting your expected size, you can avoid these costly operations.
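
For example, if you expect about 1 million entries, you can size the map so that the threshold is never crossed (this sketch assumes the default loadFactor of 0.75):

// pre-sizing: threshold = capacity * 0.75 will be >= expected, so no resize ever happens
int expected = 1_000_000;
Map<String, String> bigMap = new HashMap<>((int) (expected / 0.75f) + 1);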

But there is a drawback: if you set a very high array size like 2^28 while you’re only using 2^26 buckets, you will waste a lot of memory (approximately 2^30 bytes in this case, for the array alone).

 

Conclusion

For simple use cases, you don’t need to know how HashMaps work, since you won’t see the difference between an O(1), an O(n) or an O(log(n)) operation. But it’s always better to understand the underlying mechanism of one of the most used data structures. Moreover, for a Java developer position, it’s a typical interview question.

At high volume it becomes important to know how it works and to understand the importance of the hash function of the key.

I hope this article helped you to have a deep understanding of the HashMap implementation.
