Use Concurrent HashMap instead of hashtable & synchronizedMap | JDK 1.5 new features Concurrent HashMap

Use Concurrent HashMap instead of hashtable & synchronizedMap
Some of the drawbacks of  Synchronized collection such as  HashTable , Collections.synchronizedMap are as follows .

  Synchronized collection classes such as  Hashtable and  the synchronized wrapper classes created by the Collections.synchronizedMap are thread safe  with poor concurrency, less performance and scalabilty  .

1. Poor concurrency : When these collections  are  accessed by two or more threads, they  achieve thread safety  by  making the collection's data private and synchronizing all public methods  so that   only  one  thread at a time can access the  collection (hashtable /  synchronizedMap )   data.  This leads to poor concurrency.  As  Single lock is used for the whole collection , multiple threads struggle for the collection wide lock which reduces the performance

2. ConcurrentModificationException :
                When one thread is traversing the hashtable / Collections.synchronizedMap through an Iterator ,  while another thread  changes it  by mutative operations (put, remove , etc) , iterator  implemented in the java.util collections classes  fails by throwing ConcurrentModificationException . The exception occurs when  the hasNext() or next() method of Iterator class is called.  The same error also occurs (See  Code Part 1 : )  , when elements are added  in hashtable or  synchronizedMap , once the  iterator is constructed.  While iterating the collection (hashtable) through iterator  , collection / table- wide locking is required ,  otherwise ConcurrentModificationException is occured .

3. Scalabilty Issues :
      Scalabilty is the major issue when we use synchronized collections .  When the workload of the application increases ,   increasing the resources like processor , memory  should also  increase the throughtput of the application.   Unfortunately , it does not happen .  A scalable program can handle   a proportionally larger workload with more resources.  As synchronized collections synchronize on a single common lock , it restricts access  to  a  single thread at a  time,  other threads are restricted to access that collections , even if the resources  are available to schedule those threads.


4.  Some of the  common sequences of operations , such as put-if-absent (to check if an element is in the collection before adding it)    or iteration ,  require external synchronization (i.e. client side locking ) (See Code Part  3 )  to avoid data races  .


Code Part 1 :



//Map hm=Collections.synchronizedMap(new HashMap());

Map hm=new Hashtable(new HashMap());

//ConcurrentHashMap hm=new ConcurrentHashMap();

hm.put(1, "Blue");

hm.put(2, "Green");

hm.put(3, "Yellow");

Iterator entries = hm.entrySet().iterator();

hm.put(4, "Red");

hm.put(5, "Orange");



while (entries.hasNext()) {

    Map.Entry entry = (Map.Entry) entries.next();

    Integer key = (Integer)entry.getKey();

    String value = (String)entry.getValue();

    System.out.println("Key = " + key + ", Value = " + value);}
To overcome the above  issues with the synchronized collections , a new version of HashMap with concurrent access   has been designed  that is ConcurrentHashMap. This class is packaged with  java.util.concurrent  in JDK 1.5)

The main purpose to create ConcurrentHashMap is to provide
 
 1. better concurrency
  2  high scalability
  3. thread safe

and it supports
   1.  full concurrency of retrievals. Allows all readers to read the table concurrently  .  No lock is used for retrival operations.

  2.   concurrency for  writes . Allows  a limited number of writers to update the table concurrently

  3. full thread safe .

ConcurrentHashMap can be used where more read operation is required  ( i.e.  traversal is the dominant  operation )

How a ConcurrentHashMap is implemented ? or How it works?  or how concurrency is achieved?

          Volatile fields  and  lock striping  plays major role for  to achieve concurrency .
Lock striping :   Synchronizing  every method  on  a  single  lock,  restricts access  to  a  single  thread at a  time.   Instead of using single lock ,  ConcurrentHashMap  uses  different  locking mechanism  called lock  striping  to access the shared collection  concurrently which increases the scalabilty and performance .     Using different locks to allow different  threads to operate  on different portions of the same data structure  called lock striping. Splitting the lock into more  than one  improves the scalability .  For example two locks allow two threads to execute concurrently instead of one.

    Lock splitting can sometimes be extended to partition locking on a variablesized set of independent objects, in which case it is called lock striping.    


              Now let see that how lock striping mechanism is applied to ConcurrentHashMap .  The  strategy is to subdivide the  collection (hashtable) into independent  subsets called segments each guarded by a lock sothat  each subset (itself a hashtable)   can be  accessed  concurrently.    It uses an array of 16 locks each of which guards 1/16 of the hash buckets.  N/16 locks are used  for a hashtable having  N hash buckets.  Hash  bucket N  is guarded by  lock N mod 16.  16 locks allow  maximum of 16 threads to modify the hashtable at same time.   Mutative operations such as put() and remove() use locks  where as read operation does not use locks .


Note : The  number  of  locks  can  be  increased


Volatile Fields :   Some of the volatile fileds declared in the ConcurrentHashMap are


transient volatile int count;
    static final class HashEntry<K,V> {
         final K key;
         final int hash;
        volatile V value;
         volatile HashEntry<K,V> next;

        HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
            .....
            .....
         }
 transient volatile HashEntry<K,V>[] table;

From the source of ConcurrentHashMap


As we know , volatile field ensures visibilty i.e.  one thread reads the most up-to-date value written by another thread .  For example count is the volatile field which is used to track the number of  elements  . When one thread  adds  an element to the table , the count is increased  by one  , Similarly when one  thread  removes  an element from the table , the count is decreased  by one .  Now the other threads doing   many read operations  get the count variable's most  recent  updated value.


Similarly   HashEntry<K,V>[] table , value  , volatile HashEntry<K,V> next     fileds  are declared as volatile.  This ensures that all the threads see the the most  recent written  value of  those  fields at all times.       


When iterating the collection (hashtable) through iterator  , it does not  throw ConcurrentModificationException, but the elements  added  or removed after the iterator was constructed  may or may not be reflected . No collection / table- wide locking is required  while iterating the collection.
 Issue:   How to  protect  / lock the entire collection?  . There is no support for locking the entire table in a way that prevents all access. Then  One way is to acquire all of the locks recursively  which is costlier than using a single lock .
 ConcurrentHashMap provides three new update methods:
 putIfAbsent(key, value)  -  check if the key  is in the collection before adding the specified key  and  associate it with the given value
 replace( key, value)  - Replace the existing  key with given key  ,  only if  the key is  mapped to  given value.
 remove(key, value) - Remove the key only if the key is mapped to  given value.


The following program using ConcurrentHashMap  helps to keep the  accessed files in a cache .
 Code Part 2 :



import java.util.*;

import java.util.concurrent.ConcurrentHashMap;

import java.io.*;



public class CacheUsingMap2 {



  ConcurrentHashMap  cache;



  public CacheUsingMap2() {

    cache = new ConcurrentHashMap();

   }



  public String getFile2(String fname) {

     cache.putIfAbsent(fname,  readFile(fname));

      return ((myFile)cache.get(fname)).getFileData();

     }





  public myFile readFile(String name)

                   {

                  File file = new File(name);

                String fileData="";

                  try {

                      

              Scanner scan = new Scanner(file);

              scan.useDelimiter("\\Z");

              fileData = scan.next();



                         } catch (FileNotFoundException e){

                               System.out.println(e);



                                }

                             catch ( IOException e) {

                         System.out.println(e);

                                }



         return (new myFile( fileData));

                }



  public static void main(String args[]) {

  CacheUsingMap2 cache=new CacheUsingMap2();



 String filePath="D:/Files/";

System.out.println( cache.getFile2(filePath+"k.txt"));

System.out.println( cache.getFile2(filePath+"k1.txt"));

System.out.println( cache.getFile2(filePath+"k.txt"));

System.out.println( cache.getFile2(filePath+"k1.txt"));



  }

}



class myFile {


          String fileData;

    public myFile(String data)

          {

              fileData=data; 

          }


  public String getFileData() {

         return fileData;

     }



Code Part  3 :

Sample code to createt cache using  Hashtable  (implements put-if-absent operation) which requires client side locking




....

Hashtable  cache =new Hashtable();

....



  public String getFile(String fname) {



//   if (cache.get(fname)==null)

         if (!cache.containsKey(fname))

                 {

  synchronized(cache)

                                        {

  cache.put(fname, readFile(fname));

                                         }

                 }


                 return ((myFile)cache.get(fname)).getFileData();     } 

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.