Intersection of Two Arrays II

LeetCode Q 350 - Intersection of Two Arrays II

Given two arrays, write a function to compute their intersection.
Note:

  • Each element in the result should appear as many times as it shows in both arrays.
  • The result can be in any order.

Follow up:

  • What if the given array is already sorted? How would you optimize your algorithm?
  • What if nums1’s size is small compared to nums2’s size? Which algorithm is better?
  • What if elements of nums2 are stored on disk, and the memory is limited such that you cannot load all elements into the memory at once?

Similar Question: Intersection of Two Arrays

Solution

Method 1: Sort + Two Pointers

Time Complexity: O(nlogn)

Code:

public int[] intersect(int[] nums1, int[] nums2) {
    Arrays.sort(nums1); Arrays.sort(nums2);
    int i = 0, j = 0;
    List<Integer> list = new ArrayList<>();
    while (i < nums1.length && j < nums2.length) {
        if (nums1[i] == nums2[j]) {
            list.add(nums1[i]);
            i++; j++;
        } else if (nums1[i] > nums2[j]) {
            j++;
        } else {
            i++;
        }
    }
    
    int[] res = new int[list.size()];
    int index = 0;
    for (int num: list) {
        res[index++] = num;
    }
    
    return res;
}

Method 2: HashMap

Time Complexity: O(n)

Code:

public int[] intersect(int[] nums1, int[] nums2) {
    Map<Integer, Integer> map = new HashMap<>();
    for (int num: nums1) {
        map.put(num, map.getOrDefault(num, 0) + 1);
    }
    
    List<Integer> list = new ArrayList<>();
    for (int num: nums2) {
        if (map.containsKey(num) && map.get(num) > 0) {
            list.add(num);
            map.put(num, map.get(num) - 1);
        }
    }
    
    int[] res = new int[list.size()];
    int index = 0;
    for (int num: list) 
        res[index++] = num;
    
    return res;
}

Solution to 3rd follow-up question:

  • If only nums2 cannot fit in memory, put all elements of nums1 into a HashMap, read chunks of array that fit into the memory, and record the intersections.

  • If both nums1 and nums2 are so huge that neither fit into the memory, sort them individually (external sort), then read 2 elements from each array at a time in memory, record intersections.

It is a classical question in database perspective. External sort is a trick used to implement JOIN, basically called sort-merge join. A detailed explanation of External sorting can be found in wekipedia.

  • External sorting is a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory, usually a hard disk drive. Thus, external sorting algorithms are external memory algorithms and thus applicable in the external memory model of computation.

  • External sorting algorithms generally fall into two types, distribution sorting, which resembles quicksort, and external merge sort, which resembles merge sort. The latter typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.

  • External sorting algorithms can be analyzed in the external memory model. In this model, a cache or internal memory of size M and an unbounded external memory are divided into blocks of size B, and the running time of an algorithm is determined by the number of memory transfers between internal and external memory. Like their cache-oblivious counterparts, asymptotically optimal external sorting algorithms achieve a running time (in Big O notation) of

$$ \frac{N}{B}O(log_{\frac{M}{B}}\frac{N}{B}) $$

For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:

  1. Read 100 MB of the data in main memory and sort by some conventional method, like quicksort.
  2. Write the sorted data to disk.
    Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB / 100MB = 9 chunks), which now need to be merged into one single output file.
  3. Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
  4. Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available. This is the key step that makes external merge sort work externally – because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed. We can do the n-way merge like Merge k Sorted Lists.

   Reprint policy


《Intersection of Two Arrays II》 by Tong Shi is licensed under a Creative Commons Attribution 4.0 International License
  TOC