leetcodeQ187

LeetCode Q 187 - Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Solution

TODO: use Rabin–Karp algorithm to optimize the solution.

Code:

public List<String> findRepeatedDnaSequences(String s) {
	Set<String> seen = new HashSet<>();
	Set<String> repeated = new HashSet<>();
	for (int i = 0; i < s.length() - 9; i++) {
		if (!seen.add(s.substring(i, i + 10))) 
			repeated.add(s.substring(i, i + 10));
	}
	return new ArrayList(repeated);
}

   Reprint policy


《leetcodeQ187》 by Tong Shi is licensed under a Creative Commons Attribution 4.0 International License
  TOC