Maximum Cut Commonality

Problem Statement

Given a string, divide it into two substrings such that the substrings have the most number of characters in common. The cut commonality is number of characters in common between the two substrings. Find the maximum cut commonality, and provide the runtime of your approach.

Assume the string only contains the characters ‘a’ through ‘z’.


def max_cut_commonality(s):
	// Your Code Here

Note:

If a character is repeated, the cut commonality count for that character is the minimum of the count of that character in the two substrings.

Test Cases

Test Case 1

Calling the function on "abcdedeara" should return 3 since the best possible way is to split it into "abcde" and "deara", with common characters a, d, and e

Test Case 2

Calling the function on "aabbbbaa" should return 4 since the best split is "aabb" and "bbaa", and the two strings share 2 a’s and 2 b’s

Follow Up Questions To Think About

Suppose the string is not limited to characters ‘a’ through ‘z’. What modifications would need to be made regarding space?

Click Below to see the Solution

Algorithm

Create two lists that keep track of the number of characters on each side of the cut. Traverse through the string and repeatedly adjust the list and calculate the cut commonality by taking the minimum of each character count between both sides of the cut, returning the maximum at the end. The data structure that is suitable to use as the list is an array, because the alphabet size is fixed, and modifying an element value takes O(1) time.

Walkthrough for Test Case 1

For test case 1, the cut begins at the beginning of the string, with the left list being [0, 0, ..., 0] (because there are no letters to the left of the cut), and the right list being [3, 1, 1, 2, 2, 0, ..., 0], because there are 3 a’s, 1 b, 1 c, etc.

The next iteration, the cut shifts to the right one letter, causing the left list to become [1, 0, ..., 0] and the right list to become [2, 1, ..., 0], as the letter ‘a’ moves from the right side to the left side.

Calculating the cut commonality for this cut would be min(1, 2) + min(0, 1) + ... + min(0, 0), which is 1 (since only 1 letter ‘a’ is shared across both cuts).

Code

def max_cut_commonality(s):
	# initialize two lists that hold character counts on the right side of the cut (lst 1) and the left side of the cut (lst 2)
	lst1 = [0] * 26
	lst2 = [0] * 26

	# populate the character count list of the right side of the cut, where the cut is before the first letter
	for c in s:
		lst1[ord(c) - ord('a')] += 1

	best_cut = 0
	for c in s:
		# after each letter that moves from the right side of the cut to another, subtract one from the corresponding letter count in lst1, and add one to the count in lst 2
		lst1[ord(c) - ord('a')] -= 1
		lst2[ord(c) - ord('a')] += 1
		current_cut = 0
		# calculate the cut commonality for this specific cut, by iterating through the lists and summing the minimum counts for each letter
		for i in range(26):
			current_cut += min(lst1[i], lst2[i])
		best_cut = max(current_cut, best_cut)

	return best_cut

Runtime Analysis

We iterate through the string twice, so the computational complexity is $O(n)$ where $n$ is the length of the string s. Because the length of the lists is constant, calculating the cut commonality for each split is $O(1)$ .

Follow Up Answer

Now that the alphabet is not fixed, instead of using an array, the data structure being used would be two hash maps, with the character being the key and the character count being the corresponding value. A hash map retains the $O(1)$ lookup time, and is able to expand in size. However, the runtime of this algorithm now depends on the number of distinct characters in the string, with a maximum runtime of $O(n^2)$ .