When I was looking for a simple programming task that I can use to evaluate candidates, I had the following criteria in mind:

- solvable within 10 min
- does not require obscure algorithmic knowledge
- has potential for further discussion
- not language specific

I went through my list of simple problems (I only considered the easy ones) and I stumbled across Pairs. I instantly realized, that it is the perfect question. There is a brute force solution, the specification can be expanded, we can talk about sorting, hashing, hash collisions, Big-O notation and cases when the input does not fit in memory. Even Bloom Filters are a viable topic of discussion.

My first interview was planned and I was thrilled. I would use the challenge and it would be great! The first three candidates I exposed to the task failed horribly. They could not implement the brute force solution O(n^2) without bugs in the code. I was *horrified*. Is the industry really this bad? Can programmers really not program?

In the last 3 years I used this little coding task in almost 50 interviews and the failure rate is alarming. Only *two* people managed to solve it flawlessly. They both accepted the offer and I have been thrilled to work with them ever since. There have been others that got stuck in the weeds but nevertheless showed an ability to think. Kudos to those! I will expand on the definition of flawlessly below.

I adjust the -pass- criteria accordingly to the person’s experience (people who graduated a long time ago might not remember what Big-O is), title (did they code in the last job?), future job duties (are they supposed to code?) and the prowess with which they claim that they know how to program. Support engineers that have had limited exposure to python and shell are fine with something that resembles code. People that have been in the industry for a decade better be ready to go through at least 3 iterations of increasing complexity.

Before reading on, I suggest that you go and solve that challenge. I will wait….

Solved? Good! Let’s look at my variation with a deliberately vague specification.

Given an array N of integers such as [1,3,5] and an integer K=2, find all pairs in N whose difference is K. You can use any language you want, as long as I can understand it.

A good candidate will instantly ask a subset of the following questions:

- is the array sorted?
- how big is the array?
- are there duplicates?
- are all numbers positive?
- what is the range of values in N?
- how much memory can I use?
- is K positive?

Less than 5 people ever asked me additional questions. Most rush to the brute force implementation. Finding pairs in a sorted array can be done in O(N) time. If the range of values is only 0-127, counting sort could be the optimal way forward.

Every software engineer should understand the problem that she is solving. Very often the problem is much easier than it seems at first. Sometimes it is the other way round and something that seems trivial blows up into a monstrosity. Figuring out the -what- should always come before writing the first line of code.

Languages such as C++ have a lot of accidental complexity. Rather than trying to solve the problem at hand, you will have to deal with nulls, end iterators and the general lack of syntactic sugar. This is the main reason why this blog is using Python even though I am a full time C++ dev. Don’t even get me started on Java…

When you jump to the implementation, without having a clear plan in mind, it is very easy to get lost in the complexity of the language. You can loose sight of the forest due to all the trees.

Before you start writing any code, you should have a clear vision of the algorithm. Knowing the limitations of the language (C++ set is actually a tree rather than a hash set) is important for identifying bottle necks.

Let’s look at a possible verbal explanation of the O(NlogN) solution for no duplicates; unsorted array; fits in memory; random 64 bit integers; would look like:

As we iterate over the array, we are looking for possible pair candidates. We know that if we encounter a value V, a pair has to be either V+K or V-K. Ideally the lookup has to be as fast as possible. We also know that pairs are symmetric, 3/5 and 5/3 are both valid pairs. We should only count them once, but it does not matter which value we find first. This brings us to a conclusion, that we can iterate over the array and check whether a possible pair candidate has already been encountered. We store the values in a tree or a hashset, depending on the risk of collision.

Once you know what you are solving and how you are solving it, you can implement it. This process might result in a code like this:

def pairs(k, arr): cnt = 0 encountered = set() for v in arr: if v+k in encountered: cnt += 1 if v-k in encountered: cnt += 1 encountered.add(v) return cnt

When the algorithm comes first, the implementation tends to communicate the intention of the code much clearer.

If you have been in software engineering, you might have noticed that we barely ever write any code. We talk about code, we fight over code, we review code and most of the time, we *read* code. Writing code is so rare, that some people might not have done it in months.

“Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.” ― Clean Code: A Handbook of Agile Software Craftsmanship

It is therefore natural, that writing code on a whiteboard is an exceptionally stressful situation. We actually have very little exposure to writing code in the first place.

Since 2016 I have been recommending all engineers to write code in their spare time. Either find an open source project and contribute to it, start a little project of your own or work on programming challenges. I prefer the last option, since it actually exposes the person to the complexity of the job.

If you play music, you might know how important it is to know your notes, scales and arpeggios. Being consistent with the basics allows us to build on top and learn much more complex pieces. Similarly, engineers should be fluent with their basic tool: algorithms and data structures.

As with anything else, mastery comes with practice. **Practice writing code!**

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

Find the number of ways that a given integer, X , can be expressed as the sum of the Nth powers of unique, natural numbers.

For example, if X = 13 and N = 2, we have to find all combinations of unique squares adding up to 13. The only solution is 2^2 + 3^2.

time complexity is O(N!)

space complexity is O(1)

This solution does not use DP or memoisation, but it does not time out. We know that it needs to explore all P! possible combinations where P is floor(sqrt(X, N)). Since maximum X is 1000, this should still compute in real time.

Keep in mind that all integers in the solution have to be unique. So for X = 3, the solution 1^2 + 1^2 + 1^2 is not acceptable.

Did I mention that I dislike both NP-Hard problems and recursion? No? So, now you know.

def powerSum(X, N, current = 1): pw = pow(current, N) if pw > X: return 0 elif pw == X: return 1 else: return powerSum(X, N, current+1) + powerSum(X-pw, N, current+1)

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

Whenever George asks Lily to hang out, she’s busy doing homework. George wants to help her finish it faster, but he’s in over his head! Can you help George understand Lily’s homework so she can hang out with him?

Consider an array of m distinct integers, arr = [a[0], a[1], …, a[n-1]]. George can swap any two elements of the array any number of times. An array is *beautiful* if the sum of |a[i] – a[i-1] among 0 < i < n is minimal.

time complexity is O(N*log(N))

space complexity is O(N)

Let us rephrase the problem to a sorting problem: Find the number of swaps to make the array sorted. We need to consider both ascending and descending arrays. The solution assumes no duplicates.

def cntSwaps(arr): positions = sorted(list(enumerate(arr)), key=lambda e: e[1]) swaps = 0 for idx in xrange(len(arr)): while True: if (positions[idx][0] == idx): break else: swaps += 1 swapped_idx = positions[idx][0] positions[idx], positions[swapped_idx] = positions[swapped_idx], positions[idx] return swaps def lilysHomework(arr): return min(cntSwaps(arr), cntSwaps(arr[::-1]))

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

A gene is represented as a string of length N (where is divisible by 4), composed of the letters A, C, G, and T. It is considered to be *steady* if each of the four letters occurs exactly 1/4 times. For example, GACT and AAGGCCTT are both steady genes.

time complexity is O(N)

space complexity is O(1)

A bit of trivia: I’ve looked at this problem a couple times 2 years ago and I could not figure it out. Sometimes it is good to take a long break

In the first pass I establish how many occurrences there are of each character. Then I establish how many are missing to make the string steady. Remember that the number of occurrences needs to be evenly distributed.

The second pass takes advantage of the caterpillar method. I am looking for the shortest string that contains all the extra characters. They need to be replaced by the missing characters. All others can just be replaced with the same character. The caterpillar extends as long as it does not hit one of the extra characters. It shortens if it contains more extra characters than are needed.

def steadyGene(gene): min_length_string = len(gene) occurences = dict() occurences['A'] = 0 occurences['G'] = 0 occurences['C'] = 0 occurences['T'] = 0 expected = len(gene) // 4 for g in gene: occurences[g] += 1 for x in occurences: occurences[x] = max(0, occurences[x] - expected) if occurences['A'] == 0 and occurences['G'] == 0 and occurences['C'] == 0 and occurences['T'] == 0: return 0 found = dict() found['A'] = 0 found['G'] = 0 found['C'] = 0 found['T'] = 0 tail = 0 head = 0 while head != len(gene): found[gene[head]] += 1 if found['A'] >= occurences['A'] and \ found['C'] >= occurences['C'] and \ found['G'] >= occurences['G'] and \ found['T'] >= occurences['T']: # this is a valid candidate min_length_string = min(min_length_string, head-tail+1) # try to shorten it while found[gene[tail]] > occurences[gene[tail]]: found[gene[tail]] -= 1 tail += 1 min_length_string = min(min_length_string, head-tail+1) head += 1 return min_length_string

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

Sherlock considers a string to be *valid* if all characters of the string appear the same number of times. It is also *valid* if he can remove just 1 character at 1 index in the string, and the remaining characters will occur the same number of times. Given a string , determine if it is *valid*. If so, return `YES`

, otherwise return `NO`

.

time complexity is O(N)

space complexity is O(N)

This is one of the easier medium problems. Create a character occurrence map. Then create an occurrence-occurrence map. If all the occurrences are the same (size is 1) the string is valid. If there is exactly one character that occurs exactly once, it is also valid. Otherwise invalid

def isValid(S): char_map = Counter(S) char_occurence_map = Counter(char_map.values()) if len(char_occurence_map) == 1: return True if len(char_occurence_map) == 2: for v in char_occurence_map.values(): if v == 1: return True return False

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

Amanda has a string of lowercase letters that she wants to copy to a new string. She can perform the following operations with the given costs. She can perform them any number of times to construct a new string p:

- Append a character to the end of string p at a cost of 1 dollar.
- Choose any substring of p and append it to the end of at no charge.

time complexity is O(N)

space complexity is O(N)

The solution sounds too easy, but it is still very simple. A substring of length 1 is still a substring. Each character in the final string needs to be copied once for 1$. Each other occurrence of that string can be copied for 0$. Aka just count the number of distinct letters in the expected string.

def stringConstruction(s): return len(set(s))

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

A weighted string is a string of lowercase English letters where each letter has a *weight*. Character weights are 1 to 26 from a to z…

time complexity is O(N)

space complexity is O(N)

Parsing the string for every query is suboptimal, so I first preprocess the string. Now we know that uniform strings contain the same characters. A string can be of length 1. Do a single pass of the string and create all uniform substrings.

def weightedUniformStrings(s, queries): weights = set() prev = -1 length = 0 for c in s: weight = ord(c) - ord('a') + 1 weights.add(weight) if prev == c: length += 1 weights.add(length*weight) else: prev = c length = 1 rval = [] for q in queries: if q in weights: rval.append("Yes") else: rval.append("No") return rval

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

We say that a string contains the word `hackerrank`

if a subsequence of its characters spell the word `hackerrank`

. For example, if string s = haacckkerrannkk it does contain `hackerrank`

, but s = haacckkerannk does not. In the second case, the second `r`

is missing. If we reorder the first string as , it no longer contains the subsequence due to ordering.

time complexity is O(N)

space complexity is O(1)

Keep two pointers. One to the expected string (needle) and one to the input string. If you find the needle in the haystack before you run out of characters, you are good.

def hackerrankInString(s): needle = 'hackerrank' idx_in_needle = 0 for c in s: if c == needle[idx_in_needle]: idx_in_needle += 1 if idx_in_needle == len(needle): break if idx_in_needle == len(needle): return "YES" else: return "NO"

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

Sami’s spaceship crashed on Mars! She sends a series of `SOS`

messages to Earth for help.

time complexity is O(N)

space complexity is O(1)

We know that the message is basically a lot of concatenated SOS strings. There is no magic to this one.

S = raw_input().strip() errors = 0 for i in xrange(len(S)): if i % 3 == 0 and S[i] != 'S': errors += 1 if i % 3 == 1 and S[i] != 'O': errors += 1 if i % 3 == 2 and S[i] != 'S': errors += 1 print errors

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>

Louise joined a social networking site to stay in touch with her friends. The signup page required her to input a *name* and a *password*. However, the password must be *strong*. The website considers a password to be *strong* if it satisfies the following criteria:

- Its length is at least 6.
- It contains at least one digit.
- It contains at least one lowercase English character.
- It contains at least one uppercase English character.
- It contains at least one special character. The special characters are:
`!@#$%^&*()-+`

She typed a random string of length n in the password field but wasn’t sure if it was strong. Given the string she typed, can you find the minimum number of characters she must add to make her password strong?

time complexity is O(N)

space complexity is O(1)

This is a pythonesque solution by Jay Mistry. The O(4*N) complexity might be suboptimal, but it showcases a nice use of the language.

def minimumNumber(n, password): count = 0 if any(i.isdigit() for i in password)==False: count+=1 if any(i.islower() for i in password)==False: count+=1 if any(i.isupper() for i in password)==False: count+=1 if any(i in '!@#$%^&*()-+' for i in password)==False: count+=1 return max(count,6-n)

If you enjoyed this post, then make sure you subscribe to my Newsletter and/or Feed.

]]>