**interviewer:** Welcome, can I get you coffee or anything? Do you need a break?

**me:** No, I've probably had too much coffee already!

**interviewer:** Great, great. And are you OK with writing code on the whiteboard?

**me:** It's the only way I code!

**interviewer:** ...

**me:** That was a joke.

**interviewer:** OK, so are you familiar with "fizz buzz"?

**me:** ...

**interviewer:** Is that a yes or a no?

**me:** It's more of a "I can't believe you're asking me that."

**interviewer:** OK, so I need you to print the numbers from 1 to 100, except that
if the number is divisible by 3 print "fizz", if it's divisible by 5 print "buzz",
and if it's divisible by 15 print "fizzbuzz".

**me:** I'm familiar with it.

**interviewer:** Great, we find that candidates who can't get this right don't do well here.

**me:** ...

**interviewer:** Here's a marker and an eraser.

**me:** [thinks for a couple of minutes]

**interviewer:** Do you need help getting started?

**me:** No, no, I'm good. So let's start with some standard imports:

import numpy as np
import tensorflow as tf

**interviewer:** Um, you understand the problem is *fizzbuzz*, right?

**me:** Do I ever. So, now let's talk models. I'm thinking a simple multi-layer-perceptron
with one hidden layer.

**interviewer:** Perceptron?

**me:** Or neural network, whatever you want to call it.
We want the input to be a number, and the output to be the correct "fizzbuzz"
representation of that number. In particular, we need to turn each input into a
vector of "activations". One simple way would be to convert it to binary.

**interviewer:** Binary?

**me:** Yeah, you know, 0's and 1's? Something like:

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])
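
As a quick aside, here is a minimal sketch of what that encoding produces. It repeats `binary_encode` and adds a `binary_decode` helper, which is hypothetical and not part of the whiteboard code, to show that the bits come out least-significant first.

```python
import numpy as np

def binary_encode(i, num_digits):
    # Bit d of i, lowest bit first (same as the whiteboard version).
    return np.array([i >> d & 1 for d in range(num_digits)])

def binary_decode(bits):
    # Hypothetical inverse, for illustration only: sum of bit * 2**position.
    return int(sum(int(b) << d for d, b in enumerate(bits)))

encoded = binary_encode(6, 4)
print(encoded)                 # [0 1 1 0], since 6 = 2 + 4, lowest bit first
print(binary_decode(encoded))  # 6
```

With `num_digits = 10` this covers every integer below 1024, which is where the training range used later comes from.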

**interviewer:** [stares at whiteboard for a minute]

**me:** And our output will be a one-hot encoding of the fizzbuzz representation
of the number, where the first position indicates "print as-is", the second
indicates "fizz", and so on:

def fizz_buzz_encode(i):
    if i % 15 == 0: return np.array([0, 0, 0, 1])
    elif i % 5 == 0: return np.array([0, 0, 1, 0])
    elif i % 3 == 0: return np.array([0, 1, 0, 0])
    else: return np.array([1, 0, 0, 0])
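
To make the four classes concrete, the sketch below runs the encoder on one number per class and maps the one-hot vector back to a class index with `np.argmax`. The `LABELS` list is a name made up for this example.

```python
import numpy as np

def fizz_buzz_encode(i):
    # The divisible-by-15 check has to come first, otherwise 15 would
    # be caught by the 5 or 3 branch.
    if i % 15 == 0:
        return np.array([0, 0, 0, 1])
    elif i % 5 == 0:
        return np.array([0, 0, 1, 0])
    elif i % 3 == 0:
        return np.array([0, 1, 0, 0])
    else:
        return np.array([1, 0, 0, 0])

# Hypothetical human-readable names for the four positions.
LABELS = ["as-is", "fizz", "buzz", "fizzbuzz"]

class_indices = []
for i in (7, 9, 10, 30):
    one_hot = fizz_buzz_encode(i)
    index = int(np.argmax(one_hot))
    class_indices.append(index)
    print(i, one_hot, LABELS[index])
```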

**interviewer:** OK, that's probably enough.

**me:** That's enough setup, you're exactly right. Now we need to generate some training data. It would be
cheating to use the numbers 1 to 100 in our training data, so let's train it on
all the remaining numbers up to 1024:

NUM_DIGITS = 10
trX = np.array([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = np.array([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)])
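
It can help to know how the classes are distributed in that training range. The sketch below is not part of the whiteboard code: it computes the class index of every training number directly with `np.select` (0 for "print as-is", 1 for "fizz", 2 for "buzz", 3 for "fizzbuzz") and counts each class.

```python
import numpy as np

NUM_DIGITS = 10
train_numbers = np.arange(101, 2 ** NUM_DIGITS)

# np.select takes the first condition that matches, so 15 is checked first.
classes = np.select(
    [train_numbers % 15 == 0, train_numbers % 5 == 0, train_numbers % 3 == 0],
    [3, 2, 1],
    default=0,
)
counts = np.bincount(classes, minlength=4)

print(len(train_numbers))  # 923
print(counts)              # [493 246 122  62]
```

So a model that always answers "print as-is" would already score about 53% on the training data, which is worth keeping in mind when reading the training accuracy.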

**interviewer:** ...

**me:** Now we need to set up our model in tensorflow. Off the top of my head I'm
not sure how many hidden units to use, maybe 10?

**interviewer:** ...

**me:** Yeah, possibly 100 is better. We can always change it later.

NUM_HIDDEN = 100

We'll need an input variable with width NUM_DIGITS, and an output variable
with width 4:

X = tf.placeholder("float", [None, NUM_DIGITS])
Y = tf.placeholder("float", [None, 4])

**interviewer:** How far are you intending to take this?

**me:** Oh, just two layers deep -- one hidden layer and one output layer.
Let's use randomly-initialized weights for our neurons:

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

w_h = init_weights([NUM_DIGITS, NUM_HIDDEN])
w_o = init_weights([NUM_HIDDEN, 4])

And we're ready to define the model. As I said before, one hidden layer,
and let's use, I don't know, ReLU activation:

def model(X, w_h, w_o):
    h = tf.nn.relu(tf.matmul(X, w_h))
    return tf.matmul(h, w_o)

We can use softmax cross-entropy as our cost function and try to minimize it:

py_x = model(X, w_h, w_o)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)
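
For anyone who wants to see what the model and cost compute without a TensorFlow graph, here is a rough NumPy sketch of the same forward pass and softmax cross-entropy. The function names `relu`, `forward`, and `softmax_cross_entropy` and the random inputs are assumptions made for this illustration. It is not the TensorFlow implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(X, w_h, w_o):
    # Same structure as model(): one ReLU hidden layer, then linear logits.
    # Like the whiteboard model, there are no bias terms.
    return relu(X @ w_h) @ w_o

def softmax_cross_entropy(logits, labels):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5, 10)).astype(float)  # 5 made-up binary inputs
Y = np.eye(4)[rng.integers(0, 4, size=5)]           # 5 made-up one-hot labels
w_h = rng.normal(scale=0.01, size=(10, 100))
w_o = rng.normal(scale=0.01, size=(100, 4))

logits = forward(X, w_h, w_o)
cost = softmax_cross_entropy(logits, Y).mean()
print(logits.shape, cost)
```

With weights this small the logits are nearly zero, so the initial cost lands close to `log(4)`, the loss of a uniform guess over four classes.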

**interviewer:** ...

**me:** And, of course, the prediction will just be the largest output:

predict_op = tf.argmax(py_x, 1)

**interviewer:** Before you get *too far* astray,
the problem you're *supposed to be* solving is
to generate fizz buzz for the numbers from 1 to 100.

**me:** Oh, great point, the `predict_op` function will output a number from 0 to 3,
but we want a "fizz buzz" output:

def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]
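
Put together, the prediction step is an argmax over the four outputs followed by a list lookup. A small sketch with made-up logits (the values in `fake_logits` are invented for illustration):

```python
import numpy as np

def fizz_buzz(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

# One row of invented model outputs per number.
fake_logits = np.array([
    [2.0, 0.1, 0.3, 0.0],   # largest at position 0: print as-is
    [0.2, 1.5, 0.1, 0.0],   # largest at position 1: fizz
    [0.0, 0.2, 0.1, 3.0],   # largest at position 3: fizzbuzz
])
numbers = [14, 9, 15]

predictions = np.argmax(fake_logits, axis=1)
results = [fizz_buzz(i, p) for i, p in zip(numbers, predictions)]
print(results)  # ['14', 'fizz', 'fizzbuzz']
```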

**interviewer:** ...

**me:** So now we're ready to train the model. Let's grab a tensorflow session
and initialize the variables:

with tf.Session() as sess:
    tf.initialize_all_variables().run()

Now let's run, say, 1000 epochs of training?

**interviewer:** ...

**me:** Yeah, maybe that's not enough -- so let's do 10000 just to be safe.

And our training data are
sequential, which I don't like, so let's shuffle them each iteration:

    for epoch in range(10000):
        p = np.random.permutation(range(len(trX)))
        trX, trY = trX[p], trY[p]

And each epoch we'll train in batches of, I don't know, 128 inputs?

BATCH_SIZE = 128

So each training pass looks like

        for start in range(0, len(trX), BATCH_SIZE):
            end = start + BATCH_SIZE
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})

and then we can print the accuracy on the training data, since why not?

        print(epoch, np.mean(np.argmax(trY, axis=1) ==
                             sess.run(predict_op, feed_dict={X: trX, Y: trY})))
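
The shuffle-then-batch pattern in that loop can be tried on its own, without a session. This sketch uses only index arrays and assumes 923 training examples (the size of `range(101, 1024)`). The seeded generator is there only to make the example repeatable.

```python
import numpy as np

BATCH_SIZE = 128
num_examples = 923  # len(range(101, 1024))

rng = np.random.default_rng(0)
p = rng.permutation(num_examples)  # a fresh shuffle, as done once per epoch

batch_sizes = []
for start in range(0, num_examples, BATCH_SIZE):
    end = start + BATCH_SIZE
    batch = p[start:end]  # slicing past the end just truncates
    batch_sizes.append(len(batch))

print(batch_sizes)  # [128, 128, 128, 128, 128, 128, 128, 27]
```

Every example is visited exactly once per epoch, and the final batch is simply smaller than the rest.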

**interviewer:** Are you serious?

**me:** Yeah, I find it helpful to see how the training accuracy evolves.

**interviewer:** ...

**me:** So, once the model has been trained, it's fizz buzz time. Our input should
just be the binary encoding of the numbers 1 to 100:

numbers = np.arange(1, 101)
teX = np.transpose(binary_encode(numbers, NUM_DIGITS))
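
The transpose is needed because `binary_encode` is being called on a whole array here. Each shift-and-mask then produces an array of 100 bits, so the stacked result has one row per bit position rather than one row per number. A small sketch to confirm the shapes:

```python
import numpy as np

NUM_DIGITS = 10

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

numbers = np.arange(1, 101)
raw = binary_encode(numbers, NUM_DIGITS)
teX = np.transpose(raw)

print(raw.shape)  # (10, 100): one row per bit position
print(teX.shape)  # (100, 10): one row per number, as the model expects

# Row 4 holds the number 5 and matches the scalar encoding.
print(np.array_equal(teX[4], binary_encode(5, NUM_DIGITS)))  # True
```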

And then our output is just our `fizz_buzz` function applied to the model output:

teY = sess.run(predict_op, feed_dict={X: teX})
output = np.vectorize(fizz_buzz)(numbers, teY)
print(output)

**interviewer:** ...

**me:** And that should be your fizz buzz!

**interviewer:** Really, that's enough. We'll be in touch.

**me:** In touch, that sounds promising.

**interviewer:** ...

# Postscript

I didn't get the job. So I tried actually running this (code on GitHub),
and it turned out it got some of the outputs wrong! Thanks a lot, machine learning!

In [185]: output
Out[185]:
array(['1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', 'buzz',
       '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19',
       'buzz', '21', '22', '23', 'fizz', 'buzz', '26', 'fizz', '28', '29',
       'fizzbuzz', '31', 'fizz', 'fizz', '34', 'buzz', 'fizz', '37', '38',
       'fizz', 'buzz', '41', '42', '43', '44', 'fizzbuzz', '46', '47',
       'fizz', '49', 'buzz', 'fizz', '52', 'fizz', 'fizz', 'buzz', '56',
       'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', '64', 'buzz',
       'fizz', '67', '68', '69', 'buzz', '71', 'fizz', '73', '74',
       'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', '81', '82', '83',
       '84', 'buzz', '86', '87', '88', '89', 'fizzbuzz', '91', '92', '93',
       '94', 'buzz', 'fizz', '97', '98', 'fizz', 'fizz'],
      dtype='<U8')

I guess maybe I should have used a deeper network.
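
To find out exactly which outputs are wrong, one option is to compare against a conventional implementation. The sketch below is an assumption about how one might do that: `fizz_buzz_reference` and `mismatches` are names invented here, and `sample` is a made-up four-element output, not the model's.

```python
def fizz_buzz_reference(i):
    # The boring, non-neural version, used only as ground truth.
    if i % 15 == 0:
        return "fizzbuzz"
    if i % 5 == 0:
        return "buzz"
    if i % 3 == 0:
        return "fizz"
    return str(i)

def mismatches(output):
    # Return the numbers (counting from 1) whose output differs from the reference.
    return [i for i, got in enumerate(output, start=1)
            if got != fizz_buzz_reference(i)]

sample = ["1", "2", "3", "4"]  # invented output with one mistake
print(mismatches(sample))      # [3]
```

Passing the model's `output` array to `mismatches` would list the numbers it got wrong.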
