Why do we flatten?

FEBRUARY 13, 2022

My most successful machine learning video so far, according to YouTube's metrics of success, is The Flatten Layer, Explained—a three-minute clip from a live stream that graphically explains how the flatten operation works and why there's a need to flatten neural network layers.

In a nutshell, the flatten operation squeezes a tensor of any shape into a one-dimensional array, a plain list of numbers, that contains every value from the previous layer's output but none of its structure. The result is a flat encoding of whatever features the earlier network operations extracted.
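To make the idea concrete, here is a minimal sketch of flattening done by hand in plain Python, with nested lists standing in for a tensor. The `flatten` helper and the 2x2x3 `features` values are made up for illustration; real frameworks do the same thing far more efficiently.

```python
def flatten(tensor):
    """Recursively flatten nested lists into one flat list, row by row."""
    if not isinstance(tensor, list):
        return [tensor]  # a single number: wrap it so it can be concatenated
    flat = []
    for item in tensor:
        flat.extend(flatten(item))
    return flat


# Hypothetical 2x2x3 "feature map": 2 rows, 2 columns, 3 channels.
features = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

flat = flatten(features)
print(flat)       # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(flat))  # 12, which is 2 * 2 * 3
```

Every value survives and the order is predictable, but the list no longer records which numbers belonged to which row, column, or channel.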

For instance, you can pass an image through a series of dense or convolutional layers to extract image features, then flatten the outputs of those operations to represent them in one dimension. Every image that passes through the network ends up represented by the same number of values, often called codings or encodings.
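As a rough illustration of "the same number of values per image", the sketch below flattens a batch of feature maps with NumPy. The library choice, the batch size, and the 7x7x16 feature-map shape are all assumptions made for the example; random numbers stand in for the output of real convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical batch: 4 images, each already reduced to a 7x7 map with 16 channels.
feature_maps = rng.normal(size=(4, 7, 7, 16))

# Keep the batch axis and collapse everything else into a single axis.
codings = feature_maps.reshape(feature_maps.shape[0], -1)

print(codings.shape)  # (4, 784): one row of 7 * 7 * 16 values per image
```

The batch axis is left alone on purpose: flattening is applied per image, so each image gets its own row of codings.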

In the case of autoencoders, you would train an encoder to encode images into 1D codings, which amounts to training a feature extractor. You would then train a decoder to decode those codings back into the original images, effectively constructing an image generator. This workflow has proved to work well for noise reduction and super-resolution tasks, but not for data compression.
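The shapes involved in that encode-then-decode round trip can be sketched as follows. This is not a trained autoencoder: a single untrained linear layer stands in for each half, and the image size, coding size, and the names `encode`, `decode`, `w_enc`, and `w_dec` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

IMAGE_SHAPE = (28, 28)  # hypothetical grayscale image size
CODING_SIZE = 32        # hypothetical length of the 1D coding

n_pixels = IMAGE_SHAPE[0] * IMAGE_SHAPE[1]

# Untrained, randomly initialised weights. Training would adjust them so that
# decode(encode(image)) approximates the original image.
w_enc = rng.normal(scale=0.01, size=(n_pixels, CODING_SIZE))
w_dec = rng.normal(scale=0.01, size=(CODING_SIZE, n_pixels))


def encode(image):
    """Flatten the image and project it down to a 1D coding."""
    return image.reshape(-1) @ w_enc


def decode(coding):
    """Project the coding back up and restore the image shape."""
    return (coding @ w_dec).reshape(IMAGE_SHAPE)


image = rng.random(IMAGE_SHAPE)
coding = encode(image)
reconstruction = decode(coding)

print(coding.shape)          # (32,)
print(reconstruction.shape)  # (28, 28)

# Mean squared reconstruction error: the quantity training would try to minimise.
mse = np.mean((image - reconstruction) ** 2)
```

With random weights the reconstruction is meaningless; the point is only that the coding is one-dimensional and much smaller than the image, and that the decoder has to undo the flattening to get an image back.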