>> The key takeaway? Depth is important for certain functions.
I thought the key takeaway is that nonlinearity is important. Multilayer perceptrons can be collapsed into an equivalent single layer if there's no nonlinearity thrown in. Multi-input XOR can be solved in a single layer using a form of sin(x) function after summing (sin squared if you don't like negative numbers).
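A minimal sketch of both points (my own toy code, not from the article): two stacked linear maps collapse into one, and a single sin² unit over a plain sum computes n-input XOR, since parity of bits is just (sum mod 2) and sin²(π·s/2) is 1 for odd s, 0 for even s.

```python
import numpy as np

# Point 1: without a nonlinearity, two layers collapse into one.
W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[0.5, -1.0], [2.0, 0.0]])
x = np.array([1.0, -2.0])
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)  # same function, one matrix

# Point 2: multi-input XOR (parity) with a single sin^2 "neuron".
def parity_sin2(bits):
    s = np.sum(bits)
    # sin^2(pi/2 * s) = 1 when s is odd, 0 when s is even
    return int(round(np.sin(np.pi / 2 * s) ** 2))
```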
Upon reading this I also kept thinking "why not use the same transformer weights on each layer?" and then the author goes there with the Universal Transformer concept. How many iterations should one use? Why should that be a question at all? As humans we might need to "think about it for a while" and get back to you when we figure it out. One thing all this AI research keeps ignoring is that real biological neurons run in real time, not as some batch process. We might also be doing training and inference at the same time, but I'm not certain how much - see dreams and other forms of background or delayed processing.
> I thought the key takeaway is that nonlinearity is important. Multilayer perceptrons can be collapsed into an equivalent single layer if there's no nonlinearity thrown in. Multi-input XOR can be solved in a single layer using a form of sin(x) function after summing (sin squared if you don't like negative numbers).
More broadly, as I've alluded to in other replies, XOR is not the problem in itself, but illustrates a bigger problem.
Yes, non-linearities can solve problems, but one hidden layer with the wrong non-linearity (sigmoid instead of sin, in this case) may not. Which leads us to the question: what do we include in the broad set we label "non-linearities"? All Turing-complete functions? (This issue is alluded to here: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00493...)
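To make the "wrong non-linearity" point concrete (my own illustration): a single unit applying sigmoid to the sum of the inputs is monotone in that sum, so no threshold on its output can separate {01, 10} from {00, 11}; the same unit with sin² of the sum is exactly XOR.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor = np.array([0, 1, 1, 0])

s = X.sum(axis=1)                  # sums: 0, 1, 1, 2
sig_out = sigmoid(s)               # monotone in s: the inputs summing to 0
                                   # and 2 straddle those summing to 1, so no
                                   # single cutoff recovers XOR
sin_out = np.sin(np.pi / 2 * s) ** 2  # 0, 1, 1, 0 -- XOR directly
```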
Then we're back to analysing the computability of a given non-linearity, which in itself has multiple steps of iteration. It's turtles all the way down.