What's a Good Neural Network?

A question answered on a forum.

It depends on the problem you are solving. First of all, neural networks (NNs) are not always the best solution; quite often simpler machine learning methods work better, so you should always try those first, especially for structured (tabular) data. Now, back to your question. It depends on:
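
As a concrete illustration of "try simpler models first", here is a minimal sketch using scikit-learn on a synthetic tabular dataset; the dataset, the two baseline models, and the hyperparameters are just placeholders for whatever your actual problem looks like:

```python
# Minimal sketch: benchmark simple baselines before building a neural network.
# The synthetic dataset stands in for your own structured/tabular data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

baselines = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Only reach for a neural network if these numbers are clearly not good
# enough for your application.
```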

  1. the type of data you are working with
    • If you are doing image classification, for example, you would use a different neural architecture than if you were doing text classification
  1. the amount of data you have
    • usually, if you have more data, deeper and wider NNs work better; if you don't have a lot of data, you should probably stick to smaller NNs (they are less likely to overfit)
  1. the difficulty of the problem you are solving
    • for example, handwritten digit classification is an easier task than handwritten text transcription, which in turn is easier than predicting someone's personality from their handwriting. You can get really good results on handwritten digit classification with only a few small layers (see http://yann.lecun.com/exdb/mnist/ for example; a small-network sketch also follows this list)
    • Usually you would use deeper and wider NNs for more difficult tasks (e.g. a bigger NN for handwritten text transcription than for handwritten digit classification)
  1. the domain of the problem
    • this ties into the previous point. For example, predicting a disease from a person's information is harder than predicting which university someone goes to based on their high school grades and location.
  1. the amount of resources (time and computing power) you have
    • if you are training on a machine with no GPU, 16 GB of RAM, and 8 cores, it will probably take ages to train a very large model (e.g. one with millions or billions of parameters), so in that case you probably want a smaller NN; whereas if you are Google and can run your model on thousands of GPUs, you can train much bigger models. A quick way to estimate a model's size before training it is sketched after this list
  1. I'm sure there are more factors that don't come to mind right now, but these are the most important ones I could think of
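
To make the handwritten-digit point above concrete, here is a minimal sketch of a small fully connected network for MNIST-style digit classification (Keras here is just one option; the layer sizes and number of epochs are arbitrary illustrative choices, not a recommendation):

```python
# Minimal sketch: a small fully connected network is already enough for
# handwritten digit classification (MNIST). Sizes/epochs are placeholders.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test, verbose=0))  # [test loss, test accuracy]
```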
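
For the resources point, a quick sanity check before training anything is to count the model's parameters and estimate its memory footprint. The layer widths below are made-up numbers purely to show the calculation:

```python
# Minimal sketch: estimate a model's size before committing to training it.
# The layer widths are arbitrary; swap in your own architecture.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

n_params = model.count_params()
print(f"{n_params:,} parameters")
# Weights alone take ~4 bytes per parameter in float32; training usually needs
# several times this for activations, gradients, and optimizer state.
print(f"~{n_params * 4 / 1e6:.0f} MB of float32 weights")
```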

It is true that, in theory, given unlimited data you can get better performance by increasing the size of your model. But you never have unlimited data, and you don't have unlimited compute power or time either. So larger models are NOT always better.

If you want to design a NN for a problem, you should look at the existing literature to see what kinds of architectures work best. For example, if you are working with images, you will usually get better performance from convolutional layers than from dense layers, even though the convolutional model often has far fewer parameters.
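
As a rough illustration of that size difference, the sketch below compares the parameter counts of a dense-only model and a small CNN on 28x28 grayscale inputs. The layer sizes are arbitrary and only meant to show how to do the comparison:

```python
# Minimal sketch: compare parameter counts of a dense-only model and a small
# CNN on 28x28 grayscale inputs. Layer sizes are arbitrary illustrations.
import tensorflow as tf

dense_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

conv_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

print("dense-only parameters:", dense_model.count_params())  # about 407k
print("small CNN parameters: ", conv_model.count_params())   # about 13k
```

The convolutional model is much smaller because it shares weights across the image; whether it actually performs better on your particular data is still something you have to verify empirically.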

There is no one-size-fits-all solution, and a lot of researchers say that "designing NNs is an art, not a science". But there are certainly methods that work better in certain situations, so I guess it is slowly becoming more of a "science". Finding these "good" methods is up to you as the designer of the NN. You need to look at the existing literature to get a feel for what type and size of layers work best in your scenario. Think of it like proving a theorem in math: given a hypothesis, how do you prove or disprove it? Usually, if you have seen theorems similar to the one you are trying to prove, you can reuse similar tricks to get to your result. Designing NNs is very similar: you need to see a lot of examples (and do a lot of research) to get a sense of what works and what doesn't.