AI, Artificial Recognition and Parameterized Generation

The writer of this piece finished a degree in AI at Utrecht University, worked at the UvA building neural network simulations of the mechanism of emotions in humans, and keeps up to date with much of AI’s current methods.

The AI field suffers from the same weakness as every other field with high “magic” content: overpromising and exaggeration of what is achieved. When “magic” plays a role in a field, meaning when it is hard for non-experts to understand what is being claimed, some ‘experts’ will start to exploit this space, usually to increase revenue or draw attention. AI today is really not close to what I would call human intelligence, and that is not because of a lack of depth of knowledge, but because of the basic mechanism by which ‘AI’ is achieved.

I would like to split up the idea of AI so that it becomes clearer what we are talking about. Some will argue that I am guilty of moving the goalposts, as has happened a lot in the history of AI development. Every time some new milestone was achieved (like a chess match being won) some would say “But this is not real AI!” and the field would have to start over. This is not a critique like that. I argue that real AI has certain qualities that are not even approximated by current algorithms, and only once they are can we speak of real AI.

The hype of today in AI is ‘Deep Learning’. This is the field of training a newer type of neural network on datasets, often built from so-called Long Short-Term Memory (LSTM) nodes. This kind of system does two things: recognizing and generating. When used as a recognizer, the LSTM network learns, through many small adaptions based on data, what output to generate in response to certain input. The nice thing about these networks is that you can provide any input and output, even with an unknown correlation, and as long as there is some consistency between the two the network will capture it eventually. This I would call Artificial Recognition (AR, though sadly that clashes a bit with Augmented Reality).
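
To make the recognizer idea concrete, here is a minimal sketch in Python of that loop of small adaptions, assuming PyTorch as the framework and purely synthetic data; the network sizes and names are my own choices for illustration, not a prescription.

```python
# Minimal sketch of an LSTM recognizer: repeated small weight adaptions
# nudge the network toward producing the right output for each input.
# PyTorch is an assumed framework here; the data is synthetic.
import torch
import torch.nn as nn

class Recognizer(nn.Module):
    def __init__(self, n_features, n_classes, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        _, (h, _) = self.lstm(x)          # h: final hidden state
        return self.head(h[-1])           # class scores

model = Recognizer(n_features=8, n_classes=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 20, 8)                # 64 sequences, 20 steps each
y = torch.randint(0, 3, (64,))            # arbitrary labels to be captured

for step in range(200):                   # many small adaptions
    opt.zero_grad()
    loss = loss_fn(model(x), y)           # how wrong is the current output?
    loss.backward()                       # which direction should the weights shift?
    opt.step()                            # take a tiny step in that direction
```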

The field of Artificial Recognition is highly dynamic, and new methods are developed and published continuously. Another process that I would separate out from AI/Deep Learning is Parameterized Generation. An example: an LSTM network is built that takes in a video feed and sharpens the image. This is done by first providing a video feed that has been blurred as the input and the original feed as the target output. The network will first compress the data in the video feed to a minimal set of values (how minimal depends on the image quality you wish to achieve) and then translate that minimal set back into an image. Another example is increasing the lighting in images. This is done by providing images that have been artificially dimmed as input and the undimmed images as output. The same compression and expansion is learned by the network.
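
As a sketch of that degrade-as-input, original-as-target scheme, the snippet below trains a small compress-and-expand network on artificially blurred frames. The convolutional architecture and the average-pool blur are assumptions made for illustration, not the method any particular sharpening product uses.

```python
# Sketch of the degrade-as-input, original-as-target training scheme:
# the network compresses each frame to a small code and expands it back,
# learning to undo the degradation in the process. Architecture and blur
# are illustrative assumptions, not a recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressExpand(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # image -> minimal set of values
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(          # minimal set -> image again
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = CompressExpand()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

sharp = torch.rand(16, 1, 64, 64)              # stand-in for real frames
blurred = F.avg_pool2d(sharp, 5, stride=1, padding=2)   # degraded input

for step in range(100):
    opt.zero_grad()
    loss = F.mse_loss(model(blurred), sharp)   # target is the original frame
    loss.backward()
    opt.step()
```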

The expansion part, from a minimal set of values to an image, is parameterized generation. If you change the start values you can generate a variety of images. Of course they will all lie in the trained image space, so if your training images were cats, you will generate all kinds of weird cats by varying the start values. It is in principle possible to make a system that can generate every image of a certain size, but it would be extremely large and hard to train.
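
The expansion half on its own looks roughly like this: a decoder takes a small set of start values and expands them to an image, so varying those values varies the output. The decoder below is tiny and untrained, purely to show the shape of the idea; in practice it would be the trained expansion half of a network like the one above.

```python
# Sketch of parameterized generation: the expansion half of a network maps
# a small set of start values to a full image. Varying the start values
# varies the output, but only inside the trained image space. The decoder
# weights here are untrained stand-ins for illustration.
import torch
import torch.nn as nn

decoder = nn.Sequential(                        # minimal set -> image
    nn.Linear(8, 256), nn.ReLU(),
    nn.Linear(256, 64 * 64), nn.Sigmoid(),
)

with torch.no_grad():
    for i in range(5):
        start_values = torch.randn(1, 8)        # a different parameter set each time
        image = decoder(start_values).reshape(64, 64)
        print(f"generated image {i}: mean intensity {image.mean().item():.3f}")
```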

Another type of Parameterized Generation occurs in 3D modelling, where an avatar or human figure is not specified completely, but simply has parameters like arm length, waist, head size, etc. Most ‘Deepfake’ videos use a network that has internalized a parametric model of a face or body, and the challenge becomes finding the parameters that make the body image match a video feed, which is a much smaller challenge than learning how the pixels correlate. Neural networks, LSTMs and Deep Learning algorithms can do this. The variety of examples is similar to what is possible with binary representation in computers: if you ask what you can represent with bits of zero and one, the answer is “Everything”.
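
A toy sketch of that parameter-fitting step: instead of learning how millions of pixels correlate, we keep a fixed parametric ‘renderer’ and search for the handful of parameters whose rendering matches a target frame. The linear renderer below is an invented stand-in, not a real face or body model.

```python
# Toy sketch of fitting a parametric model to a target frame: we optimize
# a handful of parameters (think pose, shape, arm length) so the model's
# rendering matches the frame. The "renderer" is a fixed random linear map,
# purely a stand-in for a real parametric face or body model.
import torch

torch.manual_seed(0)
render_matrix = torch.randn(64 * 64, 10)        # stand-in parametric model

def render(params):                             # 10 parameters -> 64x64 image
    return torch.sigmoid(render_matrix @ params)

true_params = torch.randn(10)                   # the frame we want to match
target_frame = render(true_params)

params = torch.zeros(10, requires_grad=True)    # initial guess
opt = torch.optim.Adam([params], lr=0.05)

for step in range(300):
    opt.zero_grad()
    loss = ((render(params) - target_frame) ** 2).mean()
    loss.backward()
    opt.step()

print("final match error:", float(loss))
```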

But Artificial Recognition and Parameterized Generation are not Artificial Intelligence. The mistake made is to reason that, because arriving at such results would require smarts in a human, where a process of working towards a correct output involves rejecting false outcomes, checking and so on, the algorithm must be smart too. That is what our mind projects onto the algorithms in order to understand them, and as a description of the process it is accurate enough, but the algorithm itself is not intelligent. It is adaptive, to a level of detail we can barely comprehend (and some say we should not try to).

What is real AI then? Real AI involves action. The ‘dataset’ is not static; it is generated by the AI through its output, its movements. A real AI has internal drives, so it does not stop or start like any tool humans use. It certainly uses recognition and generation mechanisms, but not of the kind Deep Learning relies on; those are too inflexible. Real AI gets angry. I wrote about this before: a real AI system will have to push ahead with an action in spite of not having enough information to know it is safe. It will push ahead with an action when it believes it is safe, and it can be wrong. Humans use a lot of energy to avoid harm to others and to themselves; the AI will have to be able to do the same. So you can define a task like “Spot the tank in the field”, use Artificial Recognition, and use the output to direct an A-10 Warthog into action. You cannot let the Warthog decide for itself where to fly and where to find tanks. That becomes way too complicated super fast, and the software needed to make it work would be slow and unreliable.

Now it is possible to do small tasks and task repertoires with less ‘insight’, and not surprisingly this is how our society is organized. After all, if you talk about real AI you talk about what humans do. The capacity of a brain is limited, and so is its learning capacity. Evolutionary mechanisms cause it to learn less quickly as you age. A real AI can at first only be allowed to work in a confined space on a specific task, or it has to be physically very weak.

A striking feature of our memory is that one moment can be recorded in a split second, an impressive experience, an accident or a moment of recognition, and then be available to us for a lifetime. This feature was one of the real brain-crackers for me when I was studying the brain. How is that even possible? LSTM networks can’t do it, and they also cannot progress towards algorithms that can, because key aspects of how they work prevent it. From recent research I would claim that theoretically we are not far off, but even then the hardware is going to be a challenge. For now we may have to accept that systems of recognizers, parameterized generators and ordinary code are the best we can do.