Deep learning methods lose their ability to learn over time
Researchers at the University of Alberta have confirmed that deep learning systems lose their ability to learn over time — and they’ve proposed a solution to fully harness the power of this form of artificial intelligence.
“It’s this understudied phenomenon observed in artificial neural networks that when you train them for a really long time, they start losing their ability to learn,” says J. Fernando Hernandez-Garcia, a PhD student in the Department of Computing Science and co-author of the recent study published in Nature.
“We were the first to show a demonstration of that phenomenon and explain why it is a big deal for our current systems.”
Shibhansh Dohare, first author on the paper and computing science PhD student, explains that plasticity — the ability of these systems to learn new things — hinges on the vast sets of neurons used to train them. To maintain plasticity in deep learning, an algorithm that constantly introduces new, diverse neurons is needed.
Teaching old algorithms new tricks
That’s precisely what the U of A team created with continual backpropagation, a newly designed algorithm that improves on backpropagation, the algorithm that has been used for decades to train these types of systems.
“Our algorithm seems to be very effective at preventing the problem of plasticity loss,” says Hernandez-Garcia.
Continual backpropagation works by going through the artificial neurons in the network and assessing them based on how useful they are. It then reinitializes the neurons it has ranked as least useful, essentially resetting those neurons' connections to the rest of the network and restoring the original level of plasticity.
“The basic idea is that in the beginning, when it was learning one or two tasks, the network had plasticity. But then it’s lost over time. That means the initial pattern of connections was useful, it was enabling plasticity. Reinitializing brings it back,” says Dohare.
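In rough terms, the idea looks something like the sketch below: keep a running estimate of how useful each hidden neuron is, and every so often give the least useful ones a fresh start. This is a minimal illustration only; the variable names, the utility measure and the replacement rate are assumptions made for the example, not the authors' published code.

```python
# A minimal sketch of the selective-reinitialization idea behind continual
# backpropagation, for a single hidden layer. The utility formula and the
# replacement rate below are illustrative assumptions, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 8, 16, 2
W_in = rng.normal(0, 0.1, (n_in, n_hidden))    # incoming connection weights
W_out = rng.normal(0, 0.1, (n_hidden, n_out))  # outgoing connection weights
utility = np.zeros(n_hidden)                   # running usefulness per neuron
replacement_rate = 1 / n_hidden                # fraction of neurons reset each step

def reinit_least_useful(x):
    """After an ordinary backpropagation update, refresh the least useful neurons."""
    global utility
    h = np.maximum(0.0, x @ W_in)              # ReLU hidden activations
    # A simple proxy for usefulness: how much each neuron contributes downstream.
    contribution = np.abs(h) * np.abs(W_out).sum(axis=1)
    utility = 0.99 * utility + 0.01 * contribution

    n_reset = max(1, int(replacement_rate * n_hidden))
    worst = np.argsort(utility)[:n_reset]      # lowest-utility neurons
    # Reset them: fresh random incoming weights restore plasticity, while
    # zeroed outgoing weights keep the reset from disturbing current predictions.
    W_in[:, worst] = rng.normal(0, 0.1, (n_in, len(worst)))
    W_out[worst, :] = 0.0
    utility[worst] = 0.0

reinit_least_useful(rng.normal(size=n_in))
```

Because only a small fraction of low-utility neurons is touched at a time, the network keeps what it has already learned while regaining the fresh, random connections that made learning possible in the first place.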
“A similar idea — neurogenesis — is already observed in neuroscience, in human brains and other animal brains, but it’s a mechanism that wasn’t being used in deep learning,” adds Rupam Mahmood, assistant professor in the Department of Computing Science and Canada CIFAR AI Chair at Amii.
The connections between the neurons are represented by something called connection weights, which function like synapses do in biological neural networks. Different connection weights signal different connection strengths.
“The algorithms change the strength of those connections, and that’s how learning happens in these networks,” says Dohare.
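To make that concrete, here is a minimal, hypothetical example of what "changing the strength of connections" means in practice: a single gradient-descent step on one artificial neuron. The numbers and learning rate are made up for illustration.

```python
# One learning step for a single linear neuron: the weights (connection
# strengths) are nudged so the neuron's output moves closer to the target.
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # inputs arriving at the neuron
w = np.array([0.1, 0.3, -0.2])   # connection weights (the "synapses")
target = 1.0
learning_rate = 0.1

prediction = w @ x                        # weighted sum of the inputs
error = prediction - target
w = w - learning_rate * error * x         # adjust each weight to reduce the error

print(w)  # the connection strengths have changed; that change is the learning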
Plugging a leak in learning ability
It is challenging to obtain thorough evidence of plasticity loss because, as Dohare explains, “you have to run these experiments for a long, long time and it requires a lot of computational power.” With this study, the researchers showed that the system’s performance on later tasks was noticeably worse than on earlier tasks, confirming the loss of plasticity when using backpropagation.
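The shape of such an experiment can be sketched in a few lines: train the same network with plain backpropagation on a long sequence of fresh tasks and record how well it does on each one. The toy task generator, network size and hyperparameters below are illustrative assumptions, not the study's actual setup, which uses far longer runs and larger networks.

```python
# A toy version of a long-horizon continual-learning experiment: ordinary
# gradient descent, no resets, a new random target function for every task.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 10, 20
W1 = rng.normal(0, 0.5, (n_in, n_hidden))
w2 = rng.normal(0, 0.5, n_hidden)

final_errors = []
for task_id in range(200):                    # a (short) sequence of tasks
    teacher = rng.normal(0, 1.0, n_in)        # each task: a new target function
    for step in range(500):                   # standard backpropagation updates
        x = rng.normal(size=n_in)
        y = teacher @ x
        h = np.maximum(0.0, x @ W1)
        pred = h @ w2
        err = pred - y
        grad_w2 = err * h                               # gradient of squared error
        grad_W1 = np.outer(x, err * w2 * (h > 0))
        w2 -= 0.01 * grad_w2
        W1 -= 0.01 * grad_W1
    final_errors.append(err ** 2)

# Loss of plasticity would show up as later tasks ending with higher error
# than earlier ones.
print(np.mean(final_errors[:20]), np.mean(final_errors[-20:]))
```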
“It’s naturally expected of automated systems that they’re able to learn continuously,” says Mahmood. “It’s quite important for the world to know that the predominant method used in deep learning, backpropagation, is incapable of doing that. We should worry about that.”
A familiar example of deep learning is ChatGPT, which is trained on particular data sets and gains a great deal of knowledge about those topics. But if you ask it about something that happened after that training, such as a current news event, it won't be able to provide accurate information. "That's because it's not continually being trained on the new data that comes in," says Hernandez-Garcia.
“Current deep learning methods are not actually learning when we are interacting with the system — they’re frozen in time,” says Mahmood.
Without the ability to learn continually, deep learning systems would need to be retrained whenever there’s a substantial amount of new data. This requires an immense amount of computational power and comes at an astronomical cost. That’s why the solution lies in new algorithms that help ensure the network retains its learning capacity over time, says Mahmood.
“Our work is sort of a testing ground for what we are expecting in the future, which is deep learning systems being employed in the real world and learning continually.”