Jane Wang on the Learning Salon. Some of her interesting works:
- Meta-learning in natural and artificial intelligence
- PFC as a meta-RL system
- TL;DR: Dopamine modulates PFC, which learns task structure.
Meta-learning as priors learned from previous experience that help inform faster learning or better decisions. @7m24s Also, from her overview of meta-learning:
After being trained over a distribution of tasks, the agent is able to solve a new task by developing a new RL algorithm within its internal activity dynamics.
Meta-learning in ML @9m50s
- Gradient-based, e.g. Model-Agnostic Meta-Learning (MAML); toy sketch after this list. Also cf. Tom Zahavy’s work at DeepMind and Robert Lange’s overview.
- Non-parametric, e.g. Matching Networks (Vinyals et al.)
- Blackbox / memory-based (an LSTM whose activity dynamics implement the inner learner), e.g. Jane’s work on PFC as a meta-RL system
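A minimal sketch of the gradient-based flavor, using the first-order MAML approximation on a toy family of 1-D linear-regression tasks (the task family, model, and learning rates are my own choices, not anything from the talk):

```python
# Toy first-order MAML sketch (my own illustration; not Jane's or DeepMind's code).
# Task distribution: 1-D linear regressions y = a*x + b with (a, b) drawn per task.
# The meta-learner finds an initialization theta that does well after ONE
# inner-loop gradient step on each new task.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    a, b = rng.uniform(-2, 2), rng.uniform(-1, 1)
    x = rng.uniform(-1, 1, size=10)
    return x, a * x + b                      # (inputs, targets) for one task

def loss_and_grad(params, x, y):
    w, c = params
    err = w * x + c - y
    return np.mean(err ** 2), np.array([np.mean(2 * err * x), np.mean(2 * err)])

theta = np.zeros(2)                          # meta-learned initialization
alpha, beta = 0.1, 0.01                      # inner / outer learning rates

for step in range(2000):
    outer_grad = np.zeros(2)
    for _ in range(8):                       # meta-batch of tasks
        x, y = sample_task()
        xs, ys, xq, yq = x[:5], y[:5], x[5:], y[5:]   # support / query split
        _, g = loss_and_grad(theta, xs, ys)
        adapted = theta - alpha * g          # inner loop: one gradient step
        _, g_query = loss_and_grad(adapted, xq, yq)
        outer_grad += g_query                # first-order MAML: drop second-order terms
    theta -= beta * outer_grad / 8           # outer loop: move the initialization
```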
Memory-based Learning to Reinforcement Learn (L2RL). @11m55s Concept of having two loops: a slow outer loop and a fast inner loop. Harlow task that Botvinick also likes to discuss, @13m14s. The monkey is able to “one-shot” pick the correct object after initial structure learning. They applied this to RL agents (freezing the network weights after training). @15m22s [Curious: did anyone try to do Harlow’s experiments with more complicated tasks? E.g. every 2nd time the correct object is green or something. I guess that’d be testing the monkey’s intelligence more so than its ability to meta-learn.]
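To make the Harlow setup concrete, here is a toy version of one Harlow “problem” as an environment (my own sketch, not the DeepMind task): two novel objects per episode, one consistently rewarded, left/right placement shuffled every trial. The meta-RL claim is that an LSTM agent trained across many such episodes (outer loop = weights, inner loop = recurrent state) picks the rewarded object from the second trial onward, even with the weights frozen.

```python
# Toy Harlow-style "learning set" episode (my own sketch, not the DeepMind environment).
import random

class HarlowEpisode:
    """Two novel objects per episode; one is rewarded on every trial."""

    def __init__(self, n_trials=6):
        self.n_trials = n_trials
        self.objects = (random.random(), random.random())  # stand-ins for novel object ids
        self.rewarded = random.randrange(2)                 # index of the rewarded object
        self.t = 0

    def reset(self):
        self.t = 0
        return self._observe()

    def _observe(self):
        self.order = [0, 1]                 # left/right placement re-shuffled each trial
        random.shuffle(self.order)
        return tuple(self.objects[i] for i in self.order)

    def step(self, action):                 # action: 0 = pick left, 1 = pick right
        reward = 1.0 if self.order[action] == self.rewarded else 0.0
        self.t += 1
        done = self.t >= self.n_trials
        return (None if done else self._observe()), reward, done
```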
Neurosci: @20m3s
- Innate (priors): place cells, intuitive physics, motoric, language propensity
- Learned behavior: can arise out of innate processes
Schema learning: knowledge is more easily acquired if you already have a framework on which to scaffold that knowledge. Schemas and memory consolidation (2007). No free lunch: inductive biases help with faster learning only in structured environments, where you can make assumptions about future problems.
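A toy illustration of that no-free-lunch point (the environments and priors here are invented by me, not from the talk): each learner’s inductive bias only pays off in the environment whose structure matches it.

```python
# Toy no-free-lunch illustration: a prior helps with few samples only when the
# environment actually has the assumed structure.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1, 1, 8))          # only 8 training points
x_test = np.linspace(-1, 1, 200)

def fit_linear(x, y):                             # prior: "the world is linear"
    coef = np.polyfit(x, y, 1)
    return lambda q: np.polyval(coef, q)

def fit_sine(x, y):                               # prior: "the world oscillates at freq 7"
    A = np.stack([np.sin(7 * x), np.cos(7 * x)], axis=1)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda q: w[0] * np.sin(7 * q) + w[1] * np.cos(7 * q)

environments = {"linear world": lambda q: 2 * q + 1,
                "oscillating world": lambda q: np.sin(7 * q)}

for name, truth in environments.items():
    y_train = truth(x_train) + 0.05 * rng.normal(size=x_train.shape)
    for learner in (fit_linear, fit_sine):
        f = learner(x_train, y_train)
        print(name, learner.__name__, round(np.mean((f(x_test) - truth(x_test)) ** 2), 3))
```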
Open questions in meta-learning:
- Which inductive biases, and where do they come from?
- Do we need to learn them only once?
- Does having memory automatically result in meta-learning?
- Out-of-distribution (OOD) generalization.
Discussion
Is “DeepMind + their algorithm” the real meta-learning?
Jane: meta-learning is easier to evaluate than continual learning.
Ida: There is a clear mathematical notion of meta-learning; the neuroscience/psychology notion of meta-learning is more relaxed. @47m58s There’s no single psychological function in humans corresponding to meta-learning. @1h47m43s
- ML algorithms learning from other ML algorithms.
- Stacking / ensemble learning (see the sketch after this list).
- Multi-task learning.
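The stacking bullet above maps directly onto existing library support; a quick sketch with scikit-learn (the dataset and estimators are arbitrary choices of mine), where a logistic-regression “meta” model is trained on the outputs of two base learners:

```python
# Stacking: a meta-level learner trained on the predictions of base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)), ("svm", SVC())],
    final_estimator=LogisticRegression(),    # the "meta" level
)
print(stack.fit(X[:400], y[:400]).score(X[400:], y[400:]))
```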
John: a lot of phenomena that look like meta-learning are just retrieval. @50m19s It is hard to disambiguate retrieval of previous experience from genuine meta-learning. @57m8s [I guess he’s talking about a person recognizing “I’ve already done this task”.] There is a single “impulse” where agents “learn” the task. @53m34s
John: What is a task? Is there a hidden shared structure? @58m20s How do we get a generalization-free notion of “task”? Jane: meta-learning benchmarks have notorious weaknesses, and there is no good theory of how related two tasks are. Let’s instead say “everything is a task distribution”. [That doesn’t actually help that much though?] Characterize learning instead of focusing on pure metrics. @1h35m36s E.g. we haven’t yet characterized GPT-3. Task as a “set of parametrizations of an environment” @1h51m25s, a bit like what DeepMind is currently doing in their Open-Ended Learning Team.
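One way to make “everything is a task distribution” concrete (a minimal reading of my own, loosely in the spirit of parametrized environments): a task is a single parameter setting of an environment family, and a task distribution is a distribution over those parameters.

```python
# A "task" as one parametrization of a shared environment family (my own toy sketch).
from dataclasses import dataclass
import random

@dataclass
class BanditTask:                  # one parameter setting = one task
    arm_probs: tuple               # reward probability of each arm

def sample_task(n_arms=2):         # the task distribution = distribution over parameters
    return BanditTask(arm_probs=tuple(random.random() for _ in range(n_arms)))

def step(task, action):            # the shared environment dynamics
    return 1.0 if random.random() < task.arm_probs[action] else 0.0
```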
John: on coaching. @1h1m23s Do good teachers “project” meta-learning? [I think it’s about a particular way of presenting knowledge. Good teachers capture a good way to model the distribution of learners.]
Ida: Sakai and Passingham discuss task “domains” and “rules.” @1h5m35s E.g. the domain might be numbers, but the rule could be responding to the font color or to the actual quantity. The descriptive vocabulary in cognitive science is much richer than what exists in ML. We should enrich how tasks and distributions of tasks are defined.
Rules for generalizing? @1h9m3s Jane: When you meta-learn something, you narrow down on a set of hypotheses.
Meta-learning works under the assumption that tasks share underlying structure. @1h12m50s For unrelated tasks, you still need a lot of data. Ida: transfer learning and meta-learning are not the same; the job of the meta-learner is different. The goal of the meta-learner is to set the parameters of your base learner.
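A minimal sketch of that last sentence (the base learner, candidate values, and task family are all made up by me): the meta-learner’s output is a parameter of the base learner, here just its step size, scored by how well the base learner then does on fresh tasks from the distribution.

```python
# The meta-learner sets a parameter of the base learner, evaluated across tasks.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    a = rng.uniform(-2, 2)
    x = rng.uniform(-1, 1, 20)
    return x, a * x                          # 1-D linear task y = a*x

def base_learner(x, y, lr, steps=10):        # plain SGD on a 1-parameter model
    w = 0.0
    for _ in range(steps):
        w -= lr * np.mean(2 * (w * x - y) * x)
    return np.mean((w * x - y) ** 2)         # final loss on this task

def meta_learner(candidate_lrs, n_tasks=50):
    scores = {lr: np.mean([base_learner(*sample_task(), lr) for _ in range(n_tasks)])
              for lr in candidate_lrs}
    return min(scores, key=scores.get)       # the meta-learner's output IS a base-learner param

print(meta_learner([0.01, 0.1, 1.0]))
```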
Question about evolution being a sort of meta-learning. @1h20m14s
John: Learning requires plasticity, but a plastic change doesn’t imply learning. Something needs to have physically changed for learning to happen. @1h26m12s
Melanie: meta-evolution (mutation has its own parameters). Genetically, they are represented in the same way (?). Are learning and meta-learning different mechanisms? @1h29m38s
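One common way to read the “mutation has its own parameters” point, as a hedged toy (a self-adaptive (1+1)-evolution strategy of my own, not anything Melanie described in detail): the mutation step size sits in the genome next to the solution and is changed by the same variation-and-selection machinery, so the learning parameters and the learned parameters are represented in the same way.

```python
# Self-adaptive (1+1)-ES: the mutation step size sigma is part of the genome.
import random, math

def fitness(x):
    return -sum(v * v for v in x)           # maximize => minimize sum of squares

x = [random.uniform(-5, 5) for _ in range(10)]
sigma = 1.0                                  # the "meta" gene: how much to mutate

for gen in range(2000):
    new_sigma = sigma * math.exp(0.2 * random.gauss(0, 1))    # mutate the mutation rate
    child = [v + new_sigma * random.gauss(0, 1) for v in x]   # mutate the solution
    if fitness(child) >= fitness(x):         # selection keeps the better individual
        x, sigma = child, new_sigma          # ...together with its sigma

print(fitness(x), sigma)
```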
John: no. @1h30m54s A cerebellar mechanism learns the task structure the first time. A basal-ganglia-dependent mechanism is responsible for retrieval. Patients with Alzheimer’s don’t perform better when repeating tasks compared to controls. Initial learning and subsequent meta-learning processes are dissociated.
Base and meta-learners could be drawn from the same set of learners, so long as the meta-learner has access to some task distribution as input. In the brain, no two species have an equally evolved PFC.
Automatic vs controlled processing. @1h40m47s How might it relate to psychological constructs? Can learning systems affect each other?
John: within-task vs. between-task learning. @1h43m44s Don’t try to play Rachmaninoff’s 3rd the first time you sit at a piano. Meta-learning seems to be morphing into “intelligence” in this salon. @1h57m29s “Going beyond what you’ve already learned”. [Yes, to some extent meta-learning is exactly intelligence.]