I've been thinking about how behavioral psychology might explain AI capabilities. Here's my working hypothesis:
I think emergence in LLMs comes from relational diversity. By relations I mean the act of verbally connecting things (stimuli, events, concepts, etc.) in some way; this typically takes forms like comparison or hierarchy, among many others.
Effectively, we can think of relations as a kind of graph, relating concepts to one another. Say you start with two concepts like "dog" and "cat". Starting off, you generally think "cat" "smaller than" "dog" (a physical, spatial relation). Add in "human" and you get "human" "cares for" "cat" and "human" "cares for" "dog". As the entities in this hypothetical network grow, so too does the number of relations. And one of the key "insights" in human language is that relations themselves have their own properties and relations (basically, relations have relations). Just because you can relate human / dog / cat with the "smaller than" relation in a transitive way doesn't mean that "cares for" is necessarily transitive. (I.e., if you add "doctor", "doctor cares for human" doesn't imply "doctor cares for cat".)
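To make that concrete, here's a toy sketch (my own illustration, not a formal model of how humans or LLMs store this) of a relational network where each relation type carries its own properties, such as whether it can be derived transitively:

```python
from itertools import product

# Directly "trained" relations: (subject, relation, object)
trained = {
    ("cat", "smaller_than", "dog"),
    ("dog", "smaller_than", "human"),
    ("human", "cares_for", "cat"),
    ("human", "cares_for", "dog"),
    ("doctor", "cares_for", "human"),
}

# The "relation about relations": which relation types support transitivity.
transitive_relations = {"smaller_than"}  # "cares_for" is deliberately excluded

def derive_transitive(facts):
    """Derive new relations, but only for relation types marked transitive."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b), (c, r2, d) in product(list(derived), repeat=2):
            if r1 == r2 and b == c and r1 in transitive_relations:
                new_fact = (a, r1, d)
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived - facts

print(derive_transitive(trained))
# -> {('cat', 'smaller_than', 'human')}
# Note: ('doctor', 'cares_for', 'cat') is NOT derived, because "cares_for"
# is not marked as transitive.
```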
So the diversity of learned relations grows the network, and the derived relations grow combinatorially: each new entity can, in principle, be related to every existing entity under every relation type, and each new direct relation composes with existing ones to yield further derived relations. At some point, once you feed enough of these relations into a neural network, we may reach a point of saturation, where the network has enough connections to behave in ways that are sufficiently impressive and practical to us.
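To put rough numbers on that intuition, here's the back-of-the-envelope combinatorics I have in mind (a toy count of possible relations, not a measurement of any real corpus; the entity and relation-type counts below are made up):

```python
# With N entities and R relation types, there are R * N * (N - 1) possible
# directed pairwise relations, and two-step chains over a single transitive
# relation type add up to N * (N - 1) * (N - 2) candidate derivations on top.

def possible_direct_relations(n_entities: int, n_relation_types: int) -> int:
    return n_relation_types * n_entities * (n_entities - 1)

def possible_two_step_derivations(n_entities: int) -> int:
    # Ordered triples (a, b, c) of distinct entities that could form a chain
    # a -> b -> c under one transitive relation type.
    return n_entities * (n_entities - 1) * (n_entities - 2)

for n in (3, 10, 100, 1000):
    print(n, possible_direct_relations(n, 5), possible_two_step_derivations(n))
# 3     30        6
# 10    450       720
# 100   49500     970200
# 1000  4995000   997002000
```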
My hypothesis is that corpora richer in relational diversity are one of the critical components of making LLMs smart. My other, more hopeful hypothesis is that LLMs also exhibit the same types of relational phenomena we find in humans, namely the kinds of derived relations described above.
From an early age, my father taught me about behavior analysis and how it could explain human behavior. So I believe it's an effective, if currently unorthodox, point of view to take on human behavior. I also think it's accurate and leads to insights we don't get from other theories.
And for this reason, I think it could be effective to apply to language models. One technique that has improved how LMs behave is reinforcement learning. And behavior analysis has at its core the concept of reinforcement and how it influences and changes behavior. It stands to reason that a theory around how reinforcement shapes and creates language could aid in understanding LMs.
My other hope is that understanding how concepts from RFT (Relational Frame Theory) apply and emerge in LMs could help us understand humans a bit better, and that the two could act as somewhat dual models, each influencing how we conceptually understand the other.
As for some examples of what I see in LMs and how RFT could help explain them: I think RFT could help in cases where LMs break down or fail to reason somehow. Basically, when an LLM fails to respond in ways we would hope it would (and a human would do just fine), tracing the error back to a failure to relate would give us a first step for "debugging" these cases.
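As a sketch of what that "debugging" loop might look like, here's a purely hypothetical probe: `query_model` is a stand-in for whatever interface you have to the model, and the yes/no prompt format is just one assumed way to test a derived relation.

```python
def query_model(question: str) -> str:
    # Placeholder: swap in a real LM call. The hard-coded "no" here simulates
    # a model that fails to derive the relation.
    return "no"

def check_derived_relations(derived_facts):
    """Return the derived relations the (hypothetical) model fails to endorse."""
    failures = []
    for (a, rel, b) in derived_facts:
        question = f"Is it true that {a} is {rel.replace('_', ' ')} {b}? Answer yes or no."
        answer = query_model(question).strip().lower()
        if not answer.startswith("yes"):
            failures.append((a, rel, b))
    return failures

# Using the toy network from earlier: if the model was exposed to
# "cat smaller_than dog" and "dog smaller_than human" but denies
# "cat smaller_than human", the failure is localized to a missing transitive
# derivation rather than to missing facts.
print(check_derived_relations([("cat", "smaller_than", "human")]))
# -> [('cat', 'smaller_than', 'human')]
```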
This is early thinking. I'll be exploring this over the coming months.