Operant Conditioning and the Traditional Trainer
Dear Clicker Friends,
April, 2006
When I was training and competing in obedience, back in the Pleistocene,
correction and praise were the only training tools. The choke chain was
not just allowed, it was expected in the ring and obligatory in training
class. I learned the proper way to fit the choke chain, so it tightened
quickly when you jerked and released instantly when you eased off. I
dutifully learned the correct way to yank my young Weimaraner.
Nevertheless, we both enjoyed the classes and we did well together in
competition. Gus's tail wagged the whole time he was working. Everyone
thought I had one of those highly unusual dogs, the "happy-working" dog
that respects the trainer but has not been cowed by the training.
(Professional trainers still look for those dogs, trying and discarding
dogs that are too "soft," until they get one that can "stand up to the
work." Until recently, a serious cost factor in the training of guide
dogs, patrol dogs, and other working dogs has been the flunk-out rate,
the percentage of dogs that "shut down," or "just don't work out," and
have to be sold or given away as pets despite months or years of
investment of time and money.)
Then I became a dolphin trainer, and learned to use operant conditioning
and positive reinforcement to teach behavior. In fact we called our
training "operant conditioning," and presented it to our audiences as
news: a different way to train. It no longer mattered what kind of
individual we were working with, or even what species. Fish or fowl,
wild or domestic, fierce or friendly, timid or bold, they all responded
beautifully to this kind of training.
Operant techniques based on positive reinforcement and the use of a
conditioned reinforcer (the clicker) as an event marker spread into the
dog world in the '90s. At first we continued to call what we did
"operant conditioning," as the distinguishing factor. Then more and more
traditional trainers began pointing out that any dog training that works
is actually operant conditioning. Traditional, or as they say in the
medical world, "standard-practice" trainers, with their correction-based
methods, are not using classical conditioning, instilling some
unconscious, automatic response such as drooling; they are teaching dogs
to carry out a suite of actions deliberately and consciously.
That's what Skinner's phrase, "operant conditioning," means: conscious,
purposeful learned behavior. Dolphins work to earn reinforcement.
Traditionally trained dogs work to avoid punishment. It's still operant
conditioning.
Correction-based operant conditioning
The instructors that Gus and I worked with in Hawaii in the early 1960s
were a cohesive group of colleagues. They were all Japanese-American
World War II veterans of the elite 442nd regiment and its K9 Corps. They
gave us two tools that greatly reduced the incidence of correction, and
that went a long way toward making all the dogs in all their classes
"happy-working" dogs. The tools were words, delivered in a calm voice
but with perfect timing. The instructors taught us how to use what
behaviorists would call a conditioned punisher, the word "no," meaning
"You will get jerked for that if you don't stop." They also taught
us to use a conditioned negative reinforcer, the word "good," meaning
"Yes, that's right, so you're safe now, I won't jerk the leash."
If the dog began to do something wrong (get a bit ahead of you while
heeling, say), instead of jerking him back with the leash you said "no."
That gave the dog a chance to correct himself. If he now fell back in
place, you said "good" and kept your hand quiet instead of jerking. The
crucial element was the timing: both words had to be delivered exactly
as the dog was making the move. Not afterwards, even by a split second:
during. That way the dog knew exactly what was wrong and what was right.
By my use of these two words Gus quickly learned how to fix a crooked
sit or a too-wide turn, and how to heel precisely at any speed. Even
when he wasn't correcting an error I had many opportunities to reinforce
his actions with the word "good" while they were happening. Often, on
hearing my word "good" as he made the correct move, he'd throw me a
glance and a grin and wag his abbreviated tail. He was not working for
my approval; I was working for his! As a result, by the time we went
into competition Gus not only worked with great elan and precision, he
worked with his tail going a mile a minute the whole time.
What's missing today?
As traditional training filtered down to new generations of instructors,
the conditioned punisher and the conditioned negative reinforcer seem to
have vanished. Nowadays, in my observation, most people just wham the
dog without warning, so the dog has to guess what, if anything, he was
doing wrong. It takes a lot of repetitions for the dog to sort that out.
And the dog has no opportunity to correct his error himself, and thus
learn what to do right and be reassured by the praise word. No wonder
obedience training seems to take much longer, now, than it did then.
Often an even more basic operant element is missing: the timing that
makes the correction contingent on the behavior. I see people at
ClickerExpo walking down the halls popping their dogs automatically at
every change of direction or speed, completely unrelated to what the dog
is doing. I see class instructors calling for leash corrections long
after the misbehavior, so the student ends up punishing something the
dog is actually doing right. No wonder behavior deteriorates and has to
be constantly "brushed up" and retrained. No wonder dogs shut down and
people drop out; it hardly seems worth it.
Correction and perfection: the operant way
So, this is an invitation to all you competitive NON-clicker trainers
out there. How about starting to use the principles of operant
conditioning correctly, even if there's not a clicker in sight?
For example: You are heeling, and your dog lags, or forges, or swings
wide on the turn. Instead of correcting with a well-timed leash pop, try
using a conditioned punisher, with the same timing, so the animal has a
chance to repair the mistake. (If lagging, forging, and so on are
long-standing problems, just at first you should pat your thigh to help
him find the spot where he should be.) Then, as he moves into the
correct placement, use the word "good" as a conditioned negative
reinforcer, to mark his move as it's occurring.
You don't need to play-act a lot of emotion: disapproval in the "no," or
feigned joy in the "good." The information is so valuable that the two
words will both quickly become something the animal works to avoid (the
"no," which is a warning) and works to gain (the "good," meaning from
the dog's standpoint "Whew, I'm out of trouble!"). Praise and play with
him after the session, if you want to. But while you're training, just
tell him what he needs to know. And watch his face and his tail; he'll
tell you when he approves of what you're doing.
Of course I hope this glimpse into the powers of operant conditioning
will lead people to pick up a clicker and explore the powers of positive
reinforcement. But until then, I'd love to see traditional trainers at
least taking advantage of the immutable laws of operant conditioning,
for faster learning, for better outcomes, and for the sake of the dogs.
Happy Clicking!
Karen Pryor