Cognitive Theory of Multimedia Instruction and the Modality Principle

shaffer

Personally, I have always been skeptical about using audio narration for instruction. Perhaps I was biased early on by too many bad YouTube screen-capture videos that take 5 minutes to illustrate a software feature that 30 seconds of text could have explained better. This issue is crucial to OpenDSA's module development. The success of the Khan Academy videos is having a big impact on education, but what does this mean for us? Do KA videos succeed because narrated video is inherently a good approach, or do they succeed for other reasons, such as novelty or being the only alternative within their niche?

I have recently come across some important research literature that addresses precisely this issue. Two good papers that summarize the research are:

They essentially cover the same ground, but I find the first one a bit easier to read, while the second has more up-to-date information.

The bottom line for us is that the research shows that, for tutorials like ours, audio narration linked to the on-screen visualizations is superior to text explaining the visualizations. There are also a lot of secondary results relevant to how we would implement our modules. Some specifics include:

  • Do not add text duplicating the audio content. That turns out to be merely a distractor from the visualization.
  • Break the audio into very short clips (a few seconds each), so that the student can process what they have heard. For example, just as we decided that the Shellsort module should have text interspersed with short bits of slideshow as opposed to one long slideshow, an audio implementation of this module should have a series of short audio clips, perhaps one for each bit of slideshow, instead of a single audio tutorial that is several minutes long.
  • Any text labels should appear immediately with what is being labeled, and not as a caption down below or otherwise separated from the graphic.
  • The style of presentation (whether audio narration or text) should be informal, with the narrator speaking directly to the listener in first or second person rather than in a more formal third person.
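
As a rough sketch of the "short clips" point in the list above, one could pair each bit of slideshow with its own narration script and flag any script that is probably too long to be spoken in a few seconds. This is TypeScript with hypothetical names (NarratedStep, estimateSeconds, findLongSteps); none of them are part of OpenDSA. The speaking rate of about 150 words per minute and the 10-second limit are assumptions chosen only for illustration.

```typescript
// Hypothetical shape: one narration script per bit of slideshow.
interface NarratedStep {
  slideId: string; // which bit of slideshow this clip accompanies
  script: string;  // what the narrator says during that bit
}

// Rough speaking-time estimate. The default rate of 150 words per
// minute is an assumption, not a measured value.
function estimateSeconds(script: string, wordsPerMinute: number = 150): number {
  const wordCount = script
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0).length;
  return (wordCount / wordsPerMinute) * 60;
}

// Return the ids of steps whose narration likely exceeds the limit.
// The 10-second default is an assumed reading of "a few seconds".
function findLongSteps(steps: NarratedStep[], maxSeconds: number = 10): string[] {
  return steps
    .filter((s) => estimateSeconds(s.script) > maxSeconds)
    .map((s) => s.slideId);
}

// Example: two short clips for an imagined Shellsort walkthrough.
const shellsortSteps: NarratedStep[] = [
  { slideId: "pass-1", script: "First, you compare elements that are four positions apart." },
  { slideId: "pass-2", script: "Now you cut the gap in half and repeat." },
];
console.log(findLongSteps(shellsortSteps));
```

A check like this could run while the scripts are being written, so that an overlong clip is split before anyone records it.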

I have started thinking about re-implementing some of our demonstration modules, such as Shellsort, to include narration. Possibly we could have both a text version and a narrated version to do some initial testing. When thinking about how the narration might go, it occurred to me that the words I would use in the narrated version just seem better than the current "textbook" style that I used. What I am thinking is to write the narration script first, and then make the text version match it. Some advantages to maintaining a text version include:

  • An evaluator (whether a potential student or instructor) can skim the text and images to see if they are interested, whereas they cannot really do that with a video.
  • Sometimes you are in an environment where you cannot listen to something.

This will be easier to assess once we have examples. But what do you think about this?

 

naps
Re: Cognitive Theory of Multimedia Instruction and the ...

There is an alternative that is midway between no audio narrative and a complete, pre-recorded audio narrative: use speech synthesis software to generate small audio cues that alert the AV viewer that a particularly important event is about to happen. For example, have the synthesizer say something like "Be sure to watch what happens to the value 4 in the next operation." The advantage of synthesized speech is that the string can be created dynamically to include particular data values generated as part of the visualization, so the spoken cue is aware of the data in this particular instance of the visualization. Such speech synthesis software is available for Java; see freetts.sourceforge.net/docs/index.php. I suspect there is something similar for JavaScript?
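
To make the dynamic-string idea concrete, here is a minimal TypeScript sketch. The names buildCue, announceCue, and Speaker are hypothetical, and the synthesizer is passed in as a plain function, so the sketch assumes nothing about the API of FreeTTS or any other library.

```typescript
// Build the cue text from data in this particular run of the visualization.
function buildCue(value: number, operationName: string = "operation"): string {
  return `Be sure to watch what happens to the value ${value} in the next ${operationName}.`;
}

// Hypothetical hook: any function that takes text and speaks it.
type Speaker = (text: string) => void;

// Generate the cue and hand it to whichever speaker was supplied.
function announceCue(value: number, speak: Speaker, operationName?: string): void {
  speak(buildCue(value, operationName));
}

// Stand-in speaker that only logs, so the example runs without audio.
const logSpeaker: Speaker = (text) => console.log(`[speech] ${text}`);

announceCue(4, logSpeaker);
// prints "[speech] Be sure to watch what happens to the value 4 in the next operation."
```

Keeping the synthesizer behind a one-function interface means the same cue text could later be sent to a different speech engine, or simply shown on screen where audio is not an option.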

 

The obvious disadvantage of speech synthesis is that it cannot provide a high-quality, complete narration of an algorithm, such as the one accompanying the classic "Sorting Out Sorting". It seems to me that both kinds of narration are worth exploring as time allows.

The paper "A study of algorithm animations on mobile devices" by Hurst, Lauer, and Nold in SIGCSE 07 presented statistically significant evidence that low-grade audio cues improved student learning when watching animations of heap sort on a variety of devices.

 

ville
Re: Cognitive Theory of Multimedia Instruction and the ...

For JavaScript, there is speak.js (see online demo). The demo shows the disadvantage of speech synthesis quite clearly: the quality is quite low, and listening to explanations delivered this way would not be enjoyable. I do like the idea of using speech synthesis, especially in the development phase of the AV. Once the AV is complete, it would probably make sense to pre-record the audio narrative.

Ville Karavirta, Aalto University, http://villekaravirta.com/

shaffer
Re: Cognitive Theory of Multimedia Instruction and the ...

I imagine that, once we get our system down, creating narration/screencasts should be fairly easy.