Chapter 5: Review of Studies on Auditory Graphing Techniques

There have been several important studies producing sonified graphical information. Most notable are those by Flowers & Hauer; Flowers, Buhman, & Turnage; and Mansur, Blattner, & Joy. [Flo96, Flo97, Man85]

There is a large field devoted to the representation of data with sound. Generally, this field falls under the heading of Auditory Display, and can encompass a wide range of sound representations such as the use of auditory cues ("earcons") as locators to more direct representations of data. The field is large enough to warrant annual meetings of the International Conference on Auditory Display (ICAD) with published proceedings. [ICA94]

The quest to find a useful auditory data display has been approached from many fields, such as mathematics, chemistry, computer science, and physics. From the diversity of auditory display techniques, it is readily apparent that no single display method will suffice for all data to be presented, just as no single visual graph type works for all data. The following studies are those that relate directly to auditory techniques for data that would otherwise be shown in two-dimensional plots.

Pollack & Ficks

One of the first studies concerning auditory displays of information was performed by Pollack & Ficks. [Pol54] In their paper they investigated the relationship between auditory display stimulus aspects in order to find a satisfactory procedure for increasing the information that can be transmitted from elementary auditory displays. The basic task of their subjects was to identify different qualities of sound stimuli. Eight sound qualities were tested using tones and noise: the frequency range of the noise and of the tone, the loudness of the noise and of the tone, the rate of alternation between noise and tone, the duration of the tone display, the fraction of time the tone was on, and the direction of origin of the tone. Subjects were students and military personnel. The sounds were binary coded, in that the tones were either high or low, the alternation rates fast or slow, the sound intensity levels loud or soft, etc. In half of the tests, subjects responded as they listened to the display, while in the other half they responded after the sounds finished.

Pollack & Ficks reported that their subjects found the multidimensional displays easy to learn, especially the binary coded displays, and that subjects tended to associate the sounds with verbal symbols ("chirping birds"). They also reported that the multidimensional displays transmitted more information than unidimensional displays, but that there was little improvement in information transmission when the dimensions were subdivided (degrees of loudness or alternation rates). The average error was lowest for the binary comparison of frequency of the tone, at 0.08%. This rate was dramatically lower than for the other dimensions studied. The next lowest values were for sound duration (0.9%) and repetition rate (1.1%).

Their conclusion was that the use of multiple stimulus dimensions is a satisfactory method for increasing the transmission of information via auditory displays. They also concluded that it was more useful to have a greater number of binary coded dimensions than to subdivide only a few dimensions.

Mansur, Blattner & Joy

Mansur, Blattner, & Joy [Man85] reported a significant study of data representation through sound. Their study, which generally provided the template for the current investigation, used sound patterns to represent two-dimensional line graphs. They were investigating a prototype system to provide the blind with a means of understanding line graphs similar to printed graphs for those with sight. This study utilized auditory graphs that had a three-second continuously varying pitch to present the graphed data. The auditory graphs were also compared to engraved plastic tactile representations of the graphs. The authors cited research by Stevens, Volkmann, & Newman [Man85, p.**] on the pitch response of hearing that showed an exponential relationship between pitch and perceived height.
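The pitch mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the frequency limits (220 Hz and 1760 Hz), the function names, and the choice to scale against the data's own minimum and maximum are hypothetical. The sketch maps graph height to frequency exponentially, so that equal steps in data value correspond to equal frequency ratios, and spreads the points over a three-second presentation.

```python
def value_to_frequency(y, y_min, y_max, f_low=220.0, f_high=1760.0):
    """Map a data value to a frequency in Hz.

    The mapping is exponential: equal steps in y give equal frequency
    ratios. f_low and f_high are assumed limits, not values from [Man85].
    """
    if y_max == y_min:
        return f_low
    t = (y - y_min) / (y_max - y_min)      # normalized height, 0..1
    return f_low * (f_high / f_low) ** t


def sound_graph(values, duration_s=3.0, f_low=220.0, f_high=1760.0):
    """Return (time in s, frequency in Hz) pairs for a line graph.

    The points are spread evenly over duration_s seconds; a synthesizer
    would glide between successive frequencies to get a continuous pitch.
    """
    y_min, y_max = min(values), max(values)
    n = len(values)
    step = duration_s / (n - 1) if n > 1 else 0.0
    return [(i * step, value_to_frequency(y, y_min, y_max, f_low, f_high))
            for i, y in enumerate(values)]
```

For example, sound_graph([0, 1, 4, 9, 16]) yields five time and frequency pairs that rise from the lowest to the highest assumed frequency over three seconds.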

Mansur, Blattner & Joy found in their study that there were difficulties in identifying secondary aspects of sound graphs such as the slope of the curves. They suggested that a full sound graph system should contain information for secondary aspects of the graph such as the first derivative, and proposed encoding this by adding more overtones to the sound to change the timbre. They also suggested utilizing special signal tones to indicate a graph's maxima or minima, inflection points, or discontinuities.

Their main study consisted of several comparison tests to indicate the effectiveness of sound versus tactile graphing methods. These consisted of comparing the slope of lines, straight vs. exponential lines, monotonicity, convergence, and symmetry. There were fourteen subjects, half of whom were blind. The sighted subjects were blindfolded for the tests. The subjects were tested with one presentation method, and then retested with the second method. There was random assignment as to which method was encountered first. The order of question presentation was also randomly assigned.

The results were that the tactile graphs had a small, but statistically significant, advantage over the sound graphs in overall accuracy (88.3% vs. 83.4%). This difference appears to come mainly from the test comparing straight lines vs. exponential curves, with a 12% difference in accuracy (96% vs. 84%), and the test of whether a graph was converging to some limiting value, which had a 9% difference in the scores (89% vs. 80%).

Lunney & Morrison

Lunney & Morrison [Lun90] describe an auditory alternative to visual graphs in order to provide access to instrumental measurements. Their system was to convert infrared chemical spectra patterns into musical patterns. The translation first converts the continuous spectrum into a "stick spectrum" in which absorption peaks are replaced with lines representing location and intensity. The spectrum is then mapped to a chromatic scale with the infrared frequency converted to pitch. The sound map is played in the form of two patterns. The first is to play from highest pitch to lowest, with intensity represented by note duration. The second pattern is to play the spectrum in order of decreasing peak intensity, with equal note duration. The first pattern is played twice, and the second three times. The six strongest peaks are also played together as a chord at the end. The authors mentioned that this was an effective technique for chemical analysis of spectra.
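The playback scheme described above can be sketched as follows. This is a hypothetical illustration rather than the authors' program: the linear mapping from peak position to MIDI note number, the note range (48 to 84), the scaling of duration by relative intensity, and all function names are assumptions. The input is a stick spectrum given as (position, intensity) pairs, with positive intensities.

```python
def to_midi_note(position, pos_min, pos_max, note_low=48, note_high=84):
    """Map a peak position linearly onto a chromatic (semitone) note range."""
    span = pos_max - pos_min
    t = (position - pos_min) / span if span else 0.0
    return note_low + round(t * (note_high - note_low))


def sonify_stick_spectrum(peaks, base_duration=0.5):
    """Build the two note patterns and the closing chord.

    peaks: list of (position, intensity) pairs from a stick spectrum.
    Returns (sequence, chord), where sequence is a list of
    (midi_note, duration in s) pairs and chord is a list of MIDI notes.
    """
    positions = [p for p, _ in peaks]
    pos_min, pos_max = min(positions), max(positions)
    max_intensity = max(i for _, i in peaks)
    notes = [(to_midi_note(p, pos_min, pos_max), i) for p, i in peaks]

    # Pattern 1: highest pitch to lowest, intensity shown as note duration.
    by_pitch = sorted(notes, key=lambda n: n[0], reverse=True)
    pattern1 = [(note, base_duration * inten / max_intensity)
                for note, inten in by_pitch]

    # Pattern 2: decreasing peak intensity, equal note durations.
    by_intensity = sorted(notes, key=lambda n: n[1], reverse=True)
    pattern2 = [(note, base_duration) for note, _ in by_intensity]

    # The six strongest peaks are sounded together as a final chord.
    chord = [note for note, _ in by_intensity[:6]]

    # The first pattern is played twice and the second three times.
    return pattern1 * 2 + pattern2 * 3, chord
```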

Frysinger

Frysinger [Fry90] wrote a review paper detailing various research approaches to data sonification. The bulk of his review describes data sonification, the areas of psychoacoustics (the psychology of hearing), and sound perception issues. Several of the articles that were reviewed are summarized above. From the review he provided some very general indications for research direction and methodology. These suggestions are utilization of synthetic data generation for parametric control, consideration of the object for immediate investigation, the use of two sessions to compare the effectiveness of two display types, and the use of forced-choice questions.

Minghim & Forrest

For more complex sound mappings, Minghim and Forrest [Min95] presented a review of several studies and an analysis of data sonification development. They mentioned the following areas where sound can be a useful tool in aiding data visualization: adding further dimensionality to data, alternate perceptual properties, additional interactive processes, inherent time dimension for data, use of sound as a validation process, and increasing the ability to remember data due to additional modal encoding. They also described a sonification program called SSound which implements a number of sound functions for aiding surface based data analysis. Various surface properties were mapped to sound qualities such as pitch for density, rhythm for change in a function, and timbre for data correlation. Sound was produced on quadraphonic speakers for information depth. This system required some training, but the authors did not report any formal results as to the effectiveness of the system.

Wilson

A similar program to provide data sonification is the Listen data sonification toolkit described by Wilson [Wil96]. The primary goal of this program was to provide a flexible sound toolkit for use in sonification research. The Listen program is an object-oriented modular system designed on SGI workstations incorporating MIDI sound libraries. Listen was designed to be a component for incorporation into other data visualization programs. The main modules of the Listen program are the Interface, Control, Data Manager, Sound Mapping, and Sound Device modules. The Interface only interacts with the Control module, which then interacts with the other three. With this program, data fields can be mapped to four types of sound parameters: pitch, duration, volume, and location. Pitches used the semitone scale. Data could also be given timbres relating to various MIDI instruments for further diversification.

Flowers & Hauer

There are several important studies relating to the success of auditory graphs for display purposes. Flowers and Hauer produced a set of studies investigating the perceptual similarities between visual and auditory graphs.

The first paper [Flo92] consisted of a single experiment to study how effectively information about central tendency, variability, and the shape of data distributions could be portrayed with auditory graphs versus visual graphs. Data in this experiment were presented as auditory histograms, auditory quartile displays, and visual histograms. The auditory histograms presented the data distribution with the numeric value mapped to pitch, and the frequency of a data value mapped to the number of times a note was repeated. The visual histogram was presented on a computer screen as text characters, with the numeric value mapped to the x axis, and the frequency of the data distribution presented with vertical stacks of asterisk symbols. The auditory quartile displays were a musical analogue of the Tukey box and whiskers drawing that coded the minimum, first, second, and third quartile, and maximum data values as a set of five musical notes.
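The two auditory encodings described above can be sketched in Python. This is an illustrative reading of the description, not the procedure used in [Flo92]: the base note (MIDI 60), the one-semitone-per-unit pitch mapping, the ascending playback order, and the "inclusive" quartile convention are all assumptions, and the function names are hypothetical. Integer data values are assumed for the histogram.

```python
from collections import Counter
import statistics


def auditory_histogram(data, base_note=60):
    """Value -> pitch, frequency of a value -> number of repetitions.

    Returns a list of MIDI note numbers, played from the lowest data
    value to the highest (the ordering is an assumption).
    """
    counts = Counter(data)
    notes = []
    for value in sorted(counts):
        # Repeat the note once for every occurrence of this data value.
        notes.extend([base_note + value] * counts[value])
    return notes


def auditory_quartile_display(data, base_note=60):
    """Five notes: minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    five_numbers = [min(data), q1, median, q3, max(data)]
    return [base_note + round(v) for v in five_numbers]
```

In this sketch a tall histogram bar becomes a rapidly repeated note, while the quartile display compresses the whole distribution into five pitches.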

Twelve psychology graduate student subjects performed 132 comparison trials in each of three sessions, with only one presentation modality per session. The subjects gave a 1 to 10 similarity judgment rating for each of the graph comparisons. The judgments were based on differences in central tendency, variability, and the shape of the data distribution.

This study specifically investigated the perceptual structure of plots through dissimilarity judgments of a graph’s slope or level when depicted by visual versus auditory displays. The study consisted of three tests, labeled Experiments 1, 2, and 3. Experiment 1 investigated students' ability to distinguish visual graphs, while Experiment 2 investigated auditory graphs. Experiment 3 was similar to 1 and 2 except that it provided a more sensitive evaluation between visual and auditory graphs. Results showed that the auditory histogram and quartile display (r= 0.36, 0.40 resp.) graphing techniques produced a far greater dissimilarity rating than did the visual histogram (r=0.06) graphs, but that the opposite was true for skew (r=0.39 vs. 0.11 and 0.06) and kurtosis (presence of long or short distribution tails, r=0.21 vs. 0.07 and 0.02). Judgments on the range of data values were similar for all three graph types.

The second study by Flowers & Hauer [Flo93] extended the first with two experiments investigating whether combined auditory and visual presentations enhanced discrimination of stimulus parameters, and whether the auditory quartile (Tukey box and whisker) plots provided an adequate distribution of information. In the first experiment, twenty-five paid student subjects, with normal hearing and vision, participated in a study similar to that conducted in their previous paper on comparative judgment analysis of visual and auditory histogram graphs. There were three display methods, visual presentation, auditory presentation, and a combined auditory and visual presentation, with the auditory histograms using the same method as previously described. Their results showed that visual graphs again had a greater reliability in the dissimilarity judgments than auditory graphs, and that there was no evidence that combined presentation led to a greater consistency of judgments than visual presentation alone. The second experiment consisted of the use of auditory quartile displays, slightly modified from the previous study in that these displays had an additional leading note, representing the median as a prefix to the five note system. A comparison of dissimilarity judgments between the original quartile display method, and the leading note prefix method showed a greater attention to the median (r=0.58 vs. 0.20) but reduced attention to skew and range (r=0.20, 0.31 vs. 0.38, 0.41.) Thus, focusing on the central tendency came at the expense of other characteristics.

In their third paper, Flowers & Hauer [Flo95] compared the perceptual equivalence of auditory and visual graphs and their ability to convey information regarding the profile of changes of an independent variable.

At least two of the samples consisted of introductory psychology students at the University of Nebraska. In Experiment 1, there were 18 students (seven male, 11 female) who received credit for a research exposure requirement for their introductory course. Experiment 2 consisted of 14 student volunteers who were each paid $15. Experiment 3 consisted of two groups of eight and 11 students who were in a similar situation to those in Experiment 1. It was not stated whether students in one experiment also took part in another, or how large a class population these students were drawn from; thus the number of students involved could be anywhere from 19 to 51.

There was some discrepancy between the methods for comparing the graphs in Experiments 1 and 2. In the first, students were instructed to sort 68 graphs into no fewer than three and no more than 10 categories. In the second, students used a pair-wise numeric (1-10) dissimilarity rating procedure of all possible pairings of 34 of the 68 graphs. Half of the subjects (seven) received one pairing set, and the other half compared the second set. The auditory graphs used the same data sets as the visual plots.

In Experiment 3, 16 graphs were used for comparison purposes. In this trial, both the visual and auditory graphs were compared in a pair-wise fashion. One minor problem was that the visual graphs to be compared were displayed at the same time, while the auditory graphs were displayed sequentially.

For Experiments 1 and 2, sorting and comparison data were converted into composite dissimilarity matrices and then submitted to multidimensional scaling (MDS) using the ALSCAL routine on the SPSSX statistical package and plotted. A Kruskal stress value was also found in four dimensions. The article does not give an adequate explanation of this. A Pearson correlation between the slope code and MDS scale values was also given. Experiment 3 used an individual differences scaling procedure (INDSCAL) option on the statistics program as well as the Kruskal value.

The authors’ conclusion was that the experiments illustrated a close correspondence between the perception of auditory and visual graphs with regard to gross differences in function shape as well as slope and level (height) perception. The study serves mainly as a first step: it suggests, with some supporting evidence, that auditory graphs can be used to display information, but it did not demonstrate the understanding and interpretation of more complex auditory displays.

Turnage, Bonebright, Buhman, & Flowers

Turnage, Bonebright, Buhman, & Flowers [Tur96] reported on a study, comprising two experiments, comparing the equivalence of visual and auditory representations of periodic numerical data. The first experiment investigated whether equivalence of auditory vs. visual presentations of wave form stimuli would parallel that reported for other graph types. Twenty-six undergraduate psychology student subjects participated to fulfill a course research requirement. The subjects were divided into two groups of 13, one for each graphing method. Graphs consisting of 100 data points were constructed with three shape patterns (sine, square, or combination), three frequencies (high - 8 cycles/100 data points, medium - 6/100, and low - 4/100), and two amplitudes (high and low) for a total of 18 graphs. The visual graphs were constructed with Microsoft Excel and presented via overhead transparencies. The auditory graphs were played as a series of 100 musical notes with a two-octave range, with pitch as the y axis and time as the x axis. The auditory graphs had a 6 second duration.
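The stimulus set described above (three shapes, three frequencies, two amplitudes, 100 points each) can be sketched in Python. The paper's exact construction is not given here, so several details are assumptions: the "combination" shape is taken to be the average of the sine and square waves, the high and low amplitudes are taken to be 1.0 and 0.5, and the notes are centered on MIDI note 60 with full scale spanning two octaves. All names are hypothetical.

```python
import math
from itertools import product

N_POINTS = 100                      # data points per graph
NOTE_DURATION = 6.0 / N_POINTS      # 6 second graph -> 0.06 s per note


def waveform(shape, cycles, amplitude, n=N_POINTS):
    """Return n samples of one stimulus wave form."""
    samples = []
    for i in range(n):
        sine = math.sin(2 * math.pi * cycles * i / n)
        square = 1.0 if sine >= 0 else -1.0
        if shape == "sine":
            y = sine
        elif shape == "square":
            y = square
        elif shape == "combination":
            y = 0.5 * (sine + square)   # assumed definition of the mixed shape
        else:
            raise ValueError(f"unknown shape: {shape}")
        samples.append(amplitude * y)
    return samples


def to_notes(samples, center_note=60, half_range=12):
    """Map samples in [-1, 1] to MIDI notes spanning two octaves."""
    return [center_note + round(s * half_range) for s in samples]


# 3 shapes x 3 frequencies x 2 amplitudes = 18 graphs
stimuli = {
    (shape, cycles, amp): waveform(shape, cycles, amp)
    for shape, cycles, amp in product(
        ("sine", "square", "combination"), (8, 6, 4), (1.0, 0.5))
}
```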

The subjects were presented with the task of providing similarity ratings for all independent pairs (153) of graphs. They were initially presented all 18 graphs in random order for familiarization, and given three practice trials. They then rated the graph pairs on a 9 point similarity scale for each of the three conditional dimensions (1: shape, 2: amplitude, and 3: frequency). Coefficients of congruence (CC), interpreted like a correlation coefficient, revealed that the visual and auditory graphs were very similar for all three condition dimensions (CC1 = 0.96, CC2 = 0.98, CC3 = 0.94). Thus, the two graphing methods have high similarity for difference discrimination. There was also some indication of slightly greater discrimination between sine and composite wave patterns with the auditory display than with the visual display.
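Two pieces of arithmetic in this paragraph can be checked with a short Python sketch. The pair count follows directly from the 18 graphs. The congruence function uses Tucker's coefficient, which is one common definition; the source does not state which formula the authors used, so treat it as an assumption. Function names are hypothetical.

```python
import math
from itertools import combinations


def count_pairs(n_items):
    """Number of distinct unordered pairs among n_items graphs."""
    return sum(1 for _ in combinations(range(n_items), 2))


def congruence_coefficient(x, y):
    """Tucker's coefficient of congruence between two sets of scale values.

    Like a correlation, it is 1.0 for proportional patterns, but the
    values are not centered on their means first.
    """
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator
```

With 18 graphs, count_pairs(18) gives the 153 pairs that each subject rated.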

The second experiment investigated the relative performance accuracy of visual and auditory graphs on a task involving discrimination between similar wave forms. Thirty-eight undergraduate psychology student subjects participated to fulfill a course research requirement. The subjects were divided into two groups of 19, one for each graphing method. The graphs were constructed and presented as in the first experiment. Forty pairs of wave form graphs were selected for a comparison task. Subjects were sequentially presented with two graphs, A and B, and then presented with a third graph, X, from which they were to determine whether X was the same as A, B, or neither. The subjects were given three practice trials for familiarization. Results showed a significant difference in the performance scores of the two groups, with the auditory graph group averaging 81% correct and the visual graph group averaging 96% correct.

Flowers, Buhman, & Turnage

Most recently, a study relating to auditory graphs for display purposes was done by Flowers, Buhman, and Turnage [Flo97] which investigated the equivalence of visual and auditory scatter plots to explore bivariate data. Their study consisted of two experiments, the first examining the relationship between visual and auditory judgments for the direction and magnitude of correlation for 24 bivariate data samples.

The first experiment used 45 unpaid advanced undergraduate psychology student volunteers. 19 of the subjects, in groups of three to eight, judged visual scatter plots of data samples, while the remaining 26 were assigned in groups of five to 16 to judge auditory scatter plots of the same data. The graphed data samples consisted of 50 random numbers about a Gaussian distribution with a mean of 50. Some of the data samples were given transformations to produce various correlations between the resulting 24 sample plots. The standard deviation within data samples ranged from about 6.2 to 11.6. Sound generation was constructed using Microsoft Excel to compute sound generation parameters for use in the CSound program. The auditory graphs had a five second duration, with individual data points represented by 0.1 second guitar pluck notes. The X axis was represented by time and the Y axis by a linear mapping to the chromatic pitch scale ranging one octave below, to two octaves above middle C.
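The auditory scatter plot described above can be sketched in Python. The original stimuli were computed in Excel and rendered with CSound; this is only a stand-in under stated assumptions. It uses MIDI note numbers with middle C at 60, so that one octave below to two octaves above middle C is 48 to 84. Scaling each axis against the data's own minimum and maximum, the standard deviation of 10, and all names are assumptions for illustration.

```python
import random

LOW_NOTE, HIGH_NOTE = 48, 84   # one octave below to two above middle C (60)


def scatter_to_events(xs, ys, total_s=5.0, note_s=0.1):
    """Turn (x, y) points into sorted (onset in s, MIDI note, duration in s).

    X is mapped to time and Y is mapped linearly onto the chromatic scale.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    events = []
    for x, y in zip(xs, ys):
        # Leave room for the last note to finish within total_s.
        onset = (x - x_min) / x_span * (total_s - note_s)
        note = LOW_NOTE + round((y - y_min) / y_span * (HIGH_NOTE - LOW_NOTE))
        events.append((onset, note, note_s))
    return sorted(events)


# Illustrative sample: 50 Gaussian values with a mean of 50 on each axis.
rng = random.Random(0)
xs = [rng.gauss(50, 10) for _ in range(50)]
ys = [rng.gauss(50, 10) for _ in range(50)]
events = scatter_to_events(xs, ys)
```

Each event could then be handed to any synthesizer as a short plucked note; points that share similar x values are heard close together in time.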

Subjects rated the magnitude and sign of the correlation in the graphs by marking on a rating scale at a point corresponding to the magnitude of the perceived correlation between the variables. The judgment data was recorded as a distance from the zero point on the scale. The visual graphs were presented for 10 seconds, while the auditory graphs were played twice (a total of 10 seconds of listening time). Pearson's correlations between the actual correlation and the judged correlation were computed for each subject. Results from this first experiment were that the mean correlation between judgment and actual Pearson's correlation was 0.92 for the visual group and 0.91 for the auditory group. A t-test showed no significant difference between the auditory and visual groups.
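The per-subject accuracy measure described above is a Pearson correlation between the actual correlations of the plots and the subject's judgments. A minimal Python sketch follows; the function name is hypothetical and the sample numbers are invented purely for illustration, not taken from the study.

```python
import math


def pearson_r(a, b):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(var_a * var_b)


# Hypothetical data for one subject: actual correlation of each plot and
# the judged value read off the rating scale (distance from zero).
actual = [-0.9, -0.5, 0.0, 0.5, 0.9]
judged = [-0.8, -0.3, 0.1, 0.4, 0.7]
subject_score = pearson_r(actual, judged)
```

Averaging such scores over the subjects in each group gives the kind of group means reported above.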

The second experiment was a direct evaluation between visual and auditory perceptual sensitivity to data points lying outside the main data groupings. This was accomplished by examining changes in the perceived magnitudes and direction of the correlation for scatterplots that were altered with the addition of data points. In this experiment, 32 advanced undergraduate psychology student volunteers participated, 20 in a visual graph group, and 12 in an auditory graph group. Eight data sets from the first experiment were modified by moving one data point. In half of them the data point was moved to an outlying position in the center of the plot; in the rest, it was moved to an extreme end of the plot. The eight original plots, the eight modified plots, and eight additional plots were used so the number of test stimuli equaled that used in the first experiment.

Of the 24 plots, two of the modified plots showed significant differences in the judgment of correlation magnitudes. The two were for plots where the outliers were for moderately correlated data samples rather than for weakly or strongly correlated data sets. Both auditory and visual conditions gave similar results. Thus, this study seems to indicate that judgments between correlation effects for both visual and auditory scatter plots are very similar. Both are effective in conveying sign and magnitude of correlation, and are similarly influenced by error variance and single outliers.

Summary of Reviewed Literature

As has been shown here, and in the previous two chapters, there is a wide range of literature relating to the current research. Perhaps the most relevant studies are those concerning auditory graphing techniques, especially those by Mansur, and those with Flowers as a co-author. The subject material for the questions on which to base the graphs comes predominantly from those studies presented in the second section on physics graphs and concepts. Those studies presented relating the use of computers in the graphing process demonstrate that the student subjects are familiar with the computer as a graphing tool, and that it need not be presented as an unfamiliar object.

While the studies concerning graph perception may seem the least relevant, they serve as an underlying basis for the foundation of this work. It is important to keep in mind the common structures that people are familiar with when creating new representations for data display.
