To discuss the results of previous studies fully, it is helpful to define some of the terminology used in the reviewed literature.
Some of the literature on learning graphical information refers to Piaget's theory of development. Very briefly, this theory states that children develop through four main stages: sensorimotor, preoperational, concrete operational, and formal operational. Each stage brings large changes in the understanding of concepts such as conservation, classification, and relations. The sensorimotor stage runs from 0 to 2 years and consists of the formation of a simple understanding of classes and relationships. The preoperational stage runs from 2 to 7 years and brings the ability to represent ideas in language and mental imagery. The concrete operational stage runs from 7 to about 12 years and consists of the development of conservation of quantities, time, seriation, and class inclusion. Finally, the formal operational stage brings the ability to consider all possible outcomes of a situation, to relate actual outcomes to logically possible outcomes, to plan ahead, and to experiment systematically. There have been some important exceptions to Piaget's classification, but in general it is a reasonable first approximation for developmental processes. [Sei91]
In social contexts, many researchers refer to the population of their subjects. Here, population refers to a generalization of all of the possible people who could be tested (all students, all freshmen, all first-year physics undergraduate students at OSU, etc.). There is usually an emphasis on the most restrictive category. Another consideration is the sample size. Depending on the situation, a sample will refer either to an individual subject or to a class grouping. The number of samples needed, n, is roughly given by:
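Assuming the standard form given in [Sne89], in which the factor of 4 approximates (1.96)^2 for a 95% confidence level, this relation (taken here to be Eq. 2.1) can be written as:

```latex
n = \frac{4 s^{2}}{L^{2}}
```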
where s is the standard deviation of the expected results, and L is the error tolerance. Usually L is set to 5% of the average value. With n given by Eq. (2.1), there is a 95% probability that the average value will fall within the limits set by L.
[Sne89, p. 52]
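As a minimal sketch of this estimate, the following Python function assumes the rough 95% form n = 4s^2/L^2 and rounds up to a whole number of samples. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
import math


def required_sample_size(s, L):
    """Rough number of samples for ~95% confidence, assuming n = 4 s^2 / L^2.

    s: expected standard deviation of the results
    L: error tolerance, in the same units as s
    """
    if s < 0 or L <= 0:
        raise ValueError("s must be non-negative and L must be positive")
    # Round up, since a fractional sample cannot be collected.
    return math.ceil(4.0 * s ** 2 / L ** 2)


# Hypothetical example: scores with s = 10 points and a tolerance of 2 points.
n = required_sample_size(10, 2)
print(n)
```

Halving the tolerance L quadruples the required n, which is why a tight error limit quickly becomes expensive in subjects.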
Conversely, given n and s, one can find the inherent error limits on the measurement:
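Assuming the sample size relation has the standard form n = 4s^2/L^2, solving it for L gives:

```latex
L = \frac{2 s}{\sqrt{n}}
```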
Data are collected with various experimental instruments. In studies of learning or response evaluation, the instruments are questionnaires, tests, observations of individuals, or a combination of these techniques. Questionnaires are measurement techniques for items that are not directly observable, such as attitudes, motivations, and feelings. They are documents that ask the same questions of all subjects. Tests are structured situations that measure performance to yield numerical scores. The results can then be analyzed for differences in the items measured by the test. Observations are recordings of various activities, where the observer may count action frequencies, make inferences from the subject's actions, or rate the quality of those actions. [Gal96, p. 332, 767-774]
As with any research, there are issues concerning how useful, appropriate, and consistent the inferences drawn from the data are, and how reproducible the data are. These issues are quantified by statements about a test's validity and reliability. The major types of validity are construct, content, and predictive.
Construct validity is a measure of how well a test can be shown to evaluate what it claims to measure. There is no single technique for establishing it; it is developed through multiple types of evidence. Content validity is established through review by experts, who judge whether the test covers the content that the research claims it covers. Predictive validity is the ability of a test to correctly predict future behavior from past test results. [Gal96 p. 247-259]
To evaluate the reliability of a testing method, the most common technique is to utilize the product-moment correlation coefficient, also called the Pearson r value, or simply r. This is a measurement of how closely two variables are related. It is a quantitative expression of the similarity between two groups and is given by the expression:
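Assuming the usual definition of the product-moment coefficient, with x_i and y_i the paired values and \bar{x}, \bar{y} their means (symbol names chosen here for illustration), the expression is:

```latex
r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i} (x_i - \bar{x})^{2} \, \sum_{i} (y_i - \bar{y})^{2}}}
```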
where the sum is over the number of pairs.
When this technique is combined with a grouping of test questions into two parts, a method of evaluating the reliability of the test arises, called split-half reliability. In this case, the results from one half of the questions form the x values, while the results from the other half form the y values. The reliability coefficient can vary between -1 and 1, with values above 0.8 being most desirable. A value of 1 indicates perfect correlation, a value of -1 perfect anti-correlation, and a value of 0 no correlation. [Sne89, p. 177-195]
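A minimal sketch of a split-half calculation follows. It assumes the usual deviation-from-the-mean form of the Pearson r, and it splits the test into odd-numbered and even-numbered questions. That particular split, the function names, and the small score table are illustrative assumptions; the text only requires that the questions be grouped into two parts.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores holds one row per subject, one score per question."""
    # Assumed split: odd-numbered questions versus even-numbered questions.
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return pearson_r(first_half, second_half)


# Hypothetical data: 4 subjects, 4 questions, 1 = correct and 0 = incorrect.
scores = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
print(split_half_reliability(scores))
```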
Once the validity and reliability of a test are established, it is important to determine whether the differences seen in scores are real or only the effects of random fluctuations. This is accomplished by looking at the statistical significance of a study, which usually involves testing a null hypothesis, H0, and rejecting it when it is false. The null hypothesis states that there is no real difference between two groupings. Thus, if a study is statistically significant at some preset probability level, it would be a false statement to say that the two groups are equal. Rejecting the null hypothesis gives no indication, or validation, of the magnitude of the difference. [Sne89, p.62]
The probability value, p, indicates the significance level. A level of p < 0.05 is the most common in research, but p < 0.10 is sometimes used for exploratory studies, and p < 0.01 is occasionally used as a very stringent level. [Gal96, p. 183]
One method for testing the null hypothesis uses Student's t distribution in a two-tailed test. This test measures whether two sample means are distinct when the population standard deviation is not known. The t value is given by:
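Assuming the standard one-sample form, and writing \bar{x} for the sample mean (a symbol assumed here), the statistic is:

```latex
t = \frac{\bar{x} - m}{s / \sqrt{n}}
```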
where n is the number of samples, m is the mean of a data group, x̄ is the sample mean (the mean of a set of means of data groups, or the mean of a second group), and s is the estimated standard deviation, given by:
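Assuming the usual sample estimate, with x_i the individual values and \bar{x} their mean, the standard deviation is:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}}
```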
where n-1 is also called the degrees of freedom. The calculated t value is compared to tabulated values to find the probability level. If the tabulated probability is greater than the level of significance, the null hypothesis is not rejected.
[Sne89, p. 64-71,466]
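The procedure can be sketched in Python as follows, assuming the standard one-sample form t = (x̄ - m) / (s / √n) with s estimated using n - 1 degrees of freedom. The function names and the data are hypothetical. The critical value is not computed here; it is looked up in a t table and passed in by the user.

```python
import math


def one_sample_t(data, m):
    """Return (t, degrees_of_freedom) for the sample `data` against the mean m."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    x_bar = sum(data) / n
    # Estimated standard deviation with n - 1 degrees of freedom.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
    if s == 0:
        raise ValueError("t is undefined when all observations are equal")
    return (x_bar - m) / (s / math.sqrt(n)), n - 1


def reject_null(t, t_critical):
    """Two-tailed decision: reject H0 when |t| exceeds the tabulated value."""
    return abs(t) > t_critical


# Hypothetical data: five scores compared with a reference mean of 3.
t, dof = one_sample_t([2, 4, 6, 8, 10], 3)
# 2.776 is the tabulated two-tailed 5% value for 4 degrees of freedom.
print(t, dof, reject_null(t, 2.776))
```

Because the decision compares |t| with the tabulated value, the same function handles differences in either direction, which is what makes the test two tailed.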
If many comparative tests are required, an analysis of variance (ANOVA) is used. This is a statistical procedure that compares the amount of variance between groups to the variance found within groups. If the ratio of the between-group variance to the within-group variance, called the F or variance ratio, yields a nonsignificant value, then the use of t-tests to compare pairs of means is not appropriate. [Gal96, p. 392]
For significance at a given probability level, the F value computed from the data must exceed the tabulated F value for the corresponding degrees of freedom (set by the number of samples and the number of comparisons). [Sne89, p. 223]
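For the simplest one-way case, the variance ratio can be sketched as below. The text does not give the formula, so this assumes the usual definition: the between-group mean square divided by the within-group mean square. The function name and the three small groups are hypothetical.

```python
def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way layout.

    groups: a list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 non-empty groups and n_total > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations about their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical data: three groups of three scores each.
F, df_b, df_w = f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(F, df_b, df_w)
```

The returned degrees of freedom are the two values needed to look up the tabulated F. For 2 and 6 degrees of freedom the tabulated 5% value is 5.14, so a computed F above that would be significant at p < 0.05.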
Analysis of covariance (ANCOVA) is a technique that combines the features of analysis of variance and regression and is used for modeling complex interactions with multiple classifications.
Several terms are useful when reviewing the literature on auditory display methods, such as those used in this study to produce the auditory graphs.
Sonification is the term generally used for the process of converting data into an acoustic format for presentation. This can be done by several methods. One is to play the data directly, as in a recording of sound intensity patterns. For example, the air pressure on a membrane can be recorded via a voltage-measuring device, and this variation can then be used to drive a speaker system directly to reproduce the sound. The same approach can be applied to variation patterns that occur over very long or very short periods of time, such as earthquakes, astronomical data, or vibrational modes in structures. The data can then be time compressed or expanded so that the resulting fluctuations drive a speaker at an audible frequency. This process is sometimes referred to as audification.
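The time compression step can be illustrated with a short Python sketch. It computes the playback sample rate needed to shift a slow oscillation into the audible range, and it rescales the data for playback. The function names, the 440 Hz target, and the example numbers (a 1 Hz oscillation sampled at 100 Hz) are assumptions made for illustration only.

```python
def playback_rate(recorded_rate_hz, source_freq_hz, target_freq_hz=440.0):
    """Return (new_rate_hz, speedup) that moves source_freq_hz to target_freq_hz.

    Playing the same samples `speedup` times faster multiplies every
    frequency in the data by that factor.
    """
    if recorded_rate_hz <= 0 or source_freq_hz <= 0 or target_freq_hz <= 0:
        raise ValueError("rates and frequencies must be positive")
    speedup = target_freq_hz / source_freq_hz
    return recorded_rate_hz * speedup, speedup


def normalize(samples):
    """Remove the mean and scale the data to the range [-1, 1]."""
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]
    peak = max(abs(x) for x in centered)
    if peak == 0:
        return [0.0] * len(samples)  # constant data carries no sound
    return [x / peak for x in centered]


# Hypothetical example: a 1 Hz oscillation sampled at 100 samples per second.
rate, speedup = playback_rate(100.0, 1.0)
print(rate, speedup)
```

A speedup below 1 corresponds to time expansion, which is the case for data that vary too quickly to be heard.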
Another method of sonifying data is to let individual data points represent some quality of sound. This process is referred to as mapping the data to sound and can be accomplished in a variety of ways.
Often, a data value is represented by a tone at a specific frequency, or pitch. There are two standard methods for doing this: a linear mapping, in which the data have a linear correspondence to frequency, and a logarithmic mapping, called a chromatic scale mapping, which follows the tuning found in musical instruments.
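The two mappings can be compared with a minimal sketch. It assumes an equal-tempered chromatic scale, in which each semitone step multiplies the frequency by 2^(1/12). The function names, the 220 Hz base frequency, and the two-octave range are illustrative choices, not values taken from the study.

```python
def linear_frequency(value, vmin, vmax, f_min=220.0, f_max=880.0):
    """Map a data value linearly onto the frequency range [f_min, f_max]."""
    if vmax == vmin:
        raise ValueError("vmax must differ from vmin")
    fraction = (value - vmin) / (vmax - vmin)
    return f_min + fraction * (f_max - f_min)


def chromatic_frequency(value, vmin, vmax, f_base=220.0, semitones=24):
    """Map a data value onto the nearest step of a chromatic scale.

    Equal steps in the data become equal musical intervals, so the
    frequency grows logarithmically rather than linearly.
    """
    if vmax == vmin:
        raise ValueError("vmax must differ from vmin")
    fraction = (value - vmin) / (vmax - vmin)
    step = round(fraction * semitones)
    return f_base * 2.0 ** (step / 12.0)


# The midpoint of the data range lands on different pitches in each mapping.
print(linear_frequency(5, 0, 10), chromatic_frequency(5, 0, 10))
```

Both mappings cover the same two octaves at the ends of the data range, but the chromatic mapping places the midpoint one octave above the base, while the linear mapping places it higher.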
Other sound qualities may also be used, such as amplitude (volume), attack (onset of the sound), note duration, decay (how the sound fades away), timbre (clarity of the note, affected by the incorporation of higher harmonics), brightness (amplitude of factors influencing the timbre), spatial location, vibrato (a slight oscillation in the pitch, usually at about 1 to 10 Hz), or modification of the sound's wave pattern to approximate sounds heard in the world, such as instruments or environmental noises (doors, footsteps, sirens, etc.). By combining the different qualities of sound, many researchers hope to portray multi-dimensional information concisely.
When multiple sounds are played, for example when two data sets are represented by unique instrument sounds, the sound is sometimes described as having multiple voices. If the sound is placed at a particular location in space, through the use of stereo or quadraphonic speakers, it may be referred to as a beacon.
The sound used in auditory research is produced by a variety of methods, usually through computer generation of sound files that are then played back to the subjects. Two sound generation techniques are generally employed: construction of the sound's wave pattern, stored as a sound file, and the use of triggered events to play pre-stored sound samples. When the complete sound pattern is constructed, Microsoft's Wave format is commonly used, as is Apple's AIFF format. These file types are usually identified by the .wav or .aiff extension in their file names. Many popular programs, for example Csound, will construct wave pattern files given some initial parameters. Wave, AIFF, and other sound file types tend to have high fidelity, that is, the ability to reconstruct the intended sound, as they are akin to digital recording techniques. The disadvantage of these files is that they can become quite large and do not readily lend themselves to random access of the sound data.
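As a minimal sketch of constructing a wave pattern and storing it in Wave format, the following uses Python's standard `wave` module to build a sine tone in memory. The function name and the parameter choices (16-bit mono samples at 44,100 samples per second) are assumptions for illustration and are not the settings used in this study.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz=440.0, duration_s=0.5, rate=44100, amplitude=0.5):
    """Build a mono 16-bit sine tone and return it as Wave-format bytes."""
    n_frames = int(duration_s * rate)
    frames = bytearray()
    for i in range(n_frames):
        sample = amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate)
        # Pack each sample as a little-endian signed 16-bit integer.
        frames += struct.pack("<h", int(sample * 32767))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(bytes(frames))
    return buffer.getvalue()


data = sine_wav_bytes()
print(len(data))
```

At these settings every second of sound takes 88,200 bytes of sample data, which shows how quickly such files grow compared with a stream of triggered events.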
When sound is constructed from triggered sound samples, the General MIDI (GM) format, an extension of the Musical Instrument Digital Interface (MIDI) protocol, is the most commonly used method. This popular format specifies data streams that trigger sound events, and it is a standard interface on many musical devices as well as computer sound systems. One difficulty with MIDI is sound fidelity, since sound systems differ in how they record, store, and generate the stored sound samples. Another problem is pitch resolution, as only 128 note (frequency) steps are available. Advantages of MIDI are that it is a common sound platform, has very small file sizes, and allows random access to the sound file.
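The effect of the 128-step limit can be seen in a short sketch that converts between frequency and MIDI note number. It assumes the standard tuning convention, in which note 69 is the A above middle C at 440 Hz and each note number is one semitone. The function names are hypothetical.

```python
import math


def frequency_to_midi_note(freq_hz):
    """Return the nearest MIDI note number (0 to 127) for a frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    note = round(69 + 12 * math.log2(freq_hz / 440.0))
    # Clamp to the 128 note numbers that MIDI provides.
    return max(0, min(127, note))


def midi_note_to_frequency(note):
    """Return the equal-tempered frequency in Hz for a MIDI note number."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


# 440 Hz and 450 Hz fall on the same note, so the difference is lost.
print(frequency_to_midi_note(440.0), frequency_to_midi_note(450.0))
```

Two data values that map to frequencies within the same semitone trigger the same note, so small differences in the data are not audible unless another sound quality carries them.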
Sound representations can range from analogic to symbolic. An analogic representation is a direct correspondence between the data and the sound, such as pitch to height. A symbolic representation is one in which the sound represents the structure of the procedure by which the data was computed, such as an alarm. Symbolic representations may also use metaphorical association, such as a dripping sound as an alert that there is a memory leak in a program. [Kra94, p. 1-79, 185-222]
Parts of the current study use computer jargon. A browser is a program designed to access information pages on the World Wide Web. The most common of these programs are Netscape, Microsoft's Internet Explorer, Opera, and Mosaic. Browsers often employ a special sub-program called a plug-in, which increases the functionality of the browser. One such plug-in is Apple's QuickTime, which allows sound and movies in the QuickTime format to be displayed as items contained within the web page. Microsoft's ActiveX is a set of control modules that a plug-in can use to enhance the abilities of the Internet Explorer program.
Copyright 1999 Steven Sahyun