Quantum
WHAT IS QUANTUM AND WHAT DOES IT DO?
Stages in a Quantum run:
Basic Elements In Quantum
Different Number types that can be used in Quantum:
Variables and arrays
  Data variables
  Integer variables
  Real variables
  Subscription
Expressions
  Arithmetic expressions
  Combining arithmetic expressions
Counting the number of codes in a column
Generating a random number
Logical expressions
Comparing data variables and data constants
Fields of data variables
Checking the arithmetic value of a field of columns
Combining logical expressions
Comparing variables and arithmetic expressions to a list
Naming lists
Speeding up large programs
How Quantum reads data
  Types of record
  Ordinary records
  Multicard records
Multicard records with Trailer Cards
  Reading data into the C array
  Ordinary records
  Multicard records
  Ignoring card types
Processing the data
Changing the contents of a variable
Trailer Cards
  Allread
  firstread and lastread
  Reserved variables
  Describing the data structure for Multicard records
  Record type
  Ordinary Records
  Multicard Records
  Record length
  Serial number location
  Card type location
  Required card types
  Repeated card types
  Highest card type number
  Dealing with alphanumeric card types
Merging Data using Quantum
  Merge sequence for Trailer Cards
  Merging data files
  Merging complete cards
  Merging a field of data from an external file
Writing out data
  Print files
  Printing out individual records
  Writing Out Parts of Records
  Data files
  Creating new cards
Some General Instances for forcecoding cleaning etc
  Writing to a report file
  Assignment statements
  Copying codes
  Assignment with and, or and xor
  Adding codes into a column
  Deleting codes from a column
  Forcing single-coded answers
  Setting a random code in a column
  Reading numeric codes into an array
  Clearing variables
  Flow control
  Statements of condition
  Examining records
  Holecounts
  Frequency distributions
  require
  Column and code validation
  Comments with require
  Checking codes in columns
  Exclusive codes
  Automatic error correction
  Validating logical expressions
  Testing the equivalence of logical expressions
  Actions when a require statement fails
  Data correction
  Forced editing (forced cleaning)
Introduction to the tabulation
  The hierarchy of the tabulation section
  Components of a tabulation program
  Run control statements
  Defining run conditions
  Table control statements
  Creating a table
  commonly used options in tab section
  Axis control statements
  factors
  Miscellaneous ‘n’ statements
  More commands to generates counts
  The col statement
  The val statement
  The fld statement
Weighting in Quantum
  Weighting methods
  Types of weighting
Descriptive statistics
Quanvert
Structure of Quantum Spec:
WHAT IS QUANTUM AND WHAT DOES IT DO?
Quantum is a highly sophisticated and very flexible computer language designed to simplify the process of obtaining useful information from a set of questionnaires. Through programming, it converts the technical information collected using questionnaires into managerial information. Quantum performs a variety of tasks. It can:
► check and validate the data
► edit and correct the data
► produce different types of lists and reports of data
► produce new data files
► recode data and produce new variables
► generate tables
► perform statistical calculations
Stages in a Quantum run:
A. First, the data is read onto a disk. Data on disk can come from a number of different sources, for example:
o It may be entered directly via a terminal by a telephone interviewer using Quancept CATI.
o It may be collected over the World Wide Web using software such as Quancept Web.
o It may be entered directly into a computer by an interviewer conducting a personal interview using Quancept CAPI.
o It may be entered by a data entry clerk using a data entry package.
B. Next, the tasks to be performed are defined using the Quantum language.
C. Then, Quantum translates these tasks into instructions that the computer can understand.
D. Finally, the computer itself uses this program to run your job.
Quantum comprises two sections: an edit section and a tabulation section. The edit section checks and validates the data, generates lists and reports, corrects data, produces new data files, recodes data and creates new variables. The tabulation section produces tables and performs statistical calculations. Quantum reads the records in the data file one at a time and passes them through the various parts of the Quantum program. As long as there are records remaining in the data file, the loop of ‘read a record, edit, tabulate’ is repeated; once the last record has been processed, the tables are ready for printing.
Basic Elements In Quantum
There are three basic elements in Quantum:
o Data constants
o Integer numbers
o Real numbers
These are stored in variables:
o Data variables store data constants
o Integer variables store whole numbers
o Real variables store real numbers
Individual constants
An individual constant is one or more of the codes 1234567890-& or blank. The - is sometimes referred to as the 11 or X punch, and & is sometimes called the 12, V or Y punch. Each code represents one answer to a question. For example, let’s take the question ‘What is your favorite color?’ which has the response list:
Red      1
Yellow   2
Blue     3
Green    4
Black    5
White    6
These codes are coded into one column. If my favorite color is green, this will appear in the data file as a 4 in the appropriate column, just as if your favorite color is red, there will be a 1 in that column. To refer to these answers inside your Quantum program (maybe we only want our table to include those respondents whose favorite color is blue), type in the code enclosed in single quotes:
'3'
You will also have to tell Quantum which column to look in.
Several codes may be combined in the same column; these are called multicodes. Multicoding means two or more codes in the same column. Suppose the next question asks me to choose three colors from the same list; I pick yellow, black and white. If these answers were all coded in the same column (a multicoded column), they would be referred to as:
'256' or '526' or '652'
or any other variation of those three codes. Quantum does not care what order the codes are entered in.
If you have a series of consecutive codes in the order &-01234567890-& you may either type each code separately or you may enter the first and last codes separated by a slash (/) meaning ‘through’, as shown below:
'1/7' means '1234567'
'&/4' means '&-01234'
'&/9' means '&-0123456789' (all 12 codes)
'1/&' means '1234567890-&' (all 12 codes)
As you can see, the last two examples mean exactly the same thing. However, the notations '0/&' and '0-&' are not the same: '0/&' means '01234567890-&' whereas '0-&' is '0', '-' and '&' only.
Some combinations of codes represent ASCII characters; that is, they represent characters which you can type on your screen:
'&1' is the equivalent of 'A'
'&2' is the equivalent of 'B'
The only time you would use letters rather than codes (i.e., 'A' rather than '&1') is when the questionnaire tells you that a column should contain a letter.
Sometimes we may need to write a notation for ‘no codes’, for instance if my favorite color does not appear in the list of choices. To do this, we write ' ' (i.e., a blank enclosed in single quotes).
Strings of data constants
To refer to a string of codes in a field of columns, enclose it between two “$” signs, e.g. $codes$.
When data constants are single-coded or the multicodes correspond to ASCII characters (e.g. 'A', 'B') they may be strung together. Strings of data constants are sometimes called literals or column fields. Strings are enclosed in dollar signs, with the component single codes losing their single quotes. For example:
$12345$
$ABC$
$916 7&$
The first string is five columns long with 1 in the first column, 2 in the second, 3 in the third, and so on. The third string is six columns wide with the fourth column being blank. Instances when strings might be used are:
• When we want to refer to a questionnaire serial number
• When the answers to a question are represented by codes of more than 1 digit. For example, in a car ownership survey the car make and model owned may be represented by a 3-digit code. To pick up respondents owning a particular type of car you would need to check whether the relevant columns contained the code for that car. For instance, to look for owners of Ford Escorts you might ask Quantum to search for the string $132$ in a particular field of columns.
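The slash (‘through’) notation and the order-independence of multicodes, both described under individual constants, can be modelled in a few lines of Python. This is only an illustrative sketch and not Quantum syntax: the names CODE_ORDER, expand_codes and same_multicode are hypothetical, and an ASCII hyphen stands in for the 11 punch.

```python
# Hypothetical Python model of Quantum's code notation (not Quantum syntax).
# The order of consecutive codes quoted in the text; '-' stands for the 11 punch.
CODE_ORDER = "&-01234567890-&"


def expand_codes(spec: str) -> str:
    """Expand 'first/last' into the run of consecutive codes it stands for.

    A spec without a slash (for example '0-&') is returned unchanged,
    because it already lists its codes one by one.
    """
    if "/" not in spec:
        return spec
    first, last = spec.split("/")
    start = CODE_ORDER.index(first)      # first place the starting code occurs
    end = CODE_ORDER.index(last, start)  # first place the end code occurs after it
    return CODE_ORDER[start:end + 1]


def same_multicode(a: str, b: str) -> bool:
    """Two multicodes are the same if they contain the same codes, in any order."""
    return set(a) == set(b)


print(expand_codes("1/7"))
print(expand_codes("&/4"))
print(same_multicode("256", "652"))
```

Treating a multicode as a set of codes is a modelling choice made here so that '256', '526' and '652' compare as equal; how Quantum stores multicodes internally is not covered in this section.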
Different Number types that can be used in Quantum:
Quantum can deal with whole numbers (integers) in the range -2,147,483,647 to 2,147,483,647.
Real numbers are numbers containing decimal points. To be valid, they must have at least one digit on either side of the decimal point:
0.1 and 1.0 are correct
.1 and 1. are not
Quantum deals with real numbers of any size with accuracy up to six significant figures. Numbers with more than six significant figures have the sixth figure rounded up or down depending on the value of the remaining figures:
96.82529 is rounded to 96.8253
189462.1 is rounded to 189462.0
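The two rules above (a digit on each side of the decimal point, and rounding to six significant figures) can be checked with a short Python sketch. The helper names is_valid_real and to_six_figures are hypothetical, and the sketch only models the rules as stated here; it says nothing about how Quantum stores real numbers.

```python
import re

# At least one digit on either side of the decimal point, as the text requires.
_REAL_PATTERN = re.compile(r"\d+\.\d+")


def is_valid_real(text: str) -> bool:
    """Return True if text is written the way the manual says a real must be."""
    return _REAL_PATTERN.fullmatch(text) is not None


def to_six_figures(value: float) -> float:
    """Round value to six significant figures using the 'g' format type."""
    return float(f"{value:.6g}")


for sample in ("0.1", "1.0", ".1", "1."):
    print(sample, is_valid_real(sample))

print(to_six_figures(96.82529))
print(to_six_figures(189462.1))
```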
Variables and arrays
There are three types of variables (data, integer and real), each used for storing different types of information. You may create your own variables with names representing the type of information stored (e.g., the variable called meals might contain a count of the number of meals eaten during the day) or you may use the ones offered automatically by Quantum. Sometimes it is useful for a series of variables to have the same name. Each variable may then be addressed by its position in the group. This arrangement is known as an array.
Data variables
To define a data variable, type:
data var_name sizes
At the start of every job, Quantum provides you with an array of 1,000 data cells called C. This array is sometimes referred to as the C matrix. The individual cells are called C-variables. Each C-variable stores one ‘column’ of data. Quantum reads data from your data file into this array.
Let’s say we have a very small questionnaire which uses 43 columns to store the data. Quantum will read the data for each respondent into cells 1 to 43 of the C array, one respondent at a time. The codes from column 1 of the data are copied into cell 1 of the C array, the codes from column 2 of the data are copied into cell 2, and so on. When Quantum has finished with that respondent’s data it clears out the cells in the C matrix and reads the data for the next respondent, placing it in cells 1 to 43 of the array.
We can access this data by defining the columns whose contents we wish to inspect or change. Let’s take the questions about color that we mentioned earlier. The printed questionnaire tells us that the respondent’s favorite color will be coded into column 15. To look at this column we would write:
c15 or c(15)
C-variables are reset to blank before a new respondent’s data is read. Thus, you can be certain that Quantum never muddles the contents of column 10 for the first respondent with those of c10 for the second respondent.
As we mentioned above, you may create your own data variables to store specific pieces of data. For instance, in a shopping survey we may want to store data about visits to Sainsburys in an array called ‘sains’ and data about visits to Safeways in an array called ‘safe’. Before we can use these arrays, we must create them. If each array is to contain 100 cells or columns of data, we would write:
data sains 100s
data safe 100s
where the s at the end of each statement causes Quantum to recognize that, for example, safe1 is the same as safe(1), just as it knows that c15 and c(15) refer to the same column of data. If you created the arrays without the s, then Quantum would not recognize safe1 as being the same as safe(1).
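The way the C array is refilled for each respondent can be pictured with a small Python model. It is a sketch under simplifying assumptions: every column is single-coded and held as one character, and the names new_c_array and read_record are hypothetical rather than anything Quantum provides.

```python
C_SIZE = 1000  # the C array has 1,000 cells


def new_c_array() -> list:
    """Create a blank C array; index 0 is unused so that c[15] is column 15."""
    return [" "] * (C_SIZE + 1)


def read_record(c: list, record: str) -> None:
    """Blank every cell, then copy one respondent's columns into cells 1..n.

    Assumes one character per column and a record no longer than C_SIZE.
    """
    c[1:] = [" "] * C_SIZE
    for column, code in enumerate(record, start=1):
        c[column] = code


c = new_c_array()

# First respondent: a 43-column record with favorite color green (4) in column 15.
read_record(c, " " * 14 + "4" + " " * 28)
print(repr(c[15]))

# Second respondent: column 15 left unanswered, so nothing carries over.
read_record(c, "1" + " " * 42)
print(repr(c[15]))
```

Resetting the whole array before each read is what guarantees that one respondent's codes never leak into the next respondent's columns.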
Integer variables
To define an integer variable, type:
int var_name sizes
To refer to an integer variable, type:
name[cell_number]
Integer variables store whole numbers. Strings of integer variables are called integer arrays, and each cell in the array may store any whole number from -2,147,483,647 to 2,147,483,647. At the start of each run, Quantum provides an array of 200 integer variables called T. The first cell in this array is the integer variable t1, which may store any value within the given range; the second cell in the array is the integer variable called t2, which may also store any value within the given range.
To illustrate the difference between a data variable and an integer variable, let’s suppose that our data contains the value of the respondent’s car to the nearest whole pound. If the value is £6,000, this will take up 4 columns in the data (assuming that we are only concerned with the digits). That is, it needs four data variables, the first of which will contain the 6, and the other three of which will all contain zeroes. If we placed this same value in an integer variable, we would only need one variable to store the whole value because each variable can store values in the range from -2,147,483,647 to 2,147,483,647.
We have already mentioned that Quantum provides an integer array of 200 integer variables. You may create your own arrays using statements similar to those shown above for data variables. Suppose you have a household survey in which you have collected the value of each car that the family owns. You want to set up an integer array in which to store each value, so you write:
int carval 10s
This creates an array called carval which contains ten separate integer variables called carval1 to carval10. Notice that we have followed the array size with the letter s so that we can omit the parentheses from the individual variable names. We can then copy the value of the first car into carval1, the value of the second car into carval2, and so on. If a particular household owns three cars valued at £6,000, £2,500 and £500, then carval1 would have a value of 6,000, carval2 would be 2,500 and carval3 would be 500. If you create your own integer variables, it is recommended that you give them names that reflect their purpose in the run.
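The contrast between four data columns and one integer variable can be illustrated in Python. This is a loose model rather than Quantum code: the store helper is hypothetical, the list index 0 is left unused so that carval[1] plays the role of carval1, and starting every cell at 0 is an assumption made for the sketch.

```python
INT_MIN, INT_MAX = -2_147_483_647, 2_147_483_647  # range quoted in the text


def store(cells: list, position: int, value: int) -> None:
    """Put value into cells[position], refusing anything outside the range."""
    if not INT_MIN <= value <= INT_MAX:
        raise ValueError(f"{value} is outside the integer range")
    cells[position] = value


# Model of 'int carval 10s': ten cells, index 0 unused (initial 0 is assumed).
carval = [0] * 11

# The value 6000 occupies four separate data columns...
columns = ["6", "0", "0", "0"]

# ...but only one integer variable once the digits are read as a number.
store(carval, 1, int("".join(columns)))
store(carval, 2, 2500)
store(carval, 3, 500)
print(carval[1:4])
```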
Real variables
To define a real variable, type:
real var_name sizes
To refer to a real variable, type:
name[cell_number]
You may define real variables and arrays to store real numbers with accuracy up to six significant figures. Values with more than six significant figures have the sixth figure rounded up or down according to the value of the extra figures. As with integer variables, the names of real variables should give some clue to the type of information they contain. Real arrays are created by statements of the form:
real liters 5s
This example creates a real array called liters which has five real variables named liters1 to liters5. It can store five real values, the first in liters1 and the fifth in liters5. Quantum also provides a set of 100 real variables named X which you may use.
As an example, let’s say that the data contains information on how long, on average, each person in the household spent watching television during a given week. We want to manipulate these figures so we create an array of real variables in which to store the average viewing figures:
real tvwatch 8s
This provides room for up to eight people’s figures. If our household contains four people with viewing averages of 20.8 hours, 15.75 hours, 9.75 hours and 10.0 hours, then tvwatch1 will have a value of 20.8, tvwatch2 will have a value of 15.75, tvwatch3 will be 9.75 and tvwatch4 will be 10.0. The rest of the variables in the array have values of 0.0.
Reading real numbers from columns
To read real values from the C array, type:
cx(start_col, end_col)
Data from the questionnaire is read into columns for use during the run. When the data contains real numbers you will have to tell Quantum that the dot is to be treated as a decimal point rather than as a multicode representing a number of different answers. The way to do this is to refer to the field as cx:
cx(15,20)
cx(131,135)
Here we have two fields containing real numbers: the first is six columns wide including the decimal point, which means that the number itself contains five digits, whereas the second is only five columns wide with four digits. Notice that there is no need to tell Quantum where the decimal point is.
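A rough Python equivalent of reading a real number from a field of columns is sketched below. The function name cx is borrowed from the Quantum notation purely for readability; the implementation, the sample values 123.45 and 12.75, and the decision to skip blank columns are all assumptions made for illustration.

```python
def cx(c: list, start_col: int, end_col: int) -> float:
    """Read columns start_col..end_col (inclusive) as one real number.

    The dot found in the field is treated as the decimal point, so the
    caller does not say where it is. Blank columns are skipped (assumed).
    """
    text = "".join(c[start_col:end_col + 1]).replace(" ", "")
    return float(text)


# A blank 1,000-cell C array model; index 0 is unused so c[15] is column 15.
c = [" "] * 1001

# Hypothetical data: a six-column field (five digits plus the dot) ...
c[15:21] = list("123.45")
# ... and a five-column field (four digits plus the dot).
c[131:136] = list("12.75")

print(cx(c, 15, 20))
print(cx(c, 131, 135))
```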
Subscription As we have shown above, you may refer to specific variables in integer and real arrays and cells or columns in data arrays by naming their position in the array. For example: c1 is the first column of the C array t5 is the fifth variable in the T array time3 is the third variable in the array called time seg(2) is variable 2 of the seg array Variables within an array may also be referred to using any arithmetic expression. In this case, parentheses must be used. For example:
c(t1)
the column number depends on the value of t1. If t1 has a value of 10, then the variable is c10; if t1 is 67, the variable is c67.
c(t4,t5)
the field delimiters depend on the values of t4 and t5. If t4 has a value of 12 and t5 has a value of 19, the column field referred to is c(12,19).
t(c4)
the variable number depends on the value in c4. If c4 contains a single code in the range 1 to 9, the integer variable will be one of t1 to t9 depending on the exact value in c4. If c4 is multicoded, then the result is nonsense.
time(c4*23)
the variable number is the result of multiplying the value in c4 by 23 As in the previous example, c4 must be single-coded in the range 1 to 9 for this example to make sense. Thus, if c4 contains just a 4, the value of the expression is 92 so the variable referred to is time92.
When variables are referenced in this way, the value of the expression must be positive. The expression c(t1-5) is acceptable as long as t1 is greater than 5. If the expression has a zero or negative value, Quantum will issue an array dimension error when it comes to read the data during the datapass. Also, if the variable refers to columns, the value of the subscript must not exceed 32,767. These are called subscripted variables and they greatly increase the flexibility with which you can write your edit.
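The subscript rules can be illustrated outside Quantum. The following Python sketch is not Quantum code: the helper name resolve_subscript is hypothetical, and a Python exception stands in for Quantum's array dimension error. It only models the two rules stated above, namely that a subscript must be positive and that a column subscript must not exceed 32,767.

```python
MAX_COLUMN_SUBSCRIPT = 32767  # limit stated in the text for column subscripts


def resolve_subscript(array_name, value, is_column=False):
    """Return the name of the variable that a subscript expression refers to.

    Hypothetical helper for illustration only; Quantum resolves
    subscripts itself during the datapass.
    """
    if value <= 0:
        # Stands in for Quantum's array dimension error.
        raise IndexError("array dimension error: subscript must be positive")
    if is_column and value > MAX_COLUMN_SUBSCRIPT:
        raise IndexError("column subscript exceeds 32,767")
    return f"{array_name}{value}"


t1 = 10
c4 = 4
print(resolve_subscript("c", t1, is_column=True))  # models c(t1)
print(resolve_subscript("time", c4 * 23))          # models time(c4*23)
```

With t1 set to 10 and c4 set to 4, the two calls name c10 and time92, matching the examples in the text.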
Expressions Quantum recognizes two types of expression – arithmetic and logical. Arithmetic expressions are used to produce numeric values and logical expressions, when evaluated, produce a value of true or false.
Arithmetic expressions The simplest form of arithmetic expression is a single positive or negative number such as 10 or 26.5 or an integer or real variable. Although the C Array is data, columns may also be used in arithmetic when the response coded into those columns is a numeric response, such as a respondent’s age or the number of different shops he visited. For example, if columns 243 to 247 contain the codes 4,7,2,6 and 0 respectively the value in c(243,247) could be read as 47,260. Similarly, if columns 45 to 48 contain 7, 8, a dot and 2 respectively, the value in cx(45,48) would be 78.2. Blank columns in a field are ignored when the codes in those columns are evaluated. Thus, if columns 20 to 21 contain the codes 6 and 7 respectively, and column 22 is blank, the codes in c(20,22) will be evaluated as 67. A similar result is produced if the blank column appears anywhere else in the field. All the examples of c(20,22) below produce an arithmetic value of 67:
Column 20   Column 21   Column 22
    6           7         blank
    6         blank         7
  blank         6           7
The same applies to multicoded columns. If you use a multicoded column as part of an arithmetic expression, the multicoded column will be ignored. The exception to this is a multicode of a digit and a minus sign which creates a negative number: a minus sign anywhere in a numeric field negates the value in the field as a whole, not just the number it is multicoded with.
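For example, a field containing 1, 2, 3 and 4, in which the column holding the 3 is also coded with a minus sign:
12-4
  3
is -1234
and a field containing 8 and 3 followed by a column coded only with a minus sign:
83-
is -83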
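The evaluation rules for numeric fields can be summarized in a short Python sketch. This is a model, not Quantum's implementation: the function name field_value is hypothetical, each column is represented as a string of the codes punched in it (an empty string is a blank column), and the treatment of a field with no usable digits (returning 0) is an assumption.

```python
def field_value(columns):
    """Model the arithmetic value of a field of columns.

    columns: list of strings, one per column, each holding the codes
    in that column ("" for a blank column, "3-" for a 3 multicoded
    with a minus sign).
    """
    digits = []
    negative = False
    for codes in columns:
        present = set(codes.replace(" ", ""))
        if "-" in present:
            # A minus sign anywhere negates the whole field.
            negative = True
            present.discard("-")
        if len(present) == 1:
            (code,) = present
            if code.isdigit():
                digits.append(code)
        # Blank columns and other multicodes are ignored.
    value = int("".join(digits)) if digits else 0  # 0 is an assumption
    return -value if negative else value


print(field_value(["6", "7", ""]))          # blank in the last column
print(field_value(["1", "2", "3-", "4"]))   # minus multicoded with the 3
print(field_value(["8", "3", "-"]))         # minus alone in the last column
```

The three calls reproduce the values 67, -1234 and -83 described in the text.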
Combining arithmetic expressions
To combine arithmetic expressions, type:
variable operator variable [operator variable ... ]
where variable is a numeric value or the name of a variable containing a numeric value, and operator is one of the arithmetic operators + (add), - (subtract), * (multiply) or / (divide). More often than not you will want to combine numeric expressions to form a larger expression, for instance to count the number of records read with a given code in a named column. Arithmetic expressions are linked with any of the arithmetic operators +, -, * and /. Expressions may contain more than one of these operators, for instance:
t5 + c(134,136) / tot
c(150,152) * 10 + 2.5
Quantum evaluates such expressions in the following order:
1. Expressions in parentheses.
2. Multiplication and division.
3. Addition and subtraction.
If you wish to change this order you should enclose the expressions which go together in parentheses. The first expression in the example above will be evaluated by dividing the value in columns 134 to 136 by tot and adding the result to t5. If you change the expression to:
(t5 + c(134,136)) / tot
this adds the values of t5 and c(134,136) first and then divides that by tot. Let’s substitute numbers and compare the results. If t5=10, tot=5 and the value in c(134,136) is 125, the two versions of the expression would read as follows:
10 + 125 / 5 = 35
and
(10 + 125) / 5 = 27
Where two integer expressions are combined, the result is integer (any decimal places are ignored), but if an expression contains a real then the result will be real. Therefore, if t1=5 and t2=3, then:
t1 + 4 = 9
t1 + 4.0 = 9.0
t1 * t2 = 15
t1 / t2 = 1
t1 * 1.0 = 5.0
t1 * 1.0 / t2 = 1.67 (to two decimal places)
If you use parentheses in expressions which contain both integer and real variables, you need to take extra care to ensure that your expression is producing the correct results. Let’s look at an example to illustrate how an expression can look correct but can still produce unexpected results. If we assume that t40=2 and t41=70, the expression:
t40 * 100.0 / t41
yields a result of about 2.86 (i.e., 200.0/70). The final value will be about 2.86 if the result is saved in a real variable, or 2 if it is saved in an integer variable. If we use parentheses:
(t40 / t41) * 100.0
the result is 0.0 (or 0 if saved in an integer variable). The reason for this is as follows. Because Quantum evaluates expressions in parentheses before it deals with the rest of the expression, it treats that expression as integer arithmetic. The rules for integer arithmetic dictate that real results are truncated at the decimal point, so the true result of about 0.029 becomes 0. Any multiplication involving zero is always zero, so the final result is zero. If you find that a run gives unexpected zero results, try looking for expressions of this type and checking whether the parenthesized part of the expression has been truncated because the integer division results in a decimal number.
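The effect of integer truncation is easy to reproduce in any language. The Python sketch below is an illustration only: the helper quantum_divide is a hypothetical name, and it assumes that integer division simply drops the decimal places, as the text describes.

```python
def quantum_divide(a, b):
    """Model division as described in the text.

    Two integer operands give an integer result with the decimal
    places dropped; if either operand is real, the result is real.
    """
    if isinstance(a, int) and isinstance(b, int):
        return int(a / b)  # drop the decimal places
    return a / b


t40, t41 = 2, 70

# Real arithmetic throughout: 200.0 / 70
unparenthesized = quantum_divide(t40 * 100.0, t41)

# Parenthesized integer division happens first: 2 / 70 becomes 0
parenthesized = quantum_divide(t40, t41) * 100.0

print(unparenthesized)
print(parenthesized)
```

The first expression keeps its decimal places because 100.0 makes the left-hand operand real before the division takes place. The second loses everything because the division is carried out on two integers before the multiplication by 100.0.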
Counting the number of codes in a column
To count the number of codes in a column or list of columns, type:
numb(cn1[’codes’], cn2[’codes’], ... )
If any columns are followed by a code reference, only those codes will be counted for those columns. The function numb is an arithmetic expression which counts the number of codes in a column or list of columns. Its format is:
numb(cn1,cn2, ... cnn)
where cn1 to cnn are the columns whose codes are to be counted. So, if we wanted to count the number of codes in columns 132 to 135 we would type:
numb(c132,c133,c134,c135)
Notice that even though the columns are consecutive, each one is entered separately, with each column number preceded by a ‘c’. It is incorrect to define only the start and end columns of a field when using numb. Therefore it is wrong to write numb(c(132,135)) and, if you write statements such as this, Quantum will flag them as errors. Sometimes you will only be interested in certain codes, for instance you may want to know how many 1, 2 or 3 codes there are in a group of columns. In this case the function is entered as:
numb(cn1’p1’,cn2’p2’, ... cnn’pn’)
where p1 to pn are the codes to be counted. Only the named codes are counted – any others appearing in the columns are ignored. Let’s say our data on card 1 is as follows:
(Data layout: column 115 contains four codes, column 121 contains the codes 2 through 6, and column 157 contains the codes 1 through 7.)
and we want to count the number of codes in column 115 and also the number of codes in the range ’5/8’ in columns 121 and 157. The expression would be entered as:
numb(c115,c121’5/8’,c157’5/8’)
When Quantum checks these columns and codes, it will tell us that there are 9 codes in these columns which are within the given ranges. These codes are all four codes in column 115 (we did not specify which codes to count in that column), codes 5 and 6 in column 121 (codes 2 to 4 are outside the given range), and codes 5 to 7 in column 157 (codes 1 to 4 are outside the given range).
Generating a random number
To generate a random number in the range 1 to n, type:
random(n)
Quantum can generate random numbers automatically with the random function:
random(n)
where n is the maximum value the random number may take. So, to generate a random number in the range 1 to 100, the expression would read:
random(100)
The number produced may be saved for later use in an integer variable or column, thus:
rnum=random(32)
c(110,112)=random(156)
When using random with columns, always make sure that the number of columns allocated to the number is sufficient to store the highest possible number that can be generated. In our example, we need three columns in order to store numbers up to 156.
Logical expressions
Logical expressions are used for comparing values, codes and variables.
Comparing values
To compare the values of two arithmetic expressions, type:
expression log_operator expression
where each expression is an arithmetic expression and log_operator is one of the operators .eq., .gt., .ge., .lt., .le. or .ne.
Values are compared when you need to check whether an expression has a given value, for example, did the respondent buy more than 10 pints of milk? Values are compared by placing arithmetic expressions on either side of one of the following operators:
Exp.    Value
.eq.    equal to
.gt.    greater than
.ge.    greater than or equal to
.lt.    less than
.le.    less than or equal to
.ne.    not equal to / unequal to
If the number of pints of milk that the respondent bought is stored in columns 114 and 115, the expression to check whether he bought more than ten pints would be:
c(114,115) .gt. 10
If the number in these columns is greater than ten the expression is true, otherwise it is false. Earlier we said that integer variables may take numeric values or the logical values true and false depending upon whether or not the value is zero. To check whether the respondent bought any packets of frozen vegetables, we can either write:
fveg .gt. 0
to check the numeric value of the variable fveg, or we can simply say:
fveg
to check whether the logical value of fveg is true. To check whether fveg is false (i.e. zero), we would write:
.not. fveg
Comparing data variables and data constants
In virtually every Quantum run you will want to check which codes occur in which columns. This is easily done using logical expressions. There are several forms of expression depending on whether you are checking a column or a field of columns.
Data variables To test whether a data variable contains at least one of a list of codes, type:
var_name’codes’
To test whether a data variable contains none of the listed codes, type:
var_namen’codes’
To test whether a data variable contains exactly the given codes and nothing else, type:
var_name = ’codes’
To test whether two data variables contain identical codes, type:
var_name1 = var_name2
To test whether a data variable contains codes other than those listed, type:
var_nameu’codes’
To test whether two data variables do not contain identical codes, type:
var_name1uvar_name2
To check whether a column or data variable contains certain codes, place the codes, enclosed in single quotes, immediately after the name of the column or data variable: e.g.
c1’1’
c156’23’
brand’5’
The expression:
Cn’p’
checks whether a column (n) contains a certain code or codes (p). The expression is true as long as column n contains at least one of the given codes. It does not matter if there are other codes present since these are ignored. For example, to check whether column 6 contains any of the codes 1 through 4 we would type:
c6’1/4’
The expression is true if C6 contains any of the codes 1, 2, 3 or 4 or any combination of those codes, regardless of what other codes may also be present. For instance:
(codes present in column 6)
Example 1   Example 2   Example 3
    1           1           1
    6           2           3
    8           3           0
    -           4           &
are true, but:
a column 6 containing only the codes 5, 7, 9 and - is false. In our original example we chose the codes 1 through 4. You can, of course, use any codes you like and they may be entered in any order.
The opposite of Cn’p’ is:
cnN’p’
which checks that a column does not contain the given code or codes. The expression is true as long as the column does not contain any of the listed codes. For example:
c478n’5/7&’
is true as long as column 478 does not contain a 5, 6, 7 or & or any combination of them. A multicode of ’189’ returns the logical value true, because it does not contain any of the codes ’5/7&’ whereas a multicode of ’1589’ makes the expression false because it contains a ’5’. The ’=’ operator is used to check that the contents of a column are identical to the given codes. The expression:
c312=’1/46’
is true as long as c312 contains all of the codes 1 through 4 and 6, and nothing else. The expression:
c142=’ ’
checks that column 142 is blank. The equals sign is optional when checking for blanks, so we could simply write:
c142’ ’
to check whether column 142 is blank. The ’=’ operator may also be used to compare the contents of two data variables. For example:
c56=c79
checks whether c56 contains exactly the same codes as c79. If so, the expression is true, otherwise it is false. If we have:
Column 56   Column 79
    1           1
    5           5
the expression is true, but:
Column 56   Column 79
    1           1
    5           5
                9
yields the value false because column 79 contains a ’9’ when column 56 does not. If you have defined your own data variables, you could write a statement of the form:
brand1=c79
to check whether the data variable called brand1 contains the same codes as c79.
The opposite of ’=’ is ’U’ (unequal):
cnU’p’
This checks whether column n contains something other than just the code ’p’. Suppose we have two sets of data for column 44:
First set   Second set
    1           1
    4           5
    7           9
and we write:
c44u’7’
The expression is true for both sets of data. In the first example, the ’7’ is multicoded with a ’1’ and a ’4’, while in the second example, column 44 does not contain a ’7’ at all. The only time this expression is false is when column 44 contains a ’7’ and nothing else.
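The four column tests can be modelled with sets. The Python sketch below is an illustration rather than Quantum's own logic. The helper names are hypothetical, and the code order used to expand ranges such as ’1/4’ (1 to 9, then 0, then - and &) is an assumption; the text only shows that 0 follows 9, as in ’6/0’.

```python
CODE_ORDER = "1234567890-&"  # assumed ordering of the codes in a column


def expand(spec):
    """Expand a code specification such as '1/46' into a set of codes."""
    codes = set()
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "/":
            start = CODE_ORDER.index(spec[i])
            end = CODE_ORDER.index(spec[i + 2])
            codes.update(CODE_ORDER[start:end + 1])
            i += 3
        else:
            if spec[i] != " ":  # a space stands for a blank column
                codes.add(spec[i])
            i += 1
    return codes


def contains_any(column, spec):   # models cn'p'
    return bool(set(column) & expand(spec))


def contains_none(column, spec):  # models cnn'p'
    return not contains_any(column, spec)


def exactly(column, spec):        # models cn='p'
    return set(column) == expand(spec)


def unequal(column, spec):        # models cnu'p'
    return not exactly(column, spec)


print(contains_any("168-", "1/4"))   # c6'1/4' with codes 1, 6, 8 and -
print(contains_none("1589", "5/7&")) # c478n'5/7&' with a multicode of 1589
print(unequal("147", "7"))           # c44u'7' with 7 multicoded with 1 and 4
```

Each column is passed as a string of the codes it contains. A blank column is an empty string, so exactly("", " ") models the blank test c142=’ ’.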
Fields of data variables
To test whether a field contains a given list of codes, type: var_name(start, end) = $codes$
To test whether two fields contain identical strings, type: var_name1(start1, end1) = var_name2(start2, end2)
To test whether the codes in one field differ from a given string, type:
var_name(start, end)u$codes$
To test whether the codes in one field differ from those in another, type:
var_name1(start1, end1)uvar_name2(start2, end2)
The contents of data fields must be enclosed in dollar signs with each code in the string referring to a separate column in the field. For instance, to check whether columns 47 to 50 contain the codes -, 6, 4 and 9 respectively we would type:
c(47,50)=$-649$
The only data for which this expression is true is:
+----5-----+ -649
However, if our data read:
c47  c48  c49  c50
 -    5    2    9
 1    6    4    &
the expression would be false because all columns are multicoded. All our examples have used columns, but the same rules apply to data variables that you define yourself. For example:
rating(1,4)=$1234$
checks whether the field rating1 to rating4 contains the codes 1, 2, 3 and 4 in that order. That is, it checks whether rating1 contains a 1, whether rating2 contains a 2, and so on. When checking the contents of fields in this way, make sure that you enter as many columns as there are codes in the string (i.e. five codes require five columns). The exception to this rule occurs when you are checking for blanks, when the expression may be shortened to:
c(50,80)=$ $
This type of statement may also be used to compare two fields, to check whether the second field contains exactly the same codes as the first field. When you compare one field with another, Quantum takes each column in the first field in turn and looks to see whether the corresponding column in the second field contains exactly the same codes. For example, if the first column of the first field contains a code 1 and a code 2 and nothing else, then Quantum will check whether the first column of the second field also contains a code 1 and a code 2 and nothing else. If all columns of the second field are identical to their counterparts in the first field, then the expression is true; otherwise it is false. Here is an example:
c(129,132)=c(356,359)
For this expression to be true, column 129 must contain exactly the same codes as column 356, column 130 must be exactly the same as column 357, and so on. Once again the two expressions on either side of the equals sign must be the same length. Comparisons of one data variable against another are concerned with columns and codes: they are not concerned with the arithmetic values of the codes in the fields as a whole.
If we have:
Columns 24 and 25: 02
Columns 34 and 35:  2 (column 34 is blank)
the expression:
c(24,25)=c(34,35)
is false because the string $02$ is not the same as the string $ 2$. If you want to compare fields arithmetically (i.e., is 02 the same as 2) then you will need to use the .eq. operator:
c(24,25).eq.c(34,35)
to test whether the value in c(34,35) is equal to the value in c(24,25). The .eq. operator is described in the section entitled "Comparing values". To check whether the codes in one field match a given string or the codes in another field, we can use the = (equals) operator:
c(m,n)=$codes$
cm=cn
c(m,n)=c(m1,n1)
If the codes in the field c(m,n) match the given string or the codes in c(m1,n1) then the expression is true. If the two fields are not identical, then the expression is false. Let’s look at an example of the unequals operator. The statement:
c(67,69)u$123$
is true at all times unless columns 67 to 69 contain exactly the codes 1, 2 and 3 respectively.
The expression:
c(67,69)uc(77,79) is true as long as columns 67 to 69 differ by at least one code from columns 77 to 79. If our data is:
Columns 67 to 69   Columns 77 to 79
      123                256
the expression is true because each of columns 77 to 79 differs from its counterpart in columns 67 to 69. Also, if we have:
Columns 67 to 69   Columns 77 to 79
      123                123
                         5
the expression is true because column 77 is multicoded ’15’. The only time the expression is false is when columns 67 to 69 are identical to columns 77 to 79.
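The difference between comparing codes and comparing arithmetic values can be shown with a small model. In this Python sketch, which is not Quantum code, a field is a list of columns and each column is a set of codes. All helper names are hypothetical, and the shortened blank test c(50,80)=$ $ is deliberately not modelled.

```python
def col(codes):
    """Build one column from its codes; a blank column is the empty set."""
    return frozenset(codes.replace(" ", ""))


def field(*columns):
    """Build a field from one string of codes per column."""
    return [col(c) for c in columns]


def equals_string(fld, string):
    """Models c(m,n)=$codes$: one code per column, and nothing else."""
    return len(fld) == len(string) and all(
        c == col(ch) for c, ch in zip(fld, string)
    )


def fields_equal(a, b):
    """Models c(m,n)=c(m1,n1): every column pair holds identical codes."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def numeric_value(fld):
    """Arithmetic value of single-coded digits; other columns are ignored."""
    digits = "".join(
        next(iter(c)) for c in fld if len(c) == 1 and next(iter(c)).isdigit()
    )
    return int(digits) if digits else 0


c24_25 = field("0", "2")   # the string $02$
c34_35 = field(" ", "2")   # the string $ 2$

print(fields_equal(c24_25, c34_35))                    # code comparison
print(numeric_value(c24_25) == numeric_value(c34_35))  # models .eq.
```

The code comparison fails because column 24 holds a 0 while column 34 is blank, whereas the arithmetic comparison succeeds because both fields have the value 2. The unequals operator u is simply the negation of fields_equal or equals_string.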
Checking the arithmetic value of a field of columns To test whether a value in a field is within a specified range, type: range(start, end, minimum, maximum) Blanks at the start of the field cause this statement to give a false result. To ignore leading blanks, type: rangeb(start, end, minimum, maximum) The logical expression range checks whether the number in a field of columns is within a given range. If so, the expression is true, otherwise it is false. The format of this statement is: range(start,end,min,max) where start and end are column numbers and min and max are the range delimiters. For example, the statement: range(137,139,100,150) will return the value true if the number in columns 37 to 39 of card 1 is in the range 100 to 150.
A variation of range is rangeb which allows columns to the left of the field to be blank if the number is right-justified in the field. In all other respects it is exactly the same as range. If our data is:
----+----2
   123 6
the expression:
rangeb(17,18,1,10)
will be true because the string $ 6$ will be read as 6. With range the value would be false. However, the expression:
rangeb(15,18,2000,3000)
returns false because of the blank in c17.
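The behaviour of range and rangeb on this data can be sketched in Python. This is a model under stated assumptions, not Quantum's implementation: the function names are hypothetical, columns are assumed to be single-coded, negative numbers are not handled, and range is assumed to reject any blank in the field (the text only states this for blanks at the start).

```python
DIGITS = "0123456789"


def quantum_range(text, minimum, maximum):
    """Model of range: every column in the field must hold a digit."""
    if not text or not all(ch in DIGITS for ch in text):
        return False
    return minimum <= int(text) <= maximum


def quantum_rangeb(text, minimum, maximum):
    """Model of rangeb: leading blanks are ignored, then range applies."""
    return quantum_range(text.lstrip(" "), minimum, maximum)


# Columns 11 to 20 of the record shown above: 123 in c14 to c16, 6 in c18.
record = "   123 6  "


def columns(start, end):
    """Return the contents of columns start..end (11-based record)."""
    return record[start - 11:end - 10]


print(quantum_rangeb(columns(17, 18), 1, 10))       # field is ' 6'
print(quantum_range(columns(17, 18), 1, 10))        # leading blank
print(quantum_rangeb(columns(15, 18), 2000, 3000))  # field is '23 6'
```

Only the first call is true. The second fails because range does not tolerate the leading blank, and the third fails because the blank in c17 is inside the number rather than in front of it.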
Combining logical expressions
To combine logical expressions, type:
expression operator expression
where operator is one of .or., .and., or .xor.
Two or more logical expressions may be combined into a single expression using the operators:
.and.   both/all true
.or.    one or the other or both/all true
.not.   negates (reverses) an expression
Any number of subexpressions may be combined to form a larger expression, but whether the result is true or false depends upon the values of the subexpressions and also upon the operators used to combine them. The .and. operator requires that all the expressions preceding and following the .and. be true for the whole expression to be true. Thus, the statement:
int1.eq.9 .and. c116’1’
is true if the integer variable int1 has a value of 9 and column 116 contains a 1. If either subexpression is false, the whole expression is false too. By comparison, the .or. operator requires that one expression or the other, or both, be true in order for the whole expression to be true.
c(249,251)=$159$ .or. numb(c132,c135) .gt. 4
For this expression to be true, columns 249 to 251 must contain nothing but a ’1’, ’5’ and ’9’ respectively or the number of codes in columns 132 and 135 must be greater than 4. It is also true if both expressions are true. However, if both are false, the overall result is false. Expressions are reversed (negated) simply by preceding them with the keyword .not. Although it is not wrong to use it with a single variable, it is more generally used to reverse an expression containing the keywords .and. and .or.. Thus, it is not wrong to write .not.c15’1/5’ but it is much simpler to write this as c15n’1/5’.
Take care when using .not. with the .eq. operator. Statements of the form:
.not. c(1,3) .eq. 100
are incorrect and will not work. They should be written as either:
(.not.(c(1,3).eq.100))
with the expression to be reversed enclosed in parentheses, or:
(c(1,3).ne.100)
Any of the operators .and., .or. and .not. may appear in a statement more than once, as long as you use parentheses to define the order of evaluation. For example:
(c15’1/47’ .or. c16’3579’) .and. c22’&’
causes Quantum to check whether the .or. condition is true before dealing with the .and. Suppose our data is:
----+----2----+
    13     &
    79
The first expression (c15’1/47’) is true because column 15 contains a 1 and a 7 and the second expression (c16’3579’) is also true since the codes it contains are amongst those listed as acceptable. Thus, the .or. condition is true. Column 22 contains an ampersand so the last expression is also true, therefore the expression as a whole is true. If both expressions in the parentheses were false, the whole expression would be false regardless of what column 22 contains.
.not. with .and. and .or.
When you use .not. with expressions in parentheses, be very careful that what you write is what you mean. Let’s take the conditions male and married and forget about columns and codes for the minute. The condition:
(Male .and. Married)
refers only to married men. The opposite of this is:
.not. (Male .and. Married)
which refers to unmarried men and all women. This can also be written as:
.not. Male .or. .not. Married
The first .not. collects all the women, the second collects everyone who is not married (e.g. single, widowed etc), and together they collect everyone who is female or unmarried (or both). We use .or. instead of .and. here because the latter will gather unmarried women but will ignore the unmarried men and married women. Reversing .or. expressions works in exactly the same way. The expression:
(Male .or. Married)
means anyone who is Male, or anyone who is Married, or anyone who is Male and Married. The opposite of this is:
.not. (Male .or. Married)
which means anyone who is neither Male nor Married; that is, anyone who is a woman and is unmarried. This can be written as:
.not. Male .and. .not. Married
Thus, we can summarize, as follows:
Positive       Negative              Is the Same as
(A .and. B)    .not. (A .and. B)     .not. A .or. .not. B
(A .or. B)     .not. (A .or. B)      .not. A .and. .not. B
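The equivalences in this summary can be checked exhaustively, since each of A and B can only be true or false. The Python sketch below is an illustration using Python's own and, or and not in place of Quantum's .and., .or. and .not.; the function name and the small list of people are made up for the example.

```python
from itertools import product


def equivalences_hold():
    """Check both rows of the summary table for every combination of A and B."""
    for a, b in product([False, True], repeat=2):
        if (not (a and b)) != ((not a) or (not b)):
            return False
        if (not (a or b)) != ((not a) and (not b)):
            return False
    return True


# Hypothetical respondents for the Male / Married example.
people = [
    {"name": "married man", "male": True, "married": True},
    {"name": "unmarried man", "male": True, "married": False},
    {"name": "married woman", "male": False, "married": True},
    {"name": "unmarried woman", "male": False, "married": False},
]

# .not. (Male .and. Married), written as .not. Male .or. .not. Married
not_married_men = [
    p["name"] for p in people if (not p["male"]) or (not p["married"])
]

# .not. (Male .or. Married), written as .not. Male .and. .not. Married
neither = [
    p["name"] for p in people if (not p["male"]) and (not p["married"])
]

print(equivalences_hold())
print(not_married_men)
print(neither)
```

The first list contains everyone except the married man, and the second contains only the unmarried woman, which agrees with the descriptions given above.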
Here is an example using columns and codes: .not. (c(135,137)=$519$ .or. c160’6/0’) If our data is:
(Data layout: columns 135 to 137 contain 5, 1 and 9 respectively, column 135 is multicoded, and column 160 contains none of the codes 6 through 0.)
the expression is true because c(135,137) do not contain just the codes 5, 1 and 9 (c135 is multicoded), and c160 does not contain any of the codes 6 through 0. The expression will be false if either:
A) Column 135 contains a 5 only, column 136 contains a 1 only and column 137 contains a 9 only, or
B) Column 160 contains any of the codes 6 through 0, either singly or as a multicode.
We could therefore write the expression as: .not. c(135,137)=$519$ .and. .not. c160’6/0’
Comparing variables and arithmetic expressions to a list
To compare the value of a variable or an arithmetic expression to a list of numbers, type:
item .in. (value1, value2, ... )
Ranges of numbers may be entered in the list as start:end. If the item is a reference to a field containing blanks, enter the values as strings of codes enclosed in dollar signs. Example:
C(3,5).in.($123$,$765$,$ 26$)
C(120,122).in.(100,110,200:250)
From time to time you may need to check whether a variable or arithmetic expression has one of a given list of values. For example, if the questionnaire codes brands of frozen vegetables as 3-digit codes into columns 145 to 147 we might want to check that only valid codes appeared in this field. This is achieved using the logical expression .in. as follows:
variable-name .in. (list) or arithmetic-exp .in. (list) where variable-name is that of the variable to be checked and list is a list of permissible values. The arithmetic expression is an expression consisting of data or integer variables, arithmetic operators and integer values as described earlier in this chapter. If the variable or arithmetic expression has one of the listed values, the expression is true, if not, it is false. The left-hand side of the expression may contain integer variables, columns or data variables containing whole numbers, or expressions using these types of variables. If it is a data variable, then the list may contain codes enclosed in dollar signs. Quantum will then compare the codes in the data variable with the codes inside the dollar signs. We could therefore check that the frozen vegetables have been coded correctly by keying in a statement which says:
c(145,147) .in. ($205$,$206$,$207$,$210$,$215$,$220$)
Quantum will flag any records in which c(145,147) does not contain exactly 205, 206, 207, 210, 215 or 220 (i.e. three single-coded columns) as incorrect. If the data variable contains a valid positive or negative whole number, then the list may also contain such values. Ranges of values may be entered in the form min:max, where min is the lowest acceptable value and max is the highest. Since the frozen vegetables have numeric codes, we could write the expression as:
c(145,147) .in. (205:207,210,215,220)
Any columns in the field which contain non-numeric data (e.g. multicodes) will be flagged as incorrect, as will any which contain values which do not match the specification. Sometimes, though, the codes and numbers will not be interchangeable. If you have 2-digit codes in a 3-column field, the statement:
c(206,208) .in. ($ 10$,$ 11$,$ 12$,$ 13$)
is not the same as:
c(206,208) .in. (10:13)
unless column 206 is always blank. If the 2-digit codes have been padded on the left with zeroes instead of blanks (i.e., 010, 011) or if they all start in column 206 (i.e., $10 $, $11 $), then the first expression will be false, even though the second one will still be true. If the left-hand side of the expression is an integer variable or an arithmetic expression, the list may contain positive or negative whole numbers:
total .in. (100,200,500:1000)
Lists may contain up to 247 values or codes, which may be entered in any order. In our examples, we have always entered them in ascending order, but this is not a requirement of Quantum. You may enter codes in a list in any order you like. The exception is numeric ranges, which must be entered in the form lowest:highest.
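The distinction between a list of code strings and a list of numbers can be seen in a small model. This Python sketch is an illustration only: the function in_list and the tuple notation for ranges are assumptions made for the example and are not part of Quantum.

```python
def in_list(value, items):
    """Model of .in. for numbers.

    items holds whole numbers and (lowest, highest) tuples, the
    tuples standing for ranges written as lowest:highest.
    """
    for item in items:
        if isinstance(item, tuple):
            lowest, highest = item
            if lowest <= value <= highest:
                return True
        elif value == item:
            return True
    return False


fveg = [(205, 207), 210, 215, 220]   # models (205:207,210,215,220)
print(in_list(206, fveg))
print(in_list(208, fveg))

# Code strings compare column by column; numbers compare by value.
code_strings = [" 10", " 11", " 12", " 13"]
numbers = [(10, 13)]

for contents in (" 10", "010", "10 "):
    as_codes = contents in code_strings
    as_number = in_list(int(contents), numbers)
    print(repr(contents), as_codes, as_number)
```

Only the field padded with a leading blank matches the code strings, while all three forms match the numeric range, which is the difference the text describes.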
Naming lists To assign a name to a list of values, type: definelist name=(list) where list is a comma-separated list of numbers, ranges or code strings enclosed in dollar signs. If you have a list that is used more than once you may give it a name and refer to it by that name instead of typing in the complete list each time. To name a list, write: definelist name=(list) For example: definelist fveg=(205:207,210,215,220) To use a defined list, simply replace the list with the name: c(145,147) .in. fveg
Speeding up large programs To speed up your Quantum program by converting expressions of the form c(1,4)=$1234$ into C in a more efficient way, type: inline n where n is the maximum field width to be converted in this manner. This statement must appear at the start of the edit. If you have a large edit, you can speed up the time it takes to run by including the inline statement in your edit. This instructs the Quantum compiler to convert expressions of the form c(1,4)=$1234$ into statements in the C programming language in a different way to the way it
normally does. You need not worry about these different methods of conversion, apart from deciding whether or not to use them. If you want to speed your program up, place a statement of the form: inline n at the beginning of the edit section, where n is the maximum field width to be converted in the special way. For example: inline 6 Here we are saying that fields of six columns or less should be converted in the special way rather than in the normal way.
How Quantum reads data
In order for the answered questionnaire to be processed, the information contained on the questionnaire must be read into the computer into a location where Quantum can access it. This is done by reading the data into the data variable array called C which is supplied automatically with every Quantum run. You may then access this data by addressing this array. Different types of records are read into the C Array in different ways.
Types of record
Quantum deals with three types of record: ordinary, multicard and multicard with trailer cards.
Ordinary records
These are strings of codes and numbers, one per respondent, up to a maximum of 32,767 characters per respondent.
Multicard records
When data originates from punched cards and each questionnaire requires more than 80 columns, the data is spread over several cards. So that all cards belonging to a particular respondent may be easily identified, each questionnaire is assigned a serial number which is entered as part of the data for each card. Within this, each card has a unique card type or card number to distinguish it from others in the group. It is important that both the serial number and card type be in the same relative positions on all cards in the file, since this is the only way that Quantum can tell which data belongs to which respondent. If the questionnaire serial number is in columns 1 to 4 of each card and the card type is in column 5, and we are looking at questionnaire 1005, we will see that it has two cards whose first five columns are 10051 and 10052 respectively. Quantum can deal with records that contain up to 327 cards per respondent. Occasionally you may have multicard records in which each ‘card’ is greater than 80 columns. The notes that follow refer to multicard records of up to 100 columns per card.
Multicard records with Trailer Cards
Sometimes a record contains very repetitive data which is tabulated over and over again in the same way. For instance, a shopping survey may ask the respondent a series of identical questions for each store he visited. In this case, there may be a separate card for each store. Processing this type of data is often easier if we treat all cards containing the same questions as if they were, in fact, one card with one card number. These cards are called Trailer Cards. Thus, if the respondent visited five stores, and the questions about these stores are coded on a card 2, the record for that respondent would contain five cards of type 2. If demographic details were stored on a card 1, the whole record would be 6 cards in all. In Quantum, the demographic data would be described as the higher level and the stores as the lower level.
Reading data into the C array Data is read into the C Array automatically, one record at a time. The way data is read depends upon the record structure.
Ordinary records Ordinary records are read into cell 1 onwards of the array. Therefore, for example, the 50th column is referenced as c50 and the 200th cell as c200.
Multicard records
Records are read into c101 to c200 for card 1, c201 to c300 for card 2, and so on. For example, 80-column cards are read into c101 to c180 for card 1 and c201 to c280 for card 2. Columns 181-200, 281-300, etc remain blank. In this case, the C Array may be pictured as ten rows of 100 cells each. Column 50 of card 1 is then accessed by referring to it as c150, and column 67 of card 8 is referred to as c867.
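The mapping from card and column to a cell of the C Array is simple arithmetic. The Python sketch below illustrates it; the function names are hypothetical and the 100 cells per card come from the layout described above.

```python
CELLS_PER_CARD = 100  # each card occupies one row of 100 cells


def c_array_cell(card, column):
    """Return the C Array cell number for a column on a given card."""
    if card < 1:
        raise ValueError("card numbers start at 1")
    if not 1 <= column <= CELLS_PER_CARD:
        raise ValueError("column must be between 1 and 100")
    return card * CELLS_PER_CARD + column


def card_and_column(cell):
    """Inverse mapping: return (card, column) for a C Array cell."""
    card = (cell - 1) // CELLS_PER_CARD
    return card, cell - card * CELLS_PER_CARD


print(c_array_cell(1, 50))    # column 50 of card 1
print(c_array_cell(8, 67))    # column 67 of card 8
print(card_and_column(867))
```

The two forward calls give 150 and 867, the cells named in the text as c150 and c867, and the inverse call recovers card 8, column 67.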
Ignoring card types
It is also possible to read cards into the array sequentially regardless of card type: the first card goes in c(101,200), the second in c(201,300), the third in c(301,400), and so on.
Processing the data
Each time an ordinary record or set of cards comprising a multicard record is read in, that data is processed first by the edit section and then by the tabulation section of your program. The complete record is edited and tabulated in one go. The exception to this is the trailer card record, where processing can take place a number of times within each record for each lower level.
To ensure that only the part of the edit section applying to a particular level is used, the edit section is defined separately for each level. Similarly, the table instructions specify the level at which the table should be incremented.
Changing the contents of a variable
This section describes how to assign values to variables and the statements emit, delete and priority, all of which may be used to alter the contents of a variable. Emit, delete and priority are used only with columns, whereas assignment statements can deal with character, integer and real variables. When we say that these statements change the contents of a column, we mean that they change the contents of that column as it exists during the run: at no time do they change the corresponding column in the data file.
Trailer Cards
By using the Levels facility, the user need not know how Quantum deals with trailer card data internally. However, there are occasions when it may be necessary to edit or tabulate the data without using levels. To do this, it is necessary to know more about how trailer cards are processed.
Quantum deals with trailer cards in a number of ‘reads’. Cards are read into the appropriate rows of the C Array until:
a) a card is located with a card type matching that of the previous card (e.g., two consecutive card 2’s), or
b) a card is read with a type lower than its predecessor and matching one of the card types already read in during the current ‘read’ (e.g., a card 2, a card 3, and then another card 2).
In order to produce useful tables, you will need to know which cards are currently in the C Array. Quantum has four reserved variables (thisread, allread, firstread and lastread) which it uses to keep track of which cards it has read for each respondent.
thisread
The array called thisread is used to check which cards have been read in during the current read. thisread1 will be true (or 1) if a card type 1 has just been read in; thisread2 will be true if a card 2 has just been read, and so on.
There are nine such variables (thisread1 to thisread9) available unless extra card types have been specified using the max= option. In this case, these variables will be numbered 1 to max; if there are 13 cards, we will have thisread1 to thisread13.
allread
The array allread notes which cards have been read in so far for this questionnaire. If cards 1, 2 and 3 have been read so far, allread1, allread2 and allread3 will all be true. Additionally, each cell of allread will contain the number of cards of the given type read in: for instance, if two cards of type 3 have been read, allread3 will be true and it will contain the number 2. As with thisread, there are nine allread variables available unless extra card types have been specified with max=.
firstread and lastread
The variables firstread and lastread become true when the first and last cards in a record have been read in.
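The two rules that end a ‘read’, and the running counts kept in allread, can be modelled in a few lines of Python. This is only a sketch of the rules as stated in this section, not Quantum's own code: the function names split_reads and allread_counts are ours, a record is represented simply as a list of card types in file order, and each read lists only the cards newly read in (the cards that thisread would flag).

```python
from collections import Counter


def split_reads(card_types):
    """Group the card types of one record into successive 'reads'."""
    reads = []
    current = []
    for card in card_types:
        if current:
            previous = current[-1]
            # Rule a): same card type as the previous card.
            same_as_previous = card == previous
            # Rule b): lower than its predecessor and already read
            # in during the current read.
            lower_and_seen = card < previous and card in current
            if same_as_previous or lower_and_seen:
                reads.append(current)
                current = []
        current.append(card)
    if current:
        reads.append(current)
    return reads


def allread_counts(card_types):
    """Cumulative number of cards of each type after every read."""
    counts = Counter()
    snapshots = []
    for read in split_reads(card_types):
        counts.update(read)
        snapshots.append(dict(counts))
    return snapshots


print(split_reads([1, 2, 2, 2]))     # [[1, 2], [2], [2]]
print(split_reads([1, 2, 3, 2, 3]))  # [[1, 2, 3], [2, 3]]
```

In the first example the demographic card 1 is read once and each store card 2 starts a new read, which is the behaviour that levels processing relies on.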
Reserved variables
Other reserved variables associated with reading in data:
lastrec set to true when the last record in the file has been read or, in the case of trailer cards, when the last read of the last record has occurred.
rec_count stores the number of records read in so far.
card_count counts the number of cards read so far.
Describing the data structure for Multicard records
To describe the structure of the data, type:
struct; options
All programs dealing with multicard records must contain a struct statement unless the data contains trailer cards which will be read and tabulated using the levels facility. In this case you may choose between using a struct statement or using a levels file. If the run has no struct statement and no levels file, Quantum assumes that the data contains ordinary records to be read into c1 onwards of the C array.
The struct statement is used to define the type of records, the location of the serial number and card type in the record and the number of the highest card type if greater than 9. Its format is:
struct;options
Record type
To define the record type, type:
struct; read=n
where n is 0 for ordinary records, 2 to read multicard records in sections according to the card type, or 3 to read all cards of a multicard record in one go, ignoring the card type.
Quantum recognizes two types of record: single card and multicard. The type of record is defined by the keyword read= on the struct statement:
Ordinary Records Ordinary records are defined using read=0. Each record is read into c1 onwards of the array. Since it is the default, you need only use it when other options are required; for example, when the records contain serial numbers and you wish to have the serial number printed out as part of the record, or when you are working with long records of more than 100 columns.
Multicard Records Multicard records are identified by the keyword read=2. Each card in the record is read into the row corresponding to the card type of that card – that is, card 1 in c(101,200), card 2 in c(201,300), and so on. We mentioned briefly that it is possible to read all cards in a multicard record in at once and ignore the card type. The first card goes in c(101,200), the second in c(201,300), and so on. This is achieved with read=3.
Record length
To define the record length of records greater than 100 columns, type:
struct; reclen=n
The keyword reclen=n defines the maximum number of characters to be read into the C array, the number of cells to be reset to blanks and the number of cells to be written out by the write statement. With ordinary records reclen may take any value, but with multicard records the maximum is reclen=1000. In both cases, the default is reclen=100. When data is being read into the C array, any record which is longer than reclen characters is truncated to that length and a warning message is printed.
When ordinary records are written out with write or split, cells c1 to c(reclen) are copied, with any trailing blanks being ignored. For instance, if we have:
struct;read=0;reclen=200
and the current record is only 157 characters long, the record written out will be 157 characters long. This length can be overridden by an option on a filedef statement.
When multicard records are written out, columns c101 to c(100+reclen), c201 to c(200+reclen), and so on will be output. Thus, if we write:
struct;read=2;reclen=70
and we have 2 cards per record, Quantum will write out c(101,170) and c(201,270).
Finally, with ordinary records cells c1 to c(reclen) are reset to blanks between records, but with multicard records cells c101 to c(100+reclen), c201 to c(200+reclen), and so on are reset.
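The effect of reclen on what is written out can be worked through with a short Python sketch. The function names are hypothetical, and the multicard part assumes the default layout of 100 cells per card, so it is restricted to reclen values of 100 or less.

```python
def ordinary_written(record, reclen=100):
    """Text written for an ordinary record: cells 1 to reclen,
    with trailing blanks dropped."""
    return record[:reclen].rstrip(" ")


def multicard_ranges(cards, reclen=100):
    """Cell ranges written for each card of a multicard record.

    Assumes 100 cells per card, so reclen must not exceed 100 here.
    """
    if not 1 <= reclen <= 100:
        raise ValueError("this sketch only covers reclen from 1 to 100")
    return [(card * 100 + 1, card * 100 + reclen) for card in cards]


# A 157-character record held in a 200-cell area (reclen=200).
record = "x" * 157 + " " * 43
print(len(ordinary_written(record, reclen=200)))  # 157

# Two 70-column cards (reclen=70).
print(multicard_ranges([1, 2], reclen=70))  # [(101, 170), (201, 270)]
```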
Serial number location To define the location of the serial number in each record, type: struct; ser=c(m,n) The keyword ser=c(m,n) defines the field of columns containing the respondent serial number. For example, if the serial number is in columns 1 to 5 of an ordinary record we would write: struct;read=0;ser=c(1,5) Similarly, if it is in columns 1 to 5 of a multicard record the statement would be: struct;read=2;ser=c(1,5) Notice that even with multicard records we only give the actual column numbers containing the serial number, rather than card type and column number as is usually the case when identifying columns in such records. This is because the column numbers refer to all cards in the data set rather than to a single card in the file.
Card type location To define the location of the card type in the record, type: struct; crd=cn Defining the card type location is much the same as defining the position of the serial number in the record. The keyword is crd=cn for a single digit card type or crd=c(m,n) for a card type of
more than one digit. Once again, m and n are column numbers only, not card type and column number.
For example: struct;read=2;ser=c(1,4);crd=c5 tells us that we have a multicard record with serial numbers in columns 1 to 4 and the card type in column 5 of each card. Each card will be read into the row corresponding to its card number.
Required card types To define cards which must be present in each record, type: struct; req=card_numbers where card_numbers is either a comma-separated list of card numbers, or a range of sequential card numbers in the form start:end or start/end.
Sometimes some cards will be optional and others mandatory. You may define those cards which must appear in every record by using the keyword req= followed by the numbers of the cards that each respondent must have. For example: req=1,2 tells us that cards 1 and 2 must be present in each record for that record to be accepted. Any other cards are optional. If a record is read without one of these cards, the error message ‘Card Missing in Set’ and a note of the record’s position in the file are printed and the record is ignored. If you have ranges for required card types, you may type the numbers of the lowest and highest cards separated by a slash (/) or a colon (:) rather than listing each card type separately. For example, if cards 1 to 4 are all required, you may type: req=1,2,3,4 or req=1/4 or req=1:4
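The three equivalent ways of writing a card list, and the check that a record contains every required card, can be illustrated in Python. This is a sketch only: parse_card_list and missing_cards are names we have made up, and whether Quantum accepts a list that mixes single numbers and ranges is not stated here, although the sketch happens to allow it.

```python
def parse_card_list(spec):
    """Expand a card list such as '1,2,3,4', '1/4' or '1:4'."""
    cards = []
    for part in spec.split(","):
        part = part.strip()
        for separator in ("/", ":"):
            if separator in part:
                start, end = part.split(separator)
                cards.extend(range(int(start), int(end) + 1))
                break
        else:
            # No range separator found: a single card number.
            cards.append(int(part))
    return cards


def missing_cards(required, cards_in_record):
    """Required card types that do not appear in the record."""
    return [card for card in required if card not in cards_in_record]


print(parse_card_list("1,2,3,4"))  # [1, 2, 3, 4]
print(parse_card_list("1/4"))      # [1, 2, 3, 4]
print(parse_card_list("1:4"))      # [1, 2, 3, 4]

# With req=1,2 a record holding only cards 1 and 3 lacks card 2.
print(missing_cards(parse_card_list("1,2"), [1, 3]))  # [2]
```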
Repeated card types
To define cards which may appear more than once in a record, type:
struct; rep=card_numbers
where card_numbers is either a comma-separated list of card numbers, or a range of sequential card numbers in the form start:end or start/end.
If the data contains trailer cards and the Levels facility is not used, you must list their card types with the keyword rep=. For instance, if card 2 is a trailer card we would write:
rep=2
Where there is more than one trailer card, each card type is listed separated by a comma. If cards 2, 3 and 4 are all trailer cards we could write:
rep=2,3,4
If you have ranges of repeated card types, you may type the numbers of the lowest and highest cards separated by a slash (/) or a colon (:) rather than listing each card type separately. For example, if cards 2 to 4 are all trailer cards, you may type:
rep=2,3,4 or rep=2/4 or rep=2:4
If rep= is not used and a record is read with two or more cards of the same type, the last card of that type will be accepted and the message ‘Identical duplicate’ or ‘Non-identical duplicate’ and a note of the record’s position in the file will be printed. For example:
Record structure error: serial 026, card 234 in run, card 234 in dfile
card type 2 – non-identical duplicate
Because rep= refers to trailer cards only, it will be ignored if read=2 and crd= are not both present on the struct statement.
Highest card type number To define the highest card type in the record, if there are more than nine cards per record, type: struct; max=n
The only time you need to inform Quantum of the highest card type is when you have records with more than nine cards. This is so that Quantum can allocate sufficient cells
in the C array to store the extra cards. The highest card type is defined with max=n, where n is the number of the highest card type. Cells 1 to max*reclen are then cleared between respondents. For example, to read a data set with 11 cards per respondent we might write:
struct;read=2;ser=c(1,4);crd=c5;req=1,2,3,4;max=11
If you forget max=, and a record is read with more than nine cards, the message ‘Too many cards per record’ is printed and the record is rejected. On the other hand, if a card is read with a card type higher than that defined with max=, the record is rejected with the message ‘Card number out of range’.
Dealing with alphanumeric card types
To define the location in the C array of cards with alphanumeric card types, type:
struct; order=card_types
where card_types is a list of card type numbers and letters in the order they are to appear in the C array.
From time to time you may need to read in records with alphabetic as well as numeric card types. This generally happens in a multicard data set containing more than nine cards per record where only one column has been allocated to the card type. Quantum can deal with this data but first you will have to say where in the C array the alphabetic card types should go. This is done with the keyword:
order=n
where n is one or more of the codes ’1234567890-&’ or the letters A to Z (in upper or lower case) not separated by spaces. The card type bearing the first code in the list is read into c(101,200), the card bearing the second code in the list is read into c(201,300), etc. For example, suppose each record has ten cards, 1 to 9 and A; our struct statement might say:
struct;read=2;ser=c(1,4);crd=c5;max=10;order=123456789A
Data from card A would be read into cells 1001 to 1100 of the C array.
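The way order= maps card types to rows of the C array can be checked with a short Python calculation. The function card_cells is a hypothetical helper written for this illustration, and it assumes the default of 100 cells per card.

```python
def card_cells(order, card_type, row_length=100):
    """Return the (first, last) C array cells for a card type,
    given the order= string from the struct statement."""
    # Position of the card type in the list, counting from 1.
    position = order.index(card_type) + 1
    first = position * row_length + 1
    return first, first + row_length - 1


order = "123456789A"
print(card_cells(order, "1"))  # (101, 200)
print(card_cells(order, "2"))  # (201, 300)
print(card_cells(order, "A"))  # (1001, 1100)
```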
Merging Data using Quantum
Merge sequence for Trailer Cards To define the location of the merge sequence number in trailer cards, type: struct;seq=cn
When trailer card data is merged during a run with the merge facility, you may wish trailer cards to be merged in a specific order, according to a sequence number entered as part of the data. The location of this sequence number can be defined with the keyword
seq=cn for a single column code or seq=c(m,n) for a multicolumn code. For more information on merging data see the next section.
Merging data files When we say that Quantum allows you to merge data files, we do not mean that Quantum takes data from a number of files and merges it to create a new file. Rather, we mean that data can be read from a series of files during a Quantum run. Of course, the merged data can then be written out to a new file for future use. Quantum provides two methods for merging data. The first is designed for studies where you have different card types in different files; for example, cards 1 and 2 in the file data1 and card 3 in the file data2. In this case, merging is by serial number and, optionally, card type and trailer card sequence number. The second method is designed for situations where you want to merge a field of data from an external file into records from the main data file. For example, you may have a file of manufacturers’ codes which refer to a number of products. If each record in the main data file contains the product the respondent preferred, you may wish to merge the
appropriate manufacturer’s code from the external file into the main data in the C array. In this case, merging is based on finding matching keys in the main record and the records in the external file.
Both options are described in detail below.
Merging complete cards Data for a study may be spread across a number of files. This is particularly useful with large surveys because it means that you can put each card type in a different file and simply merge in the cards required for the current batch of tables. For example, if we require tables from cards 4 and 5, we need not even read in cards 1, 2, 3 and 6. Data from up to 16 files may be merged; that is, the main data file and 15 others. It may be merged on serial number and, within that, on card type. With trailer card data, you also have the option of merging trailer cards according to a sequence number entered as part of the data. In order for the merge to be successful, all files must be sorted in ascending order with the serial number, card type and sequence number in the same position. Quantum reads the locations from the keywords ser=, crd= and seq= on the struct statement. To merge data files you must create a file called merges telling Quantum which items to merge on, and which files to merge. The type of merge is represented by a number:
1 merge on serial number. Cards are read in from each data file according to their serial number only – the card type and sequence number, if any, are ignored. You might use this option when you have two files, dat01 containing cards of type 1 and dat02 containing cards of type 2, and you want the files to be merged so that card type 1 is read into the C-Array, followed by card type 2.
3 merge on serial number and card type (default). With this option, cards with the same serial number read from different data files are merged to form a single record by comparing the serial number and card type. Cards within a record are then sorted sequentially from 1 so that each card is read into the appropriate cells of the C-Array. For example, if dat01 contains cards 1 and 3, and dat02 contains cards of type 2, the merge will produce records containing cards 1, 2 and 3 in that order.
5 merge on serial number, card type and sequence number. This is similar to merge type 3, except that trailer cards are merged according to their sequence number. For example, if dat01 contains cards 1 and 2, where card 2 is a trailer card with a sequence number of 2, and dat02 contains cards 2 and 3, where card 2 is a trailer card with a sequence number of 1, the merged record will contain cards 1, 2/1, 2/2, and 3, in that order.
This is the first item in the merges file, and is followed by the names of the files to be merged with the main data file named in the Quantum command line. Items may be entered on separate lines or all on the same line separated by semicolons. For example, if we want to merge data in files dat02 and dat03 with data in the main file, dat01, by serial number, card type and sequence number, the merges file would look like this: 5; dat02; dat03 Notice that we have not mentioned dat01 in the merges file because it will be named on the Quantum command line instead.
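The difference between the merge types lies in how much of each card's identification is compared. The Python sketch below models this with the standard library function heapq.merge; it is an illustration of the idea rather than a description of how Quantum is implemented. Each card is represented as a tuple (serial, card type, sequence number), each file as a list already sorted in ascending order, and the name merge_cards is our own.

```python
import heapq


def merge_cards(merge_type, *files):
    """Merge sorted lists of (serial, card_type, sequence) tuples.

    merge_type 1 compares on serial only, 3 on serial and card type,
    5 on serial, card type and sequence number.
    """
    key_length = {1: 1, 3: 2, 5: 3}[merge_type]
    return list(heapq.merge(*files, key=lambda card: card[:key_length]))


# Merge type 3: dat01 holds cards 1 and 3, dat02 holds card 2.
dat01 = [("0001", 1, 0), ("0001", 3, 0)]
dat02 = [("0001", 2, 0)]
print([card[1] for card in merge_cards(3, dat01, dat02)])  # [1, 2, 3]

# Merge type 5: trailer card 2 appears in both files with
# sequence numbers 2 (dat01) and 1 (dat02).
dat01 = [("0001", 1, 0), ("0001", 2, 2)]
dat02 = [("0001", 2, 1), ("0001", 3, 0)]
print([card[1:] for card in merge_cards(5, dat01, dat02)])
# [(1, 0), (2, 1), (2, 2), (3, 0)]
```

The sketch depends on every input list being sorted on the same key, which mirrors the requirement that all files be sorted in ascending order before merging.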
Merging a field of data from an external file To merge extra data from an external data file into the data currently in the C array, type:
int_variable=mergedata($ex_file$, key_field, key_start, copy_to, data_start) where
ex_file is the name of the file containing the extra data.
key_field is the location of the key in the main data file, entered using the standard Quantum notation for columns and fields.
key_start is the start column of the key in the external data file.
copy_to is the field in the main data record in which to place the external data. The field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied.
This statement returns in int_variable a 1 if a match was found or 0 if not.
The mergedata statement merges a field of data from an external file with the main data at the datapass stage of the Quantum run. Merging is by means of a data key present in both the main records and the records in the external file. If a record in the external file has a key which matches that of a record in the main data file, the external data will be merged into a user-defined field of the main record when it is read into the C array.
In order for data to be merged correctly, both the main data file and the external file must be sorted in ascending order by key value. If the key is the record serial number then the data file will already be sorted in the correct order (assuming, of course, that the data is sorted by serial number). If you are using a key that is not the record serial number you must sort the data file so that it is ordered by key rather than by serial number.
The syntax for mergedata is:
int_variable=mergedata($ex_file$, key_field, key_start, copy_to, data_start)
where
int_variable is the name of an integer variable in which the function can place its return value.
ex_file is the name of the file containing the extra data. It must be enclosed in dollar signs.
key_field is the location of the key in the main data file, entered using the standard Quantum notation for columns and fields.
key_start is the start column of the key in the external data file, for example, 1 if the key starts in column 1. The length of the key is taken from the length of key_field.
copy_to is the field in the main data record in which to place the external data. The field is defined using the standard Quantum notation for columns and fields.
data_start is the start column of the data to be copied. Quantum copies as many columns as are defined by copy_to.
For example:
t1 = mergedata($manuf_codes$,c(178,180),15,c(168,175),1)
tells Quantum to compare the key in columns 178 to 180 of the main record with the key which starts in column 15 of the external records in the file manuf_codes. Because the key field in the main record is 3 columns long, Quantum reads columns 15 to 17 of each external record to obtain its key. If the keys match, Quantum copies the data from the external record into columns 168 to 175 of the main record in the C array. The external data to be copied starts in column 1 and, since the destination field is 8 columns long, Quantum copies 8 columns starting at that column.
This statement returns a value of 1 if a match was found (i.e., merging took place), or 0 if not. There is no limit on the number of mergedata statements in a specification, but you may only merge data from up to nine different files per record.
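The key matching and copying performed in the manuf_codes example can be traced with a Python model. This is a sketch under stated assumptions, not Quantum's implementation: the function merge_field is hypothetical, the main record is a list of single characters with index 0 unused so that record[n] is column n, the external file is a list of strings, and the external records are simply searched in turn rather than read as a sorted file.

```python
def merge_field(record, external_records, key_field, key_start,
                copy_to, data_start):
    """Copy a field from the first external record whose key matches.

    Returns 1 if a match was found and the data was copied, 0 if not.
    """
    key_from, key_to = key_field
    key_length = key_to - key_from + 1
    key = "".join(record[key_from:key_to + 1])

    copy_from, copy_end = copy_to
    width = copy_end - copy_from + 1

    for external in external_records:
        # Columns are numbered from 1, string positions from 0.
        external_key = external[key_start - 1:key_start - 1 + key_length]
        if external_key == key:
            data = external[data_start - 1:data_start - 1 + width]
            record[copy_from:copy_end + 1] = list(data.ljust(width))
            return 1
    return 0


# Main record with the key 042 in columns 178 to 180.
record = [" "] * 201
record[178:181] = list("042")

# External records: data in columns 1 to 8, key in columns 15 to 17.
manuf_codes = [
    "ZENITH01" + " " * 6 + "017",
    "ACME0001" + " " * 6 + "042",
]

t1 = merge_field(record, manuf_codes, (178, 180), 15, (168, 175), 1)
print(t1)                           # 1
print("".join(record[168:176]))     # ACME0001
```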
Writing out data
There are three ways of writing out your data once it has been read into the C-Array. You may:
a) create a new data file
b) copy records to a print file
c) write information to a report file
Data and print files are both accessed by the write statement, but the exact format of the statement varies according to the type of file and the information being written. Report files are written to with the report statement.
Print files Print files are printouts of records or parts of records with headings, descriptive texts and page numbers. They cannot be used as data for subsequent Quantum runs.
Printing out individual records To write a record or part of a record to a print file, type: write [file_name] [field] [$text$]
The word write by itself prints out a whole record in the form it is when the write statement is executed, together with a ruler showing which codes fall in which columns, the line number of the record in the data file and the message ‘write’ indicating that the record was generated by a write statement. Any multicodes in the record are shown as asterisks, but you may change this with an option on the filedef statement.
If the record contains more than one card, each card is listed separately beneath the ruler. For example, the statement: write
by itself might give us:
Quantum edit report
1 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |12345
write
2 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |23456
write
Each write statement will produce a line in the default print file, out2, telling you how many records were written out, as follows: 2 (1%) write
The example above was very simple; more often than not your program will contain several write statements and you will want some way of identifying which records were printed by which statement and why. If the write is dependent upon some other statement (for instance, it is part of an if statement), the whole statement is printed underneath each record, thus:
67 in file
----+----1----+----2-- ... --9----+----0
column 1 - 100 are |0015263-16*735 *837361 ... 79&
if (c14n’1/4’) write
Here, as you can see, we are checking that column 14 contains a 1/4. This record has been printed out because it contains a ’5’ instead. Sometimes it is more helpful to have an explanatory text printed instead of the statement itself. In this case all that is necessary is to follow the word write with the text to be printed enclosed in dollar signs:
if (c308n’1/5’) write $C308 incorrect$
if (numb(c117,c118,c119).gt.3) write $too many choices$
might give us:
Quantum edit report
Record 17
51 in file
----+----1----+----2-- ... --9----+----0
column 101 - 200 are |00170116548986131*46*1 ...
column 201 - 300 are |0017026464515 875 ** ...
column 301 - 400 are |0017031929-5897231 ...
C308 incorrect
too many choices
Record 32
94 in file
----+----1----+----2-- ... --9----+----0
column 101 - 200 are |003201837021 **53798 ...
column 201 - 300 are |0032021353452 763736 ...
column 301 - 400 are |003203212 & ...
too many choices
Our first statement writes out all records in which column 308 does not contain any of the codes 1/5, and the second picks up all records having more than 3 codes in columns 117 to 119.
Normally all output from write goes to the default print file, and whenever the current record is written to this file, the variable printed_ becomes true. You may change the output file by following the word write with the name of the file to write to. For example:
write pfile $First Print$
writes to the file ‘pfile’, whereas:
write errors $Second Print$
writes to a file called ‘errors’. All files named on write statements must be defined on a filedef statement before they are used.
If two or more write statements apply to a single record, the record is printed out once in the state it was when the first applicable write was read, with all relevant write statements or texts listed below it. If a record satisfies two or more write statements which write to different files, Quantum will write the record out once for each statement, in the state it is when each write is executed.
Writing Out Parts of Records Often you will not want to write out the whole record, especially if it contains several cards. Therefore Quantum allows you to include a field specification in a write statement to print only selected portions of an incorrect record. For example:
if (c110’2’.and.c119’2’) write c(110,120) $Married woman$
checks that columns 110 and 119 both contain a 2, and if so prints out columns 110 to 120 in the print file, followed by the text Married woman. If you are writing out fewer than ten columns, Quantum does not print a ruler above the codes. If you are dealing with multicard records, you may prefer to use this form of write to have only the card containing the error printed, rather than all cards in the record. If we take our previous example where we were checking the contents of column 308:
if (c308n’1/5’) write $c308 incorrect$
prints all three cards in the record, whereas:
if (c308n’1/5’) write c(301,380) $C308 incorrect$
prints only card 3. To write selected parts of a record to a particular file the notation is: write filename c(m,n) [$text$]
Data files To write records or fields to a data file, type: write file_name [c(start_col, end_col)]
write may also be used to copy records to a data file. This is useful if you want to separate a particular card type from the rest of the data, or if you want to correct errors and save the corrected data in a new file for later tabulation.
To write records to a data file the command is: write filename
to write the whole record to the named file, or write filename c(m,n) to write columns m to n only.
Creating new cards New cards can be created by copying information into spare columns of the C-Array. To save these as part of a new data file you will have to give each new card the same respondent serial number as the rest of the data in the array and a card type which may or may not be unique. In the example below, we are moving some information from card 1 of a 2-card data set into a new card 3. The comments explain what each statement is doing.
/* Copy the data into the new card
c(310,341)=c(148,179)
/* Delete it from its original place
c(148,179)=$ $
/* Give it a serial number and card type
c(301,304)=c(101,104); c380=’3’
/* Set thisread true for card 3
thisread3=1
/* Define pfil as a data file
filedef pfil data
/* Copy cards 1, 2 and 3 to pfil
write pfil
Some General Instances for forcecoding cleaning etc.
Writing to a report file To write information to a report file, type: report[n] file_name variable_names
where variable_names is a comma-separated list of the variables and texts to print. Use reportn rather than just report to start a new line each time the statement is executed.
A report file is a special type of print file in which you can print out records, fields or variables in the format of your choice. To write information in a report file, use the report statement, as follows: report filename parameters where filename is the name of the file to be written to, and parameters define exactly what is to be written.
Lines in a report may be up to 1024 characters long. Report does not start a new line automatically at the end of each write, but you may tell it to do so by following the keyword report with the letter n: reportn filename parameters In both cases, the named file must be identified as a report file using a filedef statement, as mentioned below. The parameter list defines what is to be printed in the report file. It may contain variables, texts, and special characters representing tabs and spaces.
Assignment statements
Assignment statements have several functions:
• to copy codes from one column into another.
• to replace certain codes in one column with those from a second column.
• to assign the value of an arithmetic expression to a variable.
• to copy codes from groups of columns into another column using the logical operators and, or and xor.
In spite of the diversity of these functions the basic format of any assignment statement is:
variable=item
where item defines what is to be copied into the variable.
Remember that comments can be identified by a capital C in column 1. If the first variable in your statement starts with a C, make sure that you type it in lower case, otherwise the whole line will be read as a comment and ignored. For example, starting in column 1:
c(15,16)=$12$
is correct, but:
C(15,16)=$12$
will be read as a comment even though the syntax is correct. Alternatively, you may precede assignment statements with the word set, thus:
set c(15,16)=$12$
Copying codes To copy codes into a single data variable, overwriting the variable’s original contents, type:
variable=’codes’ To copy a string of codes into a field, type:
var_name(start,end)=$codes$ To copy the contents of one variable or field into another, type:
variable1 = variable2 Assignment statements are most commonly used to copy codes into a column or to copy the contents of one variable into another. For instance: c121=’159’ c121=c134 You can also copy strings of characters into fields of columns. Let’s say we want to copy the code 59642 into columns 76 to 80 of card 3; we would write: c(376,380)=$59642$
Partial column replacement
To replace a code or set of codes in one data variable with a code or set of codes in a second data variable, type:
variable1’codes1’=variable2’codes2’
codes1 and codes2 must contain the same number of codes, and the codes must be in superimposable order.
Storing arithmetic values To store the value of an arithmetic expression in a variable, type:
variable = expression To copy a real value into a data variable, type:
var_name(start,end) :dp = expression where dp is the number of decimal places required.
For example, if x5=10.22, the statement: cx(15,19):2=x5 results in:
10.22
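The relationship between the field width, the number of decimal places and the stored text can be illustrated in Python. This is a sketch only: the function format_real_field is hypothetical, and right-justifying a value that is shorter than the field is an assumption of ours which is not confirmed by this section.

```python
def format_real_field(value, start, end, dp):
    """Text stored in columns start to end for a real value
    written with dp decimal places."""
    width = end - start + 1
    text = f"{value:.{dp}f}"
    if len(text) > width:
        raise ValueError("value does not fit in the field")
    # Assumption: shorter values are padded on the left with blanks.
    return text.rjust(width)


# x5=10.22 stored in columns 15 to 19 with 2 decimal places.
print(format_real_field(10.22, 15, 19, 2))  # 10.22
```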
Assignment with and, or and xor To copy codes which are present in at least one of a list of columns, type:
data_var_name=or(cnum1[’codes1’], cnum2[’codes2’], ...) To copy codes which are present in all of a list of columns, type:
data_var_name=and(cnum1[’codes1’], cnum2[’codes2’], ...) To copy codes which are present in only one of a list of columns, type:
data_var_name=xor(cnum1[’codes1’], cnum2[’codes2’], ...)
The final type of assignment is copying codes from a set of columns. The codes copied depend upon the type of operator used:
and Copy codes present in all columns
or Copy codes present in one or more columns
xor Copy codes present in one column only
The format of the statement is:
column = operator(ca,cb,cc, ...)
where ca, cb, and cc are the columns whose codes are to be compared. Note that even if you are comparing codes in consecutive columns, each column must be identified separately. For example, the statement:
c181=and(c137,c138,c139)
copies into c181 the codes that are present in all of the columns c137, c138 and c139, whereas the statement:
c182=or(c137,c138,c139)
copies into c182 all codes present in at least one of the named columns.
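If each multicoded column is thought of as a set of codes, the three operators correspond to familiar set operations. The Python sketch below illustrates this reading; the function names are ours, each column is given as a string of its codes, and xor is taken to mean codes present in exactly one of the listed columns, as described in the table of operators.

```python
from collections import Counter


def and_codes(*columns):
    """Codes present in all of the columns."""
    return set.intersection(*map(set, columns))


def or_codes(*columns):
    """Codes present in at least one of the columns."""
    return set.union(*map(set, columns))


def xor_codes(*columns):
    """Codes present in exactly one of the columns."""
    counts = Counter(code for column in columns for code in set(column))
    return {code for code, n in counts.items() if n == 1}


c137, c138, c139 = "125", "15", "159"
print(sorted(and_codes(c137, c138, c139)))  # ['1', '5']
print(sorted(or_codes(c137, c138, c139)))   # ['1', '2', '5', '9']
print(sorted(xor_codes(c137, c138, c139)))  # ['2', '9']
```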
Adding codes into a column
To add codes into a column in addition to those that are already there, type:
emit cn1’codes1’ [, cn2’codes2’ ... ]
Emit inserts codes into a column leaving the original contents intact. Its format is:
emit cn’p’
More than one column may be entered on each line, provided that each one is separated by a comma:
emit c567’7’, c110’2’
emit can only be used with single columns; string variables are not valid, so:
emit c(100,110)$99$
does not work.
Deleting codes from a column
To delete selected codes from a column, type:
delete cn1’codes1’ [, cn2’codes2’ ... ]
The delete statement is the opposite of emit in that it deletes codes from a column leaving the remainder intact. Its format is:
delete cn’p’
More than one deletion may be made with the same delete statement as long as each column is separated by a comma:
delete c110’5’, c179’56’
Forcing single-coded answers
To force single-coding of a multicoded column, type:
priority cn’code1’, ’code2’ ,’code3’,[cn2’code1a’, ’code2a’ ,’code3a’, ... ]
where a code at the start of the list should be accepted in preference to any later in the list.
cn is the column whose codes are to be checked and ’code1’, ’code2’ and so on are the positions to check, entered in order of priority, the most important first.
priority checks only the listed positions; if any other codes are present they are ignored.
For example, the statement:
priority c249’5’, ’4’, ’3’, ’2’, ’1’
causes Quantum to scan column 249 to see first whether it contains a ’5’ and, if so, to delete all subsequent codes in the list. If c249 contains a ’5’ and nothing else, there will be no extra codes to delete; this does not matter. If there is no ’5’ in c249, Quantum then checks whether it contains a ’4’; if so, any other codes in the range ’1/3’ are deleted, otherwise the program skips to the next code in the list and checks for that. If none of the listed codes are found, the column remains unchanged.
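The scanning rule for priority can be sketched in a few lines of Python. This is only a model of the behaviour described above for a single column; the function name priority and the representation of a column as a set of codes are assumptions, not part of Quantum.

```python
def priority(column, ranked_codes):
    """Keep the highest-priority listed code; drop lower-priority listed codes.

    column       -- set of codes currently in the column
    ranked_codes -- listed codes, most important first
    Codes that are not in ranked_codes are left untouched.
    """
    codes = set(column)
    for position, code in enumerate(ranked_codes):
        if code in codes:
            # Delete every listed code that comes later in the priority list.
            return codes - set(ranked_codes[position + 1:])
    return codes  # none of the listed codes found: column unchanged

print(sorted(priority(set("53"), "54321")))
print(sorted(priority(set("427"), "54321")))
```

In the second call the unlisted code 7 survives, which mirrors the statement that priority ignores codes outside its list.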
Setting a random code in a column
To choose a random code from a list of codes, type:
data_var_name=rpunch(’codes’)
To choose a random code from the codes present in a column, type:
data_var_name=rpunch(col_number)
For example:
c115 = rpunch(’1/5’)
will place one of the codes 1 through 5 in column 115.
Alternatively, you may use rpunch with another C-variable, thus:
c115 = rpunch(c120)
Once this statement has been executed, column 115 will contain one of the codes present in column 120.
Reading numeric codes into an array To set up an array based on numeric codes in the data, type: field array_name=column_spec [,code=cell_number, ...]
column_specs are references to the fields containing the numeric codes. code is a non-numeric code present in those fields and cell_number is the cell of the array which should be incremented whenever that code is encountered. Cells in the array are reset to zero at the start of each new record. To prevent this happening, enter the statement name as fieldadd rather than field. The rest of the statement is as shown.
The format of the field statement is:
field output_array = column_specs [,special_specs]
output_array is the name of the array in which you wish to store the counts of responses. You can use spare columns in the C array, but you may find your program is easier to read if you define an integer array of your own with a name which reflects the type of information it contains. For example, if you want an integer array called films, you might write:
int films 5s
ed
field films = .....
When you define the integer array, make sure that you request as many cells as there are codes in the data. In this example there are five films so you define the array as having five cells. Quantum automatically creates an extra cell (cell 0) which it uses to count responses for which there is no cell allocated. If there were six films, for example, Quantum would increment cell 0 each time it found code 06 in the films columns. You might like to check the value of this cell as a means of reporting on invalid codes:
if (films0 .gt. 0) write c(1,20) $Bad film code$
Negative and zero values also cause cell zero to be incremented. Codes which are shorter than the field width are accepted as long as they are padded with blanks or zeroes.
The column_specs part of the statement defines the columns to read. You have a number of choices here. First, you may list each column or field reference one after the other, separated by commas. The list must be enclosed in parentheses. In our example this would be:
field films = (c(12,13), c(14,15), c(16,17))
Second, if you have sequential fields as you do here, you can type the start columns of each field followed by the field length. The list of start columns is separated by commas and enclosed in parentheses, and the field length comes after the closing parenthesis and starts with a colon. If you use this notation for the film example you would write:
field films = (c12, c14, c16) :2
If you wish, you can abbreviate this further by typing just the start columns of the first and last fields, followed by the field length.
field films = c12, c16 :2
Third, if the fields are not sequential, you list the start columns and field width of each group of columns (as shown above) and separate each group with a slash. For example, to read data from columns 12 to 17 and 52 to 57, with each field being two columns wide, you would type:
field films = c12, c16 / c52, c56 :2
This reads c(12,13), c(14,15), c(16,17), c(52,53), c(54,55) and c(56,57). You can also use this notation for single non-sequential fields. For example: field films = c23 / c36 / c71 :2 means c(23,24), c(36,37) and c(71,72).
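The abbreviated notations all expand to the same list of column ranges. The Python sketch below shows that expansion under the assumption that consecutive fields in a group start one field width apart; the function name expand_fields and the tuple representation are illustrative only.

```python
def expand_fields(groups, width):
    """Expand (first_start, last_start) groups into (start, end) column ranges.

    groups -- one tuple per slash-separated group in the field statement
    width  -- the field length given after the colon
    """
    fields = []
    for first_start, last_start in groups:
        # Step from the first start column to the last, one field at a time.
        for start in range(first_start, last_start + 1, width):
            fields.append((start, start + width - 1))
    return fields

# field films = c12, c16 / c52, c56 :2
print(expand_fields([(12, 16), (52, 56)], 2))
# field films = c23 / c36 / c71 :2
print(expand_fields([(23, 23), (36, 36), (71, 71)], 2))
```

The first call yields the six ranges listed above, from (12, 13) to (56, 57); the second yields (23, 24), (36, 37) and (71, 72).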
The special_specs part of the statement is optional. You use it when a field contains non-numeric codes such as $&&$ for None of these films. If you want to count codings of this type, you must remember to allocate cells in the array for each code or group of codes you wish to count. You then include the notation:
code = cell_number
to count those codes. For example:
int films 6s
ed
field films = (c12, c14, c16) :2, $&&$=6
If you want to count more than one non-numeric code, list each one individually, separated by commas.
Quantum normally resets the cells of the integer array to zero at the start of each record. If you want counts to continue from one record to another, use a fieldadd statement instead of field. For example: fieldadd films = (c12, c14, c16) :2
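The counting rules for field can be modelled as follows. This Python sketch is an interpretation of the description above, not the Quantum implementation. Two points are assumptions the text does not settle: a wholly blank field is skipped, and a non-numeric code with no cell allocated is counted in cell 0. The name field_count is hypothetical.

```python
def field_count(values, n_cells, special=None):
    """Count codes read from a list of fields into cells 0..n_cells.

    values  -- the raw contents of each field, as strings
    n_cells -- number of cells requested for the array (cell 0 is extra)
    special -- mapping of non-numeric codes to cell numbers, e.g. {"&&": 6}
    """
    special = special or {}
    cells = [0] * (n_cells + 1)      # cell 0 collects unallocated codes
    for raw in values:
        if raw in special:
            cells[special[raw]] += 1
            continue
        text = raw.strip()           # codes may be padded with blanks
        if not text:
            continue                 # assumption: blank fields are not counted
        if text.isdigit() and 1 <= int(text) <= n_cells:
            cells[int(text)] += 1
        else:
            cells[0] += 1            # zero, negative or out-of-range codes
    return cells

# int films 6s ... field films = (c12, c14, c16) :2, $&&$=6
print(field_count(["03", "01", "&&"], 6, {"&&": 6}))
# Five cells only: code 06 and code 00 both fall into cell 0.
print(field_count(["06", "02", "00"], 5))
```

A fieldadd equivalent would simply reuse the same cells list from one record to the next instead of starting from zeros.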
Clearing variables To remove values from variables, type: clear var_name1, var_name2, var_name3
Variables of any type may be cleared using a clear statement: clear var1, var2, .... varn where var1 to varn are any valid Quantum variable or range of variables. For example:
clear c(109,180), t(1,200), myarray(29,33), myint, myreal
Data variables are reset to blank, integer variables are reset to 0 and real variables are reset to 0.0. Variables can also be cleared using assignment statements (e.g., t1=0), placed inside a loop when a range of cells is involved, but there are advantages to using clear instead. Firstly, clear is much easier to write. Secondly, with clear the compiler checks that the subscripts are in the correct range (e.g., 1 to 33 if ‘myarray’ has only 33 cells); this is not possible with the loop method because the subscript is a variable. However, if you use variables as subscripts with clear (e.g., clear c(t1,t1+5)), subscript checking once again cannot be done.
Flow control
Statements in the edit section are usually dealt with in the order in which they occur in the program. Quantum provides statements which may be used to alter this normal order of execution, for example, by missing out a statement or repeating a group of statements a number of times.
Statements of condition
1) Ed - Defines the start of the edit section of a Quantum run. The statement is essential if a Quantum run contains an edit section.
2) End - Defines the end of the edit section. This statement is required if the run contains an edit section.
1) If - To define statements to be executed if a certain condition is true. For example:
if (numb(c10,c11,c12).gt.3) emit c20’9’
2) Else - To define statements to be executed if a given condition does not exist. For example:
if (c115’1’); else; emit c140’2’
3) go to - Lets a Quantum program include statements which apply to certain respondents only, by routing all other records past them. For example, the statement:
if (c121n’1’) go to 50
causes Quantum to go immediately to the statement labeled 50 if column 121 does not contain a ’1’. Any statements between this if statement and statement 50 are ignored whenever a record is read where c121n’1’ is true. The statement labeled 50 may be any Quantum statement, but many people just write:
50 continue
4) continue - This statement is a dummy statement whose sole purpose is to join various bits of a program together. It is often used with a statement label as a destination for routing with go to, or to identify the end of a loop.
5) Loops - Are used to define repetitive statements. Loops are extremely important structures because they enable the same set of basic statements to be executed over and over again on a changing series of numbers, columns or codes. Their use can reduce the work involved in checking data. The statement which introduces a loop is do, which is formatted as follows:
1. The word do.
2. A label number identifying the last statement in the loop.
3. An integer variable (for numbers or columns) or a letter (for codes) whose value is to be used by the statements in the loop.
4. An equals sign.
5. A list of whole numbers, integer variables or codes which are the values the integer variable or letter is to take. These may be entered in two ways
Loops should be terminated by any statement other than go to, stop, return, another do or an if containing any of these words. The main purpose of the terminating statement is to identify the end of the loop and send the program back to the start of the loop. Go to and return send the record elsewhere, stop terminates the run and another do indicates the start of another loop. The statement most often used to terminate a loop is the dummy statement continue. Any statement that terminates a loop must be preceded by a label number. Thus, the usual format of a loop is:
do label.number int.var = value list
   statements to be executed
label.number statement
For example:
do 20 t5 = 125,145,5
if (c(t5,t5+4).gt.3000) c(t5,t5+4)=$ $
20 continue
6) Reject - To reject a record from the tables. Normally all records are passed straight from the edit to the tabulation section regardless of whether or not they contain errors. Reject tells Quantum to continue editing the record but not to include it in the tables. For instance, we might write:
if (c73’8’) reject
if (c80’1’) t5=t5+1
end
to reject records in which column 73 contains an ’8’ from the tabulations but not from the rest of the edit. Therefore, even if c73’8’, the record is still checked for a ’1’ in column 80 and if one is found, t5 is incremented.
7) Return - To send the record to the tabulation section. The word return in Quantum bears no relation to the same word in English. It does not mean go back to the start of the edit or anything like that; rather, it means ‘terminate the edit immediately and jump to the tabulation section’. Once the record is tabulated Quantum reads in another record as usual. If there is no tabulation section, the next record is read in straight away.
Return is very often used with reject to reject a record without finishing the edit. For example:
if (c73’8’) reject; return
if (c80’1’) t5=t5+1
end
Here any records in which c73’8’ are rejected from the tables, but, because reject is followed by return which sends records to the tabulation section, editing is terminated immediately. Thus, only records in which c73n’8’ will be tested for a ’1’ in column 80.
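The difference between reject on its own and reject followed by return can be traced with a small Python model. The two functions below are a sketch only; the record layout (a dictionary from column number to a set of codes) and the function names are assumptions made for illustration.

```python
def edit_reject_only(record, counters):
    """Model of: if (c73'8') reject / if (c80'1') t5=t5+1."""
    rejected = "8" in record[73]     # reject flags the record, editing goes on
    if "1" in record[80]:
        counters["t5"] += 1
    return rejected

def edit_reject_return(record, counters):
    """Model of: if (c73'8') reject; return / if (c80'1') t5=t5+1."""
    if "8" in record[73]:
        return True                  # rejected and the edit ends immediately
    if "1" in record[80]:
        counters["t5"] += 1
    return False

record = {73: {"8"}, 80: {"1"}}
first, second = {"t5": 0}, {"t5": 0}
print(edit_reject_only(record, first), first["t5"])
print(edit_reject_return(record, second), second["t5"])
```

Both versions reject the sample record, but only the first one still increments t5 for it.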
8) Stop - To stop editing records and start tabulating records read so far. Stop tells Quantum to stop the run and print tables once editing has been completed on the current record. For example, we may want test tables for the first 100 people, so we set up a counter and terminate the run when it reaches 100. The statement:
if (rec_count.eq.100) stop
will stop editing records and start tabulating the records read so far.
9) Process - To send a record temporarily to the tab section Process is an edit statement which is similar to return but must not be confused with it. When return is executed, the record is sent on to the tabulation section; after the tables are completed for that record, the program returns to the start of the edit section and the next record is read in.
When process is executed, the record is also sent immediately to the tabulation section where it is used in table creation. However, after the record has been tabulated, control is passed back to the edit section to the statement immediately following the word process. The record continues through the edit and any statements after process applicable to the record are executed. At the end of the edit the record is passed through the tabulation section again.
10) Split - To write correct records out to a clean data file and incorrect records out to a dirty data file. Clean and dirty data files are the terms used to refer to files of correct and incorrect or rejected records created automatically by the edit statement split.
Examining records
Holecounts
Holecounts are used to obtain an overall picture of the data before you write your edit program. For each column they show:
o a distribution of the codes – e.g., how many respondents have a 2 in column 56
o the density of coding – i.e., how many respondents have 1, 2 or 3 or more codes in each column
o the total number of codes for the whole data file.
Creating a holecount
To create a holecount, type:
count c(start_col, end_col) [$text$]
where text is the heading to be printed at the top of each page. This is optional; if it is omitted the holecount will simply be headed ‘Holecount’. Our example was created by the statement:
count c(1,16) $Demonstration Holecount$
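The three figures a holecount reports for each column can be computed as in the Python sketch below. It models only the arithmetic described above, not the printed layout of a real holecount; the function name holecount and the record representation (a dictionary from column number to a set of codes) are assumptions.

```python
from collections import Counter

def holecount(records, start_col, end_col):
    """Per-column code distribution, coding density and total code count."""
    report = {}
    for col in range(start_col, end_col + 1):
        distribution = Counter()   # how many records hold each code
        density = Counter()        # how many records hold 1, 2 or 3+ codes
        for record in records:
            codes = record.get(col, set())
            distribution.update(codes)
            if codes:
                density["3+" if len(codes) >= 3 else str(len(codes))] += 1
        report[col] = {
            "distribution": dict(distribution),
            "density": dict(density),
            "total": sum(distribution.values()),
        }
    return report

# Four hypothetical records, looking at column 56 only.
records = [{56: {"2"}}, {56: {"1", "2"}}, {56: {"2", "5", "7"}}, {56: set()}]
print(holecount(records, 56, 56)[56])
```

Here three respondents have a 2 in column 56, one record falls in each density class, and the column holds six codes in total.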
Frequency distributions
A frequency distribution enables you to inspect the contents of a field of columns containing alphabetic or numeric data. For example, in a shopping survey the price the respondent paid for a bottle of mineral water may be stored in columns 112 to 114. A frequency distribution will tell you how many respondents bought mineral water at a particular price. This is very useful for determining how the values in these fields should be grouped for tabulation, as well as for rough estimates of medians.
To create a frequency distribution sorted in alphabetic and rank orders, type:
list c(start_col, end_col) [$text$]
where text is the heading to be printed. To produce a frequency distribution sorted in alphabetic order only, type lista instead of list. For a distribution sorted in rank order only, type listr instead of list. Here are some examples:
listr c(107,108) $Contents of cols 7 and 8$
lista c(100,104) $First Set of Car Brands$
The first example produces a frequency distribution of the contents of c(107,108) sorted in rank order; the second example generates a list of car brands which will be sorted in alphabetic order.
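The two sort orders can be illustrated with a short Python sketch. It assumes that rank order means most frequent value first, with ties broken alphabetically; that tie-breaking rule, the function name and the sample prices are assumptions for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (alphabetic, rank) listings of value counts for one field."""
    counts = Counter(values)
    alphabetic = sorted(counts.items())
    # Rank order: highest count first, ties resolved alphabetically.
    rank = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return alphabetic, rank

# Hypothetical prices read from c(112,114).
prices = ["120", "099", "120", "150", "099", "120"]
alphabetic, rank = frequency_distribution(prices)
print(alphabetic)
print(rank)
```

In this model, list corresponds to printing both listings, lista to the first only and listr to the second only.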
Data validation
In earlier sections we discussed ways of examining the data for a set of records (with count) or for an individual record (with write). In general, however, we want to check the validity of the data for individual records by putting in the edit a set of testing sentences which will tell us not only whether a record contains an error but also what that error is.
There are two types of checking sentence. The first involves checking whether a column contains the correct type of coding (single-coding or multicoding) and whether the codes in that column are valid. Take the question on a respondent’s sex which may be Male, coded c106’1’, or Female, coded c106’2’. c106 must be single-coded since no person can have two sexes, and the only codes which may appear in that column are 1 and 2. Any record in which c106 is not single-coded with a 1 or a 2 will be flagged as incorrect.
The second type of checking involves making sure that columns whose contents depend on the contents of other columns contain the correct codes. For instance, suppose the questionnaire asks whether the respondent has ever used a particular brand of washing up liquid. The answer is coded into c125 as ’1’ for Yes and ’2’ for No. If the answer is Yes, the next questions concerning price and quality are asked. If c125’2’, indicating that the respondent has not used that brand of washing up liquid, the following columns must be blank. Conversely, if c125’1’, the following columns must be coded according to the codes on the questionnaire.
require Both tasks listed above can be carried out using if but sometimes they can become very complicated and repetitive. Therefore, Quantum has an additional testing statement, require, specifically designed to increase the efficiency of this checking process.
Require is used in three types of sentence:
Column Validation
Tests columns against a given set of characteristics and deals with records not meeting the requirements according to a specified action code.
Testing the Validity of a Logical Expression
Tests a logical expression and, if it is true, continues with the next statement. If the expression is false, the record is dealt with according to the given action code.
Testing the Equivalence of Logical Expressions
Compares the logical value of a group of logical expressions. If all are true or all are false, the run continues with the next statement; otherwise, if the expressions yield a mixture of values, the specified error action is carried out.
The require statement has three forms, depending upon the function it performs, and these are described in the subsequent sections. Each one must start with the word require, which may be abbreviated to r.
Column and code validation
To validate columns and codes, type:
require[/code/] condition col1 [,col2 ...]
where code is the error action code, condition is the type of coding required, and col1 and col2 are the columns or fields to be tested. This form of the require statement has four basic parts:
1. The word require or the letter r followed by a space.
2. An optional error action code enclosed in slashes.
3. A code defining the type of coding required.
4. The column or columns to be checked, separated by commas.
Checking type of coding
Checking with require can be as simple or complex as you like. In this section, we will start with the simplest checks and deal with each extra feature in turn. We will assume, unless otherwise stated, that the error action code is the default Print and Reject (code 3) and will omit it from most of the examples accordingly.
The most basic form of the require statement simply checks whether the column or field of columns contains the correct type of code; it does not check the individual codes themselves. Code types may be:
b     Blank
nb    Not blank (i.e., single-coded or multicoded)
sp    Single-coded (literally, single-punched)
spb   Single-coded or blank
One of these types must follow the word require since it tells Quantum what to check for. All that remains is to say which columns are to be inspected; just list each column or field of columns at the end of the statement. If more than one column or field is defined, each one must be separated by a comma. Here are some examples in which the record to be checked is:
----+----1----+----2----+----3----+----4----+
002411123481231&- *1927235537*&& 1 1 1
The statement:
require nb c10, c(25,35)
checks that columns 10, and 25 to 35 inclusive are not blank – they may contain any number of codes. This record satisfies both conditions so it passes on to the next statement in the edit.
The statement:
r sp c11, c15, c23, c41
looks to see whether columns 11, 15, 23 and 41 are single-coded. In our record they are, but if this were not the case (say c11’123’) the record would be printed out and rejected from any tables that may be produced. Additionally, Quantum would tell us ‘Column 11 is 123’.
Comments with require
To define a message to be printed when a record fails a test, type:
r [/err_code/ ] condition columns $message$
When incorrect records are printed out, require automatically prints a short text describing the error. Normally, it tells you what codes were found in the column which is wrong, but if this is not what you want, you may define your own error text by entering it enclosed in dollar signs at the end of the statement. This text will then be printed in place of the default text when errors are found. For example, if c329 is multicoded when it should be single-coded, the statement:
r sp c329
will print the whole record and tell us which codes were found in that multicode:
Column 329 is 13
Instead of being told which codes the column contains, you may prefer to see a message linking the error to a question on the questionnaire. In this case you will need to add your own error text as follows:
r sp c329 $q21a not sp$
These texts may be as long or short as you like.
Checking codes in columns
To check for specific codes in a column, type:
r [/err_code/] condition col1’codes1’ [, col2’codes2’ ... ]
where codes1 are the codes to be tested for in column or field col1, and codes2 are the codes to be tested for in column or field col2. Any codes which are present in col1 but are not listed in codes1 are ignored. The same applies to any other column and code pairs listed.
Sometimes it is not sufficient to check just the type of coding, and you will want to know whether the codes found are valid for that column. To do this, we use the information given in the previous section as a base, and add on our first ‘optional extra’. To check whether a column or field of columns contains specific codes, follow the column specification with the codes to be checked, enclosed in single quotes. For example:
r /5/ sp c223’1/5’
tells us that column 223 should be single-coded within the range of codes 1 through 5. Any other codes in this column are ignored. Thus, a record in which c223’14’ is incorrect because it contains two of the listed codes, whereas a record in which c223’27’ is correct because it contains only a 2 from the range ’1/5’. Of course, any record which does not contain a 1, 2, 3, 4 or 5 at all is also incorrect, regardless of whether or not it is single-coded: c223’9’ is just as wrong as c223’789&’.
Exclusive codes
To check that a column or field contains no codes other than those listed, type:
r [/err_code/] condition col1’codes1’o
If col1 contains any codes other than those given in codes1, the test is false.
Now that you know how to check codes, the next thing to discuss is how to check that all other code positions are blank. We have said that statements of the form:
r sp ca’p’
accept all records containing only one of the codes ’p’ in column a, regardless of what other codes are also present. To check that a column contains only the listed codes and nothing else, follow the code specification with the letter O (for only) in upper or lower case. For example, to indicate that c356 must be single-coded in the range ’1/5’ and that all other positions (’6/&’) must be blank, you should type:
r sp c356’1/5’o
which is the same as
if (c356’6/&’.or.numb(c356).ne.1) write; reject
Any of the following would cause the record to be printed and rejected:
c356’34’ c356’59’ c356’8’ c356’ ’
Require may define conditions for more than one column. Just follow each column with the code positions to be checked and separate each set with a comma: r sp c164’12-’, c165’1/70’, c166’1/3’, c167’1/9-’, c168’1/5’ Here the columns to be checked are consecutive but have been listed separately because they each have different sets of valid codes. If all columns could be single-coded in the range 1 to 7 we might abbreviate this to: r sp c(164,168)’1/7’ $q10a/e$ since this notation means that each column in the field must be single-coded within the given range rather than that the field as a whole may contain only one of those codes.
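The coding checks described so far (b, nb, sp, spb, an optional code list and the only flag) can be summarised in one small Python model. This is a sketch of the rules as stated in the text, for a single column; the function name require_check is hypothetical, code ranges such as ’1/5’ are assumed to be expanded by the caller into "12345", and how b and nb combine with a code list is an assumption since the text only shows sp with codes.

```python
def require_check(condition, column, codes=None, only=False):
    """Return True if a column passes the given coding condition.

    condition -- "b", "nb", "sp" or "spb"
    column    -- string or set of the codes present in the column
    codes     -- the listed codes to test for, or None to test all codes
    only      -- True models the trailing 'o' (no unlisted codes allowed)
    """
    present = set(column)
    if codes is None:
        listed = present
    else:
        listed = present & set(codes)   # unlisted codes are ignored ...
        if only and present - set(codes):
            return False                # ... unless 'o' forbids them
    n = len(listed)
    return {"b": n == 0, "nb": n >= 1, "sp": n == 1, "spb": n <= 1}[condition]

# r sp c223'1/5'
print(require_check("sp", "14", "12345"), require_check("sp", "27", "12345"))
# r sp c356'1/5'o
print(require_check("sp", "59", "12345", only=True))
```

The first line prints False then True, matching the c223’14’ and c223’27’ cases above; the second prints False because the 9 falls outside the permitted range.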
Automatic error correction To define a correction code to be used as a replacement for codes which fail the required condition, type: r [/err_code/] condition col1’codes1’ :’new_code’
new_code is the code or codes to be inserted in col1 if it fails the test condition. Any codes already in that column are overwritten.
As you know, records found to have errors are printed, coded and/or rejected according to the error action code. When the run is finished you will look at these records and, if possible, correct the errors by using the on-line edit or correction file facilities. Occasionally you will know in advance what to do with certain types of error; say, for instance, the respondent’s sex has been miscoded. You may decide or be told to recode this person as a ’3’ in the appropriate column indicating that the sex was not known. The way to do all this in one go is to write the normal require statement that checks columns and codes, and to follow the code specification with a colon (:) and the replacement code (in this case ’3’) enclosed in single quotes, thus:
r /2/ sp c106’12’ :’3’
Any record in which c106 is not single-coded with either a ’1’ or a ’2’ will have the contents of c106 overwritten with a ’3’. The equivalent using if and an assignment statement would be written:
if (numb(c106’12’).ne.1) c106’3’;
+write $c106 incorrect$
Once again, the require is shorter and quicker.
When working with fields, it is not possible to define replacement strings for the field as a whole. You should, however, note that if a single replacement code is given for a field of columns, any incorrect columns in that field will be overwritten with the replacement code, while the correct columns remain untouched. If we have:
+----4----+
1927
and we write c(237,240)’1/5’ :’&’ we will have:
+----4----+
1&2&
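The column-by-column replacement in a field can be checked with a short Python model. It assumes the condition being tested is sp (single-coded within the listed range), which the field example does not state explicitly; the function name and the list-of-columns representation are likewise assumptions.

```python
def require_sp_with_fix(field, valid_codes, replacement):
    """Overwrite each column that is not single-coded in valid_codes.

    field       -- list holding the codes of each column, e.g. ["1", "9"]
    valid_codes -- the listed codes, e.g. "12345" for '1/5'
    replacement -- code written into every failing column
    """
    corrected = []
    for column in field:
        listed = set(column) & set(valid_codes)
        # A column passes only if it holds exactly one of the listed codes.
        corrected.append(column if len(listed) == 1 else replacement)
    return corrected

# Columns 237 to 240 contain 1, 9, 2 and 7.
print("".join(require_sp_with_fix(["1", "9", "2", "7"], "12345", "&")))
```

Under these assumptions the 9 and the 7 are replaced and the 1 and the 2 are left alone, giving 1&2& as in the example.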
Validating logical expressions
This type of require also has four parts, two of which are optional:
1. The word require or the letter r followed by a space.
2. An optional action code enclosed in slashes.
3. A logical expression enclosed in parentheses.
4. An optional error text enclosed in dollar signs.
For example:
r /3/ (c133’4’ .and. c140n’5’) $Cols 33/40 incorrect$
says that c133 must contain a ’4’ and c140 must not contain a ’5’. If one or other or both expressions are false, Quantum prints the record out with the message ’Cols 33/40 incorrect’ and rejects it from the tables.
Testing the equivalence of logical expressions To test whether a group of logical expressions all have the same logical value, type: r = (expression1) (expression2) ... There must be a space between r and the = sign.
Require can evaluate groups of expressions and perform given tasks depending on whether all expressions are true or all are false. When all the expressions have the same value (i.e., all true or all false) Quantum continues with the next statement in the program, whereas if some are true and some are false, the record being tested will be dealt with according to the given (or default) error action code. This statement has five parts:
1. The word require or the letter r.
2. An equals sign which must be preceded by a space.
3. An optional action code.
4. The expressions to be evaluated, each one enclosed in parentheses.
5. Optional error text enclosed in dollar signs.
This type of statement is generally used to check routing patterns. For example, if a ’2’ in c125 means that the respondent did not try Brand A washing powder, we would expect columns 126 to 145, which record his opinion of it, to be blank. On the other hand, if he tried the washing powder, we would expect to find his opinions about it coded in columns 126 to 145. This can be written:
r = (c125’2’) (c(126,145)=$ $)
which says that to be accepted, a record must either have a ’2’ in column 125 and blanks in columns 126 to 145, or something other than a ’2’ in c125 with at least one code somewhere in c(126,145).
Actions when a require statement fails
When Quantum executes a require statement, it sets the variable failed_ to True if the data fails the require statement or to False if the record passed the requirement. You can then test whether failed_ is True and take whatever actions you wish. For example, if you are checking that the respondent’s sex is coded as a ’1’ or a ’2’ only, you may wish to blank out the column if it contains any other code or codes. You could write this as:
r sp c123’12’
if (failed_) set c123’ ’
The test for failure is made on the last require statement executed for the current record. This may not always be the most recent require statement in the program, and it may not be the require statement you intend Quantum to execute. If you write:
r sp c112’1/5’
if (c115’1’) r b c116
if (failed_) set c116’ ’
the test for failure could apply to either of the previous statements. If column 115 does not contain a ’1’, the second require statement will not be executed and failed_ will be True if column 112 is not single-coded in the range ’1/5’. If column 115 contains a ’1’, then failed_ will be True if column 116 is not blank. You can get around this potential problem by setting failed_ to zero (the equivalent of False) just before the require statement you wish to test. For instance:
r sp c112’1/5’
failed_ = 0
if (c115’1’) r b c116
if (failed_) set c116’ ’
Data correction
There are four ways to correct data:
o Correct the data in the original data file.
o Correct the data in the C array interactively.
o Replace the incorrect codes with specific codes using edit forcing statements.
o Write a file of corrections to be merged with the original data when it is read in by a Quantum program.
Forced editing (forced cleaning) This section does not introduce any new keywords; instead it tells you how to combine the statements that you already know in order to clean your data.
A record which generates too many error messages, or which is clearly incorrect, can be removed, as noted. Suppose its serial number is 2004. Then we have:
if (c(101,104)=$2004$) reject; return
This rejects the record from the rest of the edit and the tabulation section as well. This statement should be at the beginning of the edit to avoid unnecessary editing of a useless record.
Columns within a record can be removed by blanking them out or setting them to a common reject code, often a minus or ampersand. For example:
if(c125n’12’) c125’&’; c(126,145)=$ $
All records in which c125 contains neither a 1 nor a 2 will have the contents of that column replaced with an ampersand, and whatever is in c(126,145) blanked out. As a real-life example, suppose a 1 in c125 means that the respondent visited the market, and a 2 in that column means he did not. Information about purchases made at the market is stored in c(126,145). If column 125 contains neither a 1 nor a 2, we cannot clearly establish whether or not the respondent visited the market, so we set c125 to a special code and blank out any information about purchases.
Inserting correct data is generally more difficult than removing invalid data, because you very often don’t know what the correct data is. However, if you do know, you can correct the data record by record, or make the same correction for any record which is incorrect. For instance:
if(c(101,104)=$2222$) c112’2’; c(113,114)=$ $
corrects the record whose serial number is 2222 by setting a 2 into c112 and blanking out c(113,114).
If you do not know what the correct data is, you may decide to replace the incorrect code or codes with a valid code chosen at random. For example:
if (c(101,104)=$3625$) c145=rpunch(’1/5’)
replaces whatever was in column 145 with one of the codes 1 through 5 for the record whose serial number is 3625.
Introduction to the tabulation When a record has passed through the edit without being rejected, it is passed to the tabulation section, if one exists. At this point, data, integer and real variables are available to create tables. The program deals with one complete record at a time. The tabulation section consists of a series of statements which determine the contents of the tables. Each table may be thought of as a matrix of cells. Each cell of this table is defined by two conditions, one from the row and one from the column.
The hierarchy of the tabulation section The tabulation process is hierarchical, in that characteristics defined at one level apply to that level and to all lower levels.
Components of a tabulation program A tabulation run consists of three sets of control statements:
Run control statements Run control statements determine the overall characteristics of the run, and contain the text which is constant for all tables. Filters may be defined, applicable either to all tables in the run or to all tables defined before another general filter statement is read. Titles are entered in various ways depending upon their position in the table.
Defining run conditions To define global and default conditions for the run, type:
a;opt1[; opt2; ... ]
at the start of the tabulation section. Global run conditions, if any, are defined on the a statement. If used, it must be the first statement in the tabulation section. The options are keywords defining the global characteristics of the run. You may list as many keywords as you like, provided that they are separated by semicolons (;), for example:
a;dsp;op=12;date;dec=1
Some of the commonly used options and their functions are:
colwid=n  Defines the width of columns in the printed tables where no p statements exist in the column.
csort  Sorts tables column-wise (i.e., horizontal sorting rather than vertical row-wise sorting).
date  By default, tables are printed without a date. The keyword date causes the current date to be printed in the top right-hand corner of each table, in the format dd mm yy.
dec=n  Determines the number of decimal places for absolute figures. If dec= is not used, the default of no decimal places is assumed.
decp=n  Sets the number of decimal places required for percentages. The default is decp=1, meaning one decimal place. This applies when op=0, 2, 7 or & (see below). Any number of decimal places is allowed, as long as you make each column wide enough to accommodate them.
dsp  Leaves one blank line between each row of data in a table. Without this, one line follows directly underneath another.
flt=name  Invokes the filter conditions and titles named on the flt= statement. If the filter defines conditions, the rules governing data options apply.
flush  Causes rows containing percentages to be printed with the percentages directly below the absolutes rather than one column to the right.
indent=n  Where a row text is longer than the space allocated to the row text in the table, Quantum breaks the line between words and continues the text on the next line. To have these continuation lines indented from the left margin, specify the amount of indentation required with indent=. Texts may be indented by between 0 and 15 spaces; the default is indent=0.
op=n  Governs the type of output in the tables. Output types are:
  &  Total percentages. The value in the cell is percentaged against the number in the upper left-hand corner of the table (normally the base) rather than on the totals in the relevant column or row. If the table contains more than one base element, percentages are calculated using the leftmost figure in the most recent base element.
  -  Row rank figures are printed below each cell. Figures are ranked within rows, using 1 for the largest figure. Where two or more numbers have the same rank, they are all assigned the lowest rank possible. Thus, if the previous rank was 2 and the next value to be ranked occurs in the row three times, those numbers will all be ranked 5.
  0  Row percentages.
  1  Absolute figures (default).
  2  Column percentages.
  3  Column rank figures are printed below each cell. Figures are ranked within columns, using 1 for the largest figure. Where two or more numbers have the same rank, they are all assigned the lowest rank possible. Thus, if the previous rank was 2 and the next value to be ranked occurs in the column three times, those numbers will all be ranked 5.
  5  Prints the text 100% on each cell of the base row.
  6  Used with op=2 to produce two percentages for each cell.
  7  Cumulative percentages. Indices. The index for a cell is generated by dividing the row percentage in the cell by the row percentage in the base row.
  8  Prints absolutes and percentages side by side.
page  Invokes automatic page numbering. Since this is the default (pages are numbered from 1 automatically), this option is generally used in its negative form, nopage, which suppresses automatic page numbering.
paglen=n  Determines the number of lines printed on each page. The default is paglen=60 lines, but any value between 10 and 10,000 is valid.
pagwid=n  Normally tables can be up to 132 characters wide. pagwid= enables you to decrease the page width or to extend it to a maximum of 10,000 characters.
pc  Prints percent signs after percentage figures. This is the default, so this option is usually used negatively, as nopc, to print percentage figures without percent signs.
sort  Creates sorted or ranked tables.
wm=n  Names the weighting matrix to be used.
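The tie rule described for the row and column rank output types can be illustrated with a short Python sketch. This is not Quantum code; the function name rank_with_ties is hypothetical and the sketch only mirrors the rule as stated in the text (1 for the largest figure, tied figures all taking the lowest rank they jointly occupy).

```python
def rank_with_ties(values):
    """Rank figures so that 1 is the largest; tied figures share the
    lowest (numerically largest) rank in their group."""
    ordered = sorted(values, reverse=True)
    # Later positions overwrite earlier ones, so each value ends up mapped
    # to the last position it occupies in descending order.
    last_position = {}
    for position, value in enumerate(ordered, start=1):
        last_position[value] = position
    return [last_position[value] for value in values]


# Hypothetical row of figures: ranks 1 and 2 are taken, then one value
# occurs three times, so all three are ranked 5.
row = [50, 40, 30, 30, 30, 10]
print(rank_with_ties(row))
```

With the previous rank at 2 and a value occurring three times, the three tied figures occupy positions 3, 4 and 5 and are all given rank 5, matching the example in the text.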
Table control statements Table control statements name the questions to be cross-tabulated against each other to create tables. In Quantum, these questions are called axes. The most important table control statement is the tab statement which lists the axes to be used to create an individual table. These statements may also specify the text and overall characteristics of each table.
Creating a table To create a table, type:
tab [axis1] [axis2] [axis3] [axis4] row_axis column_axis [;options]
In order to create a table, Quantum needs to know which is the column axis and which is the row axis. If the table has more than two dimensions you will need to say which axes should be used for the extra dimensions. Each table must be created separately using a tab statement, as follows:
tab row_axis column_axis
Tab statements must precede the axes definitions in your program file.
Multidimensional tables Multidimensional tables are ones created from more than two axes. They occur when a series of tables has the same rows and columns, but each table in the group has additional characteristics which are themselves the conditions of other axes. This sounds complicated, so let’s take an example. Our basic table is of age by sex, created by the tab statement:
tab age sex
We have been asked to produce a separate table of age by sex for each region of the country. Whereas before each cell had two conditions (age and sex) it now has three (region, age and sex). There are two ways of writing this specification. You may either: a) write as many tab statements as there are regions, and filter each table of age by sex to include only those respondents resident in a given region, or b) write a single tab statement to create a three-dimensional table. Both methods produce the same results – the main advantage of (b) over (a) is that (b) involves you in a lot less work. The tab statement to create the multidimensional table is: tab region age sex
Commonly used options in the tab section
sid  place this table to the right of the previous one
und  place this table underneath the previous one
add  add this table to the previous one
div  divide the previous table by this one
To place tables side by side, type a tab statement for the first table and follow it with:
sid row_axis column_axis [;options]
Options are any of anlev=, c=, celllev=, inc=, maxim, means, median, minim and wm=. For example, the statements:
tab region sex;c=c250'1'
sid region age;c=c254'1'
will place two tables side by side.
To place tables one underneath the other, type a tab statement for the first table and follow it with:
und row_axis column_axis [;options]
Options are any of anlev=, c=, inc=, maxim, means, median, minim and wm=. For example, the statements:
tab region sex;inc=c(25,28)
und region age;inc=c(35,38)
will place the second table underneath the first one.
To add tables, type a tab statement for the first table and follow it with:
add[col_offset[,row_offset] ] axis_names
where axis_names is the same number of axis names as appears on the tab statement. For example:
tab ax01 bk01
add ax02 bk02
Here we are creating the table ax02 by bk02 and adding it to the table ax01 by bk01.
To divide one table by another, define the top table on a tab statement followed by:
div axis_names [;options]
where axis_names is a list of as many axis names as there are on the tab statement, and options is any of the keywords anlev=, c=, inc=, maxim, means, median, minim or wm=. The statements:
tab ax06 brk1
div ax07 brk2
produce a table by dividing the table specified on the tab statement by the table on the div line, which defines the denominator.
Axis control statements Broadly speaking, an axis is Quantum’s way of defining questions from the questionnaire. Each axis consists of a set of statements which establish the conditions and text for the rows and columns of a table. The axis is an integral part of your tabulation program: without it there can be no tables. At its simplest level an axis represents a question on the questionnaire, and contains statements which define the responses to that question and the codes by which Quantum can identify them. Each axis may be used to create one or more of the following:
o the rows of a table
o the columns of a table
o a page in a set of tables
o a set of pages in a group of tables
Types of elements within axes There are four types of element in an axis:
o Text and condition elements
o Text elements
o Arithmetic elements
o Statistical elements
Text and condition elements These elements contain text and conditions which define the characteristics a respondent must have to be included in the element. In a simple axis each element will refer to one response to a question and will produce a row, column or table of figures telling you how many people gave that response. The general format of a condition is:
c=logical expression
For example, c=cn'p' is true if column n contains the code 'p' and false if it does not.
Count-creating elements are the basis of any table since they tell you how many respondents gave which responses. There are several statements which will create numeric elements; which you use will depend upon the type of data to be read and the complexity of the condition defining eligibility for inclusion in the element. The most commonly used count-creating statements for tabulation are:
n01  used for simple or complex conditions
n15  same as n01 except that the element is not printed
n10  creates a base for percentaging
n11  same as n10 except that the element is not printed
col  used for simple conditions
val  used for numeric data
fld  used for numeric codes
bit  a variant of fld
Text elements These elements create nothing but text; no cells containing counts or values are created from these elements.
There are three statements which are used within an axis to create text-only elements. These are:
n03  create a text-only element
n23  create a subheading
n33  continue long element texts
If you would like subheadings to be underlined, place one of the options unl1, unl2 or unl3 on the n23. The hdlev= keyword allows you to define various levels of subheading, starting at level 1 for the top subheading down to level 9 for the lowest level. If you would prefer the text to be left justified above the columns to which it refers, add the option hdpos=l to the n23. If you would prefer the text to be right justified, use hdpos=r instead. (hdpos=c is also available for centered text, but since this is the default you are unlikely to need it.)
Arithmetic elements These are elements which contain arithmetic values rather than counts. For example, one element may tell you the number of times a product was bought rather than the number of people who bought it.
Statistical elements Part of Quantum’s power lies in the fact that it offers you the ability to create various types of statistical output without having to know the formulae necessary to calculate them. These elements contain totals, subtotals or statistical functions such as means and standard deviations. Statements which perform statistical calculations are:
n07  average
n12  mean
n13  sum of factors
n17  standard deviation
n19  standard error of the mean
n20  error variance of the mean
n30  medians
n04  total
n05  subtotal
To define incremental values for means, standard deviations, standard errors and error variances, type:
n25[element_text]; inc=arith_expr [;c=log_expr] [; row] [; col]
The n25 does not normally print anything in the table. Use row and/or col to print these values as the rows and/or columns of the table.
Factors fac= defines factors when the numbers in the data are not to be used (e.g., the data may be multicoded), whereas inc=, also mentioned in the Data Options section, reads the data from the column and uses that as the factor for each row. Which to use when is best illustrated by examples, although in general you should try to use fac= whenever possible since, in processing terms, it is more efficient than inc=.
The respondent has been asked to say how much he agrees or disagrees with a particular statement. If he agrees very much, he has a code '1' in, say, c210. If he agrees somewhat, he has a '2'; if he neither agrees nor disagrees he is coded as '3'; disagrees somewhat, a '4'; and disagrees very much, a '5'. People who refuse to answer are coded as c210'&'. We wish to obtain a numerical mean value of these opinions using factors of +2 for agrees very much down to -2 for disagrees very much. These are not the same as the codes representing these responses in the data, so we enter them with fac=. People who refused to answer will appear in the table but will not be included in the mean. So the axis will look like:
l vers1
n01Agrees Very Much;c=c210'1';fac=2
n01Agrees Somewhat;c=c210'2';fac=1
n01Neither Agrees Nor Disagrees;c=c210'3';fac=0
n01Disagrees Somewhat;c=c210'4';fac=-1
n01Disagrees Very Much;c=c210'5';fac=-2
n01Refused;c=c210'&'
n12Mean;dec=2
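To make the arithmetic behind the mean row concrete, here is a small Python sketch. It assumes the mean is the factor-weighted average over respondents in elements that carry a factor, which is how the text describes it (refusals appear in the table but are left out of the mean). The counts are invented for illustration and factor_mean is a hypothetical helper, not a Quantum function.

```python
def factor_mean(elements):
    """elements: list of (text, count, factor) tuples; factor is None for
    elements, such as refusals, that must not contribute to the mean."""
    scored = [(count, factor) for _, count, factor in elements
              if factor is not None]
    respondents = sum(count for count, _ in scored)
    if respondents == 0:
        return None  # no scored respondents, so no mean can be formed
    return sum(count * factor for count, factor in scored) / respondents


# Invented cell counts for the agreement scale in the axis above.
vers1 = [
    ("Agrees Very Much", 20, 2),
    ("Agrees Somewhat", 30, 1),
    ("Neither Agrees Nor Disagrees", 25, 0),
    ("Disagrees Somewhat", 15, -1),
    ("Disagrees Very Much", 10, -2),
    ("Refused", 5, None),
]
print(round(factor_mean(vers1), 2))
```

Here the factor total is 40 + 30 + 0 - 15 - 20 = 35 over 100 scored respondents; the 5 refusals change neither the numerator nor the denominator.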
Miscellaneous ‘n’ statements To define a condition that applies to a group of consecutive elements, type: n00;c=logical_expression
An n00 defines a condition applicable to all subsequent rows until another n00 is read or until the end of the axis, whichever is the sooner. Its format is: n00[;c=condition] Where the condition is any valid logical expression. To override the automatic page turnover within an axis, insert the statement: n09[Text] at the point at which a new page is required. ‘Text’ is an optional text which will be printed beneath the table headings at the top of the next page.
More commands to generate counts The col statement To define a list of elements with codes all in the same column, type:
col number;[base;] elm_txt1[='codes1'] [; elm_txt2[='codes2'] ... ]
If several consecutive statements in an axis have conditions defined by a code or codes in the same column, you can save yourself a lot of time and effort by replacing the individual n01 statements with a single col statement. One of the simplest col statements you can write is:
col n;[base];Rtext1[='p1'];Rtext2[='p2']
where n is the column containing the codes for this question, base creates a base element, and Rtext1='p1', Rtext2='p2' and so on define the texts and conditions for the individual elements. To explain more clearly how the col statement works, let’s take the axis mstat that we wrote earlier and rewrite it using a col statement. Originally it consisted of five statements:
n10Base
n01Single;c=c109'1'
n01Married;c=c109'2'
n01Divorced;c=c109'3'
n01Widowed;c=c109'4'
We can replace these with the line:
col 109;Base;Single;Married;Divorced;Widowed
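The equivalence between a col statement and its n10/n01 statements can be sketched in Python. The helper expand_col is hypothetical and covers only the simple case shown in the mstat example, where texts without explicit codes take codes 1, 2, 3 and so on; it stops at code 9 because the text does not say how later positions are coded.

```python
def expand_col(column, texts, base_text=None):
    """Return the n10/n01 statements that a simple col statement stands for.

    Assumes element texts given without codes are numbered from 1 in order,
    as in the mstat example.
    """
    if len(texts) > 9:
        raise ValueError("this sketch only handles codes 1 to 9")
    statements = []
    if base_text is not None:
        statements.append(f"n10{base_text}")
    for code, text in enumerate(texts, start=1):
        statements.append(f"n01{text};c=c{column}'{code}'")
    return statements


for line in expand_col(109, ["Single", "Married", "Divorced", "Widowed"],
                       base_text="Base"):
    print(line)
```

Run on the mstat texts, the sketch prints the same five statements that the col line replaces.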
The val statement Val is used when the conditions defining eligibility for inclusion in an element are positive numbers or ranges of positive numbers rather than codes; that is, where the question in the questionnaire requires a numeric response rather than a single or multicoded answer; for example, the number of people in the household, or the number of telephone calls made. To define elements whose condition is that a variable contains a specific value, type: val variable; = ;number1 [element_txt1];number2 [element_txt2] ... If the elements contain text as well as a number, the number may appear anywhere in the text. If the value is not part of the text, type: val variable; = ;element_text = number; ... The base, hd=, tx= and =rej options described for col statements are also valid on val statements of this type.
Val can be used to test whether the value of a variable is equal to a given value. If it is equal, the cell count is incremented by 1. The format is:
val variable;[Base];[hd=Text];=;[tx=Text];n1 [Text1]; ... ;nn [Textn]
where variable is the data, integer or real variable whose value is to be tested, n1 to nn are the values against which the variable is to be compared, and Text1 to Textn are the row descriptions to be printed in the table. The equals sign indicates that the test is for arithmetic equality rather than ranges. Base, hd= and tx= are optional and create the base, sub-heading and text-only rows of the table as described for col statements. Let’s work through an example to illustrate this. Suppose c(110,111) contains data on the number of people in the household, and we wish to set up a table showing how many respondents live in households containing 1, 2, 3, 4, 5 or 6 people, so we write:
val c(110,111);Base;Hd=Number in Household;=;1 Person;2 People;
+3 People;4 People;5 People;6 People
The fld statement To define elements whose condition is that a field contains a specific numeric code, type: fld column_specs;element_txt1[=code[,code ...] ]; ... The base, hd=, tx= and =rej options described for col statements are also valid on fld statements.
The column specs on a fld statement define the columns to be read. There are three ways of entering them. First, you may list each column or field reference one after the other, separated by commas. The list must be enclosed in parentheses. In our example this would be: fld (c(12,13), c(14,15), c(16,17)) Second, if you have sequential fields as you do here, you can type the start columns of each field followed by the field length. The list of start columns is separated by commas and enclosed in parentheses, and the field length comes after the closing parenthesis and starts with a colon. If you use this notation for the film example you would write: fld (c12, c14, c16) :2 If you wish, you can abbreviate this further by typing just the start columns of the first and last fields, followed by the field length. This time you do not use parentheses: fld c12, c16 :2 Third, if the fields are not sequential, you may list the start columns and field width of each group of columns (as shown above) and separate each group with a slash. For example, to read data from columns 12 to 17 and 52 to 57, with each field being two columns wide, you would type: fld c12, c16 / c52, c56 :2 This reads c(12,13), c(14,15), c(16,17), c(52,53), c(54,55) and c(56,57). You can also use this notation for single non-sequential fields. For example: fld c23 / c36 / c71 :2 means c(23,24), c(36,37) and c(71,72).
The element specs part of the statement defines the element texts and the codes which represent those responses. If you enter element texts by themselves, Quantum assumes that the first text is code 1, the second text is code 2, and so on. The codes apply to all fields named in the column specs part of the statement. Therefore, to define elements which will count the number of people who saw each film, you would write:
fld c12,c16:2;Columbus;Aliens 3;Pretty Woman;
+Green Card;Batman 2
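The shorthand column specs described for fld can be checked with a short Python sketch that expands them into explicit fields. The function expand_fields and its argument layout are hypothetical: each group is either a (first_start, last_start) pair or a one-item tuple for a single field, and width is the field length given after the colon.

```python
def expand_fields(groups, width):
    """Expand fld-style column groups into (start, end) column pairs."""
    fields = []
    for group in groups:
        first = group[0]
        last = group[1] if len(group) == 2 else group[0]
        # Step through the start columns one field width at a time.
        for start in range(first, last + 1, width):
            fields.append((start, start + width - 1))
    return fields


# fld c12, c16 / c52, c56 :2
print(expand_fields([(12, 16), (52, 56)], 2))
# fld c23 / c36 / c71 :2
print(expand_fields([(23,), (36,), (71,)], 2))
```

The first call yields the six fields c(12,13) to c(16,17) and c(52,53) to c(56,57); the second yields c(23,24), c(36,37) and c(71,72), as listed in the text.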
Weighting in Quantum Sometimes in surveys we treat the respondents as representatives of the total population of which they are a sample. Normally, tables reflect the attitudes of the people interviewed, but we may want the tables to reflect the attitudes of the total population instead, so that it seems as if we had interviewed everyone rather than just a sample of the population. This, of course, assumes that the people interviewed are a truly representative sample. If we take a sample of 380 from a population of 10,000 middle-aged housewives, and discover that 57 members of this sample buy cheddar cheese, we may want the number of middle-aged housewives who buy cheddar cheese to read 1,500 in our tables, not 57. Moving from 57 to 1,500 is the fine art of weighting. In this case, each middle-aged housewife has a weight of 10,000/380. Since 57 of them buy cheddar cheese, the number in the cell will be:
10000 / 380 * 57 = 1,500
Weighting is also used to correct biases that build up during a survey. For example, when conducting interviews by telephone you may find that 60% of the respondents were women. You may then want to correct this ratio of men to women to make the two groups more evenly balanced.
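The cheddar cheese arithmetic can be reproduced in a few lines of Python. This is only a sketch of the calculation in the text; the function name population_weight is hypothetical.

```python
def population_weight(population, sample_size):
    """Weight that scales one respondent up to the share of the
    population he or she represents."""
    return population / sample_size


weight = population_weight(10_000, 380)   # about 26.3 per respondent
weighted_cell = weight * 57               # 57 sampled buyers of cheddar
print(round(weighted_cell))
```

Each of the 380 respondents stands for roughly 26.3 people, so the 57 buyers in the sample become 1,500 in the weighted table.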
Weighting methods Quantum is sufficiently flexible to allow more than one set of weights for a given set of respondents. Which set is applied is determined by options on the a, sectbeg, flt or tab statement or on the statements which create the individual rows or columns of a table. Each set of weights, however, will apply one weight for each respondent. There are two ways of calculating weights: a) The weight for each respondent may be part of the data for that respondent, or it may be calculated in the edit and passed to the tabulation section as a variable. b) The more common method of weighting is to define a set of characteristics and apply specific weights to respondents satisfying those characteristics.
Types of weighting Quantum offers factor, target and rim weighting, preweights, postweights, weighting using proportions and weighting to a given total.
Factor weighting With factor weighting, every record which satisfies a given set of conditions is assigned a specific weight. You would generally use it when the weights are calculated outside of Quantum – for instance, you may be told that all unemployed people in London require a weight of 10.5, whereas unemployed people in the rest of the country need a weight of 7.3.
Target weighting Target weights may be used when you know the exact number of respondents you want to appear in each cell of the weighted table. For example, in a table of age by sex, you may know the exact number of men under 21, women under 21, and so on, to appear in the table once it has been weighted. The weights that you define in your matrix are therefore the values to appear in the weighted table rather than the weights to be applied to each respondent of a given age and sex.
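The difference between a target and a per-respondent weight can be sketched in Python. The sketch assumes, as a simplification, that every respondent in a cell receives the cell target divided by the number of respondents in that cell; the counts, targets and the name target_weights are invented for illustration.

```python
def target_weights(sample_counts, targets):
    """Per-respondent weight for each cell: target divided by the
    unweighted number of respondents in that cell."""
    weights = {}
    for cell, target in targets.items():
        count = sample_counts.get(cell, 0)
        if count == 0:
            raise ValueError(f"no respondents in cell {cell!r}")
        weights[cell] = target / count
    return weights


# Invented unweighted counts and desired weighted counts, age by sex.
counts = {("under 21", "men"): 40, ("under 21", "women"): 60}
targets = {("under 21", "men"): 50, ("under 21", "women"): 50}
cell_weights = target_weights(counts, targets)
print(cell_weights)
```

Multiplying each weight by its cell count gives back the target, which is why the matrix holds the values to appear in the table rather than the weights themselves.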
Rim weighting Rim weighting is used when: a) you want to weight according to various characteristics, but do not know the relationship of the intersection of those characteristics, or b) you do not have enough respondents to fill all the possible cells of the table if you were to weight the data using the multidimensional technique described above. For example, you may want to weight by age, sex and marital status and may know the weights for each category of those characteristics (e.g. people aged 25 to 30; men; single people). However, you may not know the weights for, say, single men aged between 25 and 30, married women aged between 31 and 40, and so on.
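The idea of weighting to marginal totals without knowing the cell-level targets can be sketched with generic iterative proportional fitting (often called raking). This is an illustration of the general technique only; it is an assumption that it resembles what Quantum does internally, and the function names, sample and targets are all invented.

```python
def rake(respondents, targets, iterations=50):
    """respondents: list of dicts mapping characteristic -> category.
    targets: dict mapping characteristic -> {category: weighted total}.
    Returns one weight per respondent."""
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for characteristic, category_targets in targets.items():
            # Current weighted total of each category of this characteristic.
            totals = {}
            for person, weight in zip(respondents, weights):
                category = person[characteristic]
                totals[category] = totals.get(category, 0.0) + weight
            # Scale every respondent so the category totals hit the targets.
            for index, person in enumerate(respondents):
                category = person[characteristic]
                weights[index] *= category_targets[category] / totals[category]
    return weights


def weighted_totals(respondents, weights, characteristic):
    totals = {}
    for person, weight in zip(respondents, weights):
        category = person[characteristic]
        totals[category] = totals.get(category, 0.0) + weight
    return totals


sample = [
    {"sex": "men", "age": "young"},
    {"sex": "men", "age": "young"},
    {"sex": "men", "age": "young"},
    {"sex": "men", "age": "old"},
    {"sex": "women", "age": "young"},
    {"sex": "women", "age": "old"},
]
rim_targets = {
    "sex": {"men": 3.0, "women": 3.0},
    "age": {"young": 3.0, "old": 3.0},
}
rim_weights = rake(sample, rim_targets)
print(weighted_totals(sample, rim_weights, "sex"))
print(weighted_totals(sample, rim_weights, "age"))
```

Only the totals for each characteristic are supplied; the weights for combinations such as young men emerge from the repeated adjustment rather than being specified.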
Entering weights as proportions (input weighting) When we were talking about target weighting, we said that sometimes you might not know the actual counts of respondents in a group, even though you may know that the group is a certain percentage or proportion of the total population. For instance, you may know that 60% of the population is women, but you may not know how many women that represents. When this happens, you can enter the percentages or proportions as the weights for each group, and use the keyword input to indicate that these figures should be used as targets. For example, in a table of age by sex you would enter the proportion or percentage that each combination of age and sex is of the total population, and Quantum would calculate what weight to assign to each respondent in each category.
Weighting to a given total When you define targets which add up to more than the number of respondents in your sample, Quantum will calculate the weights for each respondent such that the total for the weighted table equals the total of the figures in the weighting matrix. You may define your own total figure (usually the number of respondents in your sample) using the keyword total=n, where n is the required weighted total. Quantum will then calculate the weights according to the values in the weighting matrix and will then adjust them to match the total you have defined.
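The relationship between proportions, targets and a required total can be sketched as follows. The sketch assumes the proportions are simply rescaled so that they add up to the requested total; whether Quantum performs the adjustment in exactly this order is not stated in the text, and targets_from_proportions is a hypothetical name.

```python
def targets_from_proportions(proportions, total):
    """Turn percentages or proportions into target counts that sum to
    the required weighted total."""
    scale = total / sum(proportions.values())
    return {group: share * scale for group, share in proportions.items()}


# 60% women and 40% men, weighted to a total of 500 respondents.
sex_targets = targets_from_proportions({"women": 60, "men": 40}, 500)
print(sex_targets)
```

Because only the ratio between the figures matters, the same targets result whether the proportions are entered as 60 and 40 or as 0.6 and 0.4.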
Preweights Preweights, stored as part of each respondent’s data or created during the edit, are applied to individual records before target or factor weighting is applied. When the characteristic weights are targets, the preweights are used in the calculation of the weight for each respondent.
Postweights The opposite of preweights are postweights, which are applied after all other weights have been applied, and therefore have no effect on the way in which targets are reached. They are generally used to make a final adjustment to a specific item.
Descriptive statistics Quantum provides facilities for calculation of a set of basic statistics from the figures produced in Quantum tabulations. They include the statistics most commonly used for testing hypotheses about the values of proportions (percentages) and the locations (average values) of variables, and about differences in these between two or more subsets of the data. There are also chi-squared statistics for testing hypotheses about a single distribution or about differences between two or more distributions. The statistical tests available are:
o One-dimensional, two-dimensional and single classification chi-squared tests
o Four tests of differences between proportions (Z-tests)
o Two tests of differences between means (T-tests)
o Friedman’s test of differences in location between a set of related samples (sometimes known as ‘Friedman’s two-way analysis of variance’)
o Kolmogorov-Smirnov test of differences between two samples
o McNemar’s test of the significance of changes
o F test for testing differences between a set of means (one-way analysis of variance (ANOVA))
o Newman-Keuls test of differences between means
For each statistic, Quantum also calculates and prints an associated significance level so that you can readily see the results of the tests you have performed.
Quanvert Quanvert is the windowed version of the Quantum database; in other words, it is the GUI for Quantum. Quanvert can process surveys of any type, size or complexity. Whether it's a survey with hundreds of questions, or millions of respondents, or one that's been conducted on a regular basis for years, Quanvert can handle it fast. Quanvert has been specifically designed for the market researcher. You don't have to be a data processing or computer expert, or a statistician; you just have to be interested in your survey results! And you can investigate your data from your desktop, without having to search through volumes of printed reports. There is no need to predict what analyses you will require before you receive your data. Any table can be created based on any variable or question. You can test out any hypothesis, and dig as deep into the data as you wish. For instance, you may want to examine the age group of people who responded positively to an advertisement. You can then take this a stage further and produce a series of tables filtered on those females interviewed. Quanvert is especially powerful for analyzing individual responses to verbatim or "open" questions.
How is the database produced? Quanvert databases are specified and created using Quantum, SPSS MR's leading package for editing, weighting and tabulating survey data. Quantum is already renowned in its own right as the most powerful tabulation system available today. You can create the database yourself using Quantum.
Preparing a Quanvert database using Quantum
Before you can convert a Quantum spec and data file into a Quanvert database there are several tasks you may need to carry out first. These include checking the Quantum program to ensure that it will create the required information in the appropriate places, and setting up subdirectories if variables are not to be stored in the main project directory. If you have a large database from which you require only a few variables, you may use the raw Quantum data rather than creating a full Quanvert database. To create a Quanvert database, give the following command at the command prompt:
quantum -v [-pd dir_1] [-td dir_2] [prog_file] [data_file]
The -v parameter tells Quantum not to produce tables but, when it reaches the output stage, to run the flip program instead. The -pd and -td parameters allow you to read files from and create temporary files in directories other than the directory in which you are running Quantum. All Quanvert projects originate from Quantum. Although Quanvert produces tables identical to those generated by Quantum, it does not normally use the raw data and Quantum program files. Instead, it uses a series of compressed data and axis files, one pair per axis, derived from the Quantum files. These individual databases are referred to as inverted or transposed databases, and the process which creates them is called flipping. In databases with simple axes it is possible to run Quanvert almost immediately on the raw Quantum data.
Files created by flip Flip creates a number of files. The ones which are important to Quanvert are:
Filename     Contents
*.ax         axes text files
*.fli        inverted data files
*.inc        numeric variables (inc) files
*.mul        values for numeric variables in axes
*.bit        bit files for named filters
*.btx        text for named filters
*.alp        text (alphanumeric) variables files
axes.inf     names of axes present in the database
incs.inf     names of numeric variables present in the database
alpha.inf    names of text variables present in the database
bits.inf     names of named filters present in the database
qvinfo       levels and weighting information
qvlvmn       levels cross-reference files defining the relationship between the higher level m and the lower level n
seg1.qv      default run conditions and titles from the a statement
wmvalsn.q    weights for weight matrix n
The sex axis, for instance, will have two files: sex.ax containing the element texts and sex.fli containing the inverted data for that axis.
To tidy a directory once the database has been created, type:
flipclean [-a]
under Unix or:
flipclea [-a]
under DOS. This deletes any temporary files created during the flip process but leaves intact any files which are needed for Quanvert.
Example
Structure of Quantum Spec: A typical program might look like this, with a description following each group of statements:
struct;read=2;ser=c(5,8);crd=c(9,10);max=32
    Structure of the record.
*include vars
    External variables and arrays are declared in a file called vars and included before the edit section.
ed
*include edit
end
    The edit section holds calculations of counts, and column settings needed to obtain counts which are not straightforward.
a;dsp;spechar=-*;decp=1;flush;wm=0;axttr;
+dec=0;rinc;acr100;dp;nsw;nopage;notype;
+paglen=64;pagwid=145;
    Global commands which control the overall characteristics of a run.
wm1 wax1 wax2;rim;input;
+20;30;50;
+50;50;
+33;33;33
    Weighting of the data in the output (if required).
*include tabs
    Details of what is to be tabulated against what in order to produce each table.
*include axes
    Definitions of all variables used as rows.
*include breaks
    Definitions of all variables used as columns.