OVERVIEW
 


 
 

GUIDE 1: INTRODUCTION
GUIDE 2: CONSTRUCTING A TABLE
GUIDE 3: UNIVARIATE STATISTICS AND DISPLAYS
GUIDE 4: BIVARIATE BASICS
GUIDE 5: BIVARIATE CORRELATIONS
GUIDE 6: MULTIVARIATE CROSSTABULATIONS
GUIDE 7: BASIC REGRESSION
GUIDE 8: REGRESSION SPECIFICS
GUIDE 9: SAMPLING
INTRO STATS READINGS
AND ASSIGNMENTS


 
READ THIS GUIDE FIRST!
KEY TO: Huff, Chapters 2 & 3, pp. 27-52
Agresti and Finlay, Chapter 3, pp. 45-67 THEN Agresti & Finlay, Chapter 3, pp. 35-44

 
INTRODUCTORY STATISTICS AND DATA ANALYSIS
2010
DR SUSAN CAROL LOSH

 
GUIDE 2: CONSTRUCTING A TABLE

 
THE PROBLEM
COMPONENTS OF A TABLE
GETTING STARTED
PERCENTS 101
VARIATIONS ON THE PERCENT


THE PROBLEM

SOME EXAMPLES:

OK, you just collected a lot of data for your thesis or dissertation. OR

You were just designated by your workplace to compile statistics on the number of rehabilitation admissions. OR

You are a public administrator preparing a report for the city commission. OR

Perhaps you availed yourself of one of the increasing numbers of online database archives, for example, to investigate gender and computer access to check out "the digital divide."


 

How are you going to compile all this information in a form that is succinct, easy to read and understand, and yet does justice to your data?

What you are staring at right now is a pile of questionnaires or a page filled up with numbers. Lots of questionnaires or lots of numbers don't convey any kind of intelligible information to anyone. So, you must systematically reduce all this information to a form that you can easily manage and describe.

A TABLE is a common and useful way to present data.

Don't be a snob about tables. OK, they aren't advanced statistics.

But a table is the most useful basic building block in your tool chest of data analytic techniques and the presentation of your results. If you can construct a simple table thoroughly, everyone (including you) will be able to assess your basic results. You could even write an entire dissertation using tables alone.

Further, tables can become increasingly complex. You can present joint distributions of three or more variables in tables, once you understand how to construct a basic table with just one variable.

Percentages are very useful too. We will see in this guide that you can go a long way with the basic percent and its variations.

COMPONENTS OF A TABLE

 
LET'S KEEP OUR TERMINOLOGY STRAIGHT

 
These are rows. This is ROW #1
A row stretches from the left-hand side of the table to the right-hand side of the table.
By convention, the top row is number 1.

Below are columns. This is COLUMN #1
Columns start at the top of the table and plummet straight down to the bottom of the table.
By convention, the FAR LEFT column is designated number 1.

A univariate table addresses only one variable at a time.

A bivariate table addresses the joint distribution of two variables. For example, the following combinations of values jointly and simultaneously cross-classify each individual on two characteristics, their college and their gender:

Male Business Major
Female Business Major
Male Education Major
Female Education Major
Male Humanities Major
Female Humanities Major

and so on.

By convention, we give the row first and the column second to locate a particular CELL in a bivariate table. The "female, business major" cell is row 1, column 2 or just "1,2" for short.
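
Here is a minimal sketch of that "row first, column second" convention. The handful of student records is invented purely for illustration (it is NOT real enrollment data), and the helper name cell is my own:

```python
from collections import Counter

# Hypothetical records, invented for illustration: (college major, gender).
students = [
    ("Business", "Male"), ("Business", "Female"), ("Business", "Female"),
    ("Education", "Female"), ("Education", "Male"),
    ("Humanities", "Female"),
]

row_labels = ["Business", "Education", "Humanities"]  # rows: college major
col_labels = ["Male", "Female"]                       # columns: gender

# Count how many individuals fall into each (major, gender) combination.
counts = Counter(students)

def cell(row, col):
    """Return the frequency in the cell at 1-based (row, column)."""
    return counts[(row_labels[row - 1], col_labels[col - 1])]

# Row 1, column 2 is the "female, business major" cell.
print(cell(1, 2))
```

Each individual lands in exactly one cell, which is what "jointly and simultaneously cross-classified" means in practice.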

Staying with the example of gender and college major, we could present a bivariate table  as follows:

DISTRIBUTION OF COLLEGE MAJOR BY GENDER AT FLORIDA STATE UNIVERSITY 2004

                                      Gender
College Major                         Male            Female
Business                              A CELL (1,1)    1,2
Education                             2,1             2,2
Humanities                            3,1             3,2
Other majors, entered row by row
Total

A multivariate table jointly and simultaneously cross-classifies each individual on at least three characteristics. For example, a Hispanic Female Business Major is simultaneously classified on her ethnicity, her gender, and her college.

The table itself is a rectangular array in at least two-dimensional space. It typically shows the value or category of the variable and the number of cases that fall into that category. It may also give the percentage of cases that fall into each category (but usually not both frequencies and percents).

There are a lot of ways to construct univariate and bivariate tables. This Guide follows commonly accepted statistical practices.


 
A UNIVARIATE EXAMPLE

Let's start by examining one variable at a time. Consider the following univariate frequency AND percentage table from the August 2000 Current Population Survey, which is conducted by the United States government on a regular basis:

TITLE: Percentage of United States Households with a Telephone in the Household

Is there a telephone in the household?   Number of Cases   Percent of Total Cases
    Has telephone                        118,026           87.4%
    No telephone                         6,530             4.8
    No Answer                            10,430            7.7
Total                                    134,986           100.0%

n = 134,986
Source = Current Population Survey Internet and Computer Use Supplement (Aug 2000)


 THERE ARE (AT LEAST) SIX COMPONENTS OR PIECES OF INFORMATION THAT "THE PERFECT TABLE" MUST TELL YOU.
 
IMPORTANT!!  When YOU construct YOUR table, please include these six pieces of information too.

What are these six pieces of information?

1. THE TITLE

Each table must have a title that briefly but accurately describes the contents of the table. This means that if, for example, you have a bivariate distribution, you should include the names of BOTH VARIABLES in the title.

2. THE VARIABLE(S) OF INTEREST

In my univariate example, there is ONE variable of interest: whether the household contains a telephone (presumably a working one).

When you look at a table, don't YOU want to know what the variables are that are in it? I sure do.

3. THE CATEGORIES OF THE VARIABLE

In my example above, there are three categories: Has telephone, No telephone, and No Answer (which also includes the minuscule categories of refused and don't know).

4A. THE NUMBER OF CASES OR THE FREQUENCY IN EACH CATEGORY OF THE VARIABLE

The total collection of every category name with its associated frequency is the univariate frequency distribution.

4B. THE PERCENTAGE OF CASES IN EACH CATEGORY OF THE VARIABLE

Percentages are very handy, because they are a standardized measure, or on a "per 100" standard. We will examine percentages below. The total collection of every category name with its associated percentage is the univariate percentage distribution.

This section reads 4A or 4B because typically either the frequency distribution OR the percentage distribution is presented, but not both. Presenting too many numbers clutters the table and makes it more difficult to read.
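
To make 4A and 4B concrete, here is a short sketch that builds both distributions from the telephone frequencies in my example table. The variable names are my own choices, and rounding to one decimal place is simply the convention I used in the table:

```python
# Frequencies from the telephone table above.
frequencies = {
    "Has telephone": 118_026,
    "No telephone": 6_530,
    "No Answer": 10_430,
}

# The total case base (n) is the sum of the category frequencies.
total = sum(frequencies.values())

# Percent = (category frequency / total case base) x 100, to one decimal.
percents = {cat: round(f / total * 100, 1) for cat, f in frequencies.items()}

# Print both distributions side by side, with the case base underneath.
for cat, f in frequencies.items():
    print(f"{cat:<15}{f:>10,}{percents[cat]:>8.1f}")
print(f"{'Total':<15}{total:>10,}")
```

In a finished table you would normally show only one of the two columns, plus the total case base.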

5. THE TOTAL CASE BASE

In my example, that's 134,986.

You would be amazed how many people leave out the total case base, including authors in professional journals. In the back of my mind, I always get just a little suspicious.

What are they hiding?

Were they ashamed of the number of cases? ("Three out of every four dentists recommend Blanca toothpaste" isn't very impressive when there are only four dentists.)

Did they manage to forget how many cases they had collected?

The larger the case base, all other things equal, the more stable the results, so this one is very important.

6. THE SOURCE OF THE DATA

Who collected these data? The United States government? A freshman undergraduate for a psychology project? Your Aunt Millie?

It is important to know the source of the data because clearly some sources have more of a reputation for collecting data in a systematic and reliable manner than others.

The United States government, the National Opinion Research Center (NORC), federal agencies of many countries around the world, certain private companies (e.g., the Roper Company), all have excellent reputations for the care that they take in data collection.

You will want to know the collector of the data so that you can interpret the data in context.

It is absolutely amazing how many authors omit this one from a table too. Don't you be one of them.
 

GETTING STARTED

Before you can do anything with your data, remember to sort the observations into your constructed exhaustive and mutually exclusive categories.

This is often called coding.

Sometimes coding is easy: your questionnaire may have already categorized people by using closed questions. For example, each person you studied answered either "very concerned", "somewhat concerned", "only a little concerned" or "not concerned at all"  when asked about having enough money to buy the things they want.

It is typically easy to place the responses to prestructured, closed questionnaire items into categories, although you may have a problem case or two and must create an "other" category to make the category system exhaustive. The manipulation variables and dependent behaviors in an experiment are also often easy to code.

Other times, you have open-ended questions and respondents answer in their own words. You must decide how to collapse all these different answers into a few categories.  Or you have less structured data collected through an ethnography or content analysis. Or, you may have so many categories that you must reduce the number of categories to make the responses intelligible. Coding becomes much more of an art form in such instances.

Remember these additional (if possible) properties. Some of these properties have been especially amended for tabular display. For example, if you were working with the variable "years of age" and you want to use it as an independent variable in an interval-level data statistical technique, you would leave the categories pretty much as is. You wouldn't want to "collapse" categories, that is, group categories together (as I did with No Answer, Refused and Don't Know in the example I used above). To do so in the "years of age as an independent variable" example would be to lose information.


But a table is different. You wouldn't want to present years of age in (for example) 65 different categories. It would take up two pages and no one would be able to simultaneously remember all that information. Click on the link to look at an example of "year of birth" and you will see how cluttered the page is. It is difficult to draw any conclusions about the distribution of year of birth.

SOME  TIPS:

ABOVE ALL, MAKE SENSE!

The CARDINAL RULE in constructing coding categories is to place cases that have something in common together in the same category. Twenty year olds have different interests than 50 year olds so you wouldn't (and shouldn't) want to place them in the same category of age group.

In general, try to keep the number of substantive categories (aside from "other" or "no answer") down to about seven if you can do so without distorting the data. It will be easier for your reader to read your table if there are fewer categories in it.

When you combine a large number of categories by collapsing or grouping some of them together to make a smaller number of total categories, make sure that your combinations make sense. In my example, it would be silly to combine "has telephone" and "no telephone" into one gigantic category.


For another bad example of placing people together who have very little in common, consider a collapse of formal level of education into just two categories: eighth grade or less, and ninth grade or more.

MORE TIPS:

Keep the meaning of the categories simple. You or your reader should be able to know at a glance what the category means.

Give each category a simple, descriptive, but easy to understand category label or value. "Yes" or "No" works if your question had these as precoded responses, or if the action is so simple (locked car door, for example) that yes or no would suffice.

Make sure each category represents only one dimension. For example, if you were creating a category for college major field, Theater majors typically have a very different set of interests from Business majors, so you wouldn't want to include them in a category together. But you might put Marketing majors and Management majors in the same category.

If you are working with numeric data, try for equal intervals IF THE DATA ALLOW AND IF IT MAKES SENSE! Don't try to torture your data into nonsensical equal interval categories.
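
As a rough sketch of collapsing numeric data into equal intervals, the snippet below groups years of age into ten-year categories. The ages are invented for illustration, and the ten-year width is only an assumption; pick whatever width makes sense for YOUR data:

```python
from collections import Counter

# Hypothetical ages, invented for illustration.
ages = [21, 24, 29, 33, 35, 38, 42, 47, 51, 56, 58]

def age_group(age, width=10):
    """Return the label of the equal-width interval that contains age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# Every age falls into exactly one category, so the categories are
# exhaustive and mutually exclusive.
counts = Counter(age_group(a) for a in ages)

for label in sorted(counts):
    print(label, counts[label])
```

Eleven separate ages become four categories, and twenty year olds never share a category with 50 year olds.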
 


 PERCENTS AND PERCENTAGE TABLES

The common percent is an extremely useful measure.

You can use a percent with any kind of data: nominal, ordinal, interval and ratio.

A percent is a standardized measure. Per cent means "per 100 cases."

Because the percent is standardized you can use it to compare results from different population bases that have different sizes or total casebases. For example, you could compare the percentage of home computer ownership among people with elementary school, high school, and college educational levels.

EXAMPLE: suppose you wanted to compare women and men on an item of basic science knowledge: does the father's gene determine the sex of a couple's baby (the correct answer is "yes")? Here is the bivariate frequency distribution:
 
 

How Gender Influences Answers to the Question: Does the father's gene determine the sex of the couple's baby?
NOTE: By convention, categories of the independent variable (or "cause") form the COLUMNS of the table.

Answer to Question:                                        Male            Female   Total
No, father's gene does not determine baby's sex (WRONG)    318 (r1, c1)    232      550
Yes, father's gene does determine baby's sex (RIGHT)       436             588      1024
Total*                                                     754             820      1574

*At the bottom of each column are SEPARATE totals for women and men, then a total for everyone combined.

Source: NSF Surveys of Public Understanding of Science and Technology, 2001, Director, ORC/Macro New York. n = 1574

As you can see, more women (588) than men (436) gave a correct answer to this question. BUT this tells us relatively little, because there are ALSO more women (820) than men (754) in the total group of people studied.

If we put the answers from each sex on a per 100, or a percentage basis, then we can compare women and men directly even though there are different total cases for women and men.
 
 
HOW TO CALCULATE PERCENTS 

The first step in calculating  a percent is to isolate your case base of interest. This is particularly important if you really have two or more separate case bases, as I do for women and men in my bivariate table above. I will first look at the 754 people in the MALE CASEBASE.

The second step is to identify your category of interest. In my example of the "father gene" question, there are only two categories, the "WRONG" answer and the "RIGHT" answer. I will begin with the "wrong answer" category.
 
 
IMPORTANT NOTE: 
NOTICE THAT I AM LOOKING AT MY CATEGORY OF INTEREST IN THE FATHER GENE QUESTION BUT IN THIS CASE I WILL USE THE MALE CASEBASE ONLY

The third step is to locate the number of cases in my category of interest ONLY for the group I am looking at. In my first example here, I am only looking at men, and men who gave the wrong answer (the category of interest)  to the father gene question. That frequency is found in row 1, column 1 of the bivariate table and it is 318 men.

The fourth step is to take the frequency in my category of interest, in this instance ONLY for men, and divide that frequency by the total number of cases in my casebase of interest (all the males = 754). Or:

              318/754 = .422

The number .422 is the proportion of men who gave the wrong answer to the father gene question.

As you can see, a proportion is a fraction.

Proportions vary from 0 to 1.00. All the proportions for a particular group studied (in this case, men) will add up to 1.00 within rounding errors.

In my example, this means if .422 of men gave the wrong answer, by subtraction, .578 of men must have given the correct answer because there are only two categories, and I already know the proportion in the "wrong answer"-male category.

 1.000 - .422 = .578

The fifth step is to turn the proportion into a percentage. Multiply the proportion by 100 and the result is a percent.

.422 X 100 = 42.2%

Thus, among men, 42.2 percent said that it is false that the father's gene determines the sex of the baby, and 57.8 percent correctly stated "true," that the father's gene determines the sex of the baby.

You must complete this last step, multiplying by 100, to turn your proportion into a percentage.



Here, I'll repeat the process for women. There are a total of 820 women. 232 women said it is false that the father's gene determines the sex of a baby. So:

(232/820) X 100 = 28.3 % of women gave an incorrect answer to the father gene question

(588/820) x 100 = 71.7 % of women correctly answered this question
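
The five steps can be sketched in a few lines of code. This is just one way to write it, using the frequencies from my father gene table; the function name column_percents is my own:

```python
# Frequencies from the bivariate table, keyed by gender (the case base).
frequencies = {
    "Male":   {"WRONG": 318, "RIGHT": 436},
    "Female": {"WRONG": 232, "RIGHT": 588},
}

def column_percents(column):
    """Percentize within ONE case base: frequency / column total x 100."""
    base = sum(column.values())  # step 1: isolate the case base of interest
    # Steps 2-5: for each category, divide its frequency by the case base
    # to get the proportion, then multiply by 100 to get the percent.
    return {answer: round(f / base * 100, 1) for answer, f in column.items()}

percents = {gender: column_percents(col) for gender, col in frequencies.items()}

for gender, col in percents.items():
    print(gender, col)
```

Notice that the men are divided only by the male casebase (754) and the women only by the female casebase (820).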

Here is my original bivariate table, now reworked as a percentage table. We can compare men and women directly on the father gene question using percentages. We could not compare women and men directly when the data were in frequency form because we had a different number of men (754) than women (820).
 

How Gender Influences Answers to the Question: Does the father's gene determine the sex of the couple's baby?

Answer to Question:                                        Male      Female
No, father's gene does not determine baby's sex (WRONG)    42.2%     28.3%
Yes, father's gene does determine baby's sex (RIGHT)       57.8      71.7
Total                                                      100.0%    100.0%
Casebase                                                   754       820

Source: NSF Surveys of Public Understanding of Science and Technology, 2001, Director, ORC/Macro New York. n = 1574
 
 
CONVENTIONS IN PERCENTAGE TABLES

The bivariate percentage table that I presented immediately above follows several presentation conventions in use with such tables that make them easier to read.

Categories of the independent variable or "cause" usually form the columns of the table and the category names on the independent variable go at the top of each column. Categories of the dependent variable or "effect" form the rows of the table. Apparently it is easier to read up and down the columns than across the rows.

Here "cause" and "effect" are easy to determine. Gender is fixed at birth. Your basic science knowledge will not create a change in your biological sex.

Only percentages are in the cells of the table. DO NOT include both the frequencies AND the percents in the cells. That clutters up the table and makes the table more difficult to read. It is also redundant because given the percentages and the appropriate case bases, the reader can reconstruct the actual frequencies.

As the Publication Manual of the American Psychological Association puts it: do not include numbers (frequencies) that can be simply calculated from other numbers (percentages). Percentages, because they are standardized, are typically more informative than frequencies (except for the total number of cases).

Give a 100 percent total at the bottom of each column (or at the end of each row if you percentized across). Do NOT put the 100% in any kind of parentheses or brackets.

That way, your reader knows whether s/he should read down the columns or across the rows.

Notice there are only TWO percent marks in each column if the values of the independent variable form the columns:

(1) the percent mark at the top of each column and
(2) the percent mark with the hundred percent at the bottom of each column.

Again, this helps your reader to know whether to read across the rows or down the columns of the table.

DO NOT put a percent mark in every cell of the table. Again, that clutters up the table and makes the table difficult to read. In addition, the reader can easily be confused about whether to read across or down.

I only went out to ONE decimal place in my percents. In general, DO NOT go beyond two decimal places in percentages. The added decimal places do not typically give increased precision and they will again clutter the table. (NOTE: If you have such a large case base that you believe more detail in your percents is necessary, do a rate instead; see rates below.)

My grand total of 1574 cases was placed UNDERNEATH the table itself, with an n = terminology. That way, the total casebase was less likely to get confused with the total for men or the total for women. In this particular example, there were no missing data, so they are not mentioned. However, large amounts of missing data should be mentioned; if the reason for the missing data is known, it should be briefly described.

In conventional statistical terminology:

N   is used for a population total while
n   is used for a total coming from some sample from the population.

As you can see, the purpose of these conventions is to make your table simple and thus easier to read.
 


 
 
Below is a table that contains the same data, percentized across instead of down (notice how categories of the independent variable gender now form the rows, instead of the columns.)

 
How Gender Influences Answers to the Question: Does the father's gene determine the sex of the couple's baby?

Answer to Question:   Incorrect   Correct   Total     Casebase
Male                  42.2%       57.8      100.0%    754
Female                28.3%       71.7      100.0%    820

Source: NSF Surveys of Public Understanding of Science and Technology, 2001, Director, ORC/Macro New York. n = 1574


IS THERE A TELEPHONE IN THE HOUSEHOLD? THE UNIVARIATE FREQUENCY DISTRIBUTION TRANSFORMED.

Here is my original example univariate frequency distribution table from the Current Population Survey, only now it is a univariate PERCENTAGE distribution. Notice that the table is simpler and easier to read with just the percentages present. ONLY use the percentages. The only frequency will be your total at the bottom of the table.

TITLE: Percentage of United States Households with a Telephone in the Household

Is there a telephone in the household?   Percent of Total Cases
    Has telephone                        87.4%
    No telephone                         4.8
    No Answer                            7.7
Total                                    100.0%
                                         (134,986)

Source: Current Population Survey, August, 2000

Another conventional option for handling case bases: Instead of placing my n's in a separate row under the 100%, I just put the casebase in parentheses ( ) underneath the 100%. This, too, helps to simplify the table.
 
 
DECISION POINT

What IS the total casebase that you use? Is it all cases (missing values or not) or just cases that have a valid value on the variables that you use?

That is a judgment call. One way to handle it is to put the number of missing cases underneath the table off to the left side, and the number of valid cases as the n  (if a sample) casebase. Here's the telephone in the household question, where the interior of the table only contains the 124,556 respondents who gave a "yes" or "no" answer:
 

Is there a telephone in the household?   Percent of Total Cases
    Has telephone                        94.8%
    No telephone                         5.2
Total                                    100.0%
                                         (124,556)

Number of missing cases = 10,430
Source = Current Population Survey Internet and Computer Use Supplement (Aug 2000)
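
Here is a sketch of that decision in code: drop the missing category, then percentize on the valid cases only. Treating "No Answer" as the only missing category is my choice for this example, and the variable names are my own:

```python
# Frequencies from the original telephone table.
frequencies = {
    "Has telephone": 118_026,
    "No telephone": 6_530,
    "No Answer": 10_430,
}

# Categories treated as missing data for this table (a judgment call).
missing_categories = {"No Answer"}

# Keep only the cases with a valid value on the variable.
valid = {c: f for c, f in frequencies.items() if c not in missing_categories}

n_valid = sum(valid.values())                    # casebase shown in the table
n_missing = sum(frequencies.values()) - n_valid  # reported underneath the table

valid_percents = {c: round(f / n_valid * 100, 1) for c, f in valid.items()}

print(valid_percents)
print("Valid cases:", n_valid, "Missing cases:", n_missing)
```

Whichever casebase you choose, report it, so your reader can see how the percents were obtained.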

 CUMULATIVE PERCENTS, RATES, RATIOS, AND CHANGE OVER TIME

 
RATES

Some events such as births or divorces happen relatively rarely among the population at large. The overwhelming majority of women do NOT have a baby in any given year. In any given year, most people stay married.

In the case of comparatively rare events, we standardize using a cousin of percents called the rate. Governments often report rates because they have enormous case bases and often collect data on relatively rare events.

We know that the base for a percent is per 100. What is the base for a rate?

The base for a rate varies. The usual bases are:

per 1000
per 10,000 and
per 100,000

The more unusual the event (such as winning the state Lottery), the larger the standardizing base.

Here's how to obtain a rate:

1. Obtain the proportion. In the case of a truly rare event, you will have a very, very small fraction.

2. Next multiply the proportion by the standardizing base figure. If your rate is per 1000, you multiply the proportion by 1000 (instead of by 100 as you did for the percent).  If your rate is per 100,000,  you multiply the proportion by 100,000.

Rates are easier to interpret than either percentages with lots of decimal places or tiny, fractional proportions.
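
The two steps can be sketched as a tiny function. The event count and population below are made-up numbers chosen only to show the arithmetic (they are NOT real statistics), and the function name rate is my own:

```python
def rate(events, population, base=1000):
    """Return the number of events per `base` members of the population."""
    proportion = events / population  # step 1: a very small fraction
    return proportion * base          # step 2: multiply by the standardizing base

# Hypothetical figures: 1,250 events among 500,000 people at risk.
events, population = 1_250, 500_000

print(f"Proportion:        {events / population:.4f}")
print(f"Rate per 1,000:    {rate(events, population, base=1_000):.1f}")
print(f"Rate per 100,000:  {rate(events, population, base=100_000):.1f}")
```

The same proportion gives a different-looking number for each base, so always state which base you used.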

Be very careful to use the correct population base to calculate any of these figures.

EXAMPLE: Crude divorce rates are the number of divorces per 1000 people, and this includes 5 year olds and, worse yet, unmarried  people who are at no risk for divorce!

Crude rates have limited utility. It is the choice of a meaningless base (for example, not correcting for age distributions in crime rates) that can allow the ignorant (charitable) or the unscrupulous to "lie" with statistics.



 
CUMULATIVE PERCENTS

NOTE: This section can give novices a lot of trouble. Please read carefully.

Cumulative percents aggregate the percentages going up or down the values of the variable.  Cumulative percents allow us to make statements such as "at least" or "at most".

IMPORTANT: The categories of your variable must be ordinal, interval or ratio in order to take cumulative percents. You implicitly make "more than" or "less than" statements when you calculate a cumulative percent (e.g., 50 percent of the population has more formal education than a high school degree).

In a cumulative percent, you add the percents sequentially.

You begin with the very lowest value category (or the very highest category), then add the percent from the next lowest category to form a subtotal percent.

Proceeding sequentially, you add the percent from the next category to the first subtotal that you just created to make a new, larger second subtotal.

The process ends when you reach 100 percent at the very highest category (or at the very lowest one, if you started at the top). The process sounds much more complicated than it really is.

EXAMPLE: Here is a new table to serve as an example from the Current Population Survey (August 2000) on Computer and Internet Use in United States households.

TITLE: Number of computers in the household

How many computers or       Percent of     Cumulative - down          Cumulative - up
laptops are there in        Total Cases    "Less than" statements     "At least" statements
this household?                            "At most" statements       "Or more" statements
    No computer             47.3%          47.3                       52.7 + 47.3 = 100.0
    1                       37.6           47.3 + 37.6 = 84.9         15.1 + 37.6 = 52.7
    2                       10.4           84.9 + 10.4 = 95.3         4.7 + 10.4 = 15.1
    3 or more*              4.7            95.3 + 4.7 = 100.0         4.7
Total                       100.0%
                            (134,986)

Source = Current Population Survey Internet and Computer Use Supplement (Aug 2000)

*oops, here is the dreaded "open-ended" category, 3 or more. Since we are using percents, and not calculating an arithmetic average or other numerical entity, we will just leave the category as it is in this example.

Now, let's use the data in the table above and the cumulative percents to make the following kinds of statements:
(and I PROMISE statements of this type will be on Exam 1)

What percent of United States households contain at least one computer?

ANSWER: 52.7 percent of United States households contain at least one computer.

That's the percent with 3 + 2 + 1 computers.
37.6% of households have one computer.
10.4% of households have two computers.
4.7% of households have three or more computers.
Cumulate UP from the bottom (3 + 2 + 1).
4.7% + 10.4% + 37.6% = 52.7%.

What percent of United States households have at least two computers (that means two or more)?

ANSWER: 15.1 percent of United States households contain two or more computers.

That's the percent with 3 + 2 computers.
10.4% of households have two computers.
4.7% of households have three or more computers.

Start at three computers and cumulate UP to and including two computers.
4.7% + 10.4% = 15.1% of households have two or more computers.

What percent of United States households own less than two computers?

ANSWER: 84.9 percent of United States households own less than two computers.
That's the percent with 0 + 1 computer

Start at zero computers and cumulate DOWN to and including one computer.
47.3% + 37.6% = 84.9%.

What percent of United States households own at most one computer?

ANSWER: 84.9 percent of United States households own at most one computer.
That's the same thing as less than two.
At most one means 0 + 1 computer.
See the prior example.
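
If you want to check your own cumulative percents, here is a sketch using the percents from the computer table. It relies on accumulate from Python's standard itertools module; the list names are my own, and I round to one decimal place to match the table:

```python
from itertools import accumulate

# Categories in order from lowest to highest, with their percents.
categories = ["No computer", "1", "2", "3 or more"]
percents = [47.3, 37.6, 10.4, 4.7]

# Cumulate DOWN: start at the lowest category and add each next percent.
# These support "less than" and "at most" statements.
cum_down = [round(p, 1) for p in accumulate(percents)]

# Cumulate UP: start at the highest category and add toward the lowest.
# These support "at least" and "or more" statements.
cum_up = [round(p, 1) for p in accumulate(reversed(percents))][::-1]

for cat, pct, down, up in zip(categories, percents, cum_down, cum_up):
    print(f"{cat:<12}{pct:>6.1f}{down:>8.1f}{up:>8.1f}")
```

"At least one computer" is the cumulative-up figure in the "1" row, and "at most one computer" is the cumulative-down figure in that same row.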
 



 
THE PERCENT CHANGE OVER TIME

Many researchers like to assess how much change occurs in a phenomenon that they study. One easy to calculate measure is the percentage change over time.

To calculate the percent change over time, start with the frequency at the later or more recent time. Call this more recent time "time 2".

We call this later frequency f_t2

You will also need the frequency at the original or earlier time, or "time 1".

We call this earlier frequency f_t1

To calculate the percent change over time, here's the formula:

    [(f_t2 - f_t1) / f_t1] X 100

1) Later frequency minus earlier frequency

2) Divide step one by the EARLIER frequency

3) Multiply the step two result by 100

REMEMBER!! Divide by the EARLIER frequency!

Let's apply this to the population growth in the Miami-Fort Lauderdale-Miami Beach, Florida metropolitan area from 1990 to 2002. My figures are in millions of people.

1990 population in millions = 4.056
2002 population in millions = 5.232

Population change in Miami-Fort Lauderdale-Miami Beach area 1990-2002 =

[(5.232-4.056)/4.056] X 100 = 29.0 percent

That's a lot of growth for only 12 years.
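
Here is the same calculation as a small sketch in code, using the Miami-area figures above. The function and argument names are my own; the earlier frequency comes first so that it is hard to divide by the wrong one:

```python
def percent_change(f_t1, f_t2):
    """Percent change from the earlier frequency f_t1 to the later f_t2."""
    difference = f_t2 - f_t1       # 1) later minus earlier
    fraction = difference / f_t1   # 2) divide by the EARLIER frequency
    return fraction * 100          # 3) multiply by 100

# Population in millions, 1990 and 2002.
growth = percent_change(f_t1=4.056, f_t2=5.232)
print(f"{growth:.1f} percent")
```

A negative result simply means the frequency went down between time 1 and time 2.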

Notice that the smaller the base at time 1, the more growth there appears to be. This is why small states like Nevada seem to have such high growth rates compared with larger states like California.

A calculator typically does this one in two minutes. All you have to do is make sure that you feed in the correct figures in the correct order. Remember to divide by the earlier time frequency.

RATIOS

With ratios, we compare the frequency in one category of a variable with the frequency in a second category of the same variable.

For example, we can look at the ratio of males to females, but a ratio of males to first year graduate students "mixes apples and oranges" and makes no sense.

(You might be thinking of the percent of first year graduate students who are men--but that is a totally different kind of measure and it isn't a ratio.)

Here's how to calculate ratios: divide the frequency in the first category by the frequency in the second category. We then multiply by the appropriate standardizing base. Suppose we wanted the number of males per 100 females. In our sample, we have 30 women and 20 men:

Step One: divide the number of males by the number of females (because we are looking at per 100 females)

20/30 = .667

Step Two: Multiply by the appropriate standardizing base. Since we are looking at the number of males per 100 females, we will multiply by 100

.667 X 100 = 66.7 males per 100 females
 
 

 
DON'T confuse taking ratios with the level of measurement "ratio data." In ratios, you are looking at how the frequency in one category of a single variable compares to the frequency in a second category from that same variable.

My example looks at the ratio of males per 100 females by age groups in the United States in 2002.
The numbers are in MILLIONS.

AGE GROUPS 2002

                    15-19     20-24     25-29     30-34     35-39     40-44     45-49
                    years     years     years     years     years     years     years
Number of
    Men             10.471    10.350    9.640     10.563    10.954    11.413    10.492
    Women           9.905     9.863     9.332     10.394    10.961    11.589    10.810
Ratio Men:Women     105.7     104.9     103.3     101.6     99.9      98.5      97.1

Source: U.S. Bureau of the Census: Resident Population by Age and Sex, 2003
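
As a sketch, the ratio row can be recomputed from the men and women rows of the age group table. The function name ratio_per is my own, and rounding to one decimal place matches the table:

```python
def ratio_per(f_first, f_second, base=100):
    """Number in the first category per `base` cases in the second category."""
    return f_first / f_second * base

age_groups = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]

# Numbers in millions, from the table above.
men = [10.471, 10.350, 9.640, 10.563, 10.954, 11.413, 10.492]
women = [9.905, 9.863, 9.332, 10.394, 10.961, 11.589, 10.810]

# Men per 100 women within each age group.
ratios = [round(ratio_per(m, w), 1) for m, w in zip(men, women)]

for group, r in zip(age_groups, ratios):
    print(f"{group} years: {r} men per 100 women")
```

The same function handles the small sample from earlier: 20 men and 30 women gives about 66.7 males per 100 females.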


All the measures in this Guide can be used on any kind of data, with the exception of the cumulative percent, where the data must be at least at the ordinal level of measurement. All the measures in this section also allow you to compare two or more groups that have different case bases. All of these measures are used A LOT in conference papers, journals, textbooks, and mass media reports.
 


Susan Carol Losh March 22 2010