EduNinja

IB Maths AI SL, 4 Statistics and probability: Question Bank

SL110 questions10 previewsSyllabus linked
[Maximum number: 16]

A group of 1280 students were asked which electronic device they preferred. The results per age group are given in the following table.

[Table: preferred electronic device by age group]
(a)

A student from the group is chosen at random. Calculate the probability that the student

(i)

prefers a tablet.

(ii)

is 11-13 years old and prefers a mobile phone.

(iii)

prefers a laptop given that they are 17-18 years old.

(iv)

prefers a tablet or is 14-16 years old.

[ 9 ]

A $\chi^{2}$ test for independence was performed on the collected data at the 1 % significance level. The critical value for the test is 13.277.

(b)

State the null and alternative hypotheses.

[ 1 ]
(c)

Write down the number of degrees of freedom.

[ 1 ]
(d)
(i)

Write down the $\chi^{2}$ test statistic.

(ii)

Write down the p-value.

(iii)

State the conclusion for the test in context. Give a reason for your answer.

[ 5 ]
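Parts (c) and (d)(iii) rest on two standard facts: a contingency table with r rows and c columns gives (r - 1)(c - 1) degrees of freedom, and the null hypothesis of independence is rejected when the $\chi^{2}$ statistic exceeds the critical value (equivalently, when the p-value is below 0.01). The data table is not reproduced here, so the sketch below does not compute the test statistic. It only checks, assuming a 3 × 3 table (the three age groups and three devices named in part (a)), that 13.277 is the 1 % critical value. For an even number of degrees of freedom k, the upper-tail probability has the closed form $P(X > x) = e^{-x/2}\sum_{i=0}^{k/2-1} (x/2)^{i}/i!$, so no statistics library is needed. The function names are illustrative, not taken from any library.

```python
import math


def degrees_of_freedom(rows: int, cols: int) -> int:
    """Degrees of freedom for a chi-squared test of independence on a rows x cols table."""
    return (rows - 1) * (cols - 1)


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-squared(df); valid only for positive even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = x / 2.0
    term = 1.0  # (x/2)**0 / 0!
    total = 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # build (x/2)**i / i! from the previous term
        total += term
    return math.exp(-half) * total


def reject_null(statistic: float, critical_value: float) -> bool:
    """Reject H0 (independence) when the test statistic exceeds the critical value."""
    return statistic > critical_value


# Assumption: a 3 x 3 table (three age groups, three devices).
df = degrees_of_freedom(3, 3)
print(df)                                     # 4
print(round(chi2_upper_tail(13.277, df), 4))  # 0.01
```

Once the statistic from part (d)(i) is known, `reject_null(statistic, 13.277)` gives the decision. Comparing the calculator's p-value with 0.01 leads to the same conclusion, because the two rules are equivalent.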
[Maximum number: 21]

As part of his mathematics exploration about classic books, Jason investigated the time taken by students in his school to read the book The Old Man and the Sea. He collected his data by stopping and asking students in the school corridor, until he reached his target of 10 students from each of the literature classes in his school.

(a)

State which of the two sampling methods, systematic or quota, Jason has used.

[ 1 ]
(b)

Jason constructed the following box and whisker diagram to show the number of hours students in the sample took to read this book.

[Figure: box and whisker diagram of the number of hours taken to read the book]

Write down the median time to read the book.

[ 1 ]
(c)

Calculate the interquartile range.

[ 2 ]

Mackenzie, a member of the sample, took 25 hours to read the novel. Jason believes Mackenzie's time is not an outlier.

(d)

Determine whether Jason is correct. Support your reasoning.

[ 4 ]
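Part (d) uses the usual convention that a value is an outlier if it lies more than 1.5 × IQR below the lower quartile or more than 1.5 × IQR above the upper quartile. The quartiles come from the box and whisker diagram, which is not reproduced, so this minimal sketch takes them as inputs. The function names are illustrative. The same check applies to part (i) of Elsie's library question.

```python
def outlier_fences(q1: float, q3: float) -> tuple[float, float]:
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value: float, q1: float, q3: float) -> bool:
    """True if value lies strictly outside the 1.5 x IQR fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper
```

With Q1 and Q3 read from the diagram, `is_outlier(25, q1, q3)` answers part (d).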
(e)

For each student interviewed, Jason recorded the time taken to read The Old Man and the Sea (x), measured in hours, and paired this with their percentage score on the final exam (y). These data are represented on the scatter diagram.

[Figure: scatter diagram of final exam percentage score (y) against reading time in hours (x)]

Describe the correlation.

[ 1 ]

Jason correctly calculates the equation of the regression line of y on x for these students to be

y = -1.54x + 98.8

He uses the equation to estimate the percentage score on the final exam for a student who read the book in 1.5 hours.

(f)

Find the percentage score calculated by Jason.

[ 2 ]
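The arithmetic in part (f) is a direct substitution of x = 1.5 into Jason's line. Here is a minimal Python check (the function name is illustrative):

```python
def predict_score(hours: float) -> float:
    """Evaluate Jason's regression line of y on x: y = -1.54x + 98.8."""
    return -1.54 * hours + 98.8


print(round(predict_score(1.5), 2))  # 96.49
```

That is, y = -1.54(1.5) + 98.8 = -2.31 + 98.8 = 96.49, or 96.5 to three significant figures.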
(g)

State whether it is valid to use the regression line of y on x for Jason's estimate. Give a reason for your answer.

[ 2 ]

Jason found a website that rated the 'top 50' classic books. He randomly chose eight of these classic books and recorded the number of pages. For example, Book H is rated 44th and has 281 pages. These data are shown in the table.

[Table: rating and number of pages for each of the eight books]

Jason intends to analyse the data using Spearman's rank correlation coefficient, $r_{s}$.

(h)

Copy and complete the information in the following table.

[Table: to be copied and completed]
[ 2 ]
(i)
(i)

Calculate the value of $r_{s}$.

[ 3 ]
(ii)

Interpret your result.

[ 3 ]
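Spearman's rank correlation coefficient is Pearson's correlation coefficient applied to the ranks, with tied values given the mean of the ranks they would occupy. You get the same value by entering the ranks as two lists and computing r on a graphic display calculator. When there are no ties, it agrees with $r_{s} = 1 - \frac{6\sum d^{2}}{n(n^{2}-1)}$. Jason's table is not reproduced, so the sketch below is generic, and the helper names are illustrative. Ranking both variables in the same direction (both ascending or both descending) gives the same $r_{s}$, so the direction only needs to be consistent.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float], descending: bool = False) -> list[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend 'end' across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

The same approach applies to the decathlon question, where a greater distance means a better (smaller) rank. That corresponds to `descending=True` for both events.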
[Maximum number: 19]

A medical centre is testing patients for a certain disease. This disease occurs in 5 % of the population.
They test every patient who comes to the centre on a particular day.

(a)

State the sampling method being used.

[ 1 ]
(b)

It is intended that if a patient has the disease, they test "positive", and if a patient does not have the disease, they test "negative".

However, the tests are not perfect, and only 99 % of people who have the disease test positive. Also, 2 % of people who do not have the disease test positive.

The tree diagram shows some of this information.

[Figure: tree diagram of disease status and test result, with probabilities a, b, c and d to be found]

Write down the value of

(i)

a.

(ii)

b.

(iii)

c.

(iv)

d.

[ 4 ]
(c)

Use the tree diagram to find the probability that a patient selected at random

(i)

will not have the disease and will test positive.

(ii)

will test negative.

(iii)

has the disease given that they tested negative.

[ 8 ]
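The probabilities in part (c) come from multiplying along branches and adding across them. Only the figures given in the question are needed: P(disease) = 0.05, P(positive | disease) = 0.99 and P(positive | no disease) = 0.02. Because the tree diagram is not reproduced, the sketch below does not try to match these onto the labels a to d. The variable names are illustrative.

```python
p_disease = 0.05
p_pos_given_disease = 0.99
p_pos_given_healthy = 0.02

p_healthy = 1 - p_disease                      # 0.95
p_neg_given_disease = 1 - p_pos_given_disease  # 0.01
p_neg_given_healthy = 1 - p_pos_given_healthy  # 0.98

# (c)(i) no disease AND positive: multiply along one path.
p_healthy_and_pos = p_healthy * p_pos_given_healthy

# (c)(ii) negative: add the two paths that end in "negative".
p_neg = p_disease * p_neg_given_disease + p_healthy * p_neg_given_healthy

# (c)(iii) conditional probability: P(disease and negative) / P(negative).
p_disease_given_neg = p_disease * p_neg_given_disease / p_neg

print(round(p_healthy_and_pos, 4))    # 0.019
print(round(p_neg, 4))                # 0.9315
print(round(p_disease_given_neg, 6))  # 0.000537
```

So P(no disease and positive) = 0.019, P(negative) = 0.0005 + 0.931 = 0.9315, and P(disease | negative) = 0.0005 / 0.9315 ≈ 0.000537.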
(d)

The medical centre finds that the actual number of positive results in their sample is different from the number predicted by the tree diagram. Explain why this might be the case.

[ 1 ]

The staff at the medical centre looked at the care received by all visiting patients on a randomly chosen day. All the patients received at least one of these services: they had medical tests (M), were seen by a nurse (N), or were seen by a doctor (D). It was found that:
- 78 had medical tests;
- 45 were seen by a nurse;
- 30 were seen by a doctor;
- 9 had medical tests and were seen by a doctor and a nurse;
- 18 had medical tests and were seen by a doctor but were not seen by a nurse;
- 11 patients were seen by a nurse and had medical tests but were not seen by a doctor;
- 2 patients were seen by a doctor without being seen by a nurse and without having medical tests.

(e)

Draw a Venn diagram to illustrate this information, placing all relevant information on the diagram.

[ 3 ]
(f)

Find the total number of patients who visited the centre during this day.

[ 2 ]
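One way to fill in the Venn diagram for parts (e) and (f) is to work from the innermost region outwards. The triple overlap is given, as are the two-set overlaps that exclude the third set. Each remaining region is the circle total minus everything else already placed inside that circle. Because every patient received at least one service, the total number of patients is the sum of the seven regions. The sketch below does that bookkeeping in Python; the dictionary keys are illustrative labels for the regions.

```python
# Set totals and region counts taken from the question.
total_m, total_n, total_d = 78, 45, 30
mnd = 9        # M, N and D
md_only = 18   # M and D, not N
mn_only = 11   # M and N, not D
d_only = 2     # D only

# N and D but not M: what is left of circle D.
dn_only = total_d - mnd - md_only - d_only
# "Only" regions: circle total minus its overlaps.
m_only = total_m - mnd - md_only - mn_only
n_only = total_n - mnd - mn_only - dn_only

regions = {
    "M only": m_only, "N only": n_only, "D only": d_only,
    "M and N only": mn_only, "M and D only": md_only, "N and D only": dn_only,
    "M, N and D": mnd,
}
total_patients = sum(regions.values())
print(regions)
print(total_patients)  # 105
```

The regions are M only 40, N only 24, D only 2, M and N only 11, M and D only 18, N and D only 1, and all three 9, giving 105 patients. Inclusion-exclusion gives the same total: 78 + 45 + 30 - 20 - 27 - 10 + 9 = 105.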
[Maximum number: 6]

Eduardo believes that there is a linear relationship between the age of a male runner and the time it takes them to run 5000 metres.
To test this, he recorded the age, x years, and the time, t minutes, for eight males in a single 5000 m race. His results are presented in the following table and scatter diagram.

[Table: age, x years, and time, t minutes, for eight male runners]
[Figure: scatter diagram of t against x]
(a)

For this data, find the value of the Pearson's product-moment correlation coefficient, r.

[ 2 ]
(b)

Eduardo looked in a sports science textbook. He found that the following information about r was appropriate for athletic performance.

[Table: interpretation of values of r for athletic performance]

Comment on your answer to part (a), using the information that Eduardo found.

[ 1 ]
(c)

Write down the equation of the regression line of t on x, in the form t = ax + b.

[ 1 ]

A 57-year-old male also ran in the 5000 m race.

(d)

Use the equation of the regression line to estimate the time he took to complete the 5000 m race.

[ 2 ]
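Parts (a) and (c) are normally done on a graphic display calculator, which fits the least-squares line. For reference, let $S_{xt} = \sum (x-\bar{x})(t-\bar{t})$ and $S_{xx} = \sum (x-\bar{x})^{2}$. The regression line of t on x then has gradient $a = S_{xt}/S_{xx}$ and passes through $(\bar{x}, \bar{t})$, and $r = S_{xt}/\sqrt{S_{xx}S_{tt}}$. Eduardo's table is not reproduced, so the sketch below is generic. The function names are illustrative.

```python
import math
from typing import Sequence


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (a, b) for the regression line of y on x, y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sxy / sxx
    b = my - a * mx  # the line passes through (mean of x, mean of y)
    return a, b


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

For part (d), substitute x = 57 into t = ax + b once a and b have been found.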
[Maximum number: 18]

Elsie, a librarian, wants to investigate the length of time, T minutes, that people spent in her library on a particular day.

(a)

State whether the variable T is discrete or continuous.

[ 1 ]
(b)

Elsie's data for 160 people who visited the library on that particular day is shown in the following table.

[Table: number of people by time spent in the library, T minutes, including an unknown frequency k]

Find the value of k.

[ 2 ]
(c)
(i)

Write down the modal class.

[ 1 ]
(ii)

Write down the mid-interval value for this class.

[ 2 ]
(d)

Use Elsie's data to calculate an estimate of the mean time that people spent in the library.

[ 2 ]
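For part (d), an estimate of the mean from grouped data treats every observation in a class as if it were at that class's mid-interval value: $\bar{T} \approx \sum f_{i} m_{i} / \sum f_{i}$. Elsie's class boundaries and frequencies (including k) are not reproduced, so the sketch takes them as inputs. The function name is illustrative.

```python
from typing import Sequence


def grouped_mean(classes: Sequence[tuple[float, float]], freqs: Sequence[int]) -> float:
    """Estimate the mean of grouped data using mid-interval values."""
    if len(classes) != len(freqs):
        raise ValueError("need one frequency per class")
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)
```

For instance, `grouped_mean([(0, 10), (10, 20)], [1, 3])` treats the four observations as 5, 15, 15 and 15 and returns 12.5. These numbers are illustrative, not Elsie's. The same calculation gives the estimate asked for in part (f) of the reaction-time question.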
(e)

Using the table, write down the maximum possible number of people who spent 35 minutes or less in the library on that day.

[ 1 ]

Elsie assumes her data to be representative of future visitors to the library.

(f)

Find the probability that a visitor spends at least 60 minutes in the library.

[ 2 ]
(g)

The following box and whisker diagram shows the times, in minutes, that the 160 visitors spent in the library.

[Figure: box and whisker diagram of the times, in minutes, spent in the library by the 160 visitors]

Write down the median time spent in the library.

[ 1 ]
(h)

Find the interquartile range.

[ 2 ]
(i)

Hence show that the longest time that a person spent in the library is not an outlier.

[ 3 ]

Elsie believes the box and whisker diagram indicates that the times spent in the library are not normally distributed.

(j)

Identify one feature of the box and whisker diagram which might support Elsie's belief.

[ 1 ]
[Maximum number: 17]

Mackenzie conducted an experiment on the reaction times of teenagers. The results of the experiment are displayed in the following cumulative frequency graph.

[Figure: cumulative frequency graph of reaction times]
(a)

Use the graph to estimate the

(i)

median reaction time;

(ii)

interquartile range of the reaction times.

[ 4 ]
(b)

Find the estimated number of teenagers who have a reaction time greater than 0.4 seconds.

[ 2 ]
(c)

Determine the 90th percentile of the reaction times from the cumulative frequency graph.

[ 2 ]
(d)

Mackenzie created the cumulative frequency graph using the following grouped frequency table.

[Table: grouped frequency table of reaction times, t seconds, with unknown entries a and b]

Write down the value of

(i)

a;

(ii)

b.

[ 2 ]
(e)

Write down the modal class from the table.

[ 1 ]
(f)

Use your graphic display calculator to find an estimate of the mean reaction time.

[ 2 ]

Upon completion of the experiment, Mackenzie realized that some values were grouped incorrectly in the frequency table. Some reaction times recorded in the interval $0 < t \leq 0.2$ should have been recorded in the interval $0.2 < t \leq 0.4$.

(g)

Suggest how, if at all, the estimated mean and estimated median reaction times will change if the errors are corrected. Justify your response.

[ 4 ]
[Maximum number: 6]

The decathlon is a competition where athletes compete in ten events. Two of those events are long jump and high jump. In both events, a greater distance means a better ranking.
The table shows results for these two events at the World Championships.

[Table: long jump and high jump results at the World Championships]

The Spearman's rank correlation coefficient is used to determine if there is a linear correlation between an athlete's ranking in long jump and their ranking in high jump.

(a)

Complete the table to show the athletes' rankings in high jump.

[ 2 ]
(b)

Find the value of the Spearman's rank correlation coefficient $r_{s}$.

[ 2 ]
(c)

The following guide is used by the coach to determine the strength of the correlation between the ranks for long jump and high jump.

[Table: coach's guide to the strength of correlation for values of $r_{s}$]

State the strength of the correlation between the rankings as indicated by the table and interpret this in the context of the question.

[ 2 ]
[Maximum number: 8]

The mean annual temperatures for Earth, recorded at fifty-year intervals, are shown in the table.

[Table: mean annual temperature of Earth, recorded at fifty-year intervals]

Tami creates a linear model for this data by finding the equation of the straight line passing through the points with coordinates (1708,8.73) and (1958,9.45).
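Tami's model is the straight line through two of the data points, so its gradient is the change in temperature divided by the change in year. Here is a short check using only the two given points (the function name is illustrative):

```python
def line_through(p1: tuple[float, float], p2: tuple[float, float]) -> tuple[float, float]:
    """Return (m, c) for the line y = m*x + c through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    c = y1 - m * x1
    return m, c


m, c = line_through((1708, 8.73), (1958, 9.45))
print(f"y = {m:.5f}x + {c:.5f}")  # y = 0.00288x + 3.81096
```

Here m = 0.72 / 250 = 0.00288 °C per year. This two-point line is generally not the same as the least-squares regression line, which uses every point in the table.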

(a)
(i)

Find the equation of the regression line of y on x.

[ 3 ]
(ii)

Find the value of r, the Pearson's product-moment correlation coefficient.

[ 3 ]
(b)

Use Thandizo's model to estimate the mean annual temperature in the year 2000.

Thandizo uses his regression line to predict the year when the mean annual temperature will first exceed $15^{\circ}\mathrm{C}$.

[ 2 ]
[Maximum number: 11]

Billy is a keen walker who keeps a record of his performance. The following table shows the time, in minutes, it takes him to walk one kilometre up hills with different gradients. The gradient of each hill is constant.

[Table: time, T minutes, to walk one kilometre up hills of gradient G]
(a)
(i)

Find the equation of the regression line of T on G.

[ 4 ]
(ii)

Describe the correlation between T and G with reference to the value of r, the Pearson's product-moment correlation coefficient.

[ 4 ]

On Sunday, Billy intends to walk up a hill with a gradient of 13 %.

(b)

Estimate the time it will take Billy to walk one kilometre up the hill.

[ 2 ]

This morning, Billy walked one kilometre up a hill, and it took 22 minutes.

(c)

Explain why it would be inappropriate to use the equation found in part (a) to estimate the gradient of this hill.

[ 1 ]
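One reason for part (c) holds whatever the data. The regression line of T on G is fitted to minimise squared errors in T, so it is designed for predicting T from G, not G from T. Unless |r| = 1, the line of G on T is a different line. The product of the two gradients equals r², so the G-on-T gradient is not simply the reciprocal of the T-on-G gradient. Depending on the table, another possible reason is that 22 minutes lies outside the range of the recorded times. The sketch below illustrates the gradient relationship on generic data (x plays the role of G and y of T); the function names are illustrative. The same reasoning applies to Joel's question.

```python
import math
from typing import Sequence


def centred_sums(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Return (Sxy, Sxx, Syy), the centred sums of products and squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def gradients_both_ways(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Gradient of the y-on-x line, gradient of the x-on-y line, and Pearson's r."""
    sxy, sxx, syy = centred_sums(x, y)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)
```

For `gradients_both_ways([0, 1, 2, 3], [1, 3, 2, 5])`, the y-on-x gradient is 1.1. The x-on-y gradient is about 0.629, whose reciprocal (about 1.59) is clearly not 1.1.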
[Maximum number: 11]

Joel is a keen cyclist who keeps a record of his performance. The following table shows the time, in minutes, it takes him to ride one kilometre on hills with different gradients. The gradient of each hill is constant.

[Table: time, T minutes, to ride one kilometre up hills of gradient G]
(a)
(i)

Find the equation of the regression line of T on G.

[ 4 ]
(ii)

Describe the correlation between T and G with reference to the value of r, the Pearson's product-moment correlation coefficient.

[ 4 ]

On Saturday, Joel intends to ride up a hill with a gradient of 17 %.

(b)

Estimate the time it will take Joel to ride one kilometre up the hill.

[ 2 ]

This morning, Joel rode one kilometre up a hill, and it took 22 minutes.

(c)

Explain why it would be inappropriate to use the equation found in part (a) to estimate the gradient of this hill.

[ 1 ]