IB Maths AI HL: 4.1 Statistics and probability - SL content (Question Bank)

[Maximum number: 10]

The company Fred Express delivers packages. From past experience, the time taken, T, to deliver a package follows a normal distribution with mean 64 hours and standard deviation 12 hours.

(a)

State P(T<64).

[ 1 ]
(b)

Find P(44<T<64).

30 % of packages are delivered in less than k hours.

[ 2 ]
(c)
(i)

Sketch a diagram of this normal distribution, shading the region that represents P(T<k).

(ii)

Find the value of k.

For quality control, the manager randomly selects five outgoing packages. These selections are independent.

[ 4 ]
(d)

Find the probability that exactly two of these packages are delivered in less than k hours.

Fred Express charges a fixed amount of $4.50 for any package weighing 1 kg or less. Heavier packages are charged an additional fee of $2.00 per kg, applied to any weight in excess of 1 kg. For example, a 1.5 kg package is charged an additional $1.00.
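The charging rule can be sketched as a small function. This is an illustrative sketch only: the function name `delivery_charge` is hypothetical, and it assumes the additional fee is charged pro rata on the excess weight, which is consistent with the 1.5 kg example.

```python
def delivery_charge(weight_kg: float) -> float:
    """Total charge in dollars for a package of the given weight in kg."""
    base_fee = 4.50      # fixed amount charged for any package
    rate_per_kg = 2.00   # additional fee per kg in excess of 1 kg
    # Assumption: the excess is charged pro rata, not rounded up to whole kg.
    excess_kg = max(0.0, weight_kg - 1.0)
    return base_fee + rate_per_kg * excess_kg


print(delivery_charge(0.8))  # base fee only
print(delivery_charge(1.5))  # base fee plus $1.00 for the extra 0.5 kg
```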

[ 3 ]
[Maximum number: 22]

At Mirabooka Primary School, a survey found that 68 % of students have a dog and 36 % of students have a cat. 14 % of students have both a dog and a cat.
This information can be represented in the following Venn diagram, where m, n, p and q represent the percentage of students within each region.

Question image
(a)

Find the value of

[ 4 ]
(i)

m.

(ii)

n.

(iii)

p.

(iv)

q.

[ 4 ]
(b)

Find the probability that a randomly chosen student

[ 3 ]
(i)

has a dog but does not have a cat.

(ii)

has a dog given that they do not have a cat.

Each year, one student is chosen randomly to be the school captain of Mirabooka Primary School.

Tim is using a binomial distribution to make predictions about how many of the next 10 school captains will own a dog. He assumes that the percentages found in the survey will remain constant for future years and that the events "being a school captain" and "having a dog" are independent.
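A binomial model of this kind can be set up as in the following sketch. The helper names `binomial_pmf` and `binomial_tail` are hypothetical; the values n = 10 and p = 0.68 are taken from Tim's assumptions above. The sketch only builds the distribution and does not address questions about captains "in succession", which depend on the order of outcomes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X > k) for X ~ B(n, p)."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1, n + 1))


n_years = 10   # number of future school captains considered
p_dog = 0.68   # survey proportion of students with a dog

# Full probability distribution of the number of dog-owning captains.
distribution = [binomial_pmf(k, n_years, p_dog) for k in range(n_years + 1)]
```

Individual probabilities can then be read from `distribution[k]`, and "more than" probabilities from `binomial_tail`.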

Use Tim's model to find the probability that in the next 10 years

[ 3 ]
(c)
(i)

5 school captains have a dog.

[ 7 ]
(ii)

more than 3 school captains have a dog.

(iii)

exactly 9 school captains in succession have a dog.

John randomly chooses 10 students from the survey.

[ 7 ]
(d)

State why John should not use the binomial distribution to find the probability that 5 of these students have a dog.

[ 1 ]
[Maximum number: 16]

In this question, you will explore possible approaches to using historical sports results for making predictions about future sports matches.
Two friends, Peter and Helen, are discussing ways of predicting the outcomes of international football matches involving Argentina.
Peter suggests analysing historical data to help make predictions. He lists the results of the most recent 240 matches in which Argentina played, in chronological order, then considers blocks of four matches at a time. He counts how many times Argentina has won in each block. The following table shows his results for the 60 blocks of four matches.

Table
(a)

Determine the mean number of wins per block of four matches for Argentina.

Peter thinks that this data can be modelled by a binomial distribution with n=4 and decides to carry out a $\chi^{2}$ goodness of fit test.

[ 2 ]
(b)

Use Peter's data to write down an estimate for the probability p for this binomial model.

[ 1 ]
(c)
(i)

Use the binomial model to find the probability that Argentina win zero matches in a block of four matches.

[ 1 ]
(ii)

Find the expected frequency for zero wins.

As some expected frequencies are less than 5, Peter combines rows in his table to produce the following observed frequencies. He then uses his binomial model to find appropriate expected frequencies, correct to one decimal place.

Table
[ 2 ]
(d)

Peter uses this table to carry out a $\chi^{2}$ goodness of fit test, to test the hypothesis that the data follows a binomial distribution with n=4, at the 5 % significance level.

For this test, state

[ 6 ]
(i)

the null hypothesis;

[ 1 ]
(ii)

the number of degrees of freedom;

[ 1 ]
(iii)

the p-value;

[ 2 ]
(iv)

the conclusion, justifying your answer.

[ 2 ]
(e)

Using Peter's binomial model, find the probability that Argentina will win at least one of their next four international football matches.

Helen thinks that a better prediction might be made by considering the transition between matches. To keep the model simple, she decides to use only two states: Argentina won (A) or Argentina did not win (B). Helen looks at Peter's list of results and counts the number of times that:
- Argentina won, twice in succession (AA),
- Argentina won, then did not win (AB),
- Argentina did not win, then won (BA),
- Argentina did not win, twice in succession (BB).

She recorded the following results.

Table

Helen uses the relative frequencies to estimate the probabilities in a transition matrix.
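Estimating transition probabilities from relative frequencies can be sketched as follows. The counts below are hypothetical placeholders, not Helen's recorded results (her table is not reproduced here), and the function name is illustrative. Probabilities are stored by transition label so that no particular row or column convention for the matrix is assumed.

```python
from fractions import Fraction

# Hypothetical counts for illustration only; NOT Helen's data.
counts = {"AA": 6, "AB": 4, "BA": 3, "BB": 7}


def transition_probabilities(counts):
    """Relative frequency of each transition, conditional on the first state."""
    probs = {}
    for state in "AB":
        # Total number of transitions that start in this state.
        total_from_state = counts[state + "A"] + counts[state + "B"]
        for next_state in "AB":
            key = state + next_state
            probs[key] = Fraction(counts[key], total_from_state)
    return probs


probs = transition_probabilities(counts)
print(probs["AA"])  # estimate of P(win next | won previous) for the made-up counts
```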

[ 2 ]
(f)
(i)

Given that Argentina won the previous match, show that Helen's estimate for the probability of Argentina winning the next match is $\frac{17}{29}$.

[ 2 ]
[Maximum number: 6]

This question is about modelling the spread of a computer virus to predict the number of computers in a city which will be infected by the virus.
A systems analyst defines the following variables in a model:
- t is the number of days since the first computer was infected by the virus.
- Q(t) is the total number of computers that have been infected up to and including day t.
The following data were collected:

Table
(a)
(i)

Find the equation of the regression line of Q(t) on t.

[ 2 ]
(ii)

Write down the value of r, Pearson's product-moment correlation coefficient.

[ 1 ]
(iii)

Explain why it would not be appropriate to conduct a hypothesis test on the value of r found in (a)(ii).

A model for the early stage of the spread of the computer virus suggests that

$$Q^{\prime}(t)=\beta N Q(t)$$

where N is the total number of computers in a city and $\beta$ is a measure of how easily the virus is spreading between computers. Both N and $\beta$ are assumed to be constant.

[ 1 ]
(b)

An estimate for $Q^{\prime}(t)$, $t \geq 5$, can be found by using the formula:

$$Q^{\prime}(t) \approx \frac{Q(t+5)-Q(t-5)}{10}$$
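This estimate is a central difference with a step of 5 days either side of t. A minimal sketch, assuming Q is available as a function of t; the names `central_difference` and `Q_example` are hypothetical, and `Q_example` is a made-up test function rather than the collected data.

```python
def central_difference(Q, t, h=5):
    """Estimate Q'(t) from values of Q at t + h and t - h."""
    return (Q(t + h) - Q(t - h)) / (2 * h)


def Q_example(t):
    """Hypothetical test function, not the virus data: Q(t) = t^2."""
    return t ** 2


# For Q(t) = t^2 the central difference is exact: Q'(10) = 20.
print(central_difference(Q_example, 10))  # 20.0
```

With tabulated data, `Q` would instead look up the recorded value for each day, so the estimate is only available where both t+5 and t-5 are in the table.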

The following table shows estimates of $Q^{\prime}(t)$ for city X at different values of t.

Table

Determine the value of a and of b. Give your answers correct to one decimal place.

An improved model for Q(t), which is valid for large values of t, is the logistic differential equation

$$Q^{\prime}(t)=k Q(t)\left(1-\frac{Q(t)}{L}\right)$$

where k and L are constants.
Based on this differential equation, the graph of $\frac{Q^{\prime}(t)}{Q(t)}$ against Q(t) is predicted to be a straight line.
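The reason for the straight line can be seen by dividing the logistic equation by Q(t), assuming Q(t) is non-zero. The following is a short worked step rather than a full solution:

```latex
\begin{align*}
\frac{Q^{\prime}(t)}{Q(t)} &= k\left(1-\frac{Q(t)}{L}\right) \\
                           &= k - \frac{k}{L}\,Q(t)
\end{align*}
```

This is linear in Q(t), with vertical intercept k and gradient -k/L.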

[ 2 ]
[Maximum number: 18]

Paul has a bar graph for the total number of goals scored in each game of a soccer tournament in 2024. The bar graph is shown below; however, the frequency of 4 goals in a game is unreadable.
Paul uses this bar graph to create a frequency table.

Question image
Frequency table

(a)

Write down the value of k.

Paul knows that the mean number of goals per game scored during the tournament was 2.2.

[ 1 ]
(b)
(i)

Write down an equation for the mean in terms of p.

[ 3 ]
(ii)

Determine the value of p.

Data for the number of goals per game in the 2025 soccer tournament are shown in the following box and whisker diagram.

Question image

After comparing the box and whisker diagram from the 2025 tournament with the frequency table from the 2024 tournament, Paul concludes that the distribution of goals is consistent between the two tournaments.

[ 3 ]
(c)

State two observations that support Paul's conclusion using values from the data to compare any two of:
range, symmetry, median, and interquartile range.

Paul plans to watch all the games from the 2024 tournament in a random order.
He will watch each game once.
For the first game he watches, he defines event F as:
"scoring either 0 goals or exactly 1 goal".

[ 3 ]
(d)

Write down the event(s) from the table that are equivalent to $F^{\prime}$. There may be more than one correct event.

Table
[ 2 ]
(e)

If exactly 1 goal was scored in the first game Paul watches, write down the probability that exactly 1 goal was scored in the second game he watches. Give your answer as a fraction.

[ 2 ]
(f)

Calculate the probability that 5 goals were scored in the first game that Paul watches and 0 goals were scored in the second game he watches.

[ 4 ]
[Maximum number: 25]

This question is about comparing the academic performance of two schools.
At age 18, all students in school A and school B take the same final exam. Augustin is studying the results in these schools.
Augustin chooses to take a representative sample of size six from each school. For each student in the sample, he will conduct an interview.

(a)
(i)

State one advantage of increasing the sample size.

[ 1 ]
(ii)

State one disadvantage of increasing the sample size.

The data in Table 1 shows the results of the final exam as a percentage for each of the six students, sampled from school A.

Table 1

The mean result for the sample from school A is 51.7 to three significant figures.

[ 1 ]
(b)

Find the value of $s_{n-1}$ for the sample from school A.

The value of $s_{n-1}$ for the sample from school B is 7.66 to three significant figures.
Augustin makes the following claim:
"The spread of results in school A must be less than the spread of results in school B."

[ 2 ]
(c)

Make one criticism of Augustin's claim.

The examination board claims that the final exam results in each school are approximately normally distributed. You may assume that this claim is correct.

Augustin decides to use a pooled t-test to compare the mean results of school A and of school B.

[ 1 ]
(d)
(i)

State the condition regarding population variances required to use a pooled t-test.

[ 1 ]
(ii)

State whether Augustin should use a pooled t-test in this case. Justify your answer.

Prior to collecting data, Augustin believed that the mean result of school B was higher than that of school A. From his data he finds that the mean final exam result for the sample from school B is exactly 60.

[ 2 ]
(e)
(i)

State appropriate null and alternative hypotheses for the pooled t-test.

[ 2 ]
(ii)

Find the p-value.

[ 2 ]
(iii)

Given that the test is carried out at the 5 % significance level, state the appropriate conclusion in context. Justify your answer.

All students in the two schools took the same entry exam at age 11.
Augustin wants to determine if there is evidence of any correlation between the entry exam result and final exam result. For the 12 students in the sample, Augustin collects their entry exam results.

The Pearson's product moment correlation coefficient between the results for the entry exam and the final exam is r=0.876 to three significant figures. The critical value of r at the 5 % significance level is 0.576.

[ 2 ]
(f)
(i)

Assuming all requirements are met, perform a test at the 5 % significance level. State the hypotheses and justify your conclusion.

[ 4 ]
(ii)

If the requirements are not met, state an alternative test.

The examination board uses a model to make a prediction of a student's final exam result ($\hat{y}$) based on their entry exam result (x).

The model used is:

$$\hat{y}=0.37 x+37.6$$
[ 1 ]
(g)

For each student, Augustin uses the value $y-\hat{y}$ rounded to one decimal place to measure the extent to which a school helped to improve a student's results. He calls this "school value added". This is shown in Table 2.

Table 2

[ 4 ]
(i)

Assuming that the appropriate requirements are met, use a pooled t-test at a 5 % significance level to determine if the mean "school value added" is higher in school A than in school B. Write down your null and alternative hypotheses and justify your conclusion.

[ 4 ]
(h)

Using Augustin's results, explain how each school could claim they are performing better than the other school.

[ 2 ]
[Maximum number: 10]

This question considers how the assessment of the Air Quality Index (AQI) for a school depends on the method chosen by the person doing the assessing.
Air quality for a district is measured at three monitoring stations. The positions of these stations on a coordinate system with units in kilometres are A(0,5), B(8,9) and C(8,1).
A Voronoi diagram is constructed with the three stations as sites.

Question image
(a)

Explain why the principal might not accept that the air quality around the school can be classed as "good".

The principal decides to obtain an expected value for the AQI at the school that uses all the available information. To do this, she uses an alternative method: the natural neighbour algorithm. This algorithm has two stages.

The first stage is to create a new Voronoi diagram with the school as an extra site. This is shown in the following diagram with the edges of the previous diagram shown by dashed lines.

Question image

The second stage is to estimate the AQI value at the school, W, by using the formula

$$W=\frac{w_{A} a_{A}+w_{B} a_{B}+w_{C} a_{C}}{T}$$

In the formula, $a_{A}$ is the area within the new cell that has been taken from the cell surrounding site A, shown as region P on the diagram, and $w_{A}$ is the mean AQI value from site A. This is given as 132 in the table of mean AQI values above. Similarly for sites B and C. T is the total area of the new cell around S.
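The second stage is an area-weighted mean, which can be sketched as below. Only the value 132 for site A comes from the text; the other AQI values and all three areas are hypothetical placeholders, and the function name is illustrative. The sketch also assumes T is the sum of the three areas taken from the original cells.

```python
def natural_neighbour_estimate(mean_values, areas):
    """Area-weighted mean: sum of w_i * a_i over all sites, divided by T."""
    # Assumption: T is the sum of the areas taken from each original cell.
    total_area = sum(areas.values())
    weighted_sum = sum(mean_values[site] * areas[site] for site in areas)
    return weighted_sum / total_area


# w_A = 132 is from the text; w_B, w_C and every area below are made up.
mean_aqi = {"A": 132, "B": 100, "C": 80}
areas = {"A": 1.0, "B": 2.0, "C": 1.0}

estimate = natural_neighbour_estimate(mean_aqi, areas)
print(estimate)  # 103.0 for these placeholder values
```

A site that contributes a larger share of the new cell has proportionally more influence on the estimate.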

[ 1 ]
(b)

Test at the 10 % significance level the hypothesis that the mean AQI value at the school is greater than 94.4. State clearly the null and alternative hypotheses and the conclusion of the test in context.

The principal now assumes that the distribution of AQI values at the school follows a normal distribution with mean 97.8 and standard deviation 17.2.

[ 6 ]
(c)

Use this model to find the expected number of days per year (correct to the nearest day) on which the AQI value can be classed as "good".

[ 3 ]
[Maximum number: 20]

This question uses statistical tests to investigate whether advertising leads to increased profits for a grocery store.

Aimmika is the manager of a grocery store in Nong Khai. She is carrying out a statistical analysis on the number of bags of rice that are sold in the store each day. She collects the following sample data by recording how many bags of rice the store sells each day over a period of 90 days.

Table

She believes that her data follows a Poisson distribution.

(a)
(i)

Find the mean and variance for the sample data given in the table.

[ 2 ]
(ii)

Hence state why Aimmika believes her data follows a Poisson distribution.

[ 1 ]
(b)

Aimmika decides to carry out a $\chi^{2}$ goodness of fit test at the 5 % significance level to see whether the data follows a Poisson distribution with mean 4.2.

[ 8 ]
(i)

Write down the number of degrees of freedom for her test.

[ 1 ]
(ii)

Perform the $\chi^{2}$ goodness of fit test and state, with reason, a conclusion.

[ 7 ]
(c)

Aimmika claims that advertising in a local newspaper for 300 Thai Baht (THB) per day will increase the number of bags of rice sold. However, Nichakarn, the owner of the store, claims that the advertising will not increase the store's overall profit.

Nichakarn agrees to advertise in the newspaper for the next 60 days. During that time, Aimmika records that the store sells 282 bags of rice with a profit of 495 THB on each bag sold.

Aimmika wants to carry out an appropriate hypothesis test to determine whether the number of bags of rice sold during the 60 days increased when compared with the historic sales records.

[ 6 ]
(i)

By finding a critical value, perform this test at a 5 % significance level.

[ 6 ]
(d)

By considering the claims of both Aimmika and Nichakarn, explain whether the advertising was beneficial to the store.

[ 3 ]
[Maximum number: 7]

Under controlled driving conditions, Jacob investigated the fuel efficiency of his car when using premium fuel compared to standard fuel.
Jacob recorded the distance travelled per litre ($\mathrm{km\,L}^{-1}$) using standard fuel for six days and then using premium fuel for seven days. This information is shown in the following tables.

Table
Table

At the 5 % significance level, Jacob performs a t-test to determine whether there is sufficient evidence that his car travels further using premium fuel compared to standard fuel.

(a)

State one mathematical assumption made for this test to be valid.

[ 1 ]
(b)

Write down the

[ 2 ]
(i)

null hypothesis.

(ii)

alternative hypothesis.

[ 2 ]
(c)
(i)

Find the p-value.

(ii)

State your conclusion to the test in context. Justify your answer.

[ 4 ]
[Maximum number: 5]

Sergio is interested in whether an adult's favourite breakfast berry depends on their income level. He obtains the following data for 341 adults and decides to carry out a $\chi^{2}$ test for independence, at the 10 % significance level.

Table
(a)

Write down the null hypothesis.

[ 1 ]
(b)

Find the value of the $\chi^{2}$ statistic.

The critical value of this $\chi^{2}$ test is 7.78.

[ 2 ]
(c)

Write down Sergio's conclusion to the test in context. Justify your answer.

[ 2 ]