Calc's Z-test tool

I was looking at a bug report again: https://bugs.documentfoundation.org/show_bug.cgi?id=132983

The reporter is scratching their head over the "division by zero" result for z.

From what I can see, the "Known variance" values can't both be zero if you want to avoid the #DIV/0! results. At least one of them must be a positive value.
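To illustrate why two zero variances lead to a division by zero, here is a minimal Python sketch of the textbook two-sample z statistic with known variances. It assumes Calc's tool uses this same standard form, which the thread does not confirm; the function name `two_sample_z` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def two_sample_z(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Textbook two-sample z statistic with known variances.

    Assumption: this mirrors what the spreadsheet tool computes;
    it is not taken from the Calc source code.
    """
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    if standard_error == 0.0:
        # Both known variances are zero, so the denominator vanishes.
        # This is the situation that shows up as #DIV/0! in the sheet.
        raise ZeroDivisionError("both known variances are zero")
    return (mean1 - mean2 - hypothesized_diff) / standard_error
```

With at least one positive variance the denominator is non-zero and a z value can be computed, which matches the observation above.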

Now, looking at Help:
https://help.libreoffice.org/7.1/en-US/text/scalc/01/statistics_test_z.html

and the Calc guide chapter CG7009-DataAnalysis-SF-12Aug2020, we see this same "division by zero" result, and the reader is left with no explanation.

How should we present this better in the docs? I have no experience with the topic, but I hope we can find a statistician to shed light on this!

Ilmari

Variance can't be zero for analysis of variance tests (t test, F test,
etc.). Statistical testing assumes that you have variance; otherwise you
really wouldn't need a test.

Thanks! That was my gut feeling.

So the immediate action would be to change the screenshots in Help and the Calc guide so that the variance cells have non-zero values, and to add a note to the instructions.

A further question is, should we consider adding fields for inputting variance values into the Z-test tool itself? Looking at Microsoft Office docs, Excel has these. This would become an enhancement request in Bugzilla.

Ilmari

Mark Morin wrote on 2.9.2020 at 2.57:

Thanks Ilmari and Mark,

I'm happy to update the section in the 7.0 Calc Guide (which is currently in preparation) but I'm still confused. Why does the z-test tool think that the variance of the given data sets is zero? As you say Ilmari, that is not a number that the user currently enters.

Regards,

Steve

Hello everyone,

About the matter, I'm currently reviewing the
CG7009-DataAnalysis-SF-12Aug2020, and got to the Z-test tool part.

After creating the results table, it is possible to insert the variances in
their respective cells, but I was looking at the Z-test wiki page referred to
in the chapter and it says this:
"*If the population variance is unknown (and therefore has to be estimated
from the sample itself) and the sample size is not large (n < 30), the
Student's t-test may be more appropriate.*"

I just did a little research on Google and found this website, which
explains the Z-test:
https://www.solver.com/z-test-two-sample-means
When running the example in Calc and manually inserting the variances in
the cells of the Z-Test results, the output was identical to the example on
the website.

So I'm guessing that LO Calc should be able to calculate the variance from
the given data, and this should be fixed in the software.
Or maybe we can add a note telling the user to insert the variances after
creating the table.

Best regards,
Felipe Viggiano

Thanks Felipe.

I think we must describe what the software currently does in the Calc Guide - enhancements can be requested via Bugzilla but they may or may not happen depending on volunteer effort.

In summary, it appears that we need to update the description in Chapter 9 (and maybe the Help) as follows:
- Include a statement that normally, if the sample size is not large (n < 30), the Paired t-test tool may be more appropriate.
- After selecting the tool, the user should enter the variances of the two datasets into the fields provided. In the example given the user could enter the formula =VAR(A1:A13) into cell E5 and the formula =VAR(B1:B13) into cell F5. The subsequent z and P values will be updated automatically.
- The user can also update the required Alpha (cell E2) and Hypothesised Mean Difference (cell E3) values. Again, the subsequent z and P values will be updated automatically.
- State that the user should compare the selected Alpha level to the appropriate calculated P value (depending whether a one-tailed or a two-tailed test is required). If the calculated P value is smaller than the Alpha level, then the user should reject the hypothesis (which, in the example given, is that the means of the two data sets are the same).
- Update the figure accordingly.
Does that make sense?
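The workflow summarised above (compute the sample variances, obtain z and the P values, then compare P with Alpha) can be sketched in Python as follows. This is only an illustration under stated assumptions: `statistics.variance` is the sample variance (n - 1 denominator), used here as the counterpart of Calc's VAR; the one-tail P value is assumed to be the upper tail beyond |z|, which may differ from what the tool reports; and the function name `z_test_summary` and the sample data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def z_test_summary(sample1, sample2, alpha=0.05, hypothesized_diff=0.0):
    """Two-sample z test using sample variances as the 'known' variances."""
    # Sample variance (n - 1 denominator), the counterpart of =VAR(...).
    var1 = variance(sample1)
    var2 = variance(sample2)
    standard_error = math.sqrt(var1 / len(sample1) + var2 / len(sample2))
    z = (mean(sample1) - mean(sample2) - hypothesized_diff) / standard_error
    # Assumption: one-tail P is the area beyond |z| under the standard normal.
    p_one_tail = 1.0 - NormalDist().cdf(abs(z))
    p_two_tail = 2.0 * p_one_tail
    return {
        "z": z,
        "p_one_tail": p_one_tail,
        "p_two_tail": p_two_tail,
        # Reject the hypothesis of equal means when P is smaller than Alpha.
        "reject_two_tail": p_two_tail < alpha,
    }


# Hypothetical data, not the values from the Calc Guide example.
result = z_test_summary([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(result["z"], result["reject_two_tail"])
```

Changing `alpha` or `hypothesized_diff` and calling the function again plays the same role as editing cells E2 and E3 in the results table.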

Regards,

Steve

Hello everyone,

Steve,

I just uploaded a first review of CG7009 - Data analysis to the feedback
folder. I agree with your suggestions and inserted them into the guide.

A few notes:
- It's possible to set different values for Alpha in other tools like
F-test, Paired t-test, and Chi-test (the dialogs do not give this option to
the user). This is also true for the Hypothesized Mean Difference in the
Paired t-test. So I suggested in the chapter that your explanation be
expanded to cover all of those tools.
- I have updated the figure, but this was my first time doing this; could
you double-check it? Feel free to delete and replace it if the
result is not OK.

About the chapter: the instructions were very clear and I managed to
reproduce every step. I have just a few minor suggestions.

The Calc Guide Status has been updated.

Best regards,
Felipe Viggiano

Hi Felipe,

Many thanks for the comments. I have updated the chapter accordingly and placed it back in the Draft folder, ready for 2nd review.

I didn't update Figure 12 showing the Create Scenario dialog because I don't think it really matters that the sample data shown in the dialog includes a 2019 date.

Regards,

Steve

Hello Steve,

Once again, thanks for the feedback.

About Figure 12, that's OK! That suggestion was just a minor detail.

I just checked out CG7010 - LinkingData for a first review. I expect to
return it by Friday (11Sep2020).

The Calc Guide Status has been updated.

Best regards,
Felipe Viggiano