Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100% Coverage for Calculation/DateTime #1870

Merged
merged 3 commits into from Feb 27, 2021
Merged

100% Coverage for Calculation/DateTime #1870

merged 3 commits into from Feb 27, 2021

Conversation

oleibman
Copy link
Collaborator

The code in DateTime is now completely covered.
Along the way, some errors were discovered and corrected.

  • The tests which have had to be changed at the start of every year are
    replaced by more robust equivalents which do not require annual changes.
  • Several places in the code where Gnumeric and OpenOffice were thought to differ
    from Excel do not appear to have had any justification.
    I have left a comment where such code has been removed.
  • Use DateTime when possible rather than date, time, or strftime functions to avoid
    potential Y2038 problems.
  • Some impossible code has been removed, replaced by an explanatory comment.
  • NETWORKDAYS had a bug when the start date was Sunday. There had been no tests
    of this condition.
  • Some functions allow boolean and null arguments where a number is expected.
    This is more complicated than the equivalent situations in MathTrig because
    the initial date for these calculations can be Day 1 rather than Day 0.
  • More testing for dates from 1900-01-01 through the fictitious
    everywhere-but-Excel 1900-01-29.
    • This showed that there is an additional Excel bug - Excel evaluates
      WEEKNUM(emptycell) as 0, which is not a valid result for
      WEEKNUM without a second argument.
      PhpSpreadsheet now duplicates this bug.
    • There is a similar and even worse bug for 1904-01-01 in 1904 calculations.
      Weeknum returns 0 for this,
      but returns the correct value for arguments of 0 or null.
    • DATEVALUE should accept 1900-02-29 (sigh) and relatives.
      PhpSpreadsheet now duplicates this bug.
  • Testing bootstrap sets default timezone. This appears to be a relic from
    the releases of PHP where the unwise decision, subsequenly reversed,
    was made to issue messages for
    "no default timezone is set" rather than just use a sensible default.
    This was a disruptive setting for some of the tests I added.
    There is only one test in the entire suite which is default-timezone-dependent.
    Setting and resetting of default timezone is moved to that test
    (Reader/ODS/ODSTest), and out of bootstrap.
  • There had been no testing of NOW() function.
  • DATEVALUE test had no tests for 1904 calendar and needs some.
  • DATE test changed 1900/1904 calendar in use without restoring it.
  • WEEKDAY test had no tests for 1904 calendar and needs some.
    • Which revealed a bug in Shared/Date (excelToDateTimeObject was not
      recognizing 1904-01-01 as valid when 1904 calendar is in use).
    • And an additional bug in that legal 1904-calendar values in the 0.0-1.0
      range yielded the same "wrong" answers as 1900-calendar (see "One note" below).
      Also the comment for one of the calendar-1904 tests was wrong in attempting
      to identify what time of day the fraction represented.

I had wanted to break this up into a set of smaller modules, a process already
started for Engineering and MathTrig.
However the number of source code changes was sufficient that I wanted
a clean delta for this request.
If it is merged, I will work on breaking it up afterwards.

One note - Shared/Date/excelToDateTimeObject, when calendar-1900 is in use,
returns an unexpected result if its argument is between 0 and 1,
which is nominally invalid for that calendar.
It uses a base-1970 calendar in that instance. That check is not justifiable
for calendar-1904, where values in that range are legal,
so I made the check specific to calendar-1900,
and adjusted 3 1904 unit test results accordingly. However, I have to admit that
I don't understand why that check should be made even for calendar-1900.
It certainly doesn't match anything that Excel does.
I would recommend scrapping that code altogether.
If agreed, I would do this as part of the break-up into smaller modules.

Another note -
more controversially, it is clear that PhpSpreadsheet needs to support
the Excel and PHP date formats. Although it requires further study,
I am not convinced that it needs to support Unix timestamp format.
Since that is a potential source of Y2038 problems on 32-bit systems,
I would like to open a PR to deprecate the use of that format.
Please let me know if you are aware of a valid reason to continue to support it.

This is:

- [x] a bugfix
- [ ] a new feature

Checklist:

Why this change is needed?

Test coverage, minor bug fixes.

The code in DateTime is now completely covered.
Along the way, some errors were discovered and corrected.
- The tests which have had to be changed at the start of every year are
replaced by more robust equivalents which do not require annual changes.
- Several places in the code where Gnumeric and OpenOffice were thought to differ
from Excel do not appear to have had any justification.
I have left a comment where such code has been removed.
- Use DateTime when possible rather than date, time, or strftime functions to avoid
potential Y2038 problems.
- Some impossible code has been removed, replaced by an explanatory comment.
- NETWORKDAYS had a bug when the start date was Sunday. There had been no tests
of this condition.
- Some functions allow boolean and null arguments where a number is expected.
This is more complicated than the equivalent situations in MathTrig because
the initial date for these calculations can be Day 1 rather than Day 0.
- More testing for dates from 1900-01-01 through the fictitious
everywhere-but-Excel 1900-01-29.
    - This showed that there is an additional Excel bug - Excel evaluates
WEEKNUM(emptycell) as 0, which is not a valid result for
WEEKNUM without a second argument.
PhpSpreadsheet now duplicates this bug.
    - There is a similar and even worse bug for 1904-01-01 in 1904 calculations.
Weeknum returns 0 for this,
but returns the correct value for arguments of 0 or null.
    - DATEVALUE should accept 1900-02-29 (sigh) and relatives.
PhpSpreadsheet now duplicates this bug.
- Testing bootstrap sets default timezone. This appears to be a relic from
the releases of PHP where the unwise decision, subsequenly reversed,
was made to issue messages for
"no default timezone is set" rather than just use a sensible default.
This was a disruptive setting for some of the tests I added.
There is only one test in the entire suite which is default-timezone-dependent.
Setting and resetting of default timezone is moved to that test
(Reader/ODS/ODSTest), and out of bootstrap.
- There had been no testing of NOW() function.
- DATEVALUE test had no tests for 1904 calendar and needs some.
- DATE test changed 1900/1904 calendar in use without restoring it.
- WEEKDAY test had no tests for 1904 calendar and needs some.
    - Which revealed a bug in Shared/Date (excelToDateTimeObject was not
recognizing 1904-01-01 as valid when 1904 calendar is in use).
    - And an additional bug in that legal 1904-calendar values in the 0.0-1.0
range yielded the same "wrong" answers as 1900-calendar (see "One note" below).
Also the comment for one of the calendar-1904 tests was wrong in attempting
to identify what time of day the fraction represented.

I had wanted to break this up into a set of smaller modules, a process already
started for Engineering and MathTrig.
However the number of source code changes was sufficient that I wanted
a clean delta for this request.
If it is merged, I will work on breaking it up afterwards.

One note - Shared/Date/excelToDateTimeObject, when calendar-1900 is in use,
returns an unexpected result if its argument is between 0 and 1,
which is nominally invalid for that calendar.
It uses a base-1970 calendar in that instance. That check is not justifiable
for calendar-1904, where values in that range are legal,
so I made the check specific to calendar-1900,
and adjusted 3 1904 unit test results accordingly. However, I have to admit that
I don't understand why that check should be made even for calendar-1900.
It certainly doesn't match anything that Excel does.
I would recommend scrapping that code altogether.
If agreed, I would do this as part of the break-up into smaller modules.

Another note -
more controversially, it is clear that PhpSpreadsheet needs to support
the Excel and PHP date formats. Although it requires further study,
I am not convinced that it needs to support Unix timestamp format.
Since that is a potential source of Y2038 problems on 32-bit systems,
I would like to open a PR to deprecate the use of that format.
Please let me know if you are aware of a valid reason to continue to support it.
The usual set of stuff.
If at first you don't succeed ...
@MarkBaker
Copy link
Member

I'll take a more detailed look later.

Just as a piece of MS Excel trivia, the fictitious 1900 leap day was in Lotus 1-2-3; and back in 19 MS were trying to steal the 1-2-3 customer base for their brand new Excel version, so they implemented all the functions that existed in 1-2-3 so that users could run their existing 1-2-3 spreadsheets and they would return identical results... including retaining the erroneous 1900 leap day

@oleibman
Copy link
Collaborator Author

Thanks - it is always helpful to know the history.

Copy link
Member

@MarkBaker MarkBaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note - Shared/Date/excelToDateTimeObject, when calendar-1900 is in use, returns an unexpected result if its argument is between 0 and 1, which is nominally invalid for that calendar.
It uses a base-1970 calendar in that instance. That check is not justifiable for calendar-1904, where values in that range are legal, so I made the check specific to calendar-1900, and adjusted 3 1904 unit test results accordingly. However, I have to admit that
I don't understand why that check should be made even for calendar-1900.
It certainly doesn't match anything that Excel does.
I would recommend scrapping that code altogether.
If agreed, I would do this as part of the break-up into smaller modules.

Dredging from memory, I believe that this is how PHSpreadsheet recognises time-only values internally, where MS Excel uses a value in the range 0-1 for handling any calculations (e.g. durations) rather than actual dates/times... e.g. adding a 4:30 hour duration a task that's started at 22:30 on 31st December 2021.

Something (using the advanced value binder to convert those string values to dates/times) like:

$worksheet->

$worksheet->setCellValue('A1', 'Start Task'); 
$worksheet->setCellValue('B1', 'Duration'); 
$worksheet->setCellValue('A1', 'End Task'); 

$worksheet->getCell('A2')->setValue('31-Dec-2021 22:30')
    ->getStyle->getNumberFormat()->setFormatCode('dd-mmm-yyyy hh:mm:ss'); 
$worksheet->getCell('B2')->setValue('4:30')
    ->getStyle->getNumberFormat()->setFormatCode('[h]:mm'); 
$worksheet->getCell('C2')->setValue('=A2+B2')
    ->getStyle->getNumberFormat()->setFormatCode('dd-mmm-yyyy hh:mm:ss'); 

Another note -
more controversially, it is clear that PhpSpreadsheet needs to support the Excel and PHP date formats. Although it requires further study,
I am not convinced that it needs to support Unix timestamp format.
Since that is a potential source of Y2038 problems on 32-bit systems,
I would like to open a PR to deprecate the use of that format.
Please let me know if you are aware of a valid reason to continue to support it.

This largely goes back to legacy PHPExcel. PHPExcel was first released a month or two after PHP 5.2.0, which was when DateTime objects were introduced for the very first time, so almost all the userbase was still using unix timestamps for anything date-related. So unix timestamps were still "first class citizens" while DateTime objects took a while to establish themselves.

I believe that DateTime objects should be our main consideration, and used consistently internally, but that we still need to allow users to enter/retrieve unix timestamps for legacy (especially if they're on 64-bit systems where Y2038 isn't a major problem).

Just as a reference, I started a new job 2-weeks ago where the current codebase is PHP 5.5, and still uses unix timestamps internally.... and the business is more insistent that I provide support for the Arabic language before the end of March than it is for allowing a language upgrade to PHP 7.... proof that unix timestamps aren't yet dead. :-(

Several places in the code where Gnumeric and OpenOffice were thought to differ
from Excel do not appear to have had any justification.
I have left a comment where such code has been removed.

Again, I believe that this is historic, and these spreadsheet programs have adjusted their codebase over time to bring it more compatible with MS Excel.

@MarkBaker MarkBaker merged commit 80a20fc into PHPOffice:master Feb 27, 2021
MarkBaker pushed a commit that referenced this pull request Mar 3, 2021
…1891)

I ran the test suite using 32-bit PHP. There were 2 places where changes
were needed due to 32-bit timestamps.

Reader\\Xml.php was using strtotime as an intermediate step in converting
a string timestamp to an Excel timestamp. The XML file type stores pure timestamps
(i.e. no date portion) as, e.g., 1899-12-31T02:30:00.000, and that value
causes an error using strtotime on a 32-bit system. However, it is sufficient
to use that value in a DateTime constructor, and that will work for 32- and 64-bit.

There was no test for that particular cell, so I added one to the XML read test.
And that's when I discovered the getFormattedValue bug. The cell's format
is `hh":"mm":"ss`. The quotes around the colons are disrupting the formatting.
PhpSpreadsheet formats the cell by converting the Excel format
to a Php Date format, in this case `H\:m\:s`.
That's a problem,
since Excel thinks 'm' means *minutes*, but PHP thinks it means *months*.
This is not a problem when the colon is not quoted; there are ample tests for that.
I added my best guess as to how to recognize this situation,
changing `\:m` to `:i`. The XML read test
now succeeds, and no other tests were broken by this change.

Test Shared\\DateTest had one test where the expected result of converting to a
Unix timestamp exceeds 2**32. Since a Unix timestamp is strictly an int,
that test fails on a 32-bit system. In the discussion regarding recently merged
PR #1870, it was felt that the user base might still be using the functions
that convert to and from a timestamp. So, we should not drop this test, but,
since it cannot succeed on a 32-bit system, I changed it to be skipped
whenever the expected result exceeded PHP_INT_MAX. There are 3 "toTimestamp"
functions within that test. Only one of these had been affected, but I thought
it was a good idea to add additional tests to the others to demonstrate this
condition.

In the course of testing, I also discovered some 32-bit problems with
bitwise and base-conversion functions. I am preparing separate PRs to
deal with those.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants