Some packed links differ from IANA source files #835

Open
gilmoreorless opened this issue May 1, 2020 · 0 comments

gilmoreorless commented May 1, 2020

While investigating a bug in a separate plugin, I found some oddities in the default packed data files provided by moment-timezone.

Some zones in moment-timezone are listed as links in the IANA source files, and vice versa. This is due to the way the compilation process creates the packed files.

  1. The IANA source files are downloaded and compiled using zic into binary TZif files — one per zone or link. At this point, the information about which identifiers are Zones and which are Links is lost.
  2. The compiled binary files are exported to text using zdump, then compiled into a raw JSON file in the data/unpacked directory.
  3. The unpacked data is compressed into the packed file, combining zones with identical data into a single source zone and multiple links. Because there's no information brought over from the IANA source files, the choice of which zone is the source and which are links can differ from IANA.

For most use cases this difference doesn't matter, because moment-timezone handles links transparently, exactly as it handles source zones. Where it gets odd is the relatively new countries data: some countries list a link as their primary zone, while others point straight at the source zone.

I ran a quick script to identify these outliers:

// Countries containing links as primary zones:
[
  { name: 'MM', zones: [ 'Asia/Yangon' ] },
  { name: 'SG', zones: [ 'Asia/Singapore' ] },
  { name: 'TV', zones: [ 'Pacific/Funafuti' ] },
  { name: 'UM', zones: [ 'Pacific/Wake' ] },
  { name: 'US', zones: [ 'America/Indiana/Indianapolis' ] },
  { name: 'WF', zones: [ 'Pacific/Wallis' ] }
]

Edit: There are actually more than that listed in latest.json vs 2019c.json — see #836

For reference, this is the script...
const tzdata = require('./data/packed/2019c.json');

const countries = tzdata.countries.map(country => {
  const [name, zonesStr] = country.split('|');
  const zones = zonesStr.split(' ');
  return { name, zones };
});

const linkMap = new Map();
tzdata.links.forEach(link => {
  const [target, name] = link.split('|');
  linkMap.set(name, target);
});

const withLinks = countries
  .map(country => ({
    name: country.name,
    zones: country.zones.filter(zone => linkMap.has(zone))
  }))
  .filter(country => country.zones.length);

console.log('// Countries containing links as primary zones:');
console.log(withLinks);

Investigating further, I ran a script to identify all the links and zones in moment-timezone data that differ from the IANA source files:

The script I ran...
const tzdata = require('./data/packed/2019c.json');
const fs = require('fs');
const files = 'africa antarctica asia australasia etcetera europe northamerica southamerica pacificnew backward'.split(' ');
const sourceLinkZoneLine = /^(Link|Zone)\s/;

const ianaLinkMap = new Map();
const ianaZoneSet = new Set();

files.forEach(file => {
  const contents = fs.readFileSync(`./temp/download/2019c/${file}`, 'utf-8');
  contents
    .split('\n')
    .filter(line => sourceLinkZoneLine.test(line))
    .forEach(line => {
      if (line.startsWith('Link')) {
        const [, source, name] = line.split(/\s+/);
        ianaLinkMap.set(name, source);
      } else {
        const [, name] = line.split(/\s+/);
        ianaZoneSet.add(name);
      }
    });
});

const ianaLinksAsMomentZones = [];
const ianaZonesAsMomentLinks = [];
const wrongLinkTargets = [];
const zoneSet = new Set();

tzdata.zones.forEach(packedZone => {
  const [name] = packedZone.split('|');
  zoneSet.add(name);
  if (ianaLinkMap.has(name)) {
    ianaLinksAsMomentZones.push(name);
  }
});

tzdata.links.forEach(packedLink => {
  const [target, name] = packedLink.split('|');
  if (!ianaLinkMap.has(name) || ianaZoneSet.has(name)) {
    ianaZonesAsMomentLinks.push(name);
  } else if (ianaLinkMap.get(name) !== target) {
    wrongLinkTargets.push({
      linkName: name,
      momentTarget: target,
      ianaTarget: ianaLinkMap.get(name),
    });
  }
});

console.log({
  ianaLinksAsMomentZones,
  ianaZonesAsMomentLinks,
  wrongLinkTargets,
});

The results...
{
  ianaLinksAsMomentZones: [ 'America/Fort_Wayne', 'Asia/Rangoon', 'Etc/GMT-0' ],
  ianaZonesAsMomentLinks: [
    'America/Indiana/Indianapolis',
    'Asia/Singapore',
    'Asia/Yangon',
    'Etc/GMT+2',
    'Etc/GMT',
    'Etc/GMT-7',
    'Etc/GMT-9',
    'Etc/GMT-10',
    'Etc/GMT-12',
    'Pacific/Funafuti',
    'Pacific/Wake',
    'Pacific/Wallis'
  ],
  wrongLinkTargets: [
    {
      linkName: 'America/Indianapolis',
      momentTarget: 'America/Fort_Wayne',
      ianaTarget: 'America/Indiana/Indianapolis'
    },
    {
      linkName: 'US/East-Indiana',
      momentTarget: 'America/Fort_Wayne',
      ianaTarget: 'America/Indiana/Indianapolis'
    },
    {
      linkName: 'Singapore',
      momentTarget: 'Asia/Kuala_Lumpur',
      ianaTarget: 'Asia/Singapore'
    },
    {
      linkName: 'Etc/GMT+0',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'Etc/GMT0',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'Etc/Greenwich',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'GMT',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'GMT+0',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'GMT-0',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'GMT0',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    },
    {
      linkName: 'Greenwich',
      momentTarget: 'Etc/GMT-0',
      ianaTarget: 'Etc/GMT'
    }
  ]
}

It looks like this happens because the compression of multiple zones into links within filterLinkPack works on a first-in basis. The first zone processed in a list of identical zones is made the source, with the rest being links to it. The zdump data files are processed in alphabetical order, which explains why Asia/Rangoon is a link to Asia/Yangon in the IANA files, but it's the other way around in moment-timezone (Asia/Rangoon gets processed first in alphabetical order).
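
To illustrate the first-in behaviour, here is a minimal sketch. It is not the actual filterLinkPack code: `groupFirstIn` and the `sample-data-*` strings are hypothetical stand-ins, and the only assumption is that zones with identical data are deduplicated in the order their names are processed.

```javascript
// Hypothetical illustration, not moment-timezone's real implementation.
// Each entry pairs a zone name with an opaque string standing in for its data.
function groupFirstIn(entries) {
  const sourceByData = new Map(); // data string -> first zone name seen
  const zones = [];
  const links = [];
  for (const { name, data } of entries) {
    if (sourceByData.has(data)) {
      // Same "target|name" order as the packed links read by the scripts above.
      links.push(`${sourceByData.get(data)}|${name}`);
    } else {
      sourceByData.set(data, name);
      zones.push(name);
    }
  }
  return { zones, links };
}

// Sample entries (made-up data strings), sorted alphabetically by name
// to mimic the order in which the zdump files are processed.
const entries = [
  { name: 'Asia/Yangon', data: 'sample-data-A' },
  { name: 'Asia/Tokyo', data: 'sample-data-B' },
  { name: 'Asia/Rangoon', data: 'sample-data-A' },
].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));

const firstInResult = groupFirstIn(entries);
console.log(firstInResult);
```

With this input, Asia/Rangoon sorts before Asia/Yangon, so it becomes the source zone and Asia/Yangon ends up as the link `Asia/Rangoon|Asia/Yangon`, the reverse of the IANA definition.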

The tests I ran were on moment-timezone 0.5.28 and IANA data files for 2019c, but as far as I can tell this dates back to the first set of data files in moment-timezone (2014a). I see the group-leaders.json file was added to help keep links consistent, but I think it just encoded the link directions from the already-incorrect data. (See America/Fort_Wayne, for example.)

Realistically this is a fairly minor problem due to the transparent handling of links. But sometimes the data files are processed by other scripts to be passed in to filterLinkPack for custom builds, and I think having consistency with the source data is important.
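
One possible direction, sketched under assumptions rather than proposed as the fix: after grouping, re-point each group so that its source is a name IANA defines as a Zone. `reorientLinks` is a hypothetical helper that does not exist in moment-timezone. It works on bare zone names (not full packed zone strings), assumes `target|name` link strings, and assumes the set of IANA Zone names has been collected separately, for example from the source files as in the script above.

```javascript
// Hypothetical helper, not part of moment-timezone.
// zones: array of source zone names; links: array of "target|name" strings;
// ianaZoneNames: Set of names that IANA defines with a Zone line.
function reorientLinks(zones, links, ianaZoneNames) {
  // Build each group: current source name -> all member names (source first).
  const groups = new Map(zones.map(name => [name, [name]]));
  for (const link of links) {
    const [target, name] = link.split('|');
    if (!groups.has(target)) groups.set(target, [target]);
    groups.get(target).push(name);
  }

  const newZones = [];
  const newLinks = [];
  for (const [source, members] of groups) {
    // Prefer the first member IANA lists as a Zone; otherwise keep the source.
    const preferred = members.find(name => ianaZoneNames.has(name)) || source;
    newZones.push(preferred);
    for (const name of members) {
      if (name !== preferred) newLinks.push(`${preferred}|${name}`);
    }
  }
  return { zones: newZones, links: newLinks };
}

const reoriented = reorientLinks(
  ['Asia/Rangoon'],
  ['Asia/Rangoon|Asia/Yangon'],
  new Set(['Asia/Yangon'])
);
console.log(reoriented);
```

Here the group led by Asia/Rangoon is flipped so that Asia/Yangon is the zone and Asia/Rangoon the link. The sketch does not handle a group that contains more than one IANA Zone (the results above suggest Asia/Singapore and Asia/Kuala_Lumpur are such a pair); it simply picks the first one it finds, so a real fix would need a rule for that case.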


Edit September 2022: After some changes in recent IANA releases, there are now countries defined in the IANA sources with a primary zone that's a link to a different zone. The assumption of at least one Zone definition per country no longer holds true, so I don't think having links in moment-timezone's country data is a problem any more. The mismatch between Zone and Link status still exists, though.
