Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rDNS is broken in latest Windows release (only tested with binaries) #66

Closed
nazihhawi opened this issue Sep 30, 2023 · 4 comments
Closed

Comments

@nazihhawi
Copy link

nazihhawi commented Sep 30, 2023

rDNS tagging seems to be broken in latest Windows release. Starting with a blank input file or wiping any existing metadata in it before writing rDNS does not make a difference, and neither does writing in place or to temp output.

For example:

AtomicParsley.exe Test.m4a --preserveTime --metaEnema

AtomicParsley.exe Test.m4a --preserveTime --rDNSatom " 00001BAB 00001CB3 000093C6 00009D77 0001145F 0001E1B9 00007C14 00007A19 000188D6 0001BB5B" name=iTunNORM domain=com.apple.iTunes

writes metadata to iTun "atom" on domain com.apple.iTunes

Testing with AtomicParsley.exe Test.m4a -t yields:

Atom "----" [com.apple.iTunes;iTun] contains: 00001BAB 00001CB3 000093C6 00009D77 0001145F 0001E1B9 00007C14 00007A19 000188D6 0001BB5B

Some quick tests show that anything beyond the first 4 chars of the atom name are being truncated. Even worse, repeating the process ends up with multiple iTun "atoms".

EDIT 1: Running Windows 64-bit, tested with both AP 32 and 64. Same issue.

EDIT 2: I can confirm that this regression seems to have been introduced with the 2022 release. The 20210715.151551.e7ad03a release works as expected.

@github-actions
Copy link

Thanks for filing an issue! Please note that this project is only passively maintained, so your best bet for getting an issue resolved is through a pull request that is easy to verify! Please read this for more information.

@wilhuff
Copy link
Contributor

wilhuff commented Feb 6, 2024

This issue is not isolated to Windows; it shows up in macOS builds as well.

The root cause seems to be the changes in d86a31a. Reverting just this change fixes the problem for me in local builds.

The issue seems to be that when writing, name_len at parsley.cpp line 3506 is now 8 for values of rDNS_atom_name like "iTunMOVI" where it used to be a fixed 12:

atomicparsley/src/parsley.cpp

Lines 3502 to 3510 in 171e8ae

uint32_t name_len = strlen(rDNS_atom_name);
short rDNS_name_atom = APar_InterjectNewAtom("name",
CHILD_ATOM,
VERSIONED_ATOM,
name_len,
AtomFlags_Data_Binary,
0,
ilst_atom->AtomicLevel + 2,
rDNS_mean_atom);

... but when reading (in the call to APar_readX):

atomicparsley/src/parsley.cpp

Lines 1818 to 1829 in 171e8ae

if (memcmp(atom, "name", 4) == 0 &&
memcmp(parsedAtoms[atom_number - 2].AtomicName, "mean", 4) == 0 &&
memcmp(parsedAtoms[atom_number - 3].AtomicName, "----", 4) == 0) {
parsedAtoms[atom_number - 1].ReverseDNSname =
(char *)calloc(1, sizeof(char) * dataSize);
// jump + 12 because 'name' atom is the 2nd child
APar_readX(parsedAtoms[atom_number - 1].ReverseDNSname,
file,
jump + 12,
dataSize - 12);

... the code expects a fixed 12-byte amount.

I know essentially nothing about the low-level format of atoms within an mp4 file, but the actual contents of the variable-length rdns name are written as part of the data of the name atom--not as a part of the header. The call to APar_atom_Binary_Put at

atomicparsley/src/parsley.cpp

Lines 3515 to 3516 in 171e8ae

APar_atom_Binary_Put(
&parsedAtoms[rDNS_name_atom], rDNS_atom_name, name_len, 0);

... is the thing that actually encodes the data portion of the atom. It already adjusts AtomicLength to account for the variable-length binary data that's being encoded in the atom.

Put simply, I believe the 12 in the code prior to d86a31a reflected a fixed size offset relating to the size of the header of the atom prior to the variable-length component of the name which is actually in the data part of the atom. Using name_len for the atom_length parameter of APar_InterjectNewAtom effectively double-counts the length of strings like iTunMOVI, effectively truncating names shorter than twelve characters and wasting space for anything longer.

wilhuff added a commit to wilhuff/atomicparsley that referenced this issue Feb 6, 2024
This reverts commit d86a31a.

This change does not seem to be correct. See discussion:

wez#66 (comment)

The `12` was related to a fixed-size header and not related to the
length of the variable-length rdns name. That length is encoded in the
data part of the atom, in the call to `APar_atom_Binary_Put`.
@wilhuff
Copy link
Contributor

wilhuff commented Feb 6, 2024

Hm. I'm reading through #62 (and #44) and I don't think the proposed fix in #71 is quite right. Based on the way the reading code works, it seems to expect a 12-byte header in the atom and the actual rdns name should come after. I still can't account for why the original code triggered an ASAN error.

@wilhuff
Copy link
Contributor

wilhuff commented Feb 7, 2024

OK, the source of the 12 here is that the name atom is a "full atom". These include the usual 4-byte size, the 4-byte type code ("name"), and 4-bytes of version and flags fields. Therefore UTF-8 encoding of the name must start 12 bytes from the beginning. The reading code that expects to skip the first 12 bytes of the atom to get to the name is definitely correct.

The writing side of this is a mess. Reverting the changes in d86a31a isn't enough. The trouble is that APar_InterjectNewAtom is confusing the meaning of the atom_length argument. On the one hand, it seems to mean that it's the initial length of the atom, before putting any data into it:

new_atom->AtomicLength = atom_length;

When binary data is appended to the atom, APar_atom_Binary_Put bumps AtomicLength.

This is why the code prior to d86a31a worked for "iTunMOVI": AtomicLength would initially be 12, reflecting the length of the full box header, and then AtomicLength would be 20 after appending the binary data. d86a31a breaks this by making AtomicLength initially 8 and ends up at 16. Even though "iTunMOVI" is written into the buffer, only AtomicLength-worth is written out, resulting in the truncation we're seeing at HEAD.

On the other hand, atom_length is also used to compute the size of the AtomicData buffer:

atomicparsley/src/parsley.cpp

Lines 2389 to 2394 in 171e8ae

new_atom->AtomicData = (char *)calloc(
1,
sizeof(char) * (atom_length > 16
? atom_length
: 16)); // puts a hard limit on the length of
// strings (the spec doesn't)

This is why d86a31a makes longer names succeed. Writes via APar_atom_Binary_Put don't care about value of AtomicLength--they're always written relative to AtomicData plus atomic_data_offset, which is always 0 for names.

However, this isn't correct either: if you write a 20-byte name, the resulting AtomicLength will be 40, when it really should be 12 + 20 = 32 (this is the double-counting I thought I was seeing earlier). This wastes space in the written atom. It ends up looking like it works though: when writing, the AtomicData buffer was allocated by calloc, which zeroes its contents. When reading, any excess bytes past the end of user data look like a null terminator. These are disregarded because it is treated as a string for display purposes.

Interestingly, APar_atom_Binary_Put does a length check:

atomicparsley/src/parsley.cpp

Lines 2678 to 2687 in 171e8ae

if (atomic_data_offset + bytecount + target_atom->AtomicLength <=
MAXDATA_PAYLOAD) {
memcpy(
target_atom->AtomicData + atomic_data_offset, binary_data, bytecount);
target_atom->AtomicLength += bytecount;
} else {
fprintf(stdout,
"AtomicParsley warning: some data was longer than the "
"allotted space and was skipped\n");
}

... but this isn't related to the size of the buffer at all: MAXDATA_PAYLOAD isn't used for the allocation of the AtomicData buffer here. Even worse, for data that might exceed that 1256-byte limit (e.g. very, very long names) this will result in needless truncation, even though the buffer could have been long enough to support a longer value.

The cheapest fix is therefore to just allocate the AtomicData buffer as containing MAXDATA_PAYLOAD bytes, similar to the way APar_MetaData_atom_Init does it:

atomicparsley/src/parsley.cpp

Lines 3274 to 3279 in 171e8ae

parsedAtoms[desiredAtom->AtomicNumber].AtomicData = (char *)malloc(
sizeof(char) * MAXDATA_PAYLOAD +
1); // puts a hard limit on the length of strings (the spec doesn't)
memset(parsedAtoms[desiredAtom->AtomicNumber].AtomicData,
0,
sizeof(char) * MAXDATA_PAYLOAD + 1);

Then the existing length check will just do its job. We'll waste ~3 KB of RAM for each reverse DNS name (in the "----", "mean", and "name" atoms) but we won't have to update all 11 sites that allocate or reassign the AtomicData pointer to account for anything more complicated.

wilhuff added a commit to wilhuff/atomicparsley that referenced this issue Feb 7, 2024
wez pushed a commit that referenced this issue Jun 8, 2024
This reverts commit d86a31a.

This change does not seem to be correct. See discussion:

#66 (comment)

The `12` was related to a fixed-size header and not related to the
length of the variable-length rdns name. That length is encoded in the
data part of the atom, in the call to `APar_atom_Binary_Put`.
@wez wez closed this as completed in 9224de3 Jun 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants