UTF-8 BOM is not removed when running on Node.js and input is a file

Although there's a test that covers removing the UTF-8 BOM from the AsciiDoc source, the removal only works on Node.js when the input is a string created in JavaScript. When the AsciiDoc source is read from a file, or the string is created from a Buffer (e.g., via `Buffer.from`), the UTF-8 BOM is not removed.

This snippet shows why we're hitting this problem:

```js
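const fs = require('fs')

// Write a file that starts with the three-byte UTF-8 BOM (EF BB BF)
fs.writeFileSync('test.adoc', Buffer.concat([Buffer.from([0xEF, 0xBB, 0xBF]), Buffer.from('= Document Title')]))

// Decoding the Buffer as UTF-8 yields a single U+FEFF character for the BOM
const contentsFromFile = fs.readFileSync('test.adoc')
console.log(contentsFromFile.toString().charCodeAt())
// => 65279
console.log(contentsFromFile.toString().charCodeAt(1))
// => 61 (the character code of '=')

// A string literal with \x escapes holds three separate characters instead
const contentsFromString = '\xef\xbb\xbf= Document Title'
console.log(contentsFromString.charCodeAt())
// => 239
console.log(contentsFromString.charCodeAt(3))
// => 61 (the character code of '=')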
```

It appears that when Node.js creates a string from a Buffer, such as when reading the contents of a file, it decodes the three-byte UTF-8 BOM into a single BOM character (code: 65279, char ref: 0xFEFF). I have not found any way to disable this behavior. It's basically a quirk of Node.js.

I think Asciidoctor.js should detect this alternate BOM and remove the character. (I'm open to changing Asciidoctor Ruby if we determine it's necessary.)
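
To make the proposal concrete, here is a minimal sketch of what detecting both forms of the BOM could look like. The `stripBom` function and the two constants are hypothetical names used for illustration only; they are not part of the Asciidoctor.js API, and the actual fix may belong elsewhere in the code base.

```js
// Hypothetical helper for illustration; not part of the Asciidoctor.js API.

// The BOM as a single character, which is what Buffer#toString() produces
const BOM_CHAR = '\uFEFF'
// The BOM as three separate characters, one per byte (the form in the snippet above)
const BOM_BYTES_AS_CHARS = '\xEF\xBB\xBF'

function stripBom (input) {
  if (input.startsWith(BOM_CHAR)) return input.slice(BOM_CHAR.length)
  if (input.startsWith(BOM_BYTES_AS_CHARS)) return input.slice(BOM_BYTES_AS_CHARS.length)
  return input
}

// Same bytes as the file written in the snippet above, decoded by Node.js
const fromBuffer = Buffer.concat([Buffer.from([0xEF, 0xBB, 0xBF]), Buffer.from('= Document Title')]).toString()
console.log(stripBom(fromBuffer) === '= Document Title')
// => true
```

Checking for both forms keeps the existing string-based case working while also covering input read from a file.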

UTF-8 BOM is not removed when running on Node.js and input is a file #1344

mojavelinux commented on Jul 2, 2021
