Illegal character '\' generation in CycloneDX-XML. #918

Closed
PatrickYanZ opened this issue Mar 24, 2022 · 12 comments · Fixed by #971
@PatrickYanZ

PatrickYanZ commented Mar 24, 2022

What happened:

  1. Use syft to generate bom.xml in CycloneDX format:
<properties>
        <property name="syft:package:foundBy">python-package-cataloger</property>
        <property name="syft:package:language">python</property>
        <property name="syft:package:metadataType">PythonPackageMetadata</property>
        <property name="syft:package:type">python</property>
        <property name="syft:cpe23">cpe:2.3:a:bibek_kafle_\<bkafle662\@gmail_com\>\,_roland_shoemaker_\<rolandshoemaker\@gmail_com\>:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:bibek_kafle_\<bkafle662\@gmail_com\>\,_roland_shoemaker_\<rolandshoemaker\@gmail_com\>:commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python-commonmark:python-commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python-commonmark:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python_commonmark:python-commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python_commonmark:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:rolandshoemaker:python-commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:rolandshoemaker:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:commonmark:python-commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:commonmark:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python-commonmark:commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python_commonmark:commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:rolandshoemaker:commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python:python-commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:commonmark:commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:python:commonmark:0.9.1:*:*:*:*:*:*:*</property>
        <property name="syft:location:0:layerID">sha256:e0bab8caf99538f2edbd0112c15a3108574a3793df0df3e7c54d11d1b939ac0f</property>
        <property name="syft:location:0:path">/usr/lib/python3.8/site-packages/commonmark-0.9.1-py3.8.egg-info/PKG-INFO</property>
        <property name="syft:location:1:layerID">sha256:e0bab8caf99538f2edbd0112c15a3108574a3793df0df3e7c54d11d1b939ac0f</property>
        <property name="syft:location:1:path">/usr/lib/python3.8/site-packages/commonmark-0.9.1-py3.8.egg-info/top_level.txt</property>
      </properties>

  2. Some of the generated properties look like this:

<property name="syft:cpe23">cpe:2.3:a:bibek_kafle_\<bkafle662\@gmail_com\>\,_roland_shoemaker_\<rolandshoemaker\@gmail_com\>:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
<property name="syft:cpe23">cpe:2.3:a:bibek_kafle_\<bkafle662\@gmail_com\>\,_roland_shoemaker_\<rolandshoemaker\@gmail_com\>:commonmark:0.9.1:*:*:*:*:*:*:*</property>
  3. Reading the file afterwards raises an XML error:
    Processing input file bom-image.xml
    Unhandled exception: System.InvalidOperationException: There is an error in XML document (1500, 70).
    ---> System.Xml.XmlException: The '\' character, hexadecimal value 0x5C, cannot be included in a name. Line 1500, position 70.
    at System.Xml.XmlTextReaderImpl.Throw(Exception )
    at System.Xml.XmlTextReaderImpl.Throw(String , String[] )
    at System.Xml.XmlTextReaderImpl.ParseElement()
    at System.Xml.XmlTextReaderImpl.ParseElementContent()
    at System.Xml.XmlReader.ReadString()
    at System.Xml.XmlTextReaderImpl.ReadString()
    at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderBom.Read25_Property(Boolean isNullable, Boolean checkType)
    at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderBom.Read30_Component(Boolean isNullable, Boolean checkType)
    at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderBom.Read50_Bom(Boolean isNullable, Boolean checkType)
    at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderBom.Read51_bom()
    --- End of inner exception stack trace ---
    at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
    at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
    at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
    at CycloneDX.Xml.Serializer.Deserialize(MemoryStream xmlStream)
    at CycloneDX.Xml.Serializer.Deserialize(Stream xmlStream)
    at CycloneDX.Cli.CliUtils.InputBomHelper(String filename, CycloneDXBomFormat format)
    at CycloneDX.Cli.Commands.MergeCommand.InputBoms(IEnumerable`1 inputFilenames, CycloneDXBomFormat inputFormat, Boolean outputToConsole)
    at CycloneDX.Cli.Commands.MergeCommand.Merge(MergeCommandOptions options)
    at System.CommandLine.Invocation.CommandHandler.GetExitCodeAsync(Object value, InvocationContext context)
    at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
    at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass27_0.<b__1>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass25_0.<b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<b__24_0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass11_0.<b__0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<b__10_0>d.MoveNext()
    --- End of stack trace from previous location ---
    at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<b__0>d.MoveNext()

What you expected to happen:
Convert the email address into a recognizable name, and remove illegal characters from the name.

How to reproduce it (as minimally and precisely as possible):
syft packages ${CI_REGISTRY_IMAGE}:${CI_DEFAULT_BRANCH} -o cyclonedx=bom-image.xml

Anything else we need to know?:

Environment:

  • Output of syft version: 0.42.3
  • OS (e.g: cat /etc/os-release or similar): Linux
PatrickYanZ added the bug label Mar 24, 2022
@spiffcs
Contributor

spiffcs commented Mar 30, 2022

Thanks for the bug @PatrickYanZ! Do you happen to have the input you used to generate these malformed properties or a public image you can reproduce this with?

I'll take a look to see if I can find one, but it's always easier when we're using roughly the same input that was used to produce the initial bug.

@malice00

malice00 commented Apr 15, 2022

Sorry to jump in on this, but I am having the same issue.

Try the image node:15, which will generate this:

    <component type="library">
      <author>Rob Dennis, Eli Courtwright (Michael Foord &amp; Nicola Larosa original maintainers) &lt;rdennis+configobj@gmail.com, eli@courtwright.org, fuzzyman@voidspace.co.uk, nico@tekNico.net&gt;</author>
      <name>configobj</name>
      <version>5.0.6</version>
      <cpe>cpe:2.3:a:rob_dennis\,_eli_courtwright_\(michael_foord_\&amp;_nicola_larosa_original_maintainers\):python-configobj:5.0.6:*:*:*:*:*:*:*</cpe>
      <purl>pkg:pypi/configobj@5.0.6</purl>
      <properties>
        <property name="syft:package:foundBy">python-package-cataloger</property>
        <property name="syft:package:language">python</property>
        <property name="syft:package:metadataType">PythonPackageMetadata</property>
        <property name="syft:package:type">python</property>
        <property name="syft:cpe23">cpe:2.3:a:rob_dennis\,_eli_courtwright_\(michael_foord_\&_nicola_larosa_original_maintainers\):python_configobj:5.0.6:*:*:*:*:*:*:*</property>
        <property name="syft:cpe23">cpe:2.3:a:rob_dennis\,_eli_courtwright_\(michael_foord_\&_nicola_larosa_original_maintainers\):configobj:5.0.6:*:*:*:*:*:*:*</property>

These last 2 lines are not correct (& should be &amp;). I assume the same happens when there is a '<' there, as is the problem for @PatrickYanZ.
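
The escaping described above can be sketched with Python's standard library. This is only an illustration of XML text escaping, not how syft or cyclonedx-go implement it; `property_element` is a hypothetical helper name, and the CPE string is copied from the output above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# CPE value taken from the node:15 output above (raw string keeps the backslashes).
cpe = (
    r"cpe:2.3:a:rob_dennis\,_eli_courtwright_"
    r"\(michael_foord_\&_nicola_larosa_original_maintainers\)"
    r":configobj:5.0.6:*:*:*:*:*:*:*"
)


def property_element(name: str, value: str) -> str:
    """Hypothetical helper: serialize one <property> with escaped content."""
    # quoteattr() escapes and quotes the attribute value;
    # escape() replaces &, < and > in the element text.
    return "<property name=%s>%s</property>" % (quoteattr(name), escape(value))


element = property_element("syft:cpe23", cpe)
print(element)

# A conforming parser turns &amp; back into &, so the CPE round-trips unchanged.
assert ET.fromstring(element).text == cpe
```

The backslashes pass through untouched; only `&` (and `<`, `>` when present) are rewritten as entities.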

@derkoe
Contributor

derkoe commented Apr 26, 2022

Just checked: cyclonedx-go encodes correctly.

Filed an issue there: CycloneDX/cyclonedx-go#31

@derkoe
Contributor

derkoe commented Apr 27, 2022

... and created a PR fixing this CycloneDX/cyclonedx-go#32

@spiffcs
Contributor

spiffcs commented Apr 27, 2022

Thanks @derkoe! I'll track that PR and pull in the new version when merged.

@spiffcs spiffcs self-assigned this Apr 27, 2022
derkoe added a commit to derkoe/syft that referenced this issue Apr 28, 2022
Fixes anchore#918 (XML encoding problem)

Signed-off-by: Christian Köberl <christian.koeberl@porscheinformatik.com>
@spiffcs
Contributor

spiffcs commented Apr 28, 2022

This was closed because the PR was merged, but I wanted to reopen it and check the output so we can validate the fix.

@spiffcs spiffcs reopened this Apr 28, 2022
@spiffcs
Contributor

spiffcs commented Apr 28, 2022

It looks like even on main I'm still seeing the illegal character \
[Screenshot: Screen Shot 2022-04-28 at 10 47 56 AM]
[Screenshot: Screen Shot 2022-04-28 at 10 48 44 AM]

@derkoe it looks like property.name and cpe still include this char. Should I file another PR implementing the tag fix you did for these fields as well?

I'll also check and see if there are other places that need this update.

@nscuro

nscuro commented May 1, 2022

@spiffcs Backslashes do not have to be escaped in XML AFAIK.

Here's a minimal example to reproduce this using the CycloneDX CLI (which is what @PatrickYanZ used as well):

<!-- bom.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.4" version="1">
  <metadata>
    <properties>
      <property name="syft:cpe23">cpe:2.3:a:bibek_kafle_\<bkafle662\@gmail_com\>\,_roland_shoemaker_\<rolandshoemaker\@gmail_com\>:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
    </properties>
  </metadata>
</bom>
$ cyclonedx convert --input-file bom.xml --output-format json
Unhandled exception: System.InvalidOperationException: There is an error in XML document (5, 68).
 ---> System.Xml.XmlException: The '\' character, hexadecimal value 0x5C, cannot be included in a name. Line 5, position 68.
...

The error message is a bit misleading, as \ is not really the culprit.
In fact, encoding < as &lt; alone does the trick already:

<!-- bom.xml -->
<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.4" version="1">
  <metadata>
    <properties>
      <property name="syft:cpe23">cpe:2.3:a:bibek_kafle_\&lt;bkafle662\@gmail_com\>\,_roland_shoemaker_\&lt;rolandshoemaker\@gmail_com\>:python_commonmark:0.9.1:*:*:*:*:*:*:*</property>
    </properties>
  </metadata>
</bom>
$ cyclonedx convert --input-file bom.xml --output-format json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "version": 1,
  "metadata": {
    "licenses": [],
    "properties": [
      {
        "name": "syft:cpe23",
        "value": "cpe:2.3:a:bibek_kafle_\\\u003Cbkafle662\\@gmail_com\\\u003E\\,_roland_shoemaker_\\\u003Crolandshoemaker\\@gmail_com\\\u003E:python_commonmark:0.9.1:*:*:*:*:*:*:*"
      }
    ]
  },
  "vulnerabilities": []
}

So to me it looks like this issue has indeed been fixed.
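
The same check can be reproduced without the CycloneDX CLI. The sketch below uses Python's `xml.etree.ElementTree` instead of the .NET parser from the stack trace, so the error text will differ, but the well-formedness rule is the same. The helper names `wrap` and `parses` are made up for this example.

```python
import xml.etree.ElementTree as ET

# CPE value from the report, with the literal backslashes preserved.
RAW = (
    r"cpe:2.3:a:bibek_kafle_\<bkafle662\@gmail_com\>\,"
    r"_roland_shoemaker_\<rolandshoemaker\@gmail_com\>"
    r":python_commonmark:0.9.1:*:*:*:*:*:*:*"
)


def wrap(value: str) -> str:
    """Place a value inside a <property> element without any escaping."""
    return '<property name="syft:cpe23">' + value + "</property>"


def parses(doc: str) -> bool:
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


unescaped = wrap(RAW)                       # "<bkafle662\..." looks like a start tag
escaped = wrap(RAW.replace("<", "&lt;"))    # only "<" is replaced; "\" and ">" stay

print(parses(unescaped))
print(parses(escaped))
```

In the unescaped document the parser treats `<bkafle662` as the start of an element and then stumbles over the `\` that follows, which is why the error message names the backslash even though the unescaped `<` is the real cause.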

@derkoe
Contributor

derkoe commented May 2, 2022

Yes, \ is not a special character in XML files. I'm not sure which part of syft creates the backslashes in the names.

Anyway, the issue of invalid CycloneDX XML files is solved.

@nscuro

nscuro commented May 2, 2022

> I'm not sure which part of syft creates the backslashes in the names.

The backslashes are required because characters like ,@<> have to be escaped in CPEs. You can verify this by using the CPE search of the NVD (https://nvd.nist.gov/products/cpe/search). Meaning Syft is behaving as expected.
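
As a rough illustration of that quoting rule, the sketch below backslash-escapes every character in a CPE 2.3 component that is not a letter, digit, `_`, `-` or `.`. It is a simplification and an assumption about the rule, not Syft's actual code: `quote_cpe_component` is a hypothetical name, and wildcard handling (`*`, `?`) is deliberately left out.

```python
import string

# Characters that may appear unquoted in a CPE 2.3 formatted-string component.
_UNQUOTED = set(string.ascii_letters + string.digits + "_-.")


def quote_cpe_component(value: str) -> str:
    """Hypothetical helper: backslash-quote punctuation in one CPE component."""
    out = []
    for ch in value:
        if ch.isspace():
            # Whitespace cannot be represented in a CPE; callers must replace it first.
            raise ValueError("whitespace is not allowed in a CPE component")
        out.append(ch if ch in _UNQUOTED else "\\" + ch)
    return "".join(out)


print(quote_cpe_component("roland_shoemaker_<rolandshoemaker@gmail_com>"))
```

For the vendor fragment from this issue, the result carries the same `\<`, `\@` and `\>` sequences seen in the generated BOM, which then still need XML escaping when written to a file.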

@spiffcs
Contributor

spiffcs commented May 2, 2022

Thanks so much for the context @nscuro and for helping get in the PR from @derkoe.

Really appreciate all the details on this issue

@spiffcs spiffcs closed this as completed May 2, 2022