After publishToConfluence: Java-Code Formatting lost #281

duschata · 2019-03-07T10:44:45Z

Hello all,
have a look at this URL (and its subpages):

https://asciidoc.atlassian.net/wiki/spaces/DOCSASCODE/pages/589861/Showcase

I've pushed some AsciiDoc files that contain Java code. The code formatting for Java has completely vanished (I've tried this with and without the java keyword in [code]).
Maybe this has something to do with

asciidoc2confluence::rewriteCodeblocks{...}

Any ideas?

Greetings Tom


rdmueller · 2019-03-07T12:10:48Z

Hm, this is not how it should be. docToolchain takes the generated HTML and publishes it to Confluence.
Have you checked what the HTML looks like?

duschata · 2019-03-07T12:29:53Z

I think it's a problem with LF/CRLF. I'm a Linux user, so my default editor setting is LF. I'll check this right away and let you know...

rdmueller · 2019-03-07T12:31:01Z

This could be the problem. My last published code block looks fine, but I used Windows, and Git may have converted my line endings:

https://arc42-template.atlassian.net/wiki/spaces/dttest/pages/579665949/Included+Section

duschata · 2019-03-07T15:59:59Z

This is the code before publishing:

and this after:

I have no clue why. The XML file works fine. Where is the difference? I've tried a thousand things; nothing works for me. Could you please try to publish the file under Windows?

importCode.html.zip

rdmueller · 2019-03-07T20:13:21Z

Yes, I will give it a try tomorrow.
As far as I can see, you use CodeRay as the highlighter. Do you have any special settings?

duschata · 2019-03-08T07:18:44Z

I've tried all highlighter options (coderay, highlightjs, prettify and pygments). Stripping the spans from the code block seems to work correctly. I suspect something unexpected happens here:
https://github.com/docToolchain/docToolchain/blob/master/scripts/asciidoc2confluence.groovy#L232-L246

I'm not a Groovy expert, but we can try to investigate what goes in and what comes out there.
Has something changed in org.jsoup.nodes.Document?
Maybe the Confluence REST endpoint strips the \r\n? I'll check this with Postman later.

I have no further settings; my .adoc file is shown here:
https://asciidoc.atlassian.net/wiki/spaces/DOCSASCODE/pages/524424/1.+This+Asciidoc

rdmueller · 2019-03-08T11:07:23Z

Your example (the one with the inline source) renders fine for me:
https://arc42-template.atlassian.net/wiki/spaces/dttest/pages/580911154/Render+Source

It doesn't matter whether I use WSL (Linux) or Windows.

Can you please check which line endings you use? My editor reports that I use \n.
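To answer the line-ending question without relying on an editor's status bar, a small scan of the file is enough. The class below (LineEndings with its classify method) is a hypothetical helper, not part of docToolchain; it counts CRLF pairs, lone LFs and lone CRs and reports which kind occurs:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of docToolchain): reports which line endings a text uses. */
class LineEndings {

    /** Returns "LF", "CRLF", "CR", "mixed" or "none". */
    static String classify(String text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++;            // skip the '\n' that belongs to this CRLF pair
                } else {
                    cr++;           // lone CR (old Mac style)
                }
            } else if (c == '\n') {
                lf++;               // lone LF (Unix style)
            }
        }
        int kinds = (crlf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        if (kinds == 0) return "none";
        if (kinds > 1) return "mixed";
        if (crlf > 0) return "CRLF";
        return lf > 0 ? "LF" : "CR";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineEndings.java <file> [<file> ...]
        for (String arg : args) {
            String text = new String(Files.readAllBytes(Paths.get(arg)), StandardCharsets.UTF_8);
            System.out.println(arg + ": " + classify(text));
        }
    }
}
```

Running it over the .adoc file and any included source files shows at a glance whether Git or an editor has mixed CRLF into files that are otherwise LF.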

rdmueller · 2019-03-08T20:58:57Z

My HTML source looks the same as yours. The Confluence storage format looks like this:

<div class="content">
<ac:structured-macro ac:name="code" ac:schema-version="1" ac:macro-id="5f75b752-3bf1-441c-8e1e-dba4007a9a13"><ac:parameter ac:name="language">java</ac:parameter><ac:plain-text-body><![CDATA[package checkCodeFormatting;

public class Hello {

    public static void pain (String... args) {

        System.out.println("This is not nice formatted");

    }

}]]></ac:plain-text-body></ac:structured-macro>
</div>

So something is different in your setup...
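For reference, the storage-format markup above can be reproduced from just a language name and the raw source. The sketch below is a hypothetical Java helper (CodeMacro.toStorageFormat is not docToolchain's API; the actual script builds the macro with jsoup inside rewriteCodeblocks). It shows the two properties that matter here: the code goes into a CDATA section, where line breaks are kept verbatim, and any "]]>" in the source has to be split across two CDATA sections. The ac:schema-version and ac:macro-id attributes seen in the stored page are left out; docToolchain's own wrapper in rewriteCodeblocks only sets ac:name="code".

```java
/** Hypothetical helper: renders source code as a Confluence storage-format "code" macro. */
class CodeMacro {

    static String toStorageFormat(String language, String source) {
        // A CDATA section ends at the first "]]>", so split any occurrence across two sections.
        String safeBody = source.replace("]]>", "]]]]><![CDATA[>");
        StringBuilder sb = new StringBuilder();
        sb.append("<ac:structured-macro ac:name=\"code\">");
        if (language != null && !language.isEmpty()) {
            sb.append("<ac:parameter ac:name=\"language\">")
              .append(escapeXml(language))
              .append("</ac:parameter>");
        }
        sb.append("<ac:plain-text-body><![CDATA[")
          .append(safeBody) // line breaks and indentation are kept verbatim inside CDATA
          .append("]]></ac:plain-text-body></ac:structured-macro>");
        return sb.toString();
    }

    /** Minimal escaping for an XML text/attribute value. */
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String source = "public class Hello {\n\n"
                + "    public static void main(String... args) {\n"
                + "        System.out.println(\"Hello\");\n"
                + "    }\n}";
        System.out.println(toStorageFormat("java", source));
    }
}
```

Because the CDATA body is copied as-is, the line breaks only survive if they are still present in the text when the HTML is serialized; the macro itself cannot restore them.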

rdmueller · 2019-03-08T21:02:20Z

As far as I can see, we both use the same Confluence version, too...

rdmueller · 2019-03-08T21:14:01Z

Could you please replace your rewriteCodeblocks closure with the following code and send me the output?

def rewriteCodeblocks = { body ->
    body.select('pre > code').each { code ->
        println "before: "
        println code
        if (code.attr('data-lang')) {
            code.select('span[class]').each { span ->
                span.unwrap()
            }
            code.before("<ac:parameter ac:name=\"language\">${code.attr('data-lang')}</ac:parameter>")
        }
        println "after: "
        println code.parent()
        code.parent() // pre now
            .wrap('<ac:structured-macro ac:name="code"></ac:structured-macro>')
            .unwrap()
        code.wrap("<ac:plain-text-body>${CDATA_PLACEHOLDER_START}${CDATA_PLACEHOLDER_END}</ac:plain-text-body>")
            .unwrap()
    }
}

This should at least shed some light on where the problem lies.

duschata · 2019-03-11T13:41:17Z

Thank you for your efforts. Today I found a way to reproduce the error by chance:

git clone https://github.com/docToolchain/docToolchain.git
cd docToolchain/
./gradlew
cd ..
mkdir reproduceCodeFormattingFailure
cd reproduceCodeFormattingFailure/
../docToolchain/gradlew init --type java-application
mkdir src/docs/images -p
cp ../docToolchain/Config.groovy .

src/docs/test.adoc

= Test

Preamble

== Header (sect1)

[source, java]
----
include::./../../src/main/java/App.java[]
----

=== Header (sect2)

[source, java]
----
include::./../../src/main/java/App.java[]
----

Config.groovy (to modify):

inputFiles = [
        [file: 'test.adoc',       formats: ['html','pdf']],
]

confluence.with {
    input = [
        [ file: "build/docs/html5/test.html" ],
    ]

    api = "yourServer"
    createSubpages = false // this is the crucial point
    spaceKey = "yourSpaceKey"

    credentials = "username:password".bytes.encodeBase64().toString()
}
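The credentials line in this config is plain HTTP Basic authentication material: it base64-encodes "username:password". A Java equivalent is sketched below (BasicAuth is a hypothetical name), assuming the value ends up in a standard "Authorization: Basic ..." request header. This is handy for checking the same credentials against the Confluence REST API from another tool, such as the Postman check mentioned earlier in the thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: builds the same value as the Groovy credentials line. */
class BasicAuth {

    static String credentials(String user, String password) {
        // Groovy's "...".bytes uses the platform default charset; UTF-8 is used explicitly here.
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(raw);
    }

    /** Assumption: the value is sent as a standard HTTP Basic Authorization header. */
    static String authorizationHeader(String user, String password) {
        return "Basic " + credentials(user, password);
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("username", "password"));
    }
}
```

For the placeholder values in the config, credentials("username", "password") yields dXNlcm5hbWU6cGFzc3dvcmQ=.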

Publish it and everything works. Set

createSubpages = true

and you will see the unformatted code in the sect2 sections.

Set it back to false and everything is OK again...

I hope this helps to find the error. This is under Linux; I'll check it under Windows ASAP. If the behavior is different, I'll debug rewriteCodeblocks as you proposed above...

Greetings Tom

duschata · 2019-03-11T13:41:43Z

wrong button clicked...

duschata · 2019-03-11T14:10:15Z

Absolutely the same behavior as described above with Windows 10...

rdmueller · 2019-03-11T15:58:25Z

Hm, I currently have no clue what could be different. Which shell do you use?

duschata · 2019-03-11T19:30:20Z

I'm using bash (Linux) or Git Bash (Windows), but... anyway. Check out the code at

https://github.com/duschata/reproduceCodeFormattingFailure.git

and modify your server and space settings. After that, publish and play with the createSubpages = false/true option. If it is set to true, the sect2 code formatting will fail...

Greetings

Tom

rdmueller · 2019-03-11T21:23:13Z

Thanks! Now I can reproduce it:
https://arc42-template.atlassian.net/wiki/spaces/dttest/pages/581271562/Header+sect1

So it is because of the subpages...

rdmueller · 2019-03-11T21:52:34Z

This is the line:

code.parent() // pre now
    .wrap('<ac:structured-macro ac:name="code"></ac:structured-macro>')
    .unwrap()

https://github.com/docToolchain/docToolchain/blob/master/scripts/asciidoc2confluence.groovy#L248

Now let's see why it behaves differently when executed the second time...

rdmueller · 2019-03-11T21:57:49Z

Hm, this line at least removes the rest of the line breaks, but some have already been removed before...

duschata · 2019-03-14T20:17:06Z

dom.select('div.sect1').each { sect1 ->
    Elements pageBody = sect1.select('div.sectionbody')
    def currentPage = [
        title: sect1.select('h2').text(),
        body: pageBody,
        children: [],
        parent: parentId
    ]
    pageAnchors.putAll(recordPageAnchor(sect1.select('h2')))

    if (confluenceCreateSubpages) {
        pageBody.select('div.sect2').each { sect2 ->
            def title = sect2.select('h3').text()
            pageAnchors.putAll(recordPageAnchor(sect2.select('h3')))
            sect2.select('h3').remove()
            def body = sect2
            def subPage = [
                title: title,
                body: body
            ]
            currentPage.children << subPage
            promoteHeaders sect2, 4, 3
            anchors.putAll(parseAnchors(subPage))
        }
        pageBody.select('div.sect2').remove()
    } else {
        pageBody.select('div.sect2').unwrap()
        promoteHeaders sect1, 3, 2
    }
    sections << currentPage
    anchors.putAll(parseAnchors(currentPage))

pageBody has type Elements, while body (in the confluenceCreateSubpages branch) is a single Element. So rewriteCodeblocks gets different inputs while processing page.body. If you have an idea for a quick fix, let me know. I suggest changing the sect2 selector or modifying rewriteCodeblocks (I don't know which is better; what do you think?)

rdmueller · 2019-03-15T14:55:40Z

Good question. jsoup has always been some kind of magic to me; I'm always happy when it just works.

So, how would you modify it?

duschata · 2019-03-19T10:48:37Z

I've spent hours and hours on this bug...
My assumption above is not the cause: I've debugged it, and nothing changes when passing Elements (instead of a single Element) as the body: ...
The crucial point is the

pageBody.select('div.sect2').remove()

line. In a plain Java jsoup test you get a NullPointerException if you try to work on referenced Elements that you have already removed from the DOM. Groovy seems to behave differently there, but not 100% predictably.

You can write a few lines to debug this:

if (confluenceCreateSubpages) {
    Elements sect2Elements = pageBody.select('div.sect2')
    sect2Elements.each { sect2 ->
        def title = sect2.select('h3').text()
        pageAnchors.putAll(recordPageAnchor(sect2.select('h3')))
        sect2.select('h3').remove()
        def body = sect2Elements
        def subPage = [
            title: title,
            body : body
        ]
        currentPage.children << subPage
        promoteHeaders sect2, 4, 3
        anchors.putAll(parseAnchors(subPage))

        println ("##############before remove####################")
        println (body)
        println ("##############before remove####################")
    }
    pageBody.select('div.sect2').remove()

    println ("##############after remove####################")
    println currentPage.children.body
    println ("##############after remove####################")

} else {
    pageBody.select('div.sect2').unwrap()
    promoteHeaders sect1, 3, 2
}
sections << currentPage
anchors.putAll(parseAnchors(currentPage))

After the remove(), the LFs disappear. Comment out the remove() line and both outputs are identical.
Do you have an idea? I have many other tasks this week; I'll try to continue debugging soon...

duschata · 2019-03-19T15:48:11Z

OK, you can check out this project
https://github.com/duschata/debugJsoup.git
Run the test and look at the console output.
Next, change the jsoup version to 1.11.3 and compare it with the first output.
I will drink some beer for now...

duschata · 2019-03-20T12:12:33Z

I forgot to set this in the Java example above:

 doc.outputSettings(new Document.OutputSettings().prettyPrint(false));

After this, everything works as expected (in Java). But why does this fail in the docToolchain Groovy script?

rdmueller · 2019-03-23T16:52:07Z

Thanks for spending so much time trying to fix this issue!
It's hard to believe that there is a difference between Groovy and Java.
I will try to use your code to reproduce the problem in the Groovy Console and hopefully fix it.

duschata · 2019-03-25T07:29:34Z

OK, I had the same idea on Friday and started a fresh Groovy project. I would like to reduce the "bug" to a minimum of lines. What we have so far:

The code produces different output with different jsoup versions.
Playing with the prettyPrint flag does not give the output I expected.
After commenting out the "remove line" (see above), the original code works.

Have a look at my Java code; it seems to work. Where are the differences? Are there side effects in the Gradle script that I've overlooked?

duschata · 2019-03-25T16:02:32Z

https://github.com/duschata/minimalPublishError.git
is not minimal yet, but Simple.groovy reproduces the failure.

duschata · 2019-03-28T13:33:36Z

Finally: it's a bug (or at least strange behavior) in jsoup!
clone() and remove() lose the prettyPrint(false) setting. This has been an open issue since 2016,
jhy/jsoup#763,
but nobody seems to care about it. At exactly the moment the <pre></pre> tags are removed, the framework also cleans up the preformatted content between them. Creating a new jsoup Document with prettyPrint(false) for the sect2 sections solves the problem; I'll post a PR soon.
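To make the root cause easier to follow, here is a toy model in plain Java. It is not jsoup, and none of its names are jsoup API: each node looks up its pretty-print setting from the document at the root of its tree, and a subtree that has been removed has no document, so it falls back to the default (pretty printing on), which collapses whitespace in text. The model has no <pre> exemption, on the assumption (based on rewriteCodeblocks above) that the <pre> wrapper has already been unwrapped by the time the page body is serialized. The last step mirrors the fix: attaching the detached section to a fresh document with pretty printing off.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not jsoup): output settings are looked up from the document at the tree root. */
class DetachedPrettyPrint {

    /** Stand-in for a document's output settings. */
    static final class Settings {
        final boolean prettyPrint;
        Settings(boolean prettyPrint) { this.prettyPrint = prettyPrint; }
    }

    /** Used when a node has no owning document: pretty printing on. */
    static final Settings DEFAULT = new Settings(true);

    static final class Node {
        Settings docSettings;                   // only set on a document root
        Node parent;
        final String text;                      // null for container nodes
        final List<Node> children = new ArrayList<>();

        Node(String text) { this.text = text; }

        static Node document(boolean prettyPrint) {
            Node root = new Node(null);
            root.docSettings = new Settings(prettyPrint);
            return root;
        }

        Node append(Node child) {
            child.remove();                     // detach from any previous parent first
            child.parent = this;
            children.add(child);
            return this;
        }

        void remove() {
            if (parent == null) return;
            parent.children.remove(this);
            parent = null;
        }

        /** Walk up to the root; a detached subtree has no document settings. */
        Settings settings() {
            Node n = this;
            while (n.parent != null) n = n.parent;
            return n.docSettings != null ? n.docSettings : DEFAULT;
        }

        String render() {
            StringBuilder sb = new StringBuilder();
            renderInto(sb, settings().prettyPrint);
            return sb.toString();
        }

        private void renderInto(StringBuilder sb, boolean pretty) {
            if (text != null) {
                // Simplified pretty printing: every whitespace run in text becomes one space.
                sb.append(pretty ? text.replaceAll("\\s+", " ") : text);
            }
            for (Node child : children) child.renderInto(sb, pretty);
        }
    }

    public static void main(String[] args) {
        Node doc = Node.document(false);        // like the dom with prettyPrint(false)
        Node sect2 = new Node(null);
        sect2.append(new Node("class App {\n    void run() {}\n}"));
        doc.append(sect2);
        System.out.println(sect2.render());     // attached: line breaks kept

        sect2.remove();                         // like pageBody.select('div.sect2').remove()
        System.out.println(sect2.render());     // detached: default settings, line breaks gone

        Node fresh = Node.document(false);      // the fix: a new document with prettyPrint(false)
        fresh.append(sect2);
        System.out.println(sect2.render());     // line breaks kept again
    }
}
```

In this model the sect1 pages are unaffected because their bodies stay attached to the original document, which matches the observation that only the sect2 subpages lose their formatting when createSubpages = true.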

rdmueller · 2019-03-28T15:37:29Z

Very cool that you managed to find the root cause! Kudos to you!

rdmueller added the in analysis label Mar 8, 2019

duschata closed this as completed Mar 11, 2019

duschata reopened this Mar 11, 2019

rdmueller self-assigned this Mar 23, 2019

rdmueller added 🐞 bug and removed in analysis labels Mar 23, 2019

duschata mentioned this issue Mar 27, 2019

provide asciidoc2Confluence as plugin #290

Closed

duschata closed this as completed Mar 28, 2019

duschata mentioned this issue Mar 28, 2019

creating an not prettyPrinted Document for sect2 pages fixes the bug #291

Merged
