
After publishToConfluence: Java-Code Formatting lost #281

Closed
duschata opened this issue Mar 7, 2019 · 28 comments

@duschata
Contributor

duschata commented Mar 7, 2019

Hello All,
have a look at this URL (and its subpages):

https://asciidoc.atlassian.net/wiki/spaces/DOCSASCODE/pages/589861/Showcase

I've pushed some AsciiDoc documents that contain Java code. The Java code formatting has completely vanished (I've tried this with and without the java keyword in [code]).
Maybe this has something to do with

asciidoc2confluence::rewriteCodeblocks{...}

Any ideas?

Greetings Tom

@rdmueller
Member

Hm, this is not how it should be. docToolchain takes the generated HTML and publishes it to Confluence.
Have you checked what the HTML looks like?

@duschata
Contributor Author

duschata commented Mar 7, 2019

I think it's a problem with LF/CRLF line endings. I'm a Linux user, so my editor defaults to LF. I'll check this right away and let you know...
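One way to test the LF/CRLF theory without relying on an editor's status bar is to count the line terminators directly. This is a hypothetical Java sketch (the class LineEndings is made up for illustration and is not part of docToolchain) that reports which line-ending style a file uses:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative helper (not part of docToolchain): reports a text's line-ending style. */
class LineEndings {

    /** Returns "CRLF", "LF", "CR", "MIXED" or "NONE" for the given text. */
    static String detect(String text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the '\n' that belongs to this CRLF pair
                } else {
                    cr++; // a lone '\r' (old Mac style)
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        int kinds = (crlf > 0 ? 1 : 0) + (lf > 0 ? 1 : 0) + (cr > 0 ? 1 : 0);
        if (kinds == 0) return "NONE";
        if (kinds > 1) return "MIXED";
        return crlf > 0 ? "CRLF" : (lf > 0 ? "LF" : "CR");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineEndings path/to/File.java [more files...]
        for (String arg : args) {
            String text = new String(Files.readAllBytes(Paths.get(arg)), StandardCharsets.UTF_8);
            System.out.println(arg + ": " + detect(text));
        }
    }
}
```

Running it against the included source file on both machines would show whether Git converted the line endings on checkout.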

@rdmueller
Member

This could be the problem. My most recently published code block looks fine, but I used Windows, and Git may have converted my line endings:

https://arc42-template.atlassian.net/wiki/spaces/dttest/pages/579665949/Included+Section

@duschata
Contributor Author

duschata commented Mar 7, 2019

This is the code before publishing:

[screenshot: selection_013]

and this is it after publishing:

[screenshot: selection_012]

I have no clue why. The XML file works fine. Where is the difference? I've tried a thousand things and nothing works for me. Could you please try to publish the file under Windows?

[attachment: importCode.html.zip]

@rdmueller
Member

Yes, I will give it a try tomorrow.
As far as I can see, you use CodeRay as the highlighter. Do you have any special settings?

@duschata
Contributor Author

duschata commented Mar 8, 2019

I've tried all source highlighter options (coderay, highlightjs, prettify, and pygments). Stripping all the spans from the code block seems to work correctly. I suspect something unexpected happens here:
https://github.com/docToolchain/docToolchain/blob/master/scripts/asciidoc2confluence.groovy#L232-L246

I'm not a Groovy expert, but we can try to investigate what goes in and what comes out there.
Has something changed in org.jsoup.nodes.Document?
Or maybe the Confluence REST endpoint strips the \r\n? I'll check this with Postman later.

I have no further settings; my .adoc file is shown here:
https://asciidoc.atlassian.net/wiki/spaces/DOCSASCODE/pages/524424/1.+This+Asciidoc

@rdmueller
Member

Your example (the one with the inline source) renders fine for me:
https://arc42-template.atlassian.net/wiki/spaces/dttest/pages/580911154/Render+Source

It doesn't matter whether I use WSL (Linux) or Windows.

Can you please check which line endings you use? My editor reports that I use \n.

@rdmueller
Member

My HTML source looks the same as yours. The Confluence storage format looks like this:

<div class="content">
<ac:structured-macro ac:name="code" ac:schema-version="1" ac:macro-id="5f75b752-3bf1-441c-8e1e-dba4007a9a13"><ac:parameter ac:name="language">java</ac:parameter><ac:plain-text-body><![CDATA[package checkCodeFormatting;

public class Hello {

    public static void pain (String... args) {

        System.out.println("This is not nice formatted");

    }

}]]></ac:plain-text-body></ac:structured-macro>
</div>

So something is different in your setup...
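For reference, a code macro in the storage format above can be built from a language name and the raw code text. This is a minimal Java sketch using a hypothetical CodeMacro helper; it is not docToolchain's implementation, which instead rewrites the jsoup DOM in rewriteCodeblocks. Like rewriteCodeblocks, it does not emit the ac:schema-version and ac:macro-id attributes that appear in the stored page:

```java
/** Hypothetical helper (not docToolchain code): builds a Confluence code macro in storage format. */
class CodeMacro {

    static String toStorageFormat(String language, String code) {
        StringBuilder sb = new StringBuilder();
        sb.append("<ac:structured-macro ac:name=\"code\">");
        if (language != null && !language.isEmpty()) {
            sb.append("<ac:parameter ac:name=\"language\">")
              .append(escapeXml(language))
              .append("</ac:parameter>");
        }
        // The code is placed in CDATA verbatim, so its line breaks are kept as-is.
        // A literal "]]>" would close the CDATA section early, so split it across two sections.
        String safe = code.replace("]]>", "]]]]><![CDATA[>");
        sb.append("<ac:plain-text-body><![CDATA[")
          .append(safe)
          .append("]]></ac:plain-text-body></ac:structured-macro>");
        return sb.toString();
    }

    // Escapes the characters that are unsafe in XML text content.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String code = "public class Hello {\n    // body\n}";
        System.out.println(toStorageFormat("java", code));
    }
}
```

Because the code sits inside CDATA verbatim, any line breaks still present at this point reach Confluence unchanged. If they are missing in the stored page, they were most likely lost earlier, during the DOM processing.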

@rdmueller
Member

As far as I can see, we are both using the same Confluence version, too...

@rdmueller
Member

Could you please replace your rewriteCodeblocks closure with the following code and send me the output?

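def rewriteCodeblocks = { body ->
    body.select('pre > code').each { code ->
        // debug: the <code> element as it arrives
        println "before: "
        println code
        if (code.attr('data-lang')) {
            // unwrap the highlighter's <span class="..."> elements, keeping only their text
            code.select('span[class]').each { span ->
                span.unwrap()
            }
            // insert the language parameter in front of the <code> element
            code.before("<ac:parameter ac:name=\"language\">${code.attr('data-lang')}</ac:parameter>")
        }
        // debug: the surrounding <pre> after the spans have been removed
        println "after: "
        println code.parent()
        // replace <pre> with the Confluence code macro (wrap it, then drop the <pre> itself)
        code.parent() // pre now
            .wrap('<ac:structured-macro ac:name="code"></ac:structured-macro>')
            .unwrap()
        // CDATA_PLACEHOLDER_START/_END are defined elsewhere in asciidoc2confluence.groovy
        code.wrap("<ac:plain-text-body>${CDATA_PLACEHOLDER_START}${CDATA_PLACEHOLDER_END}</ac:plain-text-body>")
            .unwrap()
    }
}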

This should at least shed some light on where the problem lies.

@duschata
Contributor Author

duschata commented Mar 11, 2019

Thank you for your efforts. Today I found a way to reproduce the error by chance:

git clone https://github.com/docToolchain/docToolchain.git
cd docToolchain/
./gradlew
cd ..
mkdir reproduceCodeFormattingFailure
cd reproduceCodeFormattingFailure/
../docToolchain/gradlew init --type java-application
mkdir -p src/docs/images
cp ../docToolchain/Config.groovy .

src/docs/test.adoc

= Test

Preamble

== Header (sect1)

[source, java]
----
include::./../../src/main/java/App.java[]
----

=== Header (sect2)

[source, java]
----
include::./../../src/main/java/App.java[]
----

Config.groovy (to modify):

inputFiles = [
        [file: 'test.adoc',       formats: ['html','pdf']],
]

confluence.with {
    input = [
            [ file: "build/docs/html5/test.html" ],
    ]

    api = "yourServer"
    createSubpages = false // this is the crucial point
    spaceKey = "yourSpaceKey"

    credentials = "username:password".bytes.encodeBase64().toString()
}

Publish it and everything works. Now set

createSubpages = true

and you will see unformatted code in the sect2 sections.

Set it back to false and everything is OK again...

I hope this helps to find the error. This is under Linux; I'll check it under Windows ASAP. If the behavior is different, I'll debug rewriteCodeblocks as you proposed above...

Greetings Tom

duschata reopened this on Mar 11, 2019
@duschata
Contributor Author

wrong button clicked...

@duschata
Contributor Author

Exactly the same behavior as described above with Windows 10...

@rdmueller
Member

Hm. I currently have no clue what could be different. Which shell do you use?

@duschata
Contributor Author

I'm using bash (Linux) or Git Bash (Windows), but... anyway. Check out the code at

https://github.com/duschata/reproduceCodeFormattingFailure.git

and adjust your server and space settings. After that, publish it and play with the createSubpages = false/true option. If it is set to true, the sect2 code formatting will fail...

Greetings

Tom

@rdmueller
Member

Thanks! Now I can reproduce it:
https://arc42-template.atlassian.net/wiki/spaces/dttest/pages/581271562/Header+sect1

So it is because of the subpages...

@rdmueller
Member

This is the line:

code.parent() // pre now
    .wrap('<ac:structured-macro ac:name="code"></ac:structured-macro>')
    .unwrap()

https://github.com/docToolchain/docToolchain/blob/master/scripts/asciidoc2confluence.groovy#L248

Now let's see why it behaves differently when it is executed the second time...

@rdmueller
Member

Hm. This line at least removes the remaining line breaks, but some of them have already been removed before it runs...

@duschata
Contributor Author

duschata commented Mar 14, 2019

 dom.select('div.sect1').each { sect1 ->
        Elements pageBody = sect1.select('div.sectionbody')
        def currentPage = [
            title: sect1.select('h2').text(),
            body: pageBody,
            children: [],
            parent: parentId
        ]
        pageAnchors.putAll(recordPageAnchor(sect1.select('h2')))

        if (confluenceCreateSubpages) {
            pageBody.select('div.sect2').each { sect2 ->
                def title = sect2.select('h3').text()
                pageAnchors.putAll(recordPageAnchor(sect2.select('h3')))
                sect2.select('h3').remove()
                def body = sect2
                def subPage = [
                    title: title,
                    body: body
                ]
                currentPage.children << subPage
                promoteHeaders sect2, 4, 3
                anchors.putAll(parseAnchors(subPage))
            }
            pageBody.select('div.sect2').remove()
        } else {
            pageBody.select('div.sect2').unwrap()
            promoteHeaders sect1, 3, 2
        }
        sections << currentPage
        anchors.putAll(parseAnchors(currentPage))

pageBody has the type Elements, whereas body (in the confluenceCreateSubpages branch) is a single Element. So rewriteCodeblocks receives different kinds of input when processing page.body. If you have an idea for a quick fix, let me know. I suggest either changing the sect2 selector or modifying rewriteCodeblocks (I don't know which is better; what do you think?)

@rdmueller
Member

Good question. jsoup has always been some kind of magic to me; I'm always happy when it just works.

So, how would you modify it?

@duschata
Contributor Author

I've spent hours and hours on this bug...
My assumption above is not the reason: I've debugged it, and nothing changes when passing Elements (instead of an Element) as the body...
The crucial point is the

pageBody.select('div.sect2').remove()

line. In a plain Java jsoup test, you get a NullPointerException if you try to work on referenced Elements that you have already removed from the DOM. Groovy seems to behave differently here, but not 100% predictably.

I wrote a few lines to debug this:

 if (confluenceCreateSubpages) {
            Elements sect2Elements = pageBody.select('div.sect2')
            sect2Elements.each { sect2 ->
                def title = sect2.select('h3').text()
                pageAnchors.putAll(recordPageAnchor(sect2.select('h3')))
                sect2.select('h3').remove()
                def body = sect2Elements
                def subPage = [
                        title: title,
                        body : body
                ]
                currentPage.children << subPage
                promoteHeaders sect2, 4, 3
                anchors.putAll(parseAnchors(subPage))

                println ("##############before remove####################")
                println (body)
                println ("##############before remove####################")
            }
            pageBody.select('div.sect2').remove()

            println ("##############after remove####################")
            println currentPage.children.body
            println ("##############after remove####################")

        } else {
            pageBody.select('div.sect2').unwrap()
            promoteHeaders sect1, 3, 2
        }
        sections << currentPage
        anchors.putAll(parseAnchors(currentPage))

After the remove(), the LFs disappear. Comment out the remove() line and both outputs are identical.
Do you have an idea? I have many other tasks this week; I'll try to continue debugging soon...

@duschata
Contributor Author

OK, you can check out this project:
https://github.com/duschata/debugJsoup.git
Run the test and look at the console output.
Next, change the jsoup version to 1.11.3 and compare the result with the first output.
I will drink some beer for now...

@duschata
Contributor Author

I forgot to set this in the Java example above:

 doc.outputSettings(new Document.OutputSettings().prettyPrint(false));

After this, everything works as expected (in Java). But why does it fail in the docToolchain Groovy script?

@rdmueller
Member

Thanks for spending so much time trying to fix this issue!
It's hard to believe that there is a difference between Groovy and Java.
I will try to use your code to reproduce the problem in the Groovy Console and hopefully fix it.

rdmueller self-assigned this on Mar 23, 2019
@duschata
Contributor Author

OK, I had the same idea on Friday and started a fresh Groovy project. I would like to reduce the "bug" to as few lines as possible. What we have so far:

  • The code produces different output with different jsoup versions.
  • Playing with the prettyPrint flag does not produce the output I expected.
  • After commenting out the remove() line (see above), the original code works.

Have a look at my Java code; it seems to work. Where are the differences? Are there side effects in the Gradle script that I've overlooked?

@duschata
Contributor Author

https://github.com/duschata/minimalPublishError.git
It is not minimal yet, but Simple.groovy reproduces the failure.

@duschata
Contributor Author

duschata commented Mar 28, 2019

Finally: it's a bug (or at least strange behavior) in jsoup!
clone() and remove() lose the prettyPrint(false) setting. This has been an open issue since 2016,
jhy/jsoup#763
but nobody seems to care about it. At the exact moment the <pre></pre> elements are removed, the framework also cleans up the preformatted content between them. Creating a new jsoup Document with prettyPrint(false) for the sect2 sections solves the problem; I'll post a PR soon.

@rdmueller
Member

Very cool that you managed to find the root cause! Kudos to you!
