Hi
This is my first use of PHPWord. I created a docx file with LibreOffice (do not ask why I'm not using MS Office :) )
The document contains a block (a single paragraph) to be cloned. The resulting docx file appears to be empty (still using LibreOffice).
I noticed that the paragraphs in Sample_23 provided by PHPWord differ from those in my document created by LibreOffice.
The following XML has been formatted (with Eclipse) to make its structure easier to check.
Sample_23 provided by PHPWord repository (block tag only):
<w:p w:rsidR="00C0566D" w:rsidRPr="003B08B6" w:rsidRDefault="00C0566D" w:rsidP="00C0566D">
  <w:r>
    <w:t>${CLONEME}</w:t>
  </w:r>
</w:p>
Paragraph generated by LibreOffice (block tag only):
<w:p>
  <w:pPr>
    <w:pStyle w:val="Corpsdetexte" />
    <w:pageBreakBefore w:val="false" />
    <w:rPr></w:rPr>
  </w:pPr>
  <w:r>
    <w:rPr></w:rPr>
    <w:t>${itemtypeBlock}</w:t>
  </w:r>
</w:p>
Output file (the cloned blocks appear to be nested):
<w:p>
  <w:pPr>
    <w:pStyle w:val="Corpsdetexte" />
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Corpsdetexte" />
        <w:rPr></w:rPr>
      </w:pPr>
      <w:r>
        <w:rPr></w:rPr>
        <w:t>cloned paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:pPr>
        <w:p>
          <w:pPr>
            <w:pStyle w:val="Corpsdetexte" />
            <w:rPr></w:rPr>
          </w:pPr>
          <w:r>
            <w:rPr></w:rPr>
            <w:t>cloned paragraph</w:t>
          </w:r>
        </w:p>
        <w:p>
          <w:pPr>
            <w:p>
              <w:pPr>
                <w:pStyle w:val="Normal" />
                <w:rPr></w:rPr>
              </w:pPr>
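A quick way to confirm that output like the fragment above is not well-formed is to run it through an XML parser. The sketch below uses Python's standard library; the helper name `is_well_formed` and the wrapping `<root>` element are illustrative assumptions, and the two fragments are shortened copies of the snippets in this report. In practice the text to check would be the `word/document.xml` part extracted from the generated docx.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, needed so the w: prefix resolves.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses once wrapped in a root element."""
    wrapped = f'<root xmlns:w="{W_NS}">{fragment}</root>'
    try:
        ET.fromstring(wrapped)
        return True
    except ET.ParseError:
        return False


# A paragraph shaped like the LibreOffice one: every tag is closed.
good = (
    '<w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/><w:rPr></w:rPr></w:pPr>'
    '<w:r><w:rPr></w:rPr><w:t>cloned paragraph</w:t></w:r></w:p>'
)

# Start of the broken output: a new <w:p> opens inside an unclosed <w:pPr>.
bad = (
    '<w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/>'
    '<w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/><w:rPr></w:rPr></w:pPr>'
    '<w:r><w:rPr></w:rPr><w:t>cloned paragraph</w:t></w:r></w:p>'
)

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```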
I began to debug and found the following:
- the XML tree is broken in the output file: cloned blocks are nested, and some XML tags are not properly closed (this is not fully visible in the snippets, which would otherwise be too big; please trust the indentation)
- the paragraphs generated by LibreOffice contain extra tags, which break the regex used in cloneBlock()
- I'm far from being an expert in the docx specification, but the regex appears (to me) to be easily breakable because it matches on different opening and closing tags (no backreference is used). I also noticed the use of <w:p.*> and </w:.*?p>, which makes no sense to me (but I admit I may be wrong)
- I believe PHPWord should also be compatible with documents generated by software other than MS Office, as long as they are properly built.
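The mismatch described in the list above can be reproduced outside PHPWord. The following is a minimal sketch in Python, whose `re` module behaves like PCRE for these constructs. The `loose` pattern is an assumption modeled on the fragments quoted above, not a verbatim copy of the cloneBlock() code, and `strict` is a hypothetical alternative rather than a proposed patch.

```python
import re

# Paragraph shaped like the LibreOffice one in this report, on one line.
paragraph = (
    '<w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/>'
    '<w:pageBreakBefore w:val="false"/><w:rPr></w:rPr></w:pPr>'
    '<w:r><w:rPr></w:rPr><w:t>${itemtypeBlock}</w:t></w:r></w:p>'
)
document = '<w:body>' + paragraph + '</w:body>'

# Assumed loose pattern: a greedy prefix, then "<w:p" followed by anything.
# "<w:p" is also a prefix of <w:pPr>, <w:pStyle> and <w:pageBreakBefore>.
loose = re.compile(r'(.*)(<w:p.*>\$\{itemtypeBlock\}</w:.*?p>)', re.S)

# Stricter sketch: "<w:p" must be followed by whitespace or ">", and the
# match may not cross a </w:p> before reaching the block tag.
strict = re.compile(
    r'(.*)(<w:p(?:\s[^>]*)?>(?:(?!</w:p>).)*?\$\{itemtypeBlock\}.*?</w:p>)',
    re.S,
)

loose_match = loose.match(document).group(2)
strict_match = strict.match(document).group(2)

# The loose pattern starts inside <w:pPr>, not at the paragraph opening.
print(loose_match.startswith('<w:pageBreakBefore'))  # True
# The stricter pattern captures the whole paragraph and nothing else.
print(strict_match == paragraph)  # True

# Cutting out the loose match leaves an unclosed <w:p><w:pPr> behind.
leftover = document.replace(loose_match, '')
print(leftover)
# <w:body><w:p><w:pPr><w:pStyle w:val="Corpsdetexte"/></w:body>
```

The leftover `<w:p><w:pPr><w:pStyle ... />` has the same shape as the start of the broken output file. A paragraph like the Sample_23 one has no `<w:pPr>` child, so `<w:p` can only match the real paragraph opening there, which may explain why the problem does not show up with that sample.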