update txml to version 4.0.1 for parsing office documents
Today I found a new issue for txml, the small, fast, pure JavaScript XML parser. Until now I saw no value in keeping whitespace. However, I had to learn otherwise: the issue asked for spaces within a document to be kept when parsing.
In order to reproduce the problem, I opened Wikipedia and saw that .odf as well as .docx files are basically zip files with a defined layout: an XML file that defines the content, plus all the media files embedded in the document. Here is the code that I came up with to open and manipulate an odf file from LibreOffice:
var fs = require("fs");
Then I found that I do not have Microsoft Office installed. But I had WordPad, which can also open and save odf as well as docx files. So I created a docx file with it and did the same:
var fs = require("fs");
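The claim that both formats are zip containers can be checked without any zip library. The following is a minimal sketch using only Node's `fs` module; it tests for the four-byte signature `PK\x03\x04` that starts a regular zip archive. The helper names `looksLikeZip` and `fileLooksLikeZip` are made up for this illustration and are not part of txml or of the code above.

```javascript
// Sketch: check whether some bytes start with the zip local-file-header signature.
const fs = require("fs");

// "PK\x03\x04" is the signature at the start of a regular (non-empty) zip archive.
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Returns true if the buffer begins with the zip signature.
function looksLikeZip(buffer) {
  return buffer.length >= 4 && buffer.subarray(0, 4).equals(ZIP_SIGNATURE);
}

// Hypothetical helper: reads only the first four bytes of a file on disk.
function fileLooksLikeZip(path) {
  const fd = fs.openSync(path, "r");
  try {
    const head = Buffer.alloc(4);
    const bytesRead = fs.readSync(fd, head, 0, 4, 0);
    return looksLikeZip(head.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `fileLooksLikeZip` with the path of a saved docx or odf file (any file name here would be an assumption) should return `true`, while a plain XML file should return `false`.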
When I stringified the new content, the resulting document worked well, however it was a bit bigger. That is because txml always stringifies with proper closing tags (</...>), even for empty elements.
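To show where the extra bytes come from, here is a toy serializer for nodes shaped like `{ tagName, attributes, children }`. It is a simplified sketch and not txml's actual stringify code; the `selfClose` switch and the sample node are invented purely for the comparison.

```javascript
// Toy serializer, NOT txml's implementation.
// `selfClose` is a made-up switch that exists only for this comparison.
function serialize(node, selfClose) {
  if (typeof node === "string") return node;
  let out = "<" + node.tagName;
  for (const name of Object.keys(node.attributes)) {
    out += " " + name + '="' + node.attributes[name] + '"';
  }
  // Short form for empty elements, e.g. <w:br/>.
  if (selfClose && node.children.length === 0) return out + "/>";
  out += ">";
  for (const child of node.children) out += serialize(child, selfClose);
  // Long form: always write an explicit closing tag.
  return out + "</" + node.tagName + ">";
}

// Invented sample: a paragraph holding two empty elements.
const sample = {
  tagName: "w:p",
  attributes: {},
  children: [
    { tagName: "w:br", attributes: {}, children: [] },
    { tagName: "w:tab", attributes: {}, children: [] },
  ],
};

const compact = serialize(sample, true);   // empty elements self-closed
const expanded = serialize(sample, false); // every element closed explicitly
console.log(compact.length, expanded.length);
```

Both strings describe the same XML tree, so an office program should read either one; the explicitly closed version is simply longer for every empty element.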
With this test, I was also checking the problem described in the issue. And I found that spaces typed into the document did not end up in the document written by my node program. That was fixed by releasing a new version of txml with a new argument, keepWhitespace.
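The effect of such an option can be illustrated with a toy text collector. This is a sketch under loud assumptions: it is not txml's parser, it only splits on anything that looks like a tag, and the function name `collectText` is invented. It assumes that without the option text nodes are trimmed and dropped when empty, and that with the option they are kept as typed.

```javascript
// Toy text-node collector, NOT txml's implementation.
// It ignores attributes, comments, CDATA and everything else a real parser handles.
function collectText(xml, options) {
  const keepWhitespace = Boolean(options && options.keepWhitespace);
  const texts = [];
  // Split on anything that looks like a tag; the pieces are the text between tags.
  for (const piece of xml.split(/<[^>]*>/)) {
    if (keepWhitespace) {
      // Keep the text exactly as typed, including whitespace-only pieces.
      if (piece.length > 0) texts.push(piece);
    } else {
      // Trim, and drop pieces that are empty after trimming.
      const trimmed = piece.trim();
      if (trimmed.length > 0) texts.push(trimmed);
    }
  }
  return texts;
}

const xml = "<p><b>Hello</b> <i>world</i></p>";
console.log(collectText(xml));                           // [ 'Hello', 'world' ]
console.log(collectText(xml, { keepWhitespace: true })); // [ 'Hello', ' ', 'world' ]
```

In the first result the space between the two words is gone, which is the kind of loss described above. With txml itself the option is presumably passed in the options object of the parse call, for example `{ keepWhitespace: true }`; the README has the exact signature.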
For this I really want to thank moongazers. In the issue he described the problem clearly and even pointed me to the correct position in the code.