It is done: the fastest pure JavaScript XML parser is ready.

For a new UI framework that I am developing, I needed an XML DOM parser. The new framework, tReeact, is heavily inspired by React.js but lets you render the UI using a templating language of your choice. The framework then updates the DOM elements in the browser in the fastest possible way, with a minimal number of direct DOM operations.

But the first step is the XML parser. People usually distinguish between two kinds of XML parsers: stream readers and DOM parsers. Stream readers are great for reading very large files, even ones bigger than a few GB. While parsing, they trigger events describing what was found in the XML stream. A DOM parser, on the other hand, takes an XML string and returns an object that represents the structure and data of that XML. Because that object and the entire XML string have to be in memory while parsing, the size of the XML that can be parsed is limited. In return, the program that uses the DOM object can be written in a procedural way rather than an event-driven one, which makes it much easier to reason about, debug and develop.
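To make the difference concrete, here is a minimal sketch of both styles. It is a toy, not tXML or sax: the names tokenize, parseStream and parseDom are made up for this illustration, and the tokenizer only understands plain tags and text (no attributes, comments or CDATA).

```javascript
// Toy tokenizer (assumption: plain tags and text only).
function* tokenize(xml) {
  const re = /<\/?([A-Za-z][\w-]*)\s*>|([^<]+)/g;
  let m;
  while ((m = re.exec(xml)) !== null) {
    if (m[2] !== undefined) {
      if (m[2].trim() !== '') yield { type: 'text', value: m[2] };
    } else if (m[0][1] === '/') {
      yield { type: 'close', name: m[1] };
    } else {
      yield { type: 'open', name: m[1] };
    }
  }
}

// Stream style: the parser fires events, the caller keeps its own state.
function parseStream(xml, handlers) {
  for (const tok of tokenize(xml)) {
    if (tok.type === 'open') handlers.onOpen(tok.name);
    else if (tok.type === 'close') handlers.onClose(tok.name);
    else handlers.onText(tok.value);
  }
}

// DOM style: build the whole tree in memory, then hand it back.
function parseDom(xml) {
  const root = { tagName: '#root', children: [] };
  const stack = [root];
  parseStream(xml, {
    onOpen(name) {
      const node = { tagName: name, children: [] };
      stack[stack.length - 1].children.push(node);
      stack.push(node);
    },
    onClose() { stack.pop(); },
    onText(value) { stack[stack.length - 1].children.push(value); },
  });
  return root.children;
}

const xml = '<feed><item>first</item><item>second</item></feed>';

// Event-driven: we have to track "am I inside <item>?" ourselves.
const titlesFromStream = [];
let insideItem = false;
parseStream(xml, {
  onOpen(name) { insideItem = name === 'item'; },
  onClose() { insideItem = false; },
  onText(value) { if (insideItem) titlesFromStream.push(value); },
});

// Procedural: walk the finished tree like any other data structure.
const titlesFromDom = parseDom(xml)[0].children.map((item) => item.children[0]);
```

Both variables end up holding the same two strings, but the DOM version reads top to bottom, while the stream version spreads its logic over three callbacks and a flag.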

The new framework tReeact is meant to handle a web app's UI, and the HTML of an app is seldom bigger than 5 MB. That's why tXML became a DOM parser. The development took several steps. First I made a basic version that returns nodes, attributes and text content. Then I used this tool to parse different sources: OpenStreetMap data, several websites and their RSS feeds. I also compared the speed with other projects, such as XML2JSON and sax, and with the browser's native XML parser. In the end tXML was about 5-10% slower than the native parser in Chrome. But the result is a “plain old JavaScript object”, so accessing the data is much faster than using the DOM API in a browser. That will especially speed up tReeact, which has to traverse the entire object to do its comparisons.
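As a rough sketch of what working with such a plain object looks like: the node shape below (tagName, attributes, children, with text as plain strings) is an assumption made for this example, so check the tXML documentation for the exact structure it returns. The helpers countElements and collectText are hypothetical.

```javascript
// Assumed node shape: { tagName, attributes, children }, where children
// contains nested nodes or plain strings for text.
const tree = [
  {
    tagName: 'ul',
    attributes: { class: 'menu' },
    children: [
      { tagName: 'li', attributes: {}, children: ['Home'] },
      { tagName: 'li', attributes: {}, children: ['About'] },
    ],
  },
];

// Count every element node with a plain recursive loop.
function countElements(nodes) {
  let count = 0;
  for (const node of nodes) {
    if (typeof node === 'string') continue; // text, not an element
    count += 1 + countElements(node.children);
  }
  return count;
}

// Collect all text content in document order.
function collectText(nodes, out = []) {
  for (const node of nodes) {
    if (typeof node === 'string') out.push(node);
    else collectText(node.children, out);
  }
  return out;
}

const elementCount = countElements(tree);
const texts = collectText(tree);
```

No DOM API is involved here, only property access and array iteration, which is the kind of full-tree traversal described above.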

In other situations the difference is much more significant. In a direct comparison of sax and tXML, parsing the GitHub website and a big chunk of OSM data, tXML was 5 to 10 times faster. When reading that, please keep in mind that this compares a stream reader with a DOM parser.

Motivated by this advantage, I analysed the tXML parser again and thought about how I could improve speed and usability for the most common cases. A great win for usability was to “simplify” the DOM object. For that I took PHP’s SimpleXML as a model. The simplify method returns the same kind of object as if the XML had been parsed by SimpleXML. This lets you access the data very comfortably.
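The idea behind such a simplified object can be sketched in a few lines. This is not tXML's actual implementation: simplifySketch is a hypothetical helper, the input node shape (tagName, children) is assumed, and attributes and mixed text are simply dropped to keep the example short.

```javascript
// Collapse a list of child nodes into a plain object keyed by tag name.
function simplifySketch(children) {
  // An element that contains only text collapses to that string.
  if (children.length === 1 && typeof children[0] === 'string') {
    return children[0];
  }
  const out = {};
  for (const child of children) {
    if (typeof child === 'string') continue; // mixed text is ignored in this sketch
    const value = simplifySketch(child.children);
    if (!Object.prototype.hasOwnProperty.call(out, child.tagName)) {
      out[child.tagName] = value; // first occurrence: store directly
    } else if (Array.isArray(out[child.tagName])) {
      out[child.tagName].push(value); // third or later occurrence
    } else {
      out[child.tagName] = [out[child.tagName], value]; // second occurrence: switch to an array
    }
  }
  return out;
}

// Stands for <user><name>Ada</name><email>a@example.org</email><email>b@example.org</email></user>
const parsed = [
  {
    tagName: 'user',
    children: [
      { tagName: 'name', children: ['Ada'] },
      { tagName: 'email', children: ['a@example.org'] },
      { tagName: 'email', children: ['b@example.org'] },
    ],
  },
];

const simple = simplifySketch(parsed);
const userName = simple.user.name;
const secondEmail = simple.user.email[1];
```

The data is then reachable by property name (simple.user.name) instead of by walking children arrays, which is what makes the SimpleXML style comfortable.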

The functions “getElementsByClassName” and “getElementById” increase both usability and speed, and the speed advantage can be enormous, because you use these functions directly on your XML string. That way tXML parses only the necessary elements, not the entire XML. These methods make tXML the perfect tool for parsing data from any website that does not officially provide an API. So, have fun hacking the web.
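Why searching the string first can save so much work is easiest to see in a small sketch. The function below is hypothetical and not how tXML implements getElementById: it assumes double-quoted attributes, no comments or CDATA, and no '<' or '>' inside attribute values. It locates the id with indexOf and then scans only that one element's tags to find where it ends.

```javascript
// Return the raw markup of the element with the given id, or null.
function sliceElementById(xml, id) {
  const marker = xml.indexOf(` id="${id}"`);
  if (marker === -1) return null;
  // Step back to the '<' that opens the tag carrying the id.
  const start = xml.lastIndexOf('<', marker);
  const tagRe = /<(\/?)([A-Za-z][\w-]*)[^>]*?(\/?)>/g;
  tagRe.lastIndex = start;
  let depth = 0;
  let m;
  while ((m = tagRe.exec(xml)) !== null) {
    if (m[3] === '/') {
      // Self-closing tag: it is the whole element only if it is the first tag.
      if (depth === 0) return m[0];
      continue;
    }
    depth += m[1] === '/' ? -1 : 1;
    if (depth === 0) return xml.slice(start, tagRe.lastIndex);
  }
  return null; // unbalanced markup
}

const page =
  '<html><body><div id="nav"><a>Home</a></div>' +
  '<div id="main"><div><p>Hello</p></div></div></body></html>';

const mainMarkup = sliceElementById(page, 'main');
const missing = sliceElementById(page, 'footer');
```

Everything before the id is skipped by a plain string search, and everything after the element's closing tag is never looked at. Only the returned slice would then need to be handed to a full parser.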

If you are now interested in using the fastest XML parser for the best user experience in your application, get started and install the tXML parser with “npm install txml”, or download the standalone version for the browser from GitHub. You will find the documentation on npm as well as on GitHub.

A short opinion to finish: if you can choose, use JSON instead of XML to persist and transfer data. It is much easier to access in all programming languages and also very fast in JS.
