This adds about 40 cycles for parsing <?xml version='1.0'?> declaration and
about 70 cycles for parsing <?xml version='1.0' encoding='utf-8'?>, as
measured on a Core i7, which should be negligible for all documents.
Fixes#16.
It is probably redundant given that we have -Wold-style-cast, but it's better
to warn about casts like this in case we ever need to remove the latter flag.
Previously the page size was defining the data size, and due to additional
headers (+ recently removed allocation padding) the actual allocation was a bit
bigger.
The problem is that some allocators round 2^N+k allocations to 2^N+M, which can
result in noticeable waste of space. Specifically, on 64-bit OSX allocating the
previous page size (32k+40) resulted in 32k+512 allocation, thereby wasting 472
bytes, or 1.4%.
Now we have the allocation size specified exactly and just recompute the available
data size, which can in small space savings depending on the allocator.
When using format_raw the space in the empty tag (<node />) is the only
character that does not have to be there; so format_raw almost results in
a minimal XML but not quite.
It's pretty unlikely that this is crucial for any users - the formatting
change should be benign, and it's better to improve format_raw than to add
yet another flag.
Fixes#87.
Unify the implementations by automatically deducing the unsigned type from its
signed counterpart. That allows us to use a templated function instead of
duplicating code.
This determines the used C++ standard.
If you do not want to use a specific C++ standard, use cxxstd=any.
The default is set to c++11.
The "define" PUGIXML_NO_CXX11 is removed from the Makefile
since it is not used in the code anyways.
This utilizes the fact that pages are of limited size so we can store offset
from the object to the page in a few bits - we currently use 24 although that's
excessive given that pages are limited to ~512k.
This has several benefits:
- Pages do not have to be 64b aligned any more - this simplifies allocation flow
and frees up 40-50 bytes from xml_document::_memory.
- Header now has 8 bits available for metadata for both compact and default mode
which makes it possible to store type as-is (allowing easy type extension and
removing one add/sub operation from type checks).
- One extra bit is easily available for future metadata extension (in addition
to the bit for type encoding that could be reclaimed if necessary).
- Allocators that return 4b-aligned memory on 64-bit platforms work fine if
misaligned reads are supported.
The downside is that there is one or two extra instructions on the allocation
path. This does not seem to hurt parsing performance.
Also remove the description of behavior for trailing non-numeric characters.
It's likely this will become a parse error in the future so better leave it
as unspecified for now.
Fixes#80.
Add parse_embed_pcdata flag
This flag determines if plain character data is be stored in the parent element's value. This significantly changes the structure of the document; this flag is only recommended for parsing documents with a lot of PCDATA nodes in a very memory-constrained environment.
Most high-level APIs continue to work; code that inspects DOM using first_child()/value() will have to be adapted.