mirror of https://github.com/zeux/pugixml.git synced 2025-01-15 02:17:56 +08:00

1192 Commits

Author SHA1 Message Date
Arseny Kapoulkine
873c8e5011 Merge pull request #42 from zeux/compact
Implement compact mode.

This introduces a new storage mode that dramatically reduces node size at some performance cost.
The mode is enabled by defining PUGIXML_COMPACT. This does not change API/ABI - all existing functionality still works.

The pointers are stored using delta encoding and bytes, with some additional tricks to make encoding more efficient for e.g. parent pointer and string pointers. Since the node is fixed size, we have to fall back to a hash table if the pointer does not fit. Thus all DOM operations keep their constant complexity - a constant number of operations if you don't need the hash table and amortized constant if you do.
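
A minimal sketch of the delta-plus-fallback idea described above, not pugixml's actual implementation. The names `compact_ptr`, `fallback_table` and `demo_node` are hypothetical, the one-byte encoding with a bias of 128 is an assumption made for illustration, and `std::unordered_map` stands in for a purpose-built hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical side table mapping the address of a compact_ptr to its full target.
// std::unordered_map is only a stand-in for a purpose-built hash table.
inline std::unordered_map<const void*, void*>& fallback_table()
{
    static std::unordered_map<const void*, void*> table;
    return table;
}

// One-byte pointer. 0 = null, 255 = look in the side table,
// 1..254 = byte distance from this object to the target, biased by 128.
template <typename T>
class compact_ptr
{
public:
    compact_ptr() : data_(0) {}

    void set(T* target)
    {
        if (!target) { data_ = 0; return; }

        // Signed byte distance between the target and this field.
        std::intptr_t diff = static_cast<std::intptr_t>(
            reinterpret_cast<std::uintptr_t>(target) - reinterpret_cast<std::uintptr_t>(this));
        std::intptr_t biased = diff + 128;

        if (biased >= 1 && biased <= 254)
        {
            data_ = static_cast<std::uint8_t>(biased); // fast path: fits in the byte
        }
        else
        {
            // Slow path: the delta does not fit, so store the full pointer elsewhere.
            // Stale entries are never erased in this sketch.
            fallback_table()[this] = target;
            data_ = 255;
        }
    }

    T* get() const
    {
        if (data_ == 0) return nullptr;

        if (data_ == 255)
        {
            auto it = fallback_table().find(this);
            assert(it != fallback_table().end());
            return static_cast<T*>(it->second);
        }

        // Undo the bias and add the delta back to our own address.
        std::uintptr_t self = reinterpret_cast<std::uintptr_t>(this);
        std::intptr_t diff = static_cast<std::intptr_t>(data_) - 128;
        return reinterpret_cast<T*>(self + static_cast<std::uintptr_t>(diff));
    }

    bool uses_table() const { return data_ == 255; }

private:
    std::uint8_t data_;
};

// Hypothetical 8-byte node used only to exercise the pointer.
struct demo_node
{
    compact_ptr<demo_node> next;
    char payload[7];
};
```

With an array of `demo_node`, linking neighbouring elements stays on the one-byte fast path, while linking elements that are hundreds of bytes apart goes through the side table. Lookups are constant time in the first case and amortized constant in the second.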

Aside from some performance loss (which is inevitable since decoding takes time), the only other caveat is that we can't remove entries from the hash table - so in some edge cases with a lot of node removals the peak memory consumption can grow indefinitely. In theory we can implement this later; it's unclear that this is useful at this point.

The resulting node/attribute sizes are as follows:
non-compact node: 28 bytes (32-bit), 56 bytes (64-bit)
compact node: 12 bytes (32/64-bit)
non-compact attribute: 20 bytes (32-bit), 40 bytes (64-bit)
compact attribute: 8 bytes (32/64-bit)
2015-05-03 11:42:19 -07:00
Arseny Kapoulkine
9597265a12 Cleanup before merge 2015-05-03 11:02:42 -07:00
Arseny Kapoulkine
b1965061af Fix MSVC warning 2015-05-03 09:21:23 -07:00
Arseny Kapoulkine
f67e761970 Fix MSVC build 2015-05-02 16:41:21 -07:00
Arseny Kapoulkine
20e2041f14 Reorder conditions in compact_string implementation
Now compact_string matches compact_pointer_parent.

Turns out PUGI__UNLIKELY is good at reordering conditions but usually does not
really affect performance. Since MSVC should treat "if" branches as taken and
does not support branch probabilities, don't use them if we don't need to.
2015-05-02 15:57:46 -07:00
Arseny Kapoulkine
f8915c8eab Minor refactoring 2015-05-02 15:47:59 -07:00
Arseny Kapoulkine
fa8663c066 Revise marker deletion strategy
Instead of checking if the object being removed allocated a marker, mark the
marker block as deleted immediately upon allocation. This simplifies the logic
and prevents extra markers from being inserted if we allocate/deallocate the
same node indefinitely.

Also change marker pointer type to uint32_t*.
2015-05-02 15:40:30 -07:00
Arseny Kapoulkine
613301ce51 Optimize compact_string
First assignment uses a fast path; second assignment uses a specialized
path as well.
2015-05-02 14:52:27 -07:00
Arseny Kapoulkine
19d43d39fc tests: Add one more page reclamation test 2015-05-02 09:45:26 -07:00
Arseny Kapoulkine
b1578e32a5 Fix node deallocation
When we deallocate nodes/attributes that allocated the marker we have to
adjust the size accordingly, and dismiss the marker in case it gets
overwritten with something else...
2015-05-02 09:38:14 -07:00
Arseny Kapoulkine
dec4267fb1 Implement efficient compact_header storage
Header is now just 2 bytes, with optional additional 4 bytes that are only
allocated for every 85 nodes / 128 attributes.
2015-05-02 08:59:47 -07:00
Arseny Kapoulkine
e4c539a869 Implement compact_string with shared storage 2015-05-01 22:47:53 -07:00
Arseny Kapoulkine
3915f7b144 Rename compact_string to compact_string_fat 2015-05-01 21:09:26 -07:00
Arseny Kapoulkine
bc5eb22b71 Revert to name/value storage inside node
This temporarily increases the node size to 16 bytes - we'll bring it back.

It allows us to remove the horrible node_pi hack and to reduce the amount of
changes against master. This comes at the price of not decreasing baseline
xml_node_struct size.

The compact xml_node_struct is also increased by this change but a followup
change will reduce *both* xml_attribute_struct and xml_node_struct (to 8/12
bytes).
2015-05-01 20:03:17 -07:00
Arseny Kapoulkine
dede617d9f tests: Fix spurious failures in compact mode
The memory_large_allocations test sometimes classified hash allocations
as page allocations since hash table could reach 512 entries.
2015-04-29 09:21:04 -07:00
Arseny Kapoulkine
b2399f5ab5 Refactor offset_debug
Split a long line into multiple statements.
2015-04-29 09:20:08 -07:00
Arseny Kapoulkine
44e4f17348 Change xml_node_struct field order to match compact
Also remove useless comments.
2015-04-22 09:53:04 -07:00
Arseny Kapoulkine
3643b505a6 Fix node_pi memory leak 2015-04-22 08:38:52 -07:00
Arseny Kapoulkine
4223b4a3f0 Make xml_node::value() structure consistent with set_* 2015-04-22 08:30:53 -07:00
Arseny Kapoulkine
e4e2259646 Remove compact_header::operator uintptr_t
We used this in two cases - to get the page pointer and to test flags.

We now use PUGI__GETPAGE for getting the page pointer and operator& to test
flags - this makes getting node type significantly faster since it does not
require page pointer reconstruction.
2015-04-22 08:26:47 -07:00
Arseny Kapoulkine
12744fd1fa Remove redundant has_value check 2015-04-22 07:52:20 -07:00
Arseny Kapoulkine
b87160013b Use has_name/has_value in set_name/set_value 2015-04-22 07:51:02 -07:00
Arseny Kapoulkine
4649914447 Optimize and refactor compact_pointer implementations
Clarify the offset applied when encoding the pointer difference.
Make decoding diff slightly more clear - no effect on performance.

Adjust branch weighting in compact_string encoding - 0.5% faster.
Use uint16_t in compact_pointer_parent - 2% faster.
2015-04-22 07:36:32 -07:00
Arseny Kapoulkine
33b2efe318 Optimize xml_allocator::reserve()
Make sure compact_hash_table::rehash() is not inlined - that way reserve() is
inlined so the fast path has no extra function calls.

Also use subtraction instead of multiplication when checking capacity.
2015-04-21 23:02:44 -07:00
Arseny Kapoulkine
52bcb4ecd6 tests: Adjust allocation thresholds to fix tests 2015-04-21 21:35:54 -07:00
Arseny Kapoulkine
f9983ea2ed Merge branch 'master' into compact 2015-04-21 21:27:44 -07:00
Arseny Kapoulkine
a6cc636a6b tests: Fix MSVC warnings 2015-04-21 21:07:58 -07:00
Arseny Kapoulkine
250d020e9b Use -std=c++0x instead of -std=c++11 2015-04-21 20:46:33 -07:00
Arseny Kapoulkine
8d4544f2e1 Enable C++11 in Makefile 2015-04-21 20:32:40 -07:00
Arseny Kapoulkine
4eadece45f tests: Add move semantics tests
Also test ranged for and copying big xpath_variable_set objects (to make
sure we actually handle hash collisions properly)
2015-04-21 19:44:19 -07:00
Arseny Kapoulkine
83b894b8f1 XPath: Implement move semantics support
xpath_query, xpath_node_set and xpath_variable_set are now moveable.

This is a nice performance optimization for variable/node sets, and enables
storing xpath_query in containers without using pointers (it's only possible
now since the query is not copyable).
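
As a rough illustration of why move support matters for a non-copyable type, the sketch below stores a move-only object directly in a `std::vector`. `compiled_query` and `compile_all` are hypothetical stand-ins and do not reproduce the real `xpath_query` interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical move-only type that owns a heap resource.
class compiled_query
{
public:
    explicit compiled_query(const std::string& expression)
        : impl_(new std::string(expression)) {}

    ~compiled_query() { delete impl_; }

    // Not copyable.
    compiled_query(const compiled_query&) = delete;
    compiled_query& operator=(const compiled_query&) = delete;

    // Moving steals the pointer and leaves the source empty.
    compiled_query(compiled_query&& other) noexcept : impl_(other.impl_)
    {
        other.impl_ = nullptr;
    }

    compiled_query& operator=(compiled_query&& other) noexcept
    {
        if (this != &other)
        {
            delete impl_;
            impl_ = other.impl_;
            other.impl_ = nullptr;
        }
        return *this;
    }

    bool valid() const { return impl_ != nullptr; }

    const std::string& expression() const
    {
        assert(impl_);
        return *impl_;
    }

private:
    std::string* impl_;
};

// Builds a vector of queries by value; each temporary is moved into the vector.
inline std::vector<compiled_query> compile_all(const std::vector<std::string>& expressions)
{
    std::vector<compiled_query> result;
    result.reserve(expressions.size());

    for (std::size_t i = 0; i < expressions.size(); ++i)
        result.push_back(compiled_query(expressions[i]));

    return result;
}
```

Without the move constructor, `push_back` would not compile for this type, and callers would have to store pointers to heap-allocated objects instead.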
2015-04-21 19:42:31 -07:00
Arseny Kapoulkine
a414c5c52d Fix compilation warning in some configurations 2015-04-21 10:02:26 -07:00
Arseny Kapoulkine
cbf3807ad4 Implement copy ctor/assignment for xpath_variable_set
xpath_variable_set is essentially an associative container; it's about time it
became copyable.

Implementation is slightly tricky due to out of memory handling. Both copy ctor
and assignment operator have strong exception guarantee (even if exceptions are
disabled! which translates to "roll back on allocation errors").
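
A minimal sketch of "roll back on allocation errors" without exceptions, under stated assumptions: `int_array`, `alloc_fn`, `default_alloc` and `failing_alloc` are hypothetical names, and the container is far simpler than a variable set. The point is the ordering: everything that can fail happens before the object is modified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical allocation hook; returns nullptr on failure instead of throwing.
typedef void* (*alloc_fn)(std::size_t);

inline void* default_alloc(std::size_t size) { return std::malloc(size); }
inline void* failing_alloc(std::size_t) { return nullptr; } // simulates out of memory

class int_array
{
public:
    int_array() : alloc_(default_alloc), data_(nullptr), size_(0) {}
    ~int_array() { std::free(data_); }

    int_array(const int_array&) = delete;
    int_array& operator=(const int_array&) = delete;

    void set_allocator(alloc_fn alloc) { alloc_ = alloc; }

    // Replaces the contents with a copy of values[0..count).
    // On allocation failure returns false and leaves the object unchanged.
    bool assign(const int* values, std::size_t count)
    {
        assert(values || count == 0);

        // Phase 1: build the new state on the side; this is the only step that can fail.
        int* fresh = nullptr;
        if (count)
        {
            fresh = static_cast<int*>(alloc_(count * sizeof(int)));
            if (!fresh) return false; // nothing was modified, so there is nothing to undo
            std::memcpy(fresh, values, count * sizeof(int));
        }

        // Phase 2: commit using operations that cannot fail.
        std::free(data_);
        data_ = fresh;
        size_ = count;
        return true;
    }

    std::size_t size() const { return size_; }

    int operator[](std::size_t index) const
    {
        assert(index < size_);
        return data_[index];
    }

private:
    alloc_fn alloc_;
    int* data_;
    std::size_t size_;
};
```

If `assign` fails, the previous contents are still fully readable, which is the observable meaning of the strong guarantee in a build without exceptions.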
2015-04-15 23:22:31 -07:00
Arseny Kapoulkine
70a78b2fa5 tests: Fix Linux build 2015-04-15 22:11:13 -07:00
Arseny Kapoulkine
bb3aee447b tests: Use malloc for OSX/Linux page heap
Switch to malloc and manually aligning the pointer to the page boundary.

mmap is much slower than malloc; this change makes tests ~4x faster.
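
One way to get page-aligned memory out of malloc is to over-allocate and round the pointer up. The sketch below is an illustration under assumptions, not the test suite's code: `page_allocate`, `page_deallocate` and the 4096-byte `page_size` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

const std::size_t page_size = 4096; // assumed page size, must be a power of two

// Allocates `size` bytes starting at a page boundary.
// The pointer returned by malloc is stashed just before the aligned block
// so that page_deallocate can hand it back to free.
inline void* page_allocate(std::size_t size)
{
    // Over-allocate: alignment slack plus room for the stashed pointer.
    void* raw = std::malloc(size + page_size + sizeof(void*));
    if (!raw) return nullptr;

    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + page_size - 1) & ~static_cast<std::uintptr_t>(page_size - 1);
    assert(aligned % page_size == 0);

    // Remember the original pointer in the slot right before the aligned address.
    void** header = reinterpret_cast<void**>(aligned) - 1;
    *header = raw;

    return reinterpret_cast<void*>(aligned);
}

inline void page_deallocate(void* ptr)
{
    if (!ptr) return;

    // Recover the original malloc pointer from the slot before the block.
    std::free(static_cast<void**>(ptr)[-1]);
}
```

The cost is up to one page of wasted space per allocation, in exchange for avoiding a system call on every allocation.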
2015-04-15 21:44:52 -07:00
Arseny Kapoulkine
8c8940430a Minor xpath_variable refactoring
The type of the variable is now initialized correctly in the ctor, so that there
is no interim invalid state.
2015-04-15 08:34:14 -07:00
Arseny Kapoulkine
5158ee903b Fix xpath_node_set assignment to provide strong exception guarantee
Since the type of the set was updated before assignment, assigning in
out-of-memory condition could change the type to not match the content.
2015-04-14 19:23:36 -07:00
Arseny Kapoulkine
2badcbb674 Explicitly call xml_buffered_writer::flush()
If xml_writer::write throws an exception while being called from flush(), the
exception is thrown from destructor. Clang in C++11 mode calls std::terminate
in this case.
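
The pattern can be sketched as follows, with hypothetical names (`buffered_writer`, `failing_sink`, `flush_error_is_catchable`) that do not mirror pugixml's internals. The destructor never writes, and the caller flushes explicitly, so a failing write surfaces as an ordinary exception.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sink whose write always fails.
struct failing_sink
{
    void write(const std::string&) { throw std::runtime_error("write failed"); }
};

template <typename Sink>
class buffered_writer
{
public:
    explicit buffered_writer(Sink& sink) : sink_(sink) {}

    // Deliberately does not flush: destructors are noexcept by default in C++11,
    // so an exception escaping from here would call std::terminate.
    ~buffered_writer() {}

    void append(const std::string& text) { buffer_ += text; }

    // May throw; meant to be called explicitly before the writer goes out of scope.
    void flush()
    {
        if (buffer_.empty()) return;

        // Take the pending data first so the buffer is empty even if write throws.
        std::string pending;
        pending.swap(buffer_);
        assert(buffer_.empty());

        sink_.write(pending);
    }

private:
    Sink& sink_;
    std::string buffer_;
};

// Returns true if the write error reached the caller as a catchable exception.
inline bool flush_error_is_catchable()
{
    failing_sink sink;

    try
    {
        buffered_writer<failing_sink> writer(sink);
        writer.append("<node/>");
        writer.flush(); // throws here, outside of any destructor
    }
    catch (const std::runtime_error&)
    {
        return true;
    }

    return false;
}
```

The trade-off is that forgetting the explicit `flush()` silently drops buffered data, so the call has to be part of every code path that finishes writing.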
2015-04-14 19:11:26 -07:00
Arseny Kapoulkine
e977f04fe2 docs: Add format_indent_attributes documentation
Slightly reword format_indent description.
2015-04-13 21:50:24 -07:00
Arseny Kapoulkine
2a3435274f Refactor format_indent_attributes implementation
Fix code style and revert redundant parameters/whitespace changes.

Also remove format_each_attribute_on_new_line - we're only introducing one
extra formatting flag. The flag implies format_indent but does not include its
bitmask.

Also add a few more tests.

Fixes #14.
2015-04-13 21:49:08 -07:00
Arseny Kapoulkine
950693be7f Merge branch 'AlignAttributesEachOnSeparateLine' of git://github.com/halex2005/pugixml into indent_attributes 2015-04-13 20:56:18 -07:00
Arseny Kapoulkine
f241318f9c Add branch name to AppVeyor version 2015-04-13 20:38:52 -07:00
Arseny Kapoulkine
cb786665d4 tests: Add PUGIXML_COMPACT to AppVeyor 2015-04-13 20:36:04 -07:00
Arseny Kapoulkine
ed2c822643 Merge branch 'master' into compact 2015-04-13 20:35:26 -07:00
Arseny Kapoulkine
1c4098a7d9 Remove all files for the Jamplus-based build system
End of an era.

Make can be used for regular development (Linux/OSX), documentation building
and release packaging.
CMake can be used for regular development (Windows); it's also used by some
Linux distributions.

Continuous integration is now performed by Travis CI and AppVeyor.
2015-04-13 20:30:14 -07:00
Arseny Kapoulkine
baacd81907 Fix AppVeyor script path 2015-04-13 20:10:45 -07:00
Arseny Kapoulkine
218ddd0376 Add AppVeyor build scripts 2015-04-13 20:03:49 -07:00
Arseny Kapoulkine
05032b4c06 scripts: Add an option for building tests with CMake 2015-04-13 20:02:09 -07:00
halex2005
5d66ae9fb9 add tests for aligning each attribute on next line 2015-04-14 00:56:42 +05:00
halex2005
6766f35338 add align each attribute on new line support with format_indent_attribute 2015-04-14 00:56:23 +05:00
Arseny Kapoulkine
054b0b447e Merge branch 'master' into compact 2015-04-12 22:09:45 -07:00
Arseny Kapoulkine
9539c488c2 Fix unused variable warning
Also fix test in wchar_t mode.
2015-04-12 22:06:17 -07:00
Arseny Kapoulkine
f04b56e178 Permit custom allocation function to throw
Ensure that all the necessary cleanup is performed in case the allocation fails
with an exception - files are closed, buffers are reclaimed, etc.

Any test that triggers a simulated out-of-memory condition is run once again
with a throwing allocation function. Unobserved std::bad_alloc exceptions count
as test failures and require the CHECK_ALLOC_FAIL macro.

Fixes #17.
2015-04-12 21:46:48 -07:00
Arseny Kapoulkine
5edeaf6765 tests: Add more out of memory tests
Also add tests that verify save_file for absence of FILE leaks.
2015-04-12 21:27:12 -07:00
Arseny Kapoulkine
6c11a0c693 Fix compilation and tests after merge. 2015-04-12 03:14:08 -07:00
Arseny Kapoulkine
a19da1c246 Merge branch 'master' into compact 2015-04-12 03:05:58 -07:00
Arseny Kapoulkine
a0d065cd22 Implement copyless copy for attributes
Previously attributes that were copied with their node used string sharing,
but standalone attributes that were copied using xml_node::*_copy(xml_attribute)
were not.
2015-04-12 03:03:56 -07:00
Arseny Kapoulkine
c5d07e2c28 tests: Add a test that verifies absence of file leaks
If an out of memory error happens in load_file there's a danger of leaking
the FILE object. Since there is a limited supply of the objects we can easily
test that the leak does not happen.
2015-04-12 02:34:48 -07:00
Arseny Kapoulkine
2537cccad3 tests: Fix some Coverity issues 2015-04-12 02:17:20 -07:00
Arseny Kapoulkine
d6f7766172 Optimize xml_node::path() to use 1 allocation
Instead of reallocating the string for every tree level just do two passes
over the ancestor chain.
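
A sketch of the two-pass approach on a simplified tree, assuming a hypothetical `tree_node` with a name and a parent pointer and a hypothetical `build_path` helper. The first pass measures, the second fills the string from the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal tree node; the root has a null parent.
struct tree_node
{
    std::string name;
    const tree_node* parent;
};

inline std::string build_path(const tree_node* node, char delimiter = '/')
{
    if (!node) return std::string();

    // Pass 1: walk up to the root and add up the final length.
    // Every ancestor contributes its name plus one delimiter after it.
    std::size_t length = 0;
    for (const tree_node* n = node; n; n = n->parent)
        length += n->name.size() + (n != node ? 1 : 0);

    // The only allocation: a string of exactly the right size.
    std::string result(length, '\0');

    // Pass 2: walk up again, writing names back to front.
    std::size_t offset = length;
    for (const tree_node* n = node; n; n = n->parent)
    {
        if (n != node) result[--offset] = delimiter;

        offset -= n->name.size();
        result.replace(offset, n->name.size(), n->name); // same length, no reallocation
    }

    assert(offset == 0);
    return result;
}
```

For a root with an empty name, a child `a` and a grandchild `bc`, `build_path` on the grandchild yields `/a/bc`.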
2015-04-12 02:12:15 -07:00
Arseny Kapoulkine
99afee1832 Move zero-termination out of as_utf8_end
as_utf8_end was used with std::string, where writing an extra zero-terminating
character should *probably* always work (at least if size is positive) but is
not ideal.

The only place that needed to zero-terminate was convert_path_heap.
2015-04-12 01:32:25 -07:00
Arseny Kapoulkine
3da7d68617 Fix Travis CI build. 2015-04-11 22:52:41 -07:00
Arseny Kapoulkine
4e004176ba tests: Improve out-of-memory tests
Previously there was no guarantee that the tests that check for out of memory
handling behavior are actually correct - e.g. that they correctly simulate out
of memory conditions.

Now every simulated out of memory condition has to be "guarded" using
CHECK_ALLOC_FAIL. It makes sure that every piece of code that is supposed to
cause out-of-memory does so, and that no other code runs out of memory
unnoticed.
2015-04-11 22:46:08 -07:00
Arseny Kapoulkine
37467c13bf tests: Add a test for throwing from xml_writer::write
We currently don't allocate/modify any state so there are no issues with this.
2015-04-11 22:44:42 -07:00
Arseny Kapoulkine
e2e5bc906a Use -fno-exceptions flag for PUGIXML_NO_EXCEPTIONS build
This makes sure that no exception handling mechanisms are used if
PUGIXML_NO_EXCEPTIONS is defined.
2015-04-11 22:42:27 -07:00
Arseny Kapoulkine
814443b147 Fix exception type for out-of-memory for XPath variables
When parsing XPath variables, we need to perform a heap allocation; if it
fails, an xpath_exception instead of bad_alloc used to be thrown.

Now we throw the exception of a correct type so that xpath_exception means
'parsing error'.
2015-04-11 22:40:30 -07:00
Arseny Kapoulkine
03ea04c32a tests: Use char_t instead of wchar_t 2015-04-11 00:33:35 -07:00
Arseny Kapoulkine
29fef9aca2 tests: Add more out of memory tests
This provides more coverage for #17.
2015-04-11 00:16:39 -07:00
Arseny Kapoulkine
e90d2ac8ba Merge branch 'master' into compact 2015-04-10 22:26:57 -07:00
Arseny Kapoulkine
405fefc877 Update README.md 2015-04-10 20:59:07 -07:00
Arseny Kapoulkine
9b8553bf4b docs: Update release date v1.6 2015-04-10 20:49:47 -07:00
Arseny Kapoulkine
f1d1534210 Fix archive packaging
Base directory is now using target basename.
2015-04-10 20:45:07 -07:00
Arseny Kapoulkine
10ff488eb9 docs: Use automatically retrieved version for docs
This eliminates one more hardcoded version from the repo, yay!
2015-03-24 20:59:04 -07:00
Arseny Kapoulkine
fc20b0afbb Update Makefile to exclude docs/manual folder from release 2015-03-24 20:08:06 -07:00
Arseny Kapoulkine
e35058cfda docs: Add generated documentation 2015-03-24 20:07:19 -07:00
Arseny Kapoulkine
80a8a77af4 docs: Finishing touches
It's almost done; the only remaining issue is that some section titles are too long.
2015-03-24 10:03:08 -07:00
Arseny Kapoulkine
704d27622b Add include dependencies to HTML targets 2015-03-22 11:34:06 -07:00
Arseny Kapoulkine
9a55571725 docs: Reword documentation note 2015-03-22 10:44:46 -07:00
Arseny Kapoulkine
c0374b8a48 docs: Minor API reference improvements 2015-03-22 10:40:18 -07:00
Arseny Kapoulkine
3f3e4525e1 docs: Fix several internal links 2015-03-22 10:08:35 -07:00
Arseny Kapoulkine
5644027990 docs: HTML validity fixes
Also minor wording fixes.
2015-03-22 09:50:55 -07:00
Arseny Kapoulkine
40fa405751 docs: Converted some samples to Unix newline 2015-03-22 01:01:46 -07:00
Arseny Kapoulkine
56bdc6c5ea docs: Extract configuration to config.adoc 2015-03-22 00:35:06 -07:00
Arseny Kapoulkine
c94e8a7c0e docs: Remove old Quickbook sources 2015-03-22 00:16:14 -07:00
Arseny Kapoulkine
d4f9047b2f docs: Fix PUGIXML_HEADER_ONLY description
Users no longer need to #include "pugixml.cpp"
2015-03-22 00:14:48 -07:00
Arseny Kapoulkine
11054219de docs: A lot of small fixes
Mostly added correct quotation to changelog.
2015-03-22 00:11:19 -07:00
Arseny Kapoulkine
55081aca8b docs: Set up cross-referencing and anchors
This is mostly done using regex replaces of original Quickbook markup, plus a
bit of manual fixup for multiple references to the single point from different
lines that AsciiDoc does not seem to handle.
2015-03-21 23:37:33 -07:00
Arseny Kapoulkine
054bffb195 docs: API reference is closer to being done
Still need to replace [link ] with actual links.
Also a bunch of small fixes here and there.
2015-03-21 23:09:29 -07:00
Arseny Kapoulkine
363b7a3b22 docs: Fix nested lists and changelog 2015-03-21 22:23:03 -07:00
Arseny Kapoulkine
5f8cd17ff6 docs: Fix tables and images in the manual
Also remove redundant [lbr]
2015-03-21 21:56:54 -07:00
Arseny Kapoulkine
b9177ab7b5 docs: Remove image thumbnails 2015-03-21 21:53:50 -07:00
Arseny Kapoulkine
d8f900f148 Add docs target to Makefile 2015-03-21 21:06:48 -07:00
Arseny Kapoulkine
eed184a175 docs: Remove auxiliary files for old documentation 2015-03-21 21:05:52 -07:00
Arseny Kapoulkine
2843f91d00 docs: Remove old HTML documentation 2015-03-21 21:04:28 -07:00
Arseny Kapoulkine
1a450b302a docs: Use AsciiDoc-compatible comments in samples 2015-03-21 21:03:01 -07:00
Arseny Kapoulkine
23e9beb003 docs: Add AsciiDoc versions of quickstart and manual
Quickstart should be reasonably complete; manual is still in progress
2015-03-21 21:02:27 -07:00
Arseny Kapoulkine
5959a17967 tests: Final test fix for CW 2015-03-21 17:09:42 -07:00
Arseny Kapoulkine
250b690a54 tests: Work around fp issues in various runtime libraries
Disable/change some tests for some compilers; use binary float comparison
for early MSVC versions.
2015-03-21 01:05:31 -07:00
Arseny Kapoulkine
ce974094ac tests: Fix test compilation
Rename PAGE_SIZE to page_size to avoid define conflict with Android SDK.
Minor fixes in several tests.
2015-03-21 00:14:53 -07:00
Arseny Kapoulkine
28e63f66e1 Update year to 2015 2015-03-20 20:47:14 -07:00