pugixml features an extensive interface for getting various types of data from
the document and for traversing the document. This section provides documentation
for all such functions that do not modify the tree, except for XPath-related
functions; see <a class="xref" href="xpath.html" title="XPath"> XPath</a> for the XPath reference. As discussed in <a class="xref" href="dom.html#manual.dom.cpp" title="C++ interface"> C++ interface</a>,
there are two types of handles to tree data: <a class="link" href="dom.html#xml_node">xml_node</a>
and <a class="link" href="dom.html#xml_attribute">xml_attribute</a>. The handles have special
null (empty) values which propagate through various functions and thus are
useful for writing more concise code; see <a class="link" href="dom.html#node_null">this description</a>
for details. The documentation in this section will explicitly state the results
node's parent; all non-null nodes except the document have a non-null parent.
<code class="computeroutput"><span class="identifier">first_child</span></code> and <code class="computeroutput"><span class="identifier">last_child</span></code> return the first and last child
of the node, respectively; note that only document nodes and element nodes
can have a non-empty child node list. If the node has no children, both functions
and <code class="computeroutput"><span class="identifier">previous_sibling</span></code> return
the node that's immediately to the right/left of this node in the children
list, respectively - for example, in <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">a</span><span class="special">/&gt;&lt;</span><span class="identifier">b</span><span class="special">/&gt;&lt;</span><span class="identifier">c</span><span class="special">/&gt;</span></code>,
calling <code class="computeroutput"><span class="identifier">next_sibling</span></code> for
a handle that points to <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">b</span><span class="special">/&gt;</span></code>
results in a handle pointing to <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">c</span><span class="special">/&gt;</span></code>,
and calling <code class="computeroutput"><span class="identifier">previous_sibling</span></code>
results in a handle pointing to <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">a</span><span class="special">/&gt;</span></code>.
If the node does not have a next/previous sibling (this happens if it is the last/first
node in the list, respectively), the functions return null nodes. <code class="computeroutput"><span class="identifier">first_attribute</span></code>, <code class="computeroutput"><span class="identifier">last_attribute</span></code>,
To reduce memory consumption, attributes do not have a link to
their parent nodes. Thus there is no <code class="computeroutput"><span class="identifier">xml_attribute</span><span class="special">::</span><span class="identifier">parent</span><span class="special">()</span></code> function.
</p></td></tr>
</table></div>
<p>
Calling any of the functions above on a null handle results in a null handle
- i.e. <code class="computeroutput"><span class="identifier">node</span><span class="special">.</span><span class="identifier">first_child</span><span class="special">().</span><span class="identifier">next_sibling</span><span class="special">()</span></code>
returns the second child of <code class="computeroutput"><span class="identifier">node</span></code>,
If the node does not have a name or value, or if the node handle is null,
both functions return empty strings - they never return null pointers.
</p>
<a name="xml_node::child_value"></a><p>
It is common to store data as text contents of some node, for example <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">node</span><span class="special">&gt;&lt;</span><span class="identifier">description</span><span class="special">&gt;</span><span class="identifier">This</span> <span class="identifier">is</span> <span class="identifier">a</span> <span class="identifier">node</span><span class="special">&lt;/</span><span class="identifier">description</span><span class="special">&gt;&lt;/</span><span class="identifier">node</span><span class="special">&gt;</span></code>.
In this case, the <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">description</span><span class="special">&gt;</span></code> node does not have a value, but instead
returns the value of the first child with type <a class="link" href="dom.html#node_pcdata">node_pcdata</a>
or <a class="link" href="dom.html#node_cdata">node_cdata</a>; <code class="computeroutput"><span class="identifier">child_value</span><span class="special">(</span><span class="identifier">name</span><span class="special">)</span></code>
is a simple wrapper for <code class="computeroutput"><span class="identifier">child</span><span class="special">(</span><span class="identifier">name</span><span class="special">).</span><span class="identifier">child_value</span><span class="special">()</span></code>.
For the above example, calling <code class="computeroutput"><span class="identifier">node</span><span class="special">.</span><span class="identifier">child_value</span><span class="special">(</span><span class="string">"description"</span><span class="special">)</span></code> and <code class="computeroutput"><span class="identifier">description</span><span class="special">.</span><span class="identifier">child_value</span><span class="special">()</span></code> both produce the string <code class="computeroutput"><span class="string">"This is a node"</span></code>. If there is no
child with a relevant type, or if the handle is null, <code class="computeroutput"><span class="identifier">child_value</span></code>
<code class="computeroutput"><span class="identifier">as_double</span></code> and <code class="computeroutput"><span class="identifier">as_float</span></code> convert attribute values to numbers.
If the attribute handle is null or the attribute value is empty, the <code class="computeroutput"><span class="identifier">def</span></code>
argument is returned (which is 0 by default). Otherwise, all leading whitespace
characters are skipped, and the remaining string is parsed as a decimal
number (<code class="computeroutput"><span class="identifier">as_int</span></code> or <code class="computeroutput"><span class="identifier">as_uint</span></code>) or as a floating point number
in either decimal or scientific form (<code class="computeroutput"><span class="identifier">as_double</span></code>
or <code class="computeroutput"><span class="identifier">as_float</span></code>). Any extra characters
are silently discarded; for example, <code class="computeroutput"><span class="identifier">as_int</span></code>
returns <code class="computeroutput"><span class="number">1</span></code> for the string <code class="computeroutput"><span class="string">"1abc"</span></code>.
</p>
<p>
In case the input string contains a number that is out of the target numeric
Number conversion functions depend on the current C locale as set with <code class="computeroutput"><span class="identifier">setlocale</span></code>, so they may return unexpected results
if the locale is different from <code class="computeroutput"><span class="string">"C"</span></code>.
value to boolean as follows: if the attribute handle is null, the <code class="computeroutput"><span class="identifier">def</span></code>
argument is returned (which is <code class="computeroutput"><span class="keyword">false</span></code>
by default). If the attribute value is empty, <code class="computeroutput"><span class="keyword">false</span></code>
is returned. Otherwise, <code class="computeroutput"><span class="keyword">true</span></code>
is returned if the first character is one of <codeclass="computeroutput"><spanclass="char">'1'</span><spanclass="special">,</span><spanclass="char">'t'</span><spanclass="special">,</span>
This means that strings like <code class="computeroutput"><span class="string">"true"</span></code>
and <code class="computeroutput"><span class="string">"yes"</span></code> are recognized
as <code class="computeroutput"><span class="keyword">true</span></code>, while strings like
<code class="computeroutput"><span class="string">"false"</span></code> and <code class="computeroutput"><span class="string">"no"</span></code> are recognized as <code class="computeroutput"><span class="keyword">false</span></code>. For more complex matching you'll have
<code class="computeroutput"><span class="identifier">child</span></code> and <code class="computeroutput"><span class="identifier">attribute</span></code>
return the first child/attribute with the specified name; <code class="computeroutput"><span class="identifier">next_sibling</span></code>
and <code class="computeroutput"><span class="identifier">previous_sibling</span></code> return
the first sibling in the corresponding direction with the specified name.
All string comparisons are case-sensitive. If the node handle is null
or there is no node/attribute with the specified name, a null handle is returned.
</p>
<p>
<code class="computeroutput"><span class="identifier">child</span></code> and <code class="computeroutput"><span class="identifier">next_sibling</span></code>
functions can be used together to loop through all child nodes with the desired
Occasionally the needed node is specified not by a unique name but instead
by the value of some attribute; for example, it is common to have node collections
with each node having a unique id: <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">group</span><span class="special">&gt;&lt;</span><span class="identifier">item</span> <span class="identifier">id</span><span class="special">=</span><span class="string">"1"</span><span class="special">/&gt;</span> <span class="special">&lt;</span><span class="identifier">item</span> <span class="identifier">id</span><span class="special">=</span><span class="string">"2"</span><span class="special">/&gt;&lt;/</span><span class="identifier">group</span><span class="special">&gt;</span></code>. There are two functions for finding
The three-argument function returns the first child node with the specified
name which has an attribute with the specified name/value; the two-argument
function skips the name test for the node, which can be useful for searching
in heterogeneous collections. If the node handle is null or if no node is
found, a null handle is returned. All string comparisons are case-sensitive.
</p>
<p>
In all of the above functions, all arguments have to be valid strings; passing
null pointers results in undefined behavior.
</p>
<p>
This is an example of using these functions (<a href="../samples/traverse_base.cpp" target="_top">samples/traverse_base.cpp</a>):
</p>
<p>
</p>
<preclass="programlisting"><spanclass="identifier">std</span><spanclass="special">::</span><spanclass="identifier">cout</span><spanclass="special"><<</span><spanclass="string">"Tool for *.dae generation: "</span><spanclass="special"><<</span><spanclass="identifier">tools</span><spanclass="special">.</span><spanclass="identifier">find_child_by_attribute</span><spanclass="special">(</span><spanclass="string">"Tool"</span><spanclass="special">,</span><spanclass="string">"OutputFileMasks"</span><spanclass="special">,</span><spanclass="string">"*.dae"</span><spanclass="special">).</span><spanclass="identifier">attribute</span><spanclass="special">(</span><spanclass="string">"Filename"</span><spanclass="special">).</span><spanclass="identifier">value</span><spanclass="special">()</span><spanclass="special"><<</span><spanclass="string">"\n"</span><spanclass="special">;</span>
The <code class="computeroutput"><span class="identifier">children</span></code> function allows
you to enumerate all child nodes; the <code class="computeroutput"><span class="identifier">children</span></code>
function with a <code class="computeroutput"><span class="identifier">name</span></code> argument
allows you to enumerate all child nodes with a specific name; the <code class="computeroutput"><span class="identifier">attributes</span></code> function allows you to enumerate
all attributes of the node. Note that you can also use the node object itself
in a range-based for construct, which is equivalent to using <code class="computeroutput"><span class="identifier">children</span><span class="special">()</span></code>.
</p>
<p>
This is an example of using these functions (<a href="../samples/traverse_rangefor.cpp" target="_top">samples/traverse_rangefor.cpp</a>):
Child node lists and attribute lists are simply doubly linked lists; while
you can use <code class="computeroutput"><span class="identifier">previous_sibling</span></code>/<code class="computeroutput"><span class="identifier">next_sibling</span></code> and other such functions for
iteration, pugixml additionally provides node and attribute iterators, so
that you can treat nodes as containers of other nodes or attributes:
<code class="computeroutput"><span class="identifier">begin</span></code> and <code class="computeroutput"><span class="identifier">attributes_begin</span></code>
return iterators that point to the first node/attribute, respectively; <code class="computeroutput"><span class="identifier">end</span></code> and <code class="computeroutput"><span class="identifier">attributes_end</span></code>
return the past-the-end iterator for the node/attribute list, respectively - this
iterator can't be dereferenced, but decrementing it results in an iterator
pointing to the last element in the list (except for empty lists, where decrementing
the past-the-end iterator results in undefined behavior). The past-the-end iterator
is commonly used as a termination value for iteration loops (see sample below).
If you want to get an iterator that points to an existing handle, you can
construct the iterator with the handle as a single constructor argument,
like so: <code class="computeroutput"><span class="identifier">xml_node_iterator</span><span class="special">(</span><span class="identifier">node</span><span class="special">)</span></code>.
For <code class="computeroutput"><span class="identifier">xml_attribute_iterator</span></code>,
you'll have to provide both an attribute and its parent node.
</p>
<p>
<code class="computeroutput"><span class="identifier">begin</span></code> and <code class="computeroutput"><span class="identifier">end</span></code>
return equal iterators if called on a null node; such iterators can't be dereferenced.
<code class="computeroutput"><span class="identifier">attributes_begin</span></code> and <code class="computeroutput"><span class="identifier">attributes_end</span></code> behave the same way. For
correct iterator usage this means that child node/attribute collections of
null nodes appear to be empty.
</p>
<p>
Both types of iterators have bidirectional iterator semantics (i.e. they
can be incremented and decremented, but efficient random access is not supported)
and support all usual iterator operations - comparison, dereference, etc.
The iterators are invalidated if the node/attribute objects they're pointing
to are removed from the tree; adding nodes/attributes does not invalidate
any iterators.
</p>
<p>
Here is an example of using iterators for document traversal (<a href="../samples/traverse_iter.cpp" target="_top">samples/traverse_iter.cpp</a>):
Node and attribute iterators are somewhere in the middle between const
and non-const iterators. While the dereference operation yields a non-constant
reference to the object, so that you can use it for tree modification operations,
modifying this reference by assignment - for example, by passing iterators to a function
like <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">sort</span></code> - will not give the expected results,
because the assignment modifies the local handle stored in the iterator rather than the tree.
</p></td></tr>
</table></div>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="manual.access.walker"></a><a class="link" href="access.html#manual.access.walker" title="Recursive traversal with xml_tree_walker"> Recursive traversal with xml_tree_walker</a>
</h3></div></div></div>
<a name="xml_tree_walker"></a><p>
The methods described above allow traversal of the immediate children of some
node; if you want to do a deep tree traversal, you'll have to do it via a
recursive function or some equivalent method. However, pugixml provides a
helper for depth-first traversal of a subtree. In order to use it, you have
to implement the <code class="computeroutput"><span class="identifier">xml_tree_walker</span></code>
interface and to call <code class="computeroutput"><span class="identifier">traverse</span></code>
First, the <code class="computeroutput"><span class="identifier">begin</span></code> function
is called with the traversal root as its argument.
</li>
<li class="listitem">
Then, the <code class="computeroutput"><span class="identifier">for_each</span></code> function
is called for all nodes in the traversal subtree in depth-first order,
excluding the traversal root. The node is passed as an argument.
</li>
<li class="listitem">
Finally, the <code class="computeroutput"><span class="identifier">end</span></code> function
is called with the traversal root as its argument.
</li>
</ul></div>
<p>
If <code class="computeroutput"><span class="identifier">begin</span></code>, <code class="computeroutput"><span class="identifier">end</span></code>
or any of the <code class="computeroutput"><span class="identifier">for_each</span></code> calls
return <code class="computeroutput"><span class="keyword">false</span></code>, the traversal
is terminated and <code class="computeroutput"><span class="keyword">false</span></code> is returned
as the traversal result; otherwise, the traversal results in <code class="computeroutput"><span class="keyword">true</span></code>. Note that you don't have to override
<code class="computeroutput"><span class="identifier">begin</span></code> or <code class="computeroutput"><span class="identifier">end</span></code>
functions; their default implementations return <code class="computeroutput"><span class="keyword">true</span></code>.
</p>
<a name="xml_tree_walker::depth"></a><p>
You can get the node's depth relative to the traversal root at any point
by calling the <code class="computeroutput"><span class="identifier">depth</span></code> function.
It returns <code class="computeroutput"><span class="special">-</span><span class="number">1</span></code>
if called from <code class="computeroutput"><span class="identifier">begin</span></code>/<code class="computeroutput"><span class="identifier">end</span></code>, and returns the 0-based depth if called
from <code class="computeroutput"><span class="identifier">for_each</span></code> - depth is
0 for all children of the traversal root, 1 for all grandchildren and so
on.
</p>
<p>
This is an example of traversing tree hierarchy with xml_tree_walker (<a href="../samples/traverse_walker.cpp" target="_top">samples/traverse_walker.cpp</a>):
<aname="manual.access.predicate"></a><aclass="link"href="access.html#manual.access.predicate"title="Searching for nodes/attributes with predicates"> Searching for nodes/attributes
The predicate should be either a plain function or a function object which
accepts one argument of type <code class="computeroutput"><span class="identifier">xml_attribute</span></code>
(for <code class="computeroutput"><span class="identifier">find_attribute</span></code>) or
<code class="computeroutput"><span class="identifier">xml_node</span></code> (for <code class="computeroutput"><span class="identifier">find_child</span></code> and <code class="computeroutput"><span class="identifier">find_node</span></code>),
and returns <code class="computeroutput"><span class="keyword">bool</span></code>. The predicate
is never called with a null handle as an argument.
</p>
<p>
The <code class="computeroutput"><span class="identifier">find_attribute</span></code> function iterates
through all attributes of the specified node, and returns the first attribute
<a name="manual.access.text"></a><a class="link" href="access.html#manual.access.text" title="Working with text contents"> Working with text contents</a>
</h3></div></div></div>
<a name="xml_text"></a><p>
It is common to store data as text contents of some node, for example <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">node</span><span class="special">&gt;&lt;</span><span class="identifier">description</span><span class="special">&gt;</span><span class="identifier">This</span> <span class="identifier">is</span> <span class="identifier">a</span> <span class="identifier">node</span><span class="special">&lt;/</span><span class="identifier">description</span><span class="special">&gt;&lt;/</span><span class="identifier">node</span><span class="special">&gt;</span></code>.
In this case, the <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">description</span><span class="special">&gt;</span></code> node does not have a value, but instead
has a child of type <a class="link" href="dom.html#node_pcdata">node_pcdata</a> with value
<code class="computeroutput"><span class="string">"This is a node"</span></code>. pugixml
provides a special class, <code class="computeroutput"><span class="identifier">xml_text</span></code>,
to work with such data. Working with text objects to modify data is described
in <a class="link" href="modify.html#manual.modify.text" title="Working with text contents">the documentation for modifying document
data</a>; this section describes the access interface of <code class="computeroutput"><span class="identifier">xml_text</span></code>.
</p>
<a name="xml_node::text"></a><p>
You can get the text object from a node by using the <code class="computeroutput"><span class="identifier">text</span><span class="special">()</span></code> method:
If the node has type <code class="computeroutput"><span class="identifier">node_pcdata</span></code>
or <code class="computeroutput"><span class="identifier">node_cdata</span></code>, then the node
itself is used to return data; otherwise, the first child node of type <code class="computeroutput"><span class="identifier">node_pcdata</span></code> or <code class="computeroutput"><span class="identifier">node_cdata</span></code>
All of the above functions have the same semantics as similar <code class="computeroutput"><span class="identifier">xml_attribute</span></code> members: they return the
default argument if the text object is empty, and they convert the text contents
to a target type using the same rules and restrictions. You can <a class="link" href="access.html#xml_attribute::as_int">refer
to documentation for the attribute functions</a> for details.
</p>
<a name="xml_text::data"></a><p>
<code class="computeroutput"><span class="identifier">xml_text</span></code> is essentially a
helper class that operates on <code class="computeroutput"><span class="identifier">xml_node</span></code>
values. It is bound to a node of type <a class="link" href="dom.html#node_pcdata">node_pcdata</a>
<code class="computeroutput"><span class="identifier">text</span><span class="special">.</span><span class="identifier">get</span><span class="special">()</span></code> is
equivalent to calling <code class="computeroutput"><span class="identifier">text</span><span class="special">.</span><span class="identifier">data</span><span class="special">().</span><span class="identifier">value</span><span class="special">()</span></code>.
</p>
<p>
This is an example of using <code class="computeroutput"><span class="identifier">xml_text</span></code>
Node paths consist of node names, separated with a delimiter (which is <code class="computeroutput"><span class="special">/</span></code> by default); also paths can contain self
(<code class="computeroutput"><span class="special">.</span></code>) and parent (<code class="computeroutput"><span class="special">..</span></code>) pseudo-names, so that this is a valid
(absolute paths start with the delimiter), in which case the rest of the
path is treated as relative to the document root, or relative to the given node.
For example, in the following document: <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">a</span><span class="special">&gt;&lt;</span><span class="identifier">b</span><span class="special">&gt;&lt;</span><span class="identifier">c</span><span class="special">/&gt;&lt;/</span><span class="identifier">b</span><span class="special">&gt;&lt;/</span><span class="identifier">a</span><span class="special">&gt;</span></code>,
node <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">c</span><span class="special">/&gt;</span></code> has path <code class="computeroutput"><span class="string">"a/b/c"</span></code>; calling <code class="computeroutput"><span class="identifier">first_element_by_path</span></code>
for the document with path <code class="computeroutput"><span class="string">"a/b"</span></code>
results in node <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">b</span><span class="special">/&gt;</span></code>; calling <code class="computeroutput"><span class="identifier">first_element_by_path</span></code>
for node <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">a</span><span class="special">/&gt;</span></code> with path <code class="computeroutput"><span class="string">"../a/./b/../."</span></code>
results in node <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">a</span><span class="special">/&gt;</span></code>; calling <code class="computeroutput"><span class="identifier">first_element_by_path</span></code>
with path <code class="computeroutput"><span class="string">"/a"</span></code> results
in node <code class="computeroutput"><span class="special">&lt;</span><span class="identifier">a</span><span class="special">/&gt;</span></code> for any node.
</p>
<p>
If a path component is ambiguous (there are two or more nodes with the given name),
the first one is selected; paths are not guaranteed to uniquely identify
nodes in a document. If any component of a path is not found, the result
of <code class="computeroutput"><span class="identifier">first_element_by_path</span></code>
is a null node; also <code class="computeroutput"><span class="identifier">first_element_by_path</span></code>
returns a null node for null nodes, in which case the path does not matter.
<code class="computeroutput"><span class="identifier">path</span></code> returns an empty string