Discussion:
Newbie has trouble with parsing
malhenry
2006-04-27 15:01:03 UTC
I am using MSXML 4.0 on WinXP Pro.

I am having two problems.
1. How can I extract the value (of a SINGLE node) such that I do NOT get the
values of all child nodes concatenated together (e.g. all children of the
incident node)?
2. How can I successfully navigate the document? The problem is that when
there are multiple nodes with the same name, I always get the first instance,
not the instance that I want.

Here is a (short) XML file:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<Traffic>
  <date>06-01-25</date>
  <time>14:23:24</time>
  <city>TO</city>
  <sector />
  <incident>
    <active>0</active>
    <route>403</route>
    <direction>WB</direction>
    <location>at</location>
    <route>6 South</route>
    <incident>Collision</incident>
    <lane>Right lane</lane>
  </incident>
  <city>TO</city>
  <sector />
  <incident>
    <active>1</active>
    <route>409</route>
    <direction>WB</direction>
    <location>at</location>
    <route>427</route>
    <incident>Car guardrail</incident>
    <lane />
  </incident>
</Traffic>


Here are some code snippets:
This method always goes to the first city node:
bool APCCheckXML::ProcNodeCity(MSXML::IXMLDOMDocument *m_pDoc1,
                               MSXML::IXMLDOMNode* pCityNode, const char *pCity,
                               int iCity)
{
    HRESULT hr = S_OK; // HRESULT from MSXML calls
    BSTR bstrItemText = NULL;

    // "//city" matches every city element, but selectSingleNode returns
    // only the first match.
    BSTR bstrNodeName = ::SysAllocString( L"//city" );
    hr = m_pDoc1->selectSingleNode(bstrNodeName, &pCityNode);
    ::SysFreeString(bstrNodeName); // the query string is no longer needed
    if (hr != S_OK || pCityNode == NULL)
        return false; // no match (S_FALSE) or error

    hr = pCityNode->get_text(&bstrItemText);
    if (hr == S_OK && bstrItemText)
    {
        TRACE(TEXT("city =%S\n"), bstrItemText);

        // Convert the BSTR (node text) into an ANSI string in order to make
        // the comparison. When using an ATL string conversion macro, specify
        // the USES_CONVERSION macro first in order to avoid compiler errors.
        USES_CONVERSION;

        // Note: pCity is passed by value and the converted buffer lives on
        // this function's stack, so the caller never sees this result.
        pCity = OLE2CT( bstrItemText );

        ::SysFreeString(bstrItemText);
        bstrItemText = NULL;
    }
    pCityNode->Release();
    pCityNode = NULL;

    return hr == S_OK;
}

The next method prints all child node values of the OUTER incident node
(i.e. incident =0 403 WB at 6 South Collision Right lane) instead of just
the value of the inner incident value (Collision).
bool APCCheckXML::ProcNodeIncident(MSXML::IXMLDOMDocument *m_pDoc1,
                                   MSXML::IXMLDOMNode* pIncidentNode, const char *pIncident)
{
    HRESULT hr = S_OK; // HRESULT from MSXML calls
    BSTR bstrItemText = NULL;

    // "//incident" matches the outer incident element first, and get_text
    // on it returns the text of all its descendants.
    BSTR bstrNodeName = ::SysAllocString( L"//incident" );
    hr = m_pDoc1->selectSingleNode(bstrNodeName, &pIncidentNode);
    ::SysFreeString(bstrNodeName); // the query string is no longer needed
    if (hr != S_OK || pIncidentNode == NULL)
        return false; // no match (S_FALSE) or error

    hr = pIncidentNode->get_text(&bstrItemText);
    if (hr == S_OK && bstrItemText)
    {
        TRACE(TEXT("incident =%S\n"), bstrItemText);

        // Convert the BSTR (node text) into an ANSI string in order to make
        // the comparison. When using an ATL string conversion macro, specify
        // the USES_CONVERSION macro first in order to avoid compiler errors.
        USES_CONVERSION;

        // Note: pIncident is passed by value and the converted buffer lives
        // on this function's stack, so the caller never sees this result.
        pIncident = OLE2CT( bstrItemText );

        ::SysFreeString(bstrItemText);
        bstrItemText = NULL;
    }
    pIncidentNode->Release();
    pIncidentNode = NULL;

    return hr == S_OK;
}
Thanks!!
Martin Honnen
2006-04-27 18:14:17 UTC
Post by malhenry
I am using MSXML 4.0 on WinXP Pro.
I am having two problems.
1. How can I extract the value (of a SINGLE node) such that I do NOT get the
values of all child nodes concatenated together (e.g. all children of the
incident node)?
It depends on what you understand the value of a node to be. In terms of
the DOM, the nodeValue of an element node is defined to be null, for
instance. What exactly is your definition of the value? If you are
looking for the first child text node in an element node, then you could
do e.g.
selectSingleNode('/root/element/child/text()')
and read out the nodeValue property of that text node.
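The difference between an element's concatenated text and its first child text node can be sketched without MSXML. The `Node` struct and both function names below are hypothetical stand-ins, not MSXML interfaces, and the sketch ignores the whitespace handling that MSXML applies when it builds the `text` property.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal stand-in for a DOM node (not the MSXML interface).
struct Node {
    std::string name;            // element name; empty for a text node
    std::string value;           // character data; used only by text nodes
    std::vector<Node> children;  // child nodes in document order
};

// Concatenates the text of all descendant text nodes. This mirrors why
// reading the text of the outer incident element yields every child's value.
std::string allText(const Node& n) {
    if (n.name.empty()) return n.value;
    std::string out;
    for (const Node& c : n.children) out += allText(c);
    return out;
}

// Returns the value of the first direct child text node only, the analogue
// of selecting element/text() and reading nodeValue. Returns "" if none.
std::string firstChildText(const Node& n) {
    for (const Node& c : n.children)
        if (c.name.empty()) return c.value;
    return "";
}
```

Applied to the posted file, the outer incident element has no direct text child, so only the inner incident element would yield "Collision" this way.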
Post by malhenry
2. How can I successfully navigate the document? The problem is that when
there are multiple nodes with the same name, I always get the first instance,
not the instance that I want.
Use selectNodes, not selectSingleNode. selectNodes gives you a node list
through which you can iterate.
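As a rough illustration of iterating over all matches instead of taking the first, here is a sketch on a hypothetical in-memory `Node` struct; `collectByName` and `nthByName` are made-up names. With MSXML itself, the equivalent would presumably be `selectNodes` followed by `get_length` and `get_item` on the returned node list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for a DOM node (not the MSXML interface).
struct Node {
    std::string name;            // element name; empty for a text node
    std::string value;           // character data; used only by text nodes
    std::vector<Node> children;  // child nodes in document order
};

// Collects every descendant element called `name`, in document order,
// roughly what an XPath such as //city selects.
void collectByName(const Node& parent, const std::string& name,
                   std::vector<const Node*>& out) {
    for (const Node& child : parent.children) {
        if (child.name == name) out.push_back(&child);
        collectByName(child, name, out);
    }
}

// Returns the index-th match (0-based), or nullptr if there are fewer matches.
// The pointer stays valid only while `root` is alive and unmodified.
const Node* nthByName(const Node& root, const std::string& name,
                      std::size_t index) {
    std::vector<const Node*> matches;
    collectByName(root, name, matches);
    return index < matches.size() ? matches[index] : nullptr;
}
```

Calling `nthByName(root, "city", 1)` would return the second city element rather than the first.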
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
malhenry
2006-04-27 19:09:02 UTC
What I mean by value of a node is the NODE_TEXT or the string between the
tag delimiters, for example:
<MyTag>MyValue</MyTag>
I consider the value of the MyTag element to be "MyValue".

My ProcNodeCity method that I posted will obtain TO from <city>TO</city>,
which is what I want; the problem is how to get to the next city.

I have also successfully used selectNodes to process other XML files that
stored all their data as attributes. I do not think I can use selectNodes to
process the XML file I posted because the city node is outside the incident
nodes. Essentially, I want to get all the incident nodes for a particular city
and then process all the children of the first N incident nodes for that city.
I am not sure how to do this (especially since the city node is not a child
of the incident node).
Any ideas would be welcome.

Thanks.
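One way to approach this, since each city element precedes its incidents as a sibling, is to walk the children of Traffic in document order and remember the most recent city. The sketch below does that on a hypothetical in-memory `Node` struct (not MSXML); `incidentsByCity` and `firstChildText` are made-up names. In XPath 1.0 a similar selection can probably be written as `/Traffic/incident[preceding-sibling::city[1] = 'TO']`, though that has not been tried against this file.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for a DOM node (not the MSXML interface).
struct Node {
    std::string name;            // element name; empty for a text node
    std::string value;           // character data; used only by text nodes
    std::vector<Node> children;  // child nodes in document order
};

// Value of the first direct child text node, or "" if there is none.
std::string firstChildText(const Node& n) {
    for (const Node& c : n.children)
        if (c.name.empty()) return c.value;
    return "";
}

// Groups the top-level incident elements under the nearest preceding city
// sibling. Repeated city names are merged into a single entry. The stored
// pointers refer into `traffic` and are valid only while it is alive.
std::map<std::string, std::vector<const Node*>>
incidentsByCity(const Node& traffic) {
    std::map<std::string, std::vector<const Node*>> byCity;
    std::string currentCity;
    bool haveCity = false;
    for (const Node& child : traffic.children) {
        if (child.name == "city") {
            currentCity = firstChildText(child);
            haveCity = true;
        } else if (child.name == "incident" && haveCity) {
            byCity[currentCity].push_back(&child);
        }
    }
    return byCity;
}
```

The first N incidents for a city are then simply the first N entries of that city's vector.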
Martin Honnen
2006-04-28 11:42:21 UTC
Post by malhenry
What I mean by value of a node is the NODE_TEXT or the string between the
tag delimiters, for example:
<MyTag>MyValue</MyTag>
I consider the value of the MyTag element to be "MyValue".
That is one example; with MSXML you can simply use
elementNode.text
to get that kind of value.
But XML allows much more complex structures e.g.
<element>Some text
<child>some child text</child>
more text
<child>more child text</child>
</element>
and you have not explained what the value of the element named 'element'
is in this case.
--
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
malhenry
2006-04-28 14:03:02 UTC
If you look at my first post, you will see the xml data format that I need to
handle. The case in your latest post is not something I have to worry about.
I want to treat each element individually and get the associated text. The
only exception to this is the (Outer) incident nodes which contain other
nodes. In that case, I can build a set of the child nodes and then I want to
treat those nodes individually so that I can grab the text from each node and
manipulate it for output to a text file.

Thanks.