Extract values from child XML tag that has the same parent tag using python

I'm trying to extract roll numbers from an XML file using python. I used to be able to retrieve the appropriate element using getElementsByTagName('RollNumber').

The XML generation has now changed so that each element is wrapped in a parent tag with the very same name as the child tag. When I run the script I get an error stating Element instance has no attribute 'data'.

<RollNumbers>
    <RollNumber>
        <RollNumber>1234567891011120000</RollNumber>
    </RollNumber>
</RollNumbers>

I've attached my script below:

import arcpy, sys, os, xml.dom.minidom

arcpy.env.overwriteOutput = True

fname = arcpy.GetParameterAsText(0)
fxml = open(fname, 'r')

if fxml != None:
    XMLData = fxml.read()
    fxml.close()

dom = xml.dom.minidom.parseString(XMLData)
node = dom.documentElement

rollTag = dom.getElementsByTagName('RollNumber')

RollNums = []
for RollNumber in rollTag:
    nodes = RollNumber.childNodes
    for node in nodes:
        arn = node.data[:15]
        arcpy.AddMessage(arn)
        RollNums.append(arn)

rolllen = len(RollNums)
arcpy.AddMessage(rolllen)

asked Nov 21 '18 at 20:16

DJB

python xml

1 Answer

The problem here is that you are assuming all child nodes of a RollNumber element are Text nodes. However, the parent RollNumber element in your XML document has another element as one of its children, and an Element node has no data attribute, which is what the error message is telling you.
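To see this for yourself, here is a minimal sketch that lists the node types under each RollNumber element. It uses the sample XML from the question as a string instead of reading a file, and describe_children is a hypothetical helper name, not part of minidom.

```python
import xml.dom.minidom

# Sample document copied from the question.
SAMPLE = """<RollNumbers>
    <RollNumber>
        <RollNumber>1234567891011120000</RollNumber>
    </RollNumber>
</RollNumbers>"""


def describe_children(element):
    """Return the class name of each direct child node of element."""
    return [type(child).__name__ for child in element.childNodes]


dom = xml.dom.minidom.parseString(SAMPLE)

# getElementsByTagName returns matches in document order: parent first, then child.
outer, inner = dom.getElementsByTagName('RollNumber')

print(describe_children(outer))  # ['Text', 'Element', 'Text']
print(describe_children(inner))  # ['Text']
```

The Element in the middle of the first list is the node on which node.data fails.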

One way to handle the problem is to replace the line

rollTag = dom.getElementsByTagName('RollNumber')

with

rollTag = [ element for element in dom.getElementsByTagName('RollNumber')
                     if not element.getElementsByTagName('RollNumber') ]

dom.getElementsByTagName('RollNumber') returns all elements with the tag name RollNumber. For each such element we look for descendant elements that also have the name RollNumber. If any are found, the element is a parent node and is excluded from the list assigned to rollTag. rollTag thus ends up containing only the innermost RollNumber elements.
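As a rough standalone sketch of this first approach, without arcpy so it can be run anywhere: the sample XML is inlined as a string, print stands in for arcpy.AddMessage, and innermost_elements is a hypothetical helper name.

```python
import xml.dom.minidom

# Sample document copied from the question.
SAMPLE = """<RollNumbers>
    <RollNumber>
        <RollNumber>1234567891011120000</RollNumber>
    </RollNumber>
</RollNumbers>"""


def innermost_elements(dom, tag_name):
    """Return elements named tag_name that have no descendant with the same name."""
    return [element for element in dom.getElementsByTagName(tag_name)
            if not element.getElementsByTagName(tag_name)]


dom = xml.dom.minidom.parseString(SAMPLE)

roll_nums = []
for element in innermost_elements(dom, 'RollNumber'):
    for node in element.childNodes:
        # Safe here only because the innermost element holds a single Text node.
        roll_nums.append(node.data[:15])

print(roll_nums)  # ['123456789101112']
```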

Alternatively, you could replace the lines

        arn = node.data[:15]
        arcpy.AddMessage(arn)
        RollNums.append(arn)

with

        if isinstance(node, xml.dom.minidom.Text) and node.data.strip():
            arn = node.data[:15]
            arcpy.AddMessage(arn)
            RollNums.append(arn)

This checks that the child node of the RollNumber element is a Text node and also that it contains something other than whitespace. In your sample XML document, the parent RollNumber element has three child nodes: the inner RollNumber element and, on either side of it, a Text node containing only whitespace. The check skips all three.
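Here is a small sketch of this second approach wrapped in a function so it can be tried without arcpy. The function name text_values and its length parameter are made up for illustration; the sample XML is the one from the question.

```python
import xml.dom.minidom

# Sample document copied from the question.
SAMPLE = """<RollNumbers>
    <RollNumber>
        <RollNumber>1234567891011120000</RollNumber>
    </RollNumber>
</RollNumbers>"""


def text_values(dom, tag_name, length=15):
    """Collect the first `length` characters of every non-blank Text child."""
    values = []
    for element in dom.getElementsByTagName(tag_name):
        for node in element.childNodes:
            # Skip Element children and whitespace-only Text nodes.
            if isinstance(node, xml.dom.minidom.Text) and node.data.strip():
                values.append(node.data[:length])
    return values


dom = xml.dom.minidom.parseString(SAMPLE)
print(text_values(dom, 'RollNumber'))  # ['123456789101112']
```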

Both approaches should handle any number of nested RollNumber elements, provided the data you want to read is only ever in the innermost RollNumber element. They will behave differently if parent nodes also contain text, for example:

<RollNumbers>
    <RollNumber>
        <RollNumber>1234567891011120000</RollNumber>
        ABCDEFG
    </RollNumber>
</RollNumbers>

The first approach would return only 123456789101112 but the second approach would also pick up the text ABCDEFG.
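The difference can be checked with a short side-by-side sketch. The names leaf_only and text_check are hypothetical, and, unlike the snippets above, this version strips the text before slicing so the result is easier to read; slicing the raw data would keep the leading whitespace of the ABCDEFG text node.

```python
import xml.dom.minidom

# Mixed-content example: the parent element also contains text.
MIXED = """<RollNumbers>
    <RollNumber>
        <RollNumber>1234567891011120000</RollNumber>
        ABCDEFG
    </RollNumber>
</RollNumbers>"""


def leaf_only(dom, tag_name):
    """First approach: read text only from elements with no same-named descendant."""
    leaves = [e for e in dom.getElementsByTagName(tag_name)
              if not e.getElementsByTagName(tag_name)]
    return [node.data.strip()[:15] for e in leaves for node in e.childNodes]


def text_check(dom, tag_name):
    """Second approach: read every non-blank Text child of every matching element."""
    return [node.data.strip()[:15]
            for e in dom.getElementsByTagName(tag_name)
            for node in e.childNodes
            if isinstance(node, xml.dom.minidom.Text) and node.data.strip()]


dom = xml.dom.minidom.parseString(MIXED)
print(leaf_only(dom, 'RollNumber'))   # ['123456789101112']
print(text_check(dom, 'RollNumber'))  # ['ABCDEFG', '123456789101112']
```

The parent element comes first in document order, which is why ABCDEFG appears before the roll number in the second result.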

answered Nov 21 '18 at 21:32

Luke Woodward

Hey Luke, I used your first approach and it worked perfectly! Thank you very much for your help. Cheers!

– DJB
Nov 22 '18 at 16:41
