Extract values from child XML tag that has the same parent tag using python
I'm trying to extract roll numbers from an XML file using Python. I used to be able to retrieve the appropriate element using getElementsByTagName('RollNumber').
The XML generation has since changed, and a parent tag with the very same name as the child tag now wraps it. When I run the script I get an error stating Element instance has no attribute 'data'.
<RollNumbers>
  <RollNumber>
    <RollNumber>1234567891011120000</RollNumber>
  </RollNumber>
</RollNumbers>
I've attached my script below:
import arcpy,sys,os,xml.dom.minidom
arcpy.env.overwriteOutput = True
fname = arcpy.GetParameterAsText(0)
fxml = open(fname, 'r')
if fxml != None:
    XMLData = fxml.read()
    fxml.close()
    dom = xml.dom.minidom.parseString(XMLData)
    node = dom.documentElement
    rollTag = dom.getElementsByTagName('RollNumber')
    RollNums = []
    for RollNumber in rollTag:
        nodes = RollNumber.childNodes
        for node in nodes:
            # This is the line that raises the error when node is an Element
            arn = node.data[:15]
            arcpy.AddMessage(arn)
            RollNums.append(arn)
    rolllen = len(RollNums)
    arcpy.AddMessage(rolllen)
python xml
asked Nov 21 '18 at 20:16
DJB
1 Answer
The problem here is that you are assuming all child nodes of a RollNumber element are Text nodes. However, the parent RollNumber element in your XML document has another element as one of its children, and an Element node has no data attribute, which is why the script raises the error.
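To see this for yourself, here is a minimal sketch that lists the children of the outer RollNumber element. It uses only the standard library (no arcpy), and the SAMPLE string and the child_summary name are just illustrative; the indentation in SAMPLE is an assumption about how your file is laid out.

```python
import xml.dom.minidom

# Sample document from the question, embedded as a string for illustration.
SAMPLE = """<RollNumbers>
  <RollNumber>
    <RollNumber>1234567891011120000</RollNumber>
  </RollNumber>
</RollNumbers>"""

dom = xml.dom.minidom.parseString(SAMPLE)

# getElementsByTagName returns matches in document order,
# so the first match is the outer (parent) RollNumber.
outer = dom.getElementsByTagName('RollNumber')[0]

# For each child, record its node class and whether it has a .data attribute.
child_summary = [(type(child).__name__, hasattr(child, 'data'))
                 for child in outer.childNodes]
print(child_summary)
```

The Element child in the middle is the one that has no data attribute; the Text nodes on either side of it hold only the whitespace between the tags.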
One way to handle the problem is to replace the line
rollTag = dom.getElementsByTagName('RollNumber')
with
rollTag = [element for element in dom.getElementsByTagName('RollNumber')
           if not element.getElementsByTagName('RollNumber')]
dom.getElementsByTagName('RollNumber') returns all elements with the tag name RollNumber. For each such element, element.getElementsByTagName('RollNumber') looks for descendant elements that also have the name RollNumber. If any are found, then element is a parent node and is excluded from the list assigned to rollTag. rollTag thus ends up containing only the innermost RollNumber elements.
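As a rough, self-contained illustration of this first approach (standard library only, with print standing in for arcpy.AddMessage; SAMPLE, all_elements, innermost and roll_nums are names made up for this sketch):

```python
import xml.dom.minidom

# Sample document from the question, embedded as a string for illustration.
SAMPLE = """<RollNumbers>
  <RollNumber>
    <RollNumber>1234567891011120000</RollNumber>
  </RollNumber>
</RollNumbers>"""

dom = xml.dom.minidom.parseString(SAMPLE)

# Every RollNumber element, parent and child alike.
all_elements = dom.getElementsByTagName('RollNumber')

# Keep only elements with no RollNumber descendants, i.e. the innermost ones.
# An empty NodeList is falsy, so "not ..." is True for those elements.
innermost = [element for element in all_elements
             if not element.getElementsByTagName('RollNumber')]

roll_nums = []
for element in innermost:
    for node in element.childNodes:
        roll_nums.append(node.data[:15])  # first 15 characters, as in the question

print(len(all_elements), len(innermost), roll_nums)
```

Note that this still assumes every child of an innermost RollNumber is a Text node, which holds for the sample document.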
Alternatively, you could replace the lines
arn = node.data[:15]
arcpy.AddMessage(arn)
RollNums.append(arn)
with
if isinstance(node, xml.dom.minidom.Text) and node.data.strip():
    arn = node.data[:15]
    arcpy.AddMessage(arn)
    RollNums.append(arn)
This checks that the child node of the RollNumber element is a Text node and also that it contains something other than whitespace. In your sample XML document, the parent RollNumber element has two child nodes that are Text nodes containing only whitespace, and you want to ignore those.
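Here is a small sketch of this second approach wrapped in a function, again using only the standard library. The function name extract_roll_numbers and the SAMPLE string are hypothetical and exist only for this example.

```python
import xml.dom.minidom

# Sample document from the question, embedded as a string for illustration.
SAMPLE = """<RollNumbers>
  <RollNumber>
    <RollNumber>1234567891011120000</RollNumber>
  </RollNumber>
</RollNumbers>"""


def extract_roll_numbers(xml_text):
    """Collect the first 15 characters of every non-blank Text child
    of every RollNumber element (hypothetical helper for this sketch)."""
    dom = xml.dom.minidom.parseString(xml_text)
    roll_nums = []
    for element in dom.getElementsByTagName('RollNumber'):
        for node in element.childNodes:
            # Skip Element children and whitespace-only Text nodes.
            if isinstance(node, xml.dom.minidom.Text) and node.data.strip():
                roll_nums.append(node.data[:15])
    return roll_nums


result = extract_roll_numbers(SAMPLE)
print(result)
```

The outer RollNumber contributes nothing here, because its children are either whitespace-only Text nodes or the nested Element.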
Both approaches should handle any number of nested RollNumber elements, provided the data you want to read is only ever in the innermost RollNumber element. They will behave differently if parent nodes also contain text, for example:
<RollNumbers>
  <RollNumber>
    <RollNumber>1234567891011120000</RollNumber>
    ABCDEFG
  </RollNumber>
</RollNumbers>
The first approach would return only 123456789101112, but the second approach would also pick up the text ABCDEFG.
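The difference can be checked with a short side-by-side sketch. The names MIXED, first and second are illustrative, and strip() is applied in the second list only so the printed values are easy to read; the snippet in the answer slices node.data directly, so its ABCDEFG entry would include the surrounding whitespace.

```python
import xml.dom.minidom

# Document where the parent RollNumber also carries text of its own.
MIXED = """<RollNumbers>
  <RollNumber>
    <RollNumber>1234567891011120000</RollNumber>
    ABCDEFG
  </RollNumber>
</RollNumbers>"""

dom = xml.dom.minidom.parseString(MIXED)
elements = dom.getElementsByTagName('RollNumber')

# First approach: only elements without nested RollNumber elements.
first = [node.data[:15]
         for element in elements
         if not element.getElementsByTagName('RollNumber')
         for node in element.childNodes]

# Second approach: every non-blank Text child of every RollNumber element.
# strip() is added here for readability of the output only.
second = [node.data.strip()[:15]
          for element in elements
          for node in element.childNodes
          if isinstance(node, xml.dom.minidom.Text) and node.data.strip()]

print(first)
print(second)
```

The parent's text shows up first in the second list because the parent element comes first in document order.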
answered Nov 21 '18 at 21:32
Luke Woodward
Hey Luke, I used your first approach and it worked perfectly! Thank you very much for your help. Cheers!
– DJB
Nov 22 '18 at 16:41