Extract values from a child XML tag that has the same name as its parent tag using Python



























I'm trying to extract roll numbers from an XML file using Python. I used to be able to retrieve the appropriate element using getElementsByTagName('RollNumber').

A parent tag with the very same name as the child tag has now been added to the XML generation. When I run the script, I get an error stating: Element instance has no attribute 'data'.



<RollNumbers>
<RollNumber>
<RollNumber>1234567891011120000</RollNumber>
</RollNumber>
</RollNumbers>


I've attached my script below:



import arcpy, sys, os, xml.dom.minidom

arcpy.env.overwriteOutput = True

fname = arcpy.GetParameterAsText(0)
fxml = open(fname, 'r')

if fxml != None:
    XMLData = fxml.read()
    fxml.close()

dom = xml.dom.minidom.parseString(XMLData)
node = dom.documentElement

rollTag = dom.getElementsByTagName('RollNumber')

RollNums = []
for RollNumber in rollTag:
    nodes = RollNumber.childNodes
    for node in nodes:
        # This is the line that fails when node is an Element rather than a Text node
        arn = node.data[:15]
        arcpy.AddMessage(arn)
        RollNums.append(arn)

rolllen = len(RollNums)
arcpy.AddMessage(rolllen)















      python xml
















      asked Nov 21 '18 at 20:16









DJB

























          1 Answer














The problem here is that you are assuming all child nodes of a RollNumber element are Text nodes. However, the parent RollNumber element in your XML document has another element as one of its children, and an Element node has no data attribute, which is why the script raises the error.
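To see the cause directly, the following minimal sketch parses the sample XML from the question and lists the node types under the outer RollNumber element. It leaves out arcpy so it can run anywhere; the names SAMPLE, outer, child_kinds, and failed are illustrative and not part of the original script.

```python
import xml.dom.minidom

# Sample document from the question, with a RollNumber nested in a RollNumber.
SAMPLE = """<RollNumbers>
<RollNumber>
<RollNumber>1234567891011120000</RollNumber>
</RollNumber>
</RollNumbers>"""

dom = xml.dom.minidom.parseString(SAMPLE)
outer = dom.getElementsByTagName("RollNumber")[0]  # the parent RollNumber

# The parent holds whitespace Text nodes around one Element child.
child_kinds = [type(child).__name__ for child in outer.childNodes]
print(child_kinds)

failed = False
try:
    for child in outer.childNodes:
        child.data[:15]  # works for Text nodes, fails for the Element child
except AttributeError as exc:
    failed = True
    print("AttributeError:", exc)
```

The Element child in the middle is what triggers the error once the loop reaches it.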



          One way to handle the problem is to replace the line



          rollTag = dom.getElementsByTagName('RollNumber')


          with



          rollTag = [ element for element in dom.getElementsByTagName('RollNumber')
          if not element.getElementsByTagName('RollNumber') ]


dom.getElementsByTagName('RollNumber') returns all elements with the tag name RollNumber. For each such element, we look for descendant elements that also have the name RollNumber. If any are found, the element is a parent node and is excluded from the list assigned to rollTag. rollTag thus ends up containing only the innermost RollNumber elements.
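As a rough, self-contained sketch of this first approach (without arcpy), the filter can be wrapped in a small helper. The function name innermost_roll_numbers and its tag and width parameters are assumptions made for illustration; the 15-character slice mirrors the original script, and the strip() call is an addition.

```python
import xml.dom.minidom

SAMPLE = """<RollNumbers>
<RollNumber>
<RollNumber>1234567891011120000</RollNumber>
</RollNumber>
</RollNumbers>"""


def innermost_roll_numbers(xml_text, tag="RollNumber", width=15):
    """Return the truncated text of every `tag` element that has no nested `tag`."""
    dom = xml.dom.minidom.parseString(xml_text)
    # Keep only elements with no descendant of the same name (an empty NodeList is falsy).
    leaves = [element for element in dom.getElementsByTagName(tag)
              if not element.getElementsByTagName(tag)]
    results = []
    for element in leaves:
        # Join the Text children of the leaf, then trim and truncate.
        text = "".join(node.data for node in element.childNodes
                       if node.nodeType == node.TEXT_NODE)
        results.append(text.strip()[:width])
    return results


print(innermost_roll_numbers(SAMPLE))  # ['123456789101112']
```

Because the parent elements are filtered out before the loop, the inner loop never touches an Element child.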



          Alternatively, you could replace the lines



arn = node.data[:15]
arcpy.AddMessage(arn)
RollNums.append(arn)


          with



if isinstance(node, xml.dom.minidom.Text) and node.data.strip():
    arn = node.data[:15]
    arcpy.AddMessage(arn)
    RollNums.append(arn)


This checks that the child node of the RollNumber element is a Text node and also that it contains something other than whitespace. In your sample XML document, the parent RollNumber element has two child Text nodes that contain only whitespace (the line breaks before and after the inner element), and you want to ignore them.
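A self-contained sketch of this second approach, again without arcpy, might look like the following. The helper name text_roll_numbers and its parameters are hypothetical; the isinstance test and the 15-character slice come from the answer.

```python
import xml.dom.minidom

SAMPLE = """<RollNumbers>
<RollNumber>
<RollNumber>1234567891011120000</RollNumber>
</RollNumber>
</RollNumbers>"""


def text_roll_numbers(xml_text, tag="RollNumber", width=15):
    """Collect truncated text from every non-blank Text child of each `tag` element."""
    dom = xml.dom.minidom.parseString(xml_text)
    found = []
    for element in dom.getElementsByTagName(tag):
        for node in element.childNodes:
            # Skip Element children and whitespace-only Text nodes.
            if isinstance(node, xml.dom.minidom.Text) and node.data.strip():
                found.append(node.data[:width])
    return found


print(text_roll_numbers(SAMPLE))  # ['123456789101112']
```

Here every RollNumber element is still visited, parents included, but only Text children with real content contribute to the result.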



          Both approaches should handle any number of nested RollNumber elements, provided the data you want to read is only ever in the innermost RollNumber element. They will behave differently if parent nodes also contain text, for example:



          <RollNumbers>
          <RollNumber>
          <RollNumber>1234567891011120000</RollNumber>
          ABCDEFG
          </RollNumber>
          </RollNumbers>


The first approach would return only 123456789101112, but the second approach would also pick up the text ABCDEFG (together with the line breaks around it, since node.data is sliced without being stripped).
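The difference can be checked with a short comparison on the mixed-content document. This is only a sketch: the function names by_leaf_filter and by_text_check are made up for the example, and strip() is applied before slicing (an addition to the answer's code) so that the results are easier to read.

```python
import xml.dom.minidom

# Mixed content: the parent RollNumber holds both an element and its own text.
MIXED = """<RollNumbers>
<RollNumber>
<RollNumber>1234567891011120000</RollNumber>
ABCDEFG
</RollNumber>
</RollNumbers>"""


def by_leaf_filter(xml_text, tag="RollNumber"):
    """First approach: only elements with no nested element of the same name."""
    dom = xml.dom.minidom.parseString(xml_text)
    leaves = [el for el in dom.getElementsByTagName(tag)
              if not el.getElementsByTagName(tag)]
    return [node.data.strip()[:15]
            for el in leaves
            for node in el.childNodes
            if isinstance(node, xml.dom.minidom.Text) and node.data.strip()]


def by_text_check(xml_text, tag="RollNumber"):
    """Second approach: every non-blank Text child of every matching element."""
    dom = xml.dom.minidom.parseString(xml_text)
    return [node.data.strip()[:15]
            for el in dom.getElementsByTagName(tag)
            for node in el.childNodes
            if isinstance(node, xml.dom.minidom.Text) and node.data.strip()]


first = by_leaf_filter(MIXED)
second = by_text_check(MIXED)
print(first)   # ['123456789101112']
print(second)  # ['ABCDEFG', '123456789101112']
```

The parent's text comes first in the second result because getElementsByTagName returns the outer element before the inner one.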
















          answered Nov 21 '18 at 21:32









Luke Woodward














          • Hey Luke, I used your first approach and it worked perfectly! Thank you very much for your help. Cheers!

            – DJB
            Nov 22 '18 at 16:41


















