PHP: read and parse an XML-like file

We are indexing our journals with PHP. We have journal metadata files. I am trying to parse them with PHP's SimpleXML, but I am getting lots of errors.




Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62



Warning: simplexml_load_string(): s;S PERSPECTIVE


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62



Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62



Warning: simplexml_load_string(): R TO THE EDITOR


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62




When I looked at the file, it seemed to be an XML file. How can I parse it with PHP?



The code I am using is:



$file = file_get_contents('xyz.0');

$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character

//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);


Sample XML from the file:



<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements&mdash;Where Art Thou&quest;</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris&commat;wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79&ndash;80</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT&apos;S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy&quest;</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3&ndash;12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3&ndash;12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https:&sol;&sol;www.uspreventiveservicestaskforce.org&sol;Page&sol;Document&sol;RecommendationStatementFinal&sol;high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. 
https:&sol;&sol;www.cdc.gov&sol;bloodpressure&sol;facts.htm. Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169&ndash;1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169&ndash;1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM&apos;s Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697&ndash;716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697&ndash;716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77&ndash;80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77&ndash;80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192&ndash;1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192&ndash;1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263&ndash;265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263&ndash;265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 &ldquo;Spring Is the Time of Plans and Projects&rdquo;</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer&commat;umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81&ndash;82</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>


You can download the xml file from here.



Thank you



EDIT: This is different from the question "XML parser error: entity not defined". These files were generated years ago (2000s etc.). I am not generating these files; I am only trying to parse them and extract the metadata.



EDIT 2: Sorry, I was also trying to parse the file with the DOM parser and pasted its errors when I created the post. I have now replaced them with the SimpleXML errors.










  • That's NOT an XML file. I think that's a SAP-specific tag

    – RiggsFolly
    Nov 16 '18 at 13:29













  • I'm not sure how you're getting errors about DOMDocument::loadXML when you say you aren't calling that method

    – iainn
    Nov 16 '18 at 13:32






  • Possible duplicate of "XML parser error: entity not defined"

    – Mohammad
    Nov 16 '18 at 13:32











  • @RiggsFolly do you have any idea how to parse this file?

    – Ben Perry
    Nov 16 '18 at 13:52











  • Short of looking for a library to help, no

    – RiggsFolly
    Nov 16 '18 at 13:55
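
One possible way forward, offered as a sketch rather than a definitive fix: the sample looks like SGML rather than XML. The XUI and COVER tags are never closed, which would explain the "Opening and ending tag mismatch" warnings, and entities such as &mdash; or &NA; are presumably declared in ovidbase.dtd, which is not available here. The helper sgmlToXml() below is hypothetical (it is not part of any library). It drops the DOCTYPE, self-closes the tags listed in $emptyTags, and rewrites named entities instead of deleting every "&". The list of empty tags is an assumption based only on the sample above; the real DTD may define more.

```php
<?php

/**
 * Hypothetical helper: turn the SGML-style markup into well-formed XML.
 * $emptyTags is an assumption taken from the sample (XUI and COVER never close).
 */
function sgmlToXml(string $sgml, array $emptyTags = ['XUI', 'COVER']): string
{
    // Drop the DOCTYPE; it references ovidbase.dtd, which we do not have.
    $xml = preg_replace('/<!DOCTYPE[^>]*>/i', '', $sgml);

    // SGML empty elements have no end tag, so rewrite <XUI ...> as <XUI .../>.
    $names = implode('|', array_map('preg_quote', $emptyTags));
    $xml = preg_replace('/<(' . $names . ')\b([^>]*?)\s*\/?>/', '<$1$2/>', $xml);

    // Decode named entities via PHP's HTML5 table, then re-escape XML specials.
    // Names PHP does not know (such as &NA;) survive as literal text.
    $xml = preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        static function (array $m): string {
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $xml
    );

    return trim($xml);
}

// Shortened stand-in for the real file, built from the sample in the question.
$sample = '<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000">'
    . '<D><BB><TI>Where Art Thou&quest;</TI><DT>PRESIDENT&apos;S PERSPECTIVE</DT>'
    . '<XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB></D></DG>';

libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings
$xml = simplexml_load_string(sgmlToXml($sample));

if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), PHP_EOL;
    }
} else {
    echo $xml->D->BB->XUI['UI'], PHP_EOL; // the DOI attribute of the first article
}
```

For the real file, the $sample string would be replaced by file_get_contents('xyz.0'). The utf8_decode() call is left out here because that function is deprecated as of PHP 8.2; whether the files need any charset conversion at all is not something the sample can confirm. If other tags turn out to be empty elements as well, they can be passed in through the second argument.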
















0















We are indexing our journals with PHP. We have journal meta data files. I am trying to parse it with PHP SimpleXML but i am getting lots of errors.




Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62



Warning: simplexml_load_string(): s;S PERSPECTIVE


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62



Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62



Warning: simplexml_load_string(): R TO THE EDITOR


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62




When i looked at the file it seems like an XML file. How can i parse it with PHP?.



The code i am using is:



$file = file_get_contents('xyz.0');

$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character

//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);


Sample XML Code from file:



<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements&mdash;Where Art Thou&quest;</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris&commat;wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79&ndash;80</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT&apos;S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy&quest;</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3&ndash;12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3&ndash;12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https:&sol;&sol;www.uspreventiveservicestaskforce.org&sol;Page&sol;Document&sol;RecommendationStatementFinal&sol;high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. 
https:&sol;&sol;www.cdc.gov&sol;bloodpressure&sol;facts.htm. Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169&ndash;1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169&ndash;1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM&apos;s Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697&ndash;716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697&ndash;716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77&ndash;80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77&ndash;80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192&ndash;1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192&ndash;1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263&ndash;265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263&ndash;265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 &ldquo;Spring Is the Time of Plans and Projects&rdquo;</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer&commat;umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81&ndash;82</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>


You can download the xml file from here.



Thank you



EDIT: This is different from the question XML parser error: entity not defined This files are generated years ago (2000s etc.). I am not generating this files, i only try to parse them and get the meta data.



EDIT 2: Sorry i am also trying to parse with Dom Parser and added the errors from it when i created the post. Now i added the SimpleXML errors.










share|improve this question




















  • 1





    Thats NOT an XML file. I think thats a SAP specific tag

    – RiggsFolly
    Nov 16 '18 at 13:29













  • I'm not sure how you're getting errors about DOMDocument::loadXML when you say you aren't calling that method

    – iainn
    Nov 16 '18 at 13:32






  • 1





    Possible duplicate of XML parser error: entity not defined

    – Mohammad
    Nov 16 '18 at 13:32











  • @RiggsFolly do you have any idea how to parse this file?

    – Ben Perry
    Nov 16 '18 at 13:52











  • SHort of looking for a library to help, no

    – RiggsFolly
    Nov 16 '18 at 13:55














0












0








0








We are indexing our journals with PHP. We have journal meta data files. I am trying to parse it with PHP SimpleXML but i am getting lots of errors.




Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62



Warning: simplexml_load_string(): s;S PERSPECTIVE


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62



Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62



Warning: simplexml_load_string(): R TO THE EDITOR


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62




When i looked at the file it seems like an XML file. How can i parse it with PHP?.



The code i am using is:



$file = file_get_contents('xyz.0');

$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character

//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);


Sample XML Code from file:



<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements&mdash;Where Art Thou&quest;</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris&commat;wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79&ndash;80</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT&apos;S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy&quest;</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3&ndash;12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3&ndash;12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https:&sol;&sol;www.uspreventiveservicestaskforce.org&sol;Page&sol;Document&sol;RecommendationStatementFinal&sol;high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. 
https:&sol;&sol;www.cdc.gov&sol;bloodpressure&sol;facts.htm. Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169&ndash;1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169&ndash;1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM&apos;s Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697&ndash;716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697&ndash;716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77&ndash;80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77&ndash;80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192&ndash;1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192&ndash;1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263&ndash;265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263&ndash;265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 &ldquo;Spring Is the Time of Plans and Projects&rdquo;</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer&commat;umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81&ndash;82</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>


You can download the xml file from here.



Thank you



EDIT: This is different from the question XML parser error: entity not defined This files are generated years ago (2000s etc.). I am not generating this files, i only try to parse them and get the meta data.



EDIT 2: Sorry i am also trying to parse with Dom Parser and added the errors from it when i created the post. Now i added the SimpleXML errors.










share|improve this question
















We are indexing our journals with PHP. We have journal meta data files. I am trying to parse it with PHP SimpleXML but i am getting lots of errors.




Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62



Warning: simplexml_load_string(): s;S PERSPECTIVE


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62



Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62



Warning: simplexml_load_string(): R TO THE EDITOR


Warning: simplexml_load_string(): ^ in *** on line 62



Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62




When i looked at the file it seems like an XML file. How can i parse it with PHP?.



The code i am using is:



$file = file_get_contents('xyz.0');

$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character

//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);


Sample XML Code from file:



<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements&mdash;Where Art Thou&quest;</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris&commat;wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79&ndash;80</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT&apos;S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy&quest;</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3&ndash;12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3&ndash;12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https:&sol;&sol;www.uspreventiveservicestaskforce.org&sol;Page&sol;Document&sol;RecommendationStatementFinal&sol;high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. 
https:&sol;&sol;www.cdc.gov&sol;bloodpressure&sol;facts.htm. Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169&ndash;1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169&ndash;1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM&apos;s Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697&ndash;716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697&ndash;716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77&ndash;80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77&ndash;80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192&ndash;1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192&ndash;1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263&ndash;265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263&ndash;265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 &ldquo;Spring Is the Time of Plans and Projects&rdquo;</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer&commat;umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81&ndash;82</PG></SO> <CP>&copy; 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>


You can download the XML file from here.



Thank you



EDIT: This is different from the question "XML parser error: entity not defined". These files were generated years ago (in the 2000s). I am not generating them; I am only trying to parse them and extract the metadata.



EDIT 2: Sorry, I was also trying to parse the file with a DOM parser and pasted its errors when I created the post. I have now replaced them with the SimpleXML errors.







php xml xml-parsing simplexml














edited Nov 20 '18 at 13:40







Ben Perry

















asked Nov 16 '18 at 13:25









Ben Perry








  • That's NOT an XML file. I think that's a SAP-specific tag.

    – RiggsFolly
    Nov 16 '18 at 13:29

  • I'm not sure how you're getting errors about DOMDocument::loadXML when you say you aren't calling that method.

    – iainn
    Nov 16 '18 at 13:32

  • Possible duplicate of XML parser error: entity not defined

    – Mohammad
    Nov 16 '18 at 13:32

  • @RiggsFolly, do you have any idea how to parse this file?

    – Ben Perry
    Nov 16 '18 at 13:52

  • Short of looking for a library to help, no.

    – RiggsFolly
    Nov 16 '18 at 13:55


























1 Answer































































The file doesn't stick to the XML spec: it contains undefined entities and elements that are never closed.
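Both kinds of problem can be listed programmatically rather than read off the warning output. The following is a minimal sketch, assuming a made-up fragment ($sample) that imitates the file; it only uses libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors(), and the exact messages depend on the libxml version installed.

```php
<?php
// $sample is a hypothetical fragment, not taken verbatim from the real file.
// It has an undefined entity (&mdash;) and a never-closed element (COVER).
$sample = '<DG><COVER NAME="x"><D><TI>Oh&mdash;Where</TI></D></DG>';

libxml_use_internal_errors(true);        // buffer errors instead of emitting warnings
$xml = simplexml_load_string($sample);   // returns false when parsing fails

$messages = [];
foreach (libxml_get_errors() as $error) {   // each item is a LibXMLError object
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
libxml_clear_errors();                   // empty the buffer for the next parse

print_r($messages);
```

Running this against each journal file gives a quick inventory of which entities and tags need handling before any cleanup is written.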



Removing the & characters sidesteps the entity errors, although the entity names are left behind as plain text. For the unclosed tags, I used regular expressions to make them self-closing (I'm not a regex expert, but each replacement takes a tag such as <COVER ...> and converts it to <COVER ... />):



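$file = file_get_contents('20180400.xml');

// Remove every "&" so undefined entities such as &mdash; no longer break the parser
// (the entity name itself, e.g. "mdash;", is left behind as text).
$file = str_replace("&", "", $file);

// Turn the never-closed elements into self-closing ones, keeping each tag's own name.
$file = preg_replace('/<COVER (.*?)>/', '<COVER $1 />', $file);
$file = preg_replace('/<XUI (.*?)>/', '<XUI $1 />', $file);
$file = preg_replace('/<TGP (.*?)>/', '<TGP $1 />', $file);

// libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);
$xml->asXML("out.xml"); // writes the cleaned document to out.xml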
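Deleting the & leaves fragments such as "mdash;" in titles and page ranges. One possible alternative, sketched below, is to convert named entities to real characters before parsing. The helper name decodeNamedEntities() is hypothetical, and the sketch assumes the entity names in these files match HTML5 entity names (which html_entity_decode() understands with ENT_HTML5); names it does not recognise, such as &NA;, are simply dropped here, which is also an assumption.

```php
<?php
// Hypothetical helper: turn named entities into characters instead of deleting "&".
function decodeNamedEntities(string $text): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlPredefined): string {
            if (in_array($m[1], $xmlPredefined, true)) {
                return $m[0];   // already valid in XML, leave untouched
            }
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            if ($decoded === $m[0]) {
                return '';      // unknown name (e.g. &NA;): dropped, by assumption
            }
            // Re-escape in case the entity decoded to a markup character.
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $text
    );
}

// Made-up fragment in the style of the sample file.
$sample = '<DG><TI>Oh Blood Pressure&mdash;Where Art Thou&quest;</TI><HD>&NA;</HD></DG>';
$xml = simplexml_load_string(decodeNamedEntities($sample));
echo $xml->TI, PHP_EOL;
```

The call would replace the str_replace("&", "", $file) step; the tag fixes still have to run afterwards. If some entity names turn out to be specific to the publisher's DTD, an explicit lookup table would be safer than dropping them.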
















answered Nov 17 '18 at 18:51









Nigel Ren













  • I would be very hesitant about ad hoc fixes like this; from other comments, it sounds like the file may be in some non-XML format, where the actual meaning of these tags and entities might be relevant. Just stripping them out might lead to fragile code and incorrect results on other files.

    – IMSoP
    Nov 19 '18 at 14:10











  • @IMSoP, as with any answers on SO, it is up to OP to check that the code and any processing is up to what they need. If this is for some business purpose then I would assume there is some form of testing and validation in the project which again is something they must assume responsibility for.

    – Nigel Ren
    Nov 19 '18 at 14:17











  • Indeed, but some fixes are riskier than others, and I thought it worth calling out that this is on the "hack that will probably work but might cause problems later" end of the spectrum rather than the "well-recognised technique that you'll find in plenty of professional codebases" end of the spectrum.

    – IMSoP
    Nov 19 '18 at 14:31











  • @IMSoP, but if it was a choice of a hack or ditch all of the data and start again. With appropriate validation and oversight I would rather go with a hack - which is much less error prone than starting again.

    – Nigel Ren
    Nov 19 '18 at 14:38











  • I'm not disagreeing with posting this answer; I'm just saying that a warning that this is a hack might be sensible, in case readers get the impression that this is a good solution any time they have errors. Also, I don't think the alternative is "ditch the data and start again"; I think the alternative is "research what format the data is in and how its creator intended it to be parsed". Unless the data has been corrupted (in which case it's dangerous anyway), it's presumably something other than XML, and may be documented somewhere.

    – IMSoP
    Nov 19 '18 at 14:50
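One way to act on the concern about other files is to survey which elements are never closed before hard-coding a list such as COVER, XUI and TGP. The sketch below is only a rough heuristic under stated assumptions: findUnclosedElements() is a hypothetical helper, it counts tags with regular expressions rather than parsing, and it ignores edge cases such as a ">" inside an attribute value.

```php
<?php
// Hypothetical helper: element names that are opened at least once but never closed.
function findUnclosedElements(string $markup): array
{
    // Opening tags, excluding ones already written as self-closing (<X ... />).
    preg_match_all('/<([A-Za-z][A-Za-z0-9]*)\b[^>]*(?<!\/)>/', $markup, $open);
    // Closing tags such as </X>.
    preg_match_all('/<\/([A-Za-z][A-Za-z0-9]*)\s*>/', $markup, $close);

    $opened = array_count_values($open[1]);
    $closed = array_count_values($close[1]);

    $unclosed = [];
    foreach ($opened as $name => $count) {
        if (!isset($closed[$name])) {
            $unclosed[$name] = $count;   // name => number of occurrences
        }
    }
    return $unclosed;
}

// Made-up fragment in the style of the sample file.
$sample = '<DG><COVER NAME="G1"><D><XUI XDB="pub-doi" UI="10.1"><TI>x</TI></D></DG>';
print_r(findUnclosedElements($sample));
```

Running it over the whole archive and comparing the result with the DTD named in the DOCTYPE (ovidbase.dtd in the sample) would show whether the three hard-coded tags are really the complete set.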




































