PHP: Read and parse an XML-like file
We are indexing our journals with PHP, and we have journal metadata files. I am trying to parse them with PHP SimpleXML, but I am getting lots of errors.
Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62
Warning: simplexml_load_string(): s;S PERSPECTIVE
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62
Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62
Warning: simplexml_load_string(): R TO THE EDITOR
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62
When I looked at the file, it seemed to be an XML file. How can I parse it with PHP?
The code I am using is:
$file = file_get_contents('xyz.0');
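$file = utf8_decode($file); // UTF-8 to ISO-8859-1 (deprecated as of PHP 8.2); characters outside Latin-1, such as the curly quotes in the sample, become "?"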
$file = str_replace("&", "", $file); //For problems with & character
//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);
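The commented-out `libxml_use_internal_errors(true)` line is the usual way to stop libxml from printing one PHP warning per problem. A minimal sketch (the helper name `loadXmlCollectingErrors` is hypothetical) that buffers the errors and returns them with their line numbers. It does not make the file parse, but it gives a clean list of which constructs are not well-formed XML:

```php
<?php
/**
 * Hypothetical helper: parses $xml with SimpleXML and returns
 * [SimpleXMLElement|null, list of error strings] instead of emitting warnings.
 */
function loadXmlCollectingErrors(string $xml): array
{
    // Buffer libxml errors internally; remember the caller's setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError carries level, code, column, message, file and line.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous behaviour

    return [$doc === false ? null : $doc, $messages];
}
```

For example, `loadXmlCollectingErrors('<BB><XUI UI="x"></BB>')` returns `null` plus a "tag mismatch" message, mirroring the first warning above.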
Sample XML code from the file:
<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements—Where Art Thou?</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris@wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79–80</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT'S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy?</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3–12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3–12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. https://www.cdc.gov/bloodpressure/facts.htm. 
Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169–1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169–1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM's Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697–716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697–716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77–80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77–80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192–1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192–1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263–265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263–265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 “Spring Is the Time of Plans and Projects”</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer@umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81–82</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>
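The first warning reports that an `XUI` element is still open when `</BB>` arrives, and in the sample `<XUI XDB="pub-doi" UI="...">` indeed never gets a closing tag. That looks like SGML, where a DTD can declare an element EMPTY so that it needs no end tag; XML has no such rule. A minimal sketch, assuming `XUI` is declared EMPTY in `ovidbase.dtd` (the DTD itself is not shown here), that rewrites such tags as XML self-closing tags before parsing. The function name `selfCloseEmptyElements` is hypothetical:

```php
<?php
/**
 * Hypothetical helper: turns SGML-style empty elements such as
 * <XUI XDB="pub-doi" UI="..."> into XML self-closing tags <XUI .../>.
 * Assumption: every name in $emptyElements is declared EMPTY in the DTD.
 */
function selfCloseEmptyElements(string $sgml, array $emptyElements = ['XUI']): string
{
    foreach ($emptyElements as $name) {
        $tag = preg_quote($name, '/');
        // Match <NAME attrs> or <NAME attrs/> (but not </NAME>) and always emit <NAME attrs/>.
        $result = preg_replace('/<' . $tag . '\b([^>]*?)\s*\/?>/', '<' . $name . '${1}/>', $sgml);
        if ($result === null) {
            throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
        }
        $sgml = $result;
    }
    return $sgml;
}
```

Usage would be along the lines of `simplexml_load_string(selfCloseEmptyElements(file_get_contents('xyz.0')))`. The rewrite is idempotent, so running it on tags that are already self-closed is harmless. If the parser then reports the same kind of mismatch for another element, adding that name to `$emptyElements` should be enough.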
You can download the XML file from here.
Thank you
EDIT: This is different from the question "XML parser error: entity not defined". These files were generated years ago (in the 2000s). I am not generating them; I am only trying to parse them and extract the metadata.
EDIT 2: Sorry, I was also trying to parse the files with the DOM parser, and when I created the post I included the errors from that. I have now added the SimpleXML errors.
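On the entity side, `str_replace("&", "", $file)` also destroys valid references: `&amp;` becomes `amp;` and `&NA;` becomes `NA;` (the `s;S PERSPECTIVE` context in the first warning looks like the remains of such a reference, possibly `&apos;`). A less lossy alternative, sketched here under the assumption that entities such as `&NA;` are declared only in `ovidbase.dtd` and should simply be kept as literal text, is to escape every `&` that does not begin one of XML's five predefined entities or a numeric character reference. The function name `escapeUndeclaredEntities` is hypothetical:

```php
<?php
/**
 * Hypothetical helper: escapes every "&" that does not already start one of
 * XML's predefined entities (amp, lt, gt, quot, apos) or a numeric character
 * reference, so DTD-only entities such as &NA; survive as literal text.
 */
function escapeUndeclaredEntities(string $text): string
{
    $result = preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $text
    );
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}
```

It would stand in for the `str_replace("&", "", $file)` line, so `<HD>&NA;</HD>` comes out of SimpleXML as the text `&NA;` rather than `NA;`, and references that are already valid are left untouched.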
php xml xml-parsing simplexml
|
show 4 more comments
We are indexing our journals with PHP. We have journal meta data files. I am trying to parse it with PHP SimpleXML but i am getting lots of errors.
Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62
Warning: simplexml_load_string(): s;S PERSPECTIVE
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62
Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62
Warning: simplexml_load_string(): R TO THE EDITOR
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62
When i looked at the file it seems like an XML file. How can i parse it with PHP?.
The code i am using is:
$file = file_get_contents('xyz.0');
$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character
//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);
Sample XML Code from file:
<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements—Where Art Thou?</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris@wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79–80</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT'S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy?</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3–12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3–12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. https://www.cdc.gov/bloodpressure/facts.htm. 
Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169–1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169–1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM's Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697–716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697–716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77–80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77–80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192–1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192–1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263–265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263–265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 “Spring Is the Time of Plans and Projects”</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer@umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81–82</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>
You can download the xml file from here.
Thank you
EDIT: This is different from the question XML parser error: entity not defined This files are generated years ago (2000s etc.). I am not generating this files, i only try to parse them and get the meta data.
EDIT 2: Sorry i am also trying to parse with Dom Parser and added the errors from it when i created the post. Now i added the SimpleXML errors.
php xml xml-parsing simplexml
1
Thats NOT an XML file. I think thats a SAP specific tag
– RiggsFolly
Nov 16 '18 at 13:29
I'm not sure how you're getting errors aboutDOMDocument::loadXML
when you say you aren't calling that method
– iainn
Nov 16 '18 at 13:32
1
Possible duplicate of XML parser error: entity not defined
– Mohammad
Nov 16 '18 at 13:32
@RiggsFolly do you have any idea how to parse this file?
– Ben Perry
Nov 16 '18 at 13:52
SHort of looking for a library to help, no
– RiggsFolly
Nov 16 '18 at 13:55
|
show 4 more comments
We are indexing our journals with PHP. We have journal meta data files. I am trying to parse it with PHP SimpleXML but i am getting lots of errors.
Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62
Warning: simplexml_load_string(): s;S PERSPECTIVE
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62
Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62
Warning: simplexml_load_string(): R TO THE EDITOR
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62
When i looked at the file it seems like an XML file. How can i parse it with PHP?.
The code i am using is:
$file = file_get_contents('xyz.0');
$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character
//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);
Sample XML Code from file:
<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements—Where Art Thou?</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris@wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79–80</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT'S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy?</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3–12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3–12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. https://www.cdc.gov/bloodpressure/facts.htm. 
Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169–1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169–1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM's Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697–716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697–716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77–80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77–80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192–1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192–1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263–265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263–265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 “Spring Is the Time of Plans and Projects”</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer@umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81–82</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>
You can download the xml file from here.
Thank you
EDIT: This is different from the question XML parser error: entity not defined This files are generated years ago (2000s etc.). I am not generating this files, i only try to parse them and get the meta data.
EDIT 2: Sorry i am also trying to parse with Dom Parser and added the errors from it when i created the post. Now i added the SimpleXML errors.
php xml xml-parsing simplexml
We are indexing our journals with PHP. We have journal meta data files. I am trying to parse it with PHP SimpleXML but i am getting lots of errors.
Warning: simplexml_load_string(): Entity: line 19: parser error :
Opening and ending tag mismatch: XUI line 19 and BB in *** on line 62
Warning: simplexml_load_string(): s;S PERSPECTIVE
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 44: parser error :
Opening and ending tag mismatch: BB line 4 and D in *** on line 62
Warning: simplexml_load_string(): 33rd ed. St. Louis, MO: Elsevier
Health Sciences; 2016.
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 61: parser error :
Opening and ending tag mismatch: XUI line 61 and BB in *** on line 62
Warning: simplexml_load_string(): R TO THE EDITOR
Warning: simplexml_load_string(): ^ in *** on line 62
Warning: simplexml_load_string(): Entity: line 74: parser error :
Opening and ending tag mismatch: BB line 46 and D in *** on line 62
When i looked at the file it seems like an XML file. How can i parse it with PHP?.
The code i am using is:
$file = file_get_contents('xyz.0');
$file = utf8_decode($file);
$file = str_replace("&", "", $file); //For problems with & character
//libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);
Sample XML Code from file:
<!DOCTYPE dg SYSTEM "ovidbase.dtd"> <DG><COVER NAME="G1893697-201804000-00000"> <D AN="01893697-201804000-00001" V="2009.2F" FILE="G1893697-201804000-00001"> <BB> <TG> <TI>Oh Blood Pressure Measurements—Where Art Thou?</TI></TG> <BY> <PN><FN>G.</FN><MN>Stephen</MN><SN>Morris</SN><DEG>PT, PhD, FACSM</DEG></PN> <AF><P>President, Oncology Section of the APTA; and Professor, Department of Physical Therapy, Wingate University, Wingate, NC</P></AF> <BT><P><E T="B">Correspondence:</E> G. Stephen Morris, PT, PhD, FACSM, Department of Physical Therapy, Wingate University, 215 N. Camden Rd, Wingate, NC 28174 (<URL>s.morris@wingate.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>79–80</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>PRESIDENT'S PERSPECTIVE</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000118"></BB> <BD> <LV1><HD>&NA;</HD> <P>physical therapy?</P></LV1> <LV1><SG><SGN>G. Stephen Morris, PT, PhD, FACSM</SGN></SG></LV1></BD> <ED> <EDS><HD>REFERENCES</HD> <RF ID="R1-1">1. <JRF><DRF>Arena SK, Reyes A, Rolf M. Behaviors, and knowledge of outpatient physical therapists. Cardiopulm Phys Ther J. 2018;9:3–12.</DRF><PN><FN>SK</FN><SN>Arena</SN></PN><PN><FN>A</FN><SN>Reyes</SN></PN><PN><FN>M</FN><SN>Rolf</SN></PN><TI>Behaviors, and knowledge of outpatient physical therapists</TI><PB>Cardiopulm Phys Ther J</PB><DA><YR>2018</YR></DA><V>9</V><PG>3–12</PG></JRF></RF> <RF ID="R2-1">2. <URF>US Preventative Services Task Force. High blood pressure in adults: screening. https://www.uspreventiveservicestaskforce.org/Page/Document/RecommendationStatementFinal/high-blood-pressure-in-adults-screening. Accessed January 12, 2018.</URF></RF> <RF ID="R3-1">3. <URF>Centers for Disease Control and Prevention. High blood pressure fact sheet. https://www.cdc.gov/bloodpressure/facts.htm. 
Accessed January 12, 2018.</URF></RF> <RF ID="R4-1">4. <JRF><DRF>Lein DH Jr, Clark D, Graham C, Perez P, Morris D. A model to integrate health promotion and wellness in physical therapist practice: development and validation. Phys Ther. 2017;97(12):1169–1181.</DRF><PN><FN>DH</FN><SN>Lein</SN></PN><PN><FN>D</FN><SN>Clark</SN></PN><PN><FN>C</FN><SN>Graham</SN></PN><PN><FN>P</FN><SN>Perez</SN></PN><PN><FN>D</FN><SN>Morris</SN></PN><TI>A model to integrate health promotion and wellness in physical therapist practice: development and validation</TI><PB>Phys Ther</PB><DA><YR>2017</YR></DA><V>97</V><PG>1169–1181</PG></JRF></RF> <RF ID="R5-1">5. <URF>Riebe D, ed. ACSM's Guidelines for Exercise Testing and Prescription. 10th ed. Baltimore, Maryland: Wolters Kluwer; 2018.</URF></RF> <RF ID="R6-1">6. <JRF><DRF>Pickering TG, Hall JE, Appel LJ, et al Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research. Circulation. 2005;111(5):697–716.</DRF><PN><FN>TG</FN><SN>Pickering</SN></PN><PN><FN>JE</FN><SN>Hall</SN></PN><PN><FN>LJ</FN><SN>Appel</SN></PN><TI>Recommendations for blood pressure measurement in humans and experimental animals: part 1: blood pressure measurement in humans: a statement for professionals from the Subcommittee of Professional and Public Education of the American Heart Association Council on High Blood Pressure Research</TI><PB>Circulation</PB><DA><YR>2005</YR></DA><V>111</V><PG>697–716</PG></JRF></RF> <RF ID="R7-1">7. <JRF><DRF>Rabbia F, Testa E, Rabbia S, et al Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses. High Blood Press Cardiovasc Prev. 
2013;20(2):77–80.</DRF><PN><FN>F</FN><SN>Rabbia</SN></PN><PN><FN>E</FN><SN>Testa</SN></PN><PN><FN>S</FN><SN>Rabbia</SN></PN><TI>Effectiveness of blood pressure educational and evaluation program for the improvement of measurement accuracy among nurses</TI><PB>High Blood Press Cardiovasc Prev</PB><DA><YR>2013</YR></DA><V>20</V><PG>77–80</PG></JRF></RF> <RF ID="R8-1">8. <JRF><DRF>Frese EM, Richter RR, Burlis TV. Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors. Phys Ther. 2002;82(12):1192–1200.</DRF><PN><FN>EM</FN><SN>Frese</SN></PN><PN><FN>RR</FN><SN>Richter</SN></PN><PN><FN>TV</FN><SN>Burlis</SN></PN><TI>Self-reported measurement of heart rate and blood pressure in patients by physical therapy clinical instructors</TI><PB>Phys Ther</PB><DA><YR>2002</YR></DA><V>82</V><PG>1192–1200</PG></JRF></RF> <RF ID="R9-1">9. <JRF><DRF>Mouhavar E, Salahudeen A, Yeh ETH. Hypertension in cancer patients. Tex Heart Inst J. 2011;38(3):263–265.</DRF><PN><FN>E</FN><SN>Mouhavar</SN></PN><PN><FN>A</FN><SN>Salahudeen</SN></PN><PN><FN>ETH</FN><SN>Yeh</SN></PN><TI>Hypertension in cancer patients</TI><PB>Tex Heart Inst J</PB><DA><YR>2011</YR></DA><V>38</V><PG>263–265</PG></JRF></RF> <RF ID="R10-1">10. <URF>Gahart BL, Nazareno AR, eds. Intravenous Medications: A Handbook for Nurses and Health Professionals. 33rd ed. St. Louis, MO: Elsevier Health Sciences;
2016.</URF></RF></EDS></ED></D> <D AN="01893697-201804000-00002" V="2009.2F" FILE="G1893697-201804000-00002"> <BB> <TG> <TI>In 2018 “Spring Is the Time of Plans and Projects”</TI></TG> <BY> <PN><FN>Lucinda</FN><MN>(Cindy)</MN><SN>Pfalzer</SN><DEG>PT, PhD, FACSM, FAPTA</DEG></PN> <AF><P>Editor of <E T="I">Oncology Rehabilitation</E> and Emeriti Professor, Physical Therapy Department, University of Michigan-Flint, Flint, MI</P></AF> <BT><P><E T="B">Correspondence:</E> Lucinda (Cindy) Pfalzer, PT, PhD, FACSM, FAPTA, Physical Therapy Department, University of Michigan-Flint, 2157 WSW Bldg, Flint, MI 48502 (<URL>cpfalzer@umich.edu</URL>).</P><P>The author declares no conflicts of interest.</P></BT></BY> <SO> <PB>Rehabilitation Oncology</PB> <ISN>2168-3808</ISN> <DA><MO>April</MO><YR>2018</YR></DA> <V>36</V> <IS><IP>2</IP></IS> <PG>81–82</PG></SO> <CP>© 2018 Oncology Section, APTA.</CP> <DT>LETTER TO THE EDITOR</DT><XUI XDB="pub-doi" UI="10.1097/01.REO.0000000000000119"></BB> <BD>
You can download the xml file from here.
Thank you
EDIT: This is different from the question "XML parser error: entity not defined". These files were generated years ago (in the 2000s). I am not generating these files; I am only trying to parse them and extract the metadata.
EDIT 2: Sorry, I was also trying to parse the file with the DOM parser and posted its errors when I created this question. I have now added the SimpleXML errors.
php xml xml-parsing simplexml
edited Nov 20 '18 at 13:40
Ben Perry
asked Nov 16 '18 at 13:25
Ben Perry
That's NOT an XML file. I think that's a SAP-specific tag.
– RiggsFolly
Nov 16 '18 at 13:29
I'm not sure how you're getting errors about DOMDocument::loadXML when you say you aren't calling that method.
– iainn
Nov 16 '18 at 13:32
Possible duplicate of XML parser error: entity not defined
– Mohammad
Nov 16 '18 at 13:32
@RiggsFolly do you have any idea how to parse this file?
– Ben Perry
Nov 16 '18 at 13:52
Short of looking for a library to help, no.
– RiggsFolly
Nov 16 '18 at 13:55
1 Answer
The file doesn't conform to the XML spec; there are a few problems, such as unknown entities and non-closed tags.
Removing the & characters gets past the entities. To solve some of the other problems, I used regular expressions to tidy up the tags (I'm not a regex expert, but each replacement takes <COVER ...> and converts it to <COVER ... />)...
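$file = file_get_contents('20180400.xml');
$file = str_replace("&", "", $file); // strip "&" so undefined entities such as &NA; no longer stop the parser
// COVER, XUI and TGP are left unclosed in this file, so rewrite them as self-closing tags (keeping their own names)
$file = preg_replace('/<COVER (.*?)>/', '<COVER $1 />', $file);
$file = preg_replace('/<XUI (.*?)>/', '<XUI $1 />', $file);
$file = preg_replace('/<TGP (.*?)>/', '<TGP $1 />', $file);
// libxml_use_internal_errors(true);
$xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);
$xml->asXML("out.xml"); // write the cleaned-up document to out.xml (returns true/false, so no echo)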
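A minimal sketch of the same clean-up, wrapped in a hypothetical helper `loadOvidLikeFile()` that switches on `libxml_use_internal_errors()` (commented out in the code above) so any remaining problems come back as a list rather than a stream of warnings. The tag list (`COVER`, `XUI`, `TGP`) comes from the answer; the slightly broader regex, which also tolerates tags that are already self-closed, is an assumption, and other files may need more tags added.

```php
<?php
// Hypothetical helper: applies the answer's clean-up steps and reports
// any remaining libxml errors instead of emitting PHP warnings.
function loadOvidLikeFile(string $raw, array $unclosedTags = ['COVER', 'XUI', 'TGP']): array
{
    // Same as the answer: drop "&" so undefined entities like &NA; stop failing.
    $file = str_replace('&', '', $raw);

    // Rewrite <TAG attr="..."> (or an already self-closed <TAG ... />) as <TAG ... />.
    foreach ($unclosedTags as $tag) {
        $file = preg_replace('/<' . $tag . '(\s[^>]*?)\s*\/?>/', '<' . $tag . '$1 />', $file);
    }

    // Collect parser errors ourselves, restoring the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $xml = simplexml_load_string($file, 'SimpleXMLElement', LIBXML_NOCDATA);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // simplexml_load_string() returns false when the document still isn't well-formed.
    return ['xml' => $xml === false ? null : $xml, 'errors' => $errors];
}

// Usage: process one metadata file and either report the errors or read it.
// $result = loadOvidLikeFile(file_get_contents('20180400.xml'));
// foreach ($result['errors'] as $message) { echo $message, PHP_EOL; }
```

Returning the errors instead of printing them makes it easier to log which of many journal files still fail, rather than stopping at the first one.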
I would be very hesitant about ad hoc fixes like this; from other comments, it sounds like the file may be in some non-XML format, where the actual meaning of these tags and entities might be relevant. Just stripping them out might lead to fragile code and incorrect results on other files.
– IMSoP
Nov 19 '18 at 14:10
@IMSoP, as with any answer on SO, it is up to the OP to check that the code and any processing meet their needs. If this is for some business purpose, I would assume the project has some form of testing and validation, which again is something they must take responsibility for.
– Nigel Ren
Nov 19 '18 at 14:17
Indeed, but some fixes are riskier than others, and I thought it worth calling out that this is on the "hack that will probably work but might cause problems later" end of the spectrum rather than the "well-recognised technique that you'll find in plenty of professional codebases" end of the spectrum.
– IMSoP
Nov 19 '18 at 14:31
@IMSoP, but if it were a choice between a hack and ditching all of the data and starting again, then with appropriate validation and oversight I would rather go with the hack, which is much less error-prone than starting again.
– Nigel Ren
Nov 19 '18 at 14:38
I'm not disagreeing with posting this answer; I'm just saying that a warning that this is a hack might be sensible, in case readers get the impression that this is a good solution any time they have errors. Also, I don't think the alternative is "ditch the data and start again"; I think the alternative is "research what format the data is in and how its creator intended it to be parsed". Unless the data has been corrupted (in which case it's dangerous anyway), it's presumably something other than XML, and may be documented somewhere.
– IMSoP
Nov 19 '18 at 14:50
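IMSoP's concern is that stripping every `&` throws information away: `&NA;` ends up as the bare text `NA;`. A gentler alternative, sketched here under the assumption that the unknown entities only need to survive as literal text, escapes any `&` that does not begin one of XML's five predefined entities or a numeric character reference. `&NA;` then comes through as the string "&NA;" and can be interpreted later, once the file format is understood. `escapeUnknownEntities()` is a hypothetical name, and it does nothing about the unclosed tags.

```php
<?php
// Hypothetical helper: escape every "&" that does not start one of XML's
// five predefined entities (amp, lt, gt, quot, apos) or a numeric
// character reference such as &#8212; or &#x2014;.
function escapeUnknownEntities(string $raw): string
{
    return preg_replace('/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $raw);
}

// Usage: in place of str_replace("&", "", $file) in the answer's code.
// $file = escapeUnknownEntities(file_get_contents('20180400.xml'));
```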
answered Nov 17 '18 at 18:51
Nigel Ren