xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

I'm trying to parse a directory containing a collection of XML files from RSS feeds.
I have similar code working fine for another directory, so I can't figure out the problem. I want to return the items so I can write them to a CSV file. The error I'm getting is:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

Here is the site I've collected RSS feeds from: https://www.ba.no/service/rss

It worked fine for: https://www.nrk.no/toppsaker.rss and https://www.vg.no/rss/feed/?limit=10&format=rss&categories=&keywords=

Here is the function for this RSS:

import os
import xml.etree.ElementTree as ET
import csv

def baitem():
    basepath = "../data_copy/bergens_avisen"
    table = []
    for fname in os.listdir(basepath):
        if fname != "last_feed.xml":
            files = ET.parse(os.path.join(basepath, fname))
            root = files.getroot()
            items = root.find("channel").findall("item")
            #print(items)
        for item in items:
            date = item.find("pubDate").text
            title = item.find("title").text
            description = item.find("description").text
            link = item.find("link").text
            table.append((date, title, description, link))
    return table

I tested with print(items) and it returns all the objects.
Could the problem be in how the XML files are written?

edited Nov 19 '18 at 13:43

asked Nov 19 '18 at 11:58

Felisep

python-3.6 elementtree parse-error xml.etree python-os


1 Answer

I asked a friend, who suggested testing with a try/except statement. That turned up a .DS_Store file, a hidden file that macOS creates in folders; it is not XML, so it fails to parse. I'm providing the solution for those who might experience the same problem in the future.

def baitem():
    basepath = "../data_copy/bergens_avisen"
    table = []
    for fname in os.listdir(basepath):
        try:
            if fname != "last_feed.xml" and fname != ".DS_Store":
                files = ET.parse(os.path.join(basepath, fname))
                root = files.getroot()
                items = root.find("channel").findall("item")
                for item in items:
                    date = item.find("pubDate").text
                    title = item.find("title").text
                    description = item.find("description").text
                    link = item.find("link").text
                    table.append((date, title, description, link))
        except Exception as e:
            print(fname, e)
    return table
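
A possible variation on the same idea, offered only as a sketch: instead of naming .DS_Store explicitly, the loop could accept only filenames ending in .xml, so other stray files are skipped too, and catch ET.ParseError rather than every exception. The function name read_feed_items and the skip parameter are made up for this illustration, and it assumes each feed has a channel element containing item elements, as in the code above.

```python
import os
import xml.etree.ElementTree as ET


def read_feed_items(basepath, skip=("last_feed.xml",)):
    """Collect (date, title, description, link) tuples from the .xml files in basepath.

    Hypothetical helper: the name and the `skip` parameter are illustrative only.
    """
    table = []
    for fname in sorted(os.listdir(basepath)):
        # Only look at XML files; this ignores .DS_Store and similar stray files.
        if not fname.lower().endswith(".xml") or fname in skip:
            continue
        try:
            root = ET.parse(os.path.join(basepath, fname)).getroot()
        except ET.ParseError as err:
            # Report the malformed file and continue with the remaining ones.
            print(fname, err)
            continue
        channel = root.find("channel")
        if channel is None:
            # Not an RSS document with the expected layout; skip it.
            continue
        for item in channel.findall("item"):
            # findtext returns None when a child element is missing.
            table.append((
                item.findtext("pubDate"),
                item.findtext("title"),
                item.findtext("description"),
                item.findtext("link"),
            ))
    return table
```

Calling read_feed_items("../data_copy/bergens_avisen") should return the same kind of list as baitem(), while printing the name of any .xml file that fails to parse.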

edited Jan 19 at 12:57

answered Nov 19 '18 at 14:43

Felisep


