Python Scrapy - Can't log into a site

I'm a noob when it comes to Scrapy, and I understand the underlying basics of scraping and crawling thanks to the docs. However, I am having difficulties logging into a site. Here's my code:

test.py

import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class Test_spider(scrapy.Spider):
    """
    Log into the provided site with Scrapy
    """

    name = 'test'
    start_urls = ['https://www.privatelenderdatafeed.com/login/']

    def parse(self, response):
        """
        Send login data and use "from_response" to pre-populate session related data as per the docs and what I need for this site
        """
        return FormRequest.from_response(
            response,
            formdata={'ajaxreferred':'1',       # Not sure if I need this? It's included in the form data when I checked the site with dev tools so I'm including it
                      'email':'email',          # Email
                      'password':'password'     # Password
                      },
            callback = self.after_login)

    def after_login(self, response):
        """
        Open browser to check status
        """
        open_in_browser(response)

I explicitly make Scrapy open the browser regardless of whether it logs into the site, so I can visually check the status. In other words, if it's still at the login page, the login failed somehow. Otherwise, if I'm logged in, I should see a different page. As it stands, it doesn't log in and I just keep seeing the login page. What is going on here?
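For what it's worth, the same check could be done in code rather than by eye. This is only a minimal sketch: the helper name `still_on_login_page` is made up for illustration, and it assumes the login page can be recognised by "login" in its URL path or by a password input in its HTML, neither of which is confirmed for this site.

```python
from urllib.parse import urlparse


def still_on_login_page(url: str, body: str) -> bool:
    """Guess whether a response is still the login page.

    Assumption: the login page lives under a path containing "login" or
    renders a password input; neither is confirmed for this site.
    """
    path = urlparse(url).path.lower()
    has_password_field = 'type="password"' in body.lower()
    return "login" in path or has_password_field


# Hypothetical usage inside the spider's callback:
#     if still_on_login_page(response.url, response.text):
#         self.logger.error("Login appears to have failed")
```

A check like this could sit alongside open_in_browser, so a failed login also shows up in the crawl log.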

edited Nov 20 '18 at 0:34

asked Nov 19 '18 at 21:58

curiousgeorge

python scrapy


1 Answer

If you look at the POST request that is sent to the website in your browser's web developer tools, you can see that its cause is listed as xhr.

That means it's not a "normal" HTML form submission; there is some JavaScript involved.

To get around this, once you have submitted the POST request, you have to send a request for the next page yourself, which means you need to know the URL to go to after logging in:

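from scrapy import Request

    def parse(self, response):
        # Submit the login form; its reply is handled by request_next_page
        # (a name chosen here for illustration)
        return FormRequest.from_response(
            response,
            formdata={'ajaxreferred': '1',      # Included because the site sends it with the form data
                      'email': 'email',         # Email
                      'password': 'password'    # Password
                      },
            callback=self.request_next_page)

    def request_next_page(self, response):
        # The login POST has been sent, so now request the post-login page explicitly
        yield Request('https://after/login/url', callback=self.after_login)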
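Since the login is sent via XHR, it may also help to make the request look like one. The sketch below builds the form fields from the question plus the `X-Requested-With` header that many JavaScript libraries attach to XHR calls. Whether this site checks that header is an assumption, and the names `XHR_HEADERS` and `build_login_payload` are made up for illustration.

```python
from urllib.parse import urlencode

# Header that many JavaScript libraries add to XHR calls; whether this
# site checks for it is an assumption.
XHR_HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def build_login_payload(email: str, password: str) -> dict:
    """Return the form fields listed in the question's formdata."""
    return {"ajaxreferred": "1", "email": email, "password": password}


payload = build_login_payload("email", "password")
# Show the URL-encoded body that a form POST with these fields would carry
print(urlencode(payload))  # prints: ajaxreferred=1&email=email&password=password
```

In the spider, these could be passed as FormRequest.from_response(response, formdata=payload, headers=XHR_HEADERS, callback=...), since the extra keyword arguments are forwarded to the request.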

answered Nov 20 '18 at 17:10

Guillaume


I changed the code to include the yield Request and the appropriate URL, but now the spider doesn't run at all because it's saying that the callback is invalid (it doesn't recognize "self", apparently). What?! Oh well, in any case, you've helped immensely. I've been looking more into Scrapy and the site, and it's just as you said: the login uses JS! No wonder! That complicates things, since I just learned that Scrapy doesn't do anything at all with JS, but now I have a solid starting point to work from. Thanks @Guillaume!!!

– curiousgeorge
Nov 20 '18 at 18:31
