Python Scrapy - Can't log into a site
I'm a noob when it comes to Scrapy, but I understand the underlying basics of scraping and crawling thanks to the docs. However, I am having difficulty logging into a site. Here's my code:
test.py
import scrapy
from scrapy.http import FormRequest
from scrapy.utils.response import open_in_browser


class Test_spider(scrapy.Spider):
    """
    Log into the provided site with Scrapy
    """
    name = 'test'
    start_urls = ['https://www.privatelenderdatafeed.com/login/']

    def parse(self, response):
        """
        Send login data and use "from_response" to pre-populate session related data as per the docs and what I need for this site
        """
        return FormRequest.from_response(
            response,
            formdata={'ajaxreferred': '1',  # Not sure if I need this? It's included in the form data when I checked the site with dev tools so I'm including it
                      'email': 'email',  # Email
                      'password': 'password'  # Password
                      },
            callback=self.after_login)

    def after_login(self, response):
        """
        Open browser to check status
        """
        open_in_browser(response)
I explicitly make Scrapy open the browser regardless of whether it logs into the site or not, so I can visually check the status. In other words, if it's still at the login page, it failed somehow. Otherwise, if I'm logged in, I should see a different page. As it stands, it doesn't log in and I just continue to see the login page. What is going on here?
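Rather than inspecting the page by eye, the same check can be made in code. The sketch below is only a heuristic and rests on an assumption that is not confirmed for this site: that the login page contains a password input and the page shown after logging in does not. The helper name `looks_like_login_page` is made up for illustration.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether the document contains <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and names arrive lowercased
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def looks_like_login_page(html):
    """Heuristic (assumption): a page with a password field is still the login page."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found
```

Inside `after_login`, this could be called as `looks_like_login_page(response.text)` and the result logged with `self.logger`, keeping `open_in_browser` for the cases where a visual check is still wanted.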
python scrapy
asked Nov 19 '18 at 21:58 by curiousgeorge (edited Nov 20 '18 at 0:34)
1 Answer
If you look at the POST request that is sent to the website, you can see that its cause is xhr, i.e. it is sent by an XMLHttpRequest.

That means it's not a "normal" HTML form submission; there is some JavaScript involved.
To get around this, once you have submitted the POST request, you will have to send a request for the next page yourself. In other words, you have to know the URL that comes after the login:
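# Methods of the spider class (also requires: from scrapy import Request)
def parse(self, response):
    return FormRequest.from_response(
        response,
        formdata={'ajaxreferred': '1',  # taken from the form data seen in dev tools
                  'email': 'email',  # Email
                  'password': 'password'  # Password
                  },
        callback=self.request_next_page)

def request_next_page(self, response):
    # Illustrative callback name: runs once the login POST has completed
    yield Request('https://after/login/url', callback=self.after_login)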
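If the login really is sent as an XHR, it may also help to make the spider's POST resemble the one the browser sends and to read the reply as data rather than as a page. The sketch below covers only the pure-Python parts. It assumes, without confirmation from the site, that the endpoint looks at the conventional `X-Requested-With: XMLHttpRequest` header and answers with JSON; the JSON key `redirect` and both function names are hypothetical.

```python
import json

# Header that browser JavaScript libraries conventionally add to AJAX calls.
XHR_HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def login_formdata(email, password):
    """Form fields observed in the question; all values must be strings."""
    return {"ajaxreferred": "1", "email": email, "password": password}


def next_url_from_reply(body, default_url):
    """Return the URL to request after the login POST.

    Assumption: a successful reply is a JSON object that may carry a
    'redirect' key (hypothetical name). Anything else falls back to
    default_url, the post-login URL you looked up yourself.
    """
    try:
        reply = json.loads(body)
    except ValueError:  # not JSON, e.g. the login page HTML came back
        return default_url
    if isinstance(reply, dict) and isinstance(reply.get("redirect"), str):
        return reply["redirect"]
    return default_url
```

In the spider, `login_formdata(...)` and `XHR_HEADERS` would be passed as the `formdata` and `headers` arguments of `FormRequest.from_response`, and the callback for that request would yield `Request(next_url_from_reply(response.text, 'https://after/login/url'), callback=self.after_login)`.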
I changed the code to include the yield Request and the appropriate URL, but now the spider doesn't run at all because it says the callback is invalid (it doesn't recognize "self", apparently). What?! Oh well, in any case, you've helped immensely. I've been looking more into Scrapy and the site, and it's just as you've said: the login uses JS! No wonder! That complicates things, since I just learned that Scrapy doesn't do anything at all with JS, but now I have a solid starting point to work from. Thanks @Guillaume!!!
– curiousgeorge
Nov 20 '18 at 18:31
answered Nov 20 '18 at 17:10 by Guillaume