Python Scrapy - Can't log into a site



























I'm a noob when it comes to Scrapy, and I understand the underlying basic scraping and crawling operations thanks to the docs. However, I am having difficulties logging into a site. Here's my code:



test.py



    import scrapy
    from scrapy.http import FormRequest
    from scrapy.utils.response import open_in_browser


    class Test_spider(scrapy.Spider):
        """
        Log into the provided site with Scrapy
        """

        name = 'test'
        start_urls = ['https://www.privatelenderdatafeed.com/login/']

        def parse(self, response):
            """
            Send login data and use "from_response" to pre-populate session related data as per the docs and what I need for this site
            """
            return FormRequest.from_response(
                response,
                formdata={
                    'ajaxreferred': '1',  # Not sure if I need this? It's included in the form data when I checked the site with dev tools so I'm including it
                    'email': 'email',  # Email
                    'password': 'password',  # Password
                },
                callback=self.after_login)

        def after_login(self, response):
            """
            Open browser to check status
            """
            open_in_browser(response)


I explicitly make Scrapy open the browser regardless of whether it logs into the site, so I can visually check the status. In other words, if it's still at the login page, the login failed somehow. Otherwise, if I'm logged in, I should see a different page. As it stands, it doesn't log in and I just keep seeing the login page. What is going on here?

















      python scrapy














edited Nov 20 '18 at 0:34 by curiousgeorge
asked Nov 19 '18 at 21:58 by curiousgeorge
























          1 Answer














If you look at the POST request that is sent to the website when you log in, you can see in the browser's developer tools that its cause is XHR.

[Screenshot: web developer tools]

That means it's not a "normal" HTML form submission; there is some JavaScript involved.
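To make the difference concrete, here is a rough sketch of what an XHR-style form POST can look like at the HTTP level, built with the standard library only and never actually sent. The helper name `build_xhr_login_request` is made up for illustration, and two things are assumptions rather than facts about this site: that the script marks its request with an `X-Requested-With: XMLHttpRequest` header (a common convention, not a guarantee), and that the POST goes to the same URL as the login page.

```python
from urllib.parse import urlencode
from urllib.request import Request  # urllib's Request, not scrapy.Request


def build_xhr_login_request(url: str, email: str, password: str) -> Request:
    """Build (but do not send) a form POST that identifies itself as XHR."""
    # Field names are taken from the question's form data.
    form = {"ajaxreferred": "1", "email": email, "password": password}
    body = urlencode(form).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Assumption: the site's JavaScript sends this conventional header.
        "X-Requested-With": "XMLHttpRequest",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Assumption: the POST target is the login URL from the question.
req = build_xhr_login_request(
    "https://www.privatelenderdatafeed.com/login/", "email", "password")
print(req.get_method(), req.data)
```

In Scrapy, extra headers like this can be passed through the `headers` argument of the form request. Whether the server actually checks for them is something only the request shown in the developer tools can tell you.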



To get around this, once you have submitted the POST request, you have to request the next page yourself, which means you need to know the URL to go to after logging in:

    from scrapy import Request

    # Both methods belong inside the spider class
    def parse(self, response):
        return FormRequest.from_response(
            response,
            formdata={'ajaxreferred': '1',  # Present in the form data seen in dev tools
                      'email': 'email',  # Email
                      'password': 'password'},  # Password
            callback=self.request_next_page)

    def request_next_page(self, response):
        # Hypothetical helper name; the URL below is a placeholder
        yield Request('https://after/login/url', callback=self.after_login)
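Once the post-login page has been requested, the final callback can also decide in code whether the login worked, instead of relying only on opening a browser. The following is a minimal sketch, not something taken from the site: the helper name `still_on_login_page` is hypothetical, and it assumes that a page which still contains a password input is the login page.

```python
import re

# Assumption: only the login page contains an <input type="password"> element.
_PASSWORD_INPUT = re.compile(r'<input[^>]*type=["\']?password', re.IGNORECASE)


def still_on_login_page(html: str) -> bool:
    """Return True if the HTML still looks like a login form."""
    return _PASSWORD_INPUT.search(html) is not None


# Possible use inside the spider's after_login callback:
#     if still_on_login_page(response.text):
#         self.logger.error("Login appears to have failed")
```

A check like this is only as good as the marker it looks for, so it is worth confirming against the real logged-in page before trusting it.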





























• I changed the code to include the yield Request and the appropriate URL, but now the spider doesn't run at all because it says the callback is invalid (apparently it doesn't recognize "self"). What?! Oh well, in any case, you've helped immensely. I've been looking more into Scrapy and the site, and it's just as you said: the login uses JS! No wonder! That complicates things, since I just learned that Scrapy doesn't do anything at all with JS, but now I have a solid starting point to work from. Thanks @Guillaume!!!

  – curiousgeorge
  Nov 20 '18 at 18:31





















answered Nov 20 '18 at 17:10 by Guillaume












