MechanicalSoup parsing HTML
I want to use MechanicalSoup to run a search on Google, then take the first page of results and return them as a list containing each result's title and link.
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open('http://google.com')
# Fill in and submit Google's search form
browser.select_form('form[action="/search"]')
browser['q'] = "cookies"
browser.submit_selected()
# Each result title sits in an <h3>; collect the <a> tags inside each one
links = browser.get_current_page().find_all('h3')
usable = []
for link in links:
    u_link = link.find_all('a')  # find_all() always returns a list
    usable.append(u_link)
print(usable[0])
This code returns the data I want, but it looks like:
[<a href="/url?q=https://iambaker.net/cookie-recipes/&sa=U&ved=0ahUKEwjwwKe96s3eAhWUyIMKHYTgCOkQFggUMAA&usg=AOvVaw0x6Run0LppqZLnS9Sul9qH">The 50 Best <b>Cookie</b> Recipes in the World | i am baker</a>]
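The brackets in that output come from find_all('a'), which returns a list even when each <h3> holds a single <a>. On the Tag itself, get_text() gives the title and tag['href'] gives the link, but that href is a Google redirect wrapper rather than the target URL. A minimal stdlib sketch for recovering the target follows; it assumes, based only on the sample href above, that Google puts the destination in the q query parameter, and unwrap_google_redirect is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_redirect(href):
    """Return the target URL from a Google '/url?q=...' redirect link.

    Assumption: the destination is carried in the 'q' query parameter.
    Links that are not in that form are returned unchanged.
    """
    parsed = urlparse(href)
    if parsed.path == "/url":
        target = parse_qs(parsed.query).get("q")
        if target:
            return target[0]
    return href


# The href from the output above
href = ("/url?q=https://iambaker.net/cookie-recipes/&sa=U"
        "&ved=0ahUKEwjwwKe96s3eAhWUyIMKHYTgCOkQFggUMAA"
        "&usg=AOvVaw0x6Run0LppqZLnS9Sul9qH")
print(unwrap_google_redirect(href))  # https://iambaker.net/cookie-recipes/
```

Applied to a Tag from the loop, this would be unwrap_google_redirect(a['href']) alongside a.get_text() for the title.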
I've looked at the Google search results page and found that div class='r' holds the results; inside that div is an h3 class="LC201b" with the title and, below it, a cite class="iUh30" with the URL. The problem is that if I try to target this div, I get an empty result.
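One hedged way to build the title-and-link list is to try the browser layout described above first and fall back to the plain <h3><a> markup that MechanicalSoup actually received. This is a sketch, not a tested scraper: extract_results is a hypothetical name, the div.r structure (an <h3> and an <a href> somewhere inside the div) is assumed from the description in the question, and Google changes these class names without notice. It works on the BeautifulSoup object that browser.get_current_page() returns.

```python
from bs4 import BeautifulSoup


def extract_results(soup):
    """Return a list of (title, link) tuples from a parsed results page."""
    results = []
    # Layout seen in a JavaScript-enabled browser (assumed structure;
    # class names such as "r" are not stable).
    for div in soup.select("div.r"):
        h3 = div.find("h3")
        a = div.find("a", href=True)
        if h3 is not None and a is not None:
            results.append((h3.get_text().strip(), a["href"]))
    if results:
        return results
    # Fallback: markup like the question's output, where each <h3> wraps an <a>.
    for a in soup.select("h3 a[href]"):
        results.append((a.get_text().strip(), a["href"]))
    return results


if __name__ == "__main__":
    # Offline demo using markup shaped like the output shown in the question
    demo = BeautifulSoup(
        '<h3><a href="/url?q=https://iambaker.net/cookie-recipes/&amp;sa=U">'
        'The 50 Best <b>Cookie</b> Recipes in the World | i am baker</a></h3>',
        "html.parser",
    )
    print(extract_results(demo))
```

In the question's flow this would be called as extract_results(browser.get_current_page()) after submit_selected(). Links from the fallback branch are still the raw /url?q= redirects.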
python python-3.x mechanicalsoup
Most likely Google is serving different content to JavaScript-enabled browsers than to MechanicalSoup.
– Matthieu Moy
Nov 12 at 7:02
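A quick way to check that hypothesis is to see which of the class names observed in the desktop browser actually occur in the page MechanicalSoup fetched. The sketch assumes BeautifulSoup input (as returned by browser.get_current_page()); class_counts and missing_selectors are hypothetical helper names, and the expected (tag, class) pairs are copied from the question.

```python
from collections import Counter

from bs4 import BeautifulSoup

# (tag, class) pairs the question reports from the browser's view of the page
BROWSER_SELECTORS = (("div", "r"), ("h3", "LC201b"), ("cite", "iUh30"))


def class_counts(soup, tags=("div", "h3", "cite")):
    """Count (tag name, CSS class) pairs for the given tag names."""
    counts = Counter()
    for tag in soup.find_all(list(tags)):
        # BeautifulSoup returns "class" as a list (multi-valued attribute)
        for cls in tag.get("class", []):
            counts[(tag.name, cls)] += 1
    return counts


def missing_selectors(soup, expected=BROWSER_SELECTORS):
    """Return the expected (tag, class) pairs that never occur in soup."""
    counts = class_counts(soup)
    return [pair for pair in expected if counts[pair] == 0]


if __name__ == "__main__":
    # Offline demo: markup shaped like the question's output, with no div.r
    page = BeautifulSoup(
        '<h3><a href="/url?q=https://iambaker.net/cookie-recipes/">'
        'The 50 Best <b>Cookie</b> Recipes</a></h3>',
        "html.parser",
    )
    print(missing_selectors(page))
```

If missing_selectors(browser.get_current_page()) lists all three pairs, the HTML served to MechanicalSoup simply lacks that markup, and selectors copied from the browser's developer tools will keep returning nothing.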
asked Nov 12 at 3:26
cmrussell