Text from webpage using BeautifulSoup


I'm trying to extract some data from https://markets.cboe.com/europe/equities/market_share/index/all/ using Python.

Specifically, I need the figure for "Market Non-displayed volume total". I've tried several approaches with BeautifulSoup, but none of them get me there.

Any ideas?

python http beautifulsoup

asked Nov 20 '18 at 17:24

L.1995


Please share what you have done so far and point out where it failed.

– Alexis
Nov 20 '18 at 17:25

I can't find anything on that page that says "Market Non-displayed volume total"

– NotAnAmbiTurner
Nov 20 '18 at 17:32


2 Answers

I would suggest giving the pandas HTML reader (pd.read_html) a shot:

import pandas as pd

# Read in all tables at this address as pandas DataFrames
# (read_html needs lxml, or BeautifulSoup4 plus html5lib, to be installed)
results = pd.read_html('https://markets.cboe.com/europe/equities/market_share/index/all')

# Grab the second table found
df = results[1]
# Set the first column as the index
df = df.set_index(0)
# Switch columns and indexes
df = df.T
# Drop any columns that have no data in them
df = df.dropna(how='all', axis=1)
# Set the column under "Displayed Price Venues" as the index
df = df.set_index('Displayed Price Venues')
# Switch columns and indexes again
df = df.T

# Aesthetic: I don't like having an index name
# (del df.index.name raises an error on newer pandas, so assign None instead)
df.index.name = None

# Separate the three subtables from each other
# (these row positions reflect the table layout at the time of writing)
displayed = df.iloc[0:18]
non_displayed = df.iloc[18:-1]
total = df.iloc[-1]
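The iloc[0:18] / iloc[18:-1] split only works while the table keeps exactly that number of rows. A sturdier option might be to select rows by their label text instead. Below is a minimal sketch, assuming the reshaped DataFrame has the venue/row names in its index; rows_matching is a hypothetical helper and the toy table is made up, not real CBOE data.

```python
import numpy as np
import pandas as pd


def rows_matching(df: pd.DataFrame, text: str) -> pd.DataFrame:
    """Return the rows whose index label contains `text`, ignoring case."""
    # Cast labels to str first so NaN or numeric labels don't break .str
    mask = df.index.astype(str).str.contains(text, case=False, regex=False)
    # np.asarray makes the boolean mask work whether pandas returns an Index or an ndarray
    return df.loc[np.asarray(mask, dtype=bool)]


# Made-up stand-in for the reshaped table: row names live in the index
df = pd.DataFrame(
    {"Volume": ["€1,000", "€2,000", "€500", "€3,500"]},
    index=[
        "Venue A",
        "Venue B",
        "Venue C (non-displayed)",
        "Market Non-displayed volume total",
    ],
)

print(rows_matching(df, "volume total"))
```

On the real page you would call it on the df produced above, with whatever label text the page actually uses.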

You can also do this more compactly (the same code, just without breaking the steps down):

import pandas as pd

# Read in all tables at this address as pandas DataFrames
results = pd.read_html('https://markets.cboe.com/europe/equities/market_share/index/all')

# Do all the steps above in one go
df = results[1].set_index(0).T.dropna(how='all', axis=1).set_index('Displayed Price Venues').T

# Aesthetic: remove the index name (assigning None works on all pandas versions)
df.index.name = None

# Separate the three subtables from each other
displayed = df.iloc[0:18]
non_displayed = df.iloc[18:-1]
total = df.iloc[-1]
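To see what the chained reshaping actually does, here is a sketch that runs the same chain on a tiny made-up table. The layout is an assumption meant to mimic results[1]: column 0 holds row labels, a "Displayed Price Venues" row holds the real column headers, and a hypothetical "Other Venues" heading row has no values.

```python
import pandas as pd

# Made-up stand-in for results[1] (the real table is much larger)
raw = pd.DataFrame([
    ["Displayed Price Venues", "Volume", "Share"],
    ["Venue A", "€1,000", "40%"],
    ["Other Venues", None, None],  # heading row with no data
    ["Venue B", "€1,500", "60%"],
])

df = (
    raw.set_index(0)                          # labels in column 0 become the row index
    .T                                        # each original row is now a column
    .dropna(how="all", axis=1)                # drop the empty "Other Venues" column
    .set_index("Displayed Price Venues")      # the header row becomes the index
    .T                                        # back to one row per venue
)
df.index.name = None  # the leftover index name is 0, from the original column label

print(df)
```

The result has one row per venue and the header cells ("Volume", "Share") as column names, which is why the answer can then slice rows with iloc.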

answered Nov 20 '18 at 17:54

jfbeltran


Thanks so much! Worked like a charm.

– L.1995
Nov 21 '18 at 16:06


The problem is that the ids keep changing dynamically; otherwise I would simply have selected the element by its id. Assuming the output value below is what you're looking for, this should work, as long as the page content doesn't change or get shifted around.

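from bs4 import BeautifulSoup as bs
import requests

url = 'https://markets.cboe.com/europe/equities/market_share/index/all/'
page = requests.get(url)
page.raise_for_status()  # fail early on a 4xx/5xx response
html = bs(page.text, 'lxml')  # the 'lxml' parser requires the lxml package

# Collect every <td> cell with class "idx_val"
total_volume = html.find_all('td', class_='idx_val')

# 645 is the cell's position at the time of writing; it breaks if the table changes
print(total_volume[645].text)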



Output:

€4,378,517,621
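Because both the ids and the position 645 can shift, a sturdier sketch is to find the row by its label text, read the idx_val cell from that same row, and strip the currency formatting. This assumes the label sits in the first td of a tr and the value in a td with class idx_val in the same row, which I have not checked against the live page; value_for_label is a hypothetical helper and the sample HTML is made up.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def value_for_label(html: str, label: str) -> Optional[int]:
    """Return the idx_val cell of the row whose first <td> contains `label`, as an int.

    Assumes whole, non-negative amounts such as "€4,378,517,621": every
    non-digit character (currency sign, thousands separators) is stripped.
    """
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Skip rows whose first cell doesn't mention the label (case-insensitive)
        if not cells or label.lower() not in cells[0].get_text(strip=True).lower():
            continue
        value_cell = row.find("td", class_="idx_val")
        if value_cell is None:
            return None
        digits = re.sub(r"\D", "", value_cell.get_text())
        return int(digits) if digits else None
    return None


# Made-up HTML mimicking the assumed row layout
sample = """
<table>
  <tr><td>Venue A</td><td class="idx_val">€1,000</td></tr>
  <tr><td>Market Non-displayed volume total</td><td class="idx_val">€123,456,789</td></tr>
</table>
"""
print(value_for_label(sample, "non-displayed volume total"))  # prints 123456789
```

On the live page you would pass page.text from the requests.get call above instead of the sample string.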

answered Nov 20 '18 at 17:57

Kamikaze_goldfish

