Text from webpage using BeautifulSoup



























I'm trying to extract some data from https://markets.cboe.com/europe/equities/market_share/index/all/ using Python.

Specifically, I want the figure for "Market Non-displayed volume total". I've tried several approaches with BeautifulSoup, but none of them gets me there.

Any ideas?




























  • 2

    Please share what you have done so far and point out where it failed.

    – Alexis
    Nov 20 '18 at 17:25

  • I can't find anything on that page that says "Market Non-displayed volume total".

    – NotAnAmbiTurner
    Nov 20 '18 at 17:32























python http beautifulsoup
















asked Nov 20 '18 at 17:24 by L.1995




















2 Answers















































































I would suggest giving the pandas HTML reader, pd.read_html, a try. It needs an HTML parser such as lxml installed.

import pandas as pd

# Read every table at this address into a list of pandas DataFrames
results = pd.read_html('https://markets.cboe.com/europe/equities/market_share/index/all')

# Grab the second table found
df = results[1]
# Set the first column as the index
df = df.set_index(0)
# Swap rows and columns
df = df.T
# Drop any columns that have no data in them
df = df.dropna(how='all', axis=1)
# Set the column under "Displayed Price Venues" as the index
df = df.set_index('Displayed Price Venues')
# Swap rows and columns again
df = df.T

# Aesthetic: clear the index name
df.index.name = None

# Separate the three subtables from each other
displayed = df.iloc[0:18]
non_displayed = df.iloc[18:-1]
total = df.iloc[-1]
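
To see what the reshaping steps do without fetching the live page, here is a minimal sketch on a hand-made frame. The venue names and figures are invented for illustration, and the layout (a header row labelled "Displayed Price Venues" plus a section-heading row with no data) is only an assumption about what read_html returns for this page.

```python
import pandas as pd

# Invented stand-in for results[1]: no header row was detected,
# so the columns are simply numbered 0, 1, 2.
raw = pd.DataFrame([
    ["Displayed Price Venues", "Notional", "Share"],
    ["Venue A", "100", "50%"],
    ["Venue B", "60", "30%"],
    ["Non-Displayed Venues", None, None],  # section heading with no data
    ["Venue C", "40", "20%"],
    ["Total", "200", "100%"],
])

tidy = (
    raw.set_index(0)                         # row labels become the index
       .T                                    # original rows are now columns
       .dropna(how="all", axis=1)            # drops the empty section-heading row
       .set_index("Displayed Price Venues")  # header labels become the index
       .T                                    # back to one row per venue
)
tidy.index.name = None

print(tidy)
print(tidy.loc["Total", "Notional"])
```

After the chain, each venue is a row and the former header row supplies the column names, so a figure can be looked up by label with tidy.loc.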


You can also do this more compactly (the same steps, chained together):

import pandas as pd

# Read every table at this address into a list of pandas DataFrames
results = pd.read_html('https://markets.cboe.com/europe/equities/market_share/index/all')

# Do all the steps above in one go
df = results[1].set_index(0).T.dropna(how='all', axis=1).set_index('Displayed Price Venues').T

# Aesthetic: clear the index name
df.index.name = None

# Separate the three subtables from each other
displayed = df.iloc[0:18]
non_displayed = df.iloc[18:-1]
total = df.iloc[-1]
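
The positional slices depend on the table keeping exactly 18 displayed venues. As a hedged alternative, rows can be picked by label instead. The sketch below uses an invented frame and a hypothetical helper, rows_matching, and it assumes the row of interest has "non-displayed total" in its label, which may not match the live page.

```python
import pandas as pd

# Invented example data; the labels on the real page may differ.
df = pd.DataFrame(
    {"Notional": ["100", "60", "40", "200"]},
    index=["Venue A", "Venue B", "Non-displayed total", "Market total"],
)

def rows_matching(frame, text):
    """Return the rows whose index label contains `text`, ignoring case."""
    mask = frame.index.str.contains(text, case=False, regex=False)
    return frame.loc[mask]

non_displayed_total = rows_matching(df, "non-displayed total")
print(non_displayed_total)
```

Selecting by label keeps working if venues are added or removed, at the cost of depending on the label text staying the same.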
















answered Nov 20 '18 at 17:54 by jfbeltran








  • 1

    Thanks so much! Worked like a charm.

    – L.1995
    Nov 21 '18 at 16:06








































The problem is that the element ids keep changing dynamically; otherwise I would have just used them. Assuming the output value below is the one you're looking for, this should work, as long as the page content doesn't change or get shifted around.

from bs4 import BeautifulSoup as bs
import requests

url = 'https://markets.cboe.com/europe/equities/market_share/index/all/'
page = requests.get(url)
html = bs(page.text, 'lxml')
# Collect every table cell with the 'idx_val' class
total_volume = html.find_all('td', class_='idx_val')
# The figure sits at a fixed position in that list
print(total_volume[645].text)

Output:
€4,378,517,621
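
Since a fixed position such as 645 breaks as soon as a row is added, one possible variation is to find the row by its label and then read the 'idx_val' cell in that row. This is only a sketch on an inline HTML sample: the 'idx_val' class comes from the code above, while the row layout, the label text, the figures, and the helper names value_for_label and parse_euro are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup. Only the 'idx_val' class
# is taken from the answer; the rest of the structure is assumed.
SAMPLE_HTML = """
<table>
  <tr><td>Venue A</td><td class="idx_val">€100</td></tr>
  <tr><td>Non-displayed total</td><td class="idx_val">€1,234,567</td></tr>
</table>
"""

def value_for_label(html, label):
    """Return the text of the 'idx_val' cell in the row whose first cell equals `label`."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if cells and cells[0].get_text(strip=True) == label:
            value_cell = row.find("td", class_="idx_val")
            return value_cell.get_text(strip=True) if value_cell else None
    return None  # no row carries that label

def parse_euro(text):
    """Turn a string such as '€1,234,567' into an integer."""
    return int(text.replace("€", "").replace(",", ""))

figure = value_for_label(SAMPLE_HTML, "Non-displayed total")
print(figure, parse_euro(figure))
```

With the live page, page.text from requests would take the place of SAMPLE_HTML, and the label would need to match whatever text the page actually uses.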
















answered Nov 20 '18 at 17:57 by Kamikaze_goldfish





























