How to get specific lines with Jsoup

Below is the page source that I am trying to scrape with Jsoup. I am interested in extracting the following fields: "Code Number", "Date Available", "Type", "Breed", "Sex", "Age", "Weight" and "Adoption Fee". That is, I want my output to be:

    Code Number: 107796
    Date Available: 11/20/2018
    Type: Dog
    Breed: German Shepherd Dog
    Sex: Male
    Age: 2 years, 0 months
    Weight: 64.6 lbs
    Adoption Fee: $250



Source code from:
view-source:https://southwesthumane.org/adopt/dogs/dog-details/?id=84807
lines 186-215






<div id="ContentPlaceHolder_Item3_AnimalDetails_2_divDetails">
<h3>Alan</h3>
<div class="float-to-right animal-slideshow">
<div class="cycle-slideshow" data-cycle-fx="Fade" data-cycle-timeout="0" data-cycle-auto-height="container" data-cycle-pager="#adv-custom-pager" data-cycle-pager-template="<a href='#'><img src='{{src}}' width=50 height=50></a>">
<img src="http://southwesthumanepets.shelterbuddy.com/photos/lostfound/84807.jpg" />
</div>
<div id="adv-custom-pager"></div>
</div>
<div class="AnimalDetails">
<p>Alan is looking for a new best friend! Could it be you? Alan is new to the shelter and we are still getting to know his unique personality. If Alan looks like your dream dog, let the staff know you are interested in meeting him. Going to a new home can be exciting and strange for pets, so it's best for them to meet any children and other dogs in their future home. Alan can't wait to meet his forever family!</p>
<br />
<strong>Code Number: </strong>107796
<br />
<strong>Date Available: </strong>11/20/2018
<br />
<strong>Type: </strong>Dog
<br />
<strong>Breed: </strong>German Shepherd Dog
<br />
<strong>Sex: </strong>Male
<br />
<strong>Age: </strong>2 years, 0 months
<br />
<strong>Weight: </strong>64.6 lbs
<br />
<strong>Adoption Fee: </strong>$250
<br />
<br />
</div>
</div>





Here is my code so far:






    try {
        // Listing page: follow every "Details »" link to a dog's detail page.
        Document dogs = Jsoup.connect("https://southwesthumane.org/adopt/dogs/").get();
        Elements links_dogs = dogs.select(":containsOwn(Details »)");
        for (Element link : links_dogs) {
            String test = "https://southwesthumane.org" + link.attr("href");
            System.out.println("url: " + test);
            try {
                Document dog = Jsoup.connect(test).get();
                Elements name = dog.select("h3");
                Elements description = dog.select("div.Animaldetails");
                for (Element code : name) {
                    System.out.println("Name: " + code.text());
                }
                for (Element code : description) {
                    System.out.println("Description: " + code.select("p").text());
                    System.out.println(code.select("strong").first().text());
                    System.out.println(code.select("div.Animaldetails").text());
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }





This line:

    System.out.println(code.select("div.Animaldetails").text());

prints all the information I need, but I do not know how to parse each individual line, because ultimately I want to save each piece of information into a list. Any help would be greatly appreciated. Thank you for your time!










Asked Nov 20 '18 at 23:47 by Samuel Serea
























2 Answers

I checked @Eritrean's answer, but I think mine gets closer to exactly what you are looking for, and more clearly. Here is sample code that does it with Jsoup:



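    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Main {

        public static void main(String[] args) {
            try {
                String url = "https://southwesthumane.org/adopt/dogs/dog-details/?id=84807";

                Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
                // Each label sits in a <strong> that is a direct child of div.AnimalDetails.
                Elements elements = document.select("div.AnimalDetails > strong");

                for (Element element : elements) {
                    // The value is the node that immediately follows the <strong>.
                    System.out.println(element.text() + element.nextSibling().toString());
                }

            } catch (IOException e) {
                e.printStackTrace();
            }

        }
    }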


As you can see, once you establish the connection to the desired URL, you just need to select all the strong elements that are direct children of the div with class AnimalDetails.

That gives you a Jsoup Elements object, which you iterate over with a for-each loop. Each iteration yields one strong element.

For each one, retrieve the text between the tags with Jsoup's .text() method. Because of how the HTML is structured, the value you are looking for is in the node that immediately follows the strong tag.



The HTML structure of the AnimalDetails div looks like this:

    <br>
    <strong>Code Number: </strong>107796
    <br>
    <strong>Date Available: </strong>11/20/2018
    <br>
    ...
    and so on


So you get the node that follows each strong tag with Jsoup's .nextSibling() method and convert it to a String with .toString(). This, as you can see, retrieves the value you are looking for. Then you just print the label and the value together, as in the for-each loop in the code above.
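Since the question mentions storing the fields rather than printing them, here is a minimal sketch of the same nextSibling() technique that collects the pairs into a map. It parses an inline HTML string instead of fetching the page, so it runs without network access. The class name AnimalDetailsParser and the method parseDetails are hypothetical names chosen for this illustration, and the sketch assumes every value is a plain text node directly after its strong tag, as in the fragment shown in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper class for illustration; not part of Jsoup.
final class AnimalDetailsParser {

    // Maps each <strong> label to the text node that follows it, in page order.
    static Map<String, String> parseDetails(String html) {
        Document document = Jsoup.parse(html);
        Map<String, String> details = new LinkedHashMap<>();

        for (Element strong : document.select("div.AnimalDetails > strong")) {
            // Drop the trailing colon from the label, e.g. "Code Number:" -> "Code Number".
            String label = strong.text().trim();
            if (label.endsWith(":")) {
                label = label.substring(0, label.length() - 1).trim();
            }

            // Skip labels that are not directly followed by a text node.
            Node next = strong.nextSibling();
            if (next instanceof TextNode) {
                details.put(label, ((TextNode) next).text().trim());
            }
        }
        return details;
    }

    public static void main(String[] args) {
        // Assumed sample input, shortened from the fragment in the question.
        String html = "<div class=\"AnimalDetails\">"
                + "<strong>Code Number: </strong>107796<br />"
                + "<strong>Breed: </strong>German Shepherd Dog<br />"
                + "</div>";
        parseDetails(html).forEach((label, value) -> System.out.println(label + ": " + value));
    }
}
```

Reading the sibling as a TextNode and calling text() on it, rather than toString(), is a design choice of this sketch: it avoids a ClassCastException if the next node turns out to be an element, and it returns whitespace-normalized text. A LinkedHashMap keeps the fields in the order they appear on the page.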



Your output will look like this:

[image: desired scraping output]

Hope this helps! Feel free to ask if you need further information.









You can select the strong tags and, for each tag retrieved, get its nextSibling. Try it out by changing your for-each loop from:

    for (Element code : description) {
        System.out.println("Description: " + code.select("p").text());
        System.out.println(code.select("strong").first().text());
        System.out.println(code.select("div.AnimalDetails").text());
    }

to:

    for (Element code : description) {
        Elements strongs = code.select("strong");
        for (Element e : strongs) {
            System.out.println(e.text() + e.nextSibling().toString());
        }
        System.out.println();
    }
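To store the fields in a list instead of printing them, one option is to split each "Label: value" string at its first colon. The sketch below is plain Java with no Jsoup dependency. The class name DetailLines and its methods are hypothetical, and the input strings are assumed to look like the desired output in the question (a label, a colon, then the value).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Jsoup.
final class DetailLines {

    // Splits "Label: value" at the first colon; returns null if there is no colon.
    static Map.Entry<String, String> splitLine(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String label = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        return Map.entry(label, value);
    }

    // Converts every well-formed line into a label/value pair, keeping input order.
    static List<Map.Entry<String, String>> toList(List<String> lines) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        for (String line : lines) {
            Map.Entry<String, String> entry = splitLine(line);
            if (entry != null) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input matching the format requested in the question.
        List<String> lines = List.of("Code Number: 107796", "Age: 2 years, 0 months");
        for (Map.Entry<String, String> entry : toList(lines)) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Splitting only at the first colon keeps any later colons in the value intact. In the loop above, the string passed to splitLine could be the same concatenation that is currently printed.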





            share|improve this answer

























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53403313%2fhow-to-get-specific-lines-with-jsoup%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              0














              I checked @Eritrean answer, but I guess mine is a closer approach to get exactly what you are looking for in a more clear way! Here is a sample code to do exactly what you want to using JSOUP:



              public class Main {

              public static void main(String args) {
              try {
              String url = "https://southwesthumane.org/adopt/dogs/dog-details/?id=84807";

              Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
              Elements elements = document.select("div.AnimalDetails > strong");

              for (Element element : elements) {
              System.out.println(element.text() + element.nextSibling().toString());
              }

              } catch (IOException e) {
              e.printStackTrace();
              }

              }
              }


              As you can see, one you establish the connection to your desired URL you just need to select all the strong HTML tags contained inside the div HTML tag with class name AnimalDetails.



              Once you do that you are going to get an Elements object from JSOUP, and you need to loop over it using a FOR EACH loop. In which you are going to get all the elements containing a strong HTML tag.



              What you have to do now is to retrieve the text contained between those tags using the .text() selector from JSOUP, and as the HTML code is structured you need to retrieve the next element, that it's the value you are looking for.



              As the HTML structure of the AnimalDetails div is like this:



              <br>
              <strong>Code Number: </strong>107796
              <br>
              <strong>Date Available: </strong>11/20/2018
              <br>
              ...
              and so on


              You now need to get the sibling element of the strong HTML tag using the .nextSibling() selector from JSOUP and the convert it into a String using the .toString() method. This, as you can see, retrieves the value you are looking for. Then you just need to print it as your desired output as described in the new FOR EACH loop.



              Your desired output will look as it follows:



              desired scrapping output



              Hope this helped you! For further information feel free to ask me!






              share|improve this answer




























                0














                I checked @Eritrean answer, but I guess mine is a closer approach to get exactly what you are looking for in a more clear way! Here is a sample code to do exactly what you want to using JSOUP:



                public class Main {

                public static void main(String args) {
                try {
                String url = "https://southwesthumane.org/adopt/dogs/dog-details/?id=84807";

                Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
                Elements elements = document.select("div.AnimalDetails > strong");

                for (Element element : elements) {
                System.out.println(element.text() + element.nextSibling().toString());
                }

                } catch (IOException e) {
                e.printStackTrace();
                }

                }
                }


                As you can see, one you establish the connection to your desired URL you just need to select all the strong HTML tags contained inside the div HTML tag with class name AnimalDetails.



                Once you do that you are going to get an Elements object from JSOUP, and you need to loop over it using a FOR EACH loop. In which you are going to get all the elements containing a strong HTML tag.



                What you have to do now is to retrieve the text contained between those tags using the .text() selector from JSOUP, and as the HTML code is structured you need to retrieve the next element, that it's the value you are looking for.



                As the HTML structure of the AnimalDetails div is like this:



                <br>
                <strong>Code Number: </strong>107796
                <br>
                <strong>Date Available: </strong>11/20/2018
                <br>
                ...
                and so on


                You now need to get the sibling element of the strong HTML tag using the .nextSibling() selector from JSOUP and the convert it into a String using the .toString() method. This, as you can see, retrieves the value you are looking for. Then you just need to print it as your desired output as described in the new FOR EACH loop.



                Your desired output will look as it follows:



                desired scrapping output



                Hope this helped you! For further information feel free to ask me!






                share|improve this answer


























                  0












                  0








                  0







                  I checked @Eritrean answer, but I guess mine is a closer approach to get exactly what you are looking for in a more clear way! Here is a sample code to do exactly what you want to using JSOUP:



                  public class Main {

                  public static void main(String args) {
                  try {
                  String url = "https://southwesthumane.org/adopt/dogs/dog-details/?id=84807";

                  Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
                  Elements elements = document.select("div.AnimalDetails > strong");

                  for (Element element : elements) {
                  System.out.println(element.text() + element.nextSibling().toString());
                  }

                  } catch (IOException e) {
                  e.printStackTrace();
                  }

                  }
                  }


                  As you can see, one you establish the connection to your desired URL you just need to select all the strong HTML tags contained inside the div HTML tag with class name AnimalDetails.



                  Once you do that you are going to get an Elements object from JSOUP, and you need to loop over it using a FOR EACH loop. In which you are going to get all the elements containing a strong HTML tag.



                  What you have to do now is to retrieve the text contained between those tags using the .text() selector from JSOUP, and as the HTML code is structured you need to retrieve the next element, that it's the value you are looking for.



                  As the HTML structure of the AnimalDetails div is like this:



                  <br>
                  <strong>Code Number: </strong>107796
                  <br>
                  <strong>Date Available: </strong>11/20/2018
                  <br>
                  ...
                  and so on


                  You now need to get the sibling element of the strong HTML tag using the .nextSibling() selector from JSOUP and the convert it into a String using the .toString() method. This, as you can see, retrieves the value you are looking for. Then you just need to print it as your desired output as described in the new FOR EACH loop.



                  Your desired output will look as it follows:



                  desired scrapping output



                  Hope this helped you! For further information feel free to ask me!
















                  answered Nov 21 '18 at 12:38









                  alvarobartt

























                      0














                      You can select the strong HTML tags and, for each tag retrieved, get the nextSibling. Try it out by changing your for-each loop from:



                      for (Element code : description) {
                          System.out.println("Description: " + code.select("p").text());
                          System.out.println(code.select("strong").first().text());
                          System.out.println(code.select("div.AnimalDetails").text());
                      }


                      to:



                      for (Element code : description) {
                          Elements strongs = code.select("strong");
                          for (Element e : strongs) {
                              System.out.println(e.text() + e.nextSibling().toString());
                          }
                          System.out.println();
                      }













                          edited Nov 21 '18 at 12:44









                          alvarobartt











                          answered Nov 21 '18 at 11:16









                          Eritrean





























