Remove recurring text strings [closed]












I am new to R and have searched the forum for almost 2 hours now without getting it to work for me.



My problem: I have a long text string scraped from the internet. The scrape also picked up the embed code for images. Each embed starts with "Embed from Getty Images" and ends with "false })});n". I would like to remove everything between those strings. I have tried gsub() as follows:



AmericanTexts3 <- gsub("Embed.*})});n", "", AmericanTexts)


But this removes everything between the first picture and the last picture. Does anyone know how to solve this?










closed as off-topic by Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob Nov 16 '18 at 12:24


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob

If this question can be reworded to fit the rules in the help center, please edit the question.


















r regex

edited Nov 16 '18 at 8:24 by snoram

asked Nov 16 '18 at 7:57 by Victor




1 Answer














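You need to use a non-greedy regular expression.

Try

AmericanTexts3 <- gsub("Embed.*?})});n", "", AmericanTexts)

The ? makes the preceding .* non-greedy: it matches as few characters as possible, so each match stops at the first occurrence of the closing string after "Embed". Only the text between each pair of markers is removed, not everything from the first picture to the last.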
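
To see the difference, here is a minimal sketch on a made-up string. The sample text and the names sample_text, greedy and lazy are hypothetical, not the asker's data. The sketch also makes a few assumptions that are not in the answer above: it uses perl = TRUE, escapes the braces and parentheses defensively, and adds (?s) so that . also matches newlines under PCRE.

```r
# Hypothetical sample: three paragraphs separated by two image embeds.
sample_text <- paste0(
  "First paragraph. ",
  "Embed from Getty Images window.gie(function(){ caption: false })});n",
  "Second paragraph. ",
  "Embed from Getty Images window.gie(function(){ caption: false })});n",
  "Third paragraph."
)

# Greedy .* runs to the LAST closing marker, so the text between
# the two embeds is removed as well.
greedy <- gsub("(?s)Embed from Getty Images.*\\}\\)\\}\\);n", "",
               sample_text, perl = TRUE)

# Non-greedy .*? stops at the FIRST closing marker after each
# opening marker, so each embed is removed on its own.
lazy <- gsub("(?s)Embed from Getty Images.*?\\}\\)\\}\\);n", "",
             sample_text, perl = TRUE)

print(greedy)
print(lazy)
```

With the greedy pattern the middle paragraph disappears along with both embeds; with the non-greedy pattern all three paragraphs survive.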






answered Nov 16 '18 at 8:05 by LAP














