HTML scraping in either batch or PowerShell [closed]











I need to scrape the HTML of a site that is launched from a .url file, find a certain line, and grab every line below it up to a certain point. An example of the HTML is below:



</p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
jim (you)
password: (blank/none)
bob
password: Littl3@birD
batman
password: 3ndur4N(e&amp;home
dab
password: captain

<b>Authorized Users:</b>
bag
crab
oliver
james
scott
john
apple
</pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>


I need to get all of the authorized administrators into one txt file, the authorized users into another txt file, and both into a third txt file. Could this be accomplished with just batch and PowerShell?










closed as too broad by marc_s, Squashman, Matt, Gerhard Barnard, jeb Nov 12 at 12:56


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.



















html powershell batch-file web-scraping

asked Nov 11 at 19:13 by LandonBB
edited Nov 12 at 13:03 by mklement0

          3 Answers



          accepted










          I believe that this answer shows useful techniques, and I've verified that it works with the sample input, within the constraints stated. Do tell us (with words) if you disagree, so the answer can be improved.



          Generally, as stated, using a dedicated HTML parser is preferable, but given the easily identifiable enclosing tags in your input (assuming there'll be no variations), you can get away with a regex-based solution.



          Here's a regex-based PSv4+ solution, but note that it relies on the input containing whitespace (line breaks, leading spaces) exactly as shown in your question:



          # $html is assumed to contain the input HTML text (can be a full document).
          $admins, $users = (
            # Split the HTML text into the sections of interest.
            # (?s) lets . match line breaks, so surrounding document text is handled too.
            $html -split
              '(?s)\A.*<b>Authorized Administrators:</b>|<b>Authorized Users:</b>' `
              -ne '' `
              -replace '(?s)<.*'
          ).ForEach({
            # Extract admin lines and user lines each, as an array.
            , ($_ -split '\r?\n' -ne '')
          })

          # Clean up the $admins array and transform the username-password pairs
          # into custom objects with .username and .password properties.
          $admins = $admins -split '\s+password:\s+' -ne ''
          $i = 0
          $admins = $admins.ForEach({
            if ($i++ % 2 -eq 0) { $co = [pscustomobject] @{ username = $_; password = '' } }
            else { $co.password = $_; $co }
          })

          # Create custom objects with the same structure for the users.
          $users = $users.ForEach({
            [pscustomobject] @{ username = $_; password = '' }
          })

          # Output to CSV files.
          $admins | Export-Csv admins.csv
          $users | Export-Csv users.csv
          $admins + $users | Export-Csv all.csv


          Assumptions are made about the desired output format (and HTML entities such as &amp; aren't decoded), given that your question doesn't flesh out the requirements.
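If decoded values are wanted, one possible follow-up step is sketched below. It assumes objects shaped like the `$admins` entries above (`.username` / `.password`); the two sample objects are made up from the question's data purely for illustration, and `[System.Net.WebUtility]::HtmlDecode` is used as one of several ways to decode entities.

```powershell
# Illustrative stand-ins for the objects produced by the script above.
$admins = @(
    [pscustomobject] @{ username = 'batman'; password = '3ndur4N(e&amp;home' }
    [pscustomobject] @{ username = 'dab';    password = 'captain' }
)

# Decode HTML entities (e.g. &amp; becomes &) in each password, in place.
foreach ($admin in $admins) {
    $admin.password = [System.Net.WebUtility]::HtmlDecode($admin.password)
}

$admins
```

Values without entities, such as `captain`, pass through unchanged.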



















            Here's my attempt to get what you are after.



            $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
            $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

            # get the content of the web page
            $html = (Invoke-WebRequest -Uri $url).Content

            # load the assembly to de-entify the HTML content
            Add-Type -AssemblyName System.Web
            $html = [System.Web.HttpUtility]::HtmlDecode($html)

            # get the Authorized Admins block (lazy match: stop at the next <b> tag)
            if ($html -match '(?s)<b>Authorized Administrators:</b>(.+?)<b>') {
                $adminblock = $matches[1].Trim()
                # inside this text block, get the admin usernames and passwords
                $admins = @()
                $regex = [regex] '(?m)^(?<name>.+)\s*password:\s+(?<password>.+)'
                $match = $regex.Match($adminblock)
                while ($match.Success) {
                    $admins += [PSCustomObject]@{
                        'Name' = $($match.Groups['name'].Value -replace '\(you\)', '').Trim()
                        'Type' = 'Admin'
                        # comment out this next property if you don't want passwords in the output
                        'Password' = $match.Groups['password'].Value.Trim()
                    }
                    $match = $match.NextMatch()
                }
            } else {
                Write-Warning "Could not find 'Authorized Administrators' text block."
            }

            # get the Authorized Users block (lazy match: stop at the first </pre>)
            if ($html -match '(?s)<b>Authorized Users:</b>(.+?)</pre>') {
                $userblock = $matches[1].Trim()
                # inside this text block, get the authorized usernames
                $users = @()
                $regex = [regex] '(?m)(?<name>.+)'
                $match = $regex.Match($userblock)
                while ($match.Success) {
                    $users += [PSCustomObject]@{
                        'Name' = $match.Groups['name'].Value.Trim()
                        'Type' = 'User'
                    }
                    $match = $match.NextMatch()
                }
            } else {
                Write-Warning "Could not find 'Authorized Users' text block."
            }

            # write the csv files
            $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
            $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
            ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force


            When finished, you will have three CSV files:



            admins.csv



            Name   Type  Password
            ----   ----  --------
            jim    Admin (blank/none)
            bob    Admin Littl3@birD
            batman Admin 3ndur4N(e&home
            dab    Admin captain


            users.csv



            Name   Type
            ----   ----
            bag    User
            crab   User
            oliver User
            james  User
            scott  User
            john   User
            apple  User


            adminsandusers.csv



            Name   Type  Password
            ----   ----  --------
            jim    Admin (blank/none)
            bob    Admin Littl3@birD
            batman Admin 3ndur4N(e&home
            dab    Admin captain
            bag    User
            crab   User
            oliver User
            james  User
            scott  User
            john   User
            apple  User
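The `$url` placeholder in the script has to be filled in from the .url shortcut somehow. A minimal sketch of one way to do that follows, assuming the shortcut is the usual INI-style text file with a `URL=` line under `[InternetShortcut]`. The file name `example.url` and the address in it are hypothetical and are created here only so the snippet runs on its own.

```powershell
# Hypothetical shortcut path; point this at the real .url file instead.
$shortcutPath = Join-Path ([System.IO.Path]::GetTempPath()) 'example.url'

# Demonstration only: write a sample shortcut so the snippet is self-contained.
Set-Content -LiteralPath $shortcutPath -Value @(
    '[InternetShortcut]'
    'URL=https://example.org/readme.html'
)

# Take the first line that starts with URL= and strip that prefix.
$url = Get-Content -LiteralPath $shortcutPath |
    Where-Object { $_ -match '^URL=' } |
    ForEach-Object { $_ -replace '^URL=', '' } |
    Select-Object -First 1

$url
```

The resulting `$url` can then be passed to `Invoke-WebRequest -Uri $url` as in the script above.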


















              This is really rather ugly, and very emphatically fragile. A good HTML parser would be a better way to do this.



              However, presuming you don't have the resources for that, here's one way to grab the data. If you REALLY want to generate two more files [Admin & User], you can do that from this object ...



              # fake reading in a text file
              # in real life, use Get-Content
              $InStuff = @'
              </p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
              jim (you)
              password: (blank/none)
              bob
              password: Littl3@birD
              batman
              password: 3ndur4N(e&amp;home
              dab
              password: captain

              <b>Authorized Users:</b>
              bag
              crab
              oliver
              james
              scott
              john
              apple
              </pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
              '@ -split '\r?\n'

              # drop lines that start with a closing tag, password lines, and empty lines
              $CleanedInStuff = $InStuff.
                  Where({
                      $_ -notmatch '^</' -and
                      $_ -notmatch '^\s*password:' -and
                      $_
                  })

              $UserType = 'Administrator'
              $UserInfo = foreach ($CIS_Item in $CleanedInStuff)
                  {
                  # the only remaining <b> line is the "Authorized Users:" header
                  if ($CIS_Item.StartsWith('<b>'))
                      {
                      $UserType = 'User'
                      continue
                      }
                  [PSCustomObject]@{
                      Name = $CIS_Item.Trim()
                      UserType = $UserType
                      }
                  }

              # on screen
              $UserInfo

              # to CSV
              $UserInfo |
                  Export-Csv -LiteralPath "$env:TEMP\LandonBB.csv" -NoTypeInformation


              on screen output ...



              Name      UserType
              ----      --------
              jim (you) Administrator
              bob       Administrator
              batman    Administrator
              dab       Administrator
              bag       User
              crab      User
              oliver    User
              james     User
              scott     User
              john      User
              apple     User


              CSV file content ...



              "Name","UserType"
              "jim (you)","Administrator"
              "bob","Administrator"
              "batman","Administrator"
              "dab","Administrator"
              "bag","User"
              "crab","User"
              "oliver","User"
              "james","User"
              "scott","User"
              "john","User"
              "apple","User"

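As for generating the separate Admin and User files mentioned at the top of this answer, here is a rough sketch of how that could look, writing plain txt files as the question asked. The `$UserInfo` sample below is a shortened stand-in for the real collection, and the output folder and file names (`admins.txt`, `users.txt`, `all.txt`) are assumptions, not anything from the question.

```powershell
# Shortened stand-in for the $UserInfo collection built by the script above.
$UserInfo = @(
    [PSCustomObject]@{ Name = 'bob'; UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'dab'; UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bag'; UserType = 'User' }
)

# Hypothetical output folder; replace with wherever the files should go.
$outDir = [System.IO.Path]::GetTempPath()

# Pull out just the names for each type; @() keeps a single result an array.
$adminNames = @($UserInfo | Where-Object { $_.UserType -eq 'Administrator' } | ForEach-Object { $_.Name })
$userNames  = @($UserInfo | Where-Object { $_.UserType -eq 'User' } | ForEach-Object { $_.Name })

# One name per line in each text file.
Set-Content -LiteralPath (Join-Path $outDir 'admins.txt') -Value $adminNames
Set-Content -LiteralPath (Join-Path $outDir 'users.txt')  -Value $userNames
Set-Content -LiteralPath (Join-Path $outDir 'all.txt')    -Value ($adminNames + $userNames)
```

Filtering on the `UserType` property keeps the split independent of the order in which the lines were parsed.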




Accepted answer: answered Nov 11 at 21:50 by mklement0, edited Nov 12 at 12:49
























                        up vote
                        1
                        down vote













                        Here's my attempt to get what you are after.



                        $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
                        $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

                        # get the content of the web page
                        $html = (Invoke-WebRequest -Uri $url).Content

                        # load the assembly to de-entify the HTML content
                        Add-Type -AssemblyName System.Web
                        $html = [System.Web.HttpUtility]::HtmlDecode($html)

                        # get the Authorized Admins block
                        if ($html -match '(?s)<b>Authorized Administrators:</b>(.+)<b>') {
                        $adminblock = $matches[1].Trim()
                        # inside this text block, get the admin usernames and passwords
                        $admins = @()
                        $regex = [regex] '(?m)^(?<name>.+)s*password:s+(?<password>.+)'
                        $match = $regex.Match($adminblock)
                        while ($match.Success) {
                        $admins += [PSCustomObject]@{
                        'Name' = $($match.Groups['name'].Value -replace '(you)', '').Trim()
                        'Type' = 'Admin'
                        # comment out this next property if you don't want passwords in the output
                        'Password' = $match.Groups['password'].Value.Trim()
                        }
                        $match = $match.NextMatch()
                        }

                        } else {
                        Write-Warning "Could not find 'Authorized Administrators' text block."
                        }

                        # get the Authorized Users block
                        if ($html -match '(?s)<b>Authorized Users:</b>(.+)</pre>') {
                        $userblock = $matches[1].Trim()
                        # inside this text block, get the authorized usernames
                        $users = @()
                        $regex = [regex] '(?m)(?<name>.+)'
                        $match = $regex.Match($userblock)
                        while ($match.Success) {
                        $users += [PSCustomObject]@{
                        'Name' = $match.Groups['name'].Value.Trim()
                        'Type' = 'User'
                        }
                        $match = $match.NextMatch()
                        }
                        } else {
                        Write-Warning "Could not find 'Authorized Users' text block."
                        }

                        # write the csv files
                        $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
                        $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
                        ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force


                        When finished, you will have three CSV files:



                        admins.csv



                        Name   Type  Password      
                        ---- ---- --------
                        jim Admin (blank/none)
                        bob Admin Littl3@birD
                        batman Admin 3ndur4N(e&home
                        dab Admin captain


                        users.csv



                        Name   Type
                        ---- ----
                        bag User
                        crab User
                        oliver User
                        james User
                        scott User
                        john User
                        apple User


                        adminsandusers.csv



                        Name   Type  Password      
                        ---- ---- --------
                        jim Admin (blank/none)
                        bob Admin Littl3@birD
                        batman Admin 3ndur4N(e&home
                        dab Admin captain
                        bag User
                        crab User
                        oliver User
                        james User
                        scott User
                        john User
                        apple User





                        share|improve this answer

























                          up vote
                          1
                          down vote













                          Here's my attempt to get what you are after.



                          $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
                          $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

                          # get the content of the web page
                          $html = (Invoke-WebRequest -Uri $url).Content

                          # load the assembly to de-entify the HTML content
                          Add-Type -AssemblyName System.Web
                          $html = [System.Web.HttpUtility]::HtmlDecode($html)

                          # get the Authorized Admins block (lazy match, so it stops at the next <b> tag)
                          if ($html -match '(?s)<b>Authorized Administrators:</b>(.+?)<b>') {
                              $adminblock = $matches[1].Trim()
                              # inside this text block, get the admin usernames and passwords
                              $admins = @()
                              $regex = [regex] '(?m)^(?<name>.+)\s*password:\s+(?<password>.+)'
                              $match = $regex.Match($adminblock)
                              while ($match.Success) {
                                  $admins += [PSCustomObject]@{
                                      # the parentheses are escaped so that the literal text '(you)' is removed
                                      'Name' = $($match.Groups['name'].Value -replace '\(you\)', '').Trim()
                                      'Type' = 'Admin'
                                      # comment out this next property if you don't want passwords in the output
                                      'Password' = $match.Groups['password'].Value.Trim()
                                  }
                                  $match = $match.NextMatch()
                              }
                          } else {
                              Write-Warning "Could not find 'Authorized Administrators' text block."
                          }

                          # get the Authorized Users block (lazy match, so it stops at the first </pre>)
                          if ($html -match '(?s)<b>Authorized Users:</b>(.+?)</pre>') {
                              $userblock = $matches[1].Trim()
                              # inside this text block, get the authorized usernames
                              $users = @()
                              $regex = [regex] '(?m)(?<name>.+)'
                              $match = $regex.Match($userblock)
                              while ($match.Success) {
                                  $users += [PSCustomObject]@{
                                      'Name' = $match.Groups['name'].Value.Trim()
                                      'Type' = 'User'
                                  }
                                  $match = $match.NextMatch()
                              }
                          } else {
                              Write-Warning "Could not find 'Authorized Users' text block."
                          }

                          # write the csv files
                          $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
                          $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
                          ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force


                          When finished, you will have three CSV files:

                          admins.csv

                          Name   Type  Password
                          ----   ----  --------
                          jim    Admin (blank/none)
                          bob    Admin Littl3@birD
                          batman Admin 3ndur4N(e&home
                          dab    Admin captain

                          users.csv

                          Name   Type
                          ----   ----
                          bag    User
                          crab   User
                          oliver User
                          james  User
                          scott  User
                          john   User
                          apple  User

                          adminsandusers.csv

                          Name   Type  Password
                          ----   ----  --------
                          jim    Admin (blank/none)
                          bob    Admin Littl3@birD
                          batman Admin 3ndur4N(e&home
                          dab    Admin captain
                          bag    User
                          crab   User
                          oliver User
                          james  User
                          scott  User
                          john   User
                          apple  User
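The script above leaves $url as a placeholder to be filled in from the .url shortcut. A minimal sketch of reading it automatically, assuming the shortcut is a standard Windows Internet Shortcut (an INI-style text file with a URL= line). The function name Get-ShortcutUrl, the demo file name and the example.com address are made up for illustration and are not part of the answer:

```powershell
# Hypothetical helper: returns the target address stored in a .url shortcut file.
function Get-ShortcutUrl {
    param([Parameter(Mandatory)][string]$Path)

    # .url files are INI-style text; the target sits on a line such as URL=https://example.com/
    foreach ($line in Get-Content -LiteralPath $Path) {
        if ($line -match '^\s*URL\s*=\s*(.+?)\s*$') {
            return $Matches[1]
        }
    }
    Write-Warning "No URL= entry found in '$Path'."
}

# Self-contained demo: write a throwaway shortcut with a placeholder address, then read it back
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-shortcut.url'
Set-Content -LiteralPath $demo -Value '[InternetShortcut]', 'URL=https://example.com/readme.html'
$url = Get-ShortcutUrl -Path $demo
$url
Remove-Item -LiteralPath $demo
```

In real use, point -Path at the actual shortcut and pass the result to Invoke-WebRequest as $url.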















                            answered Nov 11 at 21:12









                            Theo

                            3,0711518


























                                up vote
                                -1
                                down vote













                                This is rather ugly and very fragile. A good HTML parser would be a better way to do this.

                                However, presuming you don't have the resources for that, here's one way to grab the data. If you REALLY want to generate two more files [Admin & User], you can do that from this object ...



                                # fake reading in a text file
                                # in real life, use Get-Content
                                $InStuff = @'
                                </p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
                                jim (you)
                                password: (blank/none)
                                bob
                                password: Littl3@birD
                                batman
                                password: 3ndur4N(e&amp;home
                                dab
                                password: captain

                                <b>Authorized Users:</b>
                                bag
                                crab
                                oliver
                                james
                                scott
                                john
                                apple
                                </pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
                                '@ -split '\r?\n'

                                # drop lines that start with a closing tag, password lines, and blank lines
                                $CleanedInStuff = $InStuff.Where({
                                    $_ -notmatch '^</' -and
                                    $_ -notmatch '^\s*password:' -and
                                    $_
                                })

                                # everything before the next <b> heading is an administrator
                                $UserType = 'Administrator'
                                $UserInfo = foreach ($CIS_Item in $CleanedInStuff)
                                {
                                    if ($CIS_Item.StartsWith('<b>'))
                                    {
                                        # the "Authorized Users:" heading switches the type
                                        $UserType = 'User'
                                        continue
                                    }
                                    [PSCustomObject]@{
                                        Name = $CIS_Item.Trim()
                                        UserType = $UserType
                                    }
                                }

                                # on screen
                                $UserInfo

                                # to CSV
                                $UserInfo |
                                    Export-Csv -LiteralPath "$env:TEMP\LandonBB.csv" -NoTypeInformation


                                on screen output ...



                                Name      UserType
                                ----      --------
                                jim (you) Administrator
                                bob       Administrator
                                batman    Administrator
                                dab       Administrator
                                bag       User
                                crab      User
                                oliver    User
                                james     User
                                scott     User
                                john      User
                                apple     User


                                CSV file content ...



                                "Name","UserType"
                                "jim (you)","Administrator"
                                "bob","Administrator"
                                "batman","Administrator"
                                "dab","Administrator"
                                "bag","User"
                                "crab","User"
                                "oliver","User"
                                "james","User"
                                "scott","User"
                                "john","User"
                                "apple","User"

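The two extra files mentioned at the top of this answer could be produced from the same object. A rough sketch, assuming objects with the Name and UserType properties used above. The stand-in data, the $OutDir folder and the output file names are invented for illustration:

```powershell
# Stand-in for the $UserInfo collection built above (same property names, shortened data)
$UserInfo = @(
    [PSCustomObject]@{ Name = 'bob';  UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bag';  UserType = 'User' }
    [PSCustomObject]@{ Name = 'crab'; UserType = 'User' }
)

# Hypothetical output folder; change to suit
$OutDir = [System.IO.Path]::GetTempPath()

# One text file per UserType value, named after the type (Administrators.txt, Users.txt)
foreach ($Group in ($UserInfo | Group-Object -Property UserType)) {
    $OutFile = Join-Path $OutDir ('{0}s.txt' -f $Group.Name)
    $Group.Group.Name | Set-Content -LiteralPath $OutFile
}

# Combined list holding both types
$UserInfo.Name | Set-Content -LiteralPath (Join-Path $OutDir 'AdminsAndUsers.txt')
```

Group-Object keeps this independent of how many user types there are; with only two types, two Where-Object filters would work just as well.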














                                    answered Nov 11 at 20:04









                                    Lee_Dailey

                                    1,08776


















