html scraping in either batch or powershell [closed]



I need to scrape the HTML of a site that is launched from a .url file, find a certain line, and grab every line below it up to a certain point. An example of the HTML is below:

</p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
jim (you)
    password: (blank/none)
bob
    password: Littl3@birD
batman
    password: 3ndur4N(e&amp;home
dab
    password: captain

<b>Authorized Users:</b>
bag
crab
oliver
james
scott
john
apple
</pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>

I need to get all of the authorized administrators into one text file, the authorized users into a second text file, and both into a third. Can this be accomplished with just batch and PowerShell?


closed as too broad by marc_s, Squashman, Matt, Gerhard Barnard, jeb Nov 12 at 12:56

Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.

      html powershell batch-file web-scraping

      edited Nov 12 at 13:03




      asked Nov 11 at 19:13




          3 Answers






          I believe that this answer shows useful techniques, and I've verified that it works with the sample input, within the constraints stated. Do tell us (with words) if you disagree, so the answer can be improved.

          Generally, as stated, using a dedicated HTML parser is preferable, but given the easily identifiable enclosing tags in your input (assuming there'll be no variations), you can get away with a regex-based solution.

          Here's a regex-based PSv4+ solution, but note that it relies on the input containing whitespace (line breaks, leading spaces) exactly as shown in your question:

          # $html is assumed to contain the input HTML text (can be a full document).
          $admins, $users = (
            # Split the HTML text into the sections of interest.
            $html -split
              '\A.*<b>Authorized Administrators:</b>|<b>Authorized Users:</b>' `
              -ne '' `
              -replace '<.*'
          ).ForEach({
            # Extract admin lines and user lines each, as an array.
            , ($_ -split '\r?\n' -ne '')
          })

          # Clean up the $admins array and transform the username-password pairs
          # into custom objects with .username and .password properties.
          $admins = $admins -split '\s+password:\s+' -ne ''
          $i = 0
          $admins = $admins.ForEach({
            # Even positions hold a username, odd positions the matching password.
            if ($i++ % 2 -eq 0) { $co = [pscustomobject] @{ username = $_; password = '' } }
            else { $co.password = $_; $co }
          })

          # Create custom objects with the same structure for the users.
          $users = $users.ForEach({
            [pscustomobject] @{ username = $_; password = '' }
          })

          # Output to CSV files.
          $admins | Export-Csv admins.csv
          $users | Export-Csv users.csv
          $admins + $users | Export-Csv all.csv

          Assumptions are made about the desired output format (and HTML entities such as &amp; aren't decoded), given that your question doesn't flesh out the requirements.
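If decoded passwords are wanted, the entities can be translated after extraction. This is a minimal sketch, assuming the `System.Net.WebUtility` class is available (.NET Framework 4.0 or later, or .NET Core); the `$rawPasswords` array is a hypothetical stand-in for values pulled from the sample input, not something the code above produces under that name.

```powershell
# Hypothetical stand-in for password strings extracted from the sample HTML.
$rawPasswords = @('Littl3@birD', '3ndur4N(e&amp;home')

# Decode HTML entities such as &amp; into the characters they represent.
$decoded = $rawPasswords | ForEach-Object { [System.Net.WebUtility]::HtmlDecode($_) }

$decoded
```

Strings without entities pass through unchanged, so the call is safe to apply to every extracted value.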


            Here's my attempt to get what you are after.

            $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
            $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

            # get the content of the web page
            $html = (Invoke-WebRequest -Uri $url).Content

            # load the assembly to de-entify the HTML content
            Add-Type -AssemblyName System.Web
            $html = [System.Web.HttpUtility]::HtmlDecode($html)

            # get the Authorized Admins block
            if ($html -match '(?s)<b>Authorized Administrators:</b>(.+)<b>') {
                $adminblock = $matches[1].Trim()
                # inside this text block, get the admin usernames and passwords
                $admins = @()
                $regex = [regex] '(?m)^(?<name>.+)\s*password:\s+(?<password>.+)'
                $match = $regex.Match($adminblock)
                while ($match.Success) {
                    $admins += [PSCustomObject]@{
                        'Name'     = $($match.Groups['name'].Value -replace '\(you\)', '').Trim()
                        'Type'     = 'Admin'
                        # comment out this next property if you don't want passwords in the output
                        'Password' = $match.Groups['password'].Value.Trim()
                    }
                    $match = $match.NextMatch()
                }
            } else {
                Write-Warning "Could not find 'Authorized Administrators' text block."
            }

            # get the Authorized Users block
            if ($html -match '(?s)<b>Authorized Users:</b>(.+)</pre>') {
                $userblock = $matches[1].Trim()
                # inside this text block, get the authorized usernames
                $users = @()
                $regex = [regex] '(?m)(?<name>.+)'
                $match = $regex.Match($userblock)
                while ($match.Success) {
                    $users += [PSCustomObject]@{
                        'Name' = $match.Groups['name'].Value.Trim()
                        'Type' = 'User'
                    }
                    $match = $match.NextMatch()
                }
            } else {
                Write-Warning "Could not find 'Authorized Users' text block."
            }

            # write the csv files
            $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
            $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
            ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force

            When finished, you will have three CSV files:


            admins.csv

            Name   Type  Password
            ----   ----  --------
            jim    Admin (blank/none)
            bob    Admin Littl3@birD
            batman Admin 3ndur4N(e&home
            dab    Admin captain


            users.csv

            Name   Type
            ----   ----
            bag    User
            crab   User
            oliver User
            james  User
            scott  User
            john   User
            apple  User


            adminsandusers.csv

            Name   Type  Password
            ----   ----  --------
            jim    Admin (blank/none)
            bob    Admin Littl3@birD
            batman Admin 3ndur4N(e&home
            dab    Admin captain
            bag    User
            crab   User
            oliver User
            james  User
            scott  User
            john   User
            apple  User
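The `$url` placeholder in the script has to be filled from the .url shortcut somehow. Here is a minimal sketch of one way to do that, assuming the shortcut is the usual INI-style text file with a `URL=` line under `[InternetShortcut]`. The function name `Get-ShortcutUrl`, the demo file name, and the example.com address are made up for illustration.

```powershell
function Get-ShortcutUrl {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )
    # Take the first line starting with "URL=" and return what follows it.
    $line = Get-Content -LiteralPath $Path |
        Where-Object { $_ -match '^URL=' } |
        Select-Object -First 1
    if ($line) { $line.Substring(4) }
}

# Demo with a throwaway shortcut file (hypothetical content).
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-shortcut.url'
Set-Content -LiteralPath $demo -Value '[InternetShortcut]', 'URL=https://example.com/readme.html'
$url = Get-ShortcutUrl -Path $demo
Remove-Item -LiteralPath $demo

$url
```

The function returns nothing when no `URL=` line is found, so the caller may want to check `$url` before passing it to `Invoke-WebRequest`.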


              this is really rather ugly, and very emphatically fragile. a good HTML parser would be a better way to do this.

              however, presuming you aint got the resources for that, here's one way to grab the data. if you REALLY want to generate two more files [Admin & User], you can do that from this object ...

              # fake reading in a text file
              # in real life, use Get-Content
              $InStuff = @'
              </p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
              jim (you)
                  password: (blank/none)
              bob
                  password: Littl3@birD
              batman
                  password: 3ndur4N(e&amp;home
              dab
                  password: captain

              <b>Authorized Users:</b>
              bag
              crab
              oliver
              james
              scott
              john
              apple
              </pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
              '@ -split '\r?\n'   # split on either Windows or Unix line endings

              # drop closing-tag lines, indented password lines, and blank lines
              $CleanedInStuff = $InStuff.Where({
                  $_ -notmatch '^</' -and
                  $_ -notmatch '^ ' -and
                  -not [string]::IsNullOrWhiteSpace($_)
                  })

              $UserType = 'Administrator'
              $UserInfo = foreach ($CIS_Item in $CleanedInStuff) {
                  # the "<b>Authorized Users:</b>" marker switches the type for all later names
                  if ($CIS_Item.StartsWith('<b>')) {
                      $UserType = 'User'
                      continue
                      }
                  [PSCustomObject]@{
                      Name = $CIS_Item.Trim()
                      UserType = $UserType
                      }
                  }

              # on screen
              $UserInfo

              # to CSV
              $UserInfo |
                  Export-Csv -LiteralPath "$env:TEMP\LandonBB.csv" -NoTypeInformation

              on screen output ...

              Name      UserType
              ----      --------
              jim (you) Administrator
              bob       Administrator
              batman    Administrator
              dab       Administrator
              bag       User
              crab      User
              oliver    User
              james     User
              scott     User
              john      User
              apple     User

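              CSV file content ...

              "Name","UserType"
              "jim (you)","Administrator"
              "bob","Administrator"
              "batman","Administrator"
              "dab","Administrator"
              "bag","User"
              "crab","User"
              "oliver","User"
              "james","User"
              "scott","User"
              "john","User"
              "apple","User"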

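If the two extra files [Admin & User] are wanted after all, here is a rough sketch that groups the combined collection by its `UserType` property and writes one CSV per group. The three-item `$UserInfo` sample, the `LandonBB_<type>.csv` file names, and the temp-folder location are stand-ins chosen for illustration, not part of the code above.

```powershell
# Small stand-in for the $UserInfo collection built above.
$UserInfo = @(
    [PSCustomObject]@{ Name = 'jim (you)'; UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bob';       UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bag';       UserType = 'User' }
)

$outDir = [System.IO.Path]::GetTempPath()

# Group the objects by type, then write one CSV per group, named after the type.
$groups = $UserInfo | Group-Object -Property UserType
foreach ($group in $groups) {
    $file = Join-Path $outDir ('LandonBB_{0}.csv' -f $group.Name)
    $group.Group | Export-Csv -LiteralPath $file -NoTypeInformation
}
```

Grouping rather than filtering twice means a new user type would get its own file without any change to the loop.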
          First answer (the regex-based PSv4+ solution): edited Nov 12 at 12:49, answered Nov 11 at 21:50




                        up vote
                        down vote

                        Here's my attempt to get what you are after.

                        $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
                        $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

                        # get the content of the web page
                        $html = (Invoke-WebRequest -Uri $url).Content

                        # load the assembly to de-entify the HTML content
                        Add-Type -AssemblyName System.Web
                        $html = [System.Web.HttpUtility]::HtmlDecode($html)

                        # get the Authorized Admins block
                        if ($html -match '(?s)<b>Authorized Administrators:</b>(.+)<b>') {
                        $adminblock = $matches[1].Trim()
                        # inside this text block, get the admin usernames and passwords
                        $admins = @()
                        $regex = [regex] '(?m)^(?<name>.+)s*password:s+(?<password>.+)'
                        $match = $regex.Match($adminblock)
                        while ($match.Success) {
                        $admins += [PSCustomObject]@{
                        'Name' = $($match.Groups['name'].Value -replace '(you)', '').Trim()
                        'Type' = 'Admin'
                        # comment out this next property if you don't want passwords in the output
                        'Password' = $match.Groups['password'].Value.Trim()
                        $match = $match.NextMatch()

                        } else {
                        Write-Warning "Could not find 'Authorized Administrators' text block."

                        # get the Authorized Users block
                        if ($html -match '(?s)<b>Authorized Users:</b>(.+)</pre>') {
                        $userblock = $matches[1].Trim()
                        # inside this text block, get the authorized usernames
                        $users = @()
                        $regex = [regex] '(?m)(?<name>.+)'
                        $match = $regex.Match($userblock)
                        while ($match.Success) {
                        $users += [PSCustomObject]@{
                        'Name' = $match.Groups['name'].Value.Trim()
                        'Type' = 'User'
                        $match = $match.NextMatch()
                        } else {
                        Write-Warning "Could not find 'Authorized Users' text block."

                        # write the csv files
                        $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
                        $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
                        ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force

                        When finished, you will have three CSV files:


                        Name   Type  Password      
                        ---- ---- --------
                        jim Admin (blank/none)
                        bob Admin Littl3@birD
                        batman Admin 3ndur4N(e&home
                        dab Admin captain


                        Name   Type
                        ---- ----
                        bag User
                        crab User
                        oliver User
                        james User
                        scott User
                        john User
                        apple User


                        Name   Type  Password      
                        ---- ---- --------
                        jim Admin (blank/none)
                        bob Admin Littl3@birD
                        batman Admin 3ndur4N(e&home
                        dab Admin captain
                        bag User
                        crab User
                        oliver User
                        james User
                        scott User
                        john User
                        apple User

                        share|improve this answer

                          up vote
                          down vote

                          Here's my attempt to get what you are after.

                          $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
                          $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

                          # get the content of the web page
                          $html = (Invoke-WebRequest -Uri $url).Content

                          # load the assembly to de-entify the HTML content
                          Add-Type -AssemblyName System.Web
                          $html = [System.Web.HttpUtility]::HtmlDecode($html)

                          # get the Authorized Admins block
                          if ($html -match '(?s)<b>Authorized Administrators:</b>(.+)<b>') {
                          $adminblock = $matches[1].Trim()
                          # inside this text block, get the admin usernames and passwords
                          $admins = @()
                          $regex = [regex] '(?m)^(?<name>.+)s*password:s+(?<password>.+)'
                          $match = $regex.Match($adminblock)
                          while ($match.Success) {
                          $admins += [PSCustomObject]@{
                          'Name' = $($match.Groups['name'].Value -replace '(you)', '').Trim()
                          'Type' = 'Admin'
                          # comment out this next property if you don't want passwords in the output
                          'Password' = $match.Groups['password'].Value.Trim()
                          $match = $match.NextMatch()

                          } else {
                          Write-Warning "Could not find 'Authorized Administrators' text block."

                          # get the Authorized Users block
                          if ($html -match '(?s)<b>Authorized Users:</b>(.+)</pre>') {
                          $userblock = $matches[1].Trim()
                          # inside this text block, get the authorized usernames
                          $users = @()
                          $regex = [regex] '(?m)(?<name>.+)'
                          $match = $regex.Match($userblock)
                          while ($match.Success) {
                          $users += [PSCustomObject]@{
                          'Name' = $match.Groups['name'].Value.Trim()
                          'Type' = 'User'
                          $match = $match.NextMatch()
                          } else {
                          Write-Warning "Could not find 'Authorized Users' text block."

                          # write the csv files
                          $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
                          $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
                          ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force

                          When finished, you will have three CSV files:


                          Name   Type  Password      
                          ---- ---- --------
                          jim Admin (blank/none)
                          bob Admin Littl3@birD
                          batman Admin 3ndur4N(e&home
                          dab Admin captain


                          Name   Type
                          ---- ----
                          bag User
                          crab User
                          oliver User
                          james User
                          scott User
                          john User
                          apple User


                          Name   Type  Password      
                          ---- ---- --------
                          jim Admin (blank/none)
                          bob Admin Littl3@birD
                          batman Admin 3ndur4N(e&home
                          dab Admin captain
                          bag User
                          crab User
                          oliver User
                          james User
                          scott User
                          john User
                          apple User

                          share|improve this answer

                            up vote
                            down vote

                            up vote
                            down vote

                            Here's my attempt to get what you are after.

                            $url        = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
                            $outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'

                            # get the content of the web page
                            $html = (Invoke-WebRequest -Uri $url).Content

                            # load the assembly to de-entify the HTML content
                            Add-Type -AssemblyName System.Web
                            $html = [System.Web.HttpUtility]::HtmlDecode($html)

                            # start with empty collections so the exports below still work if a block is missing
                            $admins = @()
                            $users  = @()

                            # get the Authorized Admins block
                            # (.+? is lazy, so the capture stops at the first <b> that follows)
                            if ($html -match '(?s)<b>Authorized Administrators:</b>(.+?)<b>') {
                                $adminblock = $matches[1].Trim()
                                # inside this text block, get the admin usernames and passwords
                                $regex = [regex] '(?m)^(?<name>.+)\s*password:\s+(?<password>.+)'
                                $match = $regex.Match($adminblock)
                                while ($match.Success) {
                                    $admins += [PSCustomObject]@{
                                        'Name'     = $($match.Groups['name'].Value -replace '\(you\)', '').Trim()
                                        'Type'     = 'Admin'
                                        # comment out this next property if you don't want passwords in the output
                                        'Password' = $match.Groups['password'].Value.Trim()
                                    }
                                    $match = $match.NextMatch()
                                }
                            } else {
                                Write-Warning "Could not find 'Authorized Administrators' text block."
                            }

                            # get the Authorized Users block
                            # (.+? is lazy, so the capture stops at the first </pre> that follows)
                            if ($html -match '(?s)<b>Authorized Users:</b>(.+?)</pre>') {
                                $userblock = $matches[1].Trim()
                                # inside this text block, get the authorized usernames (one per non-empty line)
                                $regex = [regex] '(?m)(?<name>.+)'
                                $match = $regex.Match($userblock)
                                while ($match.Success) {
                                    $users += [PSCustomObject]@{
                                        'Name' = $match.Groups['name'].Value.Trim()
                                        'Type' = 'User'
                                    }
                                    $match = $match.NextMatch()
                                }
                            } else {
                                Write-Warning "Could not find 'Authorized Users' text block."
                            }

                            # write the csv files
                            $admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
                            $users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
                            ($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force

                            When finished, you will have three CSV files:

                            admins.csv

                            Name   Type  Password
                            ----   ----  --------
                            jim    Admin (blank/none)
                            bob    Admin Littl3@birD
                            batman Admin 3ndur4N(e&home
                            dab    Admin captain

                            users.csv

                            Name   Type
                            ----   ----
                            bag    User
                            crab   User
                            oliver User
                            james  User
                            scott  User
                            john   User
                            apple  User

                            adminsandusers.csv

                            Name   Type  Password
                            ----   ----  --------
                            jim    Admin (blank/none)
                            bob    Admin Littl3@birD
                            batman Admin 3ndur4N(e&home
                            dab    Admin captain
                            bag    User
                            crab   User
                            oliver User
                            james  User
                            scott  User
                            john   User
                            apple  User
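
The `$url` value in this answer is a placeholder that still has to be filled in by hand. Since the question says the site is launched from a .url file, here is a rough sketch of pulling the address out of the shortcut instead. It assumes the shortcut is the usual INI-style text file with a `URL=` line; `Get-ShortcutUrl`, the demo file name and the example.com address are made up for illustration and are not part of the answer.

```powershell
# Hypothetical helper: returns the address stored in a Windows .url shortcut.
function Get-ShortcutUrl {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # a .url file is INI-style text; the target sits on a line starting with "URL="
    $line = Get-Content -LiteralPath $Path |
        Where-Object { $_ -match '^\s*URL=' } |
        Select-Object -First 1

    if (-not $line) {
        throw "No URL= entry found in '$Path'."
    }

    # strip the key and return only the address
    ($line -replace '^\s*URL=', '').Trim()
}

# demo with a throwaway shortcut file (example.com is a placeholder address)
$shortcut = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-shortcut.url'
Set-Content -LiteralPath $shortcut -Value '[InternetShortcut]', 'URL=https://example.com/scores.html'

$url = Get-ShortcutUrl -Path $shortcut
$url
```

With a real shortcut, the result of `Get-ShortcutUrl` would simply be assigned to `$url` before the `Invoke-WebRequest` call.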


                            answered Nov 11 at 21:12





                                this is really rather ugly, and very emphatically fragile. a good HTML parser would be a better way to do this.

                                however, presuming you ain't got the resources for that, here's one way to grab the data. if you REALLY want to generate two more files [Admin & User], you can do that from this object ...

                                # fake reading in a text file
                                # in real life, use Get-Content
                                $InStuff = @'
                                </p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
                                jim (you)
                                    password: (blank/none)
                                bob
                                    password: Littl3@birD
                                batman
                                    password: 3ndur4N(e&amp;home
                                dab
                                    password: captain

                                <b>Authorized Users:</b>
                                bag
                                crab
                                oliver
                                james
                                scott
                                john
                                apple
                                </pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
                                '@ -split [environment]::NewLine

                                # drop lines that start with a closing tag, indented [password] lines, and blank lines
                                $CleanedInStuff = $InStuff.
                                    Where({
                                        $_ -notmatch '^</' -and
                                        $_ -notmatch '^ ' -and
                                        $_
                                    })

                                $UserType = 'Administrator'
                                $UserInfo = foreach ($CIS_Item in $CleanedInStuff)
                                {
                                    # the <b>Authorized Users:</b> header marks the switch from admins to users
                                    if ($CIS_Item.StartsWith('<b>'))
                                    {
                                        $UserType = 'User'
                                        continue
                                    }
                                    [PSCustomObject]@{
                                        Name = $CIS_Item.Trim()
                                        UserType = $UserType
                                    }
                                }

                                # on screen
                                $UserInfo

                                # to CSV
                                $UserInfo |
                                    Export-Csv -LiteralPath "$env:TEMP\LandonBB.csv" -NoTypeInformation

                                on screen output ...

                                Name      UserType
                                ----      --------
                                jim (you) Administrator
                                bob       Administrator
                                batman    Administrator
                                dab       Administrator
                                bag       User
                                crab      User
                                oliver    User
                                james     User
                                scott     User
                                john      User
                                apple     User

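                                CSV file content ...

                                "Name","UserType"
                                "jim (you)","Administrator"
                                "bob","Administrator"
                                "batman","Administrator"
                                "dab","Administrator"
                                "bag","User"
                                "crab","User"
                                "oliver","User"
                                "james","User"
                                "scott","User"
                                "john","User"
                                "apple","User"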

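as for the "two more files" mentioned at the top of this answer, here is a rough sketch of one way to split the collection into the three plain text files the question asked for. the small `$UserInfo` stand-in, the `$OutDir` folder, and the file names are all assumptions for the demo; in real use `$UserInfo` would be the collection built by the code in this answer.

```powershell
# stand-in for the $UserInfo collection built by the answer's code
$UserInfo = @(
    [PSCustomObject]@{ Name = 'jim (you)'; UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bob';       UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bag';       UserType = 'User' }
    [PSCustomObject]@{ Name = 'crab';      UserType = 'User' }
)

# hypothetical output folder; point this wherever the files should go
$OutDir = Join-Path ([System.IO.Path]::GetTempPath()) 'LandonBB'
$null = New-Item -ItemType Directory -Path $OutDir -Force

# pull just the names for each type; @() keeps a single result as an array
$AdminNames = @($UserInfo |
    Where-Object { $_.UserType -eq 'Administrator' } |
    Select-Object -ExpandProperty Name)
$UserNames = @($UserInfo |
    Where-Object { $_.UserType -eq 'User' } |
    Select-Object -ExpandProperty Name)

# one name per line in each text file
$AdminNames | Set-Content -LiteralPath (Join-Path $OutDir 'Admins.txt')
$UserNames | Set-Content -LiteralPath (Join-Path $OutDir 'Users.txt')
($AdminNames + $UserNames) | Set-Content -LiteralPath (Join-Path $OutDir 'AdminsAndUsers.txt')
```

the text files hold names only. if the type needs to stay attached to each name, the CSV export above is probably the better fit.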

                                    answered Nov 11 at 20:04



