HTML scraping in either batch or PowerShell [closed]
Score: -4
I need to scrape the HTML of a site that is opened from a .url shortcut file, find a certain line, and grab every line below it up to a certain point. An example of the HTML is below:
</p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
jim (you)
    password: (blank/none)
bob
    password: Littl3@birD
batman
    password: 3ndur4N(e&home
dab
    password: captain
<b>Authorized Users:</b>
bag
crab
oliver
james
scott
john
apple
</pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
I need to write all of the authorized administrators to one txt file, the authorized users to a second txt file, and both lists to a third. Can this be accomplished with just batch and PowerShell?
html powershell batch-file web-scraping
closed as too broad by marc_s, Squashman, Matt, Gerhard Barnard, jeb Nov 12 at 12:56
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
edited Nov 12 at 13:03 by mklement0
asked Nov 11 at 19:13 by LandonBB
3 Answers
Score: -1 (accepted answer)
I believe that this answer shows useful techniques, and I've verified that it works with the sample input, within the constraints stated. Do tell us (with words) if you disagree, so the answer can be improved.
Generally, as stated, using a dedicated HTML parser is preferable, but given the easily identifiable enclosing tags in your input (assuming there'll be no variations), you can get away with a regex-based solution.
Here's a regex-based PSv4+ solution, but note that it relies on the input containing whitespace (line breaks, leading spaces) exactly as shown in your question:
# $html is assumed to contain the input HTML text (can be a full document).
$admins, $users = (
  # Split the HTML text into the sections of interest.
  # (?s) lets . match line breaks, so the markers need not be on the first line.
  $html -split
    '(?s)\A.*<b>Authorized Administrators:</b>|<b>Authorized Users:</b>' `
    -ne '' `
    -replace '(?s)<.*'
).ForEach({
  # Extract admin lines and user lines each, as an array.
  , ($_ -split '\r?\n' -ne '')
})
# Clean up the $admins array and transform the username-password pairs
# into custom objects with .username and .password properties.
$admins = $admins -split '\s+password:\s+' -ne ''
$i = 0
# Capture the objects emitted by the loop; otherwise $admins stays an array of strings.
$admins = $admins.ForEach({
  if ($i++ % 2 -eq 0) { $co = [pscustomobject] @{ username = $_; password = '' } }
  else { $co.password = $_; $co }
})
# Create custom objects with the same structure for the users.
$users = $users.ForEach({
  [pscustomobject] @{ username = $_; password = '' }
})
# Output to CSV files.
$admins | Export-Csv admins.csv
$users | Export-Csv users.csv
@($admins) + @($users) | Export-Csv all.csv
Assumptions are made about the desired output format (and HTML entities such as &amp; aren't decoded), given that your question doesn't flesh out the requirements.
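If decoded values are wanted, one possible follow-up step is to pass each field through [System.Net.WebUtility]::HtmlDecode(). The sketch below is not part of the solution above; $sample holds made-up stand-in objects with the same .username / .password shape, and the entity-encoded password is an assumed input.

```powershell
# Stand-in objects shaped like the ones built above (hypothetical sample data).
$sample = @(
    [pscustomobject] @{ username = 'batman'; password = '3ndur4N(e&amp;home' }
    [pscustomobject] @{ username = 'dab';    password = 'captain' }
)

# Decode HTML entities in place; System.Net.WebUtility needs no Add-Type call.
foreach ($entry in $sample) {
    $entry.username = [System.Net.WebUtility]::HtmlDecode($entry.username)
    $entry.password = [System.Net.WebUtility]::HtmlDecode($entry.password)
}

$sample[0].password   # 3ndur4N(e&home
```

Decoding after the split keeps the regexes working on the raw text, so a decoded < or > can never be mistaken for a tag.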
edited Nov 12 at 12:49; answered Nov 11 at 21:50 by mklement0
Score: 1
Here's my attempt to get what you are after.
$url = '<THE URL TAKEN FROM THE .URL SHORTCUT FILE>'
$outputPath = '<THE PATH WHERE YOU WANT THE CSV FILES TO BE CREATED>'
# get the content of the web page
$html = (Invoke-WebRequest -Uri $url).Content
# load the assembly to de-entify the HTML content
Add-Type -AssemblyName System.Web
$html = [System.Web.HttpUtility]::HtmlDecode($html)
# get the Authorized Admins block
# (the lazy .+? stops at the first <b> that follows, i.e. the Users header)
if ($html -match '(?s)<b>Authorized Administrators:</b>(.+?)<b>') {
    $adminblock = $matches[1].Trim()
    # inside this text block, get the admin usernames and passwords
    $admins = @()
    $regex = [regex] '(?m)^(?<name>.+)\s*password:\s+(?<password>.+)'
    $match = $regex.Match($adminblock)
    while ($match.Success) {
        $admins += [PSCustomObject]@{
            # strip the literal "(you)" marker from the name
            'Name'     = $($match.Groups['name'].Value -replace '\(you\)', '').Trim()
            'Type'     = 'Admin'
            # comment out this next property if you don't want passwords in the output
            'Password' = $match.Groups['password'].Value.Trim()
        }
        $match = $match.NextMatch()
    }
} else {
    Write-Warning "Could not find 'Authorized Administrators' text block."
}
# get the Authorized Users block (lazy match up to the first closing </pre>)
if ($html -match '(?s)<b>Authorized Users:</b>(.+?)</pre>') {
    $userblock = $matches[1].Trim()
    # inside this text block, get the authorized usernames
    $users = @()
    $regex = [regex] '(?m)(?<name>.+)'
    $match = $regex.Match($userblock)
    while ($match.Success) {
        $users += [PSCustomObject]@{
            'Name' = $match.Groups['name'].Value.Trim()
            'Type' = 'User'
        }
        $match = $match.NextMatch()
    }
} else {
    Write-Warning "Could not find 'Authorized Users' text block."
}
# write the csv files
$admins | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'admins.csv') -NoTypeInformation -Force
$users | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'users.csv') -NoTypeInformation -Force
($admins + $users) | Export-Csv -Path $(Join-Path -Path $outputPath -ChildPath 'adminsandusers.csv') -NoTypeInformation -Force
When finished, you will have three CSV files:
admins.csv
Name   Type  Password
----   ----  --------
jim    Admin (blank/none)
bob    Admin Littl3@birD
batman Admin 3ndur4N(e&home
dab    Admin captain
users.csv
Name   Type
----   ----
bag    User
crab   User
oliver User
james  User
scott  User
john   User
apple  User
adminsandusers.csv
Name   Type  Password
----   ----  --------
jim    Admin (blank/none)
bob    Admin Littl3@birD
batman Admin 3ndur4N(e&home
dab    Admin captain
bag    User
crab   User
oliver User
james  User
scott  User
john   User
apple  User
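The $url placeholder in the script has to be filled in from the .url shortcut. Such a shortcut is a small INI-style text file whose URL= line holds the address, so one way to read it is sketched below. Get-UrlFromShortcut, the demo file name, and the example.com address are all hypothetical names chosen for illustration.

```powershell
# Hypothetical helper: return the target address stored in a Windows .url shortcut.
function Get-UrlFromShortcut {
    param([Parameter(Mandatory)][string] $Path)

    # The address sits on the first line that starts with 'URL='.
    $line = Get-Content -LiteralPath $Path |
        Where-Object { $_ -match '^\s*URL=' } |
        Select-Object -First 1

    if (-not $line) { throw "No URL= line found in '$Path'." }
    ($line -replace '^\s*URL=', '').Trim()
}

# Demo with a throwaway shortcut file (assumed content, not the real site).
$demo = Join-Path ([IO.Path]::GetTempPath()) 'demo-shortcut.url'
Set-Content -LiteralPath $demo -Value '[InternetShortcut]', 'URL=https://example.com/readme.html'

$url = Get-UrlFromShortcut -Path $demo
$url
```

The returned string can then be passed to Invoke-WebRequest -Uri in place of the placeholder.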
answered Nov 11 at 21:12 by Theo
Score: -1
this is really rather ugly, and very emphatically fragile. a good HTML parser would be a better way to do this.
however, presuming you ain't got the resources for that, here's one way to grab the data. if you REALLY want to generate two more files [Admin & User], you can do that from this object ...
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
</p><ul><li>(None)</li></ul><h2><span style="font-size:18px;">Authorized Administrators and Users</span></h2><pre><b>Authorized Administrators:</b>
jim (you)
    password: (blank/none)
bob
    password: Littl3@birD
batman
    password: 3ndur4N(e&home
dab
    password: captain
<b>Authorized Users:</b>
bag
crab
oliver
james
scott
john
apple
</pre><h2><span style="font-size:18px;">Competition Guidelines</span></h2>
'@ -split '\r?\n'
# drop closing-tag lines, the indented password lines, and empty lines
$CleanedInStuff = $InStuff.
    Where({
        $_ -notmatch '^</' -and
        $_ -notmatch '^ ' -and
        $_
        })
$UserType = 'Administrator'
$UserInfo = foreach ($CIS_Item in $CleanedInStuff)
    {
    # the "<b>Authorized Users:</b>" line marks the switch from admins to users
    if ($CIS_Item.StartsWith('<b>'))
        {
        $UserType = 'User'
        continue
        }
    [PSCustomObject]@{
        Name = $CIS_Item.Trim()
        UserType = $UserType
        }
    }
# on screen
$UserInfo
# to CSV
$UserInfo |
    Export-Csv -LiteralPath "$env:TEMP\LandonBB.csv" -NoTypeInformation
on screen output ...
Name      UserType
----      --------
jim (you) Administrator
bob       Administrator
batman    Administrator
dab       Administrator
bag       User
crab      User
oliver    User
james     User
scott     User
john      User
apple     User
CSV file content ...
"Name","UserType"
"jim (you)","Administrator"
"bob","Administrator"
"batman","Administrator"
"dab","Administrator"
"bag","User"
"crab","User"
"oliver","User"
"james","User"
"scott","User"
"john","User"
"apple","User"
answered Nov 11 at 20:04 by Lee_Dailey
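As a follow-up to the remark in the last answer that the extra Admin and User files can be generated from the same object, here is a minimal sketch of one way to do it with Group-Object. The three-entry $UserInfo below is a shortened stand-in for the real collection, and the temp-folder location and LandonBB_*.txt file names are assumptions made for illustration.

```powershell
# Shortened stand-in for the $UserInfo collection (hypothetical sample data).
$UserInfo = @(
    [PSCustomObject]@{ Name = 'jim (you)'; UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bob';       UserType = 'Administrator' }
    [PSCustomObject]@{ Name = 'bag';       UserType = 'User' }
)

# Assumed output folder; swap in any writable directory.
$OutDir = [IO.Path]::GetTempPath()

# One text file per user type, one name per line.
foreach ($Group in ($UserInfo | Group-Object -Property UserType)) {
    $File = Join-Path $OutDir ('LandonBB_{0}.txt' -f $Group.Name)
    $Group.Group.Name | Set-Content -LiteralPath $File
}

# A third file holding every name.
$UserInfo.Name | Set-Content -LiteralPath (Join-Path $OutDir 'LandonBB_All.txt')
```

Grouping on UserType avoids hard-coding the two categories, so a third type would simply produce a third file.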