How to remove lines that start with the same letters (sequence) in a txt file?
#!/usr/bin/env python
FILE_NAME = "testprecomb.txt"
NR_MATCHING_CHARS = 5
lines = set()
with open(FILE_NAME, "r") as inF:
    for line in inF:
        line = line.strip()
        if line == "": continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if not (beginOfSequence in lines):
            print(line)
            lines.add(beginOfSequence)
This is the code I have right now, but it is not working. I have a file containing lines of DNA that sometimes start with the same sequence (or pattern of letters). I need to write code that finds all lines of DNA that start with the same letters (perhaps the same 10 characters) and deletes one of the lines.
Example (issue):
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT
What I need as the result after one is taken out of the file:
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
(no third line)
python
1
It looks like you want lines.append(line) instead of lines.add(beginOfSequence)
– ritlew
Nov 20 '18 at 18:12
4
What's the issue? I got the output you showed as correct.
– Filip Młynarski
Nov 20 '18 at 18:18
Traceback (most recent call last): File "./RemoveDuplicate.py", line 14, in <module> lines.append(line) AttributeError: 'set' object has no attribute 'append' @FilipMłynarski
– Alpa Luca
Nov 20 '18 at 18:22
Change lines = set() to lines = []. And keep in mind that if you store whole lines instead of the beginnings of lines in your lines list, the code won't work properly.
– Filip Młynarski
Nov 20 '18 at 18:25
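To illustrate the comments above: a set has add() but no append(), which is what the traceback reports. The sketch below uses made-up sample strings (not the asker's file) and hypothetical variable names; it also shows why the collection should hold the 5-character prefixes rather than whole lines.

```python
NR_MATCHING_CHARS = 5

# Hypothetical sample lines, chosen only for illustration.
sample = ["GTTATATAAT", "GTTATATAGT", "CCTGGATGGC"]

seen = set()
try:
    seen.append("GTTAT")  # sets have no append(); this raises AttributeError
except AttributeError as exc:
    error_message = str(exc)

# Tracking prefixes: the second line shares "GTTAT" with the first and is skipped.
seen_prefixes = set()
kept_by_prefix = []
for line in sample:
    prefix = line[:NR_MATCHING_CHARS]
    if prefix not in seen_prefixes:
        kept_by_prefix.append(line)
        seen_prefixes.add(prefix)

# Tracking whole lines while still testing the prefix: a 5-character prefix
# never equals a stored 10-character line, so every line is kept.
seen_lines = []
kept_by_line = []
for line in sample:
    prefix = line[:NR_MATCHING_CHARS]
    if prefix not in seen_lines:
        kept_by_line.append(line)
        seen_lines.append(line)
```

With prefixes stored, only the first GTTAT line survives; with whole lines stored, no duplicate is ever detected.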
edited Nov 20 '18 at 18:09 by eyllanesc
asked Nov 20 '18 at 18:08 by Alpa Luca
2 Answers
I think your set logic is correct. You are just missing the portion that will save the lines you want to write back into the file. I am guessing you tried this with a separate list that you forgot to add here since you are using append somewhere.
FILE_NAME = "sample_file.txt"
NR_MATCHING_CHARS = 5
lines = set()
output_lines = []  # keep track of lines you want to keep
with open(FILE_NAME, "r") as inF:
    for line in inF:
        line = line.strip()
        if line == "": continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if not (beginOfSequence in lines):
            output_lines.append(line + '\n')  # add line to list, newline needed since we will write to file
            lines.add(beginOfSequence)
print(output_lines)
with open(FILE_NAME, 'w') as f:
    f.writelines(output_lines)  # write it out to the file
Hi, first I would like to say thank you so much for your help! If some of my strands are longer than others, would there be an easy way for me to tell Python to remove the shorter of the matching strands?
– Alpa Luca
Nov 20 '18 at 18:54
Well, it would depend on how many of the shorter ones you want to remove. For example: A, AB, and ABC all share the prefix. Do you only want A? You can try storing all the matches in a list value in a dictionary instead. Something like d[beginOfSequence] = [line1, line2, ...]. At the end of your iteration, just scoop out the top x short ones from each dictionary list.
– LeKhan9
Nov 20 '18 at 19:15
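A minimal sketch of the dictionary idea from the comment above, assuming the goal is to keep only the longest strand for each prefix. It is a simplified variation: instead of a list of all matches per prefix, the dictionary holds just the longest line seen so far. The function name keep_longest_per_prefix is hypothetical, and the sample is the question's example with the *** markers removed.

```python
NR_MATCHING_CHARS = 5


def keep_longest_per_prefix(lines, n=NR_MATCHING_CHARS):
    """Return one line per n-character prefix, keeping the longest one."""
    longest = {}  # prefix -> longest line seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        prefix = line[:n]
        # Replace the stored line only if the new one is strictly longer,
        # so the first line wins when lengths are equal.
        if prefix not in longest or len(line) > len(longest[prefix]):
            longest[prefix] = line
    # Dicts preserve insertion order (Python 3.7+), so prefixes come out
    # in the order they first appeared.
    return list(longest.values())


sample = [
    "CCTGGATGGCTTATATAAGATGTTAT",
    "GTTATATAATATACCACCGGGCTGCTT",
    "GTTATATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT",
]
kept = keep_longest_per_prefix(sample)
```

Here kept holds the first and third sample lines, because the third is the longer of the two GTTAT strands. Keeping a list per prefix, as the comment suggests, would be the better fit if more than one strand per prefix should survive.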
Your approach has a few problems. First, I would avoid naming file variables inF, as this can be confused with inf. Descriptive names are better: testFile, for instance. Also, testing for empty strings using equality misses a few important edge cases (what if line is None, for instance?); use the not keyword instead. As for your actual problem, you're not doing anything based on that set membership:
FILE_NAME = "testprecomb.txt"
NR_MATCHING_CHARS = 5
prefixCache = set()  # prefixes seen so far
data = []  # lines to keep
with open(FILE_NAME, "r") as testFile:
    for line in testFile:
        line = line.strip()
        if not line:
            continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if (beginOfSequence in prefixCache):
            continue
        else:
            print(line)
            data.append(line)
            prefixCache.add(beginOfSequence)
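The snippet above collects the kept lines in data but stops short of saving them. One possible way to finish the job is sketched below, assuming the result should go to a separate output file so the input is not overwritten. The names dedupe_by_prefix, dedupe_file and deduplicated.txt are hypothetical, and the sample content is made up for the demonstration.

```python
import os
import tempfile


def dedupe_by_prefix(lines, n=5):
    """Keep the first line seen for each n-character prefix."""
    seen = set()
    kept = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        prefix = line[:n]
        if prefix in seen:
            continue  # a line with this prefix was already kept
        seen.add(prefix)
        kept.append(line)
    return kept


def dedupe_file(in_path, out_path, n=5):
    """Read in_path, drop lines with repeated prefixes, write to out_path."""
    with open(in_path, "r") as src:
        kept = dedupe_by_prefix(src, n)
    with open(out_path, "w") as dst:
        dst.writelines(line + "\n" for line in kept)
    return kept


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "testprecomb.txt")
    out_path = os.path.join(tmp, "deduplicated.txt")
    with open(in_path, "w") as f:
        f.write("GTTATAAA\n\nGTTATCCC\nCCTGGAAA\n")
    dedupe_file(in_path, out_path)
    with open(out_path) as f:
        result = f.read()
```

Separating the filtering from the file handling keeps the prefix logic easy to test on plain lists.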
Hi, first I would like to say thank you so much for your help! If some of my strands are longer than others, would there be an easy way for me to tell Python to remove the shorter of the matching strands?
– Alpa Luca
Nov 20 '18 at 19:02
answered Nov 20 '18 at 18:35 by LeKhan9
answered Nov 20 '18 at 18:50 by Woody1193