Quick algorithm to create union of multiple intervals data
I have a really simple problem and data structure but the number is so large that I need to find an efficient way.
Suppose I have an object that has an attribute which is an interval.
For example:
`start stop`
obj1 5 10
obj2 8 12
obj3 11 14
obj4 13 20
obj5 22 25
obj6 24 30
obj7 33 37
obj8 36 40
I want to merge it so that overlapping interval will become one object. So, the result of the example will become
start stop
objA 5 20
objB 22 30
objC 33 40
I am using python for this. Please notice that I have thousand of this type of data.
python intervals
add a comment |
I have a really simple problem and data structure but the number is so large that I need to find an efficient way.
Suppose I have an object that has an attribute which is an interval.
For example:
`start stop`
obj1 5 10
obj2 8 12
obj3 11 14
obj4 13 20
obj5 22 25
obj6 24 30
obj7 33 37
obj8 36 40
I want to merge it so that overlapping interval will become one object. So, the result of the example will become
start stop
objA 5 20
objB 22 30
objC 33 40
I am using python for this. Please notice that I have thousand of this type of data.
python intervals
1
What have you tried so far? Have you written any code?
– Yakov Dan
Nov 19 '18 at 9:35
Please specify whether your ranges are always sorted (as in your example) by starts and stops.
– Aristide
Nov 19 '18 at 9:49
2
Duplicate of python union of multiple ranges. Please look at the answers of that question, they are exactly what you need, you just have to pass it to pandas.
– silgon
Nov 19 '18 at 9:56
add a comment |
I have a really simple problem and data structure but the number is so large that I need to find an efficient way.
Suppose I have an object that has an attribute which is an interval.
For example:
`start stop`
obj1 5 10
obj2 8 12
obj3 11 14
obj4 13 20
obj5 22 25
obj6 24 30
obj7 33 37
obj8 36 40
I want to merge it so that overlapping interval will become one object. So, the result of the example will become
start stop
objA 5 20
objB 22 30
objC 33 40
I am using python for this. Please notice that I have thousand of this type of data.
python intervals
I have a really simple problem and data structure but the number is so large that I need to find an efficient way.
Suppose I have an object that has an attribute which is an interval.
For example:
`start stop`
obj1 5 10
obj2 8 12
obj3 11 14
obj4 13 20
obj5 22 25
obj6 24 30
obj7 33 37
obj8 36 40
I want to merge it so that overlapping interval will become one object. So, the result of the example will become
start stop
objA 5 20
objB 22 30
objC 33 40
I am using python for this. Please notice that I have thousand of this type of data.
python intervals
python intervals
asked Nov 19 '18 at 9:33
BharataBharata
1781316
1781316
1
What have you tried so far? Have you written any code?
– Yakov Dan
Nov 19 '18 at 9:35
Please specify whether your ranges are always sorted (as in your example) by starts and stops.
– Aristide
Nov 19 '18 at 9:49
2
Duplicate of python union of multiple ranges. Please look at the answers of that question, they are exactly what you need, you just have to pass it to pandas.
– silgon
Nov 19 '18 at 9:56
add a comment |
1
What have you tried so far? Have you written any code?
– Yakov Dan
Nov 19 '18 at 9:35
Please specify whether your ranges are always sorted (as in your example) by starts and stops.
– Aristide
Nov 19 '18 at 9:49
2
Duplicate of python union of multiple ranges. Please look at the answers of that question, they are exactly what you need, you just have to pass it to pandas.
– silgon
Nov 19 '18 at 9:56
1
1
What have you tried so far? Have you written any code?
– Yakov Dan
Nov 19 '18 at 9:35
What have you tried so far? Have you written any code?
– Yakov Dan
Nov 19 '18 at 9:35
Please specify whether your ranges are always sorted (as in your example) by starts and stops.
– Aristide
Nov 19 '18 at 9:49
Please specify whether your ranges are always sorted (as in your example) by starts and stops.
– Aristide
Nov 19 '18 at 9:49
2
2
Duplicate of python union of multiple ranges. Please look at the answers of that question, they are exactly what you need, you just have to pass it to pandas.
– silgon
Nov 19 '18 at 9:56
Duplicate of python union of multiple ranges. Please look at the answers of that question, they are exactly what you need, you just have to pass it to pandas.
– silgon
Nov 19 '18 at 9:56
add a comment |
3 Answers
3
active
oldest
votes
df['Startpoint'] = df['stop`'].shift() < df['`start'] # Begin of interval
df['Endpoint'] = df['Startpoint'].shift(-1) # End of interval
df.loc['obj1', 'Startpoint'] = True # First line is startpoint
df['Endpoint'].fillna(True, inplace=True) # Last line is endpoint
df2 = df[df[['Startpoint', 'Endpoint']].any(axis=1)]
df2['`start'] = df2['`start'].shift()
df2.loc[df2['Endpoint'], ['`start', 'stop`']]
# `start stop`
# obj4 5.0 20
# obj6 22.0 30
# obj8 33.0 40
Find all begins and ends of interval, keep only those rows, and then shift the start value one row so that the values per interval are in the same row.
This is all pandas, so I believe it should be reasonably quick.
add a comment |
When the intervals are sorted by starts, this simple function should work in linear time:
def merge_intervals(intervals):
result =
(start_candidate, stop_candidate) = intervals[0]
for (start, stop) in intervals[1:]:
if start <= stop_candidate:
stop_candidate = max(stop, stop_candidate)
else:
result.append((start_candidate, stop_candidate))
(start_candidate, stop_candidate) = (start, stop)
result.append((start_candidate, stop_candidate))
return result
intervals = [
( 5, 10),
( 8, 12),
(11, 14),
(13, 20),
(22, 25),
(24, 30),
(33, 37),
(36, 40),
]
assert merge_intervals(intervals) == [(5, 20), (22, 30), (33, 40)]
add a comment |
The quickest way to deal with this kind of data is to use Union find data structure
or disjoint data structure
which tracks a set of elements partitioned into a number of disjoint subsets.
I am leaving the implementation and design of the data structure on you as there are efficient ways of implementing Disjoint data structures which operates in linear time.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53371736%2fquick-algorithm-to-create-union-of-multiple-intervals-data%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
df['Startpoint'] = df['stop`'].shift() < df['`start'] # Begin of interval
df['Endpoint'] = df['Startpoint'].shift(-1) # End of interval
df.loc['obj1', 'Startpoint'] = True # First line is startpoint
df['Endpoint'].fillna(True, inplace=True) # Last line is endpoint
df2 = df[df[['Startpoint', 'Endpoint']].any(axis=1)]
df2['`start'] = df2['`start'].shift()
df2.loc[df2['Endpoint'], ['`start', 'stop`']]
# `start stop`
# obj4 5.0 20
# obj6 22.0 30
# obj8 33.0 40
Find all begins and ends of interval, keep only those rows, and then shift the start value one row so that the values per interval are in the same row.
This is all pandas, so I believe it should be reasonably quick.
add a comment |
df['Startpoint'] = df['stop`'].shift() < df['`start'] # Begin of interval
df['Endpoint'] = df['Startpoint'].shift(-1) # End of interval
df.loc['obj1', 'Startpoint'] = True # First line is startpoint
df['Endpoint'].fillna(True, inplace=True) # Last line is endpoint
df2 = df[df[['Startpoint', 'Endpoint']].any(axis=1)]
df2['`start'] = df2['`start'].shift()
df2.loc[df2['Endpoint'], ['`start', 'stop`']]
# `start stop`
# obj4 5.0 20
# obj6 22.0 30
# obj8 33.0 40
Find all begins and ends of interval, keep only those rows, and then shift the start value one row so that the values per interval are in the same row.
This is all pandas, so I believe it should be reasonably quick.
add a comment |
df['Startpoint'] = df['stop`'].shift() < df['`start'] # Begin of interval
df['Endpoint'] = df['Startpoint'].shift(-1) # End of interval
df.loc['obj1', 'Startpoint'] = True # First line is startpoint
df['Endpoint'].fillna(True, inplace=True) # Last line is endpoint
df2 = df[df[['Startpoint', 'Endpoint']].any(axis=1)]
df2['`start'] = df2['`start'].shift()
df2.loc[df2['Endpoint'], ['`start', 'stop`']]
# `start stop`
# obj4 5.0 20
# obj6 22.0 30
# obj8 33.0 40
Find all begins and ends of interval, keep only those rows, and then shift the start value one row so that the values per interval are in the same row.
This is all pandas, so I believe it should be reasonably quick.
df['Startpoint'] = df['stop`'].shift() < df['`start'] # Begin of interval
df['Endpoint'] = df['Startpoint'].shift(-1) # End of interval
df.loc['obj1', 'Startpoint'] = True # First line is startpoint
df['Endpoint'].fillna(True, inplace=True) # Last line is endpoint
df2 = df[df[['Startpoint', 'Endpoint']].any(axis=1)]
df2['`start'] = df2['`start'].shift()
df2.loc[df2['Endpoint'], ['`start', 'stop`']]
# `start stop`
# obj4 5.0 20
# obj6 22.0 30
# obj8 33.0 40
Find all begins and ends of interval, keep only those rows, and then shift the start value one row so that the values per interval are in the same row.
This is all pandas, so I believe it should be reasonably quick.
answered Nov 19 '18 at 10:10
JondiedoopJondiedoop
1,729214
1,729214
add a comment |
add a comment |
When the intervals are sorted by starts, this simple function should work in linear time:
def merge_intervals(intervals):
result =
(start_candidate, stop_candidate) = intervals[0]
for (start, stop) in intervals[1:]:
if start <= stop_candidate:
stop_candidate = max(stop, stop_candidate)
else:
result.append((start_candidate, stop_candidate))
(start_candidate, stop_candidate) = (start, stop)
result.append((start_candidate, stop_candidate))
return result
intervals = [
( 5, 10),
( 8, 12),
(11, 14),
(13, 20),
(22, 25),
(24, 30),
(33, 37),
(36, 40),
]
assert merge_intervals(intervals) == [(5, 20), (22, 30), (33, 40)]
add a comment |
When the intervals are sorted by starts, this simple function should work in linear time:
def merge_intervals(intervals):
result =
(start_candidate, stop_candidate) = intervals[0]
for (start, stop) in intervals[1:]:
if start <= stop_candidate:
stop_candidate = max(stop, stop_candidate)
else:
result.append((start_candidate, stop_candidate))
(start_candidate, stop_candidate) = (start, stop)
result.append((start_candidate, stop_candidate))
return result
intervals = [
( 5, 10),
( 8, 12),
(11, 14),
(13, 20),
(22, 25),
(24, 30),
(33, 37),
(36, 40),
]
assert merge_intervals(intervals) == [(5, 20), (22, 30), (33, 40)]
add a comment |
When the intervals are sorted by starts, this simple function should work in linear time:
def merge_intervals(intervals):
result =
(start_candidate, stop_candidate) = intervals[0]
for (start, stop) in intervals[1:]:
if start <= stop_candidate:
stop_candidate = max(stop, stop_candidate)
else:
result.append((start_candidate, stop_candidate))
(start_candidate, stop_candidate) = (start, stop)
result.append((start_candidate, stop_candidate))
return result
intervals = [
( 5, 10),
( 8, 12),
(11, 14),
(13, 20),
(22, 25),
(24, 30),
(33, 37),
(36, 40),
]
assert merge_intervals(intervals) == [(5, 20), (22, 30), (33, 40)]
When the intervals are sorted by starts, this simple function should work in linear time:
def merge_intervals(intervals):
result =
(start_candidate, stop_candidate) = intervals[0]
for (start, stop) in intervals[1:]:
if start <= stop_candidate:
stop_candidate = max(stop, stop_candidate)
else:
result.append((start_candidate, stop_candidate))
(start_candidate, stop_candidate) = (start, stop)
result.append((start_candidate, stop_candidate))
return result
intervals = [
( 5, 10),
( 8, 12),
(11, 14),
(13, 20),
(22, 25),
(24, 30),
(33, 37),
(36, 40),
]
assert merge_intervals(intervals) == [(5, 20), (22, 30), (33, 40)]
answered Nov 19 '18 at 10:04
AristideAristide
1,43811430
1,43811430
add a comment |
add a comment |
The quickest way to deal with this kind of data is to use Union find data structure
or disjoint data structure
which tracks a set of elements partitioned into a number of disjoint subsets.
I am leaving the implementation and design of the data structure on you as there are efficient ways of implementing Disjoint data structures which operates in linear time.
add a comment |
The quickest way to deal with this kind of data is to use Union find data structure
or disjoint data structure
which tracks a set of elements partitioned into a number of disjoint subsets.
I am leaving the implementation and design of the data structure on you as there are efficient ways of implementing Disjoint data structures which operates in linear time.
add a comment |
The quickest way to deal with this kind of data is to use Union find data structure
or disjoint data structure
which tracks a set of elements partitioned into a number of disjoint subsets.
I am leaving the implementation and design of the data structure on you as there are efficient ways of implementing Disjoint data structures which operates in linear time.
The quickest way to deal with this kind of data is to use Union find data structure
or disjoint data structure
which tracks a set of elements partitioned into a number of disjoint subsets.
I am leaving the implementation and design of the data structure on you as there are efficient ways of implementing Disjoint data structures which operates in linear time.
answered Nov 19 '18 at 10:36
mohor chattmohor chatt
925
925
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53371736%2fquick-algorithm-to-create-union-of-multiple-intervals-data%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
What have you tried so far? Have you written any code?
– Yakov Dan
Nov 19 '18 at 9:35
Please specify whether your ranges are always sorted (as in your example) by starts and stops.
– Aristide
Nov 19 '18 at 9:49
2
Duplicate of python union of multiple ranges. Please look at the answers of that question, they are exactly what you need, you just have to pass it to pandas.
– silgon
Nov 19 '18 at 9:56