Why is python `in` much faster than `np.isin`
I'm implementing a search algorithm using numpy where one step is to check whether a vector is in a matrix (as a row). I used to use np.isin for this, but I became curious whether the Python keyword in would work. I tested it, and it does.
Since I didn't find any Python interface for in (like __add__ for + or __abs__ for abs), I assumed in was hard-wired into Python using standard iterator logic, and should therefore be slower than the numpy-provided np.isin. But after some testing, unbelievably:
>>> a = np.int8(1)
>>> A = np.zeros(2**24, 'b')
>>> %timeit a in A
21.7 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit np.isin(a, A)
310 ms ± 20.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which says np.isin is 10+ times slower than Python in for a small data type. I also ran a test for a big data type:
>>> a = np.ones(1, 'V256')
>>> A = np.zeros(2**22, 'V256')
>>> %timeit a in A
129 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit np.isin(a, A)
10.5 s ± 184 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
which says np.isin is ~100 times slower.
I'm wondering what the reason for this could be. Note that since a=1 while A=[0,0,...], the match has to scan the whole array; there is no "early exit" on the Python side.
EDIT: Oh, actually there is a Python interface for in, called __contains__. But still, why would np.isin be so much slower than np.ndarray.__contains__?
python numpy
"I therefore tested it and find it do works." - nope.
– user2357112
Nov 22 '18 at 4:38
"Since I didn't find any python interface for in" - you were probably looking in the wrong places.
– user2357112
Nov 22 '18 at 4:39
@user2357112 I know it only works on scalars. You can make it work by viewing the rows as scalars; it then becomes identical to testing in on a VsomeNumber data type.
– ZisIsNotZis
Nov 22 '18 at 4:41
isin is Python code which you can read. Or read the whole arraysetops.py. I think this code is written more for convenience than for performance. Arrays aren't optimal for search tasks.
– hpaulj
Nov 22 '18 at 6:49
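The row-as-scalar trick from the comment above can be sketched like this (my own sketch, assuming C-contiguous rows of a fixed-width dtype and a numpy version that supports equality on unstructured void scalars; the array names are illustrative):

```python
import numpy as np

# View each 3-element int64 row (24 bytes) as one opaque 'V24' scalar,
# so `in` compares whole rows at once instead of single elements.
M = np.zeros((4, 3), dtype=np.int64)
M[2] = [1, 2, 3]
row = np.array([1, 2, 3], dtype=np.int64)

M_rows = np.ascontiguousarray(M).view('V24').ravel()
row_v = np.ascontiguousarray(row).view('V24')[0]

print(row_v in M_rows)  # whole-row membership: matches M[2]
```

This reduces "is this vector a row of the matrix" to exactly the scalar `in` test benchmarked in the question.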
edited Nov 22 '18 at 4:43
asked Nov 22 '18 at 4:31 – ZisIsNotZis
2 Answers
numpy.ndarray.__contains__ is basically just (elem == arr).any() (even when that doesn't make sense). You can take a look at the source, which is very short and simple for a NumPy C routine.
numpy.isin broadcasts over its left operand, and it's optimized for efficiency in the broadcasting case. For a small left operand, it will use an approach based on sorting, which is overkill for a scalar. It currently has no fast path for the left operand being a scalar, or for the left operand being an array small enough that sorting is more expensive than a naive approach.
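For illustration (my own sketch, not part of the answer), the claimed equivalence is easy to check directly on a smaller array, with one match placed at the very end so no strategy can exit early:

```python
import numpy as np

a = np.int8(1)
A = np.zeros(2**16, dtype='b')
A[-1] = 1  # single match at the very end

# ndarray.__contains__ is essentially a full equality scan plus any():
print(a in A)                 # membership via __contains__
print(bool((a == A).any()))   # the scan it reduces to
print(bool(np.isin(a, A)))    # same answer via the sort-based path
```

All three agree; they differ only in how much machinery runs before the answer comes out.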
My answer doesn't address the question directly, but it may give you some ideas. Generally, the big idea behind getting good performance from numpy is to amortize the cost of the interpreter over many elements at a time. In other words, move the loops from Python code (slow) into C/Fortran loops somewhere in the numpy/BLAS/LAPACK/etc. internals (fast). If you succeed in that operation (called vectorization), performance will usually be quite good.
Of course, you can get even better performance by dropping the Python interpreter and using, say, C++ instead. Whether that approach actually succeeds depends on how good you are at high-performance programming in C++ vs. numpy, and on exactly what operation you're trying to do.
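To illustrate the amortization point (my own sketch; the function name is made up), the same reduction is computed once with a per-element Python loop and once with a single vectorized call:

```python
import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Interpreted loop: the interpreter dispatches once per element (slow).
def sum_squares_loop(arr):
    total = 0.0
    for v in arr:
        total += v * v
    return total

# Vectorized: one call; the loop runs in compiled code inside NumPy (fast).
sum_squares_vec = float((x * x).sum())

# Same result either way; only the interpreter overhead differs.
assert abs(sum_squares_loop(x) - sum_squares_vec) <= 1e-9 * sum_squares_vec
```

Timing the two with %timeit shows the loop version losing by a large factor, which is the whole argument for keeping loops inside numpy.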
"high performance programming" is a concept I have never heard before.
– b-fg
Nov 22 '18 at 5:18
The highest performance is machine code or assembly language (different representations of the same thing). You're not limited by any constraints other than the hardware capabilities of the device. (IOW, there's no "make toast" instruction.) After that, which language has the best performance is like asking which hand tool has the best performance. It's difficult to cut wood with a hammer or drive a nail with a saw. Some languages perform better at some things, some at others.
– user10468005
Nov 22 '18 at 6:14
answered Nov 22 '18 at 4:49 – user2357112
answered Nov 22 '18 at 4:42 – user10468005