Filter large list object on data from another large list: slow performance
up vote
5
down vote
favorite
I have two large lists of objects. The first (about 1,000,000 objects):
public class BaseItem
{
    public BaseItem()
    {
    }

    public double Fee { get; set; } = 0;
    public string Market { get; set; } = string.Empty;
    public string Traider { get; set; } = string.Empty;
    public DateTime DateUtc { get; set; } = new DateTime();
}
The second (about 20,000 objects):
public class TraiderItem
{
    public TraiderItem()
    {
    }

    public DateTime DateUtc { get; set; } = new DateTime();
    public string Market { get; set; } = string.Empty;
    public string Type { get; set; } = string.Empty;
    public double Price { get; set; } = 0;
    public double Amount { get; set; } = 0;
    public double Total { get; set; } = 0;
    public double Fee { get; set; } = 0;
    public string FeeCoin { get; set; } = string.Empty;
}
I need to find all TraiderItem entries that have a matching BaseItem, where DateUtc is equal and Fee is equal. Right now I am using the Any method:
traiderItemsInBase = traiderItems.Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc && Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8))).ToList();
But this approach is very, very slow. Is there a way to make it more efficient? Is it possible to use a HashSet in this case?
c# performance
1
Can I ask why this is needing to be done in code? This is what databases are designed for. I know this doesn't address your question specifically (which is why it's a comment), but I feel like you may have lost your way with the design of this somehow. How have you generated 1,000,000+ objects in code? If from a DB, leave them there and let SQL/whatever do the work.
– JᴀʏMᴇᴇ
Nov 8 at 9:29
Why Math.Round((double)a.Fee * 0.4, 8)?
– Marco Salerno
Nov 8 at 9:31
Math.Round is additional logic
– Konstantin
Nov 8 at 9:34
I agree about using a database, but I need to implement this one without a database
– Konstantin
Nov 8 at 9:34
1
@Konstantin - I'm curious as to why, though. I'm not questioning your requirements, I'm genuinely intrigued.
– JᴀʏMᴇᴇ
Nov 8 at 9:36
asked Nov 8 at 9:17 by Konstantin
6 Answers
up vote
7
down vote
accepted
First I thought of a solution with HashSet<> or Dictionary<>, but that doesn't really fit this use case. How about speeding it up by using more of your cores/threads with PLINQ's AsParallel()?
traiderItemsInBase = traiderItems.AsParallel()
    .Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
                Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
    .ToList();
This should scale pretty well, since these operations run in memory rather than querying a database or waiting on another bottleneck, so 4 cores should solve this almost 4× faster.
edited Nov 8 at 11:08 by Marco; answered Nov 8 at 9:29 by fubo
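By default PLINQ uses at most one worker per logical processor; it can also be pinned explicitly. A runnable sketch of the same query with WithDegreeOfParallelism added as optional tuning (the Demo class, trimmed-down item classes, and sample data are invented for illustration, not part of the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BaseItem { public double Fee; public DateTime DateUtc; }
public class TraiderItem { public double Fee; public DateTime DateUtc; }

public static class Demo
{
    public static void Main()
    {
        var date = new DateTime(2018, 11, 8);
        var baseItems = new List<BaseItem> { new BaseItem { Fee = 0.4, DateUtc = date } };
        var traiderItems = new List<TraiderItem> { new TraiderItem { Fee = 1.0, DateUtc = date } };

        var traiderItemsInBase = traiderItems
            .AsParallel()
            .WithDegreeOfParallelism(Environment.ProcessorCount) // cap the worker count explicitly
            .Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
                        Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
            .ToList();

        Console.WriteLine(traiderItemsInBase.Count); // 1 for this sample data
    }
}
```

Note that AsParallel() only divides the outer scan across cores; the inner Any() is still a linear search, so the total work is unchanged.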
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
4
AsParallel(): 600000 Bases, 4000 Traiders - 14 sec; without AsParallel() - 49 sec. Right now trying to test on 20 cores with Hyper-Threading
– Konstantin
Nov 8 at 9:42
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
4
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
1
Up-voting this, wasn't aware of parallel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
up vote
1
down vote
IMHO the main delay - Math.Round - can be decreased by:
1. For x.Fee: make a facade object for BaseItem and store the rounded fee once, as FeeRound = Math.Round(Fee, 8) (or add a FeeRound property to BaseItem itself). As written, Math.Round is called on the order of 1M × 20k times, and rounding is probably not the cheapest part of the compiler/CPU pair.
2. Convert the first lambda into a statement body, compute a.Fee's rounded value once per item, and use it inside baseItems.Any(...) like this:
traiderItems.Where(a =>
{
    var aFeeRound = Math.Round((double)a.Fee * 0.4, 8);
    return baseItems.Any(x =>
        x.DateUtc == a.DateUtc &&
        x.FeeRound == aFeeRound);
}).ToList();
This way Math.Round runs only once per item on each side. Sorry if there are mistakes - no time to test. Sure, TPL is a good idea too. Good luck!
answered Nov 8 at 12:01 by AndrewF (new contributor)
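A minimal runnable sketch of point 1: precompute the rounded fee once per BaseItem in a small facade, so Math.Round runs N + K times instead of N × K. The BaseFeeFacade name, Demo class, trimmed-down item classes, and sample data are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BaseItem { public double Fee; public DateTime DateUtc; }
public class TraiderItem { public double Fee; public DateTime DateUtc; }

// Facade that stores the base item's fee rounded once, up front.
public class BaseFeeFacade
{
    public BaseItem Item;
    public double FeeRound;
    public BaseFeeFacade(BaseItem item)
    {
        Item = item;
        FeeRound = Math.Round(item.Fee, 8);
    }
}

public static class Demo
{
    public static void Main()
    {
        var date = new DateTime(2018, 11, 8);
        var baseItems = new List<BaseItem> { new BaseItem { Fee = 0.4, DateUtc = date } };
        var traiderItems = new List<TraiderItem> { new TraiderItem { Fee = 1.0, DateUtc = date } };

        // One pass over the large list to build the facades.
        var facades = baseItems.Select(b => new BaseFeeFacade(b)).ToList();

        var matches = traiderItems.Where(a =>
        {
            var aFeeRound = Math.Round((double)a.Fee * 0.4, 8); // rounded once per traider item
            return facades.Any(x => x.Item.DateUtc == a.DateUtc && x.FeeRound == aFeeRound);
        }).ToList();

        Console.WriteLine(matches.Count); // 1 for this sample data
    }
}
```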
add a comment |
up vote
1
down vote
I have tried some suggestions and this is so far the fastest I could get:
// Requires: using System; using System.Collections.Concurrent; using System.Collections.Generic;
// using System.Diagnostics; using System.Linq; using System.Threading.Tasks;
private static void TestForPreCountingParallel(List<TraiderItem> traiderItems, List<BaseItem> baseItems)
{
    var watch = new Stopwatch();
    watch.Start();
    ConcurrentBag<TraiderItem> traiderItemsInBase = null;
    for (int i = 0; i < 3; i++) // three runs to see warm-up effects
    {
        traiderItemsInBase = new ConcurrentBag<TraiderItem>();
        // Precompute the rounded base fees once, instead of rounding inside the inner loop.
        // Rounding matches the original query: Math.Round(x.Fee, 8) on the base side.
        var baseFeesRounds = baseItems.Select(bi => Math.Round(bi.Fee, 8)).ToArray();
        Parallel.ForEach(traiderItems, traiderItem =>
        {
            // Likewise, round the traider side (a.Fee * 0.4) once per item.
            double traiderFeeRound = Math.Round((double)traiderItem.Fee * 0.4, 8);
            for (var index = 0; index < baseItems.Count; index++)
            {
                var baseItem = baseItems[index];
                if (traiderItem.DateUtc == baseItem.DateUtc && traiderFeeRound == baseFeesRounds[index])
                {
                    traiderItemsInBase.Add(traiderItem);
                    break; // first match is enough (Any semantics)
                }
            }
        });
        Console.WriteLine(i + "," + watch.ElapsedMilliseconds);
    }
    watch.Stop();
    Console.WriteLine("base:{0},traid:{1},res:{2},time:{3}", baseItems.Count, traiderItems.Count,
        traiderItemsInBase.Count, watch.ElapsedMilliseconds);
}
Anyone have another improvement?
For the things I have tried, the progression looks like this:
- Original LINQ: base:100000, traid:20000, res:40, time:102544
- Converted to foreach loops: base:100000, traid:20000, res:40, time:43890
- Precounting fees: base:100000, traid:20000, res:40, time:22661
- Parallel outer loop: base:100000, traid:20000, res:40, time:6823
The absolute times are not significant; the trend is what to look at. The benchmark is not perfect - I haven't played much with the ratio of TraiderItems found inside BaseItems, and mine is pretty low as you can see: 40 out of 100000.
So just to see some different ratios:
- base:100000,traid:20000,res:400,time:102417
- base:100000,traid:20000,res:400,time:50842
- base:100000,traid:20000,res:400,time:21754
- base:100000,traid:20000,res:400,time:8296
And another:
- base:100000,traid:20000,res:2000,time:118150
- base:100000,traid:20000,res:2000,time:57832
- base:100000,traid:20000,res:2000,time:21659
- base:100000,traid:20000,res:2000,time:7350
I am not an expert, so I have to refer to other sources like:
http://mattwarren.org/2016/09/29/Optimising-LINQ/
What’s the problem with LINQ?
As outlined by Joe Duffy, LINQ introduces inefficiencies in the form
of hidden allocations
So the conclusion is:
- Do your own benchmark and try some code changes first if you really care about performance. Just throwing brute force at inefficient code is going to cost someone money.
- Do not use complex LINQ queries on large collections unless you test the performance. I got burned on that; the threshold is surprisingly low (with around 10k items, the wrong LINQ can kill your processing time where a simple loop works well).
But I like LINQ a lot and use it frequently.
up vote
0
down vote
You could precalculate the rounded fee on both collections. Maybe also group the items by date if dates repeat a lot in the largest collection.
Thank you for the suggestion, but I need to compare dates with time
– Konstantin
Nov 8 at 9:27
1
See edited answer..
– AccessViolation
Nov 8 at 9:27
up vote
0
down vote
Using that LINQ - i.e. Any inside Where - is almost O(N²).
A better approach is to first create a HashSet where the key is like:
DateUtc.ToString("<format based on matching depth (date only, or down to minutes/seconds)>") + "_" + the rounded Fee .ToString()
and fill it with all the BaseItem objects (worst case you will have about 1 million items in the HashSet). This is equivalent to one for loop.
Next, loop through all items in the TraiderItem collection (the smaller collection), form the lookup key the same way, and check the HashSet. This is another for loop.
Net time complexity is about O(N) + O(K), and you can improve it further by building the HashSet in advance or in parallel.
Space complexity is higher - but you have plenty of RAM these days :)
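A minimal runnable sketch of this idea, using a value-tuple key (DateUtc, rounded fee) rather than the string key described above - the semantics are the same, assuming exact DateTime matches are wanted. The Demo class, trimmed-down item classes, and sample data are invented for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BaseItem { public double Fee; public DateTime DateUtc; }
public class TraiderItem { public double Fee; public DateTime DateUtc; }

public static class Demo
{
    public static void Main()
    {
        var date = new DateTime(2018, 11, 8, 9, 17, 0);
        var baseItems = new List<BaseItem> { new BaseItem { Fee = 0.4, DateUtc = date } };
        var traiderItems = new List<TraiderItem>
        {
            new TraiderItem { Fee = 1.0, DateUtc = date }, // matches: 1.0 * 0.4 == 0.4
            new TraiderItem { Fee = 2.0, DateUtc = date }, // no match: 0.8 not in the set
        };

        // One O(N) pass over the large list: key = (date, fee rounded to 8 places).
        var lookup = new HashSet<(DateTime, double)>(
            baseItems.Select(x => (x.DateUtc, Math.Round(x.Fee, 8))));

        // One O(K) pass over the smaller collection, each probe O(1).
        var traiderItemsInBase = traiderItems
            .Where(a => lookup.Contains((a.DateUtc, Math.Round((double)a.Fee * 0.4, 8))))
            .ToList();

        Console.WriteLine(traiderItemsInBase.Count); // 1 for this sample data
    }
}
```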
up vote
0
down vote
If there are few distinct dates among the BaseItems, you can group them by date in a dictionary:
var baseItemsDic = new Dictionary<DateTime, List<BaseItem>>();
foreach (var item in baseItems)
{
    if (!baseItemsDic.ContainsKey(item.DateUtc))
        baseItemsDic.Add(item.DateUtc, new List<BaseItem>());
    baseItemsDic[item.DateUtc].Add(item);
}
var traiderItemsInBase = traiderItems
    .Where(a => baseItemsDic.ContainsKey(a.DateUtc) &&
                baseItemsDic[a.DateUtc].Any(x => Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
    .ToList();
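Combining this grouping with the rounded-fee precomputation from the other answers makes both lookups constant time - a runnable sketch with a per-date HashSet of rounded base fees (the Demo class, trimmed-down item classes, and sample data are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class BaseItem { public double Fee; public DateTime DateUtc; }
public class TraiderItem { public double Fee; public DateTime DateUtc; }

public static class Demo
{
    public static void Main()
    {
        var date = new DateTime(2018, 11, 8);
        var baseItems = new List<BaseItem> { new BaseItem { Fee = 0.4, DateUtc = date } };
        var traiderItems = new List<TraiderItem> { new TraiderItem { Fee = 1.0, DateUtc = date } };

        // Group base items by date, keeping only the rounded fees per date.
        Dictionary<DateTime, HashSet<double>> baseItemsDic = baseItems
            .GroupBy(x => x.DateUtc)
            .ToDictionary(g => g.Key,
                          g => new HashSet<double>(g.Select(x => Math.Round(x.Fee, 8))));

        // Each probe is two O(1) lookups: dictionary by date, then HashSet by rounded fee.
        var traiderItemsInBase = traiderItems
            .Where(a => baseItemsDic.TryGetValue(a.DateUtc, out var fees) &&
                        fees.Contains(Math.Round((double)a.Fee * 0.4, 8)))
            .ToList();

        Console.WriteLine(traiderItemsInBase.Count); // 1 for this sample data
    }
}
```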
add a comment |
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
7
down vote
accepted
First I though of a solution with Hashet<>
or Dictionary<>
but that doesn't really fit into this use case. How about speeding it up by using more of your cores / threads with PLINQ AsParallel()
?
traiderItemsInBase = traiderItems.AsParallel()
.Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
.ToList();
This should scale pretty good since these operations happen from your memory and not querying a database or another bottleneck. So 4 cores should solve this almost 4x faster.
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
4
AsParallel()
: 600000 Bases, 4000 Traiders - 14 sec; withoutAsParallel()
- 49 sec. Right now trying to test on 20 Cores with Hyper-Threading
– Konstantin
Nov 8 at 9:42
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
4
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
1
Up-voting this, wasn't aware of parralel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
add a comment |
up vote
7
down vote
accepted
First I though of a solution with Hashet<>
or Dictionary<>
but that doesn't really fit into this use case. How about speeding it up by using more of your cores / threads with PLINQ AsParallel()
?
traiderItemsInBase = traiderItems.AsParallel()
.Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
.ToList();
This should scale pretty good since these operations happen from your memory and not querying a database or another bottleneck. So 4 cores should solve this almost 4x faster.
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
4
AsParallel()
: 600000 Bases, 4000 Traiders - 14 sec; withoutAsParallel()
- 49 sec. Right now trying to test on 20 Cores with Hyper-Threading
– Konstantin
Nov 8 at 9:42
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
4
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
1
Up-voting this, wasn't aware of parralel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
add a comment |
up vote
7
down vote
accepted
up vote
7
down vote
accepted
First I though of a solution with Hashet<>
or Dictionary<>
but that doesn't really fit into this use case. How about speeding it up by using more of your cores / threads with PLINQ AsParallel()
?
traiderItemsInBase = traiderItems.AsParallel()
.Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
.ToList();
This should scale pretty good since these operations happen from your memory and not querying a database or another bottleneck. So 4 cores should solve this almost 4x faster.
First I though of a solution with Hashet<>
or Dictionary<>
but that doesn't really fit into this use case. How about speeding it up by using more of your cores / threads with PLINQ AsParallel()
?
traiderItemsInBase = traiderItems.AsParallel()
.Where(a => baseItems.Any(x => x.DateUtc == a.DateUtc &&
Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
.ToList();
This should scale pretty good since these operations happen from your memory and not querying a database or another bottleneck. So 4 cores should solve this almost 4x faster.
edited Nov 8 at 11:08
Marco
45.1k10103129
45.1k10103129
answered Nov 8 at 9:29
fubo
28.7k963100
28.7k963100
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
4
AsParallel()
: 600000 Bases, 4000 Traiders - 14 sec; withoutAsParallel()
- 49 sec. Right now trying to test on 20 Cores with Hyper-Threading
– Konstantin
Nov 8 at 9:42
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
4
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
1
Up-voting this, wasn't aware of parralel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
add a comment |
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
4
AsParallel()
: 600000 Bases, 4000 Traiders - 14 sec; withoutAsParallel()
- 49 sec. Right now trying to test on 20 Cores with Hyper-Threading
– Konstantin
Nov 8 at 9:42
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
4
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
1
Up-voting this, wasn't aware of parralel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
Just test this one. Sounds good.
– Konstantin
Nov 8 at 9:41
4
4
AsParallel()
: 600000 Bases, 4000 Traiders - 14 sec; without AsParallel()
- 49 sec. Right now trying to test on 20 Cores with Hyper-Threading– Konstantin
Nov 8 at 9:42
AsParallel()
: 600000 Bases, 4000 Traiders - 14 sec; without AsParallel()
- 49 sec. Right now trying to test on 20 Cores with Hyper-Threading– Konstantin
Nov 8 at 9:42
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
@Konstantin great, thank you for measuring
– fubo
Nov 8 at 9:49
4
4
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
20 Cores + Hyper-Threading (40 logical processor) - 2.7 sec =)
– Konstantin
Nov 8 at 9:50
1
1
Up-voting this, wasn't aware of parralel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
Up-voting this, wasn't aware of parralel.
– JᴀʏMᴇᴇ
Nov 8 at 10:43
add a comment |
up vote
1
down vote
Imho main delay - Math.Round - can be decreased by:
1. for x.Fee : Make Facade object for TraiderItem and save once calculated FeeRound=x.Fee in it (or add property for FeeRound in TraiderItem itself). Just this Math round called 1m*20k times and, probably, round is not powerful part of compiler/cpu pair.
2. convert first lambda into function and calc a.Fee in it and pass into baseItems.Any(.....) as parameter like this:
traiderItems.Where(a => { var aFeeRound = Math.Round((double)a.Fee * 0.4, 8);
return baseItems
.Any(x =>
x.DateUtc == a.DateUtc &&
x.FeeRound == aFeeRound);})
.ToList();
This way Math.Round will work only once for every expression. sorry if mistakes, no time for test. Sure, TPL good idea. Good luck!
New contributor
add a comment |
up vote
1
down vote
Imho main delay - Math.Round - can be decreased by:
1. for x.Fee : Make Facade object for TraiderItem and save once calculated FeeRound=x.Fee in it (or add property for FeeRound in TraiderItem itself). Just this Math round called 1m*20k times and, probably, round is not powerful part of compiler/cpu pair.
2. convert first lambda into function and calc a.Fee in it and pass into baseItems.Any(.....) as parameter like this:
traiderItems.Where(a => { var aFeeRound = Math.Round((double)a.Fee * 0.4, 8);
return baseItems
.Any(x =>
x.DateUtc == a.DateUtc &&
x.FeeRound == aFeeRound);})
.ToList();
This way Math.Round will work only once for every expression. sorry if mistakes, no time for test. Sure, TPL good idea. Good luck!
New contributor
add a comment |
up vote
1
down vote
up vote
1
down vote
Imho main delay - Math.Round - can be decreased by:
1. for x.Fee : Make Facade object for TraiderItem and save once calculated FeeRound=x.Fee in it (or add property for FeeRound in TraiderItem itself). Just this Math round called 1m*20k times and, probably, round is not powerful part of compiler/cpu pair.
2. convert first lambda into function and calc a.Fee in it and pass into baseItems.Any(.....) as parameter like this:
traiderItems.Where(a => { var aFeeRound = Math.Round((double)a.Fee * 0.4, 8);
return baseItems
.Any(x =>
x.DateUtc == a.DateUtc &&
x.FeeRound == aFeeRound);})
.ToList();
This way Math.Round will work only once for every expression. sorry if mistakes, no time for test. Sure, TPL good idea. Good luck!
New contributor
Imho main delay - Math.Round - can be decreased by:
1. for x.Fee : Make Facade object for TraiderItem and save once calculated FeeRound=x.Fee in it (or add property for FeeRound in TraiderItem itself). Just this Math round called 1m*20k times and, probably, round is not powerful part of compiler/cpu pair.
2. convert first lambda into function and calc a.Fee in it and pass into baseItems.Any(.....) as parameter like this:
traiderItems.Where(a => { var aFeeRound = Math.Round((double)a.Fee * 0.4, 8);
return baseItems
.Any(x =>
x.DateUtc == a.DateUtc &&
x.FeeRound == aFeeRound);})
.ToList();
This way Math.Round will work only once for every expression. sorry if mistakes, no time for test. Sure, TPL good idea. Good luck!
New contributor
New contributor
answered Nov 8 at 12:01
AndrewF
392
392
New contributor
New contributor
add a comment |
add a comment |
up vote
1
down vote
I have tried some suggestions and this is so far the fastest I could get:
private static void TestForPreCountingParallel(List<TraiderItem> traiderItems, List<BaseItem> baseItems)
{
var watch = new Stopwatch();
watch.Start();
ConcurrentBag<TraiderItem> traiderItemsInBase = null;
for (int i = 0; i < 3; i++)
{
traiderItemsInBase = new ConcurrentBag<TraiderItem>();
var baseFeesRounds = baseItems.Select(bi => Math.Round((double)bi.Fee * 0.4, 8)).ToArray();
Parallel.ForEach(traiderItems, traiderItem =>
{
double traiderFeeRound = Math.Round(traiderItem.Fee, 8);
for (var index = 0; index < baseItems.Count; index++)
{
var baseItem = baseItems[index];
if (traiderItem.DateUtc == baseItem.DateUtc && traiderFeeRound == baseFeesRounds[index])
{
traiderItemsInBase.Add(traiderItem);
break;
}
}
});
Console.WriteLine(i + "," + watch.ElapsedMilliseconds);
}
watch.Stop();
Console.WriteLine("base:{0},traid:{1},res:{2},time:{3}", baseItems.Count, traiderItems.Count,
traiderItemsInBase.Count, watch.ElapsedMilliseconds);
}
Anyone have another improvement?
For the things I have tried, it is like this:
- Original Linq: base:100000,traid:20000,res:40,time:102544
- Converted to foreach loops:
base:100000,traid:20000,res:40,time:43890 - Precounting fees: base:100000,traid:20000,res:40,time:22661
- Parallel outer loop: base:100000,traid:20000,res:40,time:6823
Times are not significant, the trend is what to look at. The benchmark is not perfect, I haven't played much with ratio of TraiderItems inside BaseItems, my own is pretty low as you can see. 40 from 100000.
So just to see some different ratios:
- base:100000,traid:20000,res:400,time:102417
- base:100000,traid:20000,res:400,time:50842
- base:100000,traid:20000,res:400,time:21754
- base:100000,traid:20000,res:400,time:8296
And another:
- base:100000,traid:20000,res:2000,time:118150
- base:100000,traid:20000,res:2000,time:57832
- base:100000,traid:20000,res:2000,time:21659
- base:100000,traid:20000,res:2000,time:7350
I am not an expert, so I have to refer to other sources like:
http://mattwarren.org/2016/09/29/Optimising-LINQ/
What’s the problem with LINQ?
As outlined by Joe Duffy, LINQ introduces inefficiencies in the form
of hidden allocations
So the conclusion is:
- do your own benchmark and try some code changes first if you really
care about the performance. Just adding brute force to inefficient
code is going to cost someone money. - do not use complex LINQ queries for large collection unless you test
the performance. I have got burned on that, the threshold is
surprisingly low (like 10k items with wrong LINQ can kill your
processing time when simple loop is working well).
But I like LINQ a lot and use it frequently.
add a comment |
up vote
1
down vote
I have tried some suggestions and this is so far the fastest I could get:
private static void TestForPreCountingParallel(List<TraiderItem> traiderItems, List<BaseItem> baseItems)
{
var watch = new Stopwatch();
watch.Start();
ConcurrentBag<TraiderItem> traiderItemsInBase = null;
for (int i = 0; i < 3; i++)
{
traiderItemsInBase = new ConcurrentBag<TraiderItem>();
var baseFeesRounds = baseItems.Select(bi => Math.Round((double)bi.Fee * 0.4, 8)).ToArray();
Parallel.ForEach(traiderItems, traiderItem =>
{
double traiderFeeRound = Math.Round(traiderItem.Fee, 8);
for (var index = 0; index < baseItems.Count; index++)
{
var baseItem = baseItems[index];
if (traiderItem.DateUtc == baseItem.DateUtc && traiderFeeRound == baseFeesRounds[index])
{
traiderItemsInBase.Add(traiderItem);
break;
}
}
});
Console.WriteLine(i + "," + watch.ElapsedMilliseconds);
}
watch.Stop();
Console.WriteLine("base:{0},traid:{1},res:{2},time:{3}", baseItems.Count, traiderItems.Count,
traiderItemsInBase.Count, watch.ElapsedMilliseconds);
}
Anyone have another improvement?
For the things I have tried, it is like this:
- Original Linq: base:100000,traid:20000,res:40,time:102544
- Converted to foreach loops:
base:100000,traid:20000,res:40,time:43890 - Precounting fees: base:100000,traid:20000,res:40,time:22661
- Parallel outer loop: base:100000,traid:20000,res:40,time:6823
Times are not significant, the trend is what to look at. The benchmark is not perfect, I haven't played much with ratio of TraiderItems inside BaseItems, my own is pretty low as you can see. 40 from 100000.
So just to see some different ratios:
- base:100000,traid:20000,res:400,time:102417
- base:100000,traid:20000,res:400,time:50842
- base:100000,traid:20000,res:400,time:21754
- base:100000,traid:20000,res:400,time:8296
And another:
- base:100000,traid:20000,res:2000,time:118150
- base:100000,traid:20000,res:2000,time:57832
- base:100000,traid:20000,res:2000,time:21659
- base:100000,traid:20000,res:2000,time:7350
I am not an expert, so I have to refer to other sources like:
http://mattwarren.org/2016/09/29/Optimising-LINQ/
What’s the problem with LINQ?
As outlined by Joe Duffy, LINQ introduces inefficiencies in the form
of hidden allocations
So the conclusion is:
- do your own benchmark and try some code changes first if you really
care about the performance. Just adding brute force to inefficient
code is going to cost someone money. - do not use complex LINQ queries for large collection unless you test
the performance. I have got burned on that, the threshold is
surprisingly low (like 10k items with wrong LINQ can kill your
processing time when simple loop is working well).
But I like LINQ a lot and use it frequently.
add a comment |
up vote
1
down vote
up vote
1
down vote
I have tried some suggestions and this is so far the fastest I could get:
private static void TestForPreCountingParallel(List<TraiderItem> traiderItems, List<BaseItem> baseItems)
{
var watch = new Stopwatch();
watch.Start();
ConcurrentBag<TraiderItem> traiderItemsInBase = null;
for (int i = 0; i < 3; i++)
{
traiderItemsInBase = new ConcurrentBag<TraiderItem>();
var baseFeesRounds = baseItems.Select(bi => Math.Round((double)bi.Fee * 0.4, 8)).ToArray();
Parallel.ForEach(traiderItems, traiderItem =>
{
double traiderFeeRound = Math.Round(traiderItem.Fee, 8);
for (var index = 0; index < baseItems.Count; index++)
{
var baseItem = baseItems[index];
if (traiderItem.DateUtc == baseItem.DateUtc && traiderFeeRound == baseFeesRounds[index])
{
traiderItemsInBase.Add(traiderItem);
break;
}
}
});
Console.WriteLine(i + "," + watch.ElapsedMilliseconds);
}
watch.Stop();
Console.WriteLine("base:{0},traid:{1},res:{2},time:{3}", baseItems.Count, traiderItems.Count,
traiderItemsInBase.Count, watch.ElapsedMilliseconds);
}
Anyone have another improvement?
For the things I have tried, it is like this:
- Original Linq: base:100000,traid:20000,res:40,time:102544
- Converted to foreach loops:
base:100000,traid:20000,res:40,time:43890 - Precounting fees: base:100000,traid:20000,res:40,time:22661
- Parallel outer loop: base:100000,traid:20000,res:40,time:6823
Times are not significant, the trend is what to look at. The benchmark is not perfect, I haven't played much with ratio of TraiderItems inside BaseItems, my own is pretty low as you can see. 40 from 100000.
So just to see some different ratios:
- base:100000,traid:20000,res:400,time:102417
- base:100000,traid:20000,res:400,time:50842
- base:100000,traid:20000,res:400,time:21754
- base:100000,traid:20000,res:400,time:8296
And another:
- base:100000,traid:20000,res:2000,time:118150
- base:100000,traid:20000,res:2000,time:57832
- base:100000,traid:20000,res:2000,time:21659
- base:100000,traid:20000,res:2000,time:7350
I am not an expert, so I have to refer to other sources like:
http://mattwarren.org/2016/09/29/Optimising-LINQ/
What’s the problem with LINQ?
As outlined by Joe Duffy, LINQ introduces inefficiencies in the form
of hidden allocations
So the conclusion is:
- do your own benchmark and try some code changes first if you really
care about the performance. Just adding brute force to inefficient
code is going to cost someone money. - do not use complex LINQ queries for large collection unless you test
the performance. I have got burned on that, the threshold is
surprisingly low (like 10k items with wrong LINQ can kill your
processing time when simple loop is working well).
But I like LINQ a lot and use it frequently.
I have tried some suggestions and this is so far the fastest I could get:
private static void TestForPreCountingParallel(List<TraiderItem> traiderItems, List<BaseItem> baseItems)
{
    var watch = new Stopwatch();
    watch.Start();
    ConcurrentBag<TraiderItem> traiderItemsInBase = null;
    for (int i = 0; i < 3; i++)
    {
        traiderItemsInBase = new ConcurrentBag<TraiderItem>();
        // Precompute the rounded base fees once; compare against the
        // traider fee * 0.4, matching the comparison in the question.
        var baseFeesRounds = baseItems.Select(bi => Math.Round(bi.Fee, 8)).ToArray();
        Parallel.ForEach(traiderItems, traiderItem =>
        {
            double traiderFeeRound = Math.Round(traiderItem.Fee * 0.4, 8);
            for (var index = 0; index < baseItems.Count; index++)
            {
                var baseItem = baseItems[index];
                if (traiderItem.DateUtc == baseItem.DateUtc && traiderFeeRound == baseFeesRounds[index])
                {
                    traiderItemsInBase.Add(traiderItem);
                    break;
                }
            }
        });
        Console.WriteLine(i + "," + watch.ElapsedMilliseconds);
    }
    watch.Stop();
    Console.WriteLine("base:{0},traid:{1},res:{2},time:{3}", baseItems.Count, traiderItems.Count,
        traiderItemsInBase.Count, watch.ElapsedMilliseconds);
}
Anyone have another improvement?
For the things I have tried, it is like this:
- Original LINQ: base:100000,traid:20000,res:40,time:102544
- Converted to foreach loops: base:100000,traid:20000,res:40,time:43890
- Precounting fees: base:100000,traid:20000,res:40,time:22661
- Parallel outer loop: base:100000,traid:20000,res:40,time:6823
The absolute times are not significant; the trend is what to look at. The benchmark is not perfect: I haven't played much with the ratio of TraiderItems present in BaseItems, and mine is pretty low as you can see, 40 out of 100000.
So just to see some different ratios:
- base:100000,traid:20000,res:400,time:102417
- base:100000,traid:20000,res:400,time:50842
- base:100000,traid:20000,res:400,time:21754
- base:100000,traid:20000,res:400,time:8296
And another:
- base:100000,traid:20000,res:2000,time:118150
- base:100000,traid:20000,res:2000,time:57832
- base:100000,traid:20000,res:2000,time:21659
- base:100000,traid:20000,res:2000,time:7350
I am not an expert, so I have to refer to other sources like:
http://mattwarren.org/2016/09/29/Optimising-LINQ/
What’s the problem with LINQ?
As outlined by Joe Duffy, LINQ introduces inefficiencies in the form
of hidden allocations
So the conclusion is:
- do your own benchmark and try some code changes first if you really care about performance. Just adding brute force to inefficient code is going to cost someone money.
- do not use complex LINQ queries on large collections unless you test the performance. I have got burned on that; the threshold is surprisingly low (around 10k items with the wrong LINQ can kill your processing time where a simple loop works fine).
But I like LINQ a lot and use it frequently.
answered Nov 8 at 15:33
Bobo
up vote
0
down vote
You could precalculate the rounded fee on both collections. Maybe group the items by date if they duplicate a lot in the largest collection.
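A rough sketch of that suggestion, assuming the BaseItem/TraiderItem classes from the question (variable names here are hypothetical): bucket the BaseItems by date and precompute each rounded base fee once, so every TraiderItem only scans the bucket sharing its exact timestamp.

```csharp
// Group BaseItems by exact DateUtc; store only the precomputed rounded fees.
var byDate = baseItems
    .GroupBy(b => b.DateUtc)
    .ToDictionary(g => g.Key, g => g.Select(b => Math.Round(b.Fee, 8)).ToArray());

var traiderItemsInBase = traiderItems
    .Where(t =>
    {
        double target = Math.Round(t.Fee * 0.4, 8);        // rounded once per TraiderItem
        return byDate.TryGetValue(t.DateUtc, out var fees) // only same-timestamp items
               && fees.Contains(target);
    })
    .ToList();
```

Instead of scanning all N BaseItems per TraiderItem, each lookup touches only the items with a matching timestamp, which is where most of the saving comes from when dates repeat.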
Thank you for the suggestion, but I need to compare dates with time
– Konstantin
Nov 8 at 9:27
See the edited answer.
– AccessViolation
Nov 8 at 9:27
edited Nov 8 at 9:27
answered Nov 8 at 9:24
AccessViolation
up vote
0
down vote
Using that LINQ, i.e. Any inside Where, is roughly O(N*M).
A better approach is to first create a HashSet whose key combines the matching fields, e.g.:
DateUtc.ToString("<format based on matching depth (date only, or down to minutes/seconds)>") + "_" + roundedFee.ToString()
and fill it with all the BaseItem objects (worst case you will have about 1 million items in the HashSet). This is equivalent to one for loop.
Next, loop through all items in the TraiderItem collection (the smaller collection), form the lookup key the same way, and check the HashSet. That is another for loop.
Net time complexity is about O(N) + O(K). You can improve this further by building the HashSet in advance or in parallel.
Space complexity is higher, but you have plenty of RAM these days :)
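A minimal sketch of this idea, assuming the classes from the question. It uses a value tuple as the key instead of a concatenated string (a variation on the answer's suggestion; tuples give value-based hashing for free), and applies the question's 0.4 multiplier on the TraiderItem side to match the original comparison:

```csharp
// Build the set once from the large collection: O(N).
var baseKeys = new HashSet<(DateTime Date, double Fee)>(
    baseItems.Select(b => (b.DateUtc, Math.Round(b.Fee, 8))));

// Probe once per TraiderItem: O(1) average per lookup, O(K) total.
var traiderItemsInBase = traiderItems
    .Where(t => baseKeys.Contains((t.DateUtc, Math.Round(t.Fee * 0.4, 8))))
    .ToList();
```

One caveat worth noting: hashing rounded doubles works here only because both sides are rounded to the same 8 decimal places before comparison, exactly as in the question's predicate.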
answered Nov 8 at 9:27
Prateek Shrivastava
up vote
0
down vote
If the BaseItem dates repeat a lot, you can group them by date in a dictionary:
var baseItemsDic = new Dictionary<DateTime, List<BaseItem>>();
foreach (var item in baseItems)
{
    if (!baseItemsDic.ContainsKey(item.DateUtc))
        baseItemsDic.Add(item.DateUtc, new List<BaseItem>());
    baseItemsDic[item.DateUtc].Add(item);
}

var traiderItemsInBase = traiderItems
    .Where(a => baseItemsDic.ContainsKey(a.DateUtc)
                && baseItemsDic[a.DateUtc].Any(x =>
                       Math.Round(x.Fee, 8) == Math.Round((double)a.Fee * 0.4, 8)))
    .ToList();
edited Nov 8 at 9:50
answered Nov 8 at 9:43
Orwel
Can I ask why this is needing to be done in code? This is what databases are designed for. I know this doesn't address your question specifically (which is why it's a comment), but I feel like you may have lost your way with the design of this somehow. How have you generated 1,000,000+ objects in code? If from a DB, leave them there and let SQL/whatever do the work.
– JᴀʏMᴇᴇ
Nov 8 at 9:29
Why Math.Round((double)a.Fee * 0.4, 8)?
– Marco Salerno
Nov 8 at 9:31
Math.Round - it is additional logic
– Konstantin
Nov 8 at 9:34
I agree with the database approach, but I need to implement this one without a database
– Konstantin
Nov 8 at 9:34
@Konstantin - I'm curious as to why, though. I'm not questioning your requirements, I'm genuinely intrigued.
– JᴀʏMᴇᴇ
Nov 8 at 9:36