Search values using Index in mongodb












2














I am new to Mongodb and wish to implement search on field in mongo collection.



I have the following structure for my test collection:-



{
'key': <unique key>,
'val_arr': [
['laptop', 'macbook pro', '16gb', 'i9', 'spacegrey'],
['cellphone', 'iPhone', '4gb', 't2', 'rose gold'],
['laptop', 'macbook air', '8gb', 'i5', 'black'],
['router', 'huawei', '10x10', 'white'],
['laptop', 'macbook', '8gb', 'i5', 'silve'],
}


And I wish to find them based on index number and value, i.e.
Find the entry where first element in any of the val_arr is laptop and 3rd element's value is 8gb.



I tried looking at composite indexes in mongodb, but they have a limit of 32 keys to be indexed. Any help in this direction is appreciated.










share|improve this question
























  • Is there something in the provided answer that you believe does not address your question? If so then please comment on the answer to clarify what exactly needs to be addressed that has not. If it does in fact answer the question you asked then please note to Accept your Answers to the questions you ask
    – Neil Lunn
    Nov 19 '18 at 3:43
















2














I am new to Mongodb and wish to implement search on field in mongo collection.



I have the following structure for my test collection:-



{
'key': <unique key>,
'val_arr': [
['laptop', 'macbook pro', '16gb', 'i9', 'spacegrey'],
['cellphone', 'iPhone', '4gb', 't2', 'rose gold'],
['laptop', 'macbook air', '8gb', 'i5', 'black'],
['router', 'huawei', '10x10', 'white'],
['laptop', 'macbook', '8gb', 'i5', 'silve'],
}


And I wish to find them based on index number and value, i.e.
Find the entry where first element in any of the val_arr is laptop and 3rd element's value is 8gb.



I tried looking at composite indexes in mongodb, but they have a limit of 32 keys to be indexed. Any help in this direction is appreciated.










share|improve this question
























  • Is there something in the provided answer that you believe does not address your question? If so then please comment on the answer to clarify what exactly needs to be addressed that has not. If it does in fact answer the question you asked then please note to Accept your Answers to the questions you ask
    – Neil Lunn
    Nov 19 '18 at 3:43














2












2








2







I am new to Mongodb and wish to implement search on field in mongo collection.



I have the following structure for my test collection:-



{
'key': <unique key>,
'val_arr': [
['laptop', 'macbook pro', '16gb', 'i9', 'spacegrey'],
['cellphone', 'iPhone', '4gb', 't2', 'rose gold'],
['laptop', 'macbook air', '8gb', 'i5', 'black'],
['router', 'huawei', '10x10', 'white'],
['laptop', 'macbook', '8gb', 'i5', 'silve'],
}


And I wish to find them based on index number and value, i.e.
Find the entry where first element in any of the val_arr is laptop and 3rd element's value is 8gb.



I tried looking at composite indexes in mongodb, but they have a limit of 32 keys to be indexed. Any help in this direction is appreciated.










share|improve this question















I am new to Mongodb and wish to implement search on field in mongo collection.



I have the following structure for my test collection:-



{
'key': <unique key>,
'val_arr': [
['laptop', 'macbook pro', '16gb', 'i9', 'spacegrey'],
['cellphone', 'iPhone', '4gb', 't2', 'rose gold'],
['laptop', 'macbook air', '8gb', 'i5', 'black'],
['router', 'huawei', '10x10', 'white'],
['laptop', 'macbook', '8gb', 'i5', 'silve'],
}


And I wish to find them based on index number and value, i.e.
Find the entry where first element in any of the val_arr is laptop and 3rd element's value is 8gb.



I tried looking at composite indexes in mongodb, but they have a limit of 32 keys to be indexed. Any help in this direction is appreciated.







mongodb multidimensional-array mongodb-query






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 15 '18 at 0:10









Neil Lunn

97.1k22170181




97.1k22170181










asked Nov 14 '18 at 18:37









Nishutosh Sharma

1,12921527




1,12921527












  • Is there something in the provided answer that you believe does not address your question? If so then please comment on the answer to clarify what exactly needs to be addressed that has not. If it does in fact answer the question you asked then please note to Accept your Answers to the questions you ask
    – Neil Lunn
    Nov 19 '18 at 3:43


















  • Is there something in the provided answer that you believe does not address your question? If so then please comment on the answer to clarify what exactly needs to be addressed that has not. If it does in fact answer the question you asked then please note to Accept your Answers to the questions you ask
    – Neil Lunn
    Nov 19 '18 at 3:43
















Is there something in the provided answer that you believe does not address your question? If so then please comment on the answer to clarify what exactly needs to be addressed that has not. If it does in fact answer the question you asked then please note to Accept your Answers to the questions you ask
– Neil Lunn
Nov 19 '18 at 3:43




Is there something in the provided answer that you believe does not address your question? If so then please comment on the answer to clarify what exactly needs to be addressed that has not. If it does in fact answer the question you asked then please note to Accept your Answers to the questions you ask
– Neil Lunn
Nov 19 '18 at 3:43












1 Answer
1






active

oldest

votes


















3














There is a limit on indexes here but it really should not matter. In your case you actually say 'key': <unique key>. So if that really is "unique" then it's the only thing in the collection that need be indexed, as long as you actually include that "key" as part of every query you make since this will determine you to select a document.



Indexes on arrays "within" a document really don't matter that much unless you actually intend to search directly for those elements within a document. That might be the case, but this actually has no bearing on matching your values by numbered index positions:



db.collection.find(
{
"val_arr": {
"$elemMatch": { "0": "laptop", "2": "8gb" }
}
},
{ "val_arr.$": 1 }
)


Which would return:



{
"val_arr" : [
[
"laptop",
"macbook air",
"8gb",
"i5",
"black"
]
]
}


The $elemMatch allows you to express "multiple conditions" on the same array element. This is needed over standard dot notation forms because otherwise the condition is simply looking for "any" array member which matches the value at the index. For instance:



db.collection.find({ "val_arr.0": "laptop", "val_arr.2": "4gb" })


Actually matches the given document even though that "combination" does not exist on a single "row", but both values are actually present in the array as a whole. But just in different members. Using those same values with $elemMatch makes sure the pair is matched on the same element.



Note the { "val_arr.$": 1 } in the above example, which is the projection for the "single" matched element. That is optional, but this is just to talk about identifying the matches.



Using .find() this is as much as you can do and is a limitation of the positional operator in that it can only identify one matching element. The way to do this for "multiple matches" is to use aggregate() with $filter:



db.collection.aggregate([
{ "$match": {
"val_arr": {
"$elemMatch": { "0": "laptop", "2": "8gb" }
}
}},
{ "$addFields": {
"val_arr": {
"$filter": {
"input": "$val_arr",
"cond": {
"$and": [
{ "$eq": [ { "$arrayElemAt": [ "$$this", 0 ] }, "laptop" ] },
{ "$eq": [ { "$arrayElemAt": [ "$$this", 2 ] }, "8gb" ] }
]
}
}
}
}}
])


Which returns:



{
"key" : "k",
"val_arr" : [
[
"laptop",
"macbook air",
"8gb",
"i5",
"black"
],
[
"laptop",
"macbook",
"8gb",
"i5",
"silve"
]
]
}


The initial query conditions which actually select the matching document go into the $match and are exactly the same as the query conditions shown earlier. The $filter is applied to just get the elements which actually match it's conditions. Those conditions do a similar usage of $arrayElemAt inside the logical expression as to how the index values of "0" and "2" are applies in the query conditions itself.



Using any aggregation expression incurs an additional cost over the standard query engine capabilities. So it is always best to consider if you really need it before you dive and and use the statement. Regular query expressions are always better as long as they do the job.



Changing Structure



Of course whilst it's possible to match on index positions of an array, none of this actually helps in being able to actually create an "index" which can be used to speed up queries.



The best course here is to actually use meaningful property names instead of plain arrays:



{
'key': "k",
'val_arr': [
{
'type': 'laptop',
'name': 'macbook pro',
'memory': '16gb',
'processor': 'i9',
'color': 'spacegrey'
},
{
'type': 'cellphone',
'name': 'iPhone',
'memory': '4gb',
'processor': 't2',
'color': 'rose gold'
},
{
'type': 'laptop',
'name': 'macbook air',
'memory': '8gb',
'processor': 'i5',
'color': 'black'
},
{
'type':'router',
'name': 'huawei',
'size': '10x10',
'color': 'white'
},
{
'type': 'laptop',
'name': 'macbook',
'memory': '8gb',
'processor': 'i5',
'color': 'silve'
}
]
}


This does allow you "within reason" to include the paths to property names within the array as part of a compound index. For example:



db.collection.createIndex({ "val_arr.type": 1, "val_arr.memory": 1 })


And then actually issuing queries looks far more descriptive in the code than cryptic values of 0 and 2:



db.collection.aggregate([
{ "$match": {
"val_arr": {
"$elemMatch": { "type": "laptop", "memory": "8gb" }
}
}},
{ "$addFields": {
"val_arr": {
"$filter": {
"input": "$val_arr",
"cond": {
"$and": [
{ "$eq": [ "$$this.type", "laptop" ] },
{ "$eq": [ "$$this.memory", "8gb" ] }
]
}
}
}
}}
])


Expected results, and more meaningful:



{
"key" : "k",
"val_arr" : [
{
"type" : "laptop",
"name" : "macbook air",
"memory" : "8gb",
"processor" : "i5",
"color" : "black"
},
{
"type" : "laptop",
"name" : "macbook",
"memory" : "8gb",
"processor" : "i5",
"color" : "silve"
}
]
}


The common reason most people arrive at a structure like you have in the question is typically because they think they are saving space. This is not simply not true, and with most modern optimizations to the storage engines MongoDB uses it's basically irrelevant over any small gains that might have been anticipated.



Therefore, for the sake of "clarity" and also in order to actually support indexing on the data within your "arrays" you really should be changing the structure and use named properties here instead.



And again, if your entire usage pattern of this data is not using the key property of the document in queries, then it probably would be better to store those entries as separate documents to begin with instead of being in an array at all. That also makes getting results more efficient.



So to break that all down your options here really are:




  • You actually always include key as part of your query, so indexes anywhere else but on that property do not matter.

  • You change to using named properties for the values on the array members allowing you to index on those properties without hitting "Multikey limitations"

  • You decide you never access this data using the key anyway, so you just write all the array data as separate documents in the collection with proper named properties.


Going with one of those that actually suits your needs best is essentially the solution allowing you to efficiently deal with the sort of data you have.




N.B Nothing to do with the topic at hand really ( except maybe a note on storage size ), but it would generally be recommended that things with an inherent numeric value such as the memory or "8gb" types of data actually be expressed as numeric rather than "strings".



The simple reasoning is that whilst you can query for "8gb" as an equality, this does not help you with ranges such as "between 4 and 12 gigabytes.



Therefore it usually makes much more sense to use numeric values like 8 or even 8000. Note that numeric values will actually have an impact on storage in that they will typically take less space than strings. Which given that the omission of property names may have been attempting to reduce storage but does nothing, does show an actual area where storage size can be reduced as well.







share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53306752%2fsearch-values-using-index-in-mongodb%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    There is a limit on indexes here but it really should not matter. In your case you actually say 'key': <unique key>. So if that really is "unique" then it's the only thing in the collection that need be indexed, as long as you actually include that "key" as part of every query you make since this will determine you to select a document.



    Indexes on arrays "within" a document really don't matter that much unless you actually intend to search directly for those elements within a document. That might be the case, but this actually has no bearing on matching your values by numbered index positions:



    db.collection.find(
    {
    "val_arr": {
    "$elemMatch": { "0": "laptop", "2": "8gb" }
    }
    },
    { "val_arr.$": 1 }
    )


    Which would return:



    {
    "val_arr" : [
    [
    "laptop",
    "macbook air",
    "8gb",
    "i5",
    "black"
    ]
    ]
    }


    The $elemMatch allows you to express "multiple conditions" on the same array element. This is needed over standard dot notation forms because otherwise the condition is simply looking for "any" array member which matches the value at the index. For instance:



    db.collection.find({ "val_arr.0": "laptop", "val_arr.2": "4gb" })


    Actually matches the given document even though that "combination" does not exist on a single "row", but both values are actually present in the array as a whole. But just in different members. Using those same values with $elemMatch makes sure the pair is matched on the same element.



    Note the { "val_arr.$": 1 } in the above example, which is the projection for the "single" matched element. That is optional, but this is just to talk about identifying the matches.



    Using .find() this is as much as you can do and is a limitation of the positional operator in that it can only identify one matching element. The way to do this for "multiple matches" is to use aggregate() with $filter:



    db.collection.aggregate([
    { "$match": {
    "val_arr": {
    "$elemMatch": { "0": "laptop", "2": "8gb" }
    }
    }},
    { "$addFields": {
    "val_arr": {
    "$filter": {
    "input": "$val_arr",
    "cond": {
    "$and": [
    { "$eq": [ { "$arrayElemAt": [ "$$this", 0 ] }, "laptop" ] },
    { "$eq": [ { "$arrayElemAt": [ "$$this", 2 ] }, "8gb" ] }
    ]
    }
    }
    }
    }}
    ])


    Which returns:



    {
    "key" : "k",
    "val_arr" : [
    [
    "laptop",
    "macbook air",
    "8gb",
    "i5",
    "black"
    ],
    [
    "laptop",
    "macbook",
    "8gb",
    "i5",
    "silve"
    ]
    ]
    }


    The initial query conditions which actually select the matching document go into the $match and are exactly the same as the query conditions shown earlier. The $filter is applied to just get the elements which actually match it's conditions. Those conditions do a similar usage of $arrayElemAt inside the logical expression as to how the index values of "0" and "2" are applies in the query conditions itself.



    Using any aggregation expression incurs an additional cost over the standard query engine capabilities. So it is always best to consider if you really need it before you dive and and use the statement. Regular query expressions are always better as long as they do the job.



    Changing Structure



    Of course whilst it's possible to match on index positions of an array, none of this actually helps in being able to actually create an "index" which can be used to speed up queries.



    The best course here is to actually use meaningful property names instead of plain arrays:



    {
    'key': "k",
    'val_arr': [
    {
    'type': 'laptop',
    'name': 'macbook pro',
    'memory': '16gb',
    'processor': 'i9',
    'color': 'spacegrey'
    },
    {
    'type': 'cellphone',
    'name': 'iPhone',
    'memory': '4gb',
    'processor': 't2',
    'color': 'rose gold'
    },
    {
    'type': 'laptop',
    'name': 'macbook air',
    'memory': '8gb',
    'processor': 'i5',
    'color': 'black'
    },
    {
    'type':'router',
    'name': 'huawei',
    'size': '10x10',
    'color': 'white'
    },
    {
    'type': 'laptop',
    'name': 'macbook',
    'memory': '8gb',
    'processor': 'i5',
    'color': 'silve'
    }
    ]
    }


    This does allow you "within reason" to include the paths to property names within the array as part of a compound index. For example:



    db.collection.createIndex({ "val_arr.type": 1, "val_arr.memory": 1 })


    And then actually issuing queries looks far more descriptive in the code than cryptic values of 0 and 2:



    db.collection.aggregate([
    { "$match": {
    "val_arr": {
    "$elemMatch": { "type": "laptop", "memory": "8gb" }
    }
    }},
    { "$addFields": {
    "val_arr": {
    "$filter": {
    "input": "$val_arr",
    "cond": {
    "$and": [
    { "$eq": [ "$$this.type", "laptop" ] },
    { "$eq": [ "$$this.memory", "8gb" ] }
    ]
    }
    }
    }
    }}
    ])


    Expected results, and more meaningful:



    {
    "key" : "k",
    "val_arr" : [
    {
    "type" : "laptop",
    "name" : "macbook air",
    "memory" : "8gb",
    "processor" : "i5",
    "color" : "black"
    },
    {
    "type" : "laptop",
    "name" : "macbook",
    "memory" : "8gb",
    "processor" : "i5",
    "color" : "silve"
    }
    ]
    }


    The common reason most people arrive at a structure like you have in the question is typically because they think they are saving space. This is not simply not true, and with most modern optimizations to the storage engines MongoDB uses it's basically irrelevant over any small gains that might have been anticipated.



    Therefore, for the sake of "clarity" and also in order to actually support indexing on the data within your "arrays" you really should be changing the structure and use named properties here instead.



    And again, if your entire usage pattern of this data is not using the key property of the document in queries, then it probably would be better to store those entries as separate documents to begin with instead of being in an array at all. That also makes getting results more efficient.



    So to break that all down your options here really are:




    • You actually always include key as part of your query, so indexes anywhere else but on that property do not matter.

    • You change to using named properties for the values on the array members allowing you to index on those properties without hitting "Multikey limitations"

    • You decide you never access this data using the key anyway, so you just write all the array data as separate documents in the collection with proper named properties.


    Going with one of those that actually suits your needs best is essentially the solution allowing you to efficiently deal with the sort of data you have.




    N.B Nothing to do with the topic at hand really ( except maybe a note on storage size ), but it would generally be recommended that things with an inherent numeric value such as the memory or "8gb" types of data actually be expressed as numeric rather than "strings".



    The simple reasoning is that whilst you can query for "8gb" as an equality, this does not help you with ranges such as "between 4 and 12 gigabytes.



    Therefore it usually makes much more sense to use numeric values like 8 or even 8000. Note that numeric values will actually have an impact on storage in that they will typically take less space than strings. Which given that the omission of property names may have been attempting to reduce storage but does nothing, does show an actual area where storage size can be reduced as well.







    share|improve this answer




























      3














      There is a limit on indexes here but it really should not matter. In your case you actually say 'key': <unique key>. So if that really is "unique" then it's the only thing in the collection that need be indexed, as long as you actually include that "key" as part of every query you make since this will determine you to select a document.



      Indexes on arrays "within" a document really don't matter that much unless you actually intend to search directly for those elements within a document. That might be the case, but this actually has no bearing on matching your values by numbered index positions:



      db.collection.find(
      {
      "val_arr": {
      "$elemMatch": { "0": "laptop", "2": "8gb" }
      }
      },
      { "val_arr.$": 1 }
      )


      Which would return:



      {
      "val_arr" : [
      [
      "laptop",
      "macbook air",
      "8gb",
      "i5",
      "black"
      ]
      ]
      }


      The $elemMatch allows you to express "multiple conditions" on the same array element. This is needed over standard dot notation forms because otherwise the condition is simply looking for "any" array member which matches the value at the index. For instance:



      db.collection.find({ "val_arr.0": "laptop", "val_arr.2": "4gb" })


      Actually matches the given document even though that "combination" does not exist on a single "row", but both values are actually present in the array as a whole. But just in different members. Using those same values with $elemMatch makes sure the pair is matched on the same element.



      Note the { "val_arr.$": 1 } in the above example, which is the projection for the "single" matched element. That is optional, but this is just to talk about identifying the matches.



      Using .find() this is as much as you can do and is a limitation of the positional operator in that it can only identify one matching element. The way to do this for "multiple matches" is to use aggregate() with $filter:



      db.collection.aggregate([
      { "$match": {
      "val_arr": {
      "$elemMatch": { "0": "laptop", "2": "8gb" }
      }
      }},
      { "$addFields": {
      "val_arr": {
      "$filter": {
      "input": "$val_arr",
      "cond": {
      "$and": [
      { "$eq": [ { "$arrayElemAt": [ "$$this", 0 ] }, "laptop" ] },
      { "$eq": [ { "$arrayElemAt": [ "$$this", 2 ] }, "8gb" ] }
      ]
      }
      }
      }
      }}
      ])


      Which returns:



      {
      "key" : "k",
      "val_arr" : [
      [
      "laptop",
      "macbook air",
      "8gb",
      "i5",
      "black"
      ],
      [
      "laptop",
      "macbook",
      "8gb",
      "i5",
      "silve"
      ]
      ]
      }


      The initial query conditions which actually select the matching document go into the $match and are exactly the same as the query conditions shown earlier. The $filter is applied to just get the elements which actually match it's conditions. Those conditions do a similar usage of $arrayElemAt inside the logical expression as to how the index values of "0" and "2" are applies in the query conditions itself.



      Using any aggregation expression incurs an additional cost over the standard query engine capabilities. So it is always best to consider if you really need it before you dive and and use the statement. Regular query expressions are always better as long as they do the job.



      Changing Structure



      Of course whilst it's possible to match on index positions of an array, none of this actually helps in being able to actually create an "index" which can be used to speed up queries.



      The best course here is to actually use meaningful property names instead of plain arrays:



      {
      'key': "k",
      'val_arr': [
      {
      'type': 'laptop',
      'name': 'macbook pro',
      'memory': '16gb',
      'processor': 'i9',
      'color': 'spacegrey'
      },
      {
      'type': 'cellphone',
      'name': 'iPhone',
      'memory': '4gb',
      'processor': 't2',
      'color': 'rose gold'
      },
      {
      'type': 'laptop',
      'name': 'macbook air',
      'memory': '8gb',
      'processor': 'i5',
      'color': 'black'
      },
      {
      'type':'router',
      'name': 'huawei',
      'size': '10x10',
      'color': 'white'
      },
      {
      'type': 'laptop',
      'name': 'macbook',
      'memory': '8gb',
      'processor': 'i5',
      'color': 'silve'
      }
      ]
      }


      This does allow you "within reason" to include the paths to property names within the array as part of a compound index. For example:



      db.collection.createIndex({ "val_arr.type": 1, "val_arr.memory": 1 })


      And then actually issuing queries looks far more descriptive in the code than cryptic values of 0 and 2:



      db.collection.aggregate([
      { "$match": {
      "val_arr": {
      "$elemMatch": { "type": "laptop", "memory": "8gb" }
      }
      }},
      { "$addFields": {
      "val_arr": {
      "$filter": {
      "input": "$val_arr",
      "cond": {
      "$and": [
      { "$eq": [ "$$this.type", "laptop" ] },
      { "$eq": [ "$$this.memory", "8gb" ] }
      ]
      }
      }
      }
      }}
      ])


      Expected results, and more meaningful:



      {
      "key" : "k",
      "val_arr" : [
      {
      "type" : "laptop",
      "name" : "macbook air",
      "memory" : "8gb",
      "processor" : "i5",
      "color" : "black"
      },
      {
      "type" : "laptop",
      "name" : "macbook",
      "memory" : "8gb",
      "processor" : "i5",
      "color" : "silve"
      }
      ]
      }


      The common reason most people arrive at a structure like you have in the question is typically because they think they are saving space. This is not simply not true, and with most modern optimizations to the storage engines MongoDB uses it's basically irrelevant over any small gains that might have been anticipated.



      Therefore, for the sake of "clarity" and also in order to actually support indexing on the data within your "arrays" you really should be changing the structure and use named properties here instead.



      And again, if your entire usage pattern of this data is not using the key property of the document in queries, then it probably would be better to store those entries as separate documents to begin with instead of being in an array at all. That also makes getting results more efficient.



      So to break that all down your options here really are:




      • You actually always include key as part of your query, so indexes anywhere else but on that property do not matter.

      • You change to using named properties for the values on the array members allowing you to index on those properties without hitting "Multikey limitations"

      • You decide you never access this data using the key anyway, so you just write all the array data as separate documents in the collection with proper named properties.


      Going with one of those that actually suits your needs best is essentially the solution allowing you to efficiently deal with the sort of data you have.




      N.B Nothing to do with the topic at hand really ( except maybe a note on storage size ), but it would generally be recommended that things with an inherent numeric value such as the memory or "8gb" types of data actually be expressed as numeric rather than "strings".



      The simple reasoning is that whilst you can query for "8gb" as an equality, this does not help you with ranges such as "between 4 and 12 gigabytes.



      Therefore it usually makes much more sense to use numeric values like 8 or even 8000. Note that numeric values will actually have an impact on storage in that they will typically take less space than strings. Which given that the omission of property names may have been attempting to reduce storage but does nothing, does show an actual area where storage size can be reduced as well.







      share|improve this answer


























        3












        3








        3






        There is a limit on indexes here but it really should not matter. In your case you actually say 'key': <unique key>. So if that really is "unique" then it's the only thing in the collection that need be indexed, as long as you actually include that "key" as part of every query you make since this will determine you to select a document.



        Indexes on arrays "within" a document really don't matter that much unless you actually intend to search directly for those elements within a document. That might be the case, but this actually has no bearing on matching your values by numbered index positions:



        db.collection.find(
        {
        "val_arr": {
        "$elemMatch": { "0": "laptop", "2": "8gb" }
        }
        },
        { "val_arr.$": 1 }
        )


        Which would return:



        {
        "val_arr" : [
        [
        "laptop",
        "macbook air",
        "8gb",
        "i5",
        "black"
        ]
        ]
        }


        The $elemMatch allows you to express "multiple conditions" on the same array element. This is needed over standard dot notation forms because otherwise the condition is simply looking for "any" array member which matches the value at the index. For instance:



        db.collection.find({ "val_arr.0": "laptop", "val_arr.2": "4gb" })


        Actually matches the given document even though that "combination" does not exist on a single "row", but both values are actually present in the array as a whole. But just in different members. Using those same values with $elemMatch makes sure the pair is matched on the same element.



        Note the { "val_arr.$": 1 } in the above example, which is the projection for the "single" matched element. That is optional, but this is just to talk about identifying the matches.



        Using .find() this is as much as you can do and is a limitation of the positional operator in that it can only identify one matching element. The way to do this for "multiple matches" is to use aggregate() with $filter:



        db.collection.aggregate([
        { "$match": {
        "val_arr": {
        "$elemMatch": { "0": "laptop", "2": "8gb" }
        }
        }},
        { "$addFields": {
        "val_arr": {
        "$filter": {
        "input": "$val_arr",
        "cond": {
        "$and": [
        { "$eq": [ { "$arrayElemAt": [ "$$this", 0 ] }, "laptop" ] },
        { "$eq": [ { "$arrayElemAt": [ "$$this", 2 ] }, "8gb" ] }
        ]
        }
        }
        }
        }}
        ])


        Which returns:



        {
        "key" : "k",
        "val_arr" : [
        [
        "laptop",
        "macbook air",
        "8gb",
        "i5",
        "black"
        ],
        [
        "laptop",
        "macbook",
        "8gb",
        "i5",
        "silve"
        ]
        ]
        }


        The initial query conditions which actually select the matching document go into the $match and are exactly the same as the query conditions shown earlier. The $filter is applied to just get the elements which actually match it's conditions. Those conditions do a similar usage of $arrayElemAt inside the logical expression as to how the index values of "0" and "2" are applies in the query conditions itself.



        Using any aggregation expression incurs an additional cost over the standard query engine capabilities. So it is always best to consider if you really need it before you dive and and use the statement. Regular query expressions are always better as long as they do the job.



        Changing Structure



        Of course whilst it's possible to match on index positions of an array, none of this actually helps in being able to actually create an "index" which can be used to speed up queries.



        The best course here is to actually use meaningful property names instead of plain arrays:



        {
        'key': "k",
        'val_arr': [
        {
        'type': 'laptop',
        'name': 'macbook pro',
        'memory': '16gb',
        'processor': 'i9',
        'color': 'spacegrey'
        },
        {
        'type': 'cellphone',
        'name': 'iPhone',
        'memory': '4gb',
        'processor': 't2',
        'color': 'rose gold'
        },
        {
        'type': 'laptop',
        'name': 'macbook air',
        'memory': '8gb',
        'processor': 'i5',
        'color': 'black'
        },
        {
        'type':'router',
        'name': 'huawei',
        'size': '10x10',
        'color': 'white'
        },
        {
        'type': 'laptop',
        'name': 'macbook',
        'memory': '8gb',
        'processor': 'i5',
        'color': 'silve'
        }
        ]
        }


        This does allow you "within reason" to include the paths to property names within the array as part of a compound index. For example:



        db.collection.createIndex({ "val_arr.type": 1, "val_arr.memory": 1 })


        And then actually issuing queries looks far more descriptive in the code than cryptic values of 0 and 2:



        db.collection.aggregate([
        { "$match": {
        "val_arr": {
        "$elemMatch": { "type": "laptop", "memory": "8gb" }
        }
        }},
        { "$addFields": {
        "val_arr": {
        "$filter": {
        "input": "$val_arr",
        "cond": {
        "$and": [
        { "$eq": [ "$$this.type", "laptop" ] },
        { "$eq": [ "$$this.memory", "8gb" ] }
        ]
        }
        }
        }
        }}
        ])


        Expected results, and more meaningful:



        {
        "key" : "k",
        "val_arr" : [
        {
        "type" : "laptop",
        "name" : "macbook air",
        "memory" : "8gb",
        "processor" : "i5",
        "color" : "black"
        },
        {
        "type" : "laptop",
        "name" : "macbook",
        "memory" : "8gb",
        "processor" : "i5",
        "color" : "silve"
        }
        ]
        }


        The common reason most people arrive at a structure like you have in the question is typically because they think they are saving space. This is not simply not true, and with most modern optimizations to the storage engines MongoDB uses it's basically irrelevant over any small gains that might have been anticipated.



        Therefore, for the sake of "clarity" and also in order to actually support indexing on the data within your "arrays" you really should be changing the structure and use named properties here instead.



        And again, if your entire usage pattern of this data is not using the key property of the document in queries, then it probably would be better to store those entries as separate documents to begin with instead of being in an array at all. That also makes getting results more efficient.



        So to break that all down your options here really are:




        • You actually always include key as part of your query, so indexes anywhere else but on that property do not matter.

        • You change to using named properties for the values on the array members allowing you to index on those properties without hitting "Multikey limitations"

        • You decide you never access this data using the key anyway, so you just write all the array data as separate documents in the collection with proper named properties.


        Going with one of those that actually suits your needs best is essentially the solution allowing you to efficiently deal with the sort of data you have.




        N.B Nothing to do with the topic at hand really ( except maybe a note on storage size ), but it would generally be recommended that things with an inherent numeric value such as the memory or "8gb" types of data actually be expressed as numeric rather than "strings".



        The simple reasoning is that whilst you can query for "8gb" as an equality, this does not help you with ranges such as "between 4 and 12 gigabytes.



        Therefore it usually makes much more sense to use numeric values like 8 or even 8000. Note that numeric values will actually have an impact on storage in that they will typically take less space than strings. Which given that the omission of property names may have been attempting to reduce storage but does nothing, does show an actual area where storage size can be reduced as well.







        share|improve this answer














        There is a limit on indexes here but it really should not matter. In your case you actually say 'key': <unique key>. So if that really is "unique" then it's the only thing in the collection that need be indexed, as long as you actually include that "key" as part of every query you make since this will determine you to select a document.



        Indexes on arrays "within" a document really don't matter that much unless you actually intend to search directly for those elements within a document. That might be the case, but this actually has no bearing on matching your values by numbered index positions:



        db.collection.find(
        {
        "val_arr": {
        "$elemMatch": { "0": "laptop", "2": "8gb" }
        }
        },
        { "val_arr.$": 1 }
        )


        Which would return:



        {
        "val_arr" : [
        [
        "laptop",
        "macbook air",
        "8gb",
        "i5",
        "black"
        ]
        ]
        }


        The $elemMatch allows you to express "multiple conditions" on the same array element. This is needed over standard dot notation forms because otherwise the condition is simply looking for "any" array member which matches the value at the index. For instance:



        db.collection.find({ "val_arr.0": "laptop", "val_arr.2": "4gb" })


        Actually matches the given document even though that "combination" does not exist on a single "row", but both values are actually present in the array as a whole. But just in different members. Using those same values with $elemMatch makes sure the pair is matched on the same element.



        Note the { "val_arr.$": 1 } in the above example, which is the projection for the "single" matched element. That is optional, but this is just to talk about identifying the matches.



        Using .find() this is as much as you can do and is a limitation of the positional operator in that it can only identify one matching element. The way to do this for "multiple matches" is to use aggregate() with $filter:



        db.collection.aggregate([
        { "$match": {
        "val_arr": {
        "$elemMatch": { "0": "laptop", "2": "8gb" }
        }
        }},
        { "$addFields": {
        "val_arr": {
        "$filter": {
        "input": "$val_arr",
        "cond": {
        "$and": [
        { "$eq": [ { "$arrayElemAt": [ "$$this", 0 ] }, "laptop" ] },
        { "$eq": [ { "$arrayElemAt": [ "$$this", 2 ] }, "8gb" ] }
        ]
        }
        }
        }
        }}
        ])


        Which returns:



        {
        "key" : "k",
        "val_arr" : [
        [
        "laptop",
        "macbook air",
        "8gb",
        "i5",
        "black"
        ],
        [
        "laptop",
        "macbook",
        "8gb",
        "i5",
        "silve"
        ]
        ]
        }


        The initial query conditions which actually select the matching document go into the $match and are exactly the same as the query conditions shown earlier. The $filter is applied to just get the elements which actually match it's conditions. Those conditions do a similar usage of $arrayElemAt inside the logical expression as to how the index values of "0" and "2" are applies in the query conditions itself.



        Using any aggregation expression incurs an additional cost over the standard query engine capabilities. So it is always best to consider if you really need it before you dive and and use the statement. Regular query expressions are always better as long as they do the job.



        Changing Structure



        Of course whilst it's possible to match on index positions of an array, none of this actually helps in being able to actually create an "index" which can be used to speed up queries.



        The best course here is to actually use meaningful property names instead of plain arrays:



        {
        'key': "k",
        'val_arr': [
        {
        'type': 'laptop',
        'name': 'macbook pro',
        'memory': '16gb',
        'processor': 'i9',
        'color': 'spacegrey'
        },
        {
        'type': 'cellphone',
        'name': 'iPhone',
        'memory': '4gb',
        'processor': 't2',
        'color': 'rose gold'
        },
        {
        'type': 'laptop',
        'name': 'macbook air',
        'memory': '8gb',
        'processor': 'i5',
        'color': 'black'
        },
        {
        'type':'router',
        'name': 'huawei',
        'size': '10x10',
        'color': 'white'
        },
        {
        'type': 'laptop',
        'name': 'macbook',
        'memory': '8gb',
        'processor': 'i5',
        'color': 'silve'
        }
        ]
        }


        This does allow you "within reason" to include the paths to property names within the array as part of a compound index. For example:



        db.collection.createIndex({ "val_arr.type": 1, "val_arr.memory": 1 })


        And then actually issuing queries looks far more descriptive in the code than cryptic values of 0 and 2:



        db.collection.aggregate([
        { "$match": {
        "val_arr": {
        "$elemMatch": { "type": "laptop", "memory": "8gb" }
        }
        }},
        { "$addFields": {
        "val_arr": {
        "$filter": {
        "input": "$val_arr",
        "cond": {
        "$and": [
        { "$eq": [ "$$this.type", "laptop" ] },
        { "$eq": [ "$$this.memory", "8gb" ] }
        ]
        }
        }
        }
        }}
        ])


        Expected results, and more meaningful:



        {
        "key" : "k",
        "val_arr" : [
        {
        "type" : "laptop",
        "name" : "macbook air",
        "memory" : "8gb",
        "processor" : "i5",
        "color" : "black"
        },
        {
        "type" : "laptop",
        "name" : "macbook",
        "memory" : "8gb",
        "processor" : "i5",
        "color" : "silve"
        }
        ]
        }


        The common reason most people arrive at a structure like you have in the question is typically because they think they are saving space. This is not simply not true, and with most modern optimizations to the storage engines MongoDB uses it's basically irrelevant over any small gains that might have been anticipated.



        Therefore, for the sake of "clarity" and also in order to actually support indexing on the data within your "arrays" you really should be changing the structure and use named properties here instead.



        And again, if your entire usage pattern of this data is not using the key property of the document in queries, then it probably would be better to store those entries as separate documents to begin with instead of being in an array at all. That also makes getting results more efficient.



        So to break that all down your options here really are:




        • You actually always include key as part of your query, so indexes anywhere else but on that property do not matter.

        • You change to using named properties for the values on the array members allowing you to index on those properties without hitting "Multikey limitations"

        • You decide you never access this data using the key anyway, so you just write all the array data as separate documents in the collection with proper named properties.


        Going with one of those that actually suits your needs best is essentially the solution allowing you to efficiently deal with the sort of data you have.




        N.B Nothing to do with the topic at hand really ( except maybe a note on storage size ), but it would generally be recommended that things with an inherent numeric value such as the memory or "8gb" types of data actually be expressed as numeric rather than "strings".



        The simple reasoning is that whilst you can query for "8gb" as an equality, this does not help you with ranges such as "between 4 and 12 gigabytes.



        Therefore it usually makes much more sense to use numeric values like 8 or even 8000. Note that numeric values will actually have an impact on storage in that they will typically take less space than strings. Which given that the omission of property names may have been attempting to reduce storage but does nothing, does show an actual area where storage size can be reduced as well.








        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Nov 15 '18 at 1:26

























        answered Nov 15 '18 at 0:17









        Neil Lunn

        97.1k22170181




        97.1k22170181






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.





            Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


            Please pay close attention to the following guidance:


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53306752%2fsearch-values-using-index-in-mongodb%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Guess what letter conforming each word

            Run scheduled task as local user group (not BUILTIN)

            Port of Spain