EvalSpec input_fn : case where the feature is an array












I'm having trouble understanding some details of the Estimator API and tf.estimator.EvalSpec.
In an EvalSpec, the user is supposed to provide an input_fn. A call to input_fn is supposed to return a tuple (features, labels).
As far as I understand, the features can be a dictionary keyed by feature name whose values are tensors. For instance, if I have a batch of 100 examples and a feature called "weight", I will create an entry in the features dictionary with key "weight" whose value is a tensor of shape (100, 1) holding the weights of all the examples, right?



However:




  • What if my initial feature is already a tensor, like "size", which is an array of 3 double values? How can I feed it through the input_fn?


And the question I'm mostly interested in:




  • What if my initial feature is a variable-length array? For instance, my feature could be "prices of all purchased products", which would be a variable-length array of doubles (this corresponds to tf.io.VarLenFeature in feature specs). How can I send several examples of this through the input_fn?


Are these types of features "compatible" with the Estimator API?



Thanks!










      tensorflow tensorflow-datasets tensorflow-estimator






      asked Nov 20 '18 at 20:42









      lezebulon

I am also new to the Estimator API, but I have learned quite a lot from the S.O. community and will try to answer your question.



First, I would like to point you to this colab. It reflects the conventions I currently use for my Estimators.



You are correct that the input_fn for both the TRAIN and EVAL modes is meant to return a tuple of the form (features, labels), or a tf.data.Dataset whose elements are such tuples.
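Here is a minimal sketch of such an input_fn. It assumes in-memory NumPy arrays stand in for real data; the arrays WEIGHTS and LABELS and the mode strings "train"/"eval" are hypothetical. It returns a tf.data.Dataset of (features, labels) batches and shuffles and repeats only for training. The "weight" feature therefore arrives with shape [100, 1], as described in the question.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 250 examples with one scalar "weight" feature each.
rng = np.random.default_rng(0)
WEIGHTS = rng.random((250, 1)).astype(np.float32)       # shape [num_examples, 1]
LABELS = rng.integers(0, 2, size=250).astype(np.int32)  # shape [num_examples]


def input_fn(mode, batch_size=100):
    """Returns a tf.data.Dataset yielding (features, labels) batches."""
    # Slicing along the first axis gives one ({"weight": [1]}, label) pair per example.
    ds = tf.data.Dataset.from_tensor_slices(({"weight": WEIGHTS}, LABELS))
    if mode == "train":
        # Training: shuffle and repeat indefinitely.
        ds = ds.shuffle(buffer_size=250).repeat()
    # Eval: a single, ordered pass over the data.
    return ds.batch(batch_size)  # "weight" becomes [batch_size, 1]
```

With an Estimator, the same function could back both specs, for example tf.estimator.TrainSpec(input_fn=lambda: input_fn("train"), max_steps=1000) and tf.estimator.EvalSpec(input_fn=lambda: input_fn("eval"), steps=None). The sketch itself only uses tf.data, so it can be checked without building an Estimator.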



          So let us tackle your first question:




What if my initial feature is already a tensor, like "size", which is an array of 3 double values? How can I feed it through the input_fn?




Well, this requires me to backtrack a bit, to your statement:




          batch of 100 examples and a feature called "weight" I will create an entry in the feature dictionary that is a tensor of shape (100,1),




To make sure I understand correctly: you are asking what happens if, instead of a Tensor with shape [100, 1], you have a Tensor of shape [100, <size>], in this case 3 doubles, so [100, 3]?



If that is the case, it is no problem at all. In the linked colab, a single example of the input has shape [20, 7], so a feature of shape [3] per example is straightforward.
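Here is a sketch of the fixed-size case. It assumes hypothetical toy arrays SIZES, WEIGHTS and LABELS. Each example carries a "size" vector of 3 doubles next to its scalar "weight". Batching turns these into [batch_size, 3] and [batch_size, 1] tensors inside the same features dictionary.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 5 examples, each with a "size" of 3 doubles and a "weight".
SIZES = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [1.5, 2.5, 3.5],
                  [0.1, 0.2, 0.3]], dtype=np.float64)                       # [5, 3]
WEIGHTS = np.array([[60.0], [72.5], [80.0], [55.0], [90.0]], dtype=np.float64)  # [5, 1]
LABELS = np.array([0, 1, 1, 0, 1], dtype=np.int32)                             # [5]


def input_fn(batch_size=2):
    # from_tensor_slices slices along the first axis only, so each element is
    # ({"size": [3], "weight": [1]}, label); batch() stacks them back together.
    features = {"size": SIZES, "weight": WEIGHTS}
    return tf.data.Dataset.from_tensor_slices((features, LABELS)).batch(batch_size)
```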



The short answer is that whatever you return as the features part of the tuple is passed to model_fn. So if you want to pass a Tensor of shape [batch_size, size], you can return a tuple of (that_tensor, labels). However, another user on S.O. gave me some advice that I will pass on to you: use dictionaries, e.g.



```python
def input_fn():
    my_data = ...  # a Tensor with shape [batch_size, size]
    labels = ...   # the matching labels
    features = {'my_data': my_data}
    return (features, labels)
```
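On the model_fn side, the dictionary arrives as-is, so each feature is read by its key. One possible way to use it is a hypothetical helper, combine_features, that concatenates all the dictionary entries into a single float32 matrix. Sorting the keys keeps the column order deterministic from batch to batch.

```python
import tensorflow as tf


def combine_features(features):
    """Hypothetical helper for use inside a model_fn: concatenate every
    [batch, k] feature in the dictionary into one [batch, sum(k)] float32 matrix."""
    return tf.concat(
        [tf.cast(features[name], tf.float32) for name in sorted(features)],
        axis=1,
    )
```

For example, {"size": [B, 3], "weight": [B, 1]} becomes a [B, 4] matrix with the three "size" columns first.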


For reference, let us examine the input_fn of the colab, where I do the same thing as advised above:



```python
from random import shuffle

import tensorflow as tf

# fio, I_FEATURE and O_FEATURE are defined in the colab (fio is its helper module for TF Records).


def input_fn(filenames: list, params):
    mode = params['mode'] if 'mode' in params else 'train'
    batch_size = params['batch_size']

    shuffle(filenames)  # <--- shuffles file order only; far cheaper than a record-level tf dataset shuffle
    dataset = tf.data.TFRecordDataset(filenames)

    # using fio's SCHEMA, fill the TF Feature placeholders with values
    dataset = dataset.map(lambda record: fio.from_record(record))

    # using fio's SCHEMA, restructure and unwrap (if possible) features
    # (because TF Records require wrapping everything into a list)
    dataset = dataset.map(lambda context, features: fio.reconstitute((context, features)))

    # dataset should be a tuple of (features, labels)
    dataset = dataset.map(lambda context, features: (
        {"input_tensors": features[I_FEATURE]},  # features <--- wrapping it in a dictionary
        features[O_FEATURE],                     # labels
    ))

    # assumed completion (not in the excerpt): batch and return the dataset
    return dataset.batch(batch_size)
```


For simplicity, I will assume you are using tf.data.Dataset. If your data is not stored as TF Records, you will need to replace line 1



```python
dataset = tf.data.TFRecordDataset(filenames)                                            # line 1
dataset = dataset.map(lambda record: fio.from_record(record))                           # line 2
dataset = dataset.map(lambda context, features: fio.reconstitute((context, features)))  # line 3
```


with however you construct your dataset (be it from_tensor_slices, from_generator, etc.), and remove lines 2 and 3, since you do not need to recover your (Sequence)Example from TF Records.
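Here is a sketch of the same pipeline without TF Records. It assumes the data already sits in memory as hypothetical NumPy arrays: INPUTS, where each example has shape [20, 7] as in the colab, and TARGETS. from_tensor_slices replaces lines 1 to 3. The final map keeps the colab's convention of wrapping the inputs in a dictionary under "input_tensors".

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data standing in for the colab's TF Records:
# 8 examples, each an input of shape [20, 7] plus one integer label.
rng = np.random.default_rng(0)
INPUTS = rng.normal(size=(8, 20, 7)).astype(np.float32)
TARGETS = rng.integers(0, 2, size=8).astype(np.int32)


def input_fn(params):
    batch_size = params["batch_size"]
    # Replaces lines 1-3: build the dataset directly from memory, no parsing needed.
    dataset = tf.data.Dataset.from_tensor_slices((INPUTS, TARGETS))
    # Same final step as the colab: wrap the inputs in a dictionary.
    dataset = dataset.map(lambda x, y: ({"input_tensors": x}, y))
    return dataset.batch(batch_size)
```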



Now let us address your second question: variable-length arrays.



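It is just the same as above: wrap it in a dictionary and return it. The one extra step is batching. A plain Dataset.batch cannot stack examples of different lengths, so either pad them to a common length (for instance with Dataset.padded_batch) or keep them as sparse tensors.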
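Here is a sketch for the variable-length "prices" feature. It assumes the purchases come from a Python generator; PRICES and LABELS are hypothetical toy data. Dataset.from_generator declares the feature with an unknown length ([None]), and padded_batch pads each batch with zeros up to its longest example. A padded 0.0 looks the same as a real price of 0.0, so the sketch also passes the true length as a separate "num_prices" feature. Note that output_signature requires TF 2.4 or later.

```python
import tensorflow as tf

# Hypothetical toy data: each example has a different number of prices.
PRICES = [[1.5, 2.0], [3.25], [0.5, 4.0, 9.99]]
LABELS = [0, 1, 1]


def generate():
    for prices, label in zip(PRICES, LABELS):
        yield {"prices": prices, "num_prices": len(prices)}, label


def input_fn(batch_size=3):
    ds = tf.data.Dataset.from_generator(
        generate,
        output_signature=(
            {
                "prices": tf.TensorSpec(shape=[None], dtype=tf.float32),  # variable length
                "num_prices": tf.TensorSpec(shape=[], dtype=tf.int32),
            },
            tf.TensorSpec(shape=[], dtype=tf.int32),
        ),
    )
    # Pad "prices" with 0.0 to the longest example in each batch.
    return ds.padded_batch(
        batch_size, padded_shapes=({"prices": [None], "num_prices": []}, [])
    )
```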



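This is true, with the notable exception of parsing your (Sequence)Example records back out of TF Records. There, a variable-length feature has to be declared with tf.io.VarLenFeature in the parsing spec. It comes back as a SparseTensor, which you can pass along as-is or densify with tf.sparse.to_dense.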
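Here is a sketch of the TF Records case. It assumes the records were written as tf.train.Example protos with a float list "prices" and an int64 "label"; make_example and FEATURE_SPEC are hypothetical names. tf.io.VarLenFeature parses "prices" into a SparseTensor, and tf.sparse.to_dense turns that into a zero-padded [batch_size, max_len] tensor. In-memory serialized strings stand in for files here. With real data, tf.data.TFRecordDataset(filenames) would take the place of from_tensor_slices.

```python
import tensorflow as tf


def make_example(prices, label):
    """Hypothetical writer: serialize one example with a variable-length float list."""
    return tf.train.Example(features=tf.train.Features(feature={
        "prices": tf.train.Feature(float_list=tf.train.FloatList(value=prices)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    })).SerializeToString()


FEATURE_SPEC = {
    "prices": tf.io.VarLenFeature(tf.float32),          # parsed as a SparseTensor
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse_batch(serialized):
    # Parse a whole batch of serialized protos at once.
    parsed = tf.io.parse_example(serialized, FEATURE_SPEC)
    prices = tf.sparse.to_dense(parsed["prices"])  # [batch, max_len], padded with 0.0
    return {"prices": prices}, parsed["label"]


def input_fn(serialized_examples, batch_size):
    ds = tf.data.Dataset.from_tensor_slices(serialized_examples)
    return ds.batch(batch_size).map(parse_batch)
```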






                answered Nov 21 '18 at 14:59









                SumNeuron
