How to split one input stream into multiple topics and guarantee simultaneous consumption












I want to create a simple sensor-data application with Apache Kafka. My question is simple and concerns the basic concepts of Apache Kafka. I'm a beginner with Apache Kafka.



Here is my requirement:



I receive sensor data as a byte array containing several values.
For example, the array consists of three entries (Temperature 1, Temperature 2 and Voltage). Here is an example with 4 arrays and their values. Each array arrives with a defined timestamp.



Array 1: [ 1, 2, 3 ]



Array 2: [ 4, 5, 6 ]



Array 3: [ 7, 8, 9 ]



Array 4: [ 10, 11, 12 ]



Now I want to read these arrays and produce messages to three topics:




  • topic-temp1


  • topic-temp2


  • topic-voltage



The producing order is:

  • Read array 1
      • produce message to topic-temp1 (value=1)
      • produce message to topic-temp2 (value=2)
      • produce message to topic-voltage (value=3)
  • Read array 2
      • produce message to topic-temp1 (value=4)
      • produce message to topic-temp2 (value=5)
      • produce message to topic-voltage (value=6)
  • Read array 3
      • produce message to topic-temp1 (value=7)
      • produce message to topic-temp2 (value=8)
      • produce message to topic-voltage (value=9)
  • ... Read array n ...



After that, I have three topics with different data inside:




  • topic-temp1: 1, 4, 7, 10


  • topic-temp2: 2, 5, 8, 11


  • topic-voltage: 3, 6, 9, 12
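
The fan-out described above can be sketched in plain Python, with in-memory lists standing in for the three Kafka topics. This is only an illustration of the data flow, not Kafka client code: the names `TOPICS` and `split_arrays` are hypothetical, and the timestamps 1 to 4 are assumed for the example. Each value keeps the timestamp of the array it came from.

```python
from collections import defaultdict

# Hypothetical names for illustration; lists stand in for Kafka topics.
TOPICS = ["topic-temp1", "topic-temp2", "topic-voltage"]


def split_arrays(arrays):
    """Fan each (timestamp, [temp1, temp2, voltage]) array out to three per-topic lists."""
    topics = defaultdict(list)
    for timestamp, values in arrays:
        for topic, value in zip(TOPICS, values):
            # Carry the source timestamp with every value so a consumer can realign later.
            topics[topic].append((timestamp, value))
    return dict(topics)


# Assumed timestamps 1..4, one per array.
arrays = [(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9]), (4, [10, 11, 12])]
topics = split_arrays(arrays)
print([value for _, value in topics["topic-temp1"]])  # [1, 4, 7, 10]
```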



Now to my question:
I want to create a software application that consumes these 3 topics. I want to display 3 graphs (temp1, temp2, voltage) in one diagram. The y-axis is the signal value and the x-axis is the timestamp.



How can I guarantee that the values I consume together belong to the same timestamp? Only then can I overlay the graphs. The values should arrive grouped like this:




  • 1,2,3


  • 4,5,6


  • 7,8,9


  • 10,11,12



Should I use the Kafka Streams API, with one input topic (the byte array) and three output topics? How can I ensure that these three values are produced together and consumed together?



Or should I use the plain consumer API and access the data by offset? The offsets should be the same for the entries (1,2,3), (4,5,6), ..., because I produced them in this order.



Thank you in advance!










apache-kafka kafka-consumer-api apache-kafka-streams kafka-producer-api










asked Nov 14 '18 at 11:50
StM
























1 Answer














I suggest you use a single sensor-readings topic whose payload contains the sensor name (or preferably a UUID), so you know which sensor sent the data, together with the data it generates, as one whole message.



Otherwise, joining data purely by timestamp doesn't seem very fail-safe.



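Your message key can be the UUID/name, and you can scale that to hundreds of partitions. With the default partitioner, records with the same key land in the same partition, so each sensor's readings stay in order.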



You could binary-encode the data you're sending, but I will use a JSON string for illustration:



{
  "sensor_id": "some unique name",
  "temperatures": [1, 2],
  "voltage": 3
}
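
As a rough sketch of building such a message, assuming UTF-8 JSON for the value and the sensor ID as the key: the helper name `encode_reading` is hypothetical, and the resulting bytes would be handed to whichever producer client you use.

```python
import json


def encode_reading(sensor_id, temperatures, voltage):
    """Return (key, value) as bytes for one whole sensor reading.

    Hypothetical helper: the key is the sensor ID, the value is the JSON payload.
    """
    key = sensor_id.encode("utf-8")
    payload = {
        "sensor_id": sensor_id,
        "temperatures": temperatures,
        "voltage": voltage,
    }
    value = json.dumps(payload).encode("utf-8")
    return key, value


key, value = encode_reading("some unique name", [1, 2], 3)
decoded = json.loads(value)  # round-trip to check the payload survives encoding
print(decoded["temperatures"], decoded["voltage"])
```

Because all three values travel in one message, a consumer never has to realign them.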


If you want three topics out of that, you can easily create three output topics from the combined topic using Kafka Streams or KSQL.
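
The per-record logic of such a split might look like the following. This is plain Python showing only the mapping from one combined reading to three keyed records; it is not Kafka Streams or KSQL code, and `split_reading` and the `sensor-a` ID are made up for the example.

```python
import json


def split_reading(key, value):
    """Map one combined JSON reading to three (topic, key, value) records.

    Hypothetical helper: the sensor key is kept on every output record so the
    three streams can still be joined on it later.
    """
    reading = json.loads(value)
    temp1, temp2 = reading["temperatures"]
    return [
        ("topic-temp1", key, temp1),
        ("topic-temp2", key, temp2),
        ("topic-voltage", key, reading["voltage"]),
    ]


value = json.dumps({"sensor_id": "sensor-a", "temperatures": [1, 2], "voltage": 3})
records = split_reading("sensor-a", value)
for record in records:
    print(record)
```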



Otherwise, go ahead and create individual topics, but add the ID/name to each message so you can join on it. Join using time windows on the order of seconds or minutes, rather than trying to adjust for lag: if one event is just microseconds off, an exact-timestamp match fails and you cannot join the messages.
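
To illustrate the idea of joining on the ID within a time window, here is a simplified sketch in plain Python. It uses fixed (tumbling) windows, which is an assumption made for brevity: two events that straddle a window boundary would not be joined here, whereas a real stream-processing join would typically handle that differently. `windowed_join` and the record layout are hypothetical.

```python
from collections import defaultdict


def windowed_join(records, window_ms, expected_topics):
    """Group (topic, sensor_id, timestamp_ms, value) records by sensor and time window.

    Returns only the windows in which every expected topic contributed a value.
    """
    buckets = defaultdict(dict)
    for topic, sensor_id, timestamp_ms, value in records:
        # Round the timestamp down to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        # If a topic reports twice in one window, the later record wins.
        buckets[(sensor_id, window_start)][topic] = value
    return {
        bucket: values
        for bucket, values in buckets.items()
        if set(values) == set(expected_topics)
    }


expected = ["topic-temp1", "topic-temp2", "topic-voltage"]
records = [
    ("topic-temp1", "sensor-a", 1000, 1),
    ("topic-temp2", "sensor-a", 1003, 2),    # a few ms later, same window
    ("topic-voltage", "sensor-a", 1007, 3),
    ("topic-temp1", "sensor-a", 2001, 4),
    ("topic-temp2", "sensor-a", 2002, 5),    # voltage missing, window incomplete
]
joined = windowed_join(records, window_ms=1000, expected_topics=expected)
print(joined)
```

Only the first window is returned, because the second one has no voltage value yet.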






answered Nov 14 '18 at 15:59
cricket_007





























