How to split one input stream into multiple topics and guarantee that they are consumed together
I want to create a simple sensor-data application with Apache Kafka. My question is about a basic concept of Apache Kafka, and I'm a beginner with it.
Here is my requirement:
I get sensor data as a byte array with different data inside.
For example, the array consists of three entries (Temperature 1, Temperature 2 and Voltage). Here is one example with 4 arrays and their values. Each array arrives with a defined timestamp.
Array 1: [ 1, 2, 3 ]
Array 2: [ 4, 5, 6 ]
Array 3: [ 7, 8, 9 ]
Array 4: [ 10, 11, 12 ]
Now I want to read these arrays and produce messages for three topics:
topic-temp1
topic-temp2
topic-voltage
The order of producing is:
- Read array 1
  - produce message to topic-temp1 (value=1)
  - produce message to topic-temp2 (value=2)
  - produce message to topic-voltage (value=3)
- Read array 2
  - produce message to topic-temp1 (value=4)
  - produce message to topic-temp2 (value=5)
  - produce message to topic-voltage (value=6)
- Read array 3
  - produce message to topic-temp1 (value=7)
  - produce message to topic-temp2 (value=8)
  - produce message to topic-voltage (value=9)
- ... Read array n ...
After that I have 3 Topics with different data inside:
topic-temp1: 1, 4, 7, 10
topic-temp2: 2, 5, 8, 11
topic-voltage: 3, 6, 9, 12
Now to my question:
I want to create a software application that consumes these 3 topics. I want to display 3 graphs (temp1, temp2, voltage) in one diagram. The y-axis is the signal value and the x-axis is the timestamp.
How can I guarantee that the values I consume belong to the same timestamp? Only then can I overlay the graphs.
1,2,3
4,5,6
7,8,9
10,11,12
Should I use the Kafka Streams API, with one input topic (byte array) and three output topics? How can I ensure that these three values are produced together and will be consumed together?
Or should I use the simple consumer API and access the data by offset? The offset should be the same for the entries (1,2,3), (4,5,6), ..., because I produced them in this order.
Thank you in advance!
apache-kafka kafka-consumer-api apache-kafka-streams kafka-producer-api
asked Nov 14 '18 at 11:50
StM
1 Answer
I suggest you use one topic, e.g. sensor-readings, whose payload carries the sensor name (or preferably a UUID), so you know which sensor sent the data, together with the data it generates, as one whole message.
Otherwise, joining data purely by timestamp doesn't seem that fail-proof.
Your message key can be the UUID/name, and you can scale that to hundreds of partitions.
You could binary-encode the data you're sending, but I will use a JSON string for illustration:
{
  "sensor_id": "some unique name",
  "temperatures": [1, 2],
  "voltage": 3
}
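As a rough illustration of that payload, the sketch below builds a key/value pair for one reading using only the Python standard library. The helper names `encode_reading` and `decode_reading` are hypothetical, no Kafka client is involved, and the random UUID simply stands in for a real sensor identifier.

```python
import json
import uuid


def encode_reading(sensor_id, temperatures, voltage):
    """Return (key, value) bytes for one whole sensor reading.

    The key is the sensor id, so all readings of one sensor would hash to
    the same partition if these bytes were handed to a Kafka producer.
    """
    payload = {
        "sensor_id": sensor_id,
        "temperatures": list(temperatures),
        "voltage": voltage,
    }
    return sensor_id.encode("utf-8"), json.dumps(payload).encode("utf-8")


def decode_reading(value):
    """Parse the JSON value bytes back into a dict."""
    return json.loads(value.decode("utf-8"))


# Assumption: a random UUID stands in for the real sensor identifier.
sensor_id = str(uuid.uuid4())
key, value = encode_reading(sensor_id, [1, 2], 3)
reading = decode_reading(value)
```

Because all three values travel in one message, a consumer never has to re-align them.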
If you want three topics out of that, you can very easily create three output topics using Kafka Streams or KSQL.
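The fan-out itself is simple. The following is only a plain-Python sketch of the logic, with in-memory lists standing in for the output topics; it is not Kafka Streams or KSQL code. The `fan_out` function, the `EXTRACTORS` mapping and the sensor id "s1" are made up for illustration, and the topic names are the ones from the question.

```python
import json

# Hypothetical mapping from output topic to the function extracting its value.
EXTRACTORS = {
    "topic-temp1": lambda r: r["temperatures"][0],
    "topic-temp2": lambda r: r["temperatures"][1],
    "topic-voltage": lambda r: r["voltage"],
}


def fan_out(raw_messages):
    """Split whole readings into per-topic (sensor_id, value) records."""
    outputs = {topic: [] for topic in EXTRACTORS}
    for raw in raw_messages:
        reading = json.loads(raw)
        for topic, extract in EXTRACTORS.items():
            # Keep the sensor id as the key so the records can be joined later.
            outputs[topic].append((reading["sensor_id"], extract(reading)))
    return outputs


# Assumed sample input: two readings from one sensor called "s1".
messages = [
    json.dumps({"sensor_id": "s1", "temperatures": [1, 2], "voltage": 3}),
    json.dumps({"sensor_id": "s1", "temperatures": [4, 5], "voltage": 6}),
]
outputs = fan_out(messages)
```

Each output record keeps the sensor id as its key, which is what makes a later join possible.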
Otherwise, go ahead and create individual topics, but add the ID/name to each message so you can join on it, using time windows on the order of seconds or minutes. Don't try to adjust for lag at a finer scale, where one event being just microseconds off means the messages cannot be joined.
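A minimal sketch of such a windowed join, again in plain Python over in-memory records rather than a real Kafka consumer. The record layout `(field, sensor_id, timestamp_ms, value)`, the `windowed_join` helper and the one-second window are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical field names, one per source topic.
FIELDS = ("temp1", "temp2", "voltage")


def windowed_join(records, window_ms=1000):
    """Combine records of one sensor that fall into the same time window.

    Each record is (field, sensor_id, timestamp_ms, value). Windows that do
    not contain all three fields are dropped.
    """
    buckets = defaultdict(dict)
    for field, sensor_id, ts, value in records:
        buckets[(sensor_id, ts // window_ms)][field] = value
    rows = []
    for (sensor_id, window), values in sorted(buckets.items()):
        if all(f in values for f in FIELDS):
            rows.append(
                (sensor_id, window * window_ms, *(values[f] for f in FIELDS))
            )
    return rows


records = [
    ("temp1", "s1", 1000, 1),
    ("temp2", "s1", 1000, 2),
    ("voltage", "s1", 1003, 3),  # a few milliseconds late, same window
    ("temp1", "s1", 2001, 4),
    ("voltage", "s1", 2002, 6),
    ("temp2", "s1", 2400, 5),    # out of order, still the same window
    ("temp1", "s1", 3000, 7),    # incomplete window, dropped
]
rows = windowed_join(records)
```

Each resulting row is one point on the x-axis with all three values, so the graphs can be overlaid. Fixed windows like these still split two events that straddle a window boundary, which is one more reason to prefer a single message per reading.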
answered Nov 14 '18 at 15:59
cricket_007