Mapping out a Kafka+Zookeeper cluster

Background

I inherited a Kafka/Zookeeper installation. I have a passing knowledge of both: I know the general architecture, how clients work, what topics are, and I have been involved in programming Java clients.

But the installation is somewhat dubious. There are three instances each of Kafka and Zookeeper, each in its own Docker container. Supposedly they should work, but all processes spout an immense amount of log output with loads of diverse warnings and errors. I have the impression that some of these are quite normal (or are being self-healed all the time), and I am having a very hard time figuring out whether everything is set up correctly and works as intended.

Some of these are, according to Google, related to unclean shutdowns of the brokers, corrupted individual topics and such. As this is a test environment, I can easily delete such files.

I know some commands that help me check topics (basic stuff, like listing them or displaying their individual configuration).

However...

Question

Is there an online resource or documentation that can be used as a systematic walkthrough to check whether everything is basically set up OK, for example to clear up these questions:

  • Do the three Zookeepers and the three Kafka instances correctly talk to each other for high-availability purposes? Do they have a correct "leader" etc.?
  • Are the servers generally "healthy", i.e., easily able to accept connections etc.?
  • How are the topics working (what's in there, how many messages, etc.)?

I am aware that one may very quickly dismiss this question as too generic; I am not asking you to solve my problems. I am looking for a resource to systematically walk through such an installation. It may or may not cover the examples I have given, but it should definitely give a systematic way to find out if things are fundamentally wrong.

apache-kafka apache-zookeeper

asked Nov 8 at 17:51
AnoE

2 Answers

This Packt tutorial by Stéphane Maarek is a wonderful resource for setting up Kafka in cluster mode. However, he did that in the AWS cloud on an Ubuntu VM.

I followed the same steps and installed it in Vagrant VMs running CentOS. You can find the code here.

The VM includes Yahoo's Kafka Manager to monitor Kafka's internal details: the list of available brokers, their health, partitions, leaders, etc.

Kafka Manager can help you with high-level monitoring.
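If you want a quick look at the same partition and leader information without a UI, one option is to parse the text printed by the stock `kafka-topics.sh --describe` tool. The following is only a minimal sketch: it assumes the usual `Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...` line layout, the helper name `find_unhealthy_partitions` is made up for this example, and the sample text is invented, not taken from a real cluster.

```python
import re

# One partition line as commonly printed by `kafka-topics.sh --describe`
# (assumed layout; any extra trailing fields are ignored).
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\w+)\s+Replicas:\s*(?P<replicas>[\d,]*)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(csv):
    """Turn '1,2,3' into [1, 2, 3]; an empty string gives []."""
    return [int(x) for x in csv.split(",") if x]


def find_unhealthy_partitions(describe_output):
    """Return partitions that have no leader or fewer in-sync replicas than replicas."""
    problems = []
    for line in describe_output.splitlines():
        m = PARTITION_LINE.search(line)
        if not m:
            continue  # topic summary lines and anything else are skipped
        replicas, isr = _ids(m["replicas"]), _ids(m["isr"])
        no_leader = m["leader"] in ("-1", "none")
        under_replicated = len(set(isr)) < len(set(replicas))
        if no_leader or under_replicated:
            problems.append((m["topic"], int(m["partition"]), m["leader"], replicas, isr))
    return problems


# Invented sample output for illustration only.
SAMPLE = (
    "Topic: orders\tPartitionCount: 3\tReplicationFactor: 3\tConfigs: \n"
    "\tTopic: orders\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2,3\n"
    "\tTopic: orders\tPartition: 1\tLeader: 2\tReplicas: 2,3,1\tIsr: 2\n"
    "\tTopic: orders\tPartition: 2\tLeader: -1\tReplicas: 3,1,2\tIsr: \n"
)

for topic, partition, leader, replicas, isr in find_unhealthy_partitions(SAMPLE):
    print(f"{topic}-{partition}: leader={leader} replicas={replicas} isr={isr}")
```

On a healthy three-broker cluster the list should normally come back empty; a partition whose ISR is shorter than its replica list is worth a closer look.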

Comments are welcome.

edited Nov 9 at 18:01
answered Nov 8 at 21:13
Rajkumar Natarajan

• The question doesn't seem to be about setting up, but rather monitoring
  – cricket_007
  Nov 9 at 15:34

• Yeah, I forgot to mention Kafka Manager.
  – Rajkumar Natarajan
  Nov 9 at 18:02

Rather than looking solely at logs, you might want to familiarize yourself with JMX metrics and how you can gather them across the cluster.
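As a rough idea of what to look for once those metrics are collected: the broker MBeans `kafka.controller:type=KafkaController,name=ActiveControllerCount`, `kafka.controller:type=KafkaController,name=OfflinePartitionsCount` and `kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions` are a common starting point. The sketch below assumes the values have already been scraped into a plain dict per broker; how you collect them, the broker names, and the `check_cluster` helper are all assumptions made for illustration and not part of Kafka.

```python
# MBean names exposed by Kafka brokers over JMX.
ACTIVE_CONTROLLER = "kafka.controller:type=KafkaController,name=ActiveControllerCount"
OFFLINE_PARTITIONS = "kafka.controller:type=KafkaController,name=OfflinePartitionsCount"
UNDER_REPLICATED = "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"


def check_cluster(snapshot):
    """Return human-readable findings; an empty list means no obvious red flags.

    `snapshot` maps a broker name to a dict of {mbean_name: value}.
    A metric that is missing is treated as 0 here, which can hide
    collection problems, so verify that the scrape itself works first.
    """
    findings = []
    controllers = sum(m.get(ACTIVE_CONTROLLER, 0) for m in snapshot.values())
    if controllers != 1:
        findings.append(f"expected exactly 1 active controller, found {controllers}")
    for broker, metrics in sorted(snapshot.items()):
        for name, label in ((OFFLINE_PARTITIONS, "offline"),
                            (UNDER_REPLICATED, "under-replicated")):
            value = metrics.get(name, 0)
            if value > 0:
                findings.append(f"{broker}: {value} {label} partition(s)")
    return findings


# Hypothetical values for a three-broker cluster.
snapshot = {
    "kafka-1": {ACTIVE_CONTROLLER: 1, OFFLINE_PARTITIONS: 0, UNDER_REPLICATED: 0},
    "kafka-2": {ACTIVE_CONTROLLER: 0, OFFLINE_PARTITIONS: 0, UNDER_REPLICATED: 4},
    "kafka-3": {ACTIVE_CONTROLLER: 0, OFFLINE_PARTITIONS: 0, UNDER_REPLICATED: 0},
}
for finding in check_cluster(snapshot):
    print(finding)
```

Exactly one active controller across the cluster and zero offline or under-replicated partitions is what a settled cluster would typically report; brief non-zero values during a broker restart are usually not alarming.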

If you want to actually collect and analyze logs, you'll likely need a separate tool such as Elasticsearch.
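Before setting up a full log pipeline, a small tally of warnings and errors per logger can already show which messages are recurring noise and which are rare. This is only a sketch: it assumes Kafka's stock log4j layout `[%d] %p %m (%c)%n`, the `tally` helper is a name made up here, and the sample lines are invented.

```python
import re
from collections import Counter

# Matches "[timestamp] LEVEL message (logger)"; the trailing "(logger)" is optional.
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<msg>.*?)"
    r"(?:\s+\((?P<logger>[\w.$]+)\))?$"
)


def tally(lines, levels=("WARN", "ERROR", "FATAL")):
    """Count log lines per (level, logger); stack-trace lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m["level"] in levels:
            counts[(m["level"], m["logger"] or "?")] += 1
    return counts


# Invented sample lines for illustration only.
sample = [
    "[2018-11-08 17:51:00,123] WARN [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] "
    "Error in response for fetch request (kafka.server.ReplicaFetcherThread)",
    "[2018-11-08 17:51:01,456] INFO [GroupCoordinator 1]: Stabilized group demo "
    "(kafka.coordinator.group.GroupCoordinator)",
    "[2018-11-08 17:51:02,789] WARN [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] "
    "Error in response for fetch request (kafka.server.ReplicaFetcherThread)",
    "[2018-11-08 17:51:03,000] ERROR Error while reading checkpoint file "
    "(kafka.server.LogDirFailureChannel)",
]

for (level, logger), count in tally(sample).most_common():
    print(f"{count:5d}  {level:5s}  {logger}")
```

Feeding it the output of `docker logs` for each container (for example via `sys.stdin`) would be one way to use it; if the containers use a different log pattern, the regular expression needs adjusting.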

You won't see "how many messages" in a topic, and you'll need even more monitoring to know whether a port is actually open, the Kafka process is running, the disks are filling up, etc.
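For the "is the port open" part, and for the Zookeeper leader question, a small probe can go a long way before real monitoring is in place. The sketch below checks TCP reachability and asks a Zookeeper node for its role via the `srvr` four-letter command, whose reply contains a `Mode:` line. The function names are made up for this example, any host names you pass in are your own, and on newer Zookeeper versions four-letter commands other than `srvr` may need to be whitelisted.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def zk_four_letter(host, port, command="srvr", timeout=2.0):
    """Send a Zookeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_zk_mode(srvr_reply):
    """Extract the value of the 'Mode:' line (leader, follower, standalone, ...)."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def ensemble_looks_sane(modes):
    """Expect exactly one leader and every other node to be a follower."""
    return modes.count("leader") == 1 and modes.count("follower") == len(modes) - 1
```

Calling `parse_zk_mode(zk_four_letter(host, 2181))` for each of the three Zookeeper containers and passing the three results to `ensemble_looks_sane` gives a quick yes/no answer; `is_port_open(host, 9092)` does the same kind of basic check for a broker, assuming the default ports are in use.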

My point here is that Kafka needs to be fed and watered. If you plan to productionize it, you can't just set up a small cluster and forget about it. Even if you think it's set up correctly at the beginning, increasing the load on it will eventually cause it to fall into a bad state.

For a limited trial in your dev environment, Confluent Control Center can give you a full look at your cluster health.

To solve the "what's in there" problem, I suggest you set up a Schema Registry and convince the Kafka producers to use it.

edited Nov 10 at 5:17
answered Nov 9 at 15:41
cricket_007