Different results when using more than one thread

The following code performs a transpose matrix-vector multiplication where the matrix is sparse and stored in CSR format. Depending on the number of threads, the result is different. I think the reason is the concurrent memory access and addition.
Is there a way to use multithreading but keep the result the same as for single threading?

#pragma omp parallel for num_threads(m_numthreads)
for (int i = 0; i < matrix.get_rowptr()->size() - 1; ++i)
{
    for (int j = matrix.get_rowptr()->operator[](i); j < matrix.get_rowptr()->operator[](i + 1); ++j)
    {
        result[matrix.get_columnindex()->operator[](j)] += matrix.get_value()->operator[](j) * vector1[i];
    }
}

Tags: c++, vector, openmp

asked Nov 21 '18 at 20:10 by vydesaster · edited Nov 22 '18 at 18:35 by Toby Speight

  • Nobody can tell, without knowing the types of result and matrix at least, and whether there's any aliasing of the two.

    – Toby Speight
    Nov 22 '18 at 9:19

  • result and vector1 are stl double vectors. Matrix is a class which holds the three vectors that are needed to describe a sparse matrix in CSR format.

    – vydesaster
    Nov 22 '18 at 18:12

  • A Minimal, Complete, and Verifiable example would make the question much clearer - could you edit so that others can reproduce and diagnose the issue you're seeing?

    – Toby Speight
    Nov 22 '18 at 18:18

  • BTW, if you're comparing int against std::size_t like that, it suggests you have nowhere near enough compiler warnings enabled.

    – Toby Speight
    Nov 22 '18 at 18:22

2 Answers

The increment operation should indeed be made atomic to make sure two threads are not updating the value at the same time, which would be a race condition. This will make the code correct, but since atomic increment is slower than standard increment, it may make the parallel code slower than the serial one overall.

Performance may also be affected by false-sharing: if the vector size is not much greater than the number of threads, it will often happen that two threads attempt to increment values belonging to the same cache line and much time is spent synchronizing the cache between CPUs.



#pragma omp parallel for num_threads(m_numthreads)
for (int i = 0; i < matrix.get_rowptr()->size() - 1; ++i)
    for (int j = matrix.get_rowptr()->operator[](i); j < matrix.get_rowptr()->operator[](i + 1); ++j)
    {
        #pragma omp atomic
        result[matrix.get_columnindex()->operator[](j)] += matrix.get_value()->operator[](j) * vector1[i];
    }


That being said, for a given element of the result vector, the order in which the various products are added to form that element will not be the same in the serial and parallel cases. Rounding errors therefore accumulate differently: one should not expect the serial and parallel results to be bit-for-bit identical, only that their difference is small relative to the precision of the float or double format.
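
If reproducibility for a fixed thread count is the goal, an alternative to per-element atomics is to give every thread a private copy of the output and let OpenMP sum the copies when the loop finishes. The sketch below is not part of the original code; it assumes a compiler with OpenMP 4.5 array-section reductions and uses invented names (row_ptr, col_idx, values) for the CSR arrays. The per-thread partial sums are still combined in a different order than the serial loop, so the result matches the serial one only up to rounding, but it no longer depends on how the threads interleave.

#include <cstddef>
#include <vector>

// Hypothetical free function: result += A^T * vector1 with A in CSR form.
// row_ptr, col_idx and values are assumed names, not the question's API.
void transpose_spmv_reduced(const std::vector<std::size_t>& row_ptr,
                            const std::vector<std::size_t>& col_idx,
                            const std::vector<double>& values,
                            const std::vector<double>& vector1,
                            std::vector<double>& result)
{
    double* r = result.data();
    const std::size_t n = result.size();
    const std::size_t nrows = row_ptr.size() - 1;

    // reduction(+ : r[:n]) gives each thread a zero-initialised private
    // copy of r[0..n); the copies are added into r at the end of the loop,
    // so no atomics are needed inside it.
    #pragma omp parallel for reduction(+ : r[:n])
    for (std::size_t i = 0; i < nrows; ++i)
    {
        for (std::size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
        {
            r[col_idx[j]] += values[j] * vector1[i];
        }
    }
}

Each private copy costs n doubles per thread, so this trades memory for speed and pays off when the output vector is small compared to the work done per thread.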






  • You should probably point out that there will be very little (possibly negative) speed increase from parallelisation when the entire body of the loop is in an atomic block.

    – Toby Speight
    Nov 22 '18 at 18:21

  • It's probably more efficient to make the elements of result be some suitable std::atomic<> type instead (one possible shape of that is sketched after these comments). If not, at least bring the computation of lhs and rhs ahead of the atomic section, so only the += is affected.

    – Toby Speight
    Nov 22 '18 at 18:32

  • I have already stated that since atomic increment is slower than standard increment, it may make the parallel code slower than the serial one. Then I do not know for sure in which direction the balance between one fast op at a time and several slow ones in parallel will point. This depends on too many factors and there may still be a small performance improvement.

    – Brice
    Nov 23 '18 at 8:42

  • I have tried the atomic pragma and it worked fine. But as stated, the performance decreased. I am totally fine with that; at least I am getting reproducible results.

    – vydesaster
    Nov 23 '18 at 19:20
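
For concreteness, here is one shape the std::atomic<> suggestion from the comments could take. This is only a sketch with invented names, not code from the question or the answer: before C++20, std::atomic<double> has no fetch_add, so a compare-exchange loop stands in for the atomic +=.

#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical helper: atomically add 'term' to one output element.
// Pre-C++20 there is no fetch_add for floating-point atomics, so loop on
// compare_exchange_weak until the update goes through.
inline void atomic_add(std::atomic<double>& target, double term)
{
    double old = target.load();
    while (!target.compare_exchange_weak(old, old + term))
    {
        // 'old' now holds the value another thread stored; retry with it.
    }
}

int main()
{
    const std::size_t n = 4;

    // std::atomic is not copyable, and default construction is only
    // guaranteed to initialise the value from C++20 on, so store zeroes
    // explicitly.
    std::vector<std::atomic<double>> result(n);
    for (auto& x : result)
        x.store(0.0);

    // Inside the parallel loop one would then write something like:
    //     atomic_add(result[cols[j]], values[j] * vector1[i]);
}

As with #pragma omp atomic, this fixes the race but not the summation order, so results can still differ from the serial run by rounding.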
Without a reproducible example, I've had to guess at the context of this code:



#include <vector>

class SparseMatrix
{
    std::vector<double> values = { 5, 8, 3, 6, };
    std::vector<std::size_t> rows = { 0, 0, 2, 3, 4, };
    std::vector<std::size_t> cols = { 1, 2, 1, 1, };

public:
    const std::vector<std::size_t> *get_rowptr() const { return &rows; }
    const std::vector<std::size_t> *get_columnindex() const { return &cols; }
    const std::vector<double> *get_value() const { return &values; }
};

#include <array>
#include <iostream>
int main()
{
    SparseMatrix matrix;

    std::array<double, 4> result{};
    std::array<double, 4> vector1{ 1, 2, 3, 4 };

    #pragma omp parallel for
    for (int i = 0; i < matrix.get_rowptr()->size() - 1; ++i)
    {
        for (int j = matrix.get_rowptr()->operator[](i); j < matrix.get_rowptr()->operator[](i + 1); ++j)
        {
            result[matrix.get_columnindex()->operator[](j)] += matrix.get_value()->operator[](j) * vector1[i];
        }
    }

    for (auto const& i: result)
        std::cout << i << " ";
    std::cout << '\n';
}


With some suitable variables we can simplify the code so we see what's going on:



auto const& rows = *matrix.get_rowptr();
auto const& cols = *matrix.get_columnindex();
auto const& values = *matrix.get_value();

#pragma omp parallel for
for (std::size_t i = 0; i < rows.size() - 1; ++i)
{
    for (std::size_t j = rows[i]; j < rows[i+1]; ++j)
    {
        result[cols[j]] += values[j] * vector1[i];
    }
}


Now we can see in the body of the loop that we're assigning to a result entry that other threads may also be writing to. We need to serialize access to result[cols[j]] so that only one thread at a time executes the +=. We can do that by marking the operation as indivisible, using the OpenMP atomic directive:



#pragma omp parallel for
for (std::size_t i = 0; i < rows.size() - 1; ++i)
{
    for (std::size_t j = rows[i]; j < rows[i+1]; ++j)
    {
        auto& dest = result[cols[j]];
        auto const term = values[j] * vector1[i];
        #pragma omp atomic
        dest += term;
    }
}
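
The atomic makes the code race-free, but the order in which products reach each result entry still varies with the thread count, and the atomic sits on the hot path. A different approach, not shown in this answer, is to build the transpose once up front; each thread then owns a disjoint range of the output, needs no synchronization, and adds its products in the same order as the serial loop, so the result does not depend on the number of threads. A sketch under assumed names (CsrMatrix, transpose and transpose_spmv are inventions for illustration, not the question's classes):

#include <cstddef>
#include <vector>

struct CsrMatrix {
    std::vector<std::size_t> row_ptr;   // size nrows + 1
    std::vector<std::size_t> col_idx;   // size nnz
    std::vector<double>      values;    // size nnz
};

// Serial one-off: CSR of A^T (i.e. CSC of A), entries kept in row order of A.
CsrMatrix transpose(const CsrMatrix& a, std::size_t ncols)
{
    CsrMatrix t;
    const std::size_t nnz = a.values.size();
    t.row_ptr.assign(ncols + 1, 0);
    t.col_idx.resize(nnz);
    t.values.resize(nnz);

    // Count the entries in each column of A, then prefix-sum into offsets.
    for (std::size_t k = 0; k < nnz; ++k)
        ++t.row_ptr[a.col_idx[k] + 1];
    for (std::size_t c = 0; c < ncols; ++c)
        t.row_ptr[c + 1] += t.row_ptr[c];

    // Scatter the values into the transposed layout.
    std::vector<std::size_t> next(t.row_ptr.begin(), t.row_ptr.end() - 1);
    for (std::size_t i = 0; i + 1 < a.row_ptr.size(); ++i)
        for (std::size_t j = a.row_ptr[i]; j < a.row_ptr[i + 1]; ++j)
        {
            const std::size_t pos = next[a.col_idx[j]]++;
            t.col_idx[pos] = i;          // the row in A becomes the column in A^T
            t.values[pos]  = a.values[j];
        }
    return t;
}

// y = A^T * x; y must have ncols elements and is overwritten.
void transpose_spmv(const CsrMatrix& a, std::size_t ncols,
                    const std::vector<double>& x, std::vector<double>& y)
{
    const CsrMatrix t = transpose(a, ncols);

    // Every thread writes a disjoint range of y, so no atomics are needed
    // and the result is the same for any thread count.
    #pragma omp parallel for
    for (std::size_t r = 0; r < ncols; ++r)
    {
        double sum = 0.0;
        for (std::size_t j = t.row_ptr[r]; j < t.row_ptr[r + 1]; ++j)
            sum += t.values[j] * x[t.col_idx[j]];
        y[r] = sum;
    }
}

Building the transpose is an O(nnz) serial pass; if the same matrix is multiplied many times, that cost is paid once and amortises quickly.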





  • Thank you very much, Toby. You nailed it. Sorry I was not able to post the code in the last days.

    – vydesaster
    Dec 4 '18 at 20:13











