Is there a standard conforming way to write a portable ls utility in C++?

Let's consider the following code, which lists the contents of the directory given as the first argument to the program:

#include <filesystem>
#include <iostream>

int main(int argc, char **argv)
{
    if(argc != 2) {
        std::cerr << "Please specify a directory.\n";
        return 1; // argv[1] must not be used if it was not supplied
    }

    for(const auto& p: std::filesystem::directory_iterator(argv[1]))
        std::cout << p.path() << '\n';
}

At first sight this seems very lean, portable, and conforming to the C++ standard (please ignore that it does not catch the exception thrown if the directory does not exist).
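
Setting encodings aside for a moment, the exception mentioned in the parenthesis can be avoided with the std::error_code overloads of the filesystem API. Below is a minimal sketch, assuming C++17. The function name list_directory is hypothetical, and it collects names into a vector instead of printing them so the result can be checked.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Collects the file names in `dir` using the non-throwing overloads.
// Returns false (leaving `names` possibly partially filled) on any error.
bool list_directory(const std::filesystem::path& dir,
                    std::vector<std::string>& names) {
    std::error_code ec;
    std::filesystem::directory_iterator it(dir, ec); // end iterator on error
    const std::filesystem::directory_iterator end{};
    for (; !ec && it != end; it.increment(ec)) {
        // string() converts from the native format; where that format is
        // wide, this conversion is exactly where encoding problems surface.
        names.push_back(it->path().filename().string());
    }
    return !ec;
}
```

The names still pass through path::string(), so the encoding concerns discussed next apply unchanged.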

However, there seem to be a few pitfalls. In particular, the C++ standard does not seem to mandate that the encoding of argv[1] matches the one accepted by the std::filesystem::path constructors, nor that the encoding returned by std::filesystem::path::string() matches the one expected by std::cout.

Quite the opposite: the standard seems to introduce a new term, "native encoding", which may differ from the execution character set encoding and is defined as:

The native encoding of a narrow character string is the operating system dependent current encoding for pathnames ([fs.class.path]).

From my reading of the standard, no conversion between encodings takes place if std::filesystem::path::value_type matches the character type of argv[1] (which is the case on any POSIX system).
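
This is easy to observe on a POSIX system: the path keeps whatever bytes it was given, whether or not they form valid text in any particular encoding. A minimal sketch follows. The helper stored_unchanged is hypothetical, and the static_assert documents the assumption of a char-based native format.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

// This sketch assumes a POSIX-style platform where paths are stored as char.
static_assert(std::is_same_v<std::filesystem::path::value_type, char>,
              "sketch assumes a narrow (char) native path format");

// Returns true if the path keeps exactly the bytes it was constructed from,
// i.e. no encoding conversion took place on construction.
bool stored_unchanged(const std::string& bytes) {
    const std::filesystem::path p(bytes);
    return p.native() == bytes;
}
```

Both a UTF-8 sequence and a lone ISO 8859-1 byte (invalid as UTF-8) are stored verbatim, so the library never "knows" which encoding argv[1] was in.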

This seems to allow, for example, a conforming implementation in which the execution character set encoding (and hence the encoding of argv[1] and the one expected by std::cout) is EBCDIC, while the encoding of strings accepted and returned by the filesystem library is ISO 8859-1, with no conversion between the two. That would make the filesystem library essentially useless. Worse yet, there is no way to find out whether the two encodings are the same.
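
The closest standard C++ gets to telling you which encoding is in effect is the locale name reported by std::setlocale. A minimal sketch (current_ctype_locale is a hypothetical helper) is below. Nothing in the standard ties that name, or its format, to the native pathname encoding used by std::filesystem.

```cpp
#include <clocale>
#include <string>

// Returns the name of the C library's current LC_CTYPE locale.
// Passing nullptr queries the locale without changing it.
std::string current_ctype_locale() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? std::string(name) : std::string();
}
```

At program startup the "C" locale is in effect, so this returns "C" until the program calls std::setlocale itself. Even after switching to the user's locale, the returned string is an implementation-defined name, not a statement about pathname encoding.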

This can even become dangerous if you write utilities that delete files: the file name provided in argv[1] might match a completely different file when it is interpreted in the native encoding of the filesystem library.

Note that I'm not concerned about filesystems using different encodings than the programs that access them. My concern is that the standard does not seem to mandate any conversion between those encodings.

The u8path() and u8string() functions do not help here either, because the standard provides no way to convert between UTF-8 and the execution character set encoding (used by argv[1] and std::cout).
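
What the standard does provide is the hop from a path to UTF-8. A minimal sketch of that hop follows (utf8_name is a hypothetical helper). The missing piece, converting that UTF-8 to the execution character set, has no standard facility and is deliberately absent.

```cpp
#include <filesystem>
#include <string>

// Returns the UTF-8 form of p as a std::string. path::u8string() returns
// std::string in C++17 and std::u8string in C++20; copying the code units
// keeps this compiling under both.
std::string utf8_name(const std::filesystem::path& p) {
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
// No standard function takes the next step: turning this UTF-8 string
// into the execution character set that std::cout expects.
```

On a POSIX system whose native encoding is already UTF-8 this is effectively the identity, which is why the gap is easy to miss there.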

Is there any portable, encoding-agnostic, and standard-compliant way to do this?

asked Nov 15 '18 at 16:38

Contter

312

Speaking of standards, ls will show you the current working directory if given no arguments; it won't give you flak for not specifying one. Also, if you're working with EBCDIC and C++ together, I'm impressed.
– tadman
Nov 15 '18 at 16:40

Yes, there is no portable way to write an ls application in C++. Moreover, my experience tells me that there is no portable way to write any complex application in C++: you will always have to rely on things that are not specified by the C++ standard, either directly or hidden inside third-party libraries like Boost. In my opinion, this sets C++ apart from languages like Java.
– SergeyA
Nov 15 '18 at 16:41

@SergeyA Yeah, every operating system is free to make up their own rules, and they often do for reasons we'll never be able to properly explain.
– tadman
Nov 15 '18 at 16:43

The root problem is that WG21 doesn't want to rely on POSIX here. Without that, the whole notion of a file name becomes non-portable. Now this can be reasonable; on tiny embedded systems files might be identified merely by a number.
– MSalters
Nov 15 '18 at 17:09

@MSalters I understand the reason but these systems could still exist if the standard provided a way to reliably set and get that number in the execution character set encoding.
– Contter
Nov 15 '18 at 17:20

c++ character-encoding filesystems c++17 c++-standard-library

1 Answer

No, and this is not just theoretical.

On Windows systems, paths are UTF-16 and path::value_type is wchar_t, not the char you get from char** argv. This isn't a problem by itself, since a path can be constructed from a char*. However, not every Windows file name can be represented in the narrow (code page) encoding that a char* string uses. Hence the program cannot list the contents of directories whose names cannot be expressed as a char*.
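
A program can at least detect at compile time which situation it is in, and check whether a given name survives being squeezed through a narrow std::string. Below is a sketch under stated assumptions. Both helper names are hypothetical, and what string() does with an unrepresentable name (throwing, or substituting characters) is implementation-specific, so the catch is deliberately broad.

```cpp
#include <exception>
#include <filesystem>
#include <string>
#include <type_traits>

// True where path::value_type is wchar_t (e.g. Windows, where native paths
// are UTF-16), false where it is char (e.g. POSIX systems).
constexpr bool native_format_is_wide() {
    return std::is_same_v<std::filesystem::path::value_type, wchar_t>;
}

// Checks whether p survives a trip through a narrow std::string, which is
// what code passing names around as char* implicitly relies on.
bool survives_narrow_round_trip(const std::filesystem::path& p) {
    try {
        return std::filesystem::path(p.string()) == p;
    } catch (const std::exception&) {
        return false; // conversion to the narrow encoding failed outright
    }
}
```

On a POSIX system native_format_is_wide() is false and every path round-trips. The interesting case is a Windows file name containing characters outside the active code page.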

Now you'd think that Linux would be better. That's actually not entirely the case: the bytes you get for a file name can depend on whether you typed it on a keyboard or used TAB completion!
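
One plausible mechanism, offered as an assumption since the answer does not name it, is Unicode normalization. The same visible name can be stored in precomposed (NFC) or decomposed (NFD) form, and std::filesystem::path does not normalize. Here is a small illustration with two hypothetical constants.

```cpp
#include <filesystem>

// The visible name "café" in two Unicode normalization forms, UTF-8 encoded:
// NFC uses the precomposed U+00E9 (bytes C3 A9); NFD uses "e" followed by
// the combining acute accent U+0301 (bytes CC 81).
const std::filesystem::path nfc_name("caf\xC3\xA9");
const std::filesystem::path nfd_name("cafe\xCC\x81");

// path comparison works on the stored code units with no Unicode
// normalization, so these compare unequal although they render the same.
```

A name typed through an input method and a name completed from the directory listing can therefore refer to different files, or fail to match at all.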

answered Nov 15 '18 at 17:20

MSalters

133k 8 115 267

Point taken, but Windows and Linux are non-conforming in this respect anyway. ;-)
– Contter
Nov 15 '18 at 17:56
