Is there a standard conforming way to write a portable ls utility in C++?
Let's consider the following code listing the directory contents of the path given as the first argument to the program:
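    #include <filesystem>
    #include <iostream>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            std::cerr << "Please specify a directory.\n";
            return 1;  // do not fall through and read argv[1]
        }
        // Print every entry of the directory named by the first argument.
        for (const auto& p : std::filesystem::directory_iterator(argv[1]))
            std::cout << p << '\n';
    }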
At first sight this seems to be very lean, portable and conforming to the C++ standard (please ignore that it does not catch exceptions if the directory does not exist).
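As an aside on the exception caveat: a minimal sketch of a non-throwing listing, using the std::error_code overloads that std::filesystem provides. The helper name list_directory is made up for illustration, and the sketch does nothing about the encoding question discussed below.

```cpp
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collects the entries of `dir`. Failures are reported
// through `ec` instead of being thrown as filesystem_error.
std::vector<fs::path> list_directory(const fs::path& dir, std::error_code& ec)
{
    std::vector<fs::path> entries;
    fs::directory_iterator it(dir, ec);   // sets ec if dir cannot be opened
    const fs::directory_iterator end;     // default-constructed == end iterator
    while (!ec && it != end) {
        entries.push_back(it->path());
        it.increment(ec);                 // non-throwing counterpart of ++it
    }
    return entries;
}
```

A caller would check ec after the call and print an error message instead of the listing when it is set.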
However, there seem to be a few pitfalls. In particular, the C++ standard does not seem to mandate that the encoding of argv[1] matches that accepted by the std::filesystem::path constructors, nor does it seem to mandate that the encoding returned by std::filesystem::path::string() matches that accepted by std::cout.
Quite the opposite, the standard seems to introduce the new term "native encoding" which may be different from the execution character set encoding and is defined as:
The native encoding of a narrow character string is the operating
system dependent current encoding for pathnames ([fs.class.path]).
From my reading of the standard, no conversion between encodings takes place if std::filesystem::path::value_type matches the char type of argv[1] (which is true on any POSIX system).
This seems to allow, for example, a conforming implementation in which the execution character set encoding (and hence the encoding of argv[1] and that accepted by std::cout) is EBCDIC, but the encoding of strings accepted and provided by the filesystem library is ISO 8859-1, with no conversion performed between the two, making the filesystem library essentially useless. Worse yet, there is no way to figure out whether the two encodings are the same.
This can even get dangerous if you start to write utilities which delete files, and the file to be deleted, as given by argv[1], matches a completely different file when it is interpreted in the native encoding of the filesystem library.
Note that I'm not concerned about filesystems using different encodings than those used by programs. My concern is that the standard does not seem to mandate any conversion of those encodings.
The u8path() and u8string() functions are of no use here either, because the standard also provides no way to convert between UTF-8 and the execution character set encoding (used by argv[1] and std::cout).
Is there any portable, encoding-agnostic and standard-compliant way to do this?
c++ character-encoding filesystems c++17 c++-standard-library
Speaking of standards, ls will show you the current working directory if given no arguments; it won't give you flak for not specifying it. Also, if you're working with EBCDIC and C++ together, I'm impressed.
– tadman
Nov 15 '18 at 16:40
Yes, there is no portable way to write an ls application in C++. Moreover, my experience tells me that there is no portable way to write any complex application in C++ - you will always have to rely on things which are not specified by the C++ standard, either directly, or hidden inside third-party libraries like boost. In my opinion, this greatly contrasts C++ with languages like Java.
– SergeyA
Nov 15 '18 at 16:41
@SergeyA Yeah, every operating system is free to make up their own rules, and they often do for reasons we'll never be able to properly explain.
– tadman
Nov 15 '18 at 16:43
The root problem is that WG21 doesn't want to rely on POSIX here. Without that, the whole notion of a file name becomes non-portable. Now this can be reasonable; on tiny embedded systems files might be identified by merely a number.
– MSalters
Nov 15 '18 at 17:09
@MSalters I understand the reason but these systems could still exist if the standard provided a way to reliably set and get that number in the execution character set encoding.
– Contter
Nov 15 '18 at 17:20
asked Nov 15 '18 at 16:38 by Contter
1 Answer
No, and this is not just theoretical.
On Windows systems, paths are UTF-16, and path::value_type is wchar_t, not the char you get from char** argv. This isn't a problem by itself - path can be created from a char*. However, not every Windows file name can be expressed as a char*. Hence the program is unable to list the contents of directories whose names cannot be expressed as char*.
Now you'd think that Linux would be better. That's actually not entirely the case - the bytes you get for a filename can depend on whether you entered them on a keyboard or via TAB completion!
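One way such a difference can arise (an assumption about what is meant here; the answer does not spell it out) is Unicode normalization: the same visible name can be encoded as different UTF-8 byte sequences, and a byte-oriented filesystem treats them as different names. A small sketch, with the variable and helper names invented for illustration:

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// "é" as one precomposed code point (U+00E9), encoded as UTF-8.
const std::string precomposed = "caf\xC3\xA9";
// "é" as "e" followed by a combining acute accent (U+0301), encoded as UTF-8.
const std::string decomposed = "cafe\xCC\x81";

// Hypothetical helper: compares two names the way std::filesystem::path does,
// which performs no Unicode normalization.
bool same_file_name(const std::string& a, const std::string& b)
{
    return fs::path(a) == fs::path(b);
}
```

Both strings render as the same word in a terminal, yet same_file_name(precomposed, decomposed) is false, so a name typed one way may not match a directory created the other way.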
Point taken, but Windows and Linux are non-conforming in this respect anyway. ;-)
– Contter
Nov 15 '18 at 17:56
answered Nov 15 '18 at 17:20 by MSalters