Here, we cover some basics about the implementation and use of strings (and character arrays) in C++ (version 17). The main focus will be on the standard 1 byte char
type characters.
Character literals are single characters enclosed in single quotation marks, e.g. 'a'. Their data type is char. In order to assign several characters at once for arrays and string objects, string literals are employed, which are arrays of type const char. They are defined by enclosing text in double quotation marks, e.g. "This is a string literal.", and can contain any character; only double quotation marks " and backslashes \ have to be written as escape sequences. The size of a string literal is the number of characters + 1, since it is always terminated by a special null character '\0'
, indicating the end of the text.

Escape Sequence | Description |
---|---|
\\ | Backslash sign \. |
\" | Double quotation mark ". |
\' | Single quotation mark ', only required for character literals. |
\0 | Terminating null character. Used to signify end of a character string. |
\n | New-line character. |
\f | New page character. |
\b | Backspace character. |
\t | Horizontal tab character. |
\v | Vertical tab character. |
\? | Question mark (see trigraph explanation below). |
    #include <iostream>

    int main() {
        std::cout << "\\ \" " << '\'' << std::endl  //Some special characters.
                  << "1 \t 2" << std::endl          //Horizontal tab.
                  << "3 \v 4" << std::endl          //Vertical tab.
                  << "5 \f 6" << std::endl          //New page character.
                  << "asdf \0 this will not be printed" << std::endl; //Terminating null character.
        std::cout << "overwrite last sign" << "\b "; //Overwrites n with whitespace.
        return 0;
    }

Example output:

    \ " ' 1 2 3 4 5 6 asdf overwrite last sig

The effect of the vertical tab, new page and backspace characters in particular may vary from console to console. On mine, the first two are equivalent to a new-line character, while the backspace moves the output cursor one position to the left, allowing previously printed characters to be overwritten. Other consoles might simply ignore these characters.

Trigraphs are sequences of three characters starting with ??, which older C++ standards replace with a single character before any other processing; ??/ stands for a backslash, for instance. They were removed from the language in C++17, but some compilers still offer them as an option. With trigraphs enabled, the string literal "??/"
would be parsed as "\", resulting in a compiler error. Other cases might not lead to outright errors, but still be problematic: "Date: ??/??/????" would be parsed as "Date: \\????", yielding the character string Date: \????.

Raw string literals avoid such issues. They are written as R"delimiter(text goes here)delimiter", i.e. an R, followed by the text encapsulated in quotation marks, delimiter terms and parentheses. The delimiter term can be left out or be any character sequence of up to 16 characters, excluding parentheses, whitespace and backslashes. Raw string literals can contain any character sequence, as long as it doesn't contain ) followed by the delimiter term and a quotation mark, i.e. we could write R"asdf(\(OwO)/""'"??/)asdf"
for the string \(OwO)/""'"??/
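, without the escape sequences or trigraph being processed.

C++ offers two ways of handling strings: C-style character arrays (C strings) and objects of the string class. They are fundamentally different in implementation:

C strings are plain character arrays, which are handled with the functions of the C header <cstring> or "string.h". One should make sure not to mix those up with the C++ header <string>. Note that the end of the text in such arrays always has to be marked by a special \0 character. It is added automatically when character arrays are defined with string literals.

Objects of the string class manage their character array themselves. The class is defined in the C++ header <string>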
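.

The functions of the C string header expect their C string arguments to be terminated; passing a character array without a terminating '\0' can lead to undefined behavior. The header can be included with the command #include <cstring> or alternatively #include "string.h". The following functions are contained:

strlen
std::size_t strlen(const char*);
Returns the length of a C string (the terminating '\0' not included).
Example: std::size_t foo = std::strlen("asdf"); (foo is 4).

strchr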
const char* strchr(const char*, int); char* strchr(char*, int);
Returns a pointer to the first occurrence of a character in a C string, or a null pointer if the character isn't found. The character is passed as an int and converted back with static_cast<char> (the use of int instead of char for standard library functions has its origin in the C language).
Example: const char *foo = std::strchr("asdfdsa", 'd'); (*foo is 'd', foo pointing to the third character of the string literal).

strrchr
const char* strrchr(const char*, int); char* strrchr(char*, int);
Same as strchr, except that the function searches for the last occurrence of the character in a C string.
Example: const char *foo = std::strrchr("asdfdsa", 'd'); (*foo is 'd', foo pointing to the fifth character of the string literal).

memchr
const void* memchr(const void*, int, std::size_t); void* memchr(void*, int, std::size_t);
Similar to strchr, except that the character search isn't restricted by the null character '\0', but only by a specified size argument. The function takes a pointer to an object (which is implicitly converted to a void pointer) and searches it for a character, which is specified in the int argument. This argument is converted to an unsigned char. The last argument specifies the number of bytes that are to be searched for the desired character. If it's bigger than the size of the input pointer's underlying object, this leads to undefined behavior, unless the desired character is found within the object. When the searched character is found, a void type pointer to it is returned, otherwise a null pointer (NULL).
Example: const char *foo = static_cast<const char*>(std::memchr("ab\0cAfsga", 'A', 9)); (foo is a pointer to the 'A' character in the string literal. strchr would have returned a null pointer).

strcmp
int strcmp(const char*, const char*);
Compares two C strings character by character. Returns a negative value if the first C string comes before the second in lexicographical order, 0 if they are equal and a positive value if it comes after.
Example: int foo = std::strcmp("/asdf", "*asdf"); (foo is positive, since in ASCII '/' translates to 47, '*' to 42 and 47>42).

strncmp
int strncmp(const char*, const char*, std::size_t);
Same as strcmp, except that it also takes a size argument that specifies up to how many characters are to be compared.
Example: int foo = std::strncmp("aaaa", "aaab", 3); (foo is 0, since the first three characters are equal).

memcmp
int memcmp(const void*, const void*, std::size_t);
Similar to strncmp, except that the comparison doesn't end upon encountering the terminating null character in the input arguments.
Example: int foo = std::memcmp("ab\0cd", "ab\0gh", 5); (foo is negative, since 'c' translates to 99, 'g' to 103 and 99<103. Note that strncmp would have returned 0).

strcoll
int strcoll(const char*, const char*);
Similar to strcmp, except that it takes into account the collation order, set through the locale category LC_COLLATE (by default, the general "C" locale is used). Returns a negative value if the first C string comes before the second, 0 if they are equal and a positive value if it comes after.
Example: int foo = std::strcoll("a", "z"); (foo is negative, e.g. -25).

strspn
std::size_t strspn(const char*, const char*);
Returns the length of the initial part (span) of the first C string that consists only of characters contained in the second C string.
Example: std::size_t foo = std::strspn("abaacababa", "ab"); (foo is 4).

strcspn
std::size_t strcspn(const char*, const char*);
Same as strspn, except that the span returned is of the characters not contained in the second C string (complementary span).
Example: std::size_t foo = std::strcspn("hello world", " "); (foo is 5).

strpbrk
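const char* strpbrk(const char*, const char*); char* strpbrk(char*, const char*);
Returns a pointer to the first character of the first C string that matches any of the characters in the second C string, or a null pointer if there is no such character.
Example: const char *foo = std::strpbrk("hello world", " "); (foo is a pointer to " world", *foo is ' ').

strstr
const char* strstr(const char*, const char*); char* strstr(char*, const char*);
Returns a pointer to the first occurrence of the second C string within the first one, or a null pointer if it doesn't occur. The result is the same as with strpbrk, if the second C string is just one character.
Example: const char *foo = std::strstr("hello world", "world"); (foo is a pointer to "world", *foo is 'w').

strtok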
char* strtok(char*, const char*);
Splits the C string of the first argument into tokens, which are separated by any of the delimiter characters in the second C string. The first call takes the C string to be split and returns a pointer to its first token. For every following token of the same C string, the function is called again with a null pointer as first argument. The delimiter following a token is overwritten with '\0', so the original C string is modified. If no token is left, the function returns NULL (null pointer declared in the header).

    #include <cstring>
    #include <iostream>

    int main() {
        char cstring1[] = "---just--a----bunch-of--words--with---dashes";
        std::size_t len1 = std::strlen(cstring1); //Originally 44 characters long (excluding '\0').
        //First token is "just". Initial dashes are skipped and
        //one of the dashes after just is replaced by '\0'.
        char *token = std::strtok(cstring1, "-");
        while (token) { //token evaluates as true, as long as it isn't NULL (null pointer).
            std::cout << token; //Loop prints all tokens.
            token = std::strtok(NULL, "-"); //Call function again, picking up after the last token.
        }
        //Note that the original C string is now split up, containing several '\0' characters and lacking some dashes.
        std::cout << "\n";
        for (std::size_t i = 0; i < len1; i++) {
            std::cout << cstring1[i];
        }
        return 0;
    }

Output:

    justabunchofwordswithdashes
    ---just-a---bunchof-words-with--dashes

The null characters are not visible in the output, but it's noticeable that some dashes are missing from the original string, since they were replaced by '\0' characters.
strcpy
char* strcpy(char*, const char*);
Copies the C string of the second argument into the character array of the first one, including the terminating '\0', and returns a pointer to the newly copied C string. The array has to be big enough for the copied C string, including the null character, and the memory used by the arguments shall not overlap, otherwise the function behavior is undefined.
Example: char foo[6]; std::strcpy(foo, "hello"); (foo is "hello").

strncpy
char* strncpy(char*, const char*, std::size_t);
Same as strcpy, except that the function also takes a size argument, determining the number of characters to be copied. If the size argument is smaller than the size of the C string to be copied (its length plus the null character), the null character won't be included, resulting in a malformed C string, unless the first character array already had a terminating null that wasn't overwritten. In the case that the size argument is bigger than the C string to be copied, the remaining characters will be filled with null characters.
Example: char foo[6] = "aaaaa"; std::strncpy(foo, "hello", 3); (foo is "helaa"; not malformed, because it already had a null character).

memcpy
void* memcpy(void*, const void*, std::size_t);
Similar to strncpy, this function copies the object from the second argument to the first one. However, the input is not treated as C strings here, but rather as bytes (unsigned characters). Thus, it is unaffected by the terminating null character and copies as many bytes as specified in the third input argument. If said size argument is bigger than the object to be copied, or the first object is smaller than the number of bytes copied, the function behavior is undefined. Other sources of undefined behavior are overlapping memory blocks of the first and second argument and copying of a non-trivially copyable object (e.g. one with a non-trivial copy constructor or destructor). Lastly, neither of the input pointers may be a null pointer. Just like with the other copy functions, the returned value is a pointer to the object that is copied to (with the notable difference that it is a void pointer, not a char one).
Example: char foo[sizeof("hello\0 world")]; std::memcpy(foo, "hello\0 world", sizeof("hello\0 world")); (foo is "hello\0 world". Note that the terminating null character doesn't affect the copying process, unlike with strcpy or strncpy).

memmove
void* memmove(void*, const void*, std::size_t);
Similar to memcpy. However, this one also allows for copying between overlapping memory blocks, e.g. parts of the same array, since the function behaves as if the second argument were first copied into a temporary memory block, before being moved to the first argument. Whether such indirect copying is actually used may vary with the input and the implementation. Applying the function to non-trivially copyable or insufficiently big objects (smaller than the size argument) still results in undefined behavior.
Example: char foo[20] = "Hello World"; std::memmove(foo+6, foo, 11); (foo is "Hello Hello World"; if memcpy had been used instead, the behavior would've been undefined because of the overlap, with possible results such as "Hello Hello Wollo").

strcat
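char* strcat(char*, const char*);
Appends a copy of the second C string to the end of the first one, overwriting its terminating null character, and returns a pointer to the first C string. The first character array has to be big enough for the concatenated C string, including the null character, and the arguments shall not overlap, otherwise the function behavior is undefined.
Example: char foo[20]="hello "; std::strcat(foo, "world"); (foo is "hello world").

strncat
char* strncat(char*, const char*, std::size_t);
Same as strcat, except that the function also has a size argument indicating the maximum number of characters to be appended. A terminating null character is always added after them. Unlike with strncpy, no additional null characters are appended, if the specified number of characters to copy is larger than the second C string.
Example: char foo[20]="hello "; std::strncat(foo, "world", 2); (foo is "hello wo").

strxfrm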
std::size_t strxfrm(char*, const char*, std::size_t);
Similar to strncpy in the way that it copies a C string from the second argument to the first, with the maximum number of characters to write (including the null character) being specified in the third argument. However, the copied C string is also transformed so that two transformed C strings can be compared with strcmp, without having to use strcoll, i.e. it accounts for the collation order determined by the LC_COLLATE value. The returned value is the C string length of the transformed string, excluding the null character. If it is not smaller than the third argument, the content of the first array is indeterminate. As with the other copy functions, if the first character array is smaller than the size argument, or if the arrays overlap, the behavior is undefined.
Example: char foo[20]; std::strxfrm(foo, "äaÜuöoéeè", sizeof(foo)); (foo contains "äaÜuöoéeè", possibly transformed according to the LC_COLLATE value, provided the transformed C string fits into foo).

memset
void* memset(void*, int, std::size_t);
Fills the first bytes of the object pointed to by the void pointer with a character. Said character is passed as an int argument and subsequently converted to unsigned char in the function body. The number of bytes that shall be written is declared in the size argument. The function will produce undefined behavior if the size argument is larger than the object pointed to by the void pointer, if the object is a potentially-overlapping subobject, or if the object is not trivially copyable (e.g. has a non-trivial copy constructor or destructor). The returned void pointer is the same as the one from the first input argument.
Example: char foo[20] = "hello world"; std::memset(foo, 'a', 5); (foo is "aaaaa world").

strerror
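char* strerror(int);
Returns a pointer to a C string containing the error message for the error number passed. To obtain error numbers, the header <cerrno> ("errno.h") can be used. It contains an integer macro named errno, saving the error number of the latest error. The actual error messages to each number are defined by the locale value LC_MESSAGES.
Example: char err[100]; std::strcpy(err, std::strerror(1)); (err is "Operation not permitted" in the language set by LC_MESSAGES; error numbers and messages are system specific, this one applies to Linux, for instance).

Apart from these functions, the header also defines the null pointer macro (NULL) and the size type size_t, which is an unsigned integer type.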
The C++ string header is named string and can be used with #include <string>. It contains several string classes for different character types, with sizes of 1 byte (standard char), 2 bytes, 4 bytes or "wide characters", as well as a class template for the implementation of specialized string classes. Conversion and iterator template functions are included, too.

While the string class is the most commonly used one of the C++ string header, it's actually just an instance of the class template basic_string, which is declared as template<class charT, class traits = char_traits<charT>, class alloc = allocator<charT>> class basic_string.

The header provides the following instantiations (with their typedef name equivalent):
basic_string<char>: Covers char character type strings, defined as the string class.
basic_string<char16_t>: Class for 2 byte characters char16_t, with typedef name u16string.
basic_string<char32_t>: 4 byte character string class, named u32string.
basic_string<wchar_t>: Wide character class, equivalent to wstring
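.

The most important member functions are listed (in their string instantiated versions, not all overloads included) below:

size / length
std::size_t size() const noexcept; / std::size_t length() const noexcept;
Both return the number of characters in the string.

empty
bool empty() const noexcept;
Returns true if the string contains no characters.

clear
void clear() noexcept;
Removes all characters from the string, leaving it empty.

max_size
std::size_t max_size() const noexcept;
Returns the maximum number of characters a string can hold on the given system.

resize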
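void resize(std::size_t); void resize(std::size_t, char);
Changes the length of the string to the given number of characters. If that number is smaller than the current length, the string is truncated; if it is larger, the string is filled up with the character of the second argument, or with null characters if none is given. wstring's method requires a wchar_t character.

capacity
std::size_t capacity() const noexcept;
Returns the number of characters the string can hold in its currently allocated memory.

shrink_to_fit
void shrink_to_fit();
Non-binding request to reduce the capacity to the size of the string.

reserve
void reserve (std::size_t);
Requests that the capacity be changed to at least the given number of characters. If the argument is smaller than the current capacity, the call is a non-binding request to shrink, comparable to the shrink_to_fit command. (Since C++20, reserve no longer reduces the capacity.)

c_str / data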
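const char* c_str() const noexcept; / const char* data() const noexcept;
Both return a pointer to the character array of the string, which is terminated by a null character, i.e. a C string. Note that the array can be reallocated when non-const member functions are called, which invalidates all pointers and references to the old memory address!

copy
std::size_t copy (char* dest, std::size_t count, std::size_t position = 0) const;
Copies up to count characters of the string, starting at position, into the character array dest and returns the number of characters copied. No terminating null character is appended.

find
std::size_t find(const std::string& str, std::size_t position = 0) const noexcept;
std::size_t find(const char* cstr, std::size_t position = 0) const;
std::size_t find(char c, std::size_t position = 0) const noexcept;
std::size_t find(const char* char_array, std::size_t position, std::size_t count) const;
Searches the string for the first occurrence of the input, starting at position, and returns the index of the first character of the match. If the input doesn't occur, std::string::npos is returned, which is the maximum value of std::size_t (equivalent to -1 converted to this unsigned type).

rfind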
See find (input argument overloads identical, except that position defaults to std::string::npos).
Same as find, except that the last occurrence of the input is searched for. The search runs backwards and only considers matches that begin at or before position, which is still counted from the start of the string (e.g. if position = 1, only matches beginning at the first or second character are found). The value returned is the index of the first character of the last match in the string, or std::string::npos in case of no occurrence.

find_first_of
See find (input argument overloads identical).
Same as find, except that the search is for all characters specified in the input individually, not the term as a whole. For example, an input of "hello" searches for the first occurrence of any of the characters 'h', 'e', 'l', 'o'.

find_last_of
See find (input argument overloads identical, except that position defaults to std::string::npos).
Backwards searching version of find_first_of, i.e. it returns the index of the last occurrence of any of the input characters.

find_first_not_of
See find (input argument overloads identical).
Same as find_first_of, except that it searches for the first character of the string, which is not contained in the input term.

find_last_not_of
See find (input argument overloads identical, except that position defaults to std::string::npos).
Backwards searching version of find_first_not_of.

substr
std::string substr (std::size_t position = 0, std::size_t length = std::string::npos) const;
Returns a new string containing the part of the string that starts at position and has the specified length. The default length is std::string::npos (highest possible value of std::size_t), i.e. if the function is called with only one argument, the substring contains all characters from the input position onward. When called with no arguments, a copy of the string is returned.

compare
int compare(const std::string& str) const noexcept;
int compare(std::size_t position, std::size_t length, const std::string& str) const;
int compare(std::size_t position, std::size_t length, const std::string& str, std::size_t subpos, std::size_t sublen = std::string::npos) const;
int compare(const char* cstr) const;
int compare(std::size_t position, std::size_t length, const char* cstr) const;
int compare(std::size_t position, std::size_t length, const char* arr, std::size_t count) const;
Compares the string with the input and returns a negative value if the string comes before the input, 0 if both are equal and a positive value if it comes after. With a position and length argument, only the corresponding substring is compared (the length can be set to std::string::npos for comparison of the whole string after the offset). If the comparison is with another string, two more arguments setting the offset and length for it can be specified.

operator==, operator!=, operator< etc. (Relational operators)
bool operator== (const std::string& str1, const std::string& str2) noexcept;
bool operator== (const char* cstr, const std::string& str);
bool operator== (const std::string& str, const char* cstr);
etc.
Compare two strings, or a string and a C string, by calling compare and returning the relation value as a bool.

push_back
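void push_back(char);
Appends a single character to the end of the string.

pop_back
void pop_back();
Removes the last character of the string. Calling it on an empty string leads to undefined behavior.

append
std::string& append(const std::string& str);
std::string& append(const std::string& substr, std::size_t offset, std::size_t length = std::string::npos);
std::string& append(const char* cstr);
std::string& append(const char* arr, std::size_t count);
std::string& append(std::size_t count, char c);
Appends the input to the end of the string and returns a reference to the string (*this). The input can be a string, a substring given by offset and length, a C string, the first count characters of a character array, or count copies of the character c.

operator+=
std::string& operator+= (const std::string& str);
std::string& operator+= (const char* cstr);
std::string& operator+= (char c);
Appends a string, a C string or a single character to the string, like append.

operator+
std::string operator+ (const std::string& str1, const std::string& str2);
std::string operator+ (const std::string& str, const char* cstr);
std::string operator+ (const char* cstr, const std::string& str);
std::string operator+ (const std::string& str, char c);
std::string operator+ (char c, const std::string& str);
Returns a new string consisting of the first argument followed by the second one, without modifying either of them.

assign
See append (input argument overloads identical).
Same as append, except that the input replaces the previous content of the string instead of being added to it.

insert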
std::string& insert(std::size_t position, const std::string& str);
std::string& insert(std::size_t position, const std::string& substr, std::size_t offset, std::size_t length = std::string::npos);
std::string& insert(std::size_t position, const char* cstr);
std::string& insert(std::size_t position, const char* arr, std::size_t count);
std::string& insert(std::size_t position, std::size_t count, char c);
Inserts the input in front of the character at position and returns a reference to the string (*this). The input options are the same as for append.

erase
std::string& erase (std::size_t offset = 0, std::size_t count = std::string::npos);
Removes count characters from the string, starting at offset (by default all of them), and returns a reference to the string (*this).

replace
std::string& replace(std::size_t position, std::size_t count, const std::string& str);
std::string& replace(std::size_t position, std::size_t count, const std::string& str, std::size_t offset, std::size_t length = std::string::npos);
std::string& replace(std::size_t position, std::size_t count, const char* cstr);
std::string& replace(std::size_t position, std::size_t count, const char* arr, std::size_t arr_length);
std::string& replace(std::size_t position, std::size_t count, std::size_t repeat_count, char c);
Replaces the count characters starting at position with the input and returns *this.

swap
void swap(std::string&);
Exchanges the content of the string with that of another string. The same can be achieved with std::swap.

front
char& front(); const char& front() const;
Returns a reference to the first character of the string, which is const if the string is const. Calling this member function with an empty string leads to undefined behavior.

back
char& back(); const char& back() const;
Same as front, except that it returns a reference to the last string character.

at
char& at(std::size_t position); const char& at(std::size_t position) const;
Same as front, except that it returns a reference to the character at "position" (0 being the first character). If position is not smaller than the size of the string, an exception of type std::out_of_range is thrown.

operator[]
char& operator[] (std::size_t position); const char& operator[] (std::size_t position) const;
Same as at, but without the range check.

operator<<
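std::ostream& operator<< (std::ostream&, const std::string&);
Inserts the string into an ostream object.

operator>>
std::istream& operator>> (std::istream&, std::string&);
Extracts a string from an istream object. Leading whitespace is skipped and reading stops at the next whitespace character, so only single words are read. For whole lines, getline can be used (see below).

getline
std::istream& getline (std::istream&, std::string&);
std::istream& getline (std::istream&, std::string&, char);
Extracts characters from an istream object into the string, until the end of the line is reached, which is marked by '\n'. Another delimiter can be chosen in a third char argument.

The string classes also provide iterators, which are returned by the member functions in the table below. Note that the iterators marking the end of a range point to a theoretical character outside of the string (e.g. the one returned by end()) and shall not be dereferenced.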
Name | Starting point | Increment direction | Constant iterator |
---|---|---|---|
begin | First character | Normal | No |
end | Theoretical character after the last character | Normal | No |
rbegin | Last character | Reversed | No |
rend | Theoretical character before the first character | Reversed | No |
cbegin | First character | Normal | Yes |
cend | Theoretical character after the last character | Normal | Yes |
crbegin | Last character | Reversed | Yes |
crend | Theoretical character before the first character | Reversed | Yes |