Here, we cover some basics about the implementation and use of strings (and character arrays) in C++ (version 17). The main focus will be on the standard 1 byte char type characters.
Character literals are defined by enclosing a single character in single quotation marks, e.g. `'a'`. Their data type is `char`. To assign several characters at once to arrays and string objects, string literals are used, which are arrays of type `const char`. They are defined by enclosing text in double quotation marks, e.g. `"This is a string literal."`, and can contain any character except unescaped quotation marks `"`, backslashes `\` and line breaks. The size of a string literal is the number of characters + 1, since it is always terminated by a special null character `'\0'`, indicating the end of the text. Special characters are written as escape sequences:

| Escape Sequence | Description |
|---|---|
| `\\` | Backslash sign `\`. |
| `\"` | Double quotation mark `"`. |
| `\'` | Single quotation mark `'`, only required for character literals. |
| `\0` | Terminating null character. Used to signify the end of a character string. |
| `\n` | New-line character. |
| `\f` | New page (form feed) character. |
| `\b` | Backspace character. |
| `\t` | Horizontal tab character. |
| `\v` | Vertical tab character. |
| `\?` | Question mark (see trigraph explanation below). |


```cpp
#include <iostream>

int main()
{
    std::cout << "\\ \" " << '\'' << std::endl  // Some special characters.
              << "1 \t 2" << std::endl          // Horizontal tab.
              << "3 \v 4" << std::endl          // Vertical tab.
              << "5 \f 6" << std::endl          // New page character.
              << "asdf \0 this will not be printed" << std::endl; // Terminating null character.
    std::cout << "overwrite last sign" << "\b "; // Overwrites n with whitespace.
    return 0;
}
```
Example output:
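```
\ " ' 1 2 3 4 5 6 asdf overwrite last sig
```

The vertical tab, new page and backspace characters in particular may vary in effect from console to console. On mine, the first two are equivalent to a new-line character, while the backspace character moves the output cursor to the left, allowing previously printed characters to be overwritten. Other consoles might simply ignore these characters.

Trigraphs are three-character sequences beginning with `??` that are replaced by a single character before string literals are parsed; `??/`, for instance, stands for a backslash. Thus, `"??/"` would be parsed as `"\"`, resulting in a compiler error. Other cases might not lead to outright errors, but still be problematic: `"Date: ??/??/????"` would be parsed as `"Date: \\????"`, yielding the character string `Date: \????`. Writing question marks as `\?` prevents this. Note that trigraphs were removed from the language in C++17, although compilers may still offer them as an option.

Raw string literals are written as `R"delimiter(text goes here)delimiter"`, i.e. an `R`, followed by the text enclosed in quotation marks, delimiter terms and parentheses. The delimiter term can be left out or be any character sequence of up to 16 characters, excluding parentheses, whitespace and backslashes. Raw string literals can contain any character sequence, as long as it doesn't contain `)` followed by the delimiter term and a quotation mark, i.e. we could write `R"asdf(\(OwO)/""'"??/)asdf"` for the string `\(OwO)/""'"??/`, without the escape sequences or trigraph being processed.

Character strings can be stored either in C strings (character arrays) or in objects of the `string` class. They are fundamentally different in implementation:

C strings are plain character arrays, inherited from the C language. Functions for working with them are provided by the header `<cstring>` or `"string.h"`. One should make sure not to mix those up with the C++ header `<string>`. Note that the end of the text in such arrays always has to be marked by adding a special `\0` character. It is added automatically when character arrays are defined with string literals.

`string` objects are instances of a class, which is defined in the header `<string>`.

The `<cstring>` header provides functions for working with C strings and character arrays. Passing a character array that lacks a terminating `'\0'` to a function that expects a C string can lead to undefined behavior. The header can be included with the command `#include <cstring>` or alternatively `#include "string.h"`.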
The following functions are contained:

`strlen`: `std::size_t strlen(const char*);`
Returns the length of a C string (`'\0'` not included).
Example: `std::size_t foo = std::strlen("asdf");` (`foo` is 4).

`strchr`: `const char* strchr(const char*, int);` `char* strchr(char*, int);`
Searches a C string for the first occurrence of a character and returns a pointer to it, or a null pointer if the character isn't found. The character is passed as an `int` and converted inside the function, as if by `static_cast<char>` (the use of `int` instead of `char` for standard library functions has its origin in the C language).
Example: `const char *foo = std::strchr("asdfdsa", 'd');` (`*foo` is `'d'`, `foo` pointing to the third character of the string literal).

`strrchr`: `const char* strrchr(const char*, int);` `char* strrchr(char*, int);`
Same as `strchr`, except that the function searches for the last occurrence of the character in a C string.
Example: `const char *foo = std::strrchr("asdfdsa", 'd');` (`*foo` is `'d'`, `foo` pointing to the fifth character of the string literal).

`memchr`: `const void* memchr(const void*, int, std::size_t);` `void* memchr(void*, int, std::size_t);`
Similar to `strchr`, except that the character search isn't restricted by the null character `'\0'`, but only by a specified size argument. The function takes a pointer to an object (which is possibly converted to a `void` pointer) and searches it for a character, which is specified in the `int` argument. This argument is converted to an `unsigned char`. The last argument specifies the number of characters/bytes that are to be searched for the desired character. If it's bigger than the size of the input pointer's underlying object, this leads to undefined behavior, unless the desired character is found in the object. When the searched character is found, a `void` pointer to it is returned, otherwise a null pointer (`NULL`).
Example: `const char *foo = static_cast<const char*>(std::memchr("ab\0cAfsga", 'A', 9));` (`foo` is a pointer to the `'A'` character in the string literal. `strchr` would have returned a null pointer).

`strcmp`: `int strcmp(const char*, const char*);`
Compares two C strings character by character. Returns a negative value if the first differing character has a lower value in the first C string than in the second, 0 if both C strings are equal, and a positive value otherwise.
Example: `int foo = std::strcmp("/asdf", "*asdf");` (`foo` is positive, since `'/'` translates to 47, `'*'` to 42 and 47>42).

`strncmp`: `int strncmp(const char*, const char*, std::size_t);`
Same as `strcmp`, except that it also takes a size argument that specifies up to how many characters are to be compared.
Example: `int foo = std::strncmp("aaaa", "aaab", 3);` (`foo` is 0, since the first three characters are equal).

`memcmp`: `int memcmp(const void*, const void*, std::size_t);`
Similar to `strncmp`, except that the comparison doesn't end upon encountering the terminating null character in the input arguments.
Example: `int foo = std::memcmp("ab\0cd", "ab\0gh", 5);` (`foo` is negative, since `'c'` translates to 99, `'g'` to 103 and 99<103. Note that `strncmp` would have returned 0).

`strcoll`: `int strcoll(const char*, const char*);`
Same as `strcmp`, except that it takes into account the collation order set through the locale category `LC_COLLATE` (by default, this one is set to a general `"C"` locale). Returns a negative value if the first string comes before the second, 0 if they are equal and a positive one if it comes after.
Example: `int foo = std::strcoll("a", "z");` (`foo` is negative, e.g.
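-25).

`strspn`: `std::size_t strspn(const char*, const char*);`
Returns the length of the initial part (span) of the first C string that consists only of characters contained in the second C string.
Example: `std::size_t foo = std::strspn("abaacababa", "ab");` (`foo` is 4).

`strcspn`: `std::size_t strcspn(const char*, const char*);`
Same as `strspn`, except that the span returned is of the characters not contained in the second C string (complementary span).
Example: `std::size_t foo = std::strcspn("hello world", " ");` (`foo` is 5).

`strpbrk`: `const char* strpbrk(const char*, const char*);` `char* strpbrk(char*, const char*);`
Returns a pointer to the first character of the first C string that matches any character of the second C string, or a null pointer if there is no match.
Example: `const char *foo = std::strpbrk("hello world", " ");` (`foo` is a pointer to `" world"`, `*foo` is `' '`).

`strstr`: `const char* strstr(const char*, const char*);` `char* strstr(char*, const char*);`
Returns a pointer to the first occurrence of the second C string within the first one, or a null pointer if it doesn't occur. Equivalent to `strpbrk`, if the second C string is just one character.
Example: `const char *foo = std::strstr("hello world", "world");` (`foo` is a pointer to `"world"`, `*foo` is `'w'`).

`strtok`: `char* strtok(char*, const char*);`
Splits the first C string into tokens, which are separated by any of the delimiter characters in the second C string. On the first call, leading delimiters are skipped, the delimiter following the first token is replaced by `'\0'` and a pointer to the token is returned. To obtain the next token of the same C string, the function is called again with `NULL` (null pointer declared in the header) as first argument. Once no tokens are left, a null pointer is returned.

```cpp
#include <iostream>
#include <cstring>

int main()
{
    char cstring1[] = "---just--a----bunch-of--words--with---dashes";
    std::size_t len1 = std::strlen(cstring1); // Originally 44 characters long (excluding '\0').

    // First token is "just". Initial dashes are skipped and
    // one of the dashes after just is replaced by '\0'.
    char *token = std::strtok(cstring1, "-");
    while (token) { // token evaluates as true, as long as it isn't NULL (null pointer).
        std::cout << token;             // Loop prints all tokens.
        token = std::strtok(NULL, "-"); // Call function again, picking up after the last token.
    }

    // Note that the original C string is now split up, containing several '\0'
    // characters and lacking some dashes.
    std::cout << "\n";
    for (std::size_t i = 0; i < len1; i++) {
        std::cout << cstring1[i];
    }
    return 0;
}
```

Output:

```
justabunchofwordswithdashes
---just-a---bunchof-words-with--dashes
```

Null characters are not printed, but it's noticeable that some dashes are missing from the original string, since they were replaced by `'\0'`s.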

`strcpy`: `char* strcpy(char*, const char*);`
Copies the C string of the second argument into the character array of the first argument, including the terminating `'\0'`, and returns a pointer to the newly copied C string. The array has to be big enough for the copied C string, including the null character, and the memory used by the arguments shall not overlap, otherwise the function behavior is undefined.
Example: `char foo[6]; std::strcpy(foo, "hello");` (`foo` is `"hello"`).

`strncpy`: `char* strncpy(char*, const char*, std::size_t);`
Same as `strcpy`, except that the function also takes a size argument, determining the number of characters to be copied. If the size argument is not larger than the length of the C string to be copied, the null character won't be included, resulting in a malformed C string, unless the first character array already had a terminating null that wasn't overwritten. If the size argument is bigger than the C string to be copied, the remaining characters are filled with terminating null characters.
Example: `char foo[6] = "aaaaa"; std::strncpy(foo, "hello", 3);` (`foo` is `"helaa"`; not malformed, because it already had a null character).

`memcpy`: `void* memcpy(void*, const void*, std::size_t);`
Similar to `strncpy`, this function copies the object from the second argument to the first one. However, the input is not treated as C strings here, but rather as bytes of (unsigned) characters. Thus, it is unaffected by the terminating null character and copies as many bytes as specified in the third input argument. If said size argument is bigger than the object to be copied, or the first object is smaller than the number of bytes copied, the function behavior is undefined. Other sources of undefined behavior are overlapping memory blocks of the first and second argument and copying of a non-trivially copyable object (e.g. one with a non-trivial copy constructor or destructor). Lastly, neither input pointer may be a null pointer. Just like with the other copy functions, the returned value is a pointer to the object that is copied to (with the notable difference that it is a `void` pointer, not a `char` one).
Example: `char foo[sizeof("hello\0 world")]; std::memcpy(foo, "hello\0 world", sizeof("hello\0 world"));` (`foo` is `"hello\0 world"`. Note that the terminating null character doesn't affect the copying process, unlike with `strcpy` or `strncpy`).

`memmove`: `void* memmove(void*, const void*, std::size_t);`
Same as `memcpy`. However, this one allows for copying between overlapping memory blocks, e.g. overlapping parts of one array, since it behaves as if the second argument were first copied into a temporary memory block, before being moved to the first argument. Whether such indirect copying is actually used may vary with the input, depending on necessity and the size of the object to copy. Applying the function to non-trivially copyable or insufficiently big objects (smaller than the size argument) still results in undefined behavior.
Example: `char foo[20] = "Hello World"; std::memmove(foo+6, foo, 11);` (`foo` is `"Hello Hello World"`; with `memcpy` the behavior would have been undefined because the blocks overlap, and a garbled result such as `"Hello Hello Wollo"` may occur).

`strcat`: `char* strcat(char*, const char*);`
Appends a copy of the second C string to the end of the first one, overwriting its terminating `'\0'`, and returns a pointer to the first C string. The first character array has to be big enough for the concatenated C string, and the arguments shall not overlap, otherwise the behavior is undefined.
Example: `char foo[20]="hello "; std::strcat(foo, "world");` (`foo` is `"hello world"`).

`strncat`: `char* strncat(char*, const char*, std::size_t);`
Same as `strcat`, except that the function also has a size argument indicating the maximum number of characters to append. A terminating null character is always added after the appended characters. Unlike with `strncpy`, no additional null characters are appended if the specified number of characters to copy is larger than the second C string.
Example: `char foo[20]="hello "; std::strncat(foo, "world", 2);` (`foo` is `"hello wo"`).

`strxfrm`: `std::size_t strxfrm(char*, const char*, std::size_t);`
Similar to `strncpy` in that it copies a C string from the second argument to the first, with the maximum number of characters written (including the null character) being specified in the third argument. However, the copied C string is also transformed to be comparable with `strcmp`, without having to use `strcoll`, i.e. it accounts for the collation order determined by the `LC_COLLATE` value. The returned value is the C string length of the transformed string, excluding the null character. If it is not smaller than the size argument, the contents of the first array are indeterminate. If the first character array is smaller than the size argument, or if the arrays overlap, the behavior is undefined.
Example: `char foo[20]; std::strxfrm(foo, "äaÜuöoéeè", sizeof(foo));` (`foo` contains `"äaÜuöoéeè"`, possibly transformed according to the `LC_COLLATE` value).

`memset`: `void* memset(void*, int, std::size_t);`
Fills the first bytes of the object pointed to by the first argument with a character. The character is passed as an `int` argument and subsequently converted to `unsigned char` in the function body. The number of bytes that shall be written is declared in the size argument. The function produces undefined behavior if the size argument is larger than the object pointed to by the `void` pointer, or if the object is not trivially copyable (e.g. has a non-trivial copy constructor or destructor). The returned `void` pointer is the same as the one from the first input argument.
Example: `char foo[20] = "hello world"; std::memset(foo, 'a', 5);` (`foo` is `"aaaaa world"`).

`strerror`: `char* strerror(int);`
Returns a pointer to a C string describing the error number passed as argument. To obtain the number of the latest error, the header `<cerrno>` (`"errno.h"`) can be used. It contains an integer macro named `errno`, saving the error number of the latest error. The actual error messages for each number are defined by the locale value `LC_MESSAGES`.
Example: `char err[100]; std::strcpy(err, std::strerror(1));` (on a Linux system, `err` is e.g. `"Operation not permitted"` in the language set by `LC_MESSAGES`).

Besides these functions, the header also defines the null pointer macro (`NULL`) and the size type `size_t`, which is an unsigned integer type.
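
The C++ string header is named `string` and can be used with `#include <string>`. It contains several string classes for different character types, with sizes 1 Byte (standard `char`), 2 Byte, 4 Byte or "wide characters", as well as a class template for implementation of specialized string classes. Conversion and iterator template functions are included, too.

While the `string` class is the most commonly used part of the C++ string header, it's actually just an instantiation of the class template `basic_string`, which is declared as `template <class charT, class traits = char_traits<charT>, class Alloc = allocator<charT>> class basic_string;`.

The header defines the following instantiations (with their `typedef` name equivalent):

`basic_string<char>`: Covers `char` character type strings, defined as the `string` class.

`basic_string<char16_t>`: Class for 2 byte characters `char16_t`, with `typedef` name `u16string`.

`basic_string<char32_t>`: 4 byte character string class, named `u32string`.

`basic_string<wchar_t>`: Wide character class, equivalent to `wstring`.

The most important member and non-member functions are listed (as `string` instantiated versions, not all overloads included) below:

`size` / `length`: `std::size_t size() const noexcept;` / `std::size_t length() const noexcept;`
Both return the number of characters in the string.

`empty`: `bool empty() const noexcept;`
Returns `true` if the string contains no characters.

`clear`: `void clear() noexcept;`
Removes all characters from the string.

`max_size`: `std::size_t max_size() const noexcept;`
Returns the maximum number of characters a string can hold on the given system.

`resize`: `void resize(std::size_t);` `void resize(std::size_t, char);`
Changes the size of the string to the given number of characters. If the new size is smaller, the string is truncated; if it is bigger, the added characters are set to the character of the second argument, or to null characters if none is given. The character type has to match the string class, e.g. `wstring`'s method requires a `wchar_t` character.

`capacity`: `std::size_t capacity() const noexcept;`
Returns the number of characters the string can hold in its currently allocated memory.

`shrink_to_fit`: `void shrink_to_fit();`
Non-binding request to reduce the capacity to the size of the string.

`reserve`: `void reserve(std::size_t);`
Requests that the capacity be at least the given number of characters. If the argument is smaller than the current capacity, the call is a non-binding request to shrink, similar to the `shrink_to_fit` command. (This changed in C++20, where `reserve` no longer reduces the capacity.)

`c_str` / `data`: `const char* c_str() const noexcept;` / `const char* data() const noexcept;`
Both return a pointer to a null-terminated character array holding the contents of the string. The string's memory may be reallocated when non-`const` member functions are called, which invalidates all pointers and references to the old memory address!

`copy`: `std::size_t copy(char* dest, std::size_t count, std::size_t position = 0) const;`
Copies up to `count` characters, starting at `position`, into the character array `dest` and returns the number of characters copied. No null character is appended.

`find`: `std::size_t find(const std::string& str, std::size_t position = 0) const noexcept;` `std::size_t find(const char* cstr, std::size_t position = 0) const;` `std::size_t find(char c, std::size_t position = 0) const noexcept;`
Searches the string for the first occurrence of the input, starting at `position`, and returns the position of the first character of the match. A further overload searches for the first `count` characters of a character array: `std::size_t find(const char* char_array, std::size_t position, std::size_t count) const;`
If there is no match, `std::string::npos` is returned, which is the maximum value of `std::size_t` (equivalent to -1, since it's an unsigned type).

`rfind`: Same signatures as `find` (input argument overloads identical).
Same as `find`, except that it searches for the last occurrence. The position argument (by default `std::string::npos`) is still counted from the beginning of the string and marks the last position at which a match may begin. The value returned is the position of the first character of the last match in the string, or `std::string::npos` in case of no occurrence.

`find_first_of`: Same signatures as `find` (input argument overloads identical).
Same as `find`, except that the search is for all characters specified in the input individually, not the term as a whole. For example, an input of `"hello"` searches for the first occurrence of any of the characters `'h'`, `'e'`, `'l'`, `'o'`.

`find_last_of`: Same signatures as `find` (input argument overloads identical).
Same as `find_first_of`, except that it searches for the last occurrence of any of the characters.

`find_first_not_of`: Same signatures as `find` (input argument overloads identical).
Same as `find_first_of`, except that it searches for the first character of the string which is not contained in the input term.

`find_last_not_of`: Same signatures as `find` (input argument overloads identical).
Same as `find_first_not_of`, except that it searches for the last such character.

`substr`: `std::string substr(std::size_t position = 0, std::size_t length = std::string::npos) const;`
Returns a new string containing the substring that starts at `position` and has the given length. The default length is `std::string::npos` (highest possible value of `std::size_t`), i.e. if the function is called with only one argument, the substring contains all characters from the input position offset onwards. When called with no arguments, a copy of the string is returned.

`compare`: `int compare(const std::string& str) const noexcept;` `int compare(std::size_t position, std::size_t length, const std::string& str) const;` `int compare(std::size_t position, std::size_t length, const std::string& str, std::size_t subpos, std::size_t sublen = std::string::npos) const;` `int compare(const char* cstr) const;` `int compare(std::size_t position, std::size_t length, const char* cstr) const;` `int compare(std::size_t position, std::size_t length, const char* arr, std::size_t count) const;`
Compares the string with another string, C string or character array. Returns 0 if they are equal, a negative value if the string compares lower and a positive value if it compares higher. The comparison can be restricted to a substring through a position offset and length (`std::string::npos` for comparison of the whole string after the offset). If the comparison is with another string, two more arguments setting the offset and length for it can be specified.

`operator==`, `operator!=`, `operator<` etc. (Relational operators): `bool operator== (const std::string& str1, const std::string& str2) noexcept;` `bool operator== (const char* cstr, const std::string& str);` `bool operator== (const std::string& str, const char* cstr);`, etc.
Compare strings with other strings or C strings, equivalent to calling `compare` and returning the relation value as a `bool`.

`push_back`: `void push_back(char);`
Appends a character to the end of the string.

`pop_back`: `void pop_back();`
Removes the last character of the string. Calling it on an empty string leads to undefined behavior.

`append`: `std::string& append(const std::string& str);` `std::string& append(const std::string& substr, std::size_t offset, std::size_t length = std::string::npos);` `std::string& append(const char* cstr);` `std::string& append(const char* arr, std::size_t count);` `std::string& append(std::size_t count, char c);`
Appends the input to the end of the string and returns a reference to the string itself (`*this`).

`operator+=`: `std::string& operator+= (const std::string& str);` `std::string& operator+= (const char* cstr);` `std::string& operator+= (char c);`
Appends the right-hand operand to the string, like `append`.

`operator+`: `std::string operator+ (const std::string& str1, const std::string& str2);` `std::string operator+ (const std::string& str, const char* cstr);` `std::string operator+ (const char* cstr, const std::string& str);` `std::string operator+ (const std::string& str, char c);` `std::string operator+ (char c, const std::string& str);`
Returns a new string containing the concatenation of both operands.

`assign`: Same signatures as `append` (input argument overloads identical).
Replaces the contents of the string with the input and returns `*this`.

`insert`: `std::string& insert(std::size_t position, const std::string& str);` `std::string& insert(std::size_t position, const std::string& substr, std::size_t offset, std::size_t length = std::string::npos);` `std::string& insert(std::size_t position, const char* cstr);` `std::string& insert(std::size_t position, const char* arr, std::size_t count);` `std::string& insert(std::size_t position, std::size_t count, char c);`
Inserts the input before the character at `position` and returns `*this`.

`erase`: `std::string& erase(std::size_t offset = 0, std::size_t count = std::string::npos);`
Removes `count` characters starting at `offset` (by default all characters) and returns a reference to the string itself (`*this`).

`replace`: `std::string& replace(std::size_t position, std::size_t count, const std::string& str);` `std::string& replace(std::size_t position, std::size_t count, const std::string& str, std::size_t offset, std::size_t length = std::string::npos);` `std::string& replace(std::size_t position, std::size_t count, const char* cstr);` `std::string& replace(std::size_t position, std::size_t count, const char* arr, std::size_t arr_length);` `std::string& replace(std::size_t position, std::size_t count, std::size_t repeat_count, char c);`
Replaces the `count` characters starting at `position` with the input and returns `*this`.

`swap`: `void swap(std::string&);`
Exchanges the contents of the string with those of another string. The same can be achieved with the non-member function `std::swap`.

`front`: `char& front();` `const char& front() const;`
Returns a reference to the first character of the string, which is `const` if the string is `const`. Calling this member function with an empty string leads to undefined behavior.

`back`: `char& back();` `const char& back() const;`
Same as `front`, except that it returns a reference to the last string character.

`at`: `char& at(std::size_t position);` `const char& at(std::size_t position) const;`
Similar to `front`, except that it returns a reference to the character at "position" (0 being the first character). If the position lies outside of the string, an exception of type `std::out_of_range` is thrown.

`operator[]`: `char& operator[] (std::size_t position);` `const char& operator[] (std::size_t position) const;`
Similar to `at`, but without the range check: accessing a position beyond the size of the string leads to undefined behavior.

`operator<<`: `std::ostream& operator<< (std::ostream&, const std::string&);`
Inserts the string into an `ostream` object.

`operator>>`: `std::istream& operator>> (std::istream&, std::string&);`
Extracts a string from an `istream` object. Leading whitespace is skipped and the extraction stops at the next whitespace character. To read whole lines, `getline` can be used (see below).

`getline`: `std::istream& getline (std::istream& is, std::string& str);` `std::istream& getline (std::istream&, std::string&, char);`
Extracts characters from an `istream` object into the string until the delimiter is found, which is by default `'\n'`. Another delimiter can be chosen in a third `char` argument.

The `string` class also offers member functions that return iterators to its characters. Note that some of them point to theoretical characters outside of the string (e.g. `end()`) and shall not be dereferenced.
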
| Name | Starting point | Increment direction | Constant iterator |
|---|---|---|---|
| `begin` | First character | Normal | No |
| `end` | Theoretical character after the last character | Normal | No |
| `rbegin` | Last character | Reversed | No |
| `rend` | Theoretical character before the first character | Reversed | No |
| `cbegin` | First character | Normal | Yes |
| `cend` | Theoretical character after the last character | Normal | Yes |
| `crbegin` | Last character | Reversed | Yes |
| `crend` | Theoretical character before the first character | Reversed | Yes |