C Strings

Intro.

We have used C strings in the past every time we wrote a literal like "Alice" or "Hello World", for example in a program like this:

#include <iostream>

using namespace std;

int main () {
  cout << "Hello!\n";

  int x = 15;
  cout << "x = " << x << endl;
}

Output:

Hello!
x = 15

A C string is an array of type char terminated by '\0'.

The C string is a simple datatype inherited from the C language: it is simply an array of characters. For example, a C string variable can be defined as follows:


char str[15] = "Some text";

To mark the end of the text in the array, the meaningful text is followed by the so-called null-character '\0'. The compiler adds it to the array automatically, so in memory this string looks like the following sequence of characters:

  {'S', 'o', 'm', 'e', ' ', 't', 'e', 'x', 't', '\0', ...}

Because of this obligatory null-character '\0', C strings are also called null-terminated strings.

Even though the string str defined above is declared as an array of 15 elements, only the first 9 elements store the actual text, then they are followed by '\0'. When you print this string, for example using cout:

  cout << str << endl;

because of the null-character, the program knows where the string ends, and so it knows where to stop (it prints the characters one by one until it reaches '\0').

What is the minimum number of characters to store a string of length n? It requires n+1 characters. One additional character is necessary to store the null-character. For example, to store "Internationalization", a string of 20 letters, it would require an array of length 21 or greater:

char s[21] = "Internationalization"; // 20 characters + 1 to store '\0'
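
The n+1 rule can be written down as a small check. This is a minimal sketch; the helper name fits is an assumption made for this example and is not a library function.

```cpp
#include <cstddef>
#include <cstring>

using namespace std;

// Hypothetical helper: returns true if an array of `capacity` chars
// can hold `text` together with its terminating '\0'.
bool fits(const char text[], size_t capacity) {
  return strlen(text) + 1 <= capacity;  // n characters + 1 for '\0'
}
```

With this helper, fits("Internationalization", 21) is true, while fits("Internationalization", 20) is false.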

String initialization

The following initializations are equivalent:

  char s1[8] = "Hello";                         // 6 initializers
  char s2[8] = {'H', 'e', 'l', 'l', 'o', '\0'}; // 6 initializers

Notice that the same final result will be obtained in these cases as well:

  char s3[8] = {'H', 'e', 'l', 'l', 'o'};       // only 5 initializers
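  char s4[8] = {'H', 'e', 'l', 'l', 'o', '\0', '\0', '\0'}; // 8 initializers

The following listing shows several correct and incorrect initializations (the lines marked as compile time errors must be removed for it to compile):

#include <iostream>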

using namespace std;

int main () {

  // Correct:
  char s1[10] = "Green";                           // ok
  char s2[10] = { 'G', 'r', 'e', 'e', 'n', '\0' }; // ok
  char s3[10] = { 'G', 'r', 'e', 'e', 'n' };       // ok, but be careful
  char s4[6]  = "Green";                           // ok
  char s5[]   = "Green";                           // ok, auto length 6
 
  // Incorrect:
  char s6[5]  = "Green";                           // compile time error
  char s7[5]  = { 'G', 'r', 'e', 'e', 'n', '\0' }; // compile time error
  char s8[5]  = { 'G', 'r', 'e', 'e', 'n' };       // no '\0', a bug
  char s9[]   = { 'G', 'r', 'e', 'e', 'n' };       // no '\0', a bug

}
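
Why do the initializations with fewer initializers still give valid strings? When an array has fewer initializers than elements, the remaining elements are set to zero, which is exactly '\0'. A small sketch that compares the four "Hello" arrays from the beginning of this section byte by byte (same_contents is a name made up for this example):

```cpp
#include <cstring>

using namespace std;

// Returns true if all four 8-element arrays hold identical bytes.
bool same_contents() {
  char s1[8] = "Hello";
  char s2[8] = {'H', 'e', 'l', 'l', 'o', '\0'};
  char s3[8] = {'H', 'e', 'l', 'l', 'o'};  // the rest is filled with '\0'
  char s4[8] = {'H', 'e', 'l', 'l', 'o', '\0', '\0', '\0'};

  // memcmp returns 0 when the compared bytes are equal
  return memcmp(s1, s2, sizeof(s1)) == 0 &&
         memcmp(s1, s3, sizeof(s1)) == 0 &&
         memcmp(s1, s4, sizeof(s1)) == 0;
}
```

This only works because the arrays are larger than the text. In s8 and s9 of the listing above there is no spare element, so no '\0' is stored.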

Some standard library functions for C strings.

To use these functions, you need to include the header:

#include <cstring>

See the book for some other functions.
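
As a rough illustration, here is a sketch that uses four common functions from <cstring>: strlen, strcpy, strcat and strcmp. The choice of these four is an assumption made for this example, and the helper names build_greeting and same_text are made up.

```cpp
#include <cstring>

using namespace std;

// Hypothetical helper: builds "Hello World" in `dest` and returns its length.
// Assumes `dest` has room for at least 12 chars (11 characters + '\0').
size_t build_greeting(char dest[]) {
  strcpy(dest, "Hello");    // copy "Hello" and its '\0' into dest
  strcat(dest, " World");   // append " World" after "Hello"
  return strlen(dest);      // number of characters before '\0'
}

// Hypothetical helper: true if the two C strings contain the same text.
// Note that a == b would compare addresses, not characters.
bool same_text(const char a[], const char b[]) {
  return strcmp(a, b) == 0; // strcmp returns 0 for equal strings
}
```

Neither strcpy nor strcat checks the size of the destination array, so the caller has to make sure that the result fits.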

Reading C strings from cin.

The difference between >> and getline.

  char s1[20];
  char s2[20];

  cin >> s1;   // as usual for the >> operator, it skips leading
               // whitespace characters and reads the string
               // until the next whitespace is reached; the result is
               // stored in s1.
               //
               // not safe: a long word may be written past the end
               // of the array

  int n = 20;

  cin.getline(s2, n); // reads characters until the end of the line
                      // and stores at most (n-1) characters + '\0' in the string s2
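
One common way to keep >> within the bounds of the array is the setw manipulator from <iomanip>, which limits how many characters are stored. The sketch below reads from any input stream so that it can be tried without typing; the function name read_demo and the bracketed output format are assumptions made for this example.

```cpp
#include <iomanip>
#include <iostream>

using namespace std;

// Reads one word and then the rest of the line from `in`,
// and prints both in square brackets to `out`.
void read_demo(istream& in, ostream& out) {
  char word[20];
  char rest[20];

  in >> setw(20) >> word;  // stores at most 19 characters + '\0'
  in.getline(rest, 20);    // stores at most 19 characters + '\0'

  out << "[" << word << "][" << rest << "]";
}
```

Calling read_demo(cin, cout) and typing "Hello big world" prints [Hello][ big world]. The space before "big" stays in the second string because >> stops in front of the whitespace and getline continues from there.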
                     

Reading command line arguments. Using argc and argv parameters.

Example:

The data file “cities.dat” contains a list of cities in the following format: Index, City, State, Population, Area (in sq. miles):

1 NewYork       NewYork       8175133  302.6
2 LosAngeles    California    3792621  468.7
3 Chicago       Illinois      2695598  227.6
4 Houston       Texas         2100263  599.6
5 Philadelphia  Pennsylvania  1526006  134.1
6 Phoenix       Arizona       1445632  516.7
7 SanAntonio    Texas         1327407  460.9
8 SanDiego      California    1307402  325.2
9 Dallas        Texas         1197816  340.5

The following program requires two command line arguments: the number of cities to read (N) and the file name. It prints the first N cities from the file: their names and their population densities.

/*
  This program reads two command line arguments,
  the number of cities to print N, and the file name.

  It reads the datafile with the list of cities,
  and prints out N first cities, together with their 
  population densities
*/ 

#include <iostream>
#include <cstring>
#include <cstdlib>
#include <fstream>

using namespace std;
int main (int argc, char* argv[]) {

  // Exit if the number of arguments is not correct,
  // print the error message and a correct usage example
  if (argc != 3) {
    cout << "Usage: ./density N file" << endl;
    exit(1);
  }
  
  // n = the number of cities to read.
  // we convert the C string to an integer
  int n = atoi(argv[1]);

  // open the file for reading
  ifstream fin;
  fin.open(argv[2]);

  // exit if the file cannot be opened
  if (fin.fail()) {
    cout << "Error: cannot open file " << argv[2] << endl;
    exit(1);
  }

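  for (int i = 0; i < n; i++) {
    int index;
    char city[100];
    char state[100];
    int population;
    double area;
    char t[100];

    // read one line of the file, field by field
    fin >> index;
    fin >> city;
    fin >> state;
    fin >> t; population = atoi(t);
    fin >> t; area = atof(t);

    // stop early if the file has fewer than n cities
    if (fin.fail()) break;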

    double density = population / area;
    cout << city << "  " << density << endl;
  }

  fin.close();
}

Example run:

./density 5 cities.dat
NewYork  27016.3
LosAngeles  8091.79
Chicago  11843.6
Houston  3502.77
Philadelphia  11379.6
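
Note that atoi returns 0 when the text is not a number, so the program above cannot tell the argument "0" from "abc". A possible stricter conversion can be sketched with strtol from <cstdlib>. The helper name parse_count and the upper limit of 1000000 are assumptions made for this example, not part of the original program.

```cpp
#include <cstdlib>

using namespace std;

// Hypothetical helper: converts a C string to a positive int.
// Returns true and stores the value in `result` only if the whole
// string is a decimal number greater than zero.
bool parse_count(const char text[], int& result) {
  char* end;
  long value = strtol(text, &end, 10);  // end points just past the digits read

  if (end == text || *end != '\0') {    // no digits at all, or extra characters
    return false;
  }
  if (value <= 0 || value > 1000000) {  // arbitrary limit chosen for this sketch
    return false;
  }

  result = (int) value;
  return true;
}
```

In the program above, this could replace the call atoi(argv[1]): if parse_count(argv[1], n) returns false, the program prints the usage message and exits.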