
Conversion of Strings and Numbers

From the document GAWK: Effective AWK Programming (pages 139-142)

6.1 Constants, Variables, and Conversions

6.1.4 Conversion of Strings and Numbers

Number-to-string and string-to-number conversion are generally straightforward. There can be subtleties to be aware of; this section discusses this important facet of awk.

6.1.4.1 How awk Converts Between Strings and Numbers

Strings are converted to numbers and numbers are converted to strings, if the context of the awk program demands it. For example, if the value of either foo or bar in the expression ‘foo + bar’ happens to be a string, it is converted to a number before the addition is performed. If numeric values appear in string concatenation, they are converted to strings.

Consider the following:

two = 2; three = 3
print (two three) + 4

This prints the (numeric) value 27. The numeric values of the variables two and three are converted to strings and concatenated together. The resulting string is converted back to the number 23, to which 4 is then added.

If, for some reason, you need to force a number to be converted to a string, concatenate that number with the empty string, "". To force a string to be converted to a number, add zero to that string. A string is converted to a number by interpreting any numeric prefix of the string as numerals: "2.5" converts to 2.5, "1e3" converts to 1,000, and "25fix" has a numeric value of 25. Strings that can’t be interpreted as valid numbers convert to zero.
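The conversions just described can be tried with a short sketch like the following. It assumes only that some awk is on the PATH; the shell variable out and the awk variables n and s are names chosen for this illustration.

```shell
# Run a small awk program and capture what it prints.
out=$(awk 'BEGIN {
    n = 42
    s = n ""             # number to string: concatenate with the empty string
    print s
    print "2.5" + 0      # string to number: add zero
    print "1e3" + 0      # exponent notation is recognized
    print "25fix" + 0    # only the numeric prefix is used
    print "fix25" + 0    # no numeric prefix, so the value is zero
}')
printf '%s\n' "$out"
```

This should print 42, 2.5, 1000, 25, and 0, one per line.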

The exact manner in which numbers are converted into strings is controlled by the awk predefined variable CONVFMT (see Section 7.5 [Predefined Variables], page 157). Numbers are converted using the sprintf() function with CONVFMT as the format specifier (see Section 9.1.3 [String-Manipulation Functions], page 189).

CONVFMT’s default value is "%.6g", which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number’s value exactly (see footnote 2).
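As a rough illustration of how CONVFMT changes the result of a conversion, the sketch below converts the same number twice. The value "%.10g" is picked only for this example (it is not a recommendation from the text), and the variable names are likewise illustrative.

```shell
out=$(awk 'BEGIN {
    x = 3.14159265358979
    a = x ""             # converted with the default CONVFMT, "%.6g"
    CONVFMT = "%.10g"    # illustrative value, chosen only for this sketch
    b = x ""             # same number, now with ten significant digits
    print a
    print b
}')
printf '%s\n' "$out"
```

The first line should read 3.14159 and the second 3.141592654.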

Strange results can occur if you set CONVFMT to a string that doesn’t tell sprintf() how to format floating-point numbers in a useful way. For example, if you forget the ‘%’ in the format, awk converts all numbers to the same constant string.

As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of CONVFMT may be. Given the following code fragment:

CONVFMT = "%2.2f"
a = 12
b = a ""

b has the value "12", not "12.00".
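To see the special case next to the ordinary one, a sketch along these lines contrasts an integral value with a non-integral one under the same CONVFMT. The variables c and d are additions made for this illustration.

```shell
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""   # integral value: CONVFMT is not applied
    c = 12.5; d = c ""   # non-integral value: CONVFMT is applied
    print b
    print d
}')
printf '%s\n' "$out"
```

This should print 12 followed by 12.50.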

Pre-POSIX awk Used OFMT for String Conversion

Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behavior. See Section 5.1 [The print Statement], page 93, for more information on the print statement.
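The separation between printing and conversion can be observed by giving the two variables different values, as in this sketch. The formats "%.2f" and "%.4f" are arbitrary choices for the example, and it assumes a POSIX-conforming awk.

```shell
out=$(awk 'BEGIN {
    OFMT = "%.2f"        # used when print outputs a number
    CONVFMT = "%.4f"     # used when a number is converted to a string
    x = 3.14159265
    print x              # numeric argument: formatted with OFMT
    print (x "")         # string argument: already converted with CONVFMT
}')
printf '%s\n' "$out"
```

The first line should be 3.14 and the second 3.1416.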

6.1.4.2 Locales Can Influence Conversion

Where you are can matter when it comes to converting between numbers and strings. The local character set and language (the locale) can affect numeric formats. In particular, for awk programs, it affects the decimal point character and the thousands-separator character.

The "C" locale, and most English-language locales, use the period character (‘.’) as the decimal point and don’t have a thousands separator. However, many (if not most) European and non-English locales use the comma (‘,’) as the decimal point character. European locales often use either a space or a period as the thousands separator, if they have one.

Footnote 2: Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.

The POSIX standard says that awk always uses the period as the decimal point when reading the awk program source code, and for command-line variable assignments (see Section 2.3 [Other Command-Line Arguments], page 38). However, when interpreting input data, for print and printf output, and for number-to-string conversion, the local decimal point character is used. In all cases, numbers in source code and in input data cannot have a thousands separator. Here are some examples indicating the difference in behavior, on a GNU/Linux system:

$ export POSIXLY_CORRECT=1              # Force POSIX behavior
$ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3.14159
$ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
-| 3,14159
$ echo 4,321 | gawk '{ print $1 + 1 }'
-| 5
$ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
-| 5,321

The en_DK.utf-8 locale is for English in Denmark, where the comma acts as the decimal point separator. In the normal "C" locale, gawk treats ‘4,321’ as 4, while in the Danish locale, it’s treated as the full number including the fractional part, 4.321.
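When a suitable locale is not installed, or the program must behave the same everywhere, one possible workaround (not described in the text above) is to rewrite the decimal comma before doing arithmetic. The sketch below assumes the input contains no thousands separators; the variable name value and the explicit use of the "C" locale are choices made for this illustration.

```shell
out=$(echo 4,321 | LC_ALL=C awk '{
    value = $1
    sub(/,/, ".", value)   # rewrite the decimal comma as a period
    print value + 1
}')
printf '%s\n' "$out"
```

This should print 5.321, with a period, because the output is produced in the "C" locale.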

Some earlier versions of gawk fully complied with this aspect of the standard. However, many users in non-English locales complained about this behavior, because their data used a period as the decimal point, so the default behavior was restored to use a period as the decimal point character. You can use the --use-lc-numeric option (see Section 2.2 [Command-Line Options], page 31) to force gawk to use the locale’s decimal point character. (gawk also uses the locale’s decimal point character when in POSIX mode, either via --posix or the POSIXLY_CORRECT environment variable, as shown previously.)

Table 6.1 describes the cases in which the locale’s decimal point character is used and when a period is used. Some of these features have not been described yet.

Feature       Default       --posix or --use-lc-numeric
%'g           Use locale    Use locale
%g            Use period    Use locale
Input         Use period    Use locale
strtonum()    Use period    Use locale

Table 6.1: Locale decimal point versus a period

Finally, modern-day formal standards and the IEEE standard floating-point representation can have an unusual but important effect on the way gawk converts some special string values to numbers. The details are presented in Section 16.7 [Standards Versus Existing Practice], page 378.
