
INITCAP

From the document: Oracle PL/SQL book (Third Edition), pages 172-178

The INITCAP function reformats the case of the string argument, setting the first letter of each word to uppercase and the remainder of the letters to lowercase. A word is a set of characters separated by a space or nonalphanumeric character (such as # or _ ). The specification of INITCAP is:

FUNCTION INITCAP (string_in IN VARCHAR2) RETURN VARCHAR2

Here are some examples of the impact of INITCAP on your strings:

• Shift all lowercase to mixed case:

INITCAP ('this is lower') --> 'This Is Lower'

• Shift all uppercase to mixed case:

INITCAP ('BIG>AND^TALL') --> 'Big>And^Tall'

• Shift a confusing blend of cases to consistent initcap format:

INITCAP ('wHatISthis_MESS?') --> 'Whatisthis_Mess?'

• Create Visual Basic-style variable names (I use REPLACE, explained later, to strip out the embedded underscores):

REPLACE (INITCAP ('ALMOST_UNREADABLE_VAR_NAME'), '_', NULL)
   --> 'AlmostUnreadableVarName'

When and why would you use INITCAP? Many Oracle shops like to store all character string data, such as names and addresses, in uppercase for consistency. This practice makes it easier to search for records that match certain criteria.

The problem with storing all the data in uppercase is that, while it is a convenient "machine format," it is not particularly readable or presentable. How easy is it to scan a page of information that looks like the following?

CUSTOMER TRACKING LIST - GENERATED ON 12-MAR-1994

LAST PAYMENT WEDNESDAY: PAUL JOHNSON, 123 MADISON AVE - $1200

LAST PAYMENT MONDAY: HARRY SIMMERSON, 555 AXELROD RD - $1500

It is hard for the eye to pick out the individual words and different types of information; all that text just blends together. Furthermore, solid uppercase has a "machine" or even "mainframe" feel to it; you'd never actually type it that way. A mixture of upper- and lowercase can make your output much more readable and friendly in appearance:

Customer Tracking List - Generated On 12-Mar-1994

Last Payment Wednesday: Paul Johnson, 123 Madison Ave - $1200

Last Payment Monday: Harry Simmerson, 555 Axelrod Rd - $1500

Can you see any problems with using INITCAP to format output? There are a couple of drawbacks to the way it works. First, as the 'BIG>AND^TALL' example shows, INITCAP is not very useful for generating titles because it doesn't know that little words like "and" and "the" should not be capitalized. That is a relatively minor problem compared with the second one: INITCAP is completely ignorant of real-world surname conventions. Names with internal capital letters, in particular, cannot be generated with INITCAP. Consider the following example, in which the "D" in "McDonald's" ends up in lowercase.

INITCAP ('HAMBURGERS BY THE BILLIONS AT MCDONALDS') --> 'Hamburgers By The Billions At Mcdonalds'

For these reasons, use INITCAP with caution when printing reports or displaying data. The information it produces may not always be formatted correctly.
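One way to work around the surname problem is to post-process the INITCAP result. The following is only a minimal sketch, not a general solution: proper_name is a hypothetical helper (not an Oracle built-in), it handles only the "Mc" prefix, and it assumes words are separated by single spaces. It also uses INSTR and SUBSTR, which are described elsewhere in this chapter.

```sql
DECLARE
   -- proper_name is a hypothetical helper, not an Oracle built-in.
   -- It applies INITCAP and then re-capitalizes the letter that
   -- follows a word-initial "Mc".
   FUNCTION proper_name (name_in IN VARCHAR2) RETURN VARCHAR2
   IS
      result_str VARCHAR2(32767) := INITCAP (name_in);
      mc_pos     PLS_INTEGER := INSTR (result_str, 'Mc');
   BEGIN
      WHILE mc_pos > 0
      LOOP
         -- Treat "Mc" as a prefix only at the start of a word
         -- (assumption: words are separated by a single space).
         IF mc_pos = 1 OR SUBSTR (result_str, mc_pos - 1, 1) = ' '
         THEN
            result_str :=
                  SUBSTR (result_str, 1, mc_pos + 1)
               || UPPER (SUBSTR (result_str, mc_pos + 2, 1))
               || SUBSTR (result_str, mc_pos + 3);
         END IF;

         -- Look for the next "Mc" after the one just examined.
         mc_pos := INSTR (result_str, 'Mc', mc_pos + 2);
      END LOOP;

      RETURN result_str;
   END proper_name;
BEGIN
   -- Prints: Hamburgers By The Billions At McDonalds
   DBMS_OUTPUT.PUT_LINE
      (proper_name ('HAMBURGERS BY THE BILLIONS AT MCDONALDS'));
END;
```

A real application would need a longer list of exceptions (for example "Mac", "O'", or "van"), so treat this as an illustration of the limitation rather than a fix for it.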

INSTR, INSTRB, INSTRC, INSTR2, and INSTR4

The INSTR family of functions allows you to search a string to find a match for a substring. If the substring is found, the functions return the position, in the source string, of the first character of the substring. If there is no match, then the functions return 0.

The five INSTR functions differ only in terms of how they look at the string and substring:

INSTR

Strings consist of characters. The return value indicates the character position at which the substring is found.

INSTRB

Strings consist of bytes. The return value indicates the byte position at which the substring is found.

INSTRC

Strings consist of Unicode characters. Decomposed Unicode characters are recognized (e.g., a\0303 is recognized as being the same as \00E3 or ã).

INSTR2

Looks at strings in terms of Unicode code units.

INSTR4

Looks at strings in terms of Unicode code points.

All of the INSTR functions share the same specification:

FUNCTION INSTR
   (string1 IN VARCHAR2,
    string2 IN VARCHAR2
    [, start_position IN NUMBER := 1
    [, nth_appearance IN NUMBER := 1]])
RETURN NUMBER

where string1 is the string searched by INSTR for the position in which the nth_appearance of string2 is found. The start_position parameter is the character (not byte) position in the string where the search will start. It is optional and defaults to 1 (the beginning of string1). The nth_appearance parameter is also optional and defaults to 1.

Both the start_position and the nth_appearance parameters can be literals (like 5 or 157), variables, or complex expressions, as follows:

INSTR (company_name, 'INC', (last_location + 5) * 10)

If start_position is negative, then INSTR counts back start_position number of characters from the end of the string, and then searches from that point toward the beginning of the string for the nth match. Figure 8-2 illustrates the two directions in which INSTR searches, depending on whether the start_position parameter is positive or negative.

Figure 8-2. Forward and reverse searches with INSTR

We have found INSTR to be a very handy function, especially when used to the fullest extent possible. Most programmers make use of (and may only be aware of) only the first two parameters. Use INSTR to search from the end of the string? Search for the nth appearance as opposed to just the first appearance? "Wow!" many programmers would say, "I didn't know it could do that." Take the time to get familiar with INSTR and use all of its power.
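As a small illustration of the third and fourth parameters, the following sketch pulls pieces out of a file path. The path value and variable names are made up for the example, and the output is visible only if server output is enabled in your session (for instance, SET SERVEROUTPUT ON in SQL*Plus).

```sql
DECLARE
   -- Hypothetical sample value, used only for illustration.
   full_path    VARCHAR2(100) := '/u01/app/oracle/admin/init.ora';
   last_slash   PLS_INTEGER;
   second_slash PLS_INTEGER;
   third_slash  PLS_INTEGER;
BEGIN
   -- A negative start position searches backward from the end,
   -- so the first match found is the last slash in the string.
   last_slash := INSTR (full_path, '/', -1);
   DBMS_OUTPUT.PUT_LINE (SUBSTR (full_path, last_slash + 1));   -- init.ora

   -- nth_appearance picks out the second and third slashes when
   -- searching forward from position 1.
   second_slash := INSTR (full_path, '/', 1, 2);
   third_slash := INSTR (full_path, '/', 1, 3);
   DBMS_OUTPUT.PUT_LINE
      (SUBSTR (full_path,
               second_slash + 1,
               third_slash - second_slash - 1));                -- app
END;
```

In both cases the position returned is counted from the left end of the string, regardless of the direction of the search.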

In Oracle7, if nth_appearance is not positive (i.e., if it is 0 or negative), then INSTR always returns 1. In Oracle8, a value of 0 or a negative number for nth_appearance causes INSTR to raise the VALUE_ERROR exception.

Let's look at some examples of INSTR. In these examples, you will see all four parameters used in all their permutations. As you write your own programs, keep in mind the different ways in which INSTR can be used to extract information from a string; it can greatly simplify the code you write to parse and analyze character data.

• Find the first occurrence of "archie" in "bug-or-tv-character?archie":

INSTR ('bug-or-tv-character?archie', 'archie') --> 21

The starting position and the nth appearance both defaulted to 1.

• Find the last occurrence of "ar" in "bug-or-tv-character?archie".

INSTR ('bug-or-tv-character?archie', 'ar', -1) --> 21

Were you thinking that the answer might be 6? Remember that the character position returned by INSTR is always calculated from the leftmost character of the string as position 1. The easiest way to find the last of anything in a string is to specify a negative number for the starting position. I did not have to specify the nth appearance (leaving me with a default value of 1), because the last occurrence is also the first when searching backwards.

• Find the second-to-last occurrence of "a" in "bug-or-tv-character?archie":

INSTR ('bug-or-tv-character?archie', 'a', -1, 2) --> 15

No surprises here. Counting from the back of the string, INSTR passes over the "a" in archie because that is the last occurrence, and searches for the next occurrence. Again, the character position is counted from the leftmost character, not the rightmost character, in the string.

• Find the position of the letter "t" closest to (but not past) the question mark in the string "bug-or-tv-character?archie tophat":

search_string := 'bug-or-tv-character?archie tophat';
tee_loc :=
   INSTR (search_string, 't',
          -1 * (LENGTH (search_string) - INSTR (search_string, '?') + 1));

I dynamically calculate the location of the question mark (actually, the first question mark in the string; I assume that there is only one). Then I subtract that from the full length of the string, add 1 so that the search starts on the question mark itself, and multiply by -1 because a negative start position is counted from the end of the string. I then use that value to kick off the search for the closest prior "t".

This example is a good reminder that any of the parameters to INSTR can be complex expressions that call other functions or perform their own calculations. This fact is also highlighted in the next INSTR example.

• Use INSTR to confirm that a user entry is valid. In the following code, I check to see if the command selected by the user is found in the list of valid commands. If so, I execute that command:

IF INSTR ('|ADD|DELETE|CHANGE|VIEW|CALC|', '|' || cmd || '|') > 0
THEN
   execute_command (cmd);
ELSE
   DBMS_OUTPUT.PUT_LINE
      (' You entered an invalid command. Please try again...');
END IF;

In this case, I use the concatenation operator to construct the string that I will search for in the command list. I have to prepend and append a vertical bar (|) to the selected command because it is used as a delimiter in the command list. I also use the call to INSTR in a Boolean expression. If INSTR finds a match in the string, it returns a nonzero value; the Boolean expression therefore evaluates to TRUE, and I can go on with my processing. Otherwise, I display an error message.
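The examples above can be checked together in a single anonymous block. This is a sketch only: the variable names base_string and valid_cmds are introduced here for convenience, and the expected values in the comments assume the strings are entered exactly as shown.

```sql
DECLARE
   base_string   VARCHAR2(40) := 'bug-or-tv-character?archie';
   search_string VARCHAR2(40) := 'bug-or-tv-character?archie tophat';
   valid_cmds    VARCHAR2(40) := '|ADD|DELETE|CHANGE|VIEW|CALC|';
   tee_loc       PLS_INTEGER;
BEGIN
   -- First occurrence, last occurrence, second-to-last occurrence.
   DBMS_OUTPUT.PUT_LINE (INSTR (base_string, 'archie'));     -- 21
   DBMS_OUTPUT.PUT_LINE (INSTR (base_string, 'ar', -1));     -- 21
   DBMS_OUTPUT.PUT_LINE (INSTR (base_string, 'a', -1, 2));   -- 15

   -- LENGTH is 33 and the question mark is at position 20, so the
   -- start position is -14, which is position 20 counted from the left.
   tee_loc :=
      INSTR (search_string, 't',
             -1 * (LENGTH (search_string) - INSTR (search_string, '?') + 1));
   DBMS_OUTPUT.PUT_LINE (tee_loc);                           -- 17

   -- Why the delimiters matter: a bare partial command matches,
   -- but the delimited form does not.
   DBMS_OUTPUT.PUT_LINE (INSTR (valid_cmds, 'DEL'));                -- 6
   DBMS_OUTPUT.PUT_LINE (INSTR (valid_cmds, '|' || 'DEL' || '|'));  -- 0
END;
```

The last two lines show the reason for wrapping the command in vertical bars: without them, any fragment of a valid command (such as "DEL") would be accepted.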

The following code example, generated using Unicode UTF-8 as the database character set, illustrates the difference in semantics between INSTR and INSTRB, the two variations of INSTR that you are most likely to use:

DECLARE
   --The database character set for this example is Unicode UTF-8
   x CHAR(30 CHAR) := 'The character ã is two-bytes.';
BEGIN
   --Find the location of "is" in terms of characters
   DBMS_OUTPUT.PUT_LINE(INSTR(x,'is'));

   --Find the location of "is" in terms of bytes
   DBMS_OUTPUT.PUT_LINE(INSTRB(x,'is'));
END;

The output is:

17
18

The difference in the location of the word "is" comes about because the character ã is represented in Unicode UTF-8 using two bytes. Thus, while "is" is 17 characters into the string, it is at the same time 18 bytes into the string.

The INSTRC function is capable of recognizing decomposed characters. As described in the section on COMPOSE, an alternate representation of ã is a\0303. The following example demonstrates that INSTRC recognizes this alternate representation, whereas INSTR does not:

DECLARE
   --The database character set for this example is Unicode UTF-8
   x CHAR(40 CHAR) := UNISTR('The character a\0303 could be composed.');
BEGIN
   --INSTR won't see that a\0303 is the same as ã
   DBMS_OUTPUT.PUT_LINE(INSTR(x,'ã'));

   --INSTRC, however, will recognize that a\0303 = ã
   DBMS_OUTPUT.PUT_LINE(INSTRC(x,'ã'));
END;

The output is:

0
15

According to INSTR, the string x does not contain ã at all. INSTRC, on the other hand, recognizes that a\0303 is an alternate representation for the same character. The UNISTR function is used in the declaration of x to convert the ASCII string that we can read into a Unicode string for the example.

The INSTR2 and INSTR4 functions allow you to search for code units and code points respectively, which may not correspond to complete characters. For the following example, AL16UTF16 is used as the national language character set. The character ã is represented in the string x as two code points: one for "a", and the other (\0303) for the tilde that goes above the "a".

INSTRC and INSTR4 are then used to search for the location of the \0303 code point:

DECLARE
   --The national character set for this example is Unicode UTF-16
   x NCHAR(40) := UNISTR('The character a\0303 could be composed.');
BEGIN
   --Find the location of "\0303" using INSTRC
   DBMS_OUTPUT.PUT_LINE(INSTRC(x,UNISTR('\0303')));

   --Find the location of "\0303" using INSTR4
   DBMS_OUTPUT.PUT_LINE(INSTR4(x,UNISTR('\0303')));
END;

The output is:

0
16

The INSTRC function works with full characters, and is of no use when you need to search for a code point that does not represent a complete character. INSTR4, on the other hand, is able to locate the \0303 code point. The return value of 16 indicates that \0303 is the 16th code point in the string.

INSTR2 works like INSTR4, but searches in terms of UCS-2 code units rather than code points. Look at the following example:

DECLARE
   x NCHAR(40) := UNISTR('This is a \D834\DD1E test');
BEGIN
   DBMS_OUTPUT.PUT_LINE (INSTR2(x, UNISTR('\D834')));
   DBMS_OUTPUT.PUT_LINE (INSTR4(x, UNISTR('\D834')));
END;

The output is:

11
0

\D834\DD1E (the musical G clef) is a surrogate pair; each value in a surrogate pair is a code unit. Together, the two code units represent a single code point. This example shows how INSTR2 matches on just the high surrogate, whereas INSTR4 does not. That's because INSTR2 matches in terms of code units, whereas INSTR4 matches in terms of code points. Matching on just one code unit of a surrogate pair is roughly equivalent to searching and matching on the first byte of a multibyte character. Normally, you don't want to do this.
