Wednesday, January 02, 2008

Two Reasons To Use LineNumberReader Instead Of BufferedReader

Preamble

LineNumberReader extends BufferedReader, so a LineNumberReader "is a" BufferedReader. This "is a" distinction matters in this discussion because I spend a bit of time talking about BufferedReader. Because of the "is a" relationship (a.k.a. inheritance), anything BufferedReader can do, LineNumberReader can do too.

Reason Number One

LineNumberReader keeps track of line numbers. Duh!! right? But I had to say it. There would be no punchline if I didn't.
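
To make that concrete, here is a minimal sketch of the line tracking. The class and method names (LineNumberDemo, findLine) are made up for illustration and have nothing to do with the template code further down; only LineNumberReader, readLine and getLineNumber come from java.io.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the original post's code base.
class LineNumberDemo {

    /** Returns the 1-based number of the first line containing needle, or -1 if none does. */
    static int findLine(String text, String needle) {
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // getLineNumber() starts at 0 and goes up by one per line read,
                // so right after readLine() it is the 1-based number of that line.
                if (line.contains(needle)) {
                    return reader.getLineNumber();
                }
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Mixed terminators on purpose: "\n" after alpha, "\r\n" after beta.
        System.out.println(findLine("alpha\nbeta\r\ngamma\n", "beta")); // prints 2
    }
}
```

The counter lives in the reader, so the calling code never has to keep its own tally.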

Reason Number Two

LineNumberReader compresses each line terminator ('\n', '\r', or "\r\n") to a single newline character ('\n'). Now, "whoopty friggin doo!" is certainly a fair characterization of Reason Number Two, but bear with me.

I've observed over the years that many programmers use BufferedReader exclusively for its readLine method, because it allows the programmer to work with a line at a time instead of individual characters. But sometimes you need to work a character at a time. BufferedReader is great for this. It is built for efficient reading of text data from non-memory sources, like the filesystem or the network. The efficiency comes from the fact that individual calls to read do not map one-to-one to individual calls to the underlying source. This results in less physical I/O, which makes things run much faster, which is always a plus.
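
Here is a rough sketch of that "not one-to-one" point. CountingReader and BufferingDemo are hypothetical names invented for this illustration; the counter just records how often BufferedReader actually goes back to the wrapped reader, and a StringReader stands in for a real file or socket.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper: counts every read call that reaches the wrapped source.
class CountingReader extends FilterReader {
    int calls;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        calls++;
        return super.read(cbuf, off, len);
    }
}

// Hypothetical demo class.
class BufferingDemo {

    /** Reads length characters one at a time; returns {characters read, calls made to the source}. */
    static int[] readAll(int length) {
        char[] data = new char[length];
        Arrays.fill(data, 'x');
        CountingReader source = new CountingReader(new StringReader(new String(data)));
        int chars = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            // One read() per character on the BufferedReader side ...
            while (reader.read() != -1) {
                chars++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // ... but only a handful of bulk reads reach the source.
        return new int[] { chars, source.calls };
    }

    public static void main(String[] args) {
        int[] result = readAll(10_000);
        System.out.println(result[0] + " characters read, " + result[1] + " calls to the source");
    }
}
```

The exact number of calls depends on the buffer size, so the sketch only reports it rather than promising a particular figure.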

Now, it's important to realize that you don't actually need BufferedReader to read text efficiently. You can create your own buffer using a char[] and read directly into it. This cuts out the middle man, eliminating some object allocation, a bit of garbage collection, and some method calls along with the overhead that goes with them. But what BufferedReader does for you, and what you would otherwise have to do yourself, is detect the line terminators in the text. This detection is what makes readLine possible.

LineNumberReader kicks it up a notch in that the "end of line" detection doesn't just affect readLine, it also affects read. LineNumberReader's read method simplifies newline processing by returning a single newline character ('\n') for each line terminator, regardless of which kind it is ('\n', '\r', or "\r\n"). So let's say you are working with text files on/from a system that uses CR+LF ("\r\n") as its line terminator. Without LineNumberReader you would have to manually process the '\r' separately from the '\n'. With LineNumberReader you will only ever see the '\n', which simplifies the logic required to process the text being read.
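
A small side-by-side sketch shows the difference. TerminatorDemo and its methods are hypothetical names used only for this illustration; the same input is drained one character at a time through a plain BufferedReader and through a LineNumberReader, with '\r' and '\n' spelled out so they are visible.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class.
class TerminatorDemo {

    /** Reads everything via the single-character read() and makes CR and LF visible. */
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        try (Reader r = reader) {
            for (int c; (c = r.read()) != -1;) {
                if (c == '\r') {
                    out.append("\\r");
                } else if (c == '\n') {
                    out.append("\\n");
                } else {
                    out.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    static String viaBufferedReader(String text) {
        return drain(new BufferedReader(new StringReader(text)));
    }

    static String viaLineNumberReader(String text) {
        return drain(new LineNumberReader(new StringReader(text)));
    }

    public static void main(String[] args) {
        String text = "a\r\nb\rc\n";
        System.out.println(viaBufferedReader(text));   // prints a\r\nb\rc\n
        System.out.println(viaLineNumberReader(text)); // prints a\nb\nc\n
    }
}
```

Two things to keep in mind. Each terminator becomes one '\n', so blank lines are not merged away. And the compression applies to the single-character read(); the bulk read(char[], int, int) still counts lines but hands the terminators back unchanged.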

Wrap Up

The impetus for this entry started two nights ago when I needed to whip up a little template processing system. I was working with an existing code base and I noticed that there were four types of notification files/messages being used. At least 60% of the text across the four messages was the same, and about 90% of the text between [specific] pairs of messages was the same. So I set out to unify the four files into a single file, a template, from which the original four messages can be derived.

I'm posting the code that parses the master template because it showcases the benefit of having LineNumberReader handle "end of line" detection. If you are not used to doing this sort of text processing it may not be obvious how the code benefits from using LineNumberReader, so I'll spell it out for you. (1) LineNumberReader tracks line numbers. This time I'm not trying to be funny. The code explicitly throws one exception, and it uses the line number to make the error message more useful. (2) There is no newline "look ahead" or "look behind" code. Without LineNumberReader the code would need to explicitly handle CR+LF, which requires "look ahead" or "look behind" semantics.
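
For comparison, here is roughly what the "look behind" bookkeeping looks like when you do it by hand on top of a plain BufferedReader. This is only a sketch to illustrate point (2); ManualNewlines and its methods are hypothetical and are not taken from the template parser below.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class.
class ManualNewlines {

    /** Normalizes "\r", "\n" and "\r\n" to "\n" by hand, one character at a time. */
    static String normalizeByHand(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new BufferedReader(new StringReader(text))) {
            boolean previousWasCR = false; // the "look behind" state
            for (int c; (c = reader.read()) != -1;) {
                if (c == '\r') {
                    out.append('\n');
                    previousWasCR = true;
                    continue;
                }
                if (c == '\n' && previousWasCR) {
                    // Second half of a CR+LF pair: already emitted, so drop it.
                    previousWasCR = false;
                    continue;
                }
                previousWasCR = false;
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** The same result with LineNumberReader doing the bookkeeping. */
    static String normalizeWithLineNumberReader(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new LineNumberReader(new StringReader(text))) {
            for (int c; (c = reader.read()) != -1;) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a\r\nb\rc\n\r\n";
        System.out.println(normalizeByHand(text).equals(normalizeWithLineNumberReader(text))); // prints true
    }
}
```

The extra flag is not hard to write, but it is one more piece of state to thread through a parser that is already juggling its own.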

Code

/**
 * Creates a new "payment processor" specific template from the master template. 
 * 
 * @param pp The payment processor. 
 * 
 * @return A payment processor specific template. 
 * 
 * @throws IOException If there is a problem reading the master template. 
 */ 
private String getNotificationTemplate(PaymentProcessor pp) throws IOException 
{ 
    int token = UNDEF; 
    int state = UNDEF; 
    String tagname = null; 
    String ppname = pp.name(); 
    boolean matched = false; 
    StringBuilder tnb = new StringBuilder(); 
    StringBuilder sink = new StringBuilder(); 
    InputStream stream = getServletContext().getResourceAsStream("/WEB-INF/notification_template"); 
    try 
    { 
        LineNumberReader reader = new LineNumberReader(new InputStreamReader(stream, "UTF-8")); 
        for (int eof; -1 != (eof = reader.read());) 
        { 
            char c = (char)eof; 
            switch (c) 
            { 
                case '$': 
                    switch (token) 
                    { 
                        case UNDEF: 
                        case CONTENT: 
                            token = DOLLAR_SIGN; 
                            break; 
 
                        case START_TAG: 
                            token = END_TAG; 
                            break; 
                    } 
                    break; 
 
                case '{': 
                    switch (token) 
                    { 
                        case DOLLAR_SIGN: 
                            token = OPEN_BRACKET1; 
                            break; 
 
                        case OPEN_BRACKET1: 
                            token = OPEN_BRACKET2; 
                            break; 
 
                        default: 
                            token = UNDEF; 
                            break; 
                    } 
                    break; 
 
                case '}': 
                    switch (token) 
                    { 
                        case TAG_NAME: 
                            token = CLOSE_BRACKET1; 
                            break; 
 
                        case CLOSE_BRACKET1: 
                            token = START_TAG; 
                            break; 
 
                        default: 
                            token = UNDEF; 
                            break; 
                    } 
                    break; 
 
                case '\n':// No test for '\r' cause LineNumberReader compresses line terminators to a single '\n'. 
                    switch (token) 
                    { 
                        case START_TAG: 
                            token = state = CONTENT; 
                            matched = Arrays.asList(PIPE_REGX.split(tagname = tnb.toString())).contains(ppname); 
                            sink.setLength(sink.length() - tnb.length() - 5); 
                            continue; 
 
                        case END_TAG: 
                            if (tnb.toString().equals(tagname)) 
                            { 
                                if (matched) 
                                { 
                                    sink.setLength(sink.length() - tnb.length() - 6); 
                                    matched = false; 
                                } 
                                token = state = UNDEF; 
                            } 
                            else 
                            { 
                                String message = 
                                    "Illegal closing tag at line " + reader.getLineNumber() + ". Expected " + tagname + 
                                    " but found " + tnb + " instead."; 
                                throw new RuntimeException(message); 
                            } 
                            continue; 
                    } 
                    break; 
 
                default: 
                    if (OPEN_BRACKET2 == token) 
                    { 
                        token = TAG_NAME; 
                        tnb.setLength(0); 
                    } 
 
                    if (TAG_NAME == token) 
                        tnb.append(c); 
                    else if (CONTENT != token && CONTENT != state) 
                        token = UNDEF; 
                    break; 
            } 
 
            if (CONTENT != state || matched) 
                sink.append(c); 
        } 
    } 
    finally 
    { 
        stream.close(); 
    } 
    String template = sink.toString(); 
    switch (pp) 
    { 
        case PAYPAL: 
            paypalNotificationTemplate = template; 
            break; 
 
        case CREDITCARD: 
            ccNotificationTemplate = template; 
            break; 
    } 
    return template; 
}

1 comment:

drbcladd said...

Thank you very much for posting this. Interestingly, I just finished reading Harold's _Java I/O_; the discussion of LineNumberReader didn't talk about the modification of read.

When I work with students on language processors, they are surprised to find there is such a filter reader.