Search

Sunday, June 13, 2010

ColdFusion likes lazy programmers



I don’t usually like to alias my code. Even when I write SQL I like to use the full field name, schema.table.column. I only use aliases when I’m in a sub query and need to reference the main query; then it’s required. I do the same thing in ColdFusion, I don't usually assign deep variables to a reference (no I don't really think it's lazy, just my preference).


Recently I was working with xml documents uploaded by users. For some reason my looping through my collection was timing out. My test document has 12,804 child elements from the root. At first I thought it was the processing I was doing within the loop. I kept removing sections of code trying to find out where the problem was. Finally I was left with a single cfoutput of a single element and my code was still timing out.


I’m not sure why I decided to try, because it isn't something I would normally do, but I assigned the parsedXML to a reference. To my surprise this solved my timeout problem. It solved it so well... my head exploded.


Here is a simple test to show the contrast.
The first loop uses the full variable syntax:
variables.parsedXML.XMLrootElement.XMLChildElement[variables.i].id.xmlText
The second loop uses the reference:
variables.XMLByRef[variables.i].id.xmlText
<cfsetting requesttimeout="960" /> 
<cfset variables.parsedXMLLen = arrayLen(variables.parsedXML.XMLrootElement.XMLChildElement) > 
<cfset variables.startXMLLongLoop = getTickCount() > 

#variables.parsedXML.XMLrootElement.XMLChildElement[variables.i].id.xmlText#


<cfset variables.stopXMLLongLoop = getTickCount() >

<cfset variables.XMLByRef = variables.parsedXML.XMLrootElement.XMLChildElement > <cfset variables.startXMLShortLoop = getTickCount() > #variables.XMLByRef[variables.i].id.xmlText# <cfset variables.stopXMLShortLoop = getTickCount() > Fully qualified: #variables.stopXMLLongLoop-variables.startXMLLongLoop# miliseconds to loop #variables.parsedXMLLen# times Referenced: #variables.stopXMLShortLoop-variables.startXMLShortLoop# miliseconds to loop #variables.parsedXMLLen# times
The result:
Fully qualified: 537765 miliseconds to loop 12804 times
Referenced: 250 miliseconds to loop 12804 times
The first loop using #variables.parsedXML.XMLrootElement.XMLChildElement[variables.i].id.xmlText# took almost 9 minutes to run!!
The second loop using #variables.XMLByRef[variables.i].id.xmlText# only took a quarter second??!! WTF just happened here? Anyone know why using a reference variable is faster than using the fully qualified variable?

Thanks!


4 comments:

  1. G'day:
    I think you're confusing "laziness" with "expedience" and "clarity". With your verbose approach you are not being "more thorough", you're writing unnecessarily cluttered code, which will be adding a maintenance overhead... from being harder to read.

    But anyway... your point remains an interesting one.

    Doing a quick test reveals that plain old structs also suffer from this... I converted your logic to be using nested structs / arrays, and there was a very measurable performance hit using the full variable path cf using a reference to the array.

    This makes sense, because CF does still need to do look-ups on the structs to fetch the contents of them, so doing this ONCE outside the loop is always going to be more expedient than repeating it over and over again inside the loop. So why make CF do a lot of unnecessary work?

    --
    Adam

    ReplyDelete
  2. Like I said, "no I don't really think it's lazy." I guess "clarity" depends on the programmer. Personally variables.parsedXML.XMLrootElement.XMLChildElement[variables.i].id.xmlText is easier for me to maintain because I don't have to remember where variables.XMLByRef[variables.i].id.xmlText came from. One isn't bad but reviewing code I haven't looked at in over a year or SQL someone else wrote, aliases and references annoy me.

    I can see your point that it has to look up each struct, and I wouldn't have been so surprised if the difference wasn't 9 MINUTES.

    ReplyDelete
  3. Hmm.. that's just a very basic optimization. Less dots (lookup or parsing operations) in a large loop will always perform better.

    ReplyDelete
  4. Yeah, I guess so. A direct access to
    variables.parsedXML.XMLrootElement.XMLChildElement[10000].id.xmlText averages about 50Miliseconds, about the same time as assigning the reference. (((50*12804)/1000)/60) =~ 10 minutes if I were to hard code each line. It just seems odd to me that using a simple pointer is 2000+ times faster than accessing the variable directly. I didn't realize structures were so inefficient.

    ReplyDelete