Search
Friday, July 9, 2010
Don't be so quick to blame ColdFusion
Not long ago my new boss and I were having a conversation and it came up that we might not be using ColdFusion much longer, but he’d get me trained on whatever I needed when they changed technologies.
Back story: The ColdFusion app that I’m maintaining now is an add-on to an asp.net application. It turns out the reason this add-on isn’t in .net is because the person they had available didn’t know or want to learn .net (I don’t blame him personally). On the home page of the .net app lives an iframe with a news scroller. The news is stored in a table and presented in a .cfm page. I’m sorry but I don’t see any reason why a single query and a few records can’t be displayed in .net, in a .net application. His aversion of .net was that strong.
When I asked why we were moving away from CF he said “Well, when we started this CF was the hottest thing, then not so much, now it is again, so who knows. CF also can’t handle the requirements. The system runs like shit!”
When I asked who said CF wasn’t the greatest thing anymore he said “the same guy that said it was great, the first developer.” I was pretty shocked. The same guy that didn’t want to learn enough .net to make a single page is now blaming CF for his lack of ability! Thanks to him, CF has a bad rap with my management and our customers!
The system did run like shit, fortunately for me one of my side projects is very similar to my day time job, making “charts and graphs 'n shit” as my boss puts it. I showed what I had done and how fast it works. I suggested it might not be CF, but the previous two developers. I asked him to give me the opportunity to fix the system before passing judgment on CF.
This week I put some proper error trapping in the system which logs the errors and a few other modifications. I saw a couple pages people were hitting often but were timing out. I made a couple changes to the query in question, removed the two queries nested inside the cfloop, removed # signs where they weren’t needed, removed some conditionals, replaced the several font tags with CSS, et cetera. The page loads in 1.3 seconds now. Using CSS also reduced the size of the .xls and html files by almost half (the xls files are html output with cfheader, I need to fix that too) so the page also downloads faster when 12,000 rows are returned (no, there’s no paging, customer requirement) from about 12 megs to 7.
Employers/managers: If your ColdFusion apps aren't running well, don't be so quick to pass judgement on ColdFusion. Look at your hardware, SQL, bandwidth and even your developers. If you're not a technical person, ask someone OUTSIDE of your team or higher a consultant to look at your situation.
I think I’ve saved ColdFusion’s reputation within my organization with these simple changes. My next hurdle is to save CF from our customers. Hopefully the proper error trapping and gradual fixing of the system will do this. Wish me luck!
Tuesday, July 6, 2010
MySQL Union and query optimization
UNION can be a pretty nice tool. It’s a good way to query several tables with the same data with a single call to the database server. The most common example includes adding results from archive table to your recent data. MYSQL ships with two flavors of Union; DISTINCT and ALL (assume this is the same with Oracle and MSSQL but I don’t use them enough to know).
Basic syntax:
(SELECT fields FROM table WHERE conditions)
UNION [DISTINCT | ALL]
(SELECT fields FROM table2 WHERE conditions)
[WHERE]
ORDER BY
[LIMIT]
UNION defaults to UNION DISTINCT implicitly. The difference between DISTINCT and ALL is pretty self explanatory; DISTINCT prevents duplicates just like in a normal SELECT statement. The problem with DISTINCT is that it’s very VERY costly. Both DISTINCT and ALL build a temp table to store the results; however, DISTINCT must index the results to avoid duplicates. In addition to that, from what I’ve read, applying ORDER BY and LIMIT to the entire UNION will cause MYSQL to ignore any indexes as well. I have the need to UNION up to eight tables. In my test I was working with a query I know that was giving me problems; it has six tables.
Table 1: ~25,000 records
Table 2: ~8,000 records
Table 3: ~300,000 records
Table 4: ~200,000 records
Table 5: ~820,000 records
Table 6: ~125,000 records
My where clause was pretty simple, just looking for all the records added by a single user. UNION DISTINCT caused my ColdFusion page to time out after 60 seconds. I could increase the timeout to get an accurate picture of how long it actually takes but I don’t feel like breaking anything right now (that’s what my 9-5 job is for). At first I was afraid it was because MYSQL ignores the index when ordering and limiting unions. My first thought was I needed to have to run each query individually then do a QofQ to re-order them.
In my particular situation I'm not going to have any duplicates because the data in each query is already unique to the other tables. In this case I can safely use UNION ALL without getting any duplicates with the added benefit of the speed UNION ALL gives me over UNION DISTINCT. How much faster is it? Considerably. My query went from timing out at 60 seconds to completing in 34 seconds. That’s pretty sweet, but not great.
My needs require me to limit the end result to 500 records. That’s fine for my end result, but what about each SELECT within the union? Believe it or not 99% of all the records in these six tables match my WHERE condition. That means my unions create a temp table of ~1,500,000 records of which I only need 500. No wonder UNION DISTINCT timed out; that’s a lot of indexing. Building such a large table when I only need a very small subset of that is just silly. Fortunately I can limit the results from the individual selects. Limiting each select to 500 now gives me a temp table with only 3,000 records, much better than 1.5M and growing! Unfortunately, to be sure I get the results I expect, I also need to include the same ORDER BY clause to each select that I have at the end of the entire union. Even with the overhead of ordering 1.5M records, the query now finishes in 21 seconds, a savings of 13 more seconds.
Most of the time the query only runs using tables 1 and 2, which are pretty small so it runs very fast. If anyone has any other ideas on how to make the larger unions run better, please comment below!
Subscribe to:
Posts (Atom)