Results 1 to 11 of 11
-
2018-12-12, 03:21 PM (ISO 8601)
- Join Date
- Jan 2009
slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
There's lot of times when I need to update one data element in one table with corresponding values from another table.
I'm using SAS, so I can do this a number of ways. The default in SAS is to use a data step to merge the two tables. It'd doable, but the syntax lacks finesse to just grab 1 variable.
So I tend to prefer SQL, using an UPDATE statement.
However, I've found that, when I'm trying to update a huge table with values from another at-least-big table, it takes prohibitively too long to run. I had one example before when it ran for an hour or more before I stopped it and programmed it in data-step code, but I think I'm running into it again.
My question is: any advice on how to modify the SQL code to speed up efficiency, or is SQL just slow here?
Example: I have a table with about 6 million records (called mydata). Another with about 600,000 records. I want to pull update one variable based on a shared ID number, for the year 2016.
Code:proc sql; update mydata set myVar=(SELECT myVar FROM otherdata WHERE otherdata.IDNum = mydata.IDNum) where myVar='' AND year='2016' and flag=1; quit;
I could break mydata into two tables, one for just 2016 and one with the rest of the data, then do the UPDATE, then combine the tables. I reckon the issue is that, for each 6 million records, it has to check each of the 600k records for a match. Hence doing 6 mil * 600k operations. (I'm not sure if the WHERE statement filters before or after the SET, but I think I read that SQL does all the work first and then filters out based on WHERE.)
-
2018-12-12, 03:48 PM (ISO 8601)
- Join Date
- Aug 2005
- Location
- Mountain View, CA
- Gender
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
If otherdata.IDNum does not have an index already, then yes you should create one for it. That by itself will likely make it over 10,000 times faster.
Like 4X (aka Civilization-like) gaming? Know programming? Interested in game development? Take a look.
Avatar by Ceika.
Archives:
SpoilerSaberhagen's Twelve Swords, some homebrew artifacts for 3.5 (please comment)
Isstinen Tonche for ECL 74 playtesting.
Team Solars: Powergaming beyond your wildest imagining, without infinite loops or epic. Yes, the DM asked for it.
Arcane Swordsage: Making it actually work (homebrew)
-
2018-12-12, 04:17 PM (ISO 8601)
- Join Date
- Mar 2007
- Location
- Oregon, USA
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
...quite the opposite. A unique ID is the first thing you should have an index on, so the database engine can look at the index and get pointed directly to where the corresponding row is in the table (as opposed to scanning through all the rows in the table to look for matches). Like Douglas is saying, an index on otherdata.IDNum would negate a ton of overhead by itself.
While that depends on how the database engine parsing the SQL statement performs it...it's almost certainly going to filter mydata by the WHERE clause first, because a cross-table subquery is bound to be expensive and will be more expensive the more rows it has to query; it's an "ounce of prevention vs pound of cure" scenario.FeytouchedBanana eldritch disciple avatar by...me!
The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas
-
2018-12-13, 12:11 AM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
Not familiar with SAS Data Step, but if this were Microsoft SQL Server you could do the update as a join rather than as a correlated subquery:
UPDATE mydata SET myVar = otherdata.myVar
FROM mydata
LEFT OUTER JOIN otherdata ON mydata.IDNum = otherdata.IDNum
where myVar='' AND year=2016 and flag=1;
No idea if that would be faster in this instance or would even work in the variant of SQL you're using, but worth a try, I think.
(Also, why are you using "year = '2016'" in your WHERE statement? If year is an integer, which seems likely, then you're just introducing a string conversion for no good reason).Last edited by factotum; 2018-12-13 at 12:12 AM.
-
2018-12-13, 04:53 AM (ISO 8601)
- Join Date
- Nov 2006
- Location
- England. Ish.
- Gender
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
In addition to the above...
Depending on how technical you wish to be, most database systems can tell you how they are actually running the query. Oracle has "explain plan" , for example. If you can get at that information it may show you what you need to fix the query.Last edited by Manga Shoggoth; 2018-12-13 at 04:54 AM. Reason: removed irrelevant bit...
-
2018-12-13, 10:13 AM (ISO 8601)
- Join Date
- Jan 2009
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
Next time I have to do this, I'll do the index thing. Thanks! I probably misread something back when studying index stuff.
EDIT: I just found out I have to re-run the data because the guy who gave me the input needs to regenerate it. So that means I get to test the index idea now. Yay.
As for Year="2016", Year is stored as a character string, so that's why that's how it is.Last edited by JeenLeen; 2018-12-13 at 10:32 AM.
-
2018-12-16, 12:49 AM (ISO 8601)
- Join Date
- May 2009
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
"None of us likes to be hated, none of us likes to be shunned. A natural result of these conditions is, that we consciously or unconsciously pay more attention to tuning our opinions to our neighbor’s pitch and preserving his approval than we do to examining the opinions searchingly and seeing to it that they are right and sound." - Mark Twain
-
2018-12-17, 09:34 AM (ISO 8601)
- Join Date
- Jan 2009
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
That's disappointing. I could easily do a DATA step merge or a SQL join like you mention, but what I liked about the subquery was its simplicity when typing the code. That is, it seemed really eloquent in being short and easy to comprehend what the code is supposed to do. I feel annoyed to use a more complicated syntax.
I guess I'm just being stubborn since I think my code's pretty, even if it's a nightmare on the 'what the computer actually does' end.
Though I do really appreciate this advice. It's good to realize the why behind why the code wasn't running well and to have a couple other alternatives.
-
2018-12-17, 11:55 AM (ISO 8601)
- Join Date
- Aug 2007
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
You are not. Code clarity is as important as code efficiency - sometime more. Computers continue to get faster, making inefficient code have less of an impact. But code no-one can read remains code no-one can read no matter how much faster it becomes with each upgrade.
That said, in this case, your performance is getting enough of a hit that you probably should go with the less clean code. Besides, to a DB person, the update-with-join is probably as clean as the update-with-subclause.
Grey WolfInterested in MitD? Join us in MitD's thread.There is a world of imagination
Deep in the corners of your mind
Where reality is an intruder
And myth and legend thrive
Ceterum autem censeo Hilgya malefica est
-
2018-12-17, 01:13 PM (ISO 8601)
- Join Date
- Nov 2006
- Location
- England. Ish.
- Gender
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
Depending on the underlying DBMS, what should happen is that the query optimiser looks at the query and works out the best way of running it. If the index exists and is usable then the optimiser should promote the subquery to a join on the fly. You shouldn't have to rewrite the query, just make sure it has the indexes it needs.
If you have access to the underlying database (or the ear of a friendly Database Administrator) it is worth getting the query plan to see how the query is actually being run - that will usually point you to the best way of improving the query performance.Warning: This posting may contain wit, wisdom, pathos, irony, satire, sarcasm and puns. And traces of nut.
"The main skill of a good ruler seems to be not preventing the conflagrations but rather keeping them contained enough they rate more as campfires." Rogar Demonblud
"Hold on just a d*** second. UK has spam callers that try to get you to buy conservatories?!? Even y'alls spammers are higher class than ours!" Peelee
-
2018-12-17, 01:50 PM (ISO 8601)
- Join Date
- Feb 2007
- Location
- Manchester, UK
- Gender
Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)
It takes time and data about the underlying tables for the query analyzer to do its job, though, and you don't really have control over either factor unless you're the actual DBA looking after the SQL server. Therefore, I think it never hurts to write your query in the most optimal way that you can in order to give the query analyzer less to do in finding the optimal way to run it.
(Oh, and I'd note that actually find the JOIN method more easy to grok than the subquery--I really don't like subqueries and always get confused over their syntax).