New OOTS products from CafePress
New OOTS t-shirts, ornaments, mugs, bags, and more
Results 1 to 11 of 11
  1. - Top - End - #1
    Ettin in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    There's lot of times when I need to update one data element in one table with corresponding values from another table.
    I'm using SAS, so I can do this a number of ways. The default in SAS is to use a data step to merge the two tables. It'd doable, but the syntax lacks finesse to just grab 1 variable.

    So I tend to prefer SQL, using an UPDATE statement.

    However, I've found that, when I'm trying to update a huge table with values from another at-least-big table, it takes prohibitively too long to run. I had one example before when it ran for an hour or more before I stopped it and programmed it in data-step code, but I think I'm running into it again.

    My question is: any advice on how to modify the SQL code to speed up efficiency, or is SQL just slow here?

    Example: I have a table with about 6 million records (called mydata). Another with about 600,000 records. I want to pull update one variable based on a shared ID number, for the year 2016.
    Code:
    proc sql;
        update mydata
               set myVar=(SELECT myVar FROM otherdata WHERE otherdata.IDNum = mydata.IDNum)
               where myVar='' AND year='2016' and flag=1;
    quit;
    Would creating an index on the ID number be fruitful? This is a many-to-one match, with mydata having the ID show up multiple times and the other data just having it once. I think I heard that indexing on unique IDs isn't usually helpful. I could sort both by the ID, but I don't think having it sorted helps much with SQL.

    I could break mydata into two tables, one for just 2016 and one with the rest of the data, then do the UPDATE, then combine the tables. I reckon the issue is that, for each 6 million records, it has to check each of the 600k records for a match. Hence doing 6 mil * 600k operations. (I'm not sure if the WHERE statement filters before or after the SET, but I think I read that SQL does all the work first and then filters out based on WHERE.)

  2. - Top - End - #2
    Surgebinder in the Playground Moderator
     
    Douglas's Avatar

    Join Date
    Aug 2005
    Location
    Mountain View, CA
    Gender
    Male

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    If otherdata.IDNum does not have an index already, then yes you should create one for it. That by itself will likely make it over 10,000 times faster.
    Like 4X (aka Civilization-like) gaming? Know programming? Interested in game development? Take a look.

    Avatar by Ceika.

    Archives:
    Spoiler
    Show
    Saberhagen's Twelve Swords, some homebrew artifacts for 3.5 (please comment)
    Isstinen Tonche for ECL 74 playtesting.
    Team Solars: Powergaming beyond your wildest imagining, without infinite loops or epic. Yes, the DM asked for it.
    Arcane Swordsage: Making it actually work (homebrew)

  3. - Top - End - #3
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Quote Originally Posted by JeenLeen View Post
    I think I heard that indexing on unique IDs isn't usually helpful.
    ...quite the opposite. A unique ID is the first thing you should have an index on, so the database engine can look at the index and get pointed directly to where the corresponding row is in the table (as opposed to scanning through all the rows in the table to look for matches). Like Douglas is saying, an index on otherdata.IDNum would negate a ton of overhead by itself.

    Quote Originally Posted by JeenLeen View Post
    (I'm not sure if the WHERE statement filters before or after the SET, but I think I read that SQL does all the work first and then filters out based on WHERE.)
    While that depends on how the database engine parsing the SQL statement performs it...it's almost certainly going to filter mydata by the WHERE clause first, because a cross-table subquery is bound to be expensive and will be more expensive the more rows it has to query; it's an "ounce of prevention vs pound of cure" scenario.
    Feytouched Banana eldritch disciple avatar by...me!

    The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas

  4. - Top - End - #4
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Not familiar with SAS Data Step, but if this were Microsoft SQL Server you could do the update as a join rather than as a correlated subquery:

    UPDATE mydata SET myVar = otherdata.myVar
    FROM mydata
    LEFT OUTER JOIN otherdata ON mydata.IDNum = otherdata.IDNum
    where myVar='' AND year=2016 and flag=1;

    No idea if that would be faster in this instance or would even work in the variant of SQL you're using, but worth a try, I think.

    (Also, why are you using "year = '2016'" in your WHERE statement? If year is an integer, which seems likely, then you're just introducing a string conversion for no good reason).
    Last edited by factotum; 2018-12-13 at 12:12 AM.

  5. - Top - End - #5
    Troll in the Playground
     
    Flumph

    Join Date
    Nov 2006
    Location
    England. Ish.
    Gender
    Male

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    In addition to the above...

    Depending on how technical you wish to be, most database systems can tell you how they are actually running the query. Oracle has "explain plan" , for example. If you can get at that information it may show you what you need to fix the query.
    Last edited by Manga Shoggoth; 2018-12-13 at 04:54 AM. Reason: removed irrelevant bit...

  6. - Top - End - #6
    Ettin in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Next time I have to do this, I'll do the index thing. Thanks! I probably misread something back when studying index stuff.

    EDIT: I just found out I have to re-run the data because the guy who gave me the input needs to regenerate it. So that means I get to test the index idea now. Yay.

    As for Year="2016", Year is stored as a character string, so that's why that's how it is.
    Last edited by JeenLeen; 2018-12-13 at 10:32 AM.

  7. - Top - End - #7
    Ettin in the Playground
     
    Kobold

    Join Date
    May 2009

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Quote Originally Posted by factotum View Post
    No idea if that would be faster in this instance or would even work in the variant of SQL you're using, but worth a try, I think.
    That should work in any SQL variant, I think, it's all standard functions. And joining is nearly always dramatically faster than a subquery. Subqueries are always slow, and should only be used in the last resort.
    "None of us likes to be hated, none of us likes to be shunned. A natural result of these conditions is, that we consciously or unconsciously pay more attention to tuning our opinions to our neighbor’s pitch and preserving his approval than we do to examining the opinions searchingly and seeing to it that they are right and sound." - Mark Twain

  8. - Top - End - #8
    Ettin in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Quote Originally Posted by veti View Post
    That should work in any SQL variant, I think, it's all standard functions. And joining is nearly always dramatically faster than a subquery. Subqueries are always slow, and should only be used in the last resort.
    That's disappointing. I could easily do a DATA step merge or a SQL join like you mention, but what I liked about the subquery was its simplicity when typing the code. That is, it seemed really eloquent in being short and easy to comprehend what the code is supposed to do. I feel annoyed to use a more complicated syntax.

    I guess I'm just being stubborn since I think my code's pretty, even if it's a nightmare on the 'what the computer actually does' end.

    Though I do really appreciate this advice. It's good to realize the why behind why the code wasn't running well and to have a couple other alternatives.

  9. - Top - End - #9
    Titan in the Playground
     
    Grey_Wolf_c's Avatar

    Join Date
    Aug 2007

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Quote Originally Posted by JeenLeen View Post
    I guess I'm just being stubborn since I think my code's pretty, even if it's a nightmare on the 'what the computer actually does' end.
    You are not. Code clarity is as important as code efficiency - sometime more. Computers continue to get faster, making inefficient code have less of an impact. But code no-one can read remains code no-one can read no matter how much faster it becomes with each upgrade.

    That said, in this case, your performance is getting enough of a hit that you probably should go with the less clean code. Besides, to a DB person, the update-with-join is probably as clean as the update-with-subclause.

    Grey Wolf
    Interested in MitD? Join us in MitD's thread.
    There is a world of imagination
    Deep in the corners of your mind
    Where reality is an intruder
    And myth and legend thrive
    Quote Originally Posted by The Giant View Post
    But really, the important lesson here is this: Rather than making assumptions that don't fit with the text and then complaining about the text being wrong, why not just choose different assumptions that DO fit with the text?
    Ceterum autem censeo Hilgya malefica est

  10. - Top - End - #10
    Troll in the Playground
     
    Flumph

    Join Date
    Nov 2006
    Location
    England. Ish.
    Gender
    Male

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Quote Originally Posted by JeenLeen View Post
    That's disappointing. I could easily do a DATA step merge or a SQL join like you mention, but what I liked about the subquery was its simplicity when typing the code. That is, it seemed really eloquent in being short and easy to comprehend what the code is supposed to do. I feel annoyed to use a more complicated syntax.

    I guess I'm just being stubborn since I think my code's pretty, even if it's a nightmare on the 'what the computer actually does' end.

    Though I do really appreciate this advice. It's good to realize the why behind why the code wasn't running well and to have a couple other alternatives.
    Depending on the underlying DBMS, what should happen is that the query optimiser looks at the query and works out the best way of running it. If the index exists and is usable then the optimiser should promote the subquery to a join on the fly. You shouldn't have to rewrite the query, just make sure it has the indexes it needs.

    If you have access to the underlying database (or the ear of a friendly Database Administrator) it is worth getting the query plan to see how the query is actually being run - that will usually point you to the best way of improving the query performance.
    Warning: This posting may contain wit, wisdom, pathos, irony, satire, sarcasm and puns. And traces of nut.

    "The main skill of a good ruler seems to be not preventing the conflagrations but rather keeping them contained enough they rate more as campfires." Rogar Demonblud

    "Hold on just a d*** second. UK has spam callers that try to get you to buy conservatories?!? Even y'alls spammers are higher class than ours!" Peelee

  11. - Top - End - #11
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: slowness of SQL UPDATE statements (vs SAS Data Step but any SQL advice welcome)

    Quote Originally Posted by Manga Shoggoth View Post
    Depending on the underlying DBMS, what should happen is that the query optimiser looks at the query and works out the best way of running it. If the index exists and is usable then the optimiser should promote the subquery to a join on the fly.
    It takes time and data about the underlying tables for the query analyzer to do its job, though, and you don't really have control over either factor unless you're the actual DBA looking after the SQL server. Therefore, I think it never hurts to write your query in the most optimal way that you can in order to give the query analyzer less to do in finding the optimal way to run it.

    (Oh, and I'd note that actually find the JOIN method more easy to grok than the subquery--I really don't like subqueries and always get confused over their syntax).

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •