  1. - Top - End - #1
    Ettin in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default minor programming rant

    In the spirit of camaraderie and of sharing small bugs that cause big problems, here's something that just happened.

    I'm updating a program from about 6 months ago, as the reporting format changed and they added a couple reports. Nothing too bad, but annoying in that I have to figure out what I did before and at least one change is rather big so I have to undo some data cleaning and then recode it a different way.
    I just made one "minor" change to the code to try to remove pre-kindergarten students, and about half of my output (the part not related to Grade level) vanished.
    When I started this post, I was kicking myself for not saving a backup copy of my program before fiddling with it, since I had no clue how I lost half my output from some minor changes.

    But then it hit me, and it turned out to be an easy fix. Grade is stored numerically, where 0 = Kindergarten and negatives are pre-k. I added a line
    If Grade < 0 then delete;
    when I needed to write
    If Grade < 0 and Grade NE . then delete;

    The half of my output that was deleted were statistics not related to Grade, so the Grade column was null. And null is less than 0 according to the programming language's logic.
    I feel quite relieved, and hope this will teach me to save versions of my program better.
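
    The bug above can be sketched in Python. This is only an emulation, assuming (as the post describes) a language in which a missing numeric compares as smaller than any real number; `MISSING`, `lt`, and the sample rows are hypothetical names and data made up for illustration.

```python
# Emulation only: models a language where a missing numeric sorts below
# every real number. MISSING, lt() and the rows are hypothetical.
MISSING = None


def lt(value, bound):
    """Less-than in which MISSING is treated as smaller than any number."""
    if value is MISSING:
        return True
    return value < bound


rows = [
    {"stat": "enrollment", "grade": -1},       # pre-K
    {"stat": "enrollment", "grade": 0},        # kindergarten
    {"stat": "enrollment", "grade": 3},
    {"stat": "attendance", "grade": MISSING},  # statistic not tied to a grade
]

# Buggy filter ("if Grade < 0 then delete"): the MISSING row is deleted too.
buggy = [r for r in rows if not lt(r["grade"], 0)]

# Fixed filter: delete only when the grade is present AND negative.
fixed = [r for r in rows
         if not (r["grade"] is not MISSING and lt(r["grade"], 0))]

print(len(buggy), len(fixed))  # 2 3
```

    The buggy filter keeps two rows and silently drops the grade-less statistic; the fixed filter keeps three.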

  2. - Top - End - #2
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by JeenLeen View Post
    The half of my output that was deleted were statistics not related to Grade, so the Grade column was null. And null is less than 0 according to the programming language's logic.
    Weird logic in whatever language you're using, that--in SQL server, at any rate, NULL is explicitly not equivalent to *any* value; this is why you can't find all the NULL values in a list using something like SELECT * FROM list WHERE column = NULL, because the "column = NULL" will never be true even if the value in the column is actually NULL. (You would have to use IS NULL instead of the direct equals statement).
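
    A small check of this point, using Python's built-in sqlite3 module rather than SQL Server (SQLite follows the same rule for `= NULL`). The table and column names are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE list (id INTEGER, col INTEGER)")
conn.executemany("INSERT INTO list VALUES (?, ?)",
                 [(1, 10), (2, None), (3, None)])

# "col = NULL" is never true, so no rows match, even the NULL ones.
eq_null = conn.execute(
    "SELECT id FROM list WHERE col = NULL").fetchall()

# IS NULL is the construct that actually finds them.
is_null = conn.execute(
    "SELECT id FROM list WHERE col IS NULL ORDER BY id").fetchall()

print(eq_null)  # []
print(is_null)  # [(2,), (3,)]
```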

  3. - Top - End - #3
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: minor programming rant

    Quote Originally Posted by factotum View Post
    Weird logic in whatever language you're using, that--in SQL server, at any rate, NULL is explicitly not equivalent to *any* value; this is why you can't find all the NULL values in a list using something like SELECT * FROM list WHERE column = NULL, because the "column = NULL" will never be true even if the value in the column is actually NULL. (You would have to use IS NULL instead of the direct equals statement).
    Right. NULL doesn't represent no value, it represents an unknown value. "NULL = NULL" is itself NULL, because it's unknown whether two unknown values are equal to each other.

    "NULL < 0" would be NULL for the same reason (it's unknown whether an unknown value is less than 0)...so it certainly seems weird for an "if" to act on an unknown value like that.
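
    A quick way to see the three-valued logic, sketched with Python's sqlite3 module (an assumption of convenience; any SQL engine should agree on these expressions). Python shows SQL NULL as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparisons involving NULL evaluate to NULL, not to true or false.
row = conn.execute("SELECT NULL = NULL, NULL < 0, NULL IS NULL").fetchone()
print(row)  # (None, None, 1)

# A conditional only takes its THEN branch when the test is true,
# so an unknown result falls through to ELSE.
branch = conn.execute(
    "SELECT CASE WHEN NULL < 0 THEN 'then' ELSE 'else' END").fetchone()[0]
print(branch)  # else
```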
    Feytouched Banana eldritch disciple avatar by...me!

    The Index of the Giant's Comments VI―Making Dogma from Zapped Bananas

  4. - Top - End - #4
    Ettin in the Playground
     
    BardGuy

    Join Date
    Jan 2009

    Default Re: minor programming rant

    The language is SAS, and I think that's just how SAS orders numerics for both sorting and comparison purposes. Null/missing is less than any actual number, hence less than 0 for logical operators.

    I started to think of some way it could be a strange logic resolution, like "Grade < 0" being unknown when Grade is null and "if unknown" then somehow passing as true, but I'm pretty sure null defaults to False.
    I just tested something like
    if . then foo=1;
    and foo was not set to 1.
    So I'm thinking it's just the ordering logic in that language.

  5. - Top - End - #5
    Troll in the Playground
    Join Date
    Jan 2007

    Default Re: minor programming rant

    This nicely explains why some languages have three possible outcomes of IF statements: true, false, and no clue.
    In a war it doesn't matter who's right, only who's left.

  6. - Top - End - #6
    Troll in the Playground
     
    Imp

    Join Date
    Jul 2008
    Location
    Sweden
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by Jasdoif View Post
    Right. NULL doesn't represent no value, it represents an unknown value. "NULL = NULL" is itself NULL, because it's unknown whether two unknown values are equal to each other.

    "NULL < 0" would be NULL for the same reason (it's unknown whether an unknown value is less than 0)...so it certainly seems weird for an "if" to act on an unknown value like that.
    When I first read this I was like "WHAT??? Since when is null not no value", I set out to find out if you're right and turns out the answer is also null.
    http://www.dbta.com/Columns/DBA-Corn...ll-102619.aspx I got this site and it says like you do "A null represents missing or unknown information at the column level". Always thought null meant "no value", not "unknown/missing"
    Black text is for sarcasm, also sincerity. You'll just have to read between the lines and infer from context like an animal

  7. - Top - End - #7
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: minor programming rant

    There are already perfectly good ways to represent no value--e.g. 0 for a number or an empty string for a text field. There wouldn't be much point in NULL if it did the same thing.

  8. - Top - End - #8
    Surgebinder in the Playground Moderator
     
    Douglas's Avatar

    Join Date
    Aug 2005
    Location
    Mountain View, CA
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by Mastikator View Post
    When I first read this I was like "WHAT??? Since when is null not no value", I set out to find out if you're right and turns out the answer is also null.
    http://www.dbta.com/Columns/DBA-Corn...ll-102619.aspx I got this site and it says like you do "A null represents missing or unknown information at the column level". Always thought null meant "no value", not "unknown/missing"
    "Unknown/missing" is a concept that clearly has many use cases where it needs to be represented in some way, and what other way is there than null? Same with "uninitialized", and of course "no value". This is part of the big problem with null as a concept in programming - there are at least three different concepts it can represent, maybe more I haven't thought of, and few languages have any way to indicate which one of them any particular null is supposed to mean.

    Quote Originally Posted by factotum View Post
    There are already perfectly good ways to represent no value--e.g. 0 for a number or an empty string for a text field. There wouldn't be much point in NULL if it did the same thing.
    Those are terrible ways to represent "no value", because they are in fact actual values too.
    Last edited by Douglas; 2019-09-15 at 01:10 PM.
    Like 4X (aka Civilization-like) gaming? Know programming? Interested in game development? Take a look.

    Avatar by Ceika.

    Archives:
    Saberhagen's Twelve Swords, some homebrew artifacts for 3.5 (please comment)
    Isstinen Tonche for ECL 74 playtesting.
    Team Solars: Powergaming beyond your wildest imagining, without infinite loops or epic. Yes, the DM asked for it.
    Arcane Swordsage: Making it actually work (homebrew)

  9. - Top - End - #9
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by Douglas View Post
    Those are terrible ways to represent "no value", because they are in fact actual values too.
    Actual values which are either (a) unlikely to be meaningful in a real-world application, so are safe to use for this application or (b) will work the same as having "no value".

  10. - Top - End - #10
    Troll in the Playground
    Join Date
    Jan 2007

    Default Re: minor programming rant

    Quote Originally Posted by factotum View Post
    Actual values which are either (a) unlikely to be meaningful in a real-world application, so are safe to use for this application or (b) will work the same as having "no value".
    I beg to differ: 0 is a valid data value in many cases and is vastly different than having no value at all. Being able to distinguish knowledge of there being nothing from lack of knowledge about something is important. If you put 0 as an unknown value, then you can never have 0 as a legitimate value, since you could not distinguish between two of its uses otherwise. The same goes for an empty string.

    Languages geared for numerical calculations even go further catching infinities and "not a number" results with each having its own designation.
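
    A minimal Python sketch of the sentinel problem, using made-up readings: once 0 doubles as "unknown", a genuine 0 can no longer be told apart from a missing measurement, and the average shifts. The last line shows the separate infinity and not-a-number designations that floating-point arithmetic provides.

```python
import math
from statistics import mean

# Hypothetical readings: one genuine 0.0 and one value that was never measured.
with_sentinel = [4.0, 0.0, 0.0, 8.0]   # which 0.0 is the unknown one? Can't tell.
with_none = [4.0, 0.0, None, 8.0]      # the gap is explicit

known = [x for x in with_none if x is not None]

print(mean(with_sentinel))  # 3.0
print(mean(known))          # 4.0

# Infinity and "not a number" are distinct, detectable results.
inf = float("inf")
print(math.isinf(inf), math.isnan(inf - inf))  # True True
```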

  11. - Top - End - #11
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: minor programming rant

    Quote Originally Posted by Mastikator View Post
    When I first read this I was like "WHAT??? Since when is null not no value", I set out to find out if you're right and turns out the answer is also null.
    http://www.dbta.com/Columns/DBA-Corn...ll-102619.aspx I got this site and it says like you do "A null represents missing or unknown information at the column level". Always thought null meant "no value", not "unknown/missing"
    Yeah, I thought the same thing for years, largely because all the applications I worked with treated it as "no value"...but it turned out that in SQL itself, NULL means "unknown value". Which explained a lot of otherwise bizarre things like why "NULL = NULL" wasn't true, and why IS NULL constructs existed.

    Quote Originally Posted by Douglas View Post
    Quote Originally Posted by factotum View Post
    There are already perfectly good ways to represent no value--e.g. 0 for a number or an empty string for a text field. There wouldn't be much point in NULL if it did the same thing.
    Those are terrible ways to represent "no value", because they are in fact actual values too.
    The closest thing to that I remember is integer fields where only 0 and positive values made sense, so negative values were used to signify special cases (like weapon ranges in Red Alert 2, "-2" meant "skip the distance check").

  12. - Top - End - #12
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by Radar View Post
    If you put 0 as an unknown value, then you can never have 0 as a legitimate value, since you could not distinguish between two of its uses otherwise. The same goes for an empty string.
    But we've already established that NULL is the unknown value. What we're discussing above is *no* value, which is not the same thing.

  13. - Top - End - #13
    Ettin in the Playground
     
    Kobold

    Join Date
    May 2009

    Default Re: minor programming rant

    Certain virtually-prehistoric languages (read: BASIC, back in the day) would initialise a new variable as zero (or '', for a string) as soon as it was named. (So for instance, "IF A = 0" would return true, if A had never previously been mentioned.)

    SQL has very definite ideas about NULL. I'm surprised to learn of a language where NULL < 0, but then it takes all sorts to make a world. (Even some who haven't grasped the idea of version control, apparently...)
    "None of us likes to be hated, none of us likes to be shunned. A natural result of these conditions is, that we consciously or unconsciously pay more attention to tuning our opinions to our neighbor’s pitch and preserving his approval than we do to examining the opinions searchingly and seeing to it that they are right and sound." - Mark Twain

  14. - Top - End - #14
    Ogre in the Playground
     
    ElfPirate

    Join Date
    Aug 2013

    Default Re: minor programming rant

    Quote Originally Posted by JeenLeen View Post
    I feel quite relieved, and hope this will teach me to save versions of my program better.
    S'yeah. You can hope.


    Must admit I immediately thought you were doing something in Excel because that also has interesting ideas about values sometimes.
    Last edited by snowblizz; 2019-09-16 at 05:16 AM.

  15. - Top - End - #15
    Troll in the Playground
     
    Imp

    Join Date
    Jul 2008
    Location
    Sweden
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by factotum View Post
    But we've already established that NULL is the unknown value. What we're discussing above is *no* value, which is not the same thing.
    I think "no value" goes under the umbrella of missing value in this case. If your database has a lot of NULL then you either have garbage data or worse, garbage database design.

  16. - Top - End - #16
    Orc in the Playground
     
    MonkGuy

    Join Date
    Dec 2006

    Default Re: minor programming rant

    Quote Originally Posted by veti View Post
    I'm surprised to learn of a language where NULL < 0, but then it takes all sorts to make a world.
    The closest analogue that I'm familiar with would be NaN (not a number) in C, in which every comparison to it except != is false. So (x <= 0.0) and (x >= 0.0) would both be false for NaN. I could envision a language where for some reason they thought it made more sense to have the condition always evaluate to true instead of always evaluate to false.
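
    The same IEEE 754 behaviour can be seen from Python, whose floats follow the same comparison rules as C doubles; this sketch just prints the comparisons for a NaN.

```python
import math

x = float("nan")

# Every ordered comparison and equality test with NaN is false;
# only "not equal" is true, even against itself.
results = (x <= 0.0, x >= 0.0, x == x, x != x)
print(results)        # (False, False, False, True)

# The reliable test is a dedicated predicate, much like IS NULL in SQL.
print(math.isnan(x))  # True
```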

  17. - Top - End - #17
    Ettin in the Playground
     
    Kobold

    Join Date
    May 2009

    Default Re: minor programming rant

    Come to think of it, there's an inconsistency in the SQL treatment. If you select MAX from a column that contains nulls, the nulls are ignored and you get the highest non-null value.

    But according to above logic, you should get NULL - because you can't know if one of the unknown values may be higher than the current maximum.

  18. - Top - End - #18
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: minor programming rant

    Quote Originally Posted by veti View Post
    Come to think of it, there's an inconsistency in the SQL treatment. If you select MAX from a column that contains nulls, the nulls are ignored and you get the highest non-null value.

    But according to above logic, you should get NULL - because you can't know if one of the unknown values may be higher than the current maximum.
    If the SQL standard didn't specify that NULLs are to be excluded before determining the maximum, that is indeed what should happen. (If there's nothing but NULLs, the result is then NULL.)
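
    Both cases can be checked with a short sketch using Python's sqlite3 module; the table `t` and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (None,), (7,), (None,)])

# NULLs are dropped before the maximum is taken.
max_mixed = conn.execute("SELECT MAX(v) FROM t").fetchone()[0]

# With nothing but NULLs to aggregate, the result is NULL (None in Python).
max_all_null = conn.execute(
    "SELECT MAX(v) FROM t WHERE v IS NULL").fetchone()[0]

print(max_mixed)     # 7
print(max_all_null)  # None
```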

  19. - Top - End - #19
    Ettin in the Playground
     
    Kobold

    Join Date
    May 2009

    Default Re: minor programming rant

    Quote Originally Posted by Jasdoif View Post
    If the SQL standard didn't specify that NULLs are to be excluded before determining the maximum, that is indeed what should happen. (If there's nothing but NULLs, the result is then NULL.)
    But, it's the rules set out in the SQL standard that define the meaning of "NULL" in SQL. And in this instance at least, they are clearly not treating it as "unknown", but more like "no value".

  20. - Top - End - #20
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: minor programming rant

    An example of 'no value': you have a dataset of molecules, and one derived quantity is the fraction of double bonds to single bonds on nitrogen atoms in that molecule. If there are no nitrogen atoms in that molecule, the nitrogen double bond fraction isn't 'unknown', it's 'not applicable'.

    "No value" would imply to me that the code should raise an error if that entry is ever directly used in a computation or comparison, whereas "Unknown value" could plausibly just convert things it's used with into "Unknown value" and still make sense. Though, honestly, this is really annoying behavior in practice and while it makes logical sense, it ends up creating really hard-to-debug results in simulations since you can't easily find where the NaN or NULL originated from.

    The sort of unknown value propagation behavior makes more sense to me if you're operating in a framework that has some sort of method to impute or approximate those unknown values automatically, though there we're getting to a point where we might as well just say that everything is a variable with some statistical distribution and 'unknown value' just means e.g. 'no additional information other than that it's part of the same dataset'.
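
    The two behaviours can be contrasted in a rough Python sketch. NaN stands in for "unknown" and propagates silently; `NotApplicable` is a hypothetical marker class, invented here, that raises as soon as it is used in arithmetic or a comparison.

```python
import math


def scale(x):
    """Some arbitrary downstream computation."""
    return 2.0 * x + 1.0


# "Unknown" modelled as NaN: it flows through every step without complaint,
# so the place where it entered the calculation is lost.
propagated = scale(scale(float("nan")))
print(math.isnan(propagated))  # True


class NotApplicable:
    """Hypothetical 'no value' marker that refuses to take part in math."""

    def _refuse(self, *args):
        raise ValueError("not-applicable value used in a computation")

    __add__ = __radd__ = __mul__ = __rmul__ = __lt__ = __gt__ = _refuse


try:
    scale(NotApplicable())
    raised = False
except ValueError:
    raised = True  # the error points straight at the first misuse

print(raised)  # True
```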

  21. - Top - End - #21
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: minor programming rant

    Quote Originally Posted by veti View Post
    But, it's the rules set out in the SQL standard that define the meaning of "NULL" in SQL. And in this instance at least, they are clearly not treating it as "unknown", but more like "no value".
    I suppose that's true....At the same time, any aggregate function that works with values would need to return NULL if there's any NULL in the set it's working on, undercutting the general usefulness of aggregate functions without some sort of prefiltering scheme...and it turns out SQL's prefiltering scheme is "filter out NULL values before passing to aggregate functions that aren't COUNT(*)." Which, incidentally, means you could compare a COUNT(column) with a COUNT(*) to check for the presence (and number) of NULLs, when that's important.

    Is it kind of a hacky way to say "MAX returns the highest known value" and not need special steps for when a value might be (or might ever become) NULL? Yes...but I think I'm okay with erring on the side of maintainability, in this case.
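
    The COUNT comparison might look like the following sketch, run through Python's sqlite3 module with an invented table `t`. COUNT(*) counts rows, while COUNT(v) counts only rows where `v` is not NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (None,), (7,), (None,)])

total, non_null, average = conn.execute(
    "SELECT COUNT(*), COUNT(v), AVG(v) FROM t").fetchone()

# The difference is the number of NULLs in the column.
null_count = total - non_null

print(total, non_null, null_count)  # 4 2 2
print(average)                      # 5.0 (mean of 3 and 7 only)
```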

  22. - Top - End - #22
    Ogre in the Playground
     
    ElfPirate

    Join Date
    Aug 2013

    Default Re: minor programming rant

    Quote Originally Posted by Jasdoif View Post
    I suppose that's true....At the same time, any aggregate function that works with values would need to return NULL if there's any NULL in the set it's working on, undercutting the general usefulness of aggregate functions without some sort of prefiltering scheme...and it turns out SQL's prefiltering scheme is "filter out NULL values before passing to aggregate functions that aren't COUNT(*)." Which, incidentally, means you could compare a COUNT(column) with a COUNT(*) to check for the presence (and number) of NULLs, when that's important.

    Is it kind of a hacky way to say "MAX returns the highest known value" and not need special steps for when a value might be (or might ever become) NULL? Yes...but I think I'm okay with erring on the side of maintainability, in this case.
    I was gonna say. SQL (like Excel) probably works on the assumption that from a database you want whatever result can be got, even if there are entries missing. If it didn't, basically, it would be entirely useless as you'd almost always end up working with "bad" data of some kind.
    Last edited by snowblizz; 2019-09-17 at 02:10 AM.

  23. - Top - End - #23
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: minor programming rant

    To be fair, you could filter the results yourself in the event aggregate functions didn't work that way, e.g. "SELECT MAX(value) FROM list WHERE value IS NOT NULL". Arguably that might be better, because it would make aggregate functions consistent with other uses of NULL.

  24. - Top - End - #24
    Troll in the Playground
    Join Date
    Jan 2007

    Default Re: minor programming rant

    Quote Originally Posted by factotum View Post
    To be fair, you could filter the results yourself in the event aggregate functions didn't work that way, e.g. "SELECT MAX(value) FROM list WHERE value IS NOT NULL". Arguably that might be better, because it would make aggregate functions consistent with other uses of NULL.
    The question is, when would you ever use aggregate function without filtering NULL values? How would that even work (aside from propagating NULL)? If no one would ever need such a functionality and everyone would have to always specify that they want to exclude NULL values, then this is not the way to go.

    Also, any unknown value is by default assumed to come from the same probabilistic distribution as the ones we have. This means, it would not affect the calculation of distribution parameters in a significant way unless there is a bias in which data points we failed to obtain. But in this case it means you need to go back to data gathering procedures. Bottom line is, excluding unknown values from aggregate functions is justified.

  25. - Top - End - #25
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: minor programming rant

    Quote Originally Posted by Radar View Post
    The question is, when would you ever use aggregate function without filtering NULL values? How would that even work (aside from propagating NULL)? If no one would ever need such a functionality and everyone would have to always specify that they want to exclude NULL values, then this is not the way to go.

    Also, any unknown value is by default assumed to come from the same probabilistic distribution as the ones we have. This means, it would not affect the calculation of distribution parameters in a significant way unless there is a bias in which data points we failed to obtain. But in this case it means you need to go back to data gathering procedures. Bottom line is, excluding unknown values from aggregate functions is justified.
    For what it's worth, I encountered an interesting problem where the NULL pattern was correlated with observables. The dataset was constructed by fusing two measurement techniques - one which was more complete, but narrow in what objects it could capture; the other of which captured more objects, but not every field.

    In this case, you would get weird results assuming you could draw NULLs from the data distribution.

  26. - Top - End - #26
    Troll in the Playground
    Join Date
    Jan 2007

    Default Re: minor programming rant

    Quote Originally Posted by NichG View Post
    For what it's worth, I encountered an interesting problem where the NULL pattern was correlated with observables. The dataset was constructed by fusing two measurement techniques - one which was more complete, but narrow in what objects it could capture; the other of which captured more objects, but not every field.

    In this case, you would get weird results assuming you could draw NULLs from the data distribution.
    Might I ask, what kind of experiment that was? Just out of curiosity.

    Still I guess you would not want NULL affecting aggregate functions. I would assume you needed to use more sophisticated tools to combine and analyse the data than standard statistical functions.

  27. - Top - End - #27
    Firbolg in the Playground
    Join Date
    Dec 2010

    Default Re: minor programming rant

    Quote Originally Posted by Radar View Post
    Might I ask, what kind of experiment that was? Just out of curiosity.

    Still I guess you would not want NULL affecting aggregate functions. I would assume you needed to use more sophisticated tools to combine and analyse the data than standard statistical functions.
    It's the dataset of observed exoplanets. Some are detected with radial velocity measurements (gets period, but only determines a lower bound on the mass and no inclination information), some via transits (gets radius, period, maybe some other things like eccentricity and inclination), or both (allowing you to know the mass field because you have independent measurement of the inclination).

    For transits, it has to be big enough to pass between the Earth's field of view and the disk of the star, so you have a bias towards short periods and/or large radii. For radial velocity measurements, it can be further away, but the size of the effect is based on mass (and star type might factor in). And so on.

    In the end, we trained a generative model to reproduce the distribution and found that when we trained with missing data using e.g. a masking scheme to hide those values, the network actually used the fact that some values were masked to improve its estimates of other values we were testing it on. So for instance, there was more information about the orbital period in the pattern of NULLs than there was in the radius, star type, etc.

    We didn't actually figure out how to solve this, so we just proceeded using the 600 or so exoplanets that had the complete set of fields we ended up retaining. But this means we had to drop stuff like eccentricity, which would have been cool.

  28. - Top - End - #28
    Titan in the Playground
     
    Jasdoif's Avatar

    Join Date
    Mar 2007
    Location
    Oregon, USA

    Default Re: minor programming rant

    Quote Originally Posted by factotum View Post
    To be fair, you could filter the results yourself in the event aggregate functions didn't work that way, e.g. "SELECT MAX(value) FROM list WHERE value IS NOT NULL". Arguably that might be better, because it would make aggregate functions consistent with other uses of NULL.
    If you're only interested in one aggregate value, that's pretty easy. If you're trying to work with interrelated data, though, you probably don't want to drop known values simply because another column associated with them is NULL. And this gets "exciting" if you're working with joined tables and can't be sure in advance there'll be matches between the tables (i.e., any time you'd want to use a left/right/outer join rather than an inner join). And/or if you're using grouping or window functions, where applying filtering on the spot is awkward at best. True, a hypothetical clause like "MAX(value) WITHOUT NULL" could have been used to explicitly declare what SQL does implicitly...but then we're at what Radar already said; is quasi-mandatory boilerplate just to make explicit what nearly everyone does really worth it?


    Semi-tangential: aggregate functions work on expressions, not necessarily columns; so functions like IFNULL/NVL can be used to replace a NULL value with an approximation, if desired...and if the approximation is NULL, it's still excluded like any other NULL value is.
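
    A sketch of that last point, using Python's sqlite3 module (which provides IFNULL). The tables `items` and `guesses` and their values are hypothetical: a left join supplies an approximation where one exists, and rows whose approximation is also NULL are still skipped by AVG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items   (name TEXT, value REAL);
    CREATE TABLE guesses (name TEXT, approx REAL);
    INSERT INTO items   VALUES ('a', 1.0), ('b', NULL), ('c', NULL);
    INSERT INTO guesses VALUES ('b', 3.0);
""")

plain, filled = conn.execute("""
    SELECT AVG(i.value),
           AVG(IFNULL(i.value, g.approx))
    FROM items AS i
    LEFT JOIN guesses AS g ON g.name = i.name
""").fetchone()

print(plain)   # 1.0 (only 'a' has a known value)
print(filled)  # 2.0 ('b' uses its guess; 'c' has none and is still excluded)
```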

  29. - Top - End - #29
    Colossus in the Playground
     
    BlackDragon

    Join Date
    Feb 2007
    Location
    Manchester, UK
    Gender
    Male

    Default Re: minor programming rant

    Quote Originally Posted by Jasdoif View Post
    is quasi-mandatory boilerplate just to make explicit what nearly everyone does really worth it?
    Maybe not in this particular instance, but personally I like to write code that makes sense without the inside knowledge of what the language considers an exception or not. That's why I never use BETWEEN in SQL but instead use >= and <=, because I can never remember if BETWEEN includes the endpoints or not and I suspect I'm not alone in that. OK, that means I sometimes add redundant code elements, but I do so for the sake of readability.
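
    For reference, BETWEEN in SQL includes both endpoints, so the two spellings select the same rows. A short sqlite3 sketch with an invented table `t` shows the equivalence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(n,) for n in range(1, 6)])

between = [r[0] for r in conn.execute(
    "SELECT v FROM t WHERE v BETWEEN 2 AND 4 ORDER BY v")]

explicit = [r[0] for r in conn.execute(
    "SELECT v FROM t WHERE v >= 2 AND v <= 4 ORDER BY v")]

print(between)   # [2, 3, 4]
print(explicit)  # [2, 3, 4]
```

    The explicit form is longer, but it spares the reader from having to remember the inclusion rule.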

  30. - Top - End - #30
    Troll in the Playground
    Join Date
    Jan 2007

    Default Re: minor programming rant

    Quote Originally Posted by NichG View Post
    It's the dataset of observed exoplanets. Some are detected with radial velocity measurements (gets period, but only determines a lower bound on the mass and no inclination information), some via transits (gets radius, period, maybe some other things like eccentricity and inclination), or both (allowing you to know the mass field because you have independent measurement of the inclination).

    For transits, it has to be big enough to pass between the Earth's field of view and the disk of the star, so you have a bias towards short periods and/or large radii. For radial velocity measurements, it can be further away, but the size of the effect is based on mass (and star type might factor in). And so on.

    In the end, we trained a generative model to reproduce the distribution and found that when we trained with missing data using e.g. a masking scheme to hide those values, the network actually used the fact that some values were masked to improve its estimates of other values we were testing it on. So for instance, there was more information about the orbital period in the pattern of NULLs than there was in the radius, star type, etc.

    We didn't actually figure out how to solve this, so we just proceeded using the 600 or so exoplanets that had the complete set of fields we ended up retaining. But this means we had to drop stuff like eccentricity, which would have been cool.
    So that means the NULLs were not random. From the measurement techniques alone there are some bounds on what is detectable or not, which means that NULLs do carry some useful information. As they say, lack of news is good news.

    Understanding those patterns, however, might not be possible without going through the data personally.
