How To Compare Model Vs Measured Data
There are a number of things you can do with this sort of data, but in the end you have to make some calls about what is acceptable to you - and the old rule of thumb of asking "is the average error more than five percent?" may not be such a bad way of doing it.
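As a minimal sketch of that rule of thumb in R (the vectors `act` and `sens` of actual and sensor readings here are hypothetical stand-ins for your data):

    act  <- c(10, 20, 30, 40)   # hypothetical actual values
    sens <- c(11, 19, 33, 38)   # hypothetical sensor readings
    # mean absolute percentage error, compared against the 5% threshold;
    # TRUE here would mean the average error exceeds the threshold
    mean(abs(sens - act) / act * 100, na.rm = TRUE) > 5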
Statistics can be useful in showing whether there is bias in the measurements, and whether any of the properties is systematically mismeasured. It can also help quantify how far out the sensor is on average, and give a sense of the spread of that (ie is it mostly correct and occasionally way out - or always mediocre). It's your judgement call whether that's acceptable.
Here's what I did with your data (ok, I was having a slow afternoon...), using R and Hadley Wickham's plyr and ggplot2 libraries. My first instinct (basically also Michelle's suggestion) was to look at the relationship between the actual value and the sensor's value, conditioned on the five properties. That gives me:

[faceted scatterplot of sensor score against actual score, one panel per property - image not preserved]
which is a start. Among other things I see you have very few observations on properties 4 and 5, which might complicate things later. And that some of the properties typically have much higher scores than others.
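One quick way to confirm those observation counts (a sketch, assuming the melted `sensorm` data frame built in the full listing at the end):

    # complete (non-missing) observations per property
    with(sensorm, table(act.var[!is.na(act) & !is.na(sens)]))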
Next, I built a similar plot and compared the data to a line with zero intercept and slope of 1, which is what you'd get if the sensor and the actuals were always the same. I added to this locally smoothed lines showing the actual relationship between actual and sensor scores for the first three properties. This gives me the following, which is starting to suggest that property 2 is perhaps scored higher than it should be by the sensor, and properties 1 and 3 perhaps have the opposite problem. Property 3 is probably the one most to look out for, judging by this plot.

[scatterplot of sensor v actual with a y = x reference line and smoothed lines for properties 1-3 - image not preserved]
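The key calls behind that plot, taken from the full listing at the end, are the reference line and the per-property smoothers:

    qplot(act, sens, data=sensorm, colour=act.var) +
      geom_abline(intercept=0, slope=1, legend=F) +
      geom_smooth(data=sensorm[as.numeric(sensorm$act.var)<4,], legend=F, se=F)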
I'm interested in testing this in a model, but there is clear heteroscedasticity, ie the variance in sensor scores, unsurprisingly, increases as the scores increase. This would invalidate the simpler models to fit, so I tried taking logarithms and the problem goes away:

[the same plot on log-log scales, with roughly constant spread - image not preserved]
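In ggplot2 that is just a matter of adding log scales to the previous plot (again from the listing at the end):

    last_plot() + scale_x_log10() + scale_y_log10()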
I can then use this as the basis for fitting a linear regression model, which finds that yes, there is statistically significant evidence that knowing which property is being measured helps predict what the sensor's score will be. If the sensor were equally good at measuring all properties you wouldn't get this, so we now know there is a problem here at least (again, whether it matters is up to you).
    > mod <- lm(log(sens) ~ log(act) + act.var, data=sensorm)
    > anova(mod) # shows there is a difference in how well properties measured
    Analysis of Variance Table

    Response: log(sens)
              Df Sum Sq Mean Sq F value  Pr(>F)
    log(act)   1  30.15   30.15 3854.54 < 2e-16 ***
    act.var    4   0.28    0.07    9.06 3.6e-06 ***
    Residuals 88   0.69    0.01
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    > summary(mod) # as graphics show, property 2 seems higher than properties 1 and 3
    ...
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.37988    0.35210    1.08    0.284
    log(act)     0.95216    0.04094   23.26   <2e-16 ***
    act.varX2    0.11459    0.04772    2.40    0.018 *
    act.varX3    0.00855    0.05262    0.16    0.871
    act.varX4    0.14067    0.06479    2.17    0.033 *
    act.varX5   -0.15136    0.06463   -2.34    0.021 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.0884 on 88 degrees of freedom
      (61 observations deleted due to missingness)
    Multiple R-squared: 0.978,  Adjusted R-squared: 0.977
    F-statistic:  778 on 5 and 88 DF,  p-value: <2e-16
Ok, so I've established that at least one of the properties is measured differently from the others. Next I wanted to get a feel for how far out in general the measurements are. I did this by plotting a histogram of the percentage out for each observation of a property - which shows there really is a pretty big range:

[histogram of percentage difference between sensor and actual scores - image not preserved]
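The histogram is one line in ggplot2 (from the listing at the end):

    qplot((sens - act) / act * 100, data = sensorm)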
I'd intuitively say that this is already enough to indicate you need some more calibration, but we can quantify it a bit.
There's not quite evidence that the average is significantly different from zero, ie of systematic bias, but it's getting close (p value of around 0.08). If it weren't for those couple of large values where the sensor was 20-40 percent too high, you'd say that it seems to generally underestimate the true value.
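That p value comes from a one-sample t-test of the percentage errors against a mean of zero (from the listing at the end):

    with(sensorm, t.test((sens - act) / act * 100))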
However, perhaps more importantly, the average absolute value of the percentage out is about 7 or 8 percent (depending on how you interpret "average"), which strikes me as too much. But see, we've just come back to that first rule of thumb, albeit after some useful graphical insight...
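The two readings of "average" here are the plain and the 20%-trimmed mean of the absolute percentage errors (from the listing at the end):

    with(sensorm, mean(abs(sens - act) / act * 100, na.rm = TRUE))              # about 8%
    with(sensorm, mean(abs(sens - act) / act * 100, trim = 0.2, na.rm = TRUE))  # about 7%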
Hope that helps. The R code that produced this is below.
    sensor <- read.csv("sensor.csv", row.names=1) # reads data in as matrix with 12 columns
    library(plyr)
    library(ggplot2)

    sensorm <- cbind(melt(sensor[,2:6]), melt(sensor[,8:12]))
    names(sensorm) <- c("act.var", "act", "sens.var", "sens")

    win.graph()
    qplot(act, sens, data=sensorm, facets=.~act.var)

    win.graph()
    qplot(act, sens, data=sensorm, colour=act.var) +
      geom_abline(intercept=0, slope=1, legend=F) +
      geom_smooth(data=sensorm[as.numeric(sensorm$act.var)<4,], legend=F, se=F)
      # only draw this line for 3 properties,
      # as properties 4 and 5 have too few points

    last_plot() + scale_x_log10() + scale_y_log10()
      # seems to fix heteroscedasticity ie variance
      # now roughly the same at different values

    mod <- lm(log(sens) ~ log(act) + act.var, data=sensorm)
    anova(mod)   # shows there is a difference in how well properties measured
    summary(mod) # as graphics show, property 2 seems higher than properties 1 and 3

    qplot((sens-act)/act*100, data=sensorm)

    with(sensorm, t.test((sens-act)/act*100))
      # not quite evidence of bias, but nearly significant
    with(sensorm, mean(abs(sens-act)/act*100, na.rm=T))
      # on average 8% wrong
    with(sensorm, mean(abs(sens-act)/act*100, trim=0.2, na.rm=T))
      # on average 7% wrong even when trimmed
    with(sensorm, cbind(act, sens, round(abs(sens-act)/act*100)))
      # gives table of actual, sensor, and % out
Source: https://stats.stackexchange.com/questions/22473/determining-the-right-method-to-compare-observed-versus-measured-data