Statistical Functions

The functions in this topic are useful for solving statistical problems. The first four can take any number of arguments, a vector, or a matrix. If given a vector (see Vectors and Matrices), the function is applied to the elements of the vector. If given a matrix, the function is applied to the rows of the matrix, and the results are returned as a vector.

AVERAGE(z1, z2, ..., zn) simplifies to the arithmetic mean or average of z1, z2, ..., zn (i.e. the sum of the zi’s divided by n). For example, both

AVERAGE(2, 4, 6, 8)

and

AVERAGE([2, 4, 6, 8])

simplify to 5.

RMS(z1, z2, ..., zn) simplifies to the root mean square of z1, z2, ..., zn (i.e. the square root of the mean of the squares of the zi’s). For example,

RMS(2, 4, 6, 8)

simplifies to √30 and approximates to 5.477225575.

VARIANCE(z1, z2, ..., zn) simplifies to the unbiased sample variance of z1, z2, ..., zn (i.e. the sum of the squares of the differences between the zi’s and their average, divided by n-1). For example,

VARIANCE(2, 4, 6, 8)

simplifies to 20/3 and approximates to 6.666666666.

STDEV(z1, z2, ..., zn) simplifies to the sample standard deviation of z1, z2, ..., zn (i.e. the square-root of the variance of z1, z2, ..., zn). For example,

STDEV(2, 4, 6, 8)

simplifies to 2·√15/3 and approximates to 2.581988897.
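
As a cross-check outside the program, the four example values above can be reproduced with a minimal Python sketch. It assumes only the Python standard library; the variable names are illustrative and are not part of the functions documented in this topic.

```python
import math
import statistics

data = [2, 4, 6, 8]

# Arithmetic mean: sum of the values divided by n.
average = statistics.mean(data)

# Root mean square: square root of the mean of the squares.
rms = math.sqrt(sum(z * z for z in data) / len(data))

# Unbiased sample variance: squared deviations from the mean, divided by n-1.
variance = statistics.variance(data)

# Sample standard deviation: square root of the sample variance.
stdev = statistics.stdev(data)

print(average, rms, variance, stdev)
```

Note that statistics.variance and statistics.stdev use the n-1 divisor, which matches the definitions of VARIANCE and STDEV given above.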

FIT(v, A) returns the least squares fit of a parameterized expression in label vector v to the set of points in the data matrix A. A least squares fit minimizes the sum of the squares of the discrepancies at the points.

The elements of the label vector are the data variables followed by the parameterized expression. The expression should depend on the data variables and one or more parametric variables. The dependence on the parametric variables should be linear (i.e. if p is a parametric variable, the expression must be of the form r·p+s, where r and s are expressions independent of p). The expression's dependence on the data variables need not be linear.

The elements of each row of the data matrix are the numeric values of the data variables and the corresponding numeric value of the expression given in the label vector. The data values on each row must correspond one-to-one with the elements of the label vector.

When the number of data matrix rows equals the number of parametric variables, FIT returns an expression that exactly fits the data, to within roundoff error. For example,

FIT([x, a·x^2 + b·x + c], [-1.5, 0; 0.5, -2; 1.5, -1.5])

simplifies to the parabola

x^2/2 - x/2 - 15/8

When the number of data matrix rows exceeds the number of parametric variables, FIT returns an expression that is a least squares fit to the data. For example, if the value of the variable data is the matrix

⎡ 2.75  -2.3  2.4 ⎤
⎢ -3.5   4.5  4.2 ⎥
⎢    5   3.5  5.8 ⎥
⎣   -4    -5  1.3 ⎦

then

FIT([x, y, a·x + b·y + c], data)

approximates to the plane

0.1536447245·x + 0.3577492652·y + 3.352791083

Note that the dependence on the data variables need not be linear, even though the dependence on the parametric variables must be linear. For example,

FIT([t, q·ATAN(t) + r·SIN(t)], [-4, -1; -1, -1.9; 2, 2.25])

approximates to

1.297230859·ATAN(t) + 0.9613787149·SIN(t)
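
The idea behind these fits can be sketched in Python. Because the expression is linear in the parametric variables, minimizing the sum of squared discrepancies reduces to solving a small linear system (the normal equations). The sketch below is an illustration under that assumption, not a description of how FIT is implemented internally; the helper names solve_linear and least_squares are hypothetical, and each parametric variable is represented by the function of the data variables that it multiplies.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix * p = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    params = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * params[j] for j in range(i + 1, n))
        params[i] = (aug[i][n] - tail) / aug[i][i]
    return params

def least_squares(basis, rows):
    """Return one coefficient per basis function.

    Each row holds the data-variable values followed by the observed value
    of the expression, mirroring the layout of the data matrix.
    """
    design = [[f(*row[:-1]) for f in basis] for row in rows]
    target = [row[-1] for row in rows]
    k = len(basis)
    # Normal equations: (D^T D) p = D^T y.
    normal = [[sum(d[i] * d[j] for d in design) for j in range(k)]
              for i in range(k)]
    rhs = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return solve_linear(normal, rhs)

# a*x^2 + b*x + c through three points (an exact fit).
parabola = least_squares(
    [lambda x: x ** 2, lambda x: x, lambda x: 1.0],
    [(-1.5, 0), (0.5, -2), (1.5, -1.5)],
)

# a*x + b*y + c through four points (a true least squares fit).
plane = least_squares(
    [lambda x, y: x, lambda x, y: y, lambda x, y: 1.0],
    [(2.75, -2.3, 2.4), (-3.5, 4.5, 4.2), (5, 3.5, 5.8), (-4, -5, 1.3)],
)

# q*ATAN(t) + r*SIN(t): nonlinear in t, linear in q and r.
trig = least_squares([math.atan, math.sin], [(-4, -1), (-1, -1.9), (2, 2.25)])

print(parabola, plane, trig)
```

The three coefficient lists should agree, to within roundoff, with the parabola, plane, and ATAN/SIN results quoted in the examples above. Solving the normal equations directly is adequate for small, well-conditioned problems like these; it is chosen here for brevity rather than numerical robustness.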

Even when the dependence on the parametric variables in a parameterized expression is nonlinear, it may be possible to transform the expression so that the dependence becomes linear. For example, to fit the expression

a·EXP(b·x)

to a set of data points, first transform it to LN(a)+b·x by taking its natural logarithm. Then, if LN(a) is replaced by a_ in the transformed expression, FIT can be used to find the a_ and b that give the best fit of the transformed expression to the natural logarithms of the original expression values (only the last column of the data is transformed; the x values are left unchanged). Finally, approximate EXP(a_) to find a.
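
The transformation can be illustrated with a short Python sketch. It assumes every observed value is positive, so that its logarithm exists, and it uses hypothetical data generated from 2·EXP(0.5·x); the function name fit_exponential is likewise an illustrative choice.

```python
import math

def fit_exponential(points):
    """Fit y = a*exp(b*x) via a straight-line fit of ln(y) = a_ + b*x."""
    xs = [x for x, _ in points]
    logs = [math.log(y) for _, y in points]  # requires every y > 0
    n = len(points)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    # Closed-form least squares slope and intercept for a straight line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxl / sxx
    a_ = mean_log - b * mean_x
    # Undo the substitution a_ = ln(a).
    return math.exp(a_), b

# Hypothetical data lying exactly on y = 2*exp(0.5*x).
points = [(x, 2 * math.exp(0.5 * x)) for x in (0, 1, 2, 3)]
a, b = fit_exponential(points)
print(a, b)
```

For noisy data, this procedure minimizes the squared discrepancies in LN(y) rather than in y itself, so the resulting a and b generally differ somewhat from a direct least squares fit of a·EXP(b·x) to the original values.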

Other Built-in Functions and Constants
