    tdf#109042 : Add support for multivariate regression... · b7a02f2b
    Authored by Dennis Francis
    to regression tool. This means we now support more than
    one X variable (independent variable). One caveat is that
    all X variable observations need to be adjacent to each
    other in the same table. For example, if data is grouped
    by columns, a valid organization of X variables looks
    like:
    
      X Variables ---->
    
      A        B       C     ...
    
      XVar1    XVar2   XVar3 ... XVarN       |
      0.1      0.45    0.32  ...         Observations
      0.34     0.23    0.54  ...             |
      0.23     0.56    0.90  ...             |
      0.32     0.11    0.78  ...             V
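
To make the layout requirement concrete: adjacent X columns map directly onto the rows of a design matrix. The following is a minimal sketch of an ordinary least-squares fit over such a block, not the patch's implementation (the patch uses LINEST()). The helper name `fit_linear` and the Y values are hypothetical; the X values are the four rows shown above.

```python
def fit_linear(x_rows, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk.

    x_rows holds one row per observation and one column per X variable,
    mirroring adjacent X columns in a sheet. Returns [b0, b1, ..., bk].
    """
    # Design matrix: a leading 1 for the intercept, then the X columns.
    design = [[1.0] + [float(v) for v in row] for row in x_rows]
    p = len(design[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# X observations from the table above; the Y values are made up for illustration.
x_rows = [
    [0.10, 0.45, 0.32],
    [0.34, 0.23, 0.54],
    [0.23, 0.56, 0.90],
    [0.32, 0.11, 0.78],
]
y = [1.2, 1.9, 2.4, 2.1]
coefficients = fit_linear(x_rows, y)  # [intercept, slope_XVar1, slope_XVar2, slope_XVar3]
```

Solving the normal equations directly keeps the sketch dependency-free; a production implementation would normally prefer a QR-based solver for numerical stability.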
    
    This patch also gives our regression tool's output a
    structure similar to that of Excel and Gnumeric. This
    means more statistical measures are added, including
    confidence intervals for all parameter estimates.
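
For reference, the textbook confidence interval for a regression coefficient is given below. The commit message does not show the exact formula the patch writes out, so treat this as the standard definition rather than a transcription of the code. Here n is the number of observations, k the number of X variables, and the model includes an intercept.

```latex
\hat{\beta}_j \;\pm\; t_{1-\alpha/2,\; n-k-1} \cdot \operatorname{SE}(\hat{\beta}_j),
\qquad \alpha = 1 - \text{confidence level}
```

With the default confidence level of 95%, \alpha = 0.05 and the critical value is the 97.5th percentile of Student's t distribution with n - k - 1 degrees of freedom.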
    
    We already have support for Logarithmic and Power regression
    in addition to plain Linear regression. This patch's
    multivariate support extends to all of these types of
    regressions.
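
Logarithmic and Power regressions are commonly reduced to a linear fit by taking logarithms of the inputs. The sketch below assumes the multivariate forms y = b0 + sum(bi * ln(xi)) for Logarithmic and ln(y) = b0 + sum(bi * ln(xi)) for Power. The commit message does not spell out the exact model equations, and `to_linear_form` is a hypothetical helper, not a function from the patch.

```python
import math


def to_linear_form(x_rows, y, kind):
    """Transform observations so a plain linear fit can be applied.

    kind is "linear", "logarithmic" or "power" (assumed model forms):
      linear:       y     = b0 + sum(bi * xi)
      logarithmic:  y     = b0 + sum(bi * ln(xi))
      power:        ln(y) = b0 + sum(bi * ln(xi))
    Logarithms require strictly positive inputs.
    """
    if kind == "linear":
        return [list(row) for row in x_rows], list(y)
    if kind == "logarithmic":
        return [[math.log(v) for v in row] for row in x_rows], list(y)
    if kind == "power":
        return (
            [[math.log(v) for v in row] for row in x_rows],
            [math.log(v) for v in y],
        )
    raise ValueError(f"unknown regression type: {kind!r}")
```

Under the assumed Power form, the fitted intercept b0 is on the log scale, so the multiplicative constant of the original model is exp(b0).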
    
    Earlier, all regression statistics were computed separately
    from scratch, which mostly meant computing the same
    regression multiple times. This would slow things down if
    the data-set being analysed is big. This is no longer the
    case, as we use the LINEST() formula. LINEST() provides all
    the statistics needed in regression analysis, so here it is
    called just once and its output components are referenced
    to compute the other (derived) statistics.
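
As an illustration of the "call once, derive the rest" idea, the sketch below takes a LINEST-style result array (the five-row layout returned when extended statistics are requested) and derives a few further measures from it. The helper `derived_stats` and the choice of derived measures are assumptions for illustration; the patch itself does this with cell formulas that reference the LINEST() output.

```python
def derived_stats(linest, n_x):
    """Derive further statistics from a LINEST-style array.

    linest: five rows, as produced with extended statistics enabled:
      row 0: slopes for X_n ... X_1, then the intercept
      row 1: standard errors in the same order
      row 2: R^2, standard error of the Y estimate
      row 3: F statistic, residual degrees of freedom
      row 4: regression sum of squares, residual sum of squares
    n_x: number of X variables.
    Assumes non-zero standard errors and residual degrees of freedom.
    """
    # Reorder to intercept, X_1, ..., X_n for reporting.
    coeffs = [linest[0][n_x]] + list(reversed(linest[0][:n_x]))
    errors = [linest[1][n_x]] + list(reversed(linest[1][:n_x]))
    r2 = linest[2][0]
    df_resid = linest[3][1]
    n_obs = df_resid + n_x + 1
    return {
        "coefficients": coeffs,
        "standard_errors": errors,
        "t_statistics": [c / e for c, e in zip(coeffs, errors)],
        "observations": n_obs,
        "adjusted_r2": 1 - (1 - r2) * (n_obs - 1) / df_resid,
        "ms_residual": linest[4][1] / df_resid,
    }
```

Every derived value reuses numbers already present in the array, so nothing requires fitting the regression a second time.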
    
    Following are the UI changes for the regression dialog box :-
    
    1. Changed the regression-type selectors from check-boxes
       to radio-buttons, so only one type of regression can
       be done at a time. This is because the output of a single
       regression type already shows a lot of information, and
       if we do all types of regression, the output is hard to
       read and interpret, especially for bigger data-sets with
       lots of X variables.
    
    2. Allow the variables' ranges to include labels, via
       a checkbox. If labels are provided, they are used to
       annotate the variable-specific statistics so the user
       can easily identify the stats corresponding to each
       variable.
    
    3. More robust input validity checks, with error messages
       at the bottom of the dialog to let the user know which
       of their entries is invalid.
    
    4. The user can enter the confidence level (default = 95%)
       for computing the confidence intervals of each estimate.
    
    5. Make residual computations optional via a check-box,
       as this involves writing a table with all X's and Y
       with predicted Y and residual for each observation.
       If the data-set is big, or the user just cares about
       the estimates and confidence intervals, they can
       avoid this.
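
The residual table from item 5 grows with the number of observations, which is why making it optional helps on large data-sets. A minimal sketch of what such a table contains for a linear fit follows; `residual_table` is a hypothetical helper, and the patch itself writes this table into the sheet rather than building it in code like this.

```python
def residual_table(x_rows, y, intercept, slopes):
    """Build one row per observation: X values, Y, predicted Y, residual.

    slopes are ordered like the X columns (X_1 first). The residual is
    observed Y minus predicted Y.
    """
    table = []
    for row, observed in zip(x_rows, y):
        predicted = intercept + sum(b * x for b, x in zip(slopes, row))
        table.append((*row, observed, predicted, observed - predicted))
    return table
```

For N observations and k X variables this produces N rows of k + 3 values each, compared with a fixed-size block for the estimates and confidence intervals.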
    
    Finally, the patch includes a uitest that tests all
    3 types of regression with a small dataset. The ground
    truths for the tests were obtained by running the
    regression tool in Gnumeric.
    
    Change-Id: I9762b716eae14b9fbd16e2c7228edf9e1930dc93
    Reviewed-on: https://gerrit.libreoffice.org/56809
    Tested-by: Jenkins
    Reviewed-by: Michael Meeks <michael.meeks@collabora.com>
    Reviewed-by: Tomaž Vajngerl <quikee@gmail.com>
Name
Last commit
Last update
..
menubar
popupmenu
statusbar
toolbar
ui