To get the version number, go to Help -> About.
To find the version from R, go to the R Command Editor and run BSkyVersion().

=====================================
BlueSky Statistics 7.10 Release Notes
=====================================

Enhancements

1. Created the following new dialogs:

   a. "Model Statistics > Add Model Statistics to Observations"
      Adds the following model statistics to the dataset used to build the model or the training dataset. Models of the following classes are supported:
         Linear models (lm)
         Generalized linear models (glm)
         Robust linear regression (rlm)
         Quantile regression (rq)
         Cox proportional hazards regression model (coxph)
         Parametric survival regression model (survreg)
         Multinomial log-linear models (multinom)
         Ordered logistic or probit regression (polr)
      Not every model class supports every statistic below. The dialog lists the statistics supported for each model class.
         .hat (Diagonal of the hat matrix)
         .sigma (Estimate of the residual standard deviation when the corresponding observation is dropped from the model)
         .cooksd (Cook's distance)
         .fitted (Fitted values of the model)
         .se.fit (Standard errors of the fitted values)
         .resid (Residuals)
         .std.resid (Standardized residuals)

   b. "Model Statistics > Model Level Statistics"
      Displays model statistics for the following classes of models (R classes are listed in parentheses) using the glance function in the broom package:
         Linear models and generalized linear models (lm and glm)
         Multinomial log-linear models (multinom)
         Ordered logistic or probit regression (polr)
         Lasso and Elastic-net regularized generalized linear models (glmnet)
         Robust linear regression (rlm)
         Quantile regression (rq)
         Linear mixed effects model (lme, lmerModLmerTest)
         Survival Curve Object (survfit)
         Proportional Hazards regression (coxph)
         Parametric survival regression model (survreg)

   c. "Model Statistics > Parameter estimates"
      Displays parameter estimates of models of the following classes (R classes are listed in parentheses):
         Linear models and generalized linear models (lm and glm)
         Multinomial log-linear models (multinom)
         Ordered logistic or probit regression (polr)
         Lasso and Elastic-net regularized generalized linear models (glmnet)
         Robust linear regression (rlm)
         Quantile regression (rq)
         Survival Curve Object (survfit)
         Linear mixed effects model (lme, lmerModLmerTest)
         Proportional Hazards regression (coxph)

   d. "Model Statistics > Anova & Likelihood Ratio Test"
      Runs an Anova and likelihood ratio test for models of the following classes (R classes are listed in parentheses). For mixed effects models, the ANOVA-like table for random effects is displayed.
         Linear models (lm)
         Generalized linear models (glm)
         Linear mixed effects model (lme)
         Loess regression model (loess)
         Negative-binomial log-linear model (negbin)
         Survival regression model (survreg)
         Proportional hazard model (coxph)
         Linear model using generalized least squares (gls)

   e. "Model Statistics > Summarize a model"
      This dialog summarizes the selected model.

   f. "Model Statistics > Plot a model"
      This dialog plots the selected model.

   g. "Model Statistics > Compare Models"
      Compares 2 nested models using an F or a Chi-sq test, depending on the estimation method. F tests are used for least squares estimation; Chi-sq tests are used for maximum likelihood estimation. Both models should be created on the same dataset, as differences in missing values or variable names can cause incompatibility.
      Model classes supported include:
         Linear models and generalized linear models (lm and glm)
         Ordered logistic or probit regression (polr)
         Linear model using generalized least squares (gls)
         Robust linear regression (rlm)
         Quantile regression (rq)
         Linear mixed effects model (lme, lmerModLmerTest)
         Survival Curve Object (survfit)
         Proportional Hazards regression (coxph)
         Parametric survival regression model (survreg)
         Local polynomial regression (loess)
         Multinomial log-linear models (multinom)

   i. "Data > Split Datasets > For Partitioning > Down Sample" for down sampling imbalanced datasets
   j. "Data > Split Datasets > For Partitioning > Up Sample" for up sampling imbalanced datasets

2. All dialogs under Model Statistics now require you to explicitly pick a model on the dialog itself. In the prior version, you had to pick the model under the score active dataset on the top right-hand side of the main application window.

3. When scoring a dataset, you have the option to save the confidence intervals of the individual predicted values. This is supported for linear models (class lm) only.

4. Better support for dates. You can right click on the variable tab of the data grid and create a new date variable. We automatically populate new date variables with the current date and use the default date format set in the machine's operating system to display the date in the data grid. You have several choices to control how that date variable is displayed. Some examples of the supported formats are %m/%d/%Y %H:%M:%S, %d/%m/%y %H:%M:%S, %y/%m/%d %H:%M:%S, and %m/%d/%y (ignoring the %H:%M:%S). All supported formats can be seen by clicking the DateFormat drop down in the variable grid tab. The display format of existing variables of date classes (POSIXct and the Date class) can also be changed in the variable grid.
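As an illustration of the display formats quoted in item 4, the following base R sketch renders one timestamp under each of them. It only shows what the format strings mean in R; it is not BlueSky's own code, and the fixed timestamp and the UTC time zone are assumptions chosen so that the output is reproducible.

```r
# Illustrative only: a fixed timestamp (an assumption) stands in for the
# current date that BlueSky Statistics would normally use.
x <- as.POSIXct("2020-03-15 14:30:00", tz = "UTC")

# Display formats quoted in the release note
fmts <- c("%m/%d/%Y %H:%M:%S",
          "%d/%m/%y %H:%M:%S",
          "%y/%m/%d %H:%M:%S",
          "%m/%d/%y")

# format() renders the same instant under each display format
shown <- vapply(fmts,
                function(f) format(x, format = f, tz = "UTC"),
                character(1))
print(shown)
```

The underlying value of x never changes; only its textual representation does, which matches the note's description of changing the display format of an existing date variable.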
5. Added the following sample datasets:
      C:\Program Files\BlueSky Statistics\Sample Datasets and Demos\Sample R Datasets (RData)\SampleDatasetWithDates.Rdata
      C:\Program Files\BlueSky Statistics\Sample Datasets and Demos\Anova\OneWayAnova.RData, for use in "Analysis > Means > ANOVA, one way with blocks" and "ANOVA, one way with random blocks"
      C:\Program Files\BlueSky Statistics\Sample Datasets and Demos\Comparing Datasets\df1 and df2, for comparing datasets

6. "Data > Missing Values > Missing Values, basic" has a new function, getmode, that replaces missing values with the mode.

7. Added the following themes as choices in the themes menu. This can be accessed by clicking on the themes icon, which is to the right of the coming soon icon at the top of the BlueSky Statistics main window. These choices control the theme used by the dialogs under Graphics:
      theme_grey
      theme_gray
      theme_bw
      theme_linedraw
      theme_light
      theme_dark
      theme_minimal
      theme_classic
      theme_void
      theme_test

8. Added na.action = na.exclude when building a model with "Model Fitting > Linear Regression", "Linear Regression with formula", "Logistic Regression", and "Logistic Regression with formula" to enable compatibility when adding observations to the dataset.

9. The Compute Dummy Variables dialog has been improved to prevent the creation of variable names with spaces. This was a problem because factor levels often had spaces, e.g. High School, Middle School…

BUG FIXES

1. Fixed a defect in "Analysis > Factor Analysis > Principal Component Analysis" where the option to use the correlation matrix was not working. In the prior version, the covariance matrix was used even though the correlation matrix was specified.
2. Fixed an issue with the ROC curve not displaying when scoring datasets with models built using "Model Fitting > Extreme Gradient Boosting".
3. The scatter plot in "Analysis > Means > ANCOVA" now shows the best-fit line.
4.
   Overlapping text in the plot complexity parameter table displayed in "Model Fitting > Decision Trees" has been fixed.
5. Replaced the deprecated emmeans::CLD with multcomp::cld in "Analysis > Means > ANOVA, one way and two way", "ANOVA, one way with blocks", and "ANOVA, one way with random blocks".
6. Replaced bar color with fill color in "Graphics > Density Plot > Options".
7. "Data > Split Dataset > For Partitioning > Random Split" now has the option to sample with replacement.
8. "Distribution > Sample from distributions" would not work when a single column was selected. This has been fixed.
9. "Data > Standardize Variables" now automatically appends an underscore before the suffix or after the prefix, like the other functions do.
10. "Data > Factor Levels > Label NA as Missing" had a variable selection box erroneously titled "Factor variables to add new levels to". This has been changed to "Factor variables to label NA values".
11. "Data > Factor Levels > Lumping into other" and "Data > Factor Levels > Specify Levels to Keep or Replace by Other" both had incorrect titles and captions. This has been corrected.
12. "Data > Factor Levels > Add New Levels" had an incorrect label on the control that specifies the variables to add new levels to. This has been corrected.
13. "Analysis > Time Series > Automated ARIMA" was missing the original vs. fitted plot and the option to save fitted values to a dataset. This has been fixed.
14. Corrected the spelling of the Ljung-Box test in "Analysis > Time Series > Automated ARIMA, Holt Winters Seasonal, Holt-Winters Non-Seasonal and Exponential Smoothing".
15. Corrected the spelling of degrees of freedom in "Graphics > Scatterplot, 3D".
15. The sub-dialog associated with the Facets option in the Graphics dialogs had an incorrect title. This has been corrected.
16. In "Analysis > Cluster > Hierarchical", the checkbox for "dendrogram" was misspelled. This has been corrected.
17. "Analysis > Summarize > Summary Analysis > Numerical statistical analysis, using describe" required you to provide at least two variables. We now allow a single variable; however, when a single variable is provided, the name X1 is displayed in the output instead of the variable name. This is an issue with the describe function, and we have reported it to the package developer.
18. In "Analysis > Factor Analysis", degrees of freedom was spelled incorrectly in the hypothesis test for the number of factors. This has been fixed.
19. In "Analysis > Non Parametric Tests > Friedman Test", the table with medians was incorrectly labeled "Results". It is now labeled "Medians".
20. In "Analysis > Variance > Levene Test", changed the default from center to median.
21. When entering polynomial terms in Mixed Effects Models, e.g. age^4, you needed to wrap the term in I(), e.g. I(age^4), for the model to be created correctly. This has been fixed. We also add all lower-order polynomial terms if they are not already specified. Entries added can be deleted using the trash button.
22. "Analysis > Means > ANOVA, one way with random blocks" was not displaying a p-value for the fixed effects. This has been corrected.

=====================================
BlueSky Statistics 7.0 Release Notes
=====================================

Enhancements - Machine learning
-------------------------------
We have expanded the choices under Model Fitting to:
1. Train a neural network model using the neuralnet package.
2. Train a multi-layer perceptron using the RSNNS package.
3. Extreme Gradient Boosting using the xgboost package.

These dialogs provide direct control over their R functions, providing great flexibility. However, since each function has its own author, each has its own usage rules, some of which are described below (also see help files on each dialog box).
The Model Tuning menu drives modeling via the caret package, which standardizes the interface to these same functions but offers less detailed controls.

1. Train a multi-layer perceptron using the RSNNS package, see "Model Fitting > Neural Nets > NeuralNets". Sample datasets can be found in the "C:\Program Files\BlueSky Statistics\Sample Datasets and Demos\Neuralnet" folder.
   The dependent variable can be numeric or factor. If the dependent variable specified is a factor, we automatically dummy code the factor variable with one-hot encoding, using the decode function in the RSNNS package.
   Additionally, if you are using one-hot encoding to dummy code a factor variable, you can specify more than one dependent variable in the dialog. In this case, the dependent variables must be of type numeric. You can use "Data > Compute dummy variables" and choose the “Keep all levels” setting to get one-hot encoding.
   For dependent variables of type factor, we will display a confusion matrix, ROC and model accuracy statistics when scoring a dataset using the model built. The predictions generated are of type factor since we predict the class. These will be saved in the dataset along with the predicted probabilities when scoring.
   When there are dummy coded dependent variables, we will not display a confusion matrix, ROC and model accuracy statistics when scoring a dataset using the model built. However, the predictions will be saved in the dataset when scoring. The predictions are the probabilities associated with the dummy coded dependent variables.
   It is usually best to standardize independent variables (they must be numeric, too). See “Data > Standardize Variables.” If you have categorical independent variables, use one-hot encoding to dummy code the factor variables.

2. Train a neural network model using the neuralnet package, see “Model Fitting > Neural Nets > Multi-layer Perceptron”.
   Sample datasets can be found in the “C:\Program Files\BlueSky Statistics\Sample Datasets and Demos\Neuralnet” folder.
   The dependent variable can be numeric or factor. If the dependent variable is a factor, we dummy code the factor variable with one-hot encoding, using the decode function in the RSNNS package.
   Additionally, if you are using one-hot encoding to dummy code a factor variable, you can specify more than one dependent variable in the dialog. In this case, the dependent variables must be of type numeric. You can use “Data > Compute dummy variables” and select the “Keep all levels” option for one-hot encoding.
   For dependent variables of type factor, we will display a confusion matrix, ROC and model accuracy statistics when scoring a dataset using the model built. The predictions generated are of type factor since we predict the class. These will be saved in the dataset along with the predicted probabilities when scoring.
   When there are dummy coded dependent variables, we will not display a confusion matrix, ROC and model accuracy statistics when scoring a dataset using the model built. However, the predictions will be saved in the dataset when scoring. The predictions are the probabilities associated with the dummy coded dependent variables.
   It is usually best to standardize independent variables (they must be numeric, too). See “Data > Standardize Variables.” If you have categorical independent variables, use one-hot encoding to dummy code the factor variables.

3. Extreme Gradient Boosting using the xgboost package, see “Model Fitting > Extreme Gradient Boosting.”
   To predict a dependent variable of type factor, you need to recode the dependent variable to a numeric variable with values starting from 0. For example, if there are 3 levels in the factor variable, the numeric variable must contain the values 0, 1, 2.
   See “Data > Recode Variables.” Alternatively, convert the factor variable to numeric (typically the levels get mapped to integers starting from 1) and then subtract 1 from the resulting variable. This will give you a numeric variable with values starting from 0.
   You need to dummy code independent factor variables using one-hot encoding, see “Data > Compute Dummy Variables.”

4. Added support for a broader set of models in Model Tuning > Bootstrap Resampling, Model Tuning > k-Fold Cross Validation, Model Tuning > Leave One Out Cross Validation, and Model Tuning > Repeated K-Fold Cross Validation. Scoring datasets is supported with these tuned models:
   a. Gradient Boosting Machines with the gbm package
   b. Extreme Gradient Boosting with the XGBoost package
   c. Stepwise model selection with AIC with the MASS package
   d. Conditional Inference Trees with the party package
   e. Robust Linear regression with the MASS package
   f. Lasso and Elastic-Net Regularized Generalized Linear Models with the glmnet package
   g. Multivariate Adaptive Regression Splines with the earth package
   h. Neural Net (Single hidden layer) with the nnet package
   i. Neural Net (Train neural nets using backpropagation, RPROP, GRPROP) with the neuralnet package

5. In the scoring section on the top right-hand part of the main application window, renamed Model Type to Model Class.

6. Added a Help button below the Scoring button.

Enhancements – General
----------------------
1. Added a Font icon on the ribbon bar of the R Syntax editor window to increase the font size of the syntax displayed. This was provided so that instructors can easily display syntax during lectures.
2. Integration with QuestionPro Datapad, see https://www.questionpro.com and https://www.questionpro.com/help/datapad.html.
   This allows you to bring survey data directly into BlueSky Statistics for analysis, see “File > Open QuestionPro Dataset”. On the Output window, use “File > Export Output to QuestionPro” to save the results of the analysis back to the QuestionPro Datapad.
3. When saving a dataset to an RData file, we rename the dataset/data frame to the name of the R data file. This prevents overwriting existing datasets in the following situation: you create a new dataset via File > New Dataset, call it test1, and close the application. You then restart the application, create another dataset from File -> New Dataset, and call it test2. In the prior release, the datasets were called Dataset1 and Dataset1 although they were saved to test1 and test2 respectively. When you opened test2 after opening test1, it would overwrite Dataset1. In the new release you can open files named test1 and test1 (from different folders), since the R objects contained within them have different names, nothing will be over-written.
4. Made improvements to the installation of BlueSky Statistics. We were looking for the path of R using a registry key, which was no longer necessary. We have also created a technote at https://www.blueskystatistics.com/v/vspfiles/downloadables/BlueSky-R-Session-Creation-Steps.pdf that lists in detail how we create an R session when the application is launched.
5. Reversed the direction of the navigation tree icon displayed on the top right of the output window to match the way the arrow on the top left works.
6. Menus in the main application window now wrap when the window is shrunk.
7. A warning message is displayed when a user tries to save a blank dataset.
8. We now support R data files with the extension ".rda".

Enhancements – Statistics
-------------------------
1. The interaction plots in Mixed Models now have the “Force Continuous” option.
2.
   Moved the “Crosstab, Two-way” dialog to “Analysis > Contingency Tables > Legacy > Crosstab 2 way.” You should instead use “Analysis > Contingency Tables > Crosstab, Multi-way.”
3. Changed the captions for the Alternate Hypothesis in “Analysis > Means > Independent Samples T-Test” to “group1 != group2”, “group1 > group2”...
4. Added summary statistics to “Analysis > Means > T-Test, paired samples.”
5. A sample dataset has been added to test Bland-Altman plots: bland.altman.PEFR.1986.RData, located in “C:\Program Files\BlueSky Statistics\Sample Datasets and Demos\BlandAltman.”

Enhancements – Data Manipulation
--------------------------------
1. The placement of the “Compute Dummy Variables” dialog has been changed to “Data > Compute Dummy Variables.”
2. When opening a dataset, we now display the first 40 columns/variables in the first page of the data editor window. You will need to page through the dataset using the paging controls in the bottom right-hand corner of the data editor window only if you have more than 40 columns/variables in the dataset.
3. Renamed “Data > Merge Dataset (tidy)” to “Data > Merge Datasets.” Moved the older “Merge Datasets” dialog to “Data > Legacy” and renamed it to “Merge Datasets (legacy).”
4. A warning message is displayed when a user tries to save a blank dataset.

Bug Fixes
---------
1. Fixed an issue when plotting both continuous and discrete distributions where the lower and upper bounds of the sequence were generated incorrectly. The parameters entered in the dialog were not getting passed to the quantile function of the desired distribution. This impacted the dialogs under Distribution > Continuous and Distribution > Discrete that plotted distributions.
2. The GroupBy control in “Analysis > Proportions > Proportion Test, independent samples” was restricted to factors with 2 levels; this has been fixed.
3.
   For “Analysis > Means > T-Test, One Sample” and “T-Test, Independent Samples,” the column header for the p-value was displaying Sig(2-tail) even when the alternate hypothesis was Population Mean > mu or Population Mean < mu. This is fixed.
4. Analysis functions like Correlation were not working when you split a dataset of class tbl_df. For example, when you converted an existing dataset using Reshape, the resulting dataset was of class tbl_df. When this was split using Data > Split dataset > For Group by analysis and an analysis like Correlation was run, an error was displayed. This has been fixed.
5. We were unnecessarily printing a count label in the table with “Analysis > Contingency Tables > Crosstab, multi-way.” This has been fixed.
6. When building a model using a formula, for example a 2nd order polynomial regression of Y on X without the intercept (i.e. the formula is lm(Y ~ -1 + X + I(X^2)) or lm(Y ~ -1 + poly(X,2,raw=T))), scoring a dataset with the model would cause the “Make Prediction” window to display the error “The predictor variables that the model requires for scoring are non available in the dataset. [1,poly,X,2, raw = T variables are not found].” This has been fixed.
7. In “Time Series > Plot Time Series (with Correlations),” the ACF and PACF plots had fractional lags; we have replaced these with integer lags (using the Acf and Pacf functions).
8. Fixed an issue with “Analysis > Time Series > Holtwinters” and “Automated Arima” reporting that the functions forecast.Holtwinters and forecast.Arima could not be found.
9. Fixed an issue with BSOZ (saved output) not displaying the contents of empty tables correctly.
10. Fixed an issue where the decision tree diagram was not getting displayed when the model name was changed from the default treeModel1.
11. The deprecated connection method "dbDriver()" has been replaced with the newer RPostgres driver for creating connections to Postgres when importing data.
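For reference, the two formula styles named in bug fix 6 of the 7.0 notes can be fitted and scored in plain R as sketched below. This is a minimal base R illustration, not BlueSky's scoring code; the simulated data and the object names (train, new_data, m1, m2) are assumptions made up for the example.

```r
# Simulated training data (hypothetical): Y is quadratic in X, no intercept
set.seed(1)
train <- data.frame(X = 1:20)
train$Y <- 2 * train$X + 0.5 * train$X^2 + rnorm(20, sd = 0.1)

# The two equivalent formula styles quoted in the release note
m1 <- lm(Y ~ -1 + X + I(X^2), data = train)
m2 <- lm(Y ~ -1 + poly(X, 2, raw = TRUE), data = train)

# Scoring only needs the raw predictor X in the new data; the formula
# terms I(X^2) and poly(X, 2, raw = TRUE) are rebuilt by predict()
new_data <- data.frame(X = c(21, 22))
p1 <- predict(m1, newdata = new_data)
p2 <- predict(m2, newdata = new_data)
print(cbind(p1, p2))
```

Both models describe the same fit, so the two prediction columns agree up to numerical tolerance. The only variable the scoring dataset has to contain is X, which is the situation the fixed error message was misreporting.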
=====================================
BlueSky Statistics 6.30 Release Notes
=====================================

New Features
============
1) We have added support for Linear Mixed Models. See Model Fitting -> Mixed Models, basic. We have also created a document that explains our approach; this can be found in the \BlueSky Statistics\Docs folder. Sample datasets are available in the BlueSky Statistics\Sample Datasets and Demos folder. A polynomial control is available with Linear Mixed Models.
2) Support for scoring Linear Mixed Models has been added. Select the model created on the top right-hand side of the screen, then click the score button.
3) We have removed the grey border when generating APA style tables.
4) Subsequent clicks on the 'Export to Word' option associated with tables in the output window now append to the same Word document that was launched the first time this button was clicked. Performance has been significantly improved.
5) An icon is displayed in the columns in the datagrid to show the type of variable, i.e. factor, ordered factor, string, numeric, date or logical.
6) The titles of the tabs and the text have been modified in the Tools -> Configuration settings dialog to make them more meaningful.
7) A new configuration setting has been introduced under the 'Advanced' tab of the dialog displayed when clicking on Tools -> Configuration settings. This allows the user to control the conversion of character/string variables to factor for a new dataset. This option can be turned on or off.
8) A field called 'User R library path' has been added to Tools -> Configuration settings under 'Path Settings'. Additional R packages that users manually install will be installed in the path specified in 'User R lib', whether installed from CRAN or from a ZIP file on disk.
9) A new dialog has been added for computing dummy variables. This can be found under Data -> Compute New Variables -> Compute Dummy Variables.
   You can specify which level to treat as a reference and whether dummy variables should be created for reference values. This was done at the request of users of the Model Tuning dialogs, where several of the algorithms require factor levels to be dummy coded.
10) Several users have requested the ability to change the order of factor levels. Some of these requests originated from a need to control the order in which levels are displayed in charts like plot of means; for example, the plot of means chart displayed means for December, November and October (incorrect order) instead of October, November, December (correct order). Other use cases and the associated new dialogs are listed below:
      Data->Factor Levels->Add New Levels (Adds new levels to a factor)
      Data->Factor Levels->Display Levels (Displays levels of a factor)
      Data->Factor Levels->Drop Unused Levels (Drops unused levels of a factor)
      Data->Factor Levels->Label NA as Missing (Labels missing values with any caption you specify)
      Data->Factor Levels->Lumping into 'Other' (Provides options for lumping factor levels into the "other" level, i.e. lump the least frequent, or keep the most common or least common levels, or keep the levels that constitute at least a certain proportion or at most a specified proportion; the rest will be lumped into "other".)
      Data->Factor Levels->Reorder by Count (Reorders factor levels by descending/ascending order of the counts at each level)
      Data->Factor Levels->Reorder by Occurence in Dataset
      Data->Factor Levels->Reorder by One Other Variable (Levels in one factor variable are reordered based on the results of an arithmetic function applied to another variable)
      Data->Factor Levels->Reorder levels Manually (The user specifies the order of the levels)
      Data->Factor Levels->Specify levels to keep or replace by 'Other'

Bug Fixes
=========
1) Fixed a crash that would occur when you merged datasets.
   This happened when you moved a dataset to a target variable list while the source variable list was empty and then moved that dataset back to the source variable list.
2) Fixed an issue with plot of means where the X axis labels were not getting displayed correctly for factor variables.
3) When you have a variable in a dataframe with all NAs, it is displayed as a 'logical' class/type in R. When such a dataset was loaded in the grid, the 'logical' column was showing 'True' instead of NAs. This issue has been fixed. A logical column with all NAs will now show NA in the datagrid.
4) When a user edits a value in a factor column/variable in the datagrid and enters a value that does not match a valid level, we now show a message to let the user know that no valid level of the factor matches the entered value.
5) Fixes to dialog editor crashes.
6) Dialog Designer: drag and drop now works correctly in the preview mode.
7) Templated dialogs (e.g. Crosstab, Multi Way) were not working when executed on a dataset that had a split set. See Data->Split dataset->For Group By Analysis->Split to split a dataset.
8) "Data -> Reload Dataset from File" was not working on datasets that had an underscore in the file name. This has been fixed.
9) "Tools -> Package -> R Package Details" was not working if an empty dataset was loaded in the grid, but it worked if a non-empty dataset was active in the datagrid.

=====================================
BlueSky Statistics 6.20 Release Notes
=====================================

Application improvements
==========================
1) Made improvements to data entry into a new (blank) dataset. The arrow keys allow you to move between cells like in Excel. Hitting the Enter key moves to the cell below so that you can enter values.
2) We always preserve the order of the variables in the source variable list on drag and drop. The order matches the order of variables in the dataset.
3) Comments/notes can be added to the analysis in the Output window. Clicking on the note icon above the analysis allows you to add notes. These notes are persisted when the output is saved.
4) You can delete the output associated with the analysis by clicking on the new trash bin icon displayed at the beginning of the output associated with the analysis.
5) Support for copy and paste of datasets from Excel or tables from the Windows clipboard into the BlueSky Statistics datagrid. You must click the paste option available from the File menu on the datagrid window or the paste option available when you right click on the datagrid.
6) We have made the following improvements:
   - Added a new icon to show/hide the output navigation tree. This is available on the top left of the output window. The output navigation tree allows you to easily navigate the results of the analysis in the output window. This is useful when there is a large volume of analysis generated in the output window.
   - In the navigation tree, if the parent node of an analysis is clicked, the analysis output is scrolled to bring the title of the analysis into the visible area of the output window.
   - You can right click on a node in the navigation tree and delete that node and the related data from the output navigation window.
   - Provided a selection combobox in the output navigation panel that allows you to filter the items displayed in the output window.
   - The 'Default' selection mode option was not working correctly. It is now fixed.
   - The title of the analysis in the navigation tree is renamed to "Title" (earlier we called it "Header").
   - In place of "Dataset" we now display "Dataset Name" in the navigation tree.
   - In the navigation tree, in place of "Command" for R syntax, we now show a part of the syntax that was executed.
   - Different icons were implemented in the navigation tree for Title, Notes, Dataset name, R Syntax, Tables, and Graphics.
7) The Output viewer has been updated to show the navigation tree as available in the output window. The show/hide navigation icon is available on the top left of the Output window.
8) The data grid window has been improved to support the directional arrow keys.
9) Previously, when you opened a dataset in BlueSky Statistics, we would create an R data frame object with a sequentially incremented name, e.g. Dataset1, Dataset2, Dataset3. We now create an R data frame object with the name of the file you are opening. So if you open a dataset called "Cars.csv", the R data frame object will be called "Cars.csv". If you open an Excel file called "Cars.xlsx", it will be called "Cars.xlsx". If there are spaces, underscores or special characters in the name of the file, we put a period (.) in their place when generating the R data frame name. If the filename begins with a number, we prefix the R data frame name with the letter "D".
10) If you open a dataset that is already open, we will refresh and bring focus to that dataset in the UI datagrid.
11) When you open a BlueSky Statistics dialog, that dialog retains focus and stays on top of all the windows. You must click "Ok" or "Cancel" to continue using the application.
12) A new output window can be opened using File -> New -> New Output Window.
13) A saved output file can be opened using File -> Open Output.

Analysis and Modeling
=====================
1) Added support for basic mixed models under the top level menu Model Fitting->Mixed Models. We currently support a single nesting unit. We will be adding additional capabilities in upcoming releases.
2) Better scoring support for multinomial logistic and ordinal regression models.
3) Multi-way crosstabs now display the odds ratio.
4) Added a configuration setting ("Show actual p-values (e.g. p-value=.0002) in the output instead of '<.001***' ") for displaying actual p-values. NOTE: The default setting is unchecked.
5) Updated the version of R we use to R 3.6.1. (NOTE: R 3.6.1 requires .NET 4.6.1; you will need this version of .NET or above.)
6) Correlation is a new sub menu under Analysis.
7) The following dialogs have been updated:
   - Make Predictions (the required field indicator was not aligned)
   - Principal Component Analysis (PCA) dialog has explanatory text added
   - Ordinal Regression dialog has explanatory text added
8) Added the following new datasets to the "BlueSky Statistics\Sample datasets and Demos\Model Fitting and Scoring" folder:
      iris.rdata for multinomial regression
      titanic test and training datasets for logistic regression
      Housing dataset for ordinal regression

Defect Fixes
==============
1) In Multi-Variable One Sample T-Test, fixed an issue with summary statistics not getting displayed for one level of a factor variable.
2) Fixed an issue with manually expanding/shrinking the output window with the mouse: instead of resizing the output window, it resized the BlueSky Statistics R command syntax editor.
3) Fixed a crash that occurred when a user tried to modify and save the default syntax script.
4) An extra asterisk (*) was displayed in the Crosstab table title if no layer variable was selected. This is now fixed.
5) The sum of squares table in linear regression was summing the mean square table instead of the sum of square table. This has been fixed.
6) Fixed an issue where characters in long column names were getting cut off in the tables that displayed the results of the analysis.
7) When generating syntax for the "Bar Chart" we were forcing as.numeric on
8) Minor syntax fix to the Summary Statistics, Selected Variables dialog.
9) Setting contrasts was not working correctly; we have introduced new dialogs for setting contrasts.
10) Distribution dialogs will work even if an empty dataset is open (or active). For other dialogs, an appropriate message is shown.
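The data frame naming rule described in item 9 of the 6.20 application improvements can be sketched in base R as follows. This is only an approximation inferred from the wording of the note, not BlueSky's actual implementation: the helper name make_df_name is hypothetical, and treating every character other than a letter, digit, or period as a "special character" is an assumption.

```r
# Hypothetical helper (not part of BlueSky Statistics): derive an R data
# frame name from a file name, following the rule stated in the note.
make_df_name <- function(file_name) {
  # Assumption: anything that is not a letter, digit, or period counts as
  # a "special character" and is replaced with a period
  nm <- gsub("[^A-Za-z0-9.]", ".", file_name)
  # Names that begin with a digit are prefixed with the letter "D"
  if (grepl("^[0-9]", nm)) {
    nm <- paste0("D", nm)
  }
  nm
}

print(make_df_name("Cars.csv"))           # "Cars.csv"
print(make_df_name("my data_2019.xlsx"))  # "my.data.2019.xlsx"
print(make_df_name("2019 sales.csv"))     # "D2019.sales.csv"
```

Note that a name such as Cars.csv is a syntactically valid R object name because periods are allowed in identifiers, which is consistent with the examples given in the note.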
Dialog Builder
==============
1) The default increment for the spinner control in the dialog editor has been set to 1.
2) Checking for duplicate names now works for the spinner controls and the required field indicator.
3) Fixed the dialog editor to allow you to define rules that disable the following controls:
   -Slider control
   -Advanced slider control
   -Spinner control
4) Fixed the issue with saving R source code script files to a network share.

=====================================
BlueSky Statistics 6.1 Release Notes
=====================================
Enhancements
1. Added 52 dialogs for continuous distributions and 25 dialogs for discrete distributions. These are available under the top-level menu called Distributions.
Distributions:
Continuous:
   Beta Probabilities, Beta Quantiles, Plot Beta Distribution, Sample from Beta Distribution
   Cauchy Probabilities, Cauchy Quantiles, Plot Cauchy Distribution, Sample from Cauchy Distribution
   Chi-squared Probabilities, Chi-squared Quantiles, Plot Chi-squared Distribution, Sample from Chi-squared Distribution
   Exponential Probabilities, Exponential Quantiles, Plot Exponential Distribution, Sample from Exponential Distribution
   F Probabilities, F Quantiles, Plot F Distribution, Sample from F Distribution
   Gamma Probabilities, Gamma Quantiles, Plot Gamma Distribution, Sample from Gamma Distribution
   Gumbel Probabilities, Gumbel Quantiles, Plot Gumbel Distribution, Sample from Gumbel Distribution
   Logistic Probabilities, Logistic Quantiles, Plot Logistic Distribution, Sample from Logistic Distribution
   Lognormal Probabilities, Lognormal Quantiles, Plot Lognormal Distribution, Sample from Lognormal Distribution
   Normal Probabilities, Normal Quantiles, Plot Normal Distribution, Sample from Normal Distribution
   t Probabilities, t Quantiles, Plot t Distribution, Sample from t Distribution
   Uniform Probabilities, Uniform Quantiles, Plot Uniform Distribution, Sample from Uniform Distribution
   Weibull Probabilities, Weibull Quantiles, Plot Weibull Distribution, Sample from Weibull Distribution
Discrete:
   Binomial Probabilities, Binomial Quantiles, Binomial Tail Probabilities, Plot Binomial Distribution, Sample from Binomial Distribution
   Geometric Probabilities, Geometric Quantiles, Geometric Tail Probabilities, Plot Geometric Distribution, Sample from Geometric Distribution
   Hypergeometric Probabilities, Hypergeometric Quantiles, Hypergeometric Tail Probabilities, Plot Hypergeometric Distribution, Sample from Hypergeometric Distribution
   Negative Binomial Probabilities, Negative Binomial Quantiles, Negative Binomial Tail Probabilities, Plot Negative Binomial Distribution, Sample from Negative Binomial Distribution
   Poisson Probabilities, Poisson Quantiles, Poisson Tail Probabilities, Plot Poisson Distribution, Sample from Poisson Distribution
2. We have improved support for Rasch models by adding new dialogs for:
   Model Fitting->IRT->Multi-faceted Simple Rasch Models
   Model Statistics->IRT->Likelihood Ratio tests
   Model Statistics->IRT->Personfit Statistics
3. When working with date variables, the UTC offset is displayed in the variable grid. This is available as a new column.
4. In logistic regression, we have added support for a ROC Table. Note: For large datasets, this can introduce a performance penalty, so use this carefully. The ROC Table option is available when scoring a dataset with a logistic model. You can access it by first creating a logistic regression model, selecting that model at the top right of the screen, and clicking the Score button.
5. Added p-value statistics to Ordinal Regression.
6. Improvements to File->Load Data from Package. You don't have to specify an R package. By default, we display all the datasets in the currently loaded packages. You can optionally load data from a single package as well. Additionally, we show a message when the selected package does not contain a dataset.
7. For variables with the POSIXct class that are open in the BlueSky Statistics data grid, we show the offset from UTC in the variable grid as a separate column.

Bug Fixes
1. When you opened an Excel file with a date, that date was not displayed correctly in BlueSky Statistics. We were incorrectly shifting the date by the UTC offset. This has been fixed.
2. When you created a new POSIXct date variable using the menu under Data->Dates->Convert String To date, the date was displayed with the incorrect UTC offset. This has been fixed.

====================================
BlueSky Statistics version 6.0
====================================
Graphics and Visualizations
---------------------------
1) The Graphics menu is now sorted alphabetically to make it quicker to find what you need.
2) Graphics syntax is now easier to read.
3) Global graphics themes are now available. In prior releases, you could set themes on each graphic dialog. Now you set themes once in the main application or output window by clicking on themes in the top menu bar. You can save a default theme and override a theme. You can also set default fonts, font size, image height and width.

New Graphics Types
------------------
1) Contour charts
2) Bar Charts (with means)
3) Improvements to the original Bar Chart
4) Pie Charts
5) Bulls eye charts
6) Line Chart (drawn in the order in which variables appear)
7) Line Chart (line drawn in the order of X axis variable)
8) Line Chart (stair step plot)
9) Improvements to PP and QQ plots to support beta, chisq, gamma, f, exp distributions
10) Scatter plot with square bins
11) Scatter plot with hex bins
12) Violin plots

Data Management
---------------
1) File-> Load Dataset from R Package added to import practice datasets from R packages
2) New options to "Remove extra spaces" from factor/character variable names when opening SPSS datasets
3) Data-> Re-order Variables in the Dataset Alphabetically gives you the ability to order the variables alphabetically or reverse alphabetically.
4) Data->Reload Dataset from File restores the dataset from the file on disk.

Scoring and modeling
--------------------
1) Enhanced logistic regression to support dependent variables of class logical and numeric.
2) Made scoring more robust when dependent variables are string and logical for decision tree and random forest models.
3) When scoring a dataset and saving numeric predictions and predicted probabilities, the values are rounded, honoring the number of decimal digits setting in Tools->Configuration Settings on the Others tab.

Dialog Designer
---------------
1) New icons for controls
2) New Slider control
3) New Advanced slider control that displays the value of the slider in a textbox
4) New spinner control added (This is the last control on the grid; you need to click on the dropdown on the left-hand side grid displaying all the available controls to see the spinner control.)

General Features
----------------
-- All required fields on the dialogs are indicated by a red asterisk.
-- A link "Coming Soon" has been added to the top menu bar to list the capabilities being evaluated for delivery in upcoming releases
-- The default dataset that loads when the application is launched now contains NA instead of zeros as before
-- Dataset navigation buttons at the bottom right of the data grid now enable/disable depending on whether there are more columns to load or either end has been reached

Syntax Editor
--------------
1) R syntax pasted from the dialogs is much simpler and easier to read
2) Long lines of ggplot code can now be split by ending at the “+” sign

Output
------
1) Font choices have been improved throughout (matching APA style)
2) Row/column headers in output tables are center aligned
3) The dataset name is printed at the top of each analysis in the output.
4) Pasted syntax now includes the title of the dialog, which helps a user easily locate specific pasted syntax
5) The dataset split message that appears in the output has been made bigger so the split slice is easier to locate

Analysis
--------
1) Significance code with '***' is now shown as '(<.001)***' in all output tables
2) The logistic regression dialogs now accept numeric dependent variables, in addition to the usual factors
3) The vertical bar symbol has been added to the GLZM dialog and linear model dialogs to support the creation of nested models
4) Item Response Theory (IRT) analysis is now available on the Model Fitting menu. Included are: Simple Rasch models, Partial Credit models, and Rating Scale models
5) Changed the labels in Multi-way Anova to indicate that 2 factor variables need to be specified as independent variables
6) Added support for calculating McDonald's omega estimates in reliability analysis

Defect Fixes
------------
1) Fixed an issue where you could not load dialogs created with the Dialog Designer application in the main BlueSky Statistics application.
2) A '#' appearing in a color code (e.g. #112233) will no longer be ignored by the syntax parser
3) Fixed an issue where "Reload dataset from file" was not refreshing the variable grid
4) Fixed a defect where, if there were statements above or below a local() block and the whole syntax was run together, the last line after the local() block got trimmed. For example, if the last line was BSkyLoadRefreshDataframe(), it was read as "frame()"
5) If a user saves (Save As) an in-memory dataset (currently loaded in the data grid) to a file on disk that is already loaded in the data grid, the user will not be allowed to overwrite the file
6) Fixed a crash where we were trying to call a function from the BlueSky R package when the BlueSky R package was not loaded correctly
7) Fixed an issue with Hierarchical Clustering where the number of rows in the dataset that were assigned to a cluster was printing incorrectly.
Also addressed an issue where the cluster numbers were not getting saved to the dataset.

Minor fixes
-----------
1) Corrected a defect in the McNemar and Fisher tests in crosstabs
2) Improved formatting of the Hosmer-Lemeshow test
3) Fixed an issue where a real number is entered in a cell of a variable of class integer. The user now gets a prompt stating that a real number has been entered and can optionally change the class of the integer variable to numeric

====================================
BlueSky Statistics version 5.40
====================================
NEW FEATURES:
--------------
1) Added support for weighted datasets. This option is available in Data -> Set Weights. Once you specify the weighting variable, we create a new dataset with rows replicated as defined in the weights. This is similar to what SPSS does internally. When you run frequencies, independent sample T Test, graphics commands, statistical tests etc. on the new dataset (in BlueSky Statistics) with the rows replicated, you will see results identical to what you see in SPSS.
2) Added the option to specify a weighting variable in Linear and Logistic regression. This allows an optional vector of weights to be used during the fitting process.
3) Added support for logistic regression under 'Model Fitting'. Once the model is built, you can score the dataset and optionally obtain a confusion matrix, model statistics and an ROC curve by selecting the model and clicking Score. This is available in the top right-hand corner of the main application window.
4) We have updated the "Multi Way Anova" dialog with the following capabilities:
   -display contrasts.
   -interaction plots.
   -support for type I, type II and type III tests.
   -pairwise comparison.
5) New reshape dialog with simplified R syntax has been added.
6) The "Multi variable one sample T-Test" and "Multi variable independent sample T-Test(with factor)" have been updated to allow you to specify the alternative hypothesis.
7) Added capabilities to support date manipulations.
   -The "String to date" dialog allows you to convert a string to the POSIXct date class.
   -The "Date to string" dialog allows you to convert a date (POSIXct and Date class) to a string.
8) Simplified the syntax for frequencies and factor analysis.
9) If you launch a second instance of BlueSky Statistics, the message that gets displayed has been improved. NOTE: You can only have one BlueSky Statistics instance running at a time.
10) We have improved the ability to browse the contents of the output window in BlueSky Statistics. This can be accessed from the menu (Layout > Show Navigation Tree).
11) Items in the output window can now be deleted. To delete an item from the output, just right click on the item (table/text/graphics) and choose the "Delete" option.
12) To make the output visually more appealing, we have introduced an option to hide the R syntax that gets displayed in the output window. This is controlled by an option in the configuration window (Tools > Configuration Settings > Output tab). By default we hide the R syntax in the output window.
13) When the option to show R syntax in the output is turned on (Tools > Configuration Settings > Output tab) and you resize the output window, we wrap the R syntax so that it is always visible.
14) To run any line of R syntax, just place the cursor on that line and hit the RUN button; you don’t have to select the entire line.
15) If you want to run your R script line by line, place your cursor on the first line and hit RUN. The cursor will automatically move to the next line and you can hit the RUN button again. This feature works for simple R syntax which does not span multiple lines.
   Example 1: 3 lines below work
      a=10
      b=20
      c=a+b
   Example 2: 4 lines below will not work
      if(TRUE)
      {
      print("Great")
      }
16) Added helpful hints to indicate you have reached the beginning and end of a dataset when scrolling wide datasets. This is available on the paging controls in the bottom right-hand corner of the screen.
17) Application launch has been made faster.
18) In the BlueSky Statistics syntax editor, just like a comma, the pipe (%>%) can be used to break a long R code statement.
19) When the application launches, we open a new blank dataset that is populated with zeros. You can right click on a row to delete it or go into 'Data -> Delete Variables' to delete variables.
20) Clicking on a variable name in the data grid sorts in ascending order by that variable. Clicking again sorts in descending order.

BUG FIXES:
-----------
1) Fixed an issue with factor analysis when saving scores using the regression method.
2) Fixed an issue when re-editing factor levels that were previously changed in the variable grid. This would result in the dialog not functioning correctly and incorrect levels being added.
3) Fixed an issue where, if you closed the empty dataset that gets created when BlueSky Statistics is launched and then attempted to save open datasets, those datasets would not get saved correctly.
4) Fixed an issue that was limiting the number of variables that Factor analysis can be run across.
5) Within block commands that use 'local()', cat("\n") can now be printed in the output to leave some extra spaces.
6) When you added a new factor variable, renamed the variable using the user interface and then tried to add new levels, this did not work. This has been fixed.
7) Add a new factor variable. Click in the cell where the new variable name is shown. The cell goes into edit mode. Now add new factor levels to this new variable. Switch to the data grid and select a different level. Switch back to the 'Variables' tab and the application crashes. This has been fixed.
8) When any existing factor level name was modified using the user interface, a blank level automatically got added. If you tried to modify the level, it did not take effect. This has been addressed.
9) Data grid navigation buttons (on the lower right-hand side of the data grid) are disabled if there are fewer than 16 columns in the dataset.
10) Changing factor levels was not working when one or more levels had a single quote in the level name. This has been fixed.
11) Aggregate control fixed: Text above the dropdown (that contains mean, median etc.) was getting chopped off. There was a similar issue with the label text above a textbox (which was almost at the bottom of the dialog).
12) Fixed significance codes for:
   - One Sample T Test and Independent Sample T Test
   - Multivariable one sample T Test and Multivariable Independent one sample T Test (with factor)
13) Fixed a defect: when you selected some syntax and hit RUN, the cursor went to the top of the script after execution. Now it moves to the next line.
14) Data grid navigation buttons are disabled if either end of the data grid is reached. If there are no more columns on the right, the right navigation button is disabled. If there are no more columns on the left, the left navigation button is disabled.
15) The look and feel of the left navigation tree in the output window has been fixed. It now has a cleaner look. To access left navigation go to Layout -> Show navigation tree
16) In the R syntax editor, we now ignore square, curly and round brackets that appear inside single or double quotes. See example below:
      grepl("[", "a[b", fixed = TRUE)

====================================
BlueSky Statistics version 5.35
====================================
New Features and fixes:
----------------------
1) On app exit, the user can choose not to save the output and/or R syntax, in which case the respective popups are not shown.
2) Adding new factor levels is made easy.
   - Open a dataset that has one or more factor column(s)
   - From the BlueSky Statistics main window (where the data grid is shown), click the 'Variables' tab on the bottom left.
   - Click the ellipsis button in the 'Values' column for a factor variable of your choice. This opens the 'Edit Factor Levels' dialog.
   - In the 'Edit Factor Levels' window, scroll to the bottom and you will see a blank field. Here you can enter one or more (comma separated) factor levels.
3) If the user tries to save an in-memory dataset that is loaded in the UI grid, we show the Save-As dialog so that the user can save this dataset to a file.
4) Opening RData files with multiple data.frame objects is now supported.
5) The license window was not showing the Browse/Activate buttons properly when the license had expired. (Applies to Commercial Edition)
6) Right-clicking an output table to save it to a PDF is now supported.
7) If the APA flag is turned ON in the configuration settings, exporting a single output table or the whole output to PDF exports tables in APA style.
8) Right-click anywhere in a data grid row to see the context menu.
9) If the user types character data into a numeric column, we show a warning asking whether the user wants to change the 'class' of this column to 'character'.
10) The Plot dialog now supports factor variables in list boxes for X and Y.
11) An improved RData warning dialog is shown when the user opens a .RData file. The detailed message is only shown if the user clicks the 'Advanced' button.
12) A flag has been added in configuration settings (under the tab named 'Others') for turning ON/OFF the RData warning dialog. If you turn it OFF, you do not get a warning dialog when opening a .RData file.
13) If a dialog requires one or more R packages to run and the user pastes the syntax, the syntax for loading the required package(s) is pasted above the dialog syntax in the R command editor.
14) Underline removed from the column headers when the output table is exported to MSWord.
15) Saving an RData file with another name (in RData format) closes the current dataset and loads the newly created RData file. During this operation the RData warning dialog is not shown even if it is turned ON in configuration settings.
16) If R is not found at app launch time, we show a configuration window where the user can set the R path. After setting the correct R path, the user must relaunch the application for the change to take effect.
17) Selecting a block of cells in the data grid and then pressing the DELETE key shows a warning message; deleting multiple rows at once is not allowed.
18) A data grid column can now be sorted by clicking its column header. The first click sorts in ascending order and the second click changes the sort order to descending. Each further click toggles the sort order.
19) Shading style removed from the 'OK', 'Cancel' and 'Syntax' buttons in the dialogs.
20) The default install location is now "Program Files".
21) The R command editor is hidden at launch time to leave more space for output. To open the R Command Editor, click the arrow button in the top right corner. Click the same button again to hide the R Command Editor.

New Dialogs
-----------
-Ancova
-Reliability Analysis
-Summarizing Models for each group
-Reshape (long to wide)
-Reshape (wide to long)

Old dialogs improved
--------------------
-Frequency Table (clean syntax)
-Plot (factor is supported for X and Y)
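
The two Reshape dialogs listed under New Dialogs convert a dataset between wide and long layouts. As a rough illustration only, the sketch below performs the same kind of conversion with base R's reshape() function. The data frame and its column names (id, score_1, score_2) are made up for the example, and the use of reshape() is an assumption: the syntax generated by the BlueSky Statistics dialogs may differ.

```r
# Hypothetical wide dataset: one row per subject, one column per measurement.
wide <- data.frame(id      = 1:3,
                   score_1 = c(10, 20, 30),
                   score_2 = c(11, 21, 31))

# Wide to long: stack score_1 and score_2 into a single "score" column,
# with a "time" column recording which measurement each row came from.
long <- reshape(wide,
                direction = "long",
                varying   = c("score_1", "score_2"),
                v.names   = "score",
                timevar   = "time",
                times     = 1:2,
                idvar     = "id")

# Long to wide: spread "score" back out into one column per value of "time".
wide_again <- reshape(long,
                      direction = "wide",
                      idvar     = "id",
                      timevar   = "time",
                      v.names   = "score",
                      sep       = "_")

print(long)
print(wide_again)
```

In the long layout each subject appears once per measurement, which is the shape most modeling and plotting functions expect; the wide layout keeps one row per subject.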