Absolute Values

 

   

 

Frequently Asked Questions

 

 

Installation: Application will not install.

Is SAS Base Version 8.2 or higher installed? EZFit will not function with earlier versions of the SAS System.

Is SAS/STAT installed?  The SAS/STAT component is required.  It contains “procs” that are necessary in the construction of logistic regression models.

Is SAS Integration Technologies installed? This component allows EZFit to launch and control a background SAS session.  SAS Integration Technologies is automatically installed when SAS 9 is installed.  For version 8.2, it is necessary to download and install SAS Integration Technologies from the SAS web site at www.sas.com.

Is Microsoft .NET Framework v 1.1 installed?  EZFit requires version 1.1 of the .NET Framework.  This is provided through Windows Update by Microsoft.

Licensing: Problems with licensing. 

Did you request a license? The initial registration license is temporary.  You must request a license by clicking "request license" upon launch of EZFit. 

Have you changed hardware since installation and licensing of EZFit? Changing out hardware may require reassignment of your license key.  Please contact licensing@absolutevalues.net  to determine if this is necessary.

Are you working offline?  If so, the license request e-mail was not sent.  Please contact licensing@absolutevalues.net for more information.

License request e-mail not sent?   It is not uncommon for a corporate firewall to prevent EZFit from sending the e-mail that requests a license.  Please save the file when prompted and forward it to: licensing@absolutevalues.net.


Launch: Application does not run.

Are you running SAS version 8.2?  If so, the most recent hot fix for The SAS System must be applied for EZFit to function. A problem in SAS 8.2 prevents communication between EZFit and The SAS System.  Hot fix 82BB28 corrects this problem.  Applying the most recent hot fix bundle available for Version 8.2 from www.sas.com will enable EZFit to function correctly.  If you are running Version 9, EZFit requires no hot fix.

Is your SAS license valid? An out-of-date SAS license will prevent the application from running.

Is SAS Integration Technologies installed?  This is a necessary component for the proper functioning of EZFit.

Have you removed any required components since installing EZFit?  Removal of SAS, SAS/STAT, SAS Integration Technologies or Microsoft’s .NET Framework will result in EZFit not functioning.  If necessary, reinstall removed components and retry EZFit.  Note that uninstalling EZFit also removes the SAS Integration Technologies components, so SAS Integration Technologies must be reinstalled afterward.


Build:  This section contains answers to questions that commonly arise during model construction.  If you do not find an answer to your question here or in the program help files, please send a detailed e-mail to: support@absolutevalues.net. A consultant will usually respond within one business day.

Step 1 - Preprocess Data

Should I start with a full file or samples?  It depends on what you have available.  If you have access to the full file and choose to start with that, then EZFit will create modeling samples for you.  For example, if you are modeling response for a future campaign based on a past campaign, the full file would contain all mailed and delivered customer records from the past campaign appended with all candidate variables.  If, instead, you are provided with a sampling of mailed records, or if you prefer to create your own samples, then choose to start from samples.  This option will skip EZFit’s sampling procedure.

When would I use a stratifier variable? If your starting file contains an uneven distribution based on some variable, you may wish to stratify on this variable.  For example, in building a response model from a past campaign, you notice that the past campaign used a variable to segment the mail: All customers from groups A, B and C were mailed, while only portions of groups D, E, and F were mailed.  In this case, you may want to stratify on the “group” variable.  This will ensure that “enough” observations from each group are selected for modeling.
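
As an illustration only, the idea of stratifying can be sketched in a few lines of Python (EZFit itself runs SAS and its actual sampling procedure is not documented here). The function name, the `per_group` parameter and the toy records below are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratifier, per_group, seed=0):
    """Draw up to `per_group` records from each level of the stratifier."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[stratifier]].append(rec)
    sample = []
    for level in sorted(groups):
        members = groups[level]
        k = min(per_group, len(members))  # small groups contribute all they have
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical file: group A was fully mailed, group D only partly mailed.
records = [{"group": "A", "id": i} for i in range(1000)]
records += [{"group": "D", "id": i} for i in range(1000, 1050)]

sample = stratified_sample(records, "group", per_group=40)
counts = {g: sum(1 for r in sample if r["group"] == g) for g in ("A", "D")}
print(counts)  # {'A': 40, 'D': 40}
```

A simple random sample of 80 records from this file would be expected to contain only about 4 records from group D; stratifying guarantees 40.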

Why are some variables automatically set to “exclude”?  Variables set to “exclude” will not be included as candidates for model construction.  This option spares you the work of creating a dataset that contains ONLY modeling variables. EZFit recognizes certain variable names (like street address and name) and excludes them because they are not appropriate for modeling.  Other variables may be set to exclude because they are character variables with a large number of different values.  You can override the recommendations, but variables like name and street address are useless in modeling and will severely increase the CPU load and processing time.

When I click “Reset table to original values”, my table does not reset.  Why? After changes to the “Variable Information” table are accepted, the newly accepted version becomes the basis for “original values”.   The only way to revert to the initial table after accepting changes is to start over.  This is by design. 

I made changes to the table, but I do not want to keep them.  How can I do this?  Choose the “Discard Dataset Changes” button and then click “Continue”.

I made changes to the table, but they were not accepted.  Why?

An error message is displayed.  The message box explains why the changes were not acceptable.  The following situations will result in rejection of your changes:

  • More than 1 dependent variable

  • More than 1 stratifier variable

  • A dependent variable with values other than 0 and 1

  • Missing values in stratifier or dependent variable

No error message is displayed.  Users frequently forget to select the “Accept Dataset Changes” button. Go back and ensure that the “Accept Dataset Changes” button is selected. 
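
The rejection rules listed above can be expressed as a small check. This Python sketch is illustrative only: the role names, the data layout and the function name are assumptions, not EZFit's internals.

```python
def check_table_changes(roles, data):
    """Return the reasons the changes would be rejected (empty list if none).

    roles: maps variable name -> "dependent", "stratifier", or any other role
    data:  maps variable name -> list of values (None marks a missing value)
    """
    problems = []
    dependents = [v for v, r in roles.items() if r == "dependent"]
    stratifiers = [v for v, r in roles.items() if r == "stratifier"]
    if len(dependents) > 1:
        problems.append("more than 1 dependent variable")
    if len(stratifiers) > 1:
        problems.append("more than 1 stratifier variable")
    for var in dependents + stratifiers:
        if any(x is None for x in data[var]):
            problems.append(f"missing values in {var}")
    for var in dependents:
        # Missing values are reported separately, so skip them here.
        if any(x not in (0, 1) for x in data[var] if x is not None):
            problems.append(f"{var} has values other than 0 and 1")
    return problems

# Hypothetical table: the dependent variable contains a 2,
# and the stratifier has a missing value.
roles = {"resp": "dependent", "group": "stratifier", "age": "candidate"}
data = {"resp": [0, 1, 2], "group": ["A", None, "B"], "age": [30, 41, 52]}
problems = check_table_changes(roles, data)
print(problems)
# ['missing values in group', 'resp has values other than 0 and 1']
```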

After I click continue, the frequencies are not what I expected.  Why?  The displayed frequencies are based on the starting file that was designated in Step 1.  The only possible explanation is that you did not select the intended dataset.  Start over and select the correct dataset. 

The “Begin Stratified Sampling” (or “Begin Non-stratified Sampling”) button is my only choice and this is not what I want.  Why? The button offered depends on the “Variable Information” table: “Begin Stratified Sampling” appears when you designated a stratifier variable, and “Begin Non-stratified Sampling” appears when you did not.  Start over and correct the problem by changing the variable’s class designation.


Step 2 - Create Samples

Should I accept the proposed sampling criteria?  Unless you have a specific sampling scheme in mind, it is best to accept the proposed sampling criteria.  If you wish to do so, choose “Yes” and then click “Continue”.  Otherwise, choose “No” and click “Continue”. 

Do I need both development and validation samples?  For determining how well the model predicts for a different dataset from the same population, a validation sample is STRONGLY recommended.  The model is constructed using the development sample, and it is “validated” using a validation sample.  Although this is the ideal situation, sometimes there are not enough observations to have both development and validation samples.  In this case, it is not easy to “validate” the model.  When this situation arises, we recommend testing the model on a subset of the population or using some alternative form of validation like bootstrapping.  EZFit does not provide a process for bootstrapping. 

What are target and non-target values?  The target value is the desired value of the dependent variable (the thing you are trying to predict).  The target value is usually 1, and the non-target value is usually 0. 

What ratio of non-target to target value is preferable?  Usually, the original dataset has many times more non-target valued observations than target valued observations.  In order for logistic regression to identify differentiating factors among these, it is best to sample down to a reasonable ratio.  Although there is no standard, accepted ratio, we recommend using approximately 5 times as many non-target values as target values.  The ratio would then be 5:1.  Note that you can use weighting in the validation process to index the model results to the actual population ratio. 
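
To make the 5:1 recommendation concrete, here is a minimal Python sketch of sampling down the non-target records. It is an illustration under stated assumptions (the function name, field name and toy counts are hypothetical), not a description of EZFit's sampling code.

```python
import random

def sample_to_ratio(records, target_field, ratio=5, target_value=1, seed=0):
    """Keep every target record and at most `ratio` non-targets per target."""
    rng = random.Random(seed)
    targets = [r for r in records if r[target_field] == target_value]
    non_targets = [r for r in records if r[target_field] != target_value]
    keep = min(len(non_targets), ratio * len(targets))
    return targets + rng.sample(non_targets, keep)

# Hypothetical file: 20 responders among 1,000 mailed records (49:1).
records = [{"resp": 1} for _ in range(20)] + [{"resp": 0} for _ in range(980)]

dev = sample_to_ratio(records, "resp", ratio=5)
n_target = sum(r["resp"] for r in dev)
n_non_target = len(dev) - n_target
print(n_target, n_non_target)  # 20 100
```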

What if the counts are not what I expected?  How do I know?  After the samples are created, EZFit will display frequencies for each of the samples created (development and, if present, validation).  If you do not like the distributions, you can revise the sampling criteria until desirable samples are achieved. 


Step 3 - Bin Variables

I accepted the default values and started binning.  Is my EZFit session frozen?  More than likely, EZFit is working hard to find an appropriate binning strategy for each variable in your development file.  This is a highly CPU intensive step in the process.  If you have several hundred variables and thousands of observations, the binning process can take 2-3 hours.  The time required depends directly on the number of variables and the number of observations in your development file.  Be patient.  This is a highly complex process that, when done manually, can take a week or more.   

Once the binning is complete, how do I see what EZFit did?  You can click a variable name in the table presented to see individual variable binning results for non-missing values.   If you have several hundred variables, don’t worry.  At the end of the process, EZFit provides output for each model variable and its binning scheme.  You will also have SAS code to score a new file, which includes code to bin each model variable.  This means that to score a new file, only the model variables in their original forms are required.  

Why am I only allowed to change the “Use Original” value for some of the variables in my data set?  Because logistic regression requires numeric variables, it is not possible to use the original form of character variables. For character variables, the "Use Original" option will automatically be set to "N/A".

When would I want to “Use Original” as opposed to the binned form of a variable?  It is STRONGLY recommended that you use the binned form for a majority of the variables.  Using original, non-binned forms of variables can stress the logistic regression procedure and produce unpredictable results.  In some cases, however, your business understanding of a variable may call for a different grouping than the one EZFit produces.

For example, perhaps there is a variable that indicates customer channel preference with original form:

1 – walk-in, 2 – web, 3 – catalog, 4 – phone 

EZFit may create 2 bins because of similarity in the target rate among web, phone and catalog customers:

1 – walk-in,  2 – web, phone and catalog 

However, your management team may prefer to have all 4 categories present because of other differences like purchase frequency or cost of acquisition, or because other selection rules are applied at the time of list generation based on the channel preference variable.  In this case, choose “Yes” for Use Original, and the variable’s original form will be used.
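
The difference between the two forms in the channel preference example can be shown with a short Python sketch. The mapping below simply restates the example; how EZFit actually derives or stores its bins is not covered here, and the names are hypothetical.

```python
# Original coding of the channel preference variable from the example.
CHANNEL_LABELS = {1: "walk-in", 2: "web", 3: "catalog", 4: "phone"}

# Binned form from the example: walk-in alone; web, catalog and phone together.
CHANNEL_BINS = {1: 1, 2: 2, 3: 2, 4: 2}

def channel_value(code, use_original):
    """Return the value that would enter the model for one customer."""
    return code if use_original else CHANNEL_BINS[code]

customers = [1, 2, 3, 4, 2]
binned = [channel_value(c, use_original=False) for c in customers]
original = [channel_value(c, use_original=True) for c in customers]
print(binned)    # [1, 2, 2, 2, 2]
print(original)  # [1, 2, 3, 4, 2]
```

With “Use Original” set to “Yes”, all 4 categories remain distinguishable downstream; with the binned form, web, catalog and phone customers become indistinguishable.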

When would I want to subset variables?  If you have a large number of variables and wish to construct a "quick" model, you may choose to restrict the set of candidate variables.  This will reduce the number of candidate variables used in "proc logistic" based on the parameters entered.   EZFit allows two options for restricting variables: minimum Chi Square value and number of variables. 
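
The two restriction options can be pictured with the following Python sketch. It assumes each candidate variable already has a Chi Square value; the function name, the use of "greater than or equal to" for the minimum, and the example scores are all assumptions for illustration.

```python
def subset_variables(chi_square, min_chi_square=None, max_variables=None):
    """Restrict candidates by minimum Chi Square value and/or by count."""
    ranked = sorted(chi_square.items(), key=lambda kv: kv[1], reverse=True)
    if min_chi_square is not None:
        ranked = [(name, s) for name, s in ranked if s >= min_chi_square]
    if max_variables is not None:
        ranked = ranked[:max_variables]  # keep the strongest variables only
    return [name for name, _ in ranked]

# Hypothetical Chi Square values for four candidate variables.
scores = {"age": 52.1, "income": 18.4, "tenure": 3.2, "region": 9.7}

by_value = subset_variables(scores, min_chi_square=5.0)
by_count = subset_variables(scores, max_variables=2)
print(by_value)  # ['age', 'income', 'region']
print(by_count)  # ['age', 'income']
```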

What if I select subset, but I don't like that set of variables?  Do I have to start over to get back all the variables?  No.  EZFit displays the subset of variables based on your criteria.  If you don't like the subset (feel it is too restrictive or too inclusive), you can choose to apply new criteria.  To include all variables, just click on "Reset Default Values".


Step 4 - Develop Model

Depending on the selection method I choose, some of the input boxes are grayed out.  Why?  Only the valid input parameters for each selection method are available. 

I ran the selection, but it took a really long time and I would like to rerun it.  What can I do to reduce run time?  One thing you can do to reduce run time is to reduce the Maximum Number of Iterations allowed.  However, doing so may not result in an optimal model.


Step 5 - Development Results

I used forward (or stepwise) for selection method.  Why isn't the recommended number of variables the same as the number of variables displayed in the "Preliminary Model Statistics" table?  The recommended number of variables is based on a combination of the output statistics in the table.  To avoid over-fitting, you may not always want to use all variables introduced by the logistic procedure.  It is best to either accept the recommendation, or arrive at a different conclusion based on the statistics provided.

What am I looking for in "Model Performance"?  The model performance is provided by deciles; the development file is cut into 10 groups containing (nearly) equal numbers of observations.  The Target Percent should be greatest in Decile 1, and exhibit a decreasing pattern down to Decile 10.  The greater the difference (spread) observed between the deciles, the better.  With real world data, there may not always be a strictly decreasing pattern displayed.  Keep in mind that given the input parameters and the available variables, the best fitting model has been obtained.  If you don't like what you see, try using different inputs.
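
For readers who want to see how a decile table of this kind is put together, here is a minimal Python sketch. It assumes a list of (score, target) pairs and is not EZFit's own code; the toy data is constructed so that the pattern is strictly decreasing.

```python
def decile_performance(scored):
    """scored: list of (score, target) pairs, where target is 0 or 1.

    Returns (decile, n_obs, target_percent) rows; Decile 1 holds the
    highest scores.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    rows = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        hits = sum(target for _, target in group)
        pct = 100.0 * hits / len(group) if group else 0.0
        rows.append((d + 1, len(group), pct))
    return rows

# Toy data: 100 records with scores 0..99; higher scores respond more often.
scored = [(i, 1 if (i % 10) < (i // 10) else 0) for i in range(100)]

rows = decile_performance(scored)
print(rows[0])  # (1, 10, 90.0)
print(rows[9])  # (10, 10, 0.0)
```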

How do I interpret the Correlation Matrix?  The matrix shows the variables by original name along the top and then by the recoded or original name on the left side, depending on which form is used in the model.  Correlations range between -1 and 1.  Larger absolute values indicate that a given pair of variables is more highly correlated than a pair with an absolute value near zero.  If you are building a model to explain (such as models used in clinical trials), then high correlations can add confusion, resulting in the "chicken or egg" syndrome.  When models are used for prediction purposes only, higher correlations may be acceptable.  If your business has rules regarding maximum allowable correlation, then that value can be entered as a parameter in Step 4.
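
A maximum allowable correlation rule can be checked by hand along the following lines. This Python sketch uses the ordinary Pearson correlation, which is an assumption (the FAQ does not say which correlation EZFit reports), and the variable names and threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def high_correlations(columns, max_abs_corr):
    """List variable pairs whose absolute correlation exceeds the limit."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > max_abs_corr:
            flagged.append((a, b, round(r, 3)))
    return flagged

columns = {
    "spend": [10, 20, 30, 40, 50],
    "visits": [1, 2, 3, 4, 5],   # moves in lockstep with spend
    "tenure": [3, 1, 4, 1, 5],
}
flagged = high_correlations(columns, max_abs_corr=0.8)
print([(a, b) for a, b, _ in flagged])  # [('spend', 'visits')]
```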


Step 6 - Validate Model

When would I use weighting?  Typically, weighting is used to provide the validation results indexed to the overall population size/mix.  For example, if the validation file contains a 3:1 ratio of non-target to target values, but the original population actually contains a 10:1 ratio, you would want to weight the results.  Weighting can be achieved by using the original file (if a full file was selected at the beginning) or by entering values (either full file or samples). 
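
The 3:1 versus 10:1 example works out as follows. This Python sketch assumes the simplest weighting scheme, population count divided by sample count for each class; the counts and function names are hypothetical and EZFit's own calculation may differ.

```python
def class_weights(sample_counts, population_counts):
    """Weight for each class = population count / sample count."""
    return {k: population_counts[k] / sample_counts[k] for k in sample_counts}

def weighted_target_rate(counts, weights):
    """Target rate for a group after applying the class weights."""
    t = counts["target"] * weights["target"]
    nt = counts["non_target"] * weights["non_target"]
    return t / (t + nt)

# Validation file at 3:1, original population at 10:1 (hypothetical counts).
sample = {"target": 100, "non_target": 300}
population = {"target": 1000, "non_target": 10000}
weights = class_weights(sample, population)

# A score group holding 30 targets and 10 non-targets in the validation file.
group = {"target": 30, "non_target": 10}
unweighted = group["target"] / (group["target"] + group["non_target"])
weighted = weighted_target_rate(group, weights)
print(round(unweighted, 3), round(weighted, 3))  # 0.75 0.474
```

The group still outperforms the weighted overall rate (1,000 targets in 11,000 records, about 9%), but the weighted figure is the one to expect when the model is applied to the real population.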

I started from a full file and want to weight the results.  When might I want to enter values (as opposed to using the full file) for weighting?  We recommend using the original file for weighting, but you can always enter values.  An example of when you might want to enter values:

Suppose you are building a response model, and you are using a full file from a prior acquisition campaign. You are aware that only a portion of the initial population was mailed (i.e., a different model was used to determine the mail file, and only portions of some customer segments were included as test cells).

To help alleviate the bias caused by use of the previous model, you decide to use a stratified sampling approach based on customer segment.  Since only portions of some segments were mailed, while all of some other segments were mailed, you would want to weight the results to determine the effect of the new model on the entire population.  Thus, the full file is not representative of the true population.  Using the full file as the weight basis would index the results on the mailed population, not the true population.  In this case, if you want to index results for the true population, then you would need to enter population counts by customer segment in Step 6.


Step 7 - Final Results

How do I interpret the Validation Sample Results? The validation sample chart shows the performance of the model on the validation sample. If the data is un-weighted, it should appear similar to the performance of the development sample. If the data is weighted, it will be different from the development sample in both counts and target %, but should still rank order. If the data is weighted to the full population size, it can be used to help determine the appropriate action to use at various score points. 

How do I interpret the Lift Chart Results? The lift column in the lift chart shows how the model compares to a random approach. If the lift for a group is 300, that group’s performance is 3 times the average. If the lift for a group is 50, the group’s performance is half the average. The cumulative lift is useful for evaluating how deep into the file to select.
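
The lift arithmetic can be reproduced with a short Python sketch. It assumes lift is the group's target rate divided by the overall target rate, indexed so that 100 equals average, which matches the 300 and 50 examples above; the function name and counts are hypothetical.

```python
def lift_table(groups):
    """groups: list of (n_obs, n_targets), ordered from best to worst scores.

    Returns (lift, cumulative_lift) per group, where 100 means average.
    """
    total_n = sum(n for n, _ in groups)
    total_t = sum(t for _, t in groups)
    overall = total_t / total_n
    rows, cum_n, cum_t = [], 0, 0
    for n, t in groups:
        cum_n += n
        cum_t += t
        lift = 100 * (t / n) / overall
        cum_lift = 100 * (cum_t / cum_n) / overall
        rows.append((round(lift), round(cum_lift)))
    return rows

# Four equal groups with 30, 5, 3 and 2 targets: overall rate 40/400 = 10%.
groups = [(100, 30), (100, 5), (100, 3), (100, 2)]
table = lift_table(groups)
print(table)  # [(300, 300), (50, 175), (30, 127), (20, 100)]
```

The cumulative column shows the trade-off of going deeper: selecting the top two groups yields a cumulative lift of 175, and selecting the whole file brings it back to 100.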

Why are the values for number of observations different in the two output tables?  The lift chart is created from the un-weighted validation file. The validation sample chart includes the weighting specified in the validation inputs. This can result in very different quantities.

Why is the "Target %" decreasing in each group for the lift chart, but not the validation chart? Due to the weighting being applied, some variation in performance among groups can result.    

What can I do if I don't like the Validation Results?  This question can be interpreted a couple of different ways.

If the model doesn't validate (rank order) then the data needs to be evaluated for modifications and a new model built. Some modifications that can be made include a change in sampling criteria, changes in the binning inputs (like tightening/relaxing the change % for new bin), changes to model development inputs (like relaxing the max correlation) or the addition of new predictor variables.  There are times when a reasonable model cannot be constructed.

If, on the other hand, the bucketing of the validation results is bothersome, you can choose to revise the validation inputs.


Step 8 - Documentation

Does EZFit save an electronic copy of the documentation?  Yes. The documentation will be saved in the model output directory as documentation.rtf.

There is a lot of output listed.  Which files do I need?  You will need all of the output if the model development is audited. Audits are common in industries like financial services when there are specific guidelines and laws regarding customer targeting.

If you need to make a presentation, the most commonly used output files are perf.html, corr.html, vars.rtf, lift.rtf and valid.rtf. The pieces actually used will be determined by your company standards and the needs of your audience. 

For scoring a new list, you will need the scoring code, valid.sas.


Next Steps

How do I score a new list?  To score a new population, create a SAS dataset that contains the variables included in the final model. The naming and format of the variables must be the same as in the initial development file. Launch the SAS System.  Locate the scoring code created by EZFit at the path and file name given in the final documentation. There are two changes you must make to the code:

1. Add a LIBNAME statement to assign the libref P1 to the location of the new file you wish to score.  For example:

libname P1 'c:\documents\data\mydatasetdirectory';

2. Create a macro variable, &val, using the name of the new file as the macro value.  For example:

 %let val = newfilename;

Run the SAS code. The temporary file "val" will be created. This data set contains the score in a variable named "newscore".


 

© 2004 Absolute Values