Installation:
Application will not install.
Is SAS Base Version 8.2 or higher installed? EZFit will not
function with earlier versions of the SAS System.
Is SAS/STAT installed? The SAS/STAT component is required. It
contains “procs” that are necessary in the construction of logistic
regression models.
Is SAS Integration Technologies installed?
This component allows
EZFit to launch and control a background SAS
session. SAS Integration Technologies is automatically installed
when SAS 9 is installed. For version 8.2, it is necessary to
download and install SAS Integration Technologies from the SAS web site at
www.sas.com.
Is Microsoft .NET Framework version 1.1 installed?
EZFit requires version 1.1 of the .NET Framework, which Microsoft
provides through Windows Update.
Licensing: Problems with licensing.
Did you request a license? The initial registration license is
temporary. You must request a license by clicking "request license"
upon launch of EZFit.
Have you changed hardware since installation and licensing of EZFit?
Changing hardware may require reassignment of your license key.
Please contact licensing@absolutevalues.net to determine if this is necessary.
Are you working offline? If so, the license request e-mail was not sent.
Please contact licensing@absolutevalues.net for more information.
License request e-mail not sent? It is not uncommon for a
corporate firewall to prevent EZFit from sending the e-mail that requests
a license. Please save the file when prompted and forward it to:
licensing@absolutevalues.net.
Launch:
Application does not run.
Are you running SAS version 8.2? If so, the most recent hot fix
for The SAS System must be applied for EZFit to function. A
problem in SAS 8.2 prevents communication between EZFit and The SAS
System. Hot fix 82BB28 corrects this problem. Applying the
most recent hot fix bundle available for Version 8.2 from
www.sas.com will enable EZFit to function
correctly. If you are running Version 9, EZFit requires no hot fix.
Is your SAS license valid? An out of date SAS license will prevent
the application from running.
Is SAS Integration Technologies installed? This is a necessary
component for the proper functioning of EZFit.
Have you removed any required components since installing EZFit?
Removal of SAS, SAS/STAT, SAS Integration Technologies or Microsoft’s .NET
Framework will result in EZFit not functioning. If necessary, reinstall
removed components and retry EZFit. Note that uninstalling EZFit
also removes the SAS Integration Technologies components. If you have
uninstalled EZFit, reinstall SAS Integration Technologies.
Build: This section
contains answers to questions that commonly arise during model
construction. If you do not find an answer to your question here or in
the program help files, please send a detailed e-mail to:
support@absolutevalues.net. A consultant will usually respond within
one business day.
Step 1 - Preprocess Data
Should I start with a full file or samples? It depends on what you
have available. If you have access to the full file and choose to start
with that, then EZFit will create modeling samples for you. For example,
if you are modeling response for a future campaign based on a past
campaign, the full file would contain all mailed and delivered customer
records from the past campaign appended with all candidate variables. If,
instead, you are provided with a sampling of mailed records, or if you
prefer to create your own samples, then choose to start from samples.
This option will skip EZFit’s sampling procedure.
When would I use a stratifier variable?
If your starting file
contains an uneven distribution based on some variable, you may wish to
stratify on this variable. For example, in building a response model from
a past campaign, you notice that the past campaign used a variable to
segment the mail: All customers from groups A, B and C were mailed, while
only portions of groups D, E, and F were mailed. In this case, you may
want to stratify on the “group” variable. This will ensure that “enough”
observations from each group are selected for modeling.
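The idea behind stratifying on the “group” variable can be sketched in a few lines of Python. This is only an illustration of stratified sampling in general, not EZFit's sampling procedure: the function name `stratified_sample`, the equal allocation of 40 records per group, and the synthetic mailed file are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_key, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each stratum, without replacement."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[stratum_key]].append(rec)
    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        k = min(per_stratum, len(members))  # small strata contribute all they have
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical mailed file: groups A-C fully mailed, D-F only partly mailed.
mailed = (
    [{"group": g, "id": i} for g in "ABC" for i in range(1000)]
    + [{"group": g, "id": i} for g in "DEF" for i in range(50)]
)
sample = stratified_sample(mailed, "group", per_stratum=40)
counts = Counter(rec["group"] for rec in sample)
print(dict(counts))
```

A simple random sample of the same size would draw very few records from groups D, E and F, because they make up only a small share of the mailed file.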
Why are some variables automatically set to “exclude”? Variables
set to “exclude” will not be included as candidates for model
construction. This option spares you the work of creating a dataset
that contains ONLY modeling variables. EZFit recognizes
certain variable names (like street address and name) and excludes them
because they are not appropriate for modeling. Other variables may be set
to exclude because they are character variables with a large number of
different values. You can override the recommendations, but variables
like name and street address are useless in modeling and will severely
increase the CPU load and processing time.
When I click “Reset table to original values”,
my table does not reset. Why? After changes to the “Variable
Information” table are accepted, the newly accepted version becomes the
basis for “original values”. The only way to revert to the initial table
after accepting changes is to start over. This is by design.
I made changes to the table, but I do not want to keep them. How can I do
this? Choose the “Discard Dataset Changes” button and then click
continue.
I made changes to the table, but they were not accepted. Why?
An error message is displayed. The
message box explains why the changes were not acceptable. The following
situations will result in rejection of your changes:
- More than 1 dependent variable
- More than 1 stratifier variable
- A dependent variable with values other than 0 and 1
- Missing values in stratifier or dependent variable
No error message is displayed. Users
frequently forget to select the “Accept Dataset Changes” button. Go back
and ensure that the “Accept Dataset Changes” button is selected.
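If you want to check a dataset against the rejection rules listed above before loading it, the following Python sketch applies the same four checks. It is a hypothetical helper, not part of EZFit: the function name `check_variable_table`, the role labels, and the use of `None` for a missing value are assumptions made for the example.

```python
def check_variable_table(roles, rows):
    """Return a list of problems that mirror the four rejection rules.

    roles: dict mapping variable name -> role, e.g. "dependent", "stratifier".
    rows:  list of dicts, one per observation; None stands for a missing value.
    """
    problems = []
    dependents = [name for name, role in roles.items() if role == "dependent"]
    stratifiers = [name for name, role in roles.items() if role == "stratifier"]
    if len(dependents) > 1:
        problems.append("more than 1 dependent variable")
    if len(stratifiers) > 1:
        problems.append("more than 1 stratifier variable")
    for name in dependents:
        values = [row.get(name) for row in rows]
        if any(v is None for v in values):
            problems.append(f"missing values in dependent variable {name}")
        if any(v is not None and v not in (0, 1) for v in values):
            problems.append(f"dependent variable {name} has values other than 0 and 1")
    for name in stratifiers:
        if any(row.get(name) is None for row in rows):
            problems.append(f"missing values in stratifier variable {name}")
    return problems

# Hypothetical example data with two deliberate problems.
roles = {"responded": "dependent", "segment": "stratifier", "age": "predictor"}
rows = [
    {"responded": 1, "segment": "A", "age": 34},
    {"responded": 2, "segment": None, "age": 51},
    {"responded": 0, "segment": "B", "age": 47},
]
problems = check_variable_table(roles, rows)
for p in problems:
    print(p)
```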
After I click continue, the frequencies are not
what I expected. Why? The displayed frequencies are based on the
starting file that was designated in Step 1. The most likely
explanation is that you did not select the intended dataset. Start over
and select the correct dataset.
The “Begin Stratified Sampling” (or “Begin Non-stratified Sampling”)
button is my only choice and this is not what I want. Why? The button
offered depends on whether you designated a stratifier variable in the
“Variable Information” table. If you designated one, only “Begin
Stratified Sampling” is available; if you did not, only “Begin
Non-stratified Sampling” is available. Start over and correct the problem by
changing the variable’s class designation.
Step 2 - Create Samples
Should I accept the proposed sampling criteria?
Unless you have a specific sampling scheme in mind, it is best to accept
the proposed sampling criteria. If you wish to do so, choose “Yes” and
then click “Continue”. Otherwise, choose “No” and click “Continue”.
Do I need both development and validation
samples? For determining how well the model predicts for a
different dataset from the same population, a validation sample is
STRONGLY recommended. The model is constructed using the development
sample, and it is “validated” using a validation sample. Although this is
the ideal situation, sometimes there are not enough observations to have
both development and validation samples. In this case, it is not easy to
“validate” the model. When this situation arises, we recommend testing
the model on a subset of the population or using some alternative form of
validation like bootstrapping. EZFit does not provide a process for
bootstrapping.
What are target and non-target values?
The target value is the desired value of the dependent variable (the thing
you are trying to predict). The target value is usually 1, and the
non-target value is usually 0.
What ratio of non-target to target value is
preferable? Usually, the original dataset has many times more
non-target valued observations than target valued observations. In order
for logistic regression to identify differentiating factors among these,
it is best to sample down to a reasonable ratio. Although there is no
standard, accepted ratio, we recommend using approximately 5 times as many
non-target values as target values. The ratio would then be 5:1. Note
that you can use weighting in the validation process to index the model
results to the actual population ratio.
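As an illustration of sampling down to the recommended 5:1 ratio, the Python sketch below keeps every target record and draws five times as many non-target records at random. It is not EZFit's sampling code: the function name `downsample`, the field name `target`, and the synthetic population of 200 targets and 9,800 non-targets are assumptions made for the example.

```python
import random

def downsample(records, target_key="target", ratio=5, seed=0):
    """Keep all target records and at most `ratio` times as many non-targets."""
    rng = random.Random(seed)
    targets = [r for r in records if r[target_key] == 1]
    non_targets = [r for r in records if r[target_key] == 0]
    keep = min(len(non_targets), ratio * len(targets))
    return targets + rng.sample(non_targets, keep)

# Hypothetical population: 200 responders among 10,000 records (49:1).
population = (
    [{"id": i, "target": 1} for i in range(200)]
    + [{"id": i, "target": 0} for i in range(200, 10000)]
)
dev = downsample(population)
n_target = sum(r["target"] for r in dev)
n_non_target = len(dev) - n_target
print(n_non_target, ":", n_target)
```

Here the 49:1 population is reduced to 1,000 non-targets and 200 targets, which is the 5:1 ratio described above.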
What if the counts are not what I expected? How
do I know? After the samples are created, EZFit will display
frequencies for each of the samples created (development and, if present,
validation). If you do not like the distributions, you can revise the
sampling criteria until desirable samples are achieved.
Step 3 - Bin Variables
I accepted the default values and started
binning. Is my EZFit session frozen? More than likely, EZFit is
working hard to find an appropriate binning strategy for each variable in
your development file. This is a highly CPU intensive step in the
process. If you have several hundred variables and thousands of
observations, the binning process can take 2-3 hours. The time required
depends directly on the number of variables and the number of observations
in your development file. Be patient. This is a highly complex process
that, when done manually, can take a week or more.
Once the binning is complete, how do I see what
EZFit did? You can click a variable name in the table presented to
see individual variable binning results for non-missing values. If you have several hundred
variables, don’t worry. At the end of the process, EZFit provides output
for each model variable and its binning scheme. You will also have SAS
code to score a new file, which includes code to bin each model variable.
This means that to score a new file, only the model variables in their
original forms are required.
Why am I only allowed to change the “Use
Original” value for some of the variables in my data set? Because
logistic regression requires numeric variables, it is not possible to use
the original form of character variables. For character variables, the
"Use Original" option will automatically be set to "N/A".
When would I want to “Use Original” as opposed
to the binned form of a variable? It is STRONGLY recommended that
you use the binned form for a majority of the variables. Using original,
non-binned forms of variables can stress the logistic regression procedure
and produce unpredictable results. In some cases, however, your business
understanding of a variable may call for a different binning strategy than
the one EZFit derives from the data.
For example, perhaps there is a variable that indicates customer channel
preference with original form:
1 – walk-in, 2 – web, 3 – catalog, 4 – phone
EZFit may create 2 bins because of similarity in the target rate among web,
phone and catalog customers:
1 – walk-in, 2 – web, phone and catalog
However, your management team may prefer to have all 4 categories present
because of other differences like purchase frequency or cost
of acquisition, or because other selection rules are applied at the time
of list generation based on the channel preference variable. In this
case, choose “Yes” for Use Original, and the variable’s original form will
be used.
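To see how categories with similar target rates can end up in one bin, consider the Python sketch below. EZFit's binning algorithm is not described in this document, so this is only one plausible approach: categories are sorted by target rate, and a new bin starts whenever the rate drops by more than a chosen relative amount. The function name, the 25% threshold and the counts are all hypothetical.

```python
def bin_by_target_rate(stats, max_rel_change=0.25):
    """Group categories whose target rates are close to each other.

    stats: dict mapping category -> (number of targets, number of observations).
    A new bin starts when the rate falls by more than `max_rel_change`
    relative to the previous category's rate.
    """
    rates = sorted(((t / n, cat) for cat, (t, n) in stats.items()), reverse=True)
    bins = []
    prev_rate = None
    for rate, cat in rates:
        if prev_rate is None:
            bins.append([cat])
        elif prev_rate > 0 and (prev_rate - rate) / prev_rate > max_rel_change:
            bins.append([cat])
        else:
            bins[-1].append(cat)
        prev_rate = rate
    return bins

# Hypothetical counts per channel: (targets, observations).
channel_stats = {
    "walk-in": (100, 1000),
    "web": (40, 1000),
    "catalog": (42, 1000),
    "phone": (39, 1000),
}
bins = bin_by_target_rate(channel_stats)
print(bins)
```

With these counts, walk-in customers form one bin, and web, catalog and phone customers share the other, matching the two-bin result in the example above.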
When would I want to subset variables?
If you have a large number of variables and wish to construct a "quick"
model, you may choose to restrict the set of candidate variables.
This will reduce the number of candidate variables used in "proc logistic"
based on the parameters entered. EZFit allows two options for
restricting variables: minimum Chi Square value and number of variables.
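The two restriction options can be pictured with the short Python sketch below. It assumes that a Chi Square value is already available for each candidate variable and that the options can be applied separately or together; the function name `subset_variables` and the values shown are hypothetical and do not come from EZFit.

```python
def subset_variables(chi_square, min_chi_square=None, max_variables=None):
    """Return variable names ranked by Chi Square, after optional restrictions."""
    ranked = sorted(chi_square.items(), key=lambda kv: kv[1], reverse=True)
    if min_chi_square is not None:
        ranked = [(name, value) for name, value in ranked if value >= min_chi_square]
    if max_variables is not None:
        ranked = ranked[:max_variables]
    return [name for name, _ in ranked]

# Hypothetical Chi Square values for four candidate variables.
chi_square = {"recency": 152.4, "tenure": 87.1, "region": 12.9, "gender": 3.2}
by_minimum = subset_variables(chi_square, min_chi_square=10.0)
by_count = subset_variables(chi_square, max_variables=2)
print(by_minimum)
print(by_count)
```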
What if I select subset, but I don't like that set
of variables? Do I have to start over to get back all the variables?
No. EZFit displays the subset of variables based on
your criteria. If you don't like the subset (feel it is too
restrictive or too inclusive), you can choose to apply new criteria.
To include all variables, just click on "Reset Default Values".
Step 4 - Develop Model
Depending on the selection method I choose, some of the input boxes are
grayed out. Why? Only the valid input parameters for
each selection method are available.
I ran the selection, but it took a really long time and I would like to
rerun it. What can I do to reduce run time? One thing
you can do to reduce run time is to reduce the Maximum Number of
Iterations allowed. However, doing so may not result in an optimal
model.
Step 5 - Development Results
I used forward (or stepwise) for selection method.
Why isn't the recommended number of variables the same as the number of
variables displayed in the "Preliminary Model Statistics" table? The
recommended number of variables is based on a combination of the output
statistics in the table. To avoid over-fitting, you may not always
want to use all variables introduced by the logistic procedure. It
is best to either accept the recommendation, or arrive at a different
conclusion based on the statistics provided.
What am I looking for in "Model Performance"?
The model performance is provided by deciles; the development file
is cut into 10 groups containing (nearly) equal numbers of observations.
The Target Percent should be greatest in Decile 1, and exhibit a
decreasing pattern down to Decile 10. The more difference (spread)
observed between the deciles, the better. With real world data,
there may not always be a strictly decreasing pattern displayed.
Keep in mind that given the input parameters and the available
variables, the best fitting model has been obtained. If you
don't like what you see, try using different inputs.
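The decile view can be reproduced by hand, as the Python sketch below shows: sort the scored records from highest to lowest score, cut them into 10 nearly equal groups, and compute the Target Percent in each. This is not EZFit's code; the function name `decile_table` and the synthetic scores are assumptions made for the example.

```python
def decile_table(scored):
    """scored: list of (score, target) pairs, where target is 0 or 1.

    Returns a list of (decile, observations, target_percent), with
    Decile 1 holding the highest scores.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    table = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        targets = sum(target for _, target in group)
        pct = 100.0 * targets / len(group) if group else 0.0
        table.append((d + 1, len(group), pct))
    return table

# Synthetic data: 1,000 records whose target rate rises with the score.
scored = [(i, 1 if i % 10 < i // 100 else 0) for i in range(1000)]
table = decile_table(scored)
for decile, n_obs, pct in table:
    print(decile, n_obs, pct)
```

In this synthetic case the Target Percent falls steadily from Decile 1 to Decile 10, which is the pattern described above. Real data will usually be less regular.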
How do I interpret the Correlation Matrix?
The matrix shows the variables by original name along the top and then by
the recoded or original name on the left side, depending on which form is
used in the model. Correlations range between -1 and 1. A larger
absolute value indicates that a given pair of variables is more highly
correlated than a pair with an absolute value near zero. If you are
building a model to explain (such as models used in clinical trials), then
high correlations can add confusion, resulting in the "chicken or egg"
syndrome. When models are used for prediction purposes only, higher
correlations may be acceptable. If your business has rules regarding
maximum allowable correlation, then that value can be entered as a
parameter in Step 4.
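A maximum allowable correlation rule can be checked by hand with the Python sketch below, which computes the Pearson correlation for each pair of variables and lists the pairs that exceed the limit. The variable names, the data and the 0.8 limit are hypothetical, and the sketch does not reproduce how EZFit applies the Step 4 parameter.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairs_above(columns, max_corr):
    """Return (name1, name2, r) for pairs whose |r| exceeds max_corr."""
    flagged = []
    for (name1, col1), (name2, col2) in combinations(columns.items(), 2):
        r = pearson(col1, col2)
        if abs(r) > max_corr:
            flagged.append((name1, name2, r))
    return flagged

# Hypothetical model variables.
columns = {
    "tenure": [1, 2, 3, 4, 5, 6],
    "orders": [2, 4, 6, 8, 10, 13],
    "region": [3, 1, 2, 3, 1, 2],
}
flagged = pairs_above(columns, max_corr=0.8)
for name1, name2, r in flagged:
    print(name1, name2, round(r, 3))
```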
Step 6 - Validate Model
When would I use weighting? Typically, weighting is used to
provide the validation results indexed to the overall population size/mix.
For example, if the validation file contains a 3:1 ratio of non-target to
target values, but the original population actually contains a 10:1 ratio,
you would want to weight the results. Weighting can be achieved by
using the original file (if a full file was selected at the beginning) or
by entering values (either full file or samples).
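The arithmetic behind the 3:1 versus 10:1 example can be sketched as follows. A common approach, assumed here and not confirmed as EZFit's own calculation, is to give each class a weight equal to its population count divided by its sample count. The counts below are hypothetical values chosen to match the two ratios.

```python
def class_weights(sample_counts, population_counts):
    """Weight for each class = population count / sample count."""
    return {k: population_counts[k] / sample_counts[k] for k in sample_counts}

# Hypothetical counts: validation file at 3:1, population at 10:1.
sample_counts = {"target": 100, "non_target": 300}
population_counts = {"target": 1000, "non_target": 10000}

weights = class_weights(sample_counts, population_counts)
weighted_target = weights["target"] * sample_counts["target"]
weighted_total = sum(weights[k] * sample_counts[k] for k in sample_counts)

unweighted_pct = 100.0 * sample_counts["target"] / sum(sample_counts.values())
weighted_pct = 100.0 * weighted_target / weighted_total
print(weights)
print(round(unweighted_pct, 2), round(weighted_pct, 2))
```

The un-weighted validation file shows a 25% target rate, while the weighted figure is about 9.09%, which is the rate in the assumed population.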
I started from a full file and want to weight the results. When might
I want to enter values (as opposed to using the full file) for weighting?
We recommend using the original file for weighting, but you can
always enter values. An example of when you might want to enter
values:
Suppose you are building a
response model, and you are using a full file from a prior acquisition
campaign. You are aware that only a portion of the initial population
was mailed (i.e., a different model was used to determine the mail file,
and only portions of some customer segments were included as test cells).
To help alleviate the bias caused by use of the previous model, you
decide to use a stratified sampling approach based on customer segment. Since only
portions of some segments were mailed, while all of some other segments
were mailed, you would want to weight the results to determine the effect
of the new model on the entire population. However, the full file is
not representative of the true population: using the full file as
the weight basis would index the results on the mailed population, not the
true population. In this case, if you want to index results for the
true population, you would need to enter population counts by
customer segment in Step 6.
Step 7 - Final Results
How do I interpret the Validation Sample Results?
The validation sample chart shows the performance of the model on the
validation sample.
If the data is un-weighted, it should appear similar to the performance of
the development sample. If the data is weighted, it will be different from
the development sample in both counts and target %, but should still rank
order. If the data is weighted to the full population size, it can be used
to help determine the appropriate action to use at various score points.
How do I interpret the Lift Chart Results?
The lift column in the lift chart shows how the model compares to a
random approach. If
the lift for a group is 300, it means that group’s performance is 3 times
better than average. If the lift for a group is 50, it means that the
group’s performance is half that of the average. The cumulative lift is
useful to evaluate the depth of the file.
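The lift figures can be checked with a few lines of arithmetic. The Python sketch below assumes the conventional definition, in which lift is 100 times a group's target rate divided by the overall target rate, and cumulative lift applies the same formula to all groups down to a given depth. The group counts are hypothetical.

```python
def lift_table(groups):
    """groups: list of (observations, targets), best-scoring group first.

    Returns a list of (lift, cumulative_lift), both indexed so that
    100 means average performance.
    """
    total_n = sum(n for n, _ in groups)
    total_t = sum(t for _, t in groups)
    rows = []
    cum_n = cum_t = 0
    for n, t in groups:
        cum_n += n
        cum_t += t
        lift = 100 * t * total_n / (n * total_t)
        cum_lift = 100 * cum_t * total_n / (cum_n * total_t)
        rows.append((lift, cum_lift))
    return rows

# Hypothetical validation file: 10 groups of 200, 100 targets overall (5%).
targets_per_group = [30, 20, 14, 10, 8, 6, 5, 4, 2, 1]
rows = lift_table([(200, t) for t in targets_per_group])
for group, (lift, cum_lift) in enumerate(rows, start=1):
    print(group, lift, cum_lift)
```

Group 1 has a 15% target rate against a 5% average, so its lift is 300. Group 7 has a 2.5% target rate, so its lift is 50. The cumulative lift for the first two groups is 250, and it returns to 100 once the whole file is included.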
Why are the values for number of observations different in the two
output tables?
The lift chart is created from the un-weighted validation file. The
validation sample chart
includes the weighting specified in the validation inputs. This can result
in very different quantities.
Why is the "Target %" decreasing in each
group for the lift chart, but not the validation chart?
Due to the weighting being applied, some variation in performance
among groups can result.
What can I do if I don't like the Validation
Results?
This question can be interpreted a couple of different ways.
If the model doesn't validate (rank order), then the data needs to be
evaluated for modifications and a new model built. Some modifications that
can be made include a change in sampling criteria, changes in the binning
inputs (like tightening/relaxing the change % for new bin), changes to model
development inputs (like relaxing the max correlation) or the addition
of new predictor variables. There are times when a reasonable model
cannot be constructed.
If, on the other hand, the bucketing of the validation results is
bothersome, you can choose to revise the validation inputs.
Step 8 - Documentation
Does EZFit save an electronic copy of the documentation?
Yes. The documentation will be saved in the model output directory as
documentation.rtf.
There is a lot of output listed. Which
files do I need?
You will need all of the output if the model development is audited. Audits
are common in industries like financial services when there are specific
guidelines and laws regarding customer targeting.
If you need to make a presentation, the most commonly used output files are
perf.html, corr.html, vars.rtf, lift.rtf and valid.rtf. The pieces actually
used will be determined by your company standards and the needs of your
audience.
For scoring a new list, you will need the scoring code, valid.sas.
Next Steps
How do I score a new list? To
score a new population, a SAS dataset must be created that contains the
variables included in the final model. The naming and format of the
variables must be the same as in the initial development file. Launch the
SAS System. Locate the scoring code created by EZFit at the
path and file name given in the final documentation. There are two changes
you must make to the code:
1. Add a LIBNAME statement to assign the libref P1 to the location of the
new file you wish to score. For example:
libname P1 'c:\documents\data\mydatasetdirectory';
2. Create a macro variable, &val, using the name of the new file as the
macro value. For example:
%let val = newfilename;
Run the SAS code. The temporary data set "val" will be created. This data
set contains the score in a variable named "newscore".