weka.classifiers.timeseries.core
Class TSLagMaker

java.lang.Object
  extended by weka.classifiers.timeseries.core.TSLagMaker
All Implemented Interfaces:
java.io.Serializable

public class TSLagMaker
extends java.lang.Object
implements java.io.Serializable

A class for creating lagged versions of target variable(s) for use in time series forecasting. Uses the TimeseriesTranslate filter. Has options for creating averages of consecutive lagged variables (which can be useful for long lagged variables). Some polynomials of time are also created (if there is a time stamp), such as time^2 and time^3. Also creates cross products between time and the lagged and averaged lagged variables.

If there is no date time stamp in the data then the user has the option of having an artificial time stamp created. Time stamps, real or otherwise, are used for modeling trends rather than using a differencing-based approach.

Also has routines for dealing with a date time stamp - i.e. it can detect a monthly time period (because months are different lengths) and map date time stamps to equally spaced time intervals. For example, in general, a date time stamp is remapped by subtracting the first observed value and adding this value divided by the constant delta (the difference between consecutive steps) to the result. In the case of a detected monthly time period, the remapping involves subtracting the base year and then adding to this the number of the month within the current year plus twelve times the number of intervening years since the base year.

Also has routines for adding new attributes derived from a date time stamp to the data - e.g. AM indicator, day of the week, month, quarter etc. In the case where there is no real date time stamp, the user may specify a nominal periodic variable (if one exists in the data). For example, month might be coded as a nominal value. In this case it can be specified as the primary periodic variable. The point is that in all these cases (nominal periodic and date-derived periodics), we are able to determine what the value of these variables will be in future instances (as computed from the last known historic instance).
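
The date time stamp remapping described above can be sketched in plain Java. This is a minimal illustration under one reading of the description, not this class's actual implementation: the class name TimeRemapSketch, both method names, and the zero-based month number are assumptions made for the example.

```java
import java.time.LocalDate;

final class TimeRemapSketch {

    // General case (assumed reading): offset from the first observed value,
    // expressed in units of the constant step size delta.
    static double remapByDelta(double value, double first, double delta) {
        return (value - first) / delta;
    }

    // Monthly case (assumed reading): twelve times the number of years since
    // the base year, plus the zero-based month within the current year.
    static int remapMonthly(LocalDate date, int baseYear) {
        return 12 * (date.getYear() - baseYear) + (date.getMonthValue() - 1);
    }

    public static void main(String[] args) {
        // Daily data in milliseconds: three steps after the first observation.
        double day = 24 * 60 * 60 * 1000.0;
        System.out.println(remapByDelta(1000.0 + 3 * day, 1000.0, day)); // prints 3.0

        // March 2011 relative to a base year of 2010.
        System.out.println(remapMonthly(LocalDate.of(2011, 3, 1), 2010)); // prints 14
    }
}
```

Under this reading, consecutive months always differ by exactly one unit, regardless of how many days each month contains.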

Version:
$Revision: 45942 $
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
See Also:
Serialized Form

Nested Class Summary
static class TSLagMaker.Periodicity
          Enum defining periodicity
 
Constructor Summary
TSLagMaker()
           
 
Method Summary
 void addCustomPeriodic(java.lang.String customPeriodic)
          Add a custom date-derived periodic
 double advanceSuppliedTimeValue(double valueToAdvance)
          Utility method to advance a supplied time value by one unit according to the periodicity set for this LagMaker.
 void clearCustomPeriodics()
          Clear all custom date-derived periodic fields.
 void clearLagHistories()
          Clears any history accumulated in the lag creating filters.
 Instances createTimeLagCrossProducts(Instances insts)
           
 double decrementSuppliedTimeValue(double valueToDecrement)
           
static TSLagMaker.Periodicity determinePeriodicity(Instances insts, java.lang.String timeName)
          Utility method that uses heuristics to identify the periodicity of the data with respect to a time stamp.
 boolean getAddAMIndicator()
          Return true if an AM indicator attribute is to be created.
 boolean getAddDayOfMonth()
          Return true if a day of the month attribute is to be created.
 boolean getAddDayOfWeek()
          Return true if a day of the week attribute is to be created.
 boolean getAddMonthOfYear()
          Returns true if a month of the year attribute is to be created.
 boolean getAddNumDaysInMonth()
          Return true if a num days in the month attribute is to be created.
 boolean getAddQuarterOfYear()
          Returns true if a quarter attribute is to be created.
 boolean getAddWeekendIndicator()
          Returns true if a weekend indicator attribute is to be created.
 boolean getAdjustForTrends()
          Returns true if we are adjusting for trends via a real or artificial time stamp.
 boolean getAdjustForVariance()
          Returns true if we are adjusting for variance by taking the log of the target(s).
 double getArtificialTimeStartValue()
          Returns the current value of the artificial time stamp.
 boolean getAverageConsecutiveLongLags()
          Returns true if consecutive long lagged variables are to be averaged.
 int getAverageLagsAfter()
          Return the point after which long lagged variables will be averaged.
 double getCurrentTimeStampValue()
          Returns the current (i.e. most recent) time stamp value.
 java.util.Map<java.lang.String,java.util.ArrayList<CustomPeriodicTest>> getCustomPeriodics()
          Get the date-derived custom periodic attributes in use.
 double getDeltaTime()
          Return the difference between time values.
 java.util.List<java.lang.String> getFieldsToLag()
          Get the names of the fields to create lagged variables for.
 java.lang.String getFineTuneLags()
          Get the ranges used to fine tune the creation of lagged attributes.
 java.lang.String getLagRange()
          Get the ranges used to fine tune lag selection
 int getMaxLag()
          Get the maximum lag to create.
 int getMinLag()
          Get the minimum lag to create.
 int getNumConsecutiveLongLagsToAverage()
          Get the number of consecutive long lagged variables to average.
 java.lang.String[] getOptions()
          Gets the current settings of the LagMaker.
 java.util.List<java.lang.String> getOverlayFields()
          Get overlay fields
 TSLagMaker.Periodicity getPeriodicity()
          Gets the Periodicity representing the time stamp in use for this lag maker.
 java.lang.String getPrimaryPeriodicFieldName()
          Get the name of the primary periodic attribute, or null if one hasn't been specified.
 java.lang.String getTimeStampField()
          Get the name of the time stamp field.
 Instances getTransformedData(Instances insts)
          Creates a transformed data set based on the user's settings
 void incrementArtificialTimeValue(int increment)
          Increment the artificial time value with the supplied increment value.
 boolean isUsingAnArtificialTimeIndex()
          Returns true if an artificial time index is in use.
 java.util.Enumeration<Option> listOptions()
          Returns an enumeration describing the available options.
 Instance processInstance(Instance source, boolean incrementTime, boolean setAnyPeriodic)
           
 Instance processInstance(Instance source, boolean incrementTime, boolean setAnyPeriodic, boolean temporary)
          Process an instance in the original format and produce a transformed instance as output.
 Instance processInstancePreview(Instance source, boolean incrementTime, boolean setAnyPeriodic)
           
 void reset()
          Reset the lag maker.
 void setAddAMIndicator(boolean am)
          Set whether to create an AM indicator attribute.
 void setAddDayOfMonth(boolean d)
          Set whether to create a day of the month attribute.
 void setAddDayOfWeek(boolean d)
          Set whether to create a day of the week attribute.
 void setAddMonthOfYear(boolean m)
          Set whether to create a month of the year attribute.
 void setAddNumDaysInMonth(boolean d)
          Set whether to create a numeric attribute that holds the number of days in the month.
 void setAddQuarterOfYear(boolean q)
          Set whether to create a quarter attribute.
 void setAddWeekendIndicator(boolean w)
          Set whether to create a weekend indicator attribute.
 void setAdjustForTrends(boolean a)
          Set whether to adjust for trends or not.
 void setAdjustForVariance(boolean v)
          Set whether to adjust for variance in the data by taking the log of the target(s).
 void setArtificialTimeStartValue(double value)
          Set the starting value for the artificial time stamp.
 void setAverageConsecutiveLongLags(boolean avg)
          Sets whether to average consecutive long lagged variables.
 void setAverageLagsAfter(int a)
          Set at which point consecutive long lagged variables are to be averaged (default = 2, i.e. start replacing lagged variables after t-2 with averages).
 void setCustomPeriodics(java.util.Map<java.lang.String,java.util.ArrayList<CustomPeriodicTest>> custom)
          Set the date-derived custom periodic fields to use/compute
 void setFieldsToLag(java.util.List<java.lang.String> names)
          Set the names of the fields to create lagged variables for
 void setFineTuneLags(java.lang.String ranges)
          Set ranges by which to fine-tune the creation of lagged attributes.
 void setLagRange(java.lang.String lagRange)
          Set ranges to fine tune lag selection.
 void setMaxLag(int max)
          Set the maximum lag to create (default = 12, i.e. t-12).
 void setMinLag(int min)
          Set the minimum lag to create (default = 1, i.e. t-1).
 void setNumConsecutiveLongLagsToAverage(int c)
          Set the number of long lagged variables to average for each averaged variable created (default = 2).
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setOverlayFields(java.util.List<java.lang.String> overlayNames)
          Set the names of fields in the data that are to be considered "overlay" fields - i.e. they will be externally provided for future instances.
 void setPrimaryPeriodicFieldName(java.lang.String p)
          Set the name of a periodic attribute in the data.
 void setTimeStampField(java.lang.String name)
          Set the name of the time stamp field in the data
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TSLagMaker

public TSLagMaker()
Method Detail

reset

public void reset()
Reset the lag maker.


listOptions

public java.util.Enumeration<Option> listOptions()
Returns an enumeration describing the available options.

Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the LagMaker.

Returns:
an array of strings suitable for passing to setOptions

getCustomPeriodics

public java.util.Map<java.lang.String,java.util.ArrayList<CustomPeriodicTest>> getCustomPeriodics()
Get the date-derived custom periodic attributes in use.

Returns:
a Map, keyed by field name, of custom date-derived periodic fields.

addCustomPeriodic

public void addCustomPeriodic(java.lang.String customPeriodic)
Add a custom date-derived periodic

Parameters:
customPeriodic - the new custom date-derived periodic in textual form.

clearCustomPeriodics

public void clearCustomPeriodics()
Clear all custom date-derived periodic fields.


setCustomPeriodics

public void setCustomPeriodics(java.util.Map<java.lang.String,java.util.ArrayList<CustomPeriodicTest>> custom)
Set the date-derived custom periodic fields to use/compute

Parameters:
custom - a Map, keyed by field name, of custom date-derived periodic fields to use.

setFieldsToLag

public void setFieldsToLag(java.util.List<java.lang.String> names)
                    throws java.lang.Exception
Set the names of the fields to create lagged variables for

Parameters:
names - a List of field names for which to create lagged variables
Throws:
java.lang.Exception - if a problem occurs

getFieldsToLag

public java.util.List<java.lang.String> getFieldsToLag()
Get the names of the fields to create lagged variables for.

Returns:
a List of field names for which lagged variables will be created.

setOverlayFields

public void setOverlayFields(java.util.List<java.lang.String> overlayNames)
Set the names of fields in the data that are to be considered "overlay" fields - i.e. they will be externally provided for future instances.

Parameters:
overlayNames - the names of the fields that are to be considered "overlay" fields

getOverlayFields

public java.util.List<java.lang.String> getOverlayFields()
Get overlay fields

Returns:
a list of field names that are set as "overlay" fields

setTimeStampField

public void setTimeStampField(java.lang.String name)
Set the name of the time stamp field in the data

Parameters:
name - the name of the time stamp field

getTimeStampField

public java.lang.String getTimeStampField()
Get the name of the time stamp field.

Returns:
the name of the time stamp field or null if one hasn't been specified.

setAdjustForTrends

public void setAdjustForTrends(boolean a)
Set whether to adjust for trends or not. If there is no time stamp field specified, and this is set to true, then an artificial time stamp will be created.

Parameters:
a - true if we are to adjust for trends via a real or artificial time stamp

getAdjustForTrends

public boolean getAdjustForTrends()
Returns true if we are adjusting for trends via a real or artificial time stamp.

Returns:
true if we are adjusting for trends via a real or artificial time stamp in the data.

setAdjustForVariance

public void setAdjustForVariance(boolean v)
Set whether to adjust for variance in the data by taking the log of the target(s).

Parameters:
v - true to adjust for variance by taking the log of the target(s).
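
As a rough illustration of what adjusting for variance by taking the log implies, the sketch below transforms a target series to the log scale and maps a value back again. The class VarianceAdjustSketch and its methods are hypothetical, and the use of the natural log is an assumption; the documentation above only says "the log of the target(s)".

```java
final class VarianceAdjustSketch {

    // Forward transform: natural log of each target value.
    // Values must be strictly positive for the result to be finite.
    static double[] logTransform(double[] target) {
        double[] out = new double[target.length];
        for (int i = 0; i < target.length; i++) {
            out[i] = Math.log(target[i]);
        }
        return out;
    }

    // Inverse transform: map a value on the log scale back to the original scale.
    static double backTransform(double logValue) {
        return Math.exp(logValue);
    }

    public static void main(String[] args) {
        double[] logged = logTransform(new double[] {1.0, 10.0, 100.0});
        System.out.println(logged[0]);                 // prints 0.0
        System.out.println(backTransform(logged[2]));  // approximately 100
    }
}
```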

getAdjustForVariance

public boolean getAdjustForVariance()
Returns true if we are adjusting for variance by taking the log of the target(s).

Returns:
true if we are adjusting for variance.

setFineTuneLags

public void setFineTuneLags(java.lang.String ranges)
Set ranges by which to fine-tune the creation of lagged attributes.

Parameters:
ranges - a list of ranges as a string

getFineTuneLags

public java.lang.String getFineTuneLags()
Get the ranges used to fine tune the creation of lagged attributes.

Returns:
the ranges as a string

setMinLag

public void setMinLag(int min)
Set the minimum lag to create (default = 1, i.e. t-1).

Parameters:
min - the minimum lag to create

getMinLag

public int getMinLag()
Get the minimum lag to create.

Returns:
the minimum lag to create.

setMaxLag

public void setMaxLag(int max)
Set the maximum lag to create (default = 12, i.e. t-12).

Parameters:
max - the maximum lag to create.

getMaxLag

public int getMaxLag()
Get the maximum lag to create.

Returns:
the maximum lag to create.
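
To make the minimum and maximum lag settings concrete, the following sketch builds lagged columns t-minLag through t-maxLag from a plain array. It is an assumption-laden illustration (the class LagSketch is hypothetical, and NaN stands in for a missing value); the lag maker itself operates on Instances via a filter.

```java
import java.util.Arrays;

final class LagSketch {

    // rows[t][k] holds series[t - (minLag + k)], or NaN when that index
    // falls before the start of the series (treated here as missing).
    static double[][] laggedColumns(double[] series, int minLag, int maxLag) {
        int numLags = maxLag - minLag + 1;
        double[][] rows = new double[series.length][numLags];
        for (int t = 0; t < series.length; t++) {
            for (int k = 0; k < numLags; k++) {
                int src = t - (minLag + k);
                rows[t][k] = src >= 0 ? series[src] : Double.NaN;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] series = {10, 20, 30, 40};
        double[][] rows = laggedColumns(series, 1, 2);
        // For the last time step: lag 1 is 30, lag 2 is 20.
        System.out.println(Arrays.toString(rows[3])); // prints [30.0, 20.0]
    }
}
```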

setLagRange

public void setLagRange(java.lang.String lagRange)
Set ranges to fine tune lag selection.

Parameters:
lagRange - a set of ranges (e.g. 2,3,4,7-9).
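
The range syntax in the example above (2,3,4,7-9) can be read as a list of individual lags and inclusive ranges. The sketch below expands such a string into explicit lag numbers. It is a hypothetical helper written for illustration, not the parser the lag maker uses, and it assumes well-formed input.

```java
import java.util.ArrayList;
import java.util.List;

final class LagRangeSketch {

    // Expands a string such as "2,3,4,7-9" into [2, 3, 4, 7, 8, 9].
    static List<Integer> expand(String ranges) {
        List<Integer> lags = new ArrayList<>();
        for (String part : ranges.split(",")) {
            String p = part.trim();
            if (p.isEmpty()) {
                continue; // tolerate stray commas
            }
            int dash = p.indexOf('-');
            if (dash < 0) {
                lags.add(Integer.parseInt(p));
            } else {
                int lo = Integer.parseInt(p.substring(0, dash).trim());
                int hi = Integer.parseInt(p.substring(dash + 1).trim());
                for (int lag = lo; lag <= hi; lag++) {
                    lags.add(lag);
                }
            }
        }
        return lags;
    }

    public static void main(String[] args) {
        System.out.println(expand("2,3,4,7-9")); // prints [2, 3, 4, 7, 8, 9]
    }
}
```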

getLagRange

public java.lang.String getLagRange()
Get the ranges used to fine tune lag selection

Returns:
the ranges (if any) used to fine tune lag selection

setAverageConsecutiveLongLags

public void setAverageConsecutiveLongLags(boolean avg)
Sets whether to average consecutive long lagged variables. Setting this to true creates new variables that are averages of long lags; the original lagged variables involved are removed.

Parameters:
avg - true if consecutive long lags are to be averaged.

getAverageConsecutiveLongLags

public boolean getAverageConsecutiveLongLags()
Returns true if consecutive long lagged variables are to be averaged.

Returns:
true if consecutive long lagged variables are to be averaged.

setAverageLagsAfter

public void setAverageLagsAfter(int a)
Set at which point consecutive long lagged variables are to be averaged (default = 2, i.e. start replacing lagged variables after t-2 with averages).

Parameters:
a - the point at which to start averaging consecutive long lagged variables.

getAverageLagsAfter

public int getAverageLagsAfter()
Return the point after which long lagged variables will be averaged.

Returns:
the point after which long lagged variables will be averaged.

setNumConsecutiveLongLagsToAverage

public void setNumConsecutiveLongLagsToAverage(int c)
Set the number of long lagged variables to average for each averaged variable created (default = 2). For example, an average-after value of 2 combined with a number of consecutive lags to average of 2 will average t-3 and t-4 into a new variable, t-5 and t-6 into a new variable, etc.

Parameters:
c - the number of consecutive long lagged variables to average.
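
The grouping described above (t-3 and t-4 averaged together, then t-5 and t-6, and so on) can be sketched as follows. The class LongLagAverageSketch is hypothetical, and truncating the final group at the maximum lag is an assumption; the documentation does not say how a partial final group is handled.

```java
import java.util.ArrayList;
import java.util.List;

final class LongLagAverageSketch {

    // Splits the lags after averageAfter, up to maxLag, into consecutive
    // groups of numToAverage. Each int[] is {firstLag, lastLag}, inclusive.
    static List<int[]> groups(int averageAfter, int numToAverage, int maxLag) {
        List<int[]> out = new ArrayList<>();
        for (int start = averageAfter + 1; start <= maxLag; start += numToAverage) {
            int end = Math.min(start + numToAverage - 1, maxLag);
            out.add(new int[] {start, end});
        }
        return out;
    }

    // Averages series[t - lag] for every lag in [fromLag, toLag].
    // Assumes t - toLag >= 0.
    static double averagedLag(double[] series, int t, int fromLag, int toLag) {
        double sum = 0;
        for (int lag = fromLag; lag <= toLag; lag++) {
            sum += series[t - lag];
        }
        return sum / (toLag - fromLag + 1);
    }

    public static void main(String[] args) {
        // Average after t-2, two lags per average, lags up to t-6.
        for (int[] g : groups(2, 2, 6)) {
            System.out.println("t-" + g[0] + " .. t-" + g[1]);
        }
        double[] series = {1, 2, 3, 4, 5, 6, 7};
        // Average of t-3 and t-4 at t = 6: (4 + 3) / 2.
        System.out.println(averagedLag(series, 6, 3, 4)); // prints 3.5
    }
}
```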

getNumConsecutiveLongLagsToAverage

public int getNumConsecutiveLongLagsToAverage()
Get the number of consecutive long lagged variables to average.

Returns:
the number of long lagged variables to average.

setPrimaryPeriodicFieldName

public void setPrimaryPeriodicFieldName(java.lang.String p)
Set the name of a periodic attribute in the data. This attribute has to be nominal and cyclic so that it is possible to know what the next value will be given the current one.

Parameters:
p - the name of the primary periodic attribute (if any) in the data.
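
The requirement that the periodic attribute be nominal and cyclic means the next value follows from the current one by stepping through the value list and wrapping around at the end. A minimal sketch, using a hypothetical PeriodicSketch helper and assuming the nominal values are listed in cycle order:

```java
import java.util.Arrays;
import java.util.List;

final class PeriodicSketch {

    // Returns the value that follows `current` in the cycle, wrapping from
    // the last value back to the first.
    static String next(List<String> cycle, String current) {
        int i = cycle.indexOf(current);
        if (i < 0) {
            throw new IllegalArgumentException("unknown value: " + current);
        }
        return cycle.get((i + 1) % cycle.size());
    }

    public static void main(String[] args) {
        List<String> quarters = Arrays.asList("Q1", "Q2", "Q3", "Q4");
        System.out.println(next(quarters, "Q2")); // prints Q3
        System.out.println(next(quarters, "Q4")); // prints Q1
    }
}
```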

getPrimaryPeriodicFieldName

public java.lang.String getPrimaryPeriodicFieldName()
Get the name of the primary periodic attribute, or null if one hasn't been specified.

Returns:
the name of the primary periodic attribute or null if one hasn't been specified.

setAddAMIndicator

public void setAddAMIndicator(boolean am)
Set whether to create an AM indicator attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
am - true if an AM indicator attribute is to be created.

getAddAMIndicator

public boolean getAddAMIndicator()
Return true if an AM indicator attribute is to be created.

Returns:
true if an AM indicator attribute is to be created.

setAddDayOfWeek

public void setAddDayOfWeek(boolean d)
Set whether to create a day of the week attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
d - true if a day of the week attribute is to be created.

getAddDayOfWeek

public boolean getAddDayOfWeek()
Return true if a day of the week attribute is to be created.

Returns:
true if a day of the week attribute is to be created.

setAddDayOfMonth

public void setAddDayOfMonth(boolean d)
Set whether to create a day of the month attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
d - true if a day of the month attribute is to be created.

getAddDayOfMonth

public boolean getAddDayOfMonth()
Return true if a day of the month attribute is to be created.

Returns:
true if a day of the month attribute is to be created.

setAddNumDaysInMonth

public void setAddNumDaysInMonth(boolean d)
Set whether to create a numeric attribute that holds the number of days in the month.

Parameters:
d - true if a num days in month attribute is to be created.

getAddNumDaysInMonth

public boolean getAddNumDaysInMonth()
Return true if a num days in the month attribute is to be created.

Returns:
true if a num days in the month attribute is to be created.

setAddWeekendIndicator

public void setAddWeekendIndicator(boolean w)
Set whether to create a weekend indicator attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
w - true if a weekend indicator attribute is to be created.

getAddWeekendIndicator

public boolean getAddWeekendIndicator()
Returns true if a weekend indicator attribute is to be created.

Returns:
true if a weekend indicator attribute is to be created.

setAddMonthOfYear

public void setAddMonthOfYear(boolean m)
Set whether to create a month of the year attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
m - true if a month of the year attribute is to be created.

getAddMonthOfYear

public boolean getAddMonthOfYear()
Returns true if a month of the year attribute is to be created.

Returns:
true if a month of the year attribute is to be created.

setAddQuarterOfYear

public void setAddQuarterOfYear(boolean q)
Set whether to create a quarter attribute. Has no effect if there isn't a date-based time stamp in the data.

Parameters:
q - true if a quarter attribute is to be added.

getAddQuarterOfYear

public boolean getAddQuarterOfYear()
Returns true if a quarter attribute is to be created.

Returns:
true if a quarter attribute is to be created.
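
The date-derived attributes controlled by the setAdd... methods above can each be computed from a single time stamp. The sketch below shows one way to derive them with java.time. The class DateFieldsSketch is hypothetical, and the choices of Saturday and Sunday as the weekend and of 1-based month and quarter numbers are assumptions rather than documented behavior.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;

final class DateFieldsSketch {

    static boolean isAm(LocalDateTime ts) {
        return ts.getHour() < 12;
    }

    static DayOfWeek dayOfWeek(LocalDateTime ts) {
        return ts.getDayOfWeek();
    }

    static int dayOfMonth(LocalDateTime ts) {
        return ts.getDayOfMonth();
    }

    static int numDaysInMonth(LocalDateTime ts) {
        return ts.toLocalDate().lengthOfMonth(); // accounts for leap years
    }

    static boolean isWeekend(LocalDateTime ts) {
        DayOfWeek d = ts.getDayOfWeek();
        return d == DayOfWeek.SATURDAY || d == DayOfWeek.SUNDAY;
    }

    static int monthOfYear(LocalDateTime ts) {
        return ts.getMonthValue(); // 1 = January
    }

    static int quarter(LocalDateTime ts) {
        return (ts.getMonthValue() - 1) / 3 + 1; // 1..4
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2012, 2, 18, 9, 30);
        System.out.println(isAm(ts));            // prints true
        System.out.println(dayOfWeek(ts));       // prints SATURDAY
        System.out.println(numDaysInMonth(ts));  // prints 29
        System.out.println(quarter(ts));         // prints 1
    }
}
```

Because every one of these values is a pure function of the date, it can be computed for future instances as soon as the future time stamp is known.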

isUsingAnArtificialTimeIndex

public boolean isUsingAnArtificialTimeIndex()
Returns true if an artificial time index is in use.

Returns:
true if an artificial time index is in use.

setArtificialTimeStartValue

public void setArtificialTimeStartValue(double value)
                                 throws java.lang.Exception
Set the starting value for the artificial time stamp.

Parameters:
value - the value to initialize the artificial time stamp with.
Throws:
java.lang.Exception - if an artificial time stamp is not being used.

getArtificialTimeStartValue

public double getArtificialTimeStartValue()
                                   throws java.lang.Exception
Returns the current value of the artificial time stamp. After training, after priming, and prior to forecasting, this will be equal to the number of training instances seen.

Returns:
the current value of the artificial time stamp.
Throws:
java.lang.Exception - if an artificial time stamp is not being used.

getCurrentTimeStampValue

public double getCurrentTimeStampValue()
                                throws java.lang.Exception
Returns the current (i.e. most recent) time stamp value. Unlike an artificial time stamp, the value after training, after priming and before forecasting, will be equal to the time stamp of the most recent priming instance.

Returns:
the current time stamp value
Throws:
java.lang.Exception - if the lag maker is not adjusting for trends or no time stamp attribute has been specified.

incrementArtificialTimeValue

public void incrementArtificialTimeValue(int increment)
Increment the artificial time value with the supplied increment value.

Parameters:
increment - the value to increment by.

getDeltaTime

public double getDeltaTime()
Return the difference between time values. This may be only approximate for periods based on dates. It is best to use date-based arithmetic in this case for incrementing/decrementing time stamps.

Returns:
the (average) difference between time values.
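
The caution about approximate deltas is easiest to see with monthly data. The sketch below compares stepping by a fixed number of days with stepping by one calendar month. The class MonthStepSketch and the 30-day delta are hypothetical values chosen for illustration.

```java
import java.time.LocalDate;

final class MonthStepSketch {

    // Advance by a fixed (average) delta expressed in days.
    static LocalDate advanceByFixedDelta(LocalDate d, long deltaDays) {
        return d.plusDays(deltaDays);
    }

    // Advance by exactly one calendar month using date-based arithmetic.
    static LocalDate advanceByCalendarMonth(LocalDate d) {
        return d.plusMonths(1);
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2011, 1, 1);
        System.out.println(advanceByFixedDelta(start, 30));  // prints 2011-01-31
        System.out.println(advanceByCalendarMonth(start));   // prints 2011-02-01
    }
}
```

The fixed delta lands on the last day of January rather than the first day of February, which is why calendar arithmetic is the safer choice for date-based periods.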

getPeriodicity

public TSLagMaker.Periodicity getPeriodicity()
Gets the Periodicity representing the time stamp in use for this lag maker. If the lag maker is not adjusting for trends, or an artificial time stamp is being used, then null is returned.

Returns:
the Periodicity in use, or null if the lag maker is not adjusting for trends or is using an artificial time stamp.

createTimeLagCrossProducts

public Instances createTimeLagCrossProducts(Instances insts)
                                     throws java.lang.Exception
Throws:
java.lang.Exception
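
This method has no description of its own. The class description mentions polynomials of time (time^2, time^3) and cross products between time and the lagged variables, and the arithmetic behind those terms can be sketched as below. The class TrendTermsSketch is hypothetical and works on plain doubles; it does not claim to reproduce the attributes this method actually creates.

```java
import java.util.Arrays;

final class TrendTermsSketch {

    // Polynomial terms of the (remapped) time value: time^2 and time^3.
    static double[] timePowers(double time) {
        return new double[] {time * time, time * time * time};
    }

    // One cross product per lagged value: time * lag.
    static double[] crossProducts(double time, double[] lagged) {
        double[] out = new double[lagged.length];
        for (int i = 0; i < lagged.length; i++) {
            out[i] = time * lagged[i];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(timePowers(3.0)));                         // prints [9.0, 27.0]
        System.out.println(Arrays.toString(crossProducts(2.0, new double[] {5, 7}))); // prints [10.0, 14.0]
    }
}
```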

determinePeriodicity

public static TSLagMaker.Periodicity determinePeriodicity(Instances insts,
                                                          java.lang.String timeName)
Utility method that uses heuristics to identify the periodicity of the data with respect to a time stamp. If the time stamp is not a date then the periodicity is UNKNOWN with a delta set by computing the average difference between consecutive time stamp values.

Parameters:
insts - the instances to determine the periodicity from
timeName - the name of the time stamp attribute
Returns:
the Periodicity of the data.
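
For a non-date time stamp, the delta mentioned above is the average difference between consecutive values. A minimal sketch of that calculation, using a hypothetical AverageDeltaSketch class on a plain array of time stamp values:

```java
final class AverageDeltaSketch {

    // Mean of the differences between consecutive time stamp values.
    static double averageDelta(double[] timeStamps) {
        if (timeStamps.length < 2) {
            throw new IllegalArgumentException("need at least two time stamps");
        }
        double sum = 0;
        for (int i = 1; i < timeStamps.length; i++) {
            sum += timeStamps[i] - timeStamps[i - 1];
        }
        return sum / (timeStamps.length - 1);
    }

    public static void main(String[] args) {
        // Gaps of 2, 2 and 5 average to 3.
        System.out.println(averageDelta(new double[] {0, 2, 4, 9})); // prints 3.0
    }
}
```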

getTransformedData

public Instances getTransformedData(Instances insts)
                             throws java.lang.Exception
Creates a transformed data set based on the user's settings

Parameters:
insts - the instances to transform
Returns:
a transformed data set
Throws:
java.lang.Exception - if a problem occurs during the creation of lagged and auxiliary attributes.

processInstance

public Instance processInstance(Instance source,
                                boolean incrementTime,
                                boolean setAnyPeriodic)
                         throws java.lang.Exception
Throws:
java.lang.Exception

processInstancePreview

public Instance processInstancePreview(Instance source,
                                       boolean incrementTime,
                                       boolean setAnyPeriodic)
                                throws java.lang.Exception
Throws:
java.lang.Exception

processInstance

public Instance processInstance(Instance source,
                                boolean incrementTime,
                                boolean setAnyPeriodic,
                                boolean temporary)
                         throws java.lang.Exception
Process an instance in the original format and produce a transformed instance as output. Assumes that the lag maker has been configured and initialized with a call to getTransformedData().

Parameters:
source - an instance in original format
incrementTime - true if any time stamp value should be incremented based on the time stamp value from the last instance seen and set in the outputted instance
setAnyPeriodic - true if any user-specified periodic value should be set in the transformed instance based on the value from the last instance seen.
Returns:
a transformed instance
Throws:
java.lang.Exception - if something goes wrong.

clearLagHistories

public void clearLagHistories()
                       throws java.lang.Exception
Clears any history accumulated in the lag creating filters.

Throws:
java.lang.Exception - if something goes wrong.
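
Processing one instance at a time relies on accumulated history: the lagged values for a new instance come from the instances seen before it, and clearing the history resets that state. The sketch below illustrates the idea with a hypothetical LagHistorySketch class on plain doubles. It is not how the lag creating filters are implemented, and NaN stands in for a missing value.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class LagHistorySketch {
    private final int maxLag;
    // Most recent value is kept at the head of the deque.
    private final Deque<Double> history = new ArrayDeque<>();

    LagHistorySketch(int maxLag) {
        this.maxLag = maxLag;
    }

    // Returns lags t-1 .. t-maxLag for the incoming value (NaN where the
    // history is too short), then records the value for later calls.
    double[] process(double value) {
        double[] lags = new double[maxLag];
        Arrays.fill(lags, Double.NaN);
        int k = 0;
        for (double past : history) {
            lags[k++] = past;
        }
        history.addFirst(value);
        if (history.size() > maxLag) {
            history.removeLast();
        }
        return lags;
    }

    // Forget everything seen so far.
    void clear() {
        history.clear();
    }

    public static void main(String[] args) {
        LagHistorySketch s = new LagHistorySketch(2);
        s.process(10);
        s.process(20);
        System.out.println(Arrays.toString(s.process(30))); // prints [20.0, 10.0]
        s.clear();
        System.out.println(Arrays.toString(s.process(40))); // prints [NaN, NaN]
    }
}
```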

advanceSuppliedTimeValue

public double advanceSuppliedTimeValue(double valueToAdvance)
Utility method to advance a supplied time value by one unit according to the periodicity set for this LagMaker.

Parameters:
valueToAdvance - the time value to advance
Returns:
the advanced value or the original value if this lag maker is not adjusting for trends

decrementSuppliedTimeValue

public double decrementSuppliedTimeValue(double valueToDecrement)