Difference between revisions of "Understanding Optimization"

From MultiCharts
Jump to navigation Jump to search
(Created page with "Category:Optimization")
 
Line 1: Line 1:
 +
A strategy is created by implementing trading concepts, ideas, and observations of historical market behavior, into a trading system. The very idea of a trading system implies a degree of optimization to market behavior. 
 +
 +
The process of strategy optimization further enhances and automates this process. Strategy optimization is the search for the set of optimum parameters for the defined criteria. By testing a range of signal input values, optimization aids in selecting the values that correspond, based on historical data, to the best strategy performance. Optimization aids in better understanding of strategy's characteristics and in creating new criteria for entries and exits.
 +
 +
Different traders use different criteria to define strategy performance.  Some traders use the highest net profit, while other traders use the lowest drawdown.  MultiCharts lets the trader define his own criteria.
 +
 +
Optimization can have detrimental effects if the user searches for the combination of inputs based solely on the best performance over a period of historical data and focuses to much on market conditions that may never occur again. This approach is known as over-optimization or curve-fitting. Performance will not be the same in real trading, since historical patterns are highly unlikely to be repeated.
 +
 +
'''Optimization Methods'''
 +
 +
There are two optimization methods: Exhaustive Search and Genetic Algorithm.
 +
 +
<br>
 +
== Understanding Exhaustive Search Optimization ==
 +
 +
'''Exhaustive''', also called '''Brute-Force''', optimization systematically goes through all the potential combinations in search for the best solution. The advantage of this approach is every single combination will be checked and the absolute optimum solution identified.
 +
 +
The length of time required for Exhaustive optimization is proportional to the total number of all possible solutions. The drawback of this approach is that, unless relatively few parameters are involved, the period of time required to reach a solution may turn out to be unacceptably long. Thus, exhaustive optimization may only be suitable when there is a limited number of possible solutions.
 +
 +
<br>
 +
== Understanding Genetic Algorithm Optimization ==
 +
 +
'''Genetic Algorithms''' optimization evaluates only the more promising combinations, finding near-optimum solutions in a fraction of time that would be required by the brute-force approach, making Genetic Algorithms optimization powerful enough to analyze strategies with hundreds of parameters. Genetic Optimizer settings add flexibility to this technique.
 +
 +
GA-based search algorithms utilize methods that mimic a biological evolution. The algorithms start by testing a number of random combinations, select the combinations with the most potential, and then further combine and modify them to finally arrive at the best input combinations.
 +
Instead of mechanically checking every conservable combination, genetic algorithms quickly narrow down the number of potential winners, finding and focusing on the areas that are most profitable and most stable. Thus, genetic algorithms avoid superfluous calculations in the lowest net profit potential zones. GA approach is well known and accepted in many fields where optimization is required.
 +
 +
The drawback of the GA approach is that the solution found will be a solution approaching the absolute optimum solution, but not necessarily the absolute optimum solution itself. This drawback, however, is handsomely offset by the processing power and time savings in cases with a large number of possible solutions. 
 +
 +
In general, GA's work is primarily about two abstracts: an Individual (or Genome) and an Algorithm (i.e. Genetic Algorithm itself). Each Genome instance represents a single unique inputs combination, while GA itself defines how the evolution should take place. The GA uses a given trading strategy to determine how 'fit' a genome is for survival, e.g. how much Net Profit does an inputs combination generates in case Net Profit was selected as an Optimization Criteria.
 +
 +
Here are some GA definitions that help in understanding the process:
 +
 +
<br>'''Fitness''' - the overall performance of an individual (e.g. Net Profit).
 +
<br>'''Genome (Individual) ''' - a unique combination of strategy input values.
 +
<br>'''Gene''' - one of the input variables of a given strategy.
 +
<br>'''Chromosome''' - a set of genes, usually related in function.
 +
<br>'''Crossover''' - a procedure for generating a “child” from two “parent” genomes. Crossover involves multiple genomes.
 +
<br>'''Mutation''' - a process when a gene is changed and receives a value that is neither from the "mother" or the "father". Mutation involves only a single genome each time.
 +
<br>'''Generation (Population) ''' - a group of individuals (genomes), all “born” at about the same time.
 +
<br>'''Convergence''' - the extent of improvement in the average fitness between two consecutive generations; as the degree of improvement is decreasing, the generations are said to be converging.
 +
 +
In brief, the optimization process works as follows:
 +
 +
* Based on a multitude of inputs combinations provided, a population of genomes is created.
 +
* The fitness of each individual is evaluated.
 +
* The fittest members are retained, and the least fit members gradually discarded.
 +
* A new population of individuals is generated from the remaining members of the previous population by applying the crossover and mutation operations, as well as selection and/or replacement strategies (built into the GA).
 +
* The fitness of these new individuals is evaluated, the fittest members retained, and the least fit members gradually discarded
 +
* The process is repeated, until the specified degree of convergence or generation number is reached (depends on GA setting selected).
 +
 +
<br>
 +
=== Detailed Description of GA Process ===
 +
 +
In detail, the optimization process works as follows:
 +
 +
# After a number of all possible combinations is determined, an optimal number of individuals is selected.
 +
# Each individual is selected at random. These individuals form the first Generation. The optimal number of individuals is automatically placed in '''Population Size''' field and can be changed manually.
 +
 +
<br><div style="background-color: #E5F6FF;">Tip: An excessively large Population Size value will result in an increase in calculation time, while an overly small Population Size value will result in a decrease in calculation accuracy.</div>
 +
 +
<br><div style="background-color: #E3FBE5;">Note: MultiCharts' GA support artificially exclusive population. This means that identical individuals cannot exist inside the same population, and thus the population size can not exceed the total number of input combinations. The population size is constant for each generation.</div>
 +
 +
# The fitness of each individual is evaluated and the least fit individuals discarded.
 +
# A new population of individuals is generated from the remaining members of the previous population by applying the crossover and mutation operations, as well as selection and/or replacement strategies that depend on the GA subtype:
 +
 +
<br>
 +
'''Crossover and Mutation'''
 +
 +
MultiCharts uses the so-called Array Uniform Crossover. With this Crossover type, each of the child’s genes can come from each of the parents with equal probability.
 +
<br>In the '''Crossover Probability''' field, the probability of a crossover for each individual is specified; the usual value range is 0.95-0.99, with the default value of 0.95.
 +
<br>MultiCharts uses the so-called Random Flip Mutation. With this Mutation type, each gene can be replaced with any other possible gene on random basis.
 +
<br>In the '''Mutation Probability''' field, the probability of a mutation for each individual is specified; the usual value range is 0.01-0.05, with the default value of 0.05.
 +
<br><div style="background-color: #E5F6FF;">Tip: An excessively large Mutation Probability value will cause the search to become a primitive random search.</div>
 +
 +
<br>
 +
'''GA Subtypes and Replacement Schemas'''
 +
 +
GA subtype defines the way that GA creates new individuals and replaces old individuals when creating next generations.
 +
<br>GA subtype can be set in the '''Genetic Algorithm Subtype''' drop-down list.
 +
<br>Two GA subtypes are available: '''Basic''' and '''Incremental'''.
 +
<br>'''Basic''' subtype is the standard so-called “simple genetic algorithm”. This algorithm uses non-overlapping generations and Elitism mode (optional). For each generation, the algorithm creates an entirely new population of individuals (if the '''Elitism''' option is selected, the most fit individuals move on to the next generation).
 +
 +
<br>
 +
'''Elitism'''
 +
 +
Elitism mode, available for the Basic GA subtype only, allows the fittest individuals to survive and produce “children” over a span of multiple generations.
 +
<br>'''Incremental''' subtype does not create an entirely new population for each generation. It simply adds only one or two children to the population each time the next generation is created. These one or two children replace one or two individuals in the previous generation. The individuals to be replaced by the children are chosen according to the Replacement Schemas used.
 +
 +
<br>'''Replacement Schemas'''
 +
 +
Replacement Schemas are available for Incremental subtype only. Schemas define how a new generation should be integrated into the population. There are three schemes available: '''Worst''', '''Parent''', and '''Random'''.
 +
 +
<br>'''Worst''' – least fit individuals are replaced
 +
<br>'''Parent''' – parent individuals are replaced
 +
<br>'''Random''' – individuals are replaced randomly
 +
 +
<br>
 +
'''Offspring Number'''
 +
 +
Offspring Number is the number of children to be added each time that a new generation is created. '''One''' or '''Two''' children can be added.
 +
* The fitness of each individual is evaluated and the least fit individuals discarded.
 +
* The process is repeated, until the specified degree of convergence or generation number is reached (depends on GA setting selected).
 +
 +
<br>
 +
'''GA Convergence Type'''
 +
 +
Genetic Algorithms optimization process has no implicit final result and thus can proceed forever. Therefore, an "ending-point" must be specified, indicating when the optimization process must come to an end.
 +
 +
Two GA optimization "ending-point" criteria types can be selected: '''Terminate-Upon-Generation''' and '''Terminate-Upon-Convergence'''.
 +
 +
'''Terminate-Upon-Generation''' will stop the optimization process once the specified '''Maximum Number of Generations''' is reached.
 +
 +
'''Terminate-Upon-Convergence''' will stop the optimization process once the defined '''Convergence Rate''' is reached, or once the defined '''Maximum Number of Generations''' is reached.
 +
 +
GA optimization "ending-point" criterion is selected in the '''Conversion Type''' drop-down list.
 +
 +
The desired Maximum Number of Generations, Minimum Number of Generations, and Conversion Rate can be set in the corresponding text boxes.
 +
 +
<br>
 +
'''Convergence Rate'''
 +
 +
Convergence Rate of generations is the ratio between the Convergence value of the two most recent generations and the Convergence value of the current generation and the generation N generations ago.
 +
<br>GA calculation is stopped after meeting С [x – N] / C [x] >= P condition where:
 +
<br>x – ordinal number of the current generation;
 +
<br>С[x] – convergence value of the two most recent generations;
 +
<br>N – defined minimal number of the generations;
 +
<br>P - convergence rate; values used are usually close to 1, with the default value of 0.99.
 +
<br><div style="background-color: #E3FBE5;">Note: Convergence Rate is not calculated for generations that have an ordinal number less than the defined minimal number of the generations.</div>
 +
 +
<br>
 +
'''Further Reading'''
 +
 +
This is only a brief introduction to genetic algorithms. We recommend that you learn more about GA on the Internet, e.g. [[http://en.wikipedia.org/wiki/Genetic_algorithm Wikipedia]]
 +
<br>
 +
 
[[Category:Optimization]]
 
[[Category:Optimization]]

Revision as of 16:16, 1 February 2012

A strategy is created by implementing trading concepts, ideas, and observations of historical market behavior, into a trading system. The very idea of a trading system implies a degree of optimization to market behavior.

The process of strategy optimization further enhances and automates this process. Strategy optimization is the search for the set of optimum parameters for the defined criteria. By testing a range of signal input values, optimization aids in selecting the values that correspond, based on historical data, to the best strategy performance. Optimization aids in better understanding of strategy's characteristics and in creating new criteria for entries and exits.

Different traders use different criteria to define strategy performance. Some traders use the highest net profit, while other traders use the lowest drawdown. MultiCharts lets the trader define his own criteria.

Optimization can have detrimental effects if the user searches for the combination of inputs based solely on the best performance over a period of historical data and focuses to much on market conditions that may never occur again. This approach is known as over-optimization or curve-fitting. Performance will not be the same in real trading, since historical patterns are highly unlikely to be repeated.

Optimization Methods

There are two optimization methods: Exhaustive Search and Genetic Algorithm.


Understanding Exhaustive Search Optimization

Exhaustive, also called Brute-Force, optimization systematically goes through all the potential combinations in search for the best solution. The advantage of this approach is every single combination will be checked and the absolute optimum solution identified.

The length of time required for Exhaustive optimization is proportional to the total number of all possible solutions. The drawback of this approach is that, unless relatively few parameters are involved, the period of time required to reach a solution may turn out to be unacceptably long. Thus, exhaustive optimization may only be suitable when there is a limited number of possible solutions.


Understanding Genetic Algorithm Optimization

Genetic Algorithms optimization evaluates only the more promising combinations, finding near-optimum solutions in a fraction of time that would be required by the brute-force approach, making Genetic Algorithms optimization powerful enough to analyze strategies with hundreds of parameters. Genetic Optimizer settings add flexibility to this technique.

GA-based search algorithms utilize methods that mimic a biological evolution. The algorithms start by testing a number of random combinations, select the combinations with the most potential, and then further combine and modify them to finally arrive at the best input combinations. Instead of mechanically checking every conservable combination, genetic algorithms quickly narrow down the number of potential winners, finding and focusing on the areas that are most profitable and most stable. Thus, genetic algorithms avoid superfluous calculations in the lowest net profit potential zones. GA approach is well known and accepted in many fields where optimization is required.

The drawback of the GA approach is that the solution found will be a solution approaching the absolute optimum solution, but not necessarily the absolute optimum solution itself. This drawback, however, is handsomely offset by the processing power and time savings in cases with a large number of possible solutions.

In general, GA's work is primarily about two abstracts: an Individual (or Genome) and an Algorithm (i.e. Genetic Algorithm itself). Each Genome instance represents a single unique inputs combination, while GA itself defines how the evolution should take place. The GA uses a given trading strategy to determine how 'fit' a genome is for survival, e.g. how much Net Profit does an inputs combination generates in case Net Profit was selected as an Optimization Criteria.

Here are some GA definitions that help in understanding the process:


Fitness - the overall performance of an individual (e.g. Net Profit).
Genome (Individual) - a unique combination of strategy input values.
Gene - one of the input variables of a given strategy.
Chromosome - a set of genes, usually related in function.
Crossover - a procedure for generating a “child” from two “parent” genomes. Crossover involves multiple genomes.
Mutation - a process when a gene is changed and receives a value that is neither from the "mother" or the "father". Mutation involves only a single genome each time.
Generation (Population) - a group of individuals (genomes), all “born” at about the same time.
Convergence - the extent of improvement in the average fitness between two consecutive generations; as the degree of improvement is decreasing, the generations are said to be converging.

In brief, the optimization process works as follows:

  • Based on a multitude of inputs combinations provided, a population of genomes is created.
  • The fitness of each individual is evaluated.
  • The fittest members are retained, and the least fit members gradually discarded.
  • A new population of individuals is generated from the remaining members of the previous population by applying the crossover and mutation operations, as well as selection and/or replacement strategies (built into the GA).
  • The fitness of these new individuals is evaluated, the fittest members retained, and the least fit members gradually discarded
  • The process is repeated, until the specified degree of convergence or generation number is reached (depends on GA setting selected).


Detailed Description of GA Process

In detail, the optimization process works as follows:

  1. After a number of all possible combinations is determined, an optimal number of individuals is selected.
  2. Each individual is selected at random. These individuals form the first Generation. The optimal number of individuals is automatically placed in Population Size field and can be changed manually.


Tip: An excessively large Population Size value will result in an increase in calculation time, while an overly small Population Size value will result in a decrease in calculation accuracy.


Note: MultiCharts' GA support artificially exclusive population. This means that identical individuals cannot exist inside the same population, and thus the population size can not exceed the total number of input combinations. The population size is constant for each generation.
  1. The fitness of each individual is evaluated and the least fit individuals discarded.
  2. A new population of individuals is generated from the remaining members of the previous population by applying the crossover and mutation operations, as well as selection and/or replacement strategies that depend on the GA subtype:


Crossover and Mutation

MultiCharts uses the so-called Array Uniform Crossover. With this Crossover type, each of the child’s genes can come from each of the parents with equal probability.
In the Crossover Probability field, the probability of a crossover for each individual is specified; the usual value range is 0.95-0.99, with the default value of 0.95.
MultiCharts uses the so-called Random Flip Mutation. With this Mutation type, each gene can be replaced with any other possible gene on random basis.
In the Mutation Probability field, the probability of a mutation for each individual is specified; the usual value range is 0.01-0.05, with the default value of 0.05.


Tip: An excessively large Mutation Probability value will cause the search to become a primitive random search.


GA Subtypes and Replacement Schemas

GA subtype defines the way that GA creates new individuals and replaces old individuals when creating next generations.
GA subtype can be set in the Genetic Algorithm Subtype drop-down list.
Two GA subtypes are available: Basic and Incremental.
Basic subtype is the standard so-called “simple genetic algorithm”. This algorithm uses non-overlapping generations and Elitism mode (optional). For each generation, the algorithm creates an entirely new population of individuals (if the Elitism option is selected, the most fit individuals move on to the next generation).


Elitism

Elitism mode, available for the Basic GA subtype only, allows the fittest individuals to survive and produce “children” over a span of multiple generations.
Incremental subtype does not create an entirely new population for each generation. It simply adds only one or two children to the population each time the next generation is created. These one or two children replace one or two individuals in the previous generation. The individuals to be replaced by the children are chosen according to the Replacement Schemas used.


Replacement Schemas

Replacement Schemas are available for Incremental subtype only. Schemas define how a new generation should be integrated into the population. There are three schemes available: Worst, Parent, and Random.


Worst – least fit individuals are replaced
Parent – parent individuals are replaced
Random – individuals are replaced randomly


Offspring Number

Offspring Number is the number of children to be added each time that a new generation is created. One or Two children can be added.

  • The fitness of each individual is evaluated and the least fit individuals discarded.
  • The process is repeated, until the specified degree of convergence or generation number is reached (depends on GA setting selected).


GA Convergence Type

Genetic Algorithms optimization process has no implicit final result and thus can proceed forever. Therefore, an "ending-point" must be specified, indicating when the optimization process must come to an end.

Two GA optimization "ending-point" criteria types can be selected: Terminate-Upon-Generation and Terminate-Upon-Convergence.

Terminate-Upon-Generation will stop the optimization process once the specified Maximum Number of Generations is reached.

Terminate-Upon-Convergence will stop the optimization process once the defined Convergence Rate is reached, or once the defined Maximum Number of Generations is reached.

GA optimization "ending-point" criterion is selected in the Conversion Type drop-down list.

The desired Maximum Number of Generations, Minimum Number of Generations, and Conversion Rate can be set in the corresponding text boxes.


Convergence Rate

Convergence Rate of generations is the ratio between the Convergence value of the two most recent generations and the Convergence value of the current generation and the generation N generations ago.
GA calculation is stopped after meeting С [x – N] / C [x] >= P condition where:
x – ordinal number of the current generation;
С[x] – convergence value of the two most recent generations;
N – defined minimal number of the generations;
P - convergence rate; values used are usually close to 1, with the default value of 0.99.


Note: Convergence Rate is not calculated for generations that have an ordinal number less than the defined minimal number of the generations.


Further Reading

This is only a brief introduction to genetic algorithms. We recommend that you learn more about GA on the Internet, e.g. [Wikipedia]