Two possible strategies for handling data with Redshift
- distribution style
  - even distribution
    - rows are assigned round-robin across all slices to balance the load among the CPUs
    - good if a table won't be joined; joining an evenly distributed table results in a lot of data shuffling
  - all distribution, a.k.a. broadcasting
    - a full copy of the table is replicated on every node to speed up joins
    - when joining, no shuffling is needed
  - auto distribution
    - leaves the decision to Redshift
  - key distribution
    - rows with the same value in the distribution key column are placed on the same slice
- sort key
  - define one or more of a table's columns as the sort key
  - rows are sorted before distribution to slices
  - reduces the query time on each node, since each node holds its rows in sorted order
  - good for columns that are frequently used in sorting, such as the primary key of a dimension table
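
The placement rules above can be sketched as a toy Python simulation. This is only an illustration under stated assumptions, not how Redshift is implemented: the function names, the `orders` rows, the slice and node counts, and the use of `crc32` as the hash are all hypothetical. The last helper models why sorted data helps, assuming storage blocks keep a min/max range that lets a range filter skip blocks.

```python
import zlib
from collections import defaultdict


def distribute_even(rows, num_slices):
    """EVEN style: row i goes to slice i % num_slices (round-robin)."""
    slices = defaultdict(list)
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return dict(slices)


def distribute_key(rows, key, num_slices):
    """KEY style: hash the distribution key so equal values share a slice."""
    slices = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in hash (assumption); Redshift's own hash is internal.
        slot = zlib.crc32(str(row[key]).encode()) % num_slices
        slices[slot].append(row)
    return dict(slices)


def distribute_all(rows, num_nodes):
    """ALL style: every node receives a full copy of the table."""
    return {node: list(rows) for node in range(num_nodes)}


def blocks_scanned(values, lo, hi, block_size=4):
    """Count blocks whose [min, max] range overlaps the filter [lo, hi]."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return sum(1 for b in blocks if min(b) <= hi and max(b) >= lo)


# Hypothetical table: 8 orders spread over 3 customers.
orders = [{"order_id": i, "customer_id": i % 3} for i in range(8)]

even = distribute_even(orders, num_slices=4)
by_key = distribute_key(orders, key="customer_id", num_slices=4)
copies = distribute_all(orders, num_nodes=2)

# Row counts per slice under EVEN, and customer ids per slice under KEY.
print({s: len(rows) for s, rows in even.items()})
print({s: sorted({r["customer_id"] for r in rows}) for s, rows in by_key.items()})

# The same 16 ids stored unsorted vs. sorted, filtered on 4 <= id <= 7.
unsorted_ids = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
print(blocks_scanned(unsorted_ids, lo=4, hi=7))
print(blocks_scanned(sorted(unsorted_ids), lo=4, hi=7))
```

In this sketch, even distribution gives every slice the same number of rows, key distribution keeps all rows for a given `customer_id` together (so a join on that column would need no shuffling), and the sorted layout lets the range filter touch fewer blocks than the unsorted one.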