Recently I published my experiments on using deep reinforcement learning for portfolio allocation. A few open problems that I struggled with and still don't have answers to are:
1. For a problem like the stock market specifically, how can one decide when to stop training the RL agent? That is, what metric can determine that the agent has been trained?
2. How can one determine that the deployed agent is now operating outside the training data distribution, so that it can be retrained?
I would appreciate any comments or suggestions, including pointers to any mistakes I might have made.
Here are the links:
https://medium.com/@vivekys/value-investing-machine-d2718d35d19b
https://medium.com/@vivekys/value-investing-machine-2a-43ce2d05f2a2
https://medium.com/@vivekys/value-investing-machine-2b-638d71da7e56
https://medium.com/@vivekys/value-investing-machine-3-4c1053221940
https://medium.com/@vivekys/value-investing-machine-4-eede8823632