TY - GEN
T1 - Multi-band Environments for Optical Reinforcement Learning Gym for Resource Allocation in Elastic Optical Networks
AU - Morales, Patricia
AU - Franco, Patricia
AU - Lozada, Astrid
AU - Jara, Nicolas
AU - Calderon, Felipe
AU - Pinto-Rios, Juan
AU - Leiva, Ariel
N1 - Publisher Copyright: © 2021 IFIP TC6 WG6.10.
PY - 2021/6/28
Y1 - 2021/6/28
AB - The use of additional fiber bands for optical communications, known as multi-band or band-division multiplexing (BDM), makes it possible to increase the traffic served in transparent optical networks. In recent years, many proposals have emerged for resource allocation in such multi-band architectures. This work presents a novel approach based on reinforcement learning (RL) techniques to allocate resources in multi-band elastic optical networks. Two new environments were implemented and added to the Optical-RL-Gym toolkit, considering four scenarios with different band availability. Six agents were tested on four real network topologies, comparing their episode rewards over a large number of training steps. Results show Trust Region Policy Optimization (TRPO) to be the best-performing agent, with consistent results across all the scenarios and network topologies considered. In addition, we illustrate how the blocking probability varies with traffic load, and how usage is distributed across bands, to support further discussion.
KW - multi-band optical networks
KW - reinforcement learning
KW - resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85112246722&partnerID=8YFLogxK
U2 - 10.23919/ONDM51796.2021.9492435
DO - 10.23919/ONDM51796.2021.9492435
M3 - Conference contribution
AN - SCOPUS:85112246722
T3 - 25th International Conference on Optical Network Design and Modelling, ONDM 2021
BT - 25th International Conference on Optical Network Design and Modelling, ONDM 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Optical Network Design and Modelling, ONDM 2021
Y2 - 28 June 2021 through 1 July 2021
ER -