Reinforcement Learning

The New Frontier of Telecommunications


Reinforcement Learning

Reinforcement Learning (RL) is, alongside supervised and unsupervised learning, one of the three fundamental paradigms of Machine Learning, the family of techniques that enable a computer to learn to perform a task without being explicitly programmed to do so.

Over the years, RL has proven to be an extremely effective approach in numerous fields. One is robotics, where it allows machines to perform complex operations such as movement, object manipulation, or social navigation. Another notable example is board games: in 2016 AlphaGo, an RL-based system developed by Google DeepMind, defeated a world champion at Go, historically the last of the classic board games (after chess, among others) in which top human players had remained unbeaten by artificial intelligence.

The general operation of an RL algorithm is simple and intuitive, and closely resembles how humans learn from experience. Any RL algorithm, regardless of its specific type, is built on three main elements: the agent, the environment, and the reward.

The main elements of an RL algorithm

  • Agent: The entity that interacts with the environment to learn how to act in order to achieve certain objectives or maximize some notion of cumulative reward.

  • Environment: The context in which the agent operates and interacts. It encompasses all the relevant information the agent needs to make decisions and take actions, as well as the rules, dynamics, and constraints that govern its behavior.

  • Reward: A quantitative measure associated with an agent's action, which the agent receives from the environment as feedback in response to the action taken. The agent uses this feedback to assess the effectiveness of its actions and guide its decision-making, distinguishing between actions that help achieve the objectives and those that hinder them.

The core of an RL algorithm

The core of an RL algorithm lies in the way the agent interacts with the environment based on the feedback provided by the reward. The agent explores the environment by taking actions according to decision-making processes and formal criteria known as the policy. On one hand, it seeks to exploit the acquired knowledge to choose actions that lead to optimal outcomes; on the other hand, it explores new scenarios by selecting actions randomly to step out of its "comfort zone." This allows it to adapt to potential changes in the environment and consider solutions it might not have otherwise considered. Through this process, the agent undergoes the learning phase, during which the policy is updated based on the received feedback.
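
The trade-off between exploitation and exploration described above can be sketched with an epsilon-greedy agent on a toy problem. This is a minimal illustration under stated assumptions, not a method taken from this article: the three-action environment, its reward means, and the names `run_bandit`, `epsilon`, `estimates`, and `counts` are all hypothetical.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (hypothetical environment)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # the agent's current value estimate per action
    counts = [0] * n_actions       # how many times each action has been taken

    for _ in range(steps):
        # Policy: explore at random with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Environment: returns a noisy reward for the chosen action.
        reward = rng.gauss(true_means[action], 0.5)

        # Learning: move the estimate toward the observed reward (running mean).
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Assumed average rewards of three possible actions; the agent does not see them.
estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

Here `epsilon` controls how often the agent leaves its "comfort zone": with `epsilon=0.1` roughly one action in ten is chosen at random, and the rest follow the current best estimate.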

RL in Telecommunications

Numerous technologies are emerging and revolutionizing the world of telecommunications. Networks are becoming increasingly complex, and the performance expected from them is ever higher. AI has become an essential tool for meeting these demands, and RL plays a prominent role in this scenario.

The enormous complexity of modern networks makes them difficult for humans to supervise, creating the need to rely on algorithms that can reach where humans cannot. RL has proven capable of achieving performance well above human levels on some tasks and, moreover, of doing so autonomously, which is one of the reasons it fits this scenario.

An approach that is particularly effective in the context of telecommunications is Deep Reinforcement Learning, which uses Deep Learning inside RL algorithms. Where the environment is particularly complex (like a network), the agent can use a neural network to process the information it receives from the environment and estimate the quantities that allow it to evaluate the best action among those possible. This can drastically increase the performance and effectiveness of these algorithms and overcomes a series of problems that RL algorithms naturally face.
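
To make the idea of "a neural network that estimates the quantities used to pick an action" concrete, the sketch below implements a tiny one-hidden-layer Q-network in plain Python with a single semi-gradient Q-learning update. It is only an illustration under stated assumptions: the class `TinyQNetwork`, the layer sizes, and the example state are hypothetical, and practical Deep RL systems normally rely on a deep learning framework and additional machinery such as replay buffers and target networks.

```python
import math
import random

class TinyQNetwork:
    """One-hidden-layer network mapping a state vector to one Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]

    def forward(self, state):
        # Hidden layer with tanh activation, then a linear output per action.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)))
                  for row in self.w1]
        q_values = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return hidden, q_values

    def td_update(self, state, action, reward, next_state, gamma=0.9, lr=0.05):
        """One semi-gradient Q-learning step on a single transition."""
        hidden, q_values = self.forward(state)
        _, next_q = self.forward(next_state)
        target = reward + gamma * max(next_q)  # treated as a constant
        error = target - q_values[action]
        # Backpropagate through the output of the chosen action only.
        for j, h in enumerate(hidden):
            grad_hidden = self.w2[action][j] * (1.0 - h * h)
            for i, x in enumerate(state):
                self.w1[j][i] += lr * error * grad_hidden * x
            self.w2[action][j] += lr * error * h
        return error

# Hypothetical state: utilization of three links, values in [0, 1].
net = TinyQNetwork(n_inputs=3, n_hidden=8, n_actions=2)
state = [0.9, 0.2, 0.4]
_, q_values = net.forward(state)
action = max(range(len(q_values)), key=lambda a: q_values[a])
```

The network replaces the lookup table a simpler RL algorithm would need: instead of storing one value per (state, action) pair, it generalizes across states it has never seen, which is what makes the approach usable when the state space is as large as a real network's.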

Use cases

Thanks to its versatility, RL finds applications in numerous and very different scenarios. Some examples of interest in the telco world are:

  • Solving "classic" Machine Learning tasks: the problem is restructured as a decision-making task. An example is time series forecasting. The agent learns to predict future values (action) based on past observations (environment) and an evaluation metric, such as prediction accuracy (reward).

  • Routing optimization: given a metric of interest to be optimized (e.g., bandwidth, delay, etc.), the agent chooses, whenever a communication request occurs between two nodes, the path that optimizes that metric in the long term. The optimization therefore also accounts for possible future communications, following a proactive approach.
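
As an illustration of the routing example, the following sketch trains a tabular Q-learning agent to choose next hops that minimize total delay toward a destination. It is a simplified, hypothetical setup rather than a description of a production system: the four-node topology in `LINKS`, the delay values, and the helper names `train_router` and `greedy_path` are assumptions made for the example.

```python
import random

# Hypothetical topology: node -> {neighbor: link delay in ms}.
LINKS = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0, "D": 6.0},
    "C": {"A": 5.0, "B": 1.0, "D": 1.0},
    "D": {},
}

def train_router(links, dest, episodes=2000, alpha=0.5, gamma=1.0,
                 epsilon=0.2, seed=0):
    """Tabular Q-learning: q[node][next_hop] estimates minus the delay to dest."""
    rng = random.Random(seed)
    q = {node: {nb: 0.0 for nb in nbrs} for node, nbrs in links.items()}
    sources = [n for n in links if n != dest and links[n]]

    for _ in range(episodes):
        node = rng.choice(sources)
        for _ in range(50):  # hop limit per episode
            if node == dest:
                break
            hops = list(q[node])
            # Epsilon-greedy choice of the next hop.
            if rng.random() < epsilon:
                nxt = rng.choice(hops)
            else:
                nxt = max(hops, key=lambda h: q[node][h])
            reward = -links[node][nxt]  # lower delay means higher reward
            future = max(q[nxt].values(), default=0.0)  # 0 at the destination
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            node = nxt
    return q

def greedy_path(q, src, dest, max_hops=10):
    """Follow the highest-valued next hop from src until dest is reached."""
    path = [src]
    while path[-1] != dest and len(path) <= max_hops:
        node = path[-1]
        path.append(max(q[node], key=q[node].get))
    return path

q_table = train_router(LINKS, dest="D")
path = greedy_path(q_table, "A", "D")
total_delay = sum(LINKS[a][b] for a, b in zip(path, path[1:]))
```

Because each Q-value accumulates the delay of the whole remaining route, the agent learns to prefer a first hop that looks no better locally but leads to a cheaper path overall, which is the long-term optimization the routing bullet refers to.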

Use case: intelligent NOC

RL can be used to build an intelligent Network Operations Center (NOC), where it is useful in the following areas:

  • Network automation: The agent collects and processes network and telemetry data to perform actions aimed at ensuring the correct and efficient functioning of the network, taking corrective measures when a fault occurs or acting proactively to prevent it. Examples are the dynamic configuration of devices, traffic engineering, and fault prevention.

  • Network optimization: Based on metrics of interest, the algorithm can perform optimization tasks to increase network performance, such as routing optimization that can outperform common protocols like OSPF or EIGRP.

  • Action Recommender Engine (ARE): A recommendation system that supports NOC operators by suggesting steps to take when a problem arises, representing an evolution of the widely used codebook.

Advantages of Reinforcement Learning

The role of Net Reply

Technologies such as 5G, IoT, Edge Computing, and Digital Twins rely on AI, among other foundations, for their implementation, which makes it clear that new technologies and new networks can no longer be conceived without this tool. Mastering Artificial Intelligence is, and will remain, essential to delivering high-quality, up-to-date solutions.

At Net Reply, the study of these tools is continuously growing, as is the drive for innovation. We are committed to adopting and experimenting with new technologies to offer cutting-edge, evolving solutions. Our methodology embraces innovation to give new life to established solutions, allowing us to enhance existing resources while keeping a constant and attentive eye on the future.