Deterministic and Stochastic Simulation of the COVID-19 Epidemic with the SEIR Model

This work regards the simulation of the spread of the COVID-19 disease in a community by applying the deterministic and stochastic Susceptible-Exposed-Infective-Recovered (SEIR) epidemic models. The developed computational method for the stochastic variant allows to realistically simulate the spread of COVID-19 in a medium-sized community and to study the effect of preventive measures such as quarantine and vaccination. The results of the simulations are compared with the deterministic version of the SEIR model. The comparison makes it possible to conclude that the epidemic outbreak can be prevented even though the basic reproduction number is greater than one.


I. Introduction
An epidemic is the rapid spread of an infectious disease among a population, producing many infected individuals in a short period of time. The mathematical modeling of an epidemic allows to describe the spreading of the infectious disease. This information is of vital importance to the definition of public policies to control the spread of the epidemic.
There are two main broad classes of mathematical epidemic models: deterministic and stochastic. In the deterministic models, the epidemic process is described by a system of differential equations that, together with the initial conditions, determines the evolution of the epidemic. The community is considered homogeneous and it is assumed that individuals mix uniformly with each other. Therefore, these models do not incorporate any arbitrariness [1]. In the stochastic models, the spread of an infectious disease is a random process that takes place locally through the close contact with infectious individuals. Stochastic models are considered more realistic than deterministic models [2]. These models are based on the Monte Carlo approach. Essentially, this consists in the repetition of a random behavior of the population, a large number of times. Such procedure usually involves considerable computing power. The Susceptible-Exposed-Infectious-Removed (SEIR) mathematical epidemic model is the most appropriate to describe the spread of an infectious disease with latency period, like COVID-19. In the generic SEIR model, the population is divided into four compartments that represent susceptible, rufino@ipb.pt exposed, infectious and recovered individuals. It is known that the SARS-CoV-2, the causative agent of COVID-19, has an incubation period (time between infection and onset of symptoms [3]). The SEIR model also assumes that an individual is contagious only when in the infectious period. This compartmentalization does not correspond exactly to the reality of COVID-19, because the duration of the latency and contagion periods vary from person to person [3]. Nevertheless, the SEIR model is currently the one that most closely reproduces the characteristics of the propagation of the COVID-19 disease.
The main focus of this paper is the stochastic SEIR epidemic model and its computational simulation with regard to the spreading of the COVID-19 infectious disease in a small-sized community (20000 individuals), using disease parameters available in the literature. Different scenarios are explored, corresponding to hypothetical control measures based on quarantine and vaccination. Specifically, we look for the critical vaccination coverage needed to prevent a major epidemic outbreak, considering different quarantine scenarios.
The computational strategy is similar to the one used in a previous work [4], adapted for a more complete parametric study. Each execution of the stochastic SEIR model, with specific parameters, incorporates randomness by running a certain number of independent simulations. The simulation set is executed in parallel, in a small HPC cluster; at the end of the model execution, results from individual simulations are joined.
The rest of this paper is organized as follows. In section II, the new version of the SEIR method used in this investigation is presented, as well as some details regarding its properties. Details about the computational implementation and simulations are presented in Section III. In section IV, the results of the computer simulations with the stochastic SEIR model are presented and compared with the deterministic model. The paper ends with some final considerations presented in Section V.
II. SEIR Epidemic Model The basic SEIR model considers that, at each point in time (day), each individual belongs to one (and only one) of four compartments: S (susceptible to catch the disease), E (exposed infected, but does not transmit the disease), I (infective, meaning he has got the disease and is able to infect others) and R (recovered from infection and can no longer be infected again).
In this work, the generic SEIR model is modified to reflect two important preventive measures: quarantine and vaccination. A new Q compartment corresponds to infected individuals, placed in isolation (quarantine), who cannot infect others. Vaccinated individuals are included in compartment R, as they cannot be infected again. Another new compartment, D, is for individuals who died because of the infectious disease. The model assumes that births and natural deaths are balanced, and that the population members only decrease due to the COVID-19, as dictated by the infectious fatality rate (IFR) of this disease.

A. Deterministic SEIR Model
Consider a population of size n. Let s(t), e(t), i(t), q(t), r(t) and d(t) denote, respectively, the fraction of the community that belongs to the compartments S, E, I, Q, R and D (illustrated in Figure 1), at a given day t > 0. Assuming that the population is closed (meaning that there are no births, deaths and immigration or emigration during the study period), then for any t.
Assuming that the population is homogeneous, individuals mix uniformly, and s(t), e(t), i(t), q(t), r(t) and d(t) are differentiable functions, then the variation of the fraction of the population in compartments S, E, I, Q, R and D, along the time t (in days), is given by the system of differential equations In the system of ordinary differential equations (1), P is the rate of contacts between susceptible and infectious individuals, 1/£ is the incubation period, 1 / y is the infectious period, x is the fraction of infectious quarantined, 5 is the fraction of infectious that died, and 1/p is the quarantine period. The system (1), along with the initial values s(0) = 1 -i0, e(0) = 0, ¿(0) = i0, q(0) = 0, r(0) = 0 and d(0) = 0 (where i0 > 0 represents the initial fraction of infective, assumed small), fully define the dynamics of the deterministic SEIR model considered in this work. The initial value problem (1) can be rapidly solved by a numerical method; also, the solving time does not depend on the dimension n of the population.
The parameters e and y are disease-specific for COVID-19, while the contact rate p is behaviour-specific and is different for each region, country or culture, and can vary in time to reflect societal and political actions [5]. The quarantine period 1/p is set by the health care policies of each country. In most European countries this period is 14 days. The fraction x of infectious individuals that are quarantined (or insulated) includes documented infected and hospitalized individuals. Its value depends on the number of tests performed.
The mortality rate 8 depends on the aggressiveness of the disease and the existence of more vulnerable susceptibles. The mortality rate is related to the IFR rate. In the general SEIR model case, and assuming that 8 << y, it is shown in [6] that IFR « -

B. Basic Reproduction Number
The basic reproduction number, denoted R0, is a measure of the potential that a disease has to trigger an epidemic outbreak. It is interpreted as the average number of new infections caused by an infectious individual when inserted in a community made up only of susceptible people [6]. When an infectious disease has R0 > 1 and an infected is inserted in a susceptible community, then an epidemic outbreak takes place, infecting a substantial part of the population. On the other hand, when R0 < 1, there is no risk of a major epidemic outbreak (for details see [1]). For the case of a single infected compartment (e.g. SIR model [1]), R0 is equal to the product of the infection rate (P) and the mean duration of the infection (1 /y). However, for models with several infected compartments, the basic reproduction number is defined as the number of new infections produced by a typical infective individual in a population at a disease-free equilibrium state (DFE) [7].
Next, we make an estimate of R0 for the SEIR model, defined by Equation (1), based on the methodology suggested by Driessche and Watnough [8]. According to these authors, R0 is the dominant eigenvalue of the next generation matrix G = F F _1, where F and V are the Jacobian matrices of the functions g and $ . g is the rate of appearance of new infection in infectious compartments (E, I and Q), given by and $ is the rate of transfer of individuals into (in alternate ways) and out of the same compartments, given by The next generation matrix is then and the spectral radius of this matrix gives the basic reproduction number of the SEIR model defined by the Equation (1): The basic reproduction number is a measure of the potential that a disease has to trigger an epidemic outbreak, which is why it is essential to estimate its value. Since the appearance of COVID-19, many works have been published on this topic. In [8] and [9] one can find a list of R0 values obtained by different researchers from all over the world, where the values collected vary widely between them, ranging from 1.95 to 6.49. Linka and co-authors [5] estimated the R0 values for the 27 countries in Europe based on data provided by the published health entities in each country. The values found for the beginning of the outbreak range from 0.91 (Latvia) to 6.33 (Germany), with the general value obtained for Europe being 4.22.
As a major outbreak will not occur if R0 < 1, it is then possible to estimate the fraction of the population that needs to be immune or resistant to an infection from the virus and the illness it causes, for an epidemic to be prevented. If immunization is done through vaccination, this fraction is called the critical vaccination coverage and is given by The critical vaccination coverage is a crucial quantity: if more than this fraction of the population is vaccinated, then the whole community is protected from an epidemic, and not only the vaccinated, a situation called herd immunity [2]. For the basic reproduction number R0 = 4.22, found for Europe [5], the herd immunity threshold would be 76%. If immunization is achieved only by contracting the disease, then herd immunity will only be achieved after at least 76% of the population has been infected. A country like Portugal, with a prevalence of 0.69% [10], is a long way from achieving herd immunity, even if unrecorded asymptomatic cases are included.
The value of the reproduction number varies with the evolution of the epidemic, due to the variation in the contact rate.
The contact rate p (the rate at which an infectious individual comes into contact and affects others), is not constant; instead, it is modulated by social behaviour and public health policies. The quantification of new infections during the COVID-19 pandemic, in a specific day t , is done through the effective reproduction number, Rt, which considers the contact rate at time t: P(t). Consequently, the effective reproduction number enables to measure the effects of public health interventions.

C. Stochastic SEIR Model
In a homogeneous community where individuals mix uniformly with each other, the size of an epidemic outbreak is determined by the basic reproduction number (fl0) and by the incubation and infectious periods (1/e and 1 / y), and there is no incertitude or randomness in the final number of infected. However, in some cases, this contradicts reality: for instance, the introduction of a small number of infected in a community may not necessarily have as a consequence a large outbreak (even if R0 >1); such situation can happen, for example, if the infectious isolate themselves from the rest of the community or, by chance, if their contacts are restricted to immune individuals. This motivates the formulation of models of a stochastic nature.
A stochastic version of the SEIR model incorporates the randomness inherent in the spread of an infectious disease. To introduce this version, we assume again a closed homogeneous and uniformly mixing community of n individuals. However, the number of susceptible, exposed, infectious, quarantined, recovered and dead individuals at time t , is now given by contact is made with an individual randomly selected from the community. Any susceptible that receives such contact immediately becomes exposed for a random period that follows an exponential distribution with mean 1 /e . The infectious periods are also independent and exponentially distributed with a mean 1/y. The quarantine Of) and death (S) rates, as well as the quarantine period, are assumed to be constant.
Despite assuming a closed homogeneous community, the stochastic SEIR model is more realistic than its deterministic version, once the number of infectious contacts, and the latency of infectious periods, are all random, varying among individuals.

D. Stochastic SEIR Model
The pseudo-code of the stochastic SEIR model is represented in Algorithm 1. Its inputs (line 1) are: initial number of infected (m ); infectious rate (P ); individuals remove rate (e) from compartment E; individuals remove rate (y) from compartment I; quarantine (f), vaccination ($) and death (S) rates. Algorithm 1 repeats (line 2) the simulation Nsim times. In each simulation, the population is represented by a vector pop of n cells. Each cell pop(i) stores the current status of the i'th individual (with i = 1,..., n): 0 for susceptible, >100 and <200 for exposed, >0 and <100 for infected, >200 for quarantined, -1 for recovered, -2 for vaccinated and -3 for dead.
In the first day (t = 0) of a simulation (line 3), an initial number i0 = m of infected individuals are randomly selected; for each of these individuals their specific cell in the population vector is set to p, (the infection period given by an exponential distribution with expected value 1 /y ); for the remaining individuals, that are susceptible, their cell value is set to 0. In the next days (t > 0) the model dynamics repeat (while loop in line 4). At each day, the vector cell of the exposed, infected and quarantined individuals decreases one unit (line 7); when it reaches 0 (again susceptible) is set to -1 (recovered) (line 8); when it reaches 100 (end of the exposed period) is set to P, 2021 16th Iberian Conference on Information Systems and Technologies (CISTI) 23 -26 June 2021, Chaves, Portugal ISBN: 978-989-54659-1-0 (infectious), where p¡ is randomly generated by an exponential distribution with parameter y (line 9); when it reaches 200 (end of the quarantine period) is set to -1 (recovered) (line 10). 1. inputs: rn, fi, e, y, x, # and S 2. for j -1 ... Nsim 3. choose rn random individuals and the correspoding infectious period 4. while 0 < pop(i) for some i 5. for i -1 . The simulation stops when there are no more exposed, infectious and quarantined persons (condition in line 4 is false).
To get an accurate probability distribution of the target variables, the number of simulations (Asim) should be relatively large. The variables targeted by this work are: the total number of infected and dead individuals, the duration of the epidemic outbreak, the maximum number of simultaneous infected individuals, the day in which this maximum happens (epidemic peak), and the total number of vaccinated individuals.
III. Com pu t a t io n a l Simulations Each execution of Algorithm 1 (execution of the stochastic SEIR model) involves a specific combination of input parameters, whose effect is sampled through Nsim simulations. To generate accurate probability distributions, Nsim should be sufficiently large. To conduct such number of simulations in a reasonable amount of time, a simple parallelization approach was applied to Algorithm 1: the main loop (line 2) is fully split into mutually exclusive subranges; each subrange translates in an independent job to be executed by a separate processor in an HPC cluster. This is possible because each simulation (each loop iteration) is a completely independent of the others, allowing for a simple SPMD (single program, multiple data) approach.
In addition, in this work we have also studied the effect of variations on certain input parameters (quarantine, vaccination and infectious contact rates). Each specific combination of input parameters implies a run of Algorithm 1. Thus, the parallelization approach described above was extended, to contemplate the need to generate different sets of cluster jobs.
The computational environment used for this work consisted on 8 homogeneous nodes of a small Linux cluster, with 32 CPUcores each (2 x AMD EPYC 7351 16-core CPUs per node), for a total of 256 cores. A single MATLAB installation, networkshared by all nodes, was also used (no distributed computing toolboxes of MATLAB were used). The implementation of Algorithm 1, and of its parallelization strategy, were both made in order to take maximum advantage of such environment.
Algorithm 1 was implemented as a MATLAB [11] program, but with the lower and upper limits of the simulations loop (line 2) as input parameters, thus restraining the loop to a subrange of simulations. The lower limit was used to seed the MATLAB random number generator, ensuring a different strain of random numbers per subrange. Each running instance of the program saves the results of its subrange of simulations in specific CSV files. These are later consolidated by a BASH script, which further explores MATLAB (through a secondary MATLAB program) to produce histograms and related graphics.
IV. Results a n d Discussion This section presents the results obtained with the stochastic SEIR model, by applying Algorithm 1 for the spread of COVID-19 in a community with a dimension similar to a small city (n = 20000 individuals), considering that a unique infectious individual (m = 1) is introduced initially in the community.

A. Parameters of the simulations
The model parameters 1/e and 1/y that were used (1/e = 3.7 and 1/y = 4.3) are the mean of the values presented in the literature (see, for instance, [5], [6], [9], [12], [13]). Concerning the infectious contact rate p, simulations assumed p = 0.5 and P = 1. Also, it was considered a constant death rate 8 = 0.0093, agreeing with the IFR rate of Portugal [10]. According to the Equation (7) and assuming d = 0 and x = 0 , the values of these parameters corresponds to R0 « 2.1when p = 0.5, and to R0 « 4.1 when p = 1 . These values aim to represent, respectively, a situation in which there is some care to prevent the transmission of the disease by the community, but without mandatory measures of social distancing, and the situation in which there are none preventive measures. The value R0 « 4.1 is close to the average value registered in Europe in the beginning of the first outbreak of COVID-19 [5].  The bimodal distribution, which is characteristic of the stochastic SIR method [2], is also observed here for the case of the SEIR method. Moreover, the simulation data reveals that a large number of simulations resulted in a small epidemic outbreak, and another portion of the simulations ended up in a major epidemic outbreak. The proportion of simulations that result in a small outbreak is approximately 0.34, corresponding to 3355 simulations in 10000 (in this case, 1521 individuals are infected); conversely the proportion of simulations that results in a major outbreak is 0.66, corresponding to 6645 simulations (in which case, 18479 individuals become infected). In short, without any preventive measures there is a 66% probability of a major epidemic outbreak lasting 296 days, during which 92% of the population becomes infected, of which 14% is infected simultaneously on the 75th day after the start of the pandemic. Considering an IFR « 4%, this corresponds to 736 deaths. Figure 3 shows the probabilities of a major epidemic outbreak and the corresponding basic reproduction number as a function of vaccination and quarantine rates. The basic reproduction number R0 was calculated using Equation (7). It is observable in Figure 3 a) that, for vaccination rates above 0.002 and quarantine rates above 0.1, the probability of a large outbreak is reduced to values below 0.05. However, this range of values corresponds to R0 > 1, indicating that, from a deterministic point of view, an epidemic outbreak will occur (Figure 3 b)). It is also verified that, from a deterministic standpoint, vaccination rates below 0.01 have little effect on the reduction of R0. According to Equation (7), for R0 < 1 it is necessary to verily 0.5^ + 0.764^ > 0.264. This inequality is verified for J = 0 if x > 0.35 and for x = 0 if J > 0.53.

B. Results and discussion
The discrepancy between the deterministic (Figure3 b)) and stochastic (Figure 3 a)) models, derives from the fact that the deterministic model considers that an infected individual infects always susceptible individuals, as long as they exist in sufficient quantity, while the stochastic model considers all possibilities, including that of not having contagion. These cases are accounted for in terms of the probability of a small outbreak. Vaccination acts directly on the compartment of the susceptible, reducing its number. As a result, there will be fewer people infected. Quarantine acts only on infectious, preventing them from infecting others. The higher the quarantine rate, the less likely it is that a susceptible person will come into contact with an infectious person and thus contracts the disease.
However, the results presented in Figure 3 a) show also that the combination of quarantine with vaccination, even with modest rates, allows to significantly reduce the total number of infected. Therefore, in a scenario in which a vaccine (no fully effective), is made progressively available to the community, the isolation of the detected infectious can decisively contribute to a reduction in the risk of occurring a major epidemic outbreak. The results also show that it is possible to prevent a major epidemic outbreak with less than critical vaccination coverage.

16th Iberian Conference on Information Systems and Technologies (CISTI)
23 -26 June 2021, Chaves, Portugal ISBN: 978-989-54659-1-0 According to Equation (8) and considering f = 0.5, this rate is near 52%, which corresponds to the vaccinating of about 10500 individuals. As can be seen in Figure 3 a), this is possible even for vaccination rates below 0.01, as long as there is an adequate quarantine rate. For example, with a vaccination rate of 0 = 0.003, which involves vaccinating approximately 3559 people, the probability of a major outbreak is reduced to around 0.02.  Figure 4 shows the same results as in Figure 3, but with f = 1. It may now be observed that the probabilities of a major outbreak are greater. With d = 0.01 and x = 0.2 , the probability of a major outbreak remains high (about 0.6). In the same way, the corresponding values of R0 remain above 2.5. In these extreme circumstances, R0 < 1 only if 0 > 0.764, with X = 0, or else for x = 1, with 0 = 0 . This shows that when a contagion rate is very high, only a large-scale containment measurement application can contain the epidemic outbreak.
V. Conclusion The developed computational method, based on the SEIR statistical model, allows to realistically simulate the spread of COVID-19 in a medium-sized community and to study the effect of the level of preventive measures such as quarantine and vaccination. The risk of an epidemic outbreak can be reversed by preventive measures such as vaccination or quarantine. Also, if some social distance measures are in place, which reduces the rate of contagion to a moderate value, it is possible to prevent the appearance of a major epidemic outbreak, by vaccinating only a small fraction of the population, well below the critical vaccination coverage.
In comparison with the stochastic model, the deterministic model predicts that a large outbreak will occur for the two cases studied. This occurs because the deterministic model considers that an infected individual infects always susceptible individuals, as long as they exist in sufficient quantity. Despite its realism, stochastic SElR methods demand considerable computational power to ensure that the empirical frequency distribution reflects all probabilities of occurrence of the various possible scenarios. This is a motivation for further improving their implementation.