Introduction & Summary
Computer system users, administrators, and designers usually have a goal of highest performance at lowest cost. Modeling and simulation of system design trade-offs is good preparation for design and engineering decisions in real-world jobs.
In this Web site we study computer systems modeling and simulation. We need a proper knowledge of both the techniques of simulation modeling and the simulated systems themselves.
The ATM switch design scenario described in this section is but one situation where computer simulation can be effectively used. In addition to its use as a tool to better understand and optimize performance and/or reliability of systems, simulation is also extensively used to verify the correctness of designs. Most if not all digital integrated circuits manufactured today are first extensively simulated before they are manufactured to identify and correct design errors. Simulation early in the design cycle is important because the cost to repair mistakes increases dramatically the later in the product life cycle the error is detected. Another important application of simulation is in developing "virtual environments", e.g., for training. Analogous to the holodeck in the popular science-fiction television program Star Trek, simulations generate dynamic environments with which users can interact "as if they were really there." Such simulations are used extensively today to train military personnel for battlefield situations, at a fraction of the cost of running exercises involving real tanks, aircraft, etc.
Dynamic modeling in organizations is the collective ability to understand the implications of change over time. This skill lies at the heart of a successful strategic decision process. The availability of effective visual modeling and simulation enables the analyst and the decision-maker to improve their dynamic decision-making by rehearsing strategy to avoid hidden pitfalls.
System Simulation is the mimicking, in a computer, of the operation of a real system, such as the day-to-day operation of a bank, the value of a stock portfolio over a time period, the running of an assembly line in a factory, or the staff assignment of a hospital or a security company. Instead of relying on experts to build extensive mathematical models, readily available simulation software has made it possible for non-experts, who are managers but not programmers, to model and analyze the operation of a real system.
A simulation is the execution of a model, represented by a computer program, that gives information about the system being investigated. The simulation approach to analyzing a model is opposed to the analytical approach, where the method of analyzing the system is purely theoretical. While the analytical approach is more reliable, the simulation approach gives more flexibility and convenience. The activities of the model consist of events, which are activated at certain points in time and in this way affect the overall state of the system. The points in time at which an event is activated are randomized, so no input from outside the system is required. Events exist autonomously and they are discrete, so between the execution of two events nothing happens. SIMSCRIPT provides a process-based approach to writing a simulation program. With this approach, the components of the program consist of entities, which combine several related events into one process.
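The event mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SIMSCRIPT and not a process-based model: it uses plain event scheduling for a hypothetical single-server queue, with exponentially distributed interarrival and service times. The function name and parameters are invented for the example.

```python
import heapq
import random

def simulate_single_server(arrival_rate, service_rate, horizon, seed=0):
    """Event-scheduling simulation of a single-server queue (illustrative only)."""
    rng = random.Random(seed)
    # Future event list: (time, kind) tuples, ordered by event time.
    events = [(rng.expovariate(arrival_rate), "arrival")]
    clock = 0.0
    in_system = 0   # system state: customers waiting or in service
    served = 0
    area = 0.0      # time integral of in_system, used for the time average
    while events:
        time, kind = heapq.heappop(events)
        if time > horizon:
            break
        # Nothing happens between events, so the state is constant on [clock, time).
        area += in_system * (time - clock)
        clock = time
        if kind == "arrival":
            in_system += 1
            # Each arrival schedules the next one at a randomized time.
            heapq.heappush(events, (clock + rng.expovariate(arrival_rate), "arrival"))
            if in_system == 1:  # server was idle, so service starts at once
                heapq.heappush(events, (clock + rng.expovariate(service_rate), "departure"))
        else:  # departure
            in_system -= 1
            served += 1
            if in_system > 0:   # the next waiting customer enters service
                heapq.heappush(events, (clock + rng.expovariate(service_rate), "departure"))
    area += in_system * (horizon - clock)
    return {"served": served, "mean_in_system": area / horizon}

result = simulate_single_server(arrival_rate=0.5, service_rate=1.0, horizon=10_000.0, seed=1)
print(result)
```

The state changes only inside the loop body, when an event is popped from the list; the simulated clock jumps directly from one event time to the next.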
In the field of simulation, the concept of "principle of computational equivalence" has beneficial implications for the decision-maker. Simulated experimentation accelerates and replaces effectively the "wait and see" anxieties in discovering new insight and explanations of future behavior of the real system.
Consider the following scenario. You are the designer of a new switch for asynchronous transfer mode (ATM) networks, a new switching technology that has appeared on the marketplace in recent years. In order to help ensure the success of your product in this highly competitive field, it is important that you design the switch to yield the highest possible performance while maintaining a reasonable manufacturing cost. How much memory should be built into the switch? Should the memory be associated with incoming communication links to buffer messages as they arrive, or should it be associated with outgoing links to hold messages competing to use the same link? Moreover, what is the best organization of hardware components within the switch? These are but a few of the questions that you must answer in coming up with a design.
With the integration of artificial intelligence, agents, and other modeling techniques, simulation has become an effective and appropriate decision support tool for managers. By combining the emerging science of complexity with newly popularized simulation technology, the PricewaterhouseCoopers Emergent Solutions Group builds software that allows senior management to safely play out "what if" scenarios in artificial worlds. For example, in a consumer retail environment it can be used to find out how the roles of consumers and employees can be simulated to achieve peak performance.
Statistics for Correlated Data
We concern ourselves with n realizations that are related to time, that is, n correlated observations. The estimate of the mean is given by
$\bar{X} = \sum_{i=1}^{n} X_i / n$.
Let
$A = \sum_{j=1}^{m} [1 - j/(m + 1)] \, r_{j,x}$,
then the estimated variance of the mean is:
$[1 + 2A] \, S^2 / n$
where
$S^2$ = the usual variance estimate,
$r_{j,x}$ = the jth coefficient of autocorrelation,
m = the maximum time lag for which autocorrelations are computed, such that j = 1, 2, 3, ..., m.
As a good rule of thumb, the maximum lag for which autocorrelations are computed should be approximately 2% of the number of realizations n, although each $r_{j,x}$ could be tested to determine if it is significantly different from zero.
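A minimal Python sketch of the variance estimate for correlated observations follows. The text does not say how the autocorrelation coefficients are estimated, so the usual sample autocorrelation (lagged cross-products divided by the sum of squared deviations) is assumed here. The function names are hypothetical.

```python
def autocorrelation(x, lag):
    """Lag-`lag` sample autocorrelation (a common estimator, assumed here)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return num / denom

def variance_of_mean(x, m):
    """Estimate the variance of the mean as [1 + 2A] S^2 / n,
    with A = sum over j = 1..m of [1 - j/(m + 1)] r_j."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)  # usual variance estimate
    a = sum((1 - j / (m + 1)) * autocorrelation(x, j) for j in range(1, m + 1))
    return (1 + 2 * a) * s2 / n

# Hypothetical, strongly trending series: positive autocorrelation inflates the variance.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(variance_of_mean(series, m=0))  # ignores correlation: S^2 / n
print(variance_of_mean(series, m=1))  # includes the lag-1 correction
```

With m = 0 the correction term vanishes and the estimate reduces to the familiar S^2 / n for independent observations.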
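Sample Size Determination: We can calculate the minimum sample size required by
$n = [1 + 2A] \, S^2 \, t^2 / (\delta^2 \, \bar{X}^2)$
where t is the critical value for the chosen significance level $\alpha$ and $\delta$ is the allowed relative error of the estimated mean.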
Application: A pilot run of a model produced 150 observations; the mean was 205.74 minutes and the variance was $S^2 = 101{,}921.54$. The lag coefficients were estimated as $r_{1,x} = 0.3301$, $r_{2,x} = 0.2993$, and $r_{3,x} = 0.1987$. Calculate the minimum sample size to ensure the estimate lies within $\pm\delta = 10\%$ of the true mean with $\alpha = 0.05$.
$n = \frac{(1.96)^2 \, (101{,}921.54) \, \{1 + 2[(1 - 1/4)(0.3301) + (1 - 2/4)(0.2993) + (1 - 3/4)(0.1987)]\}}{(0.1)^2 \, (205.74)^2} \approx 1752$
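The arithmetic of this application can be checked with a short Python sketch. The function name is hypothetical; the inputs are the pilot-run figures quoted above, and t = 1.96 is the normal critical value for a 5% significance level, as used in the worked formula.

```python
import math

def min_sample_size(s2, mean, lag_coeffs, delta, t):
    """Minimum n = [1 + 2A] S^2 t^2 / (delta^2 mean^2) for correlated data."""
    m = len(lag_coeffs)
    # A = sum over j = 1..m of [1 - j/(m + 1)] * r_j
    a = sum((1 - j / (m + 1)) * r for j, r in enumerate(lag_coeffs, start=1))
    return (1 + 2 * a) * s2 * t ** 2 / (delta ** 2 * mean ** 2)

n_required = min_sample_size(
    s2=101_921.54,
    mean=205.74,
    lag_coeffs=[0.3301, 0.2993, 0.1987],
    delta=0.10,
    t=1.96,
)
print(math.ceil(n_required))  # round up to a whole number of observations
```

Setting `lag_coeffs=[]` gives the sample size that would be required if the observations were independent, which shows how much the autocorrelation adds.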
What Is Central Limit Theorem?
For practical purposes, the main idea of the central limit theorem (CLT) is that the average of a sample of observations drawn from some population with a distribution of any shape is approximately distributed as a normal distribution if certain conditions are met. In theoretical statistics there are several versions of the central limit theorem depending on how these conditions are specified. These are concerned with the types of assumptions made about the distribution of the parent population (population from which the sample is drawn) and the actual sampling procedure.
One of the simplest versions of the theorem says that if $X_1, X_2, \ldots, X_n$ is a random sample of size n (say, n larger than 30) from an infinite population with finite standard deviation $\sigma$, then the standardized sample mean converges to a standard normal distribution or, equivalently, the sample mean approaches a normal distribution with mean equal to the population mean and standard deviation equal to the standard deviation of the population divided by the square root of the sample size n. In applications of the central limit theorem to practical problems in statistical inference, however, statisticians are more interested in how closely the approximate distribution of the sample mean follows a normal distribution for finite sample sizes than in the limiting distribution itself. Sufficiently close agreement with a normal distribution allows statisticians to use normal theory for making inferences about population parameters (such as the mean $\mu$) using the sample mean, irrespective of the actual form of the parent population.
It is well known that whatever the parent population is, the standardized variable $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ will have a distribution with mean 0 and standard deviation 1 under random sampling. Moreover, if the parent population is normal, then Z is distributed exactly as a standard normal variable for any positive integer n. The central limit theorem states the remarkable result that, even when the parent population is non-normal, the standardized variable is approximately normal if the sample size is large enough (say, n > 30). It is generally not possible to state precise conditions under which the approximation given by the central limit theorem works and what sample sizes are needed before the approximation becomes good enough. As a general guideline, statisticians have used the prescription that if the parent distribution is symmetric and relatively short-tailed, then the sample mean reaches approximate normality for smaller samples than if the parent population is skewed or long-tailed.
In this lesson, we will study the behavior of the mean of samples of different sizes drawn from a variety of parent populations. Examining sampling distributions of sample means computed from samples of different sizes drawn from a variety of distributions allows us to gain some insight into the behavior of the sample mean under those specific conditions, as well as to examine the validity of the guidelines mentioned above for using the central limit theorem in practice.
Under certain conditions, in large samples, the sampling distribution of the sample mean can be approximated by a normal distribution. The sample size needed for the approximation to be adequate depends strongly on the shape of the parent distribution. Symmetry (or lack thereof) is particularly important. For a symmetric parent distribution, even if very different from the shape of a normal distribution, an adequate approximation can be obtained with small samples (e.g., 10 or 12 for the uniform distribution). For symmetric short-tailed parent distributions, the sample mean reaches approximate normality for smaller samples than if the parent population is skewed and long-tailed. In some extreme cases (e.g., binomial), sample sizes far exceeding the typical guidelines (e.g., 30) are needed for an adequate approximation. For some distributions without first and second moments (e.g., Cauchy), the central limit theorem does not hold.
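The effect of the parent distribution's shape can be explored with a small simulation. The Python sketch below is only an illustration: it draws repeated samples from a symmetric parent (uniform) and a skewed parent (exponential), and reports the skewness of the resulting sample means as a rough indicator of departure from normality. The helper names, seed, and replication count are arbitrary choices made for this example.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed standardized deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(draw, n, replications, rng):
    """Means of `replications` independent samples of size n from `draw`."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(replications)]

rng = random.Random(42)
parents = {
    "uniform": lambda r: r.random(),              # symmetric, short-tailed
    "exponential": lambda r: r.expovariate(1.0),  # skewed, long right tail
}
for name, draw in parents.items():
    for n in (2, 10, 30):
        means = sample_means(draw, n, 5000, rng)
        print(f"{name:12s} n={n:2d} skewness of sample mean = {skewness(means):6.3f}")
```

For the uniform parent the skewness should stay near zero at every sample size, while for the exponential parent it should shrink only gradually as n grows, in line with the guideline above.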
What Is a Least Squares Model?
Many problems in analyzing data involve describing how variables are related. The simplest of all models describing the relationship between two variables is a linear, or straight-line, model. The simplest method of fitting a linear model is to "eye-ball" a line through the data on a plot. A more elegant, and conventional, method is that of "least squares", which finds the line minimizing the sum of squared vertical distances between the observed points and the fitted line.
Realize that fitting the "best" line by eye is difficult, especially when there is a lot of residual variability in the data.
Know that there is a simple connection between the numerical coefficients in the regression equation and the slope and intercept of the regression line.
Know that a single summary statistic like a correlation coefficient does not tell the whole story. A scatter plot is an essential complement to examining the relationship between the two variables.
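As a sketch of how the least squares line and the correlation coefficient are computed for two variables, the Python below uses the standard closed-form expressions. The data values and function names are hypothetical, chosen only for illustration.

```python
import math

def least_squares_line(xs, ys):
    """Slope and intercept minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept

def correlation(xs, ys):
    """Pearson correlation coefficient of the two variables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} x, r = {correlation(xs, ys):.4f}")
```

The slope is the coefficient of x in the regression equation and the intercept is the constant term. The correlation coefficient summarizes the fit in a single number, which is why a scatter plot remains a necessary complement.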
ANOVA: Analysis of Variance
The tests we have learned up to this point allow us to test hypotheses that examine the difference between only two means. Analysis of Variance, or ANOVA, allows us to test the difference between two or more means. ANOVA does this by examining the ratio of the variability between conditions to the variability within each condition. For example, say we give a drug that we believe will improve memory to one group of people and give a placebo to another group. We might measure memory performance by the number of words recalled from a list we ask everyone to memorize. A t-test would assess how likely the observed difference in the mean number of words recalled by the two groups is to arise by chance. An ANOVA test, on the other hand, would compare the variability that we observe between the two conditions to the variability observed within each condition. Recall that we measure variability as the sum of the squared differences of each score from the mean. When we actually calculate an ANOVA we will use a short-cut formula.
Thus, when the variability that we predict (between the two groups) is much greater than the variability we don't predict (within each group) then we will conclude that our treatments produce different results.
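The ratio of between-group to within-group variability can be sketched in Python as follows. This uses the definitional sums of squares rather than the short-cut formula, and the recall scores for the drug and placebo groups are made-up numbers for illustration only.

```python
def one_way_anova_f(groups):
    """F ratio: mean square between groups divided by mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variability we predict: group means around the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    # Variability we do not predict: scores around their own group mean.
    ss_within = sum((v - gm) ** 2 for g, gm in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical number of words recalled by each participant.
drug = [12, 14, 11, 15, 13]
placebo = [9, 10, 8, 11, 12]
f_ratio = one_way_anova_f([drug, placebo])
print(f"F = {f_ratio:.2f}")
```

A large F ratio indicates that the group means differ by more than the spread within the groups would suggest. With exactly two groups, F equals the square of the pooled two-sample t statistic.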
Exponential Density Function
An important class of decision problems under uncertainty concerns the time between events. An example is the chance that the time to the next breakdown of a machine exceeds a certain length, such as the copying machine in your office not breaking down during this week.
The exponential distribution gives the distribution of the time between independent events occurring at a constant rate. Its density function is:
$f(t) = \lambda e^{-\lambda t}$, for $t \ge 0$,
where $\lambda$ is the average number of events per unit of time, which is a positive number.
The mean and the variance of the random variable t (time between events) are $1/\lambda$ and $1/\lambda^2$, respectively.
Applications include probabilistic assessment of the time between arrival of patients to the emergency room of a hospital, and arrival of ships to a particular port.
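As a small Python illustration, the sketch below evaluates the exponential density and the probability that no event occurs before time t, which is exp(-λt), and then checks the stated mean and variance by simulation. The breakdown rate of 0.5 per week is a hypothetical value, and the function names are invented for the example.

```python
import math
import random
import statistics

def exp_density(t, lam):
    """Exponential density f(t) = lam * exp(-lam * t) for t >= 0."""
    return lam * math.exp(-lam * t)

def prob_no_event_before(t, lam):
    """P(T > t) = exp(-lam * t): the chance that the next event occurs after t."""
    return math.exp(-lam * t)

lam = 0.5  # hypothetical: on average 0.5 copier breakdowns per week
print(f"P(no breakdown this week) = {prob_no_event_before(1.0, lam):.4f}")

# Simulated times between events should have mean 1/lam and variance 1/lam^2.
rng = random.Random(0)
times = [rng.expovariate(lam) for _ in range(100_000)]
print(f"sample mean     = {statistics.fmean(times):.3f} (theory {1 / lam:.3f})")
print(f"sample variance = {statistics.variance(times):.3f} (theory {1 / lam ** 2:.3f})")
```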
Comments: Special case of both Weibull and gamma distributions.
You may like using Exponential Applet to perform your computations.
You may like using the following Lilliefors Test for Exponentiality to perform the goodness-of-fit test.
An important class of decision problems under uncertainty is characterized by the small chance of the occurrence of a particular event, such as an accident. The Poisson distribution gives the probability of exactly x independent occurrences during a given period of time if events take place independently and at a constant rate. It may also represent the number of occurrences over constant areas or volumes. The following statements describe the Poisson process.
- The occurrences of the events are independent. The occurrence of an event in an interval of space or time has no effect on the probability of a second occurrence of the event in the same, or any other, interval.
- Theoretically, an infinite number of occurrences of the event must be possible in the interval.
- The probability of the single occurrence of the event in a given interval is proportional to the length of the interval.
- In any infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible.
Poisson processes are often used, for example, in quality control, reliability, insurance claims, incoming telephone calls, and queuing theory.
An Application: One of the most useful applications of the Poisson process is in the field of queuing theory. In many situations where queues occur, it has been shown that the number of people joining the queue in a given time period follows the Poisson model. For example, if the rate of arrivals to an emergency room is $\lambda$ per unit of time (say, 1 hr), then:
$P(n \text{ arrivals}) = \lambda^n e^{-\lambda} / n!$
The mean and variance of the random variable n are both $\lambda$. However, a random variable whose mean and variance are numerically equal does not necessarily have a Poisson distribution.
$P(0 \text{ arrivals}) = e^{-\lambda}$, $P(1 \text{ arrival}) = \lambda e^{-\lambda} / 1!$, $P(2 \text{ arrivals}) = \lambda^2 e^{-\lambda} / 2!$,
and so on. In general:
$P(n+1 \text{ arrivals}) = \lambda \, P(n \text{ arrivals}) / (n + 1)$.
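Dividing consecutive terms of the Poisson formula gives P(n+1) = λ P(n) / (n + 1), which allows the probabilities to be built up one at a time without computing factorials. A minimal Python sketch follows; the arrival rate of 3 per hour is a hypothetical value and the function name is invented for the example.

```python
import math

def poisson_probabilities(lam, n_max):
    """Return [P(0), P(1), ..., P(n_max)] using P(n+1) = lam * P(n) / (n + 1)."""
    probs = [math.exp(-lam)]  # P(0 arrivals) = exp(-lam)
    for n in range(n_max):
        probs.append(probs[-1] * lam / (n + 1))
    return probs

lam = 3.0  # hypothetical: 3 arrivals per hour on average
probs = poisson_probabilities(lam, 10)
for n, p in enumerate(probs):
    print(f"P({n} arrivals) = {p:.4f}")
```

Each value agrees with the direct formula λ^n e^(-λ) / n!, and the probabilities add up to nearly 1 once n_max is well above λ.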
You may like using Poisson Applet to perform your computations.
Goodness-of-Fit for Poisson