Developments in statistical data analysis often parallel or follow advances in the other fields to which statistical methods are fruitfully applied. Because practitioners of statistical analysis often address particular applied decision problems, the development of methods is motivated by the search for better decision making under uncertainty.
The decision-making process under uncertainty is largely based on the application of statistical data analysis to the probabilistic risk assessment of your decision. Managers need to understand variation for two key reasons: first, so that they can lead others to apply statistical thinking in day-to-day activities, and second, so that they can apply the concept for the purpose of continuous improvement. This course will provide you with hands-on experience to promote the use of statistical thinking and techniques, and to apply them to make educated decisions whenever there is variation in business data. Therefore, it is a course in statistical thinking via a data-oriented approach.
Statistical models are currently used in various fields of business and science. However, the terminology differs from field to field. For example, the fitting of models to data is variously called calibration, history matching, and data assimilation; all of these are synonymous with parameter estimation.
Your organization's database contains a wealth of information, yet the decision technology group members tap only a fraction of it. Employees waste time scouring multiple sources for data. The decision-makers are frustrated because they cannot get business-critical data exactly when they need it. Therefore, too many decisions are based on guesswork, not facts. Many opportunities are also missed, if they are even noticed at all.
Knowledge is what we know well. Information is the communication of knowledge. In every knowledge exchange, there is a sender and a receiver. The sender makes common what is private, and does the informing, the communicating. Information can be classified into explicit and tacit forms. Explicit information can be explained in structured form, while tacit information is inconsistent and fuzzy to explain. Know that data are only crude information and not knowledge by themselves.
The sequence from data to knowledge is: from Data to Information, from Information to Facts, and finally, from Facts to Knowledge. Data become information when they become relevant to your decision problem. Information becomes fact when the data can support it. Facts are what the data reveal. However, the decisive instrumental (i.e. applied) knowledge is expressed together with some statistical degree of confidence.
Fact becomes knowledge, when it is used in the successful completion of a decision process. Once you have a massive amount of facts integrated as knowledge, then your mind will be superhuman in the same sense that mankind with writing is superhuman compared to mankind before writing. The following figure illustrates the statistical thinking process based on data in constructing statistical models for decision making under uncertainties.
The above figure depicts the fact that as the exactness of a statistical model increases, the level of improvements in decision-making increases. That's why we need statistical data analysis. Statistical data analysis arose from the need to place knowledge on a systematic evidence base. This required a study of the laws of probability, the development of measures of data properties and relationships, and so on.
Statistical inference aims at determining whether any statistical significance can be attached to results after due allowance is made for any random variation as a source of error. Intelligent and critical inferences cannot be made by those who do not understand the purpose, the conditions, and the applicability of the various techniques for judging significance.
Considering the uncertain environment, the chance that "good decisions" are made increases with the availability of "good information." The chance that "good information" is available increases with the level of structuring the process of Knowledge Management.
Knowledge is more than knowing something technical. Knowledge needs wisdom. Wisdom is the power to put our time and our knowledge to the proper use. Wisdom comes with age and experience. Wisdom is the accurate application of accurate knowledge, and its key component is knowing the limits of your knowledge. Wisdom is about knowing how something technical can be best used to meet the needs of the decision-maker. Wisdom, for example, creates statistical software that is useful, rather than technically brilliant. For example, ever since the Web entered the popular consciousness, observers have noted that it puts information at your fingertips but tends to keep wisdom out of reach.
Almost every professional needs a statistical toolkit. Statistical skills enable you to intelligently collect, analyze and interpret data relevant to your decision-making. Statistical concepts enable us to solve problems in a diversity of contexts. Statistical thinking enables you to add substance to your decisions.
We will apply the basic concepts and methods of statistics you've already learned in the previous statistics course to real-world problems. The course is tailored to meet your needs in statistical business-data analysis using widely available commercial statistical computer packages such as SAS and SPSS. By doing this, you will inevitably find yourself asking questions about the data and the method proposed, and you will have the means at your disposal to settle these questions to your own satisfaction. Accordingly, all the application problems are borrowed from business and economics. By the end of this course you'll be able to think statistically while performing any data analysis.
There are two general views of teaching/learning statistics: Greater and Lesser Statistics. Greater statistics is everything related to learning from data, from the first planning or collection, to the last presentation or report. Lesser statistics is the body of statistical methodology. This is a Greater Statistics course.
There are basically two kinds of "statistics" courses. The real kind shows you how to make sense out of data. These courses include all the recent developments, and all share a deep respect for data and truth. The imitation kind involves plugging numbers into statistics formulas. The emphasis is on doing the arithmetic correctly. These courses generally have no interest in data or truth, and the problems are generally arithmetic exercises. If a certain assumption is needed to justify a procedure, they will simply tell you to "assume the ... are normally distributed" -- no matter how unlikely that might be. It seems like you all are suffering from an overdose of the latter. This course will bring out the joy of statistics in you.
Statistics is a science assisting you to make decisions under uncertainties (based on some numerical and measurable scales). The decision-making process must be based on data, not on personal opinion or belief.
It is already an accepted fact that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." So, let us be ahead of our time.
Popular Distributions and Their Typical Applications
Binomial
Application: Gives probability of exactly x successes in n independent trials, when probability of success p on single trial is a constant. Used frequently in quality control, reliability, survey sampling, and other industrial problems.
Example: What is the probability of 7 or more "heads" in 10 tosses of a fair coin?
Comments: Can sometimes be approximated by normal or by Poisson distribution.
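The coin-toss example above can be worked out directly from the binomial formula. The following is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_upper_tail` are illustrative choices, not part of any package mentioned in these notes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_upper_tail(k_min, n, p):
    """Probability of k_min or more successes in n independent trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))

# 7 or more heads in 10 tosses of a fair coin
p_seven_or_more = binomial_upper_tail(7, 10, 0.5)
print(p_seven_or_more)  # 176/1024 = 0.171875
```

The tail probability is simply the sum of the point probabilities for 7, 8, 9 and 10 heads, so roughly one run in six of ten fair tosses shows seven or more heads.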
Multinomial
Application: Gives probability of exactly n_i outcomes of event i, for i = 1, 2, ..., k, in n independent trials when the probability p_i of event i in a single trial is a constant. Used frequently in quality control and other industrial problems.
Example: Four companies are bidding for each of three contracts, with specified success probabilities. What is the probability that a single company will receive all the orders?
Comments: Generalization of binomial distribution for more than 2 outcomes.
Hypergeometric
Application: Gives probability of picking exactly x good units in a sample of n units from a population of N units when there are k bad units in the population. Used in quality control and related applications.
Example: Given a lot with 21 good units and four defective, what is the probability that a sample of five will yield not more than one defective?
Comments: May be approximated by binomial distribution when n is small relative to N.
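The lot-sampling example above can be computed by counting combinations. This is a small sketch under the assumption that the sample is drawn without replacement; `hypergeom_pmf` is a hypothetical helper name used only for illustration.

```python
from math import comb

def hypergeom_pmf(d, n, good, bad):
    """Probability of exactly d defectives in a sample of n drawn without replacement."""
    return comb(bad, d) * comb(good, n - d) / comb(good + bad, n)

# Lot of 21 good and 4 defective units, sample of 5, at most one defective
p_at_most_one = sum(hypergeom_pmf(d, 5, 21, 4) for d in (0, 1))
print(round(p_at_most_one, 4))  # 0.8336
```

The numerator counts the samples with zero defectives (20349) plus those with exactly one (4 x 5985 = 23940), out of 53130 possible samples.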
Geometric
Application: Gives probability of requiring exactly x binomial trials before the first success is achieved. Used in quality control, reliability, and other industrial situations.
Example: Determination of probability of requiring exactly five test firings before first success is achieved.
Pascal
Application: Gives probability of exactly x failures preceding the sth success.
Example: What is the probability that the third success takes place on the 10th trial?
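The example above does not state the single-trial success probability, so the sketch below assumes p = 0.5 purely for illustration. For the third success to fall on the 10th trial, exactly two successes must occur somewhere in the first nine trials, followed by a success on the tenth. The name `pascal_pmf` is a hypothetical helper.

```python
from math import comb

def pascal_pmf(trial, s, p):
    """Probability that the s-th success occurs exactly on the given trial."""
    failures = trial - s
    # s - 1 successes among the first trial - 1 trials, then a success
    return comb(trial - 1, s - 1) * p**s * (1 - p)**failures

p_assumed = 0.5  # assumption: the source example gives no value for p
p_third_on_tenth = pascal_pmf(10, 3, p_assumed)
print(p_third_on_tenth)  # 36/1024 = 0.03515625
```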
Negative Binomial
Application: Gives probability similar to Poisson distribution when events do not occur at a constant rate and occurrence rate is a random variable that follows a gamma distribution.
Example: Distribution of number of cavities for a group of dental patients.
Comments: Generalization of Pascal distribution when s is not an integer. Many authors do not distinguish between Pascal and negative binomial distributions.
Poisson
Application: Gives probability of exactly x independent occurrences during a given period of time if events take place independently and at a constant rate. May also represent number of occurrences over constant areas or volumes. Used frequently in quality control, reliability, queuing theory, and so on.
Example: Used to represent distribution of number of defects in a piece of material, customer arrivals, insurance claims, incoming telephone calls, alpha particles emitted, and so on.
Comments: Frequently used as approximation to binomial distribution.
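To see how closely the Poisson distribution can track the binomial, the sketch below compares the two point probabilities for a large number of trials and a small success probability. The values n = 1000 and p = 0.002 are assumptions chosen for illustration, and the helper names are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.002   # assumed: many trials, rare event
lam = n * p          # matching Poisson mean

# Largest pointwise gap between the two distributions for k = 0..10
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
print(max_gap)
```

With these values the two sets of probabilities agree to about three decimal places; the agreement generally worsens as p grows for fixed n.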
Normal
Application: A basic distribution of statistics. Many applications arise from central limit theorem (average of values of n observations approaches normal distribution, irrespective of form of original distribution under quite general conditions). Consequently, appropriate model for many, but not all, physical phenomena.
Example: Distribution of physical measurements on living organisms, intelligence test scores, product dimensions, average temperatures, and so on.
Comments: Many methods of statistical analysis presume normal distribution.
A so-called Generalized Gaussian distribution has the following pdf:
A exp(-B|x|^n), where A, B, n are constants. For n = 1 and n = 2 it is the Laplacian and the Gaussian distribution, respectively. This distribution approximates the data reasonably well in some image-coding applications.
Slash distribution is the distribution of the ratio of a normal random variable to an independent uniform random variable, see Hutchinson T. Continuous Bivariate Distributions. Rumsby Sci. Publications, 1990.
Gamma
Application: A basic distribution of statistics for variables bounded at one side - for example x greater than or equal to zero. Gives distribution of time required for exactly k independent events to occur, assuming events take place at a constant rate. Used frequently in queuing theory, reliability, and other industrial applications.
Example: Distribution of time between recalibrations of instrument that needs recalibration after k uses; time between inventory restocking, time to failure for a system with standby components.
Comments: Erlangian, exponential, and chi-square distributions are special cases. The Dirichlet is a multidimensional extension of the Beta distribution.
Distribution of a product of iid uniform (0, 1) random variables? Like many problems with products, this becomes a familiar problem when turned into a problem about sums. If X is uniform (for simplicity of notation, make it U(0,1)), then Y = -log(X) is exponentially distributed, so the negative log of the product of X1, X2, ..., Xn is the sum of Y1, Y2, ..., Yn, which has a gamma (scaled chi-square) distribution. Thus, the negative log of the product has a gamma density with shape parameter n and scale 1.
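The product-to-sum argument above can be checked by simulation. A gamma distribution with shape n and scale 1 has mean n and variance n, so the negative log of a product of n uniforms should show both. This is a sketch only; the seed, the choice n = 5 and the number of trials are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def neg_log_product(n):
    """Return -log(X1 * X2 * ... * Xn) for n independent U(0,1) draws."""
    # 1 - random.random() lies in (0, 1], which avoids log(0).
    return -sum(math.log(1.0 - random.random()) for _ in range(n))

n = 5            # number of uniforms in each product (illustrative choice)
trials = 20000   # number of simulated products
samples = [neg_log_product(n) for _ in range(trials)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)
print(sample_mean, sample_var)  # both should be close to n
```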
Exponential
Application: Gives distribution of time between independent events occurring at a constant rate. Equivalently, probability distribution of life, presuming constant conditional failure (or hazard) rate. Consequently, applicable in many, but not all reliability situations.
Example: Distribution of time between arrival of particles at a counter. Also life distribution of complex nonredundant systems, and usage life of some components - in particular, when these are exposed to initial burn-in, and preventive maintenance eliminates parts before wear-out.
Comments: Special case of both Weibull and gamma distributions.
Beta
Application: A basic distribution of statistics for variables bounded at both sides - for example x between 0 and 1. Useful for both theoretical and applied problems in many areas.
Example: Distribution of proportion of population located between lowest and highest value in sample; distribution of daily per cent yield in a manufacturing process; description of elapsed times to task completion (PERT).
Comments: Uniform, right triangular, and parabolic distributions are special cases. To generate a beta random value, generate two random values g1 and g2 from gamma distributions with a common scale. The ratio g1/(g1 + g2) follows a beta distribution. The beta distribution can also be thought of as the distribution of X1 given (X1 + X2), when X1 and X2 are independent gamma random variables.
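The gamma-ratio recipe above can be sketched with the standard library's `random.gammavariate`. The shape values 2 and 5, the common scale of 1 and the seed are assumptions for illustration; a Beta(2, 5) variable has mean 2/7, which gives a simple check on the draws.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def beta_from_gammas(alpha, beta):
    """Draw one beta(alpha, beta) value as g1 / (g1 + g2)."""
    g1 = random.gammavariate(alpha, 1.0)  # shape alpha, scale 1
    g2 = random.gammavariate(beta, 1.0)   # shape beta, same scale
    return g1 / (g1 + g2)

draws = [beta_from_gammas(2.0, 5.0) for _ in range(20000)]
sample_mean = statistics.fmean(draws)
print(sample_mean)  # should be close to 2 / (2 + 5)
```

In practice `random.betavariate` does the same job in one call; the explicit ratio is shown only to mirror the construction described in the text.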
There is also a relationship between the Beta and Normal distributions. The conventional calculation is that, given a PERT Beta with highest value b, lowest value a, and most likely value m, the equivalent normal distribution has a mean and mode of (a + 4m + b)/6 and a standard deviation of (b - a)/6.
See Section 4.2 of Introduction to Probability by J. Laurie Snell (New York, Random House, 1987) for a link between beta and F distributions (with the advantage that tables are easy to find).
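The conventional PERT calculation above is plain arithmetic. The sketch below applies it to a hypothetical task with an optimistic time of 2, a most likely time of 5 and a pessimistic time of 14 (all assumed values, in arbitrary time units).

```python
def pert_normal_approx(a, m, b):
    """Mean and standard deviation of the normal approximation to a PERT beta.

    a: lowest value, m: most likely value, b: highest value.
    """
    mean = (a + 4 * m + b) / 6
    sd = (b - a) / 6
    return mean, sd

# Hypothetical task durations
mean, sd = pert_normal_approx(2.0, 5.0, 14.0)
print(mean, sd)  # 6.0 2.0
```

The most likely value is weighted four times as heavily as either extreme, and the six-standard-deviation range is taken to span the distance from a to b.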
Uniform
Application: Gives probability that observation will occur within a particular interval when probability of occurrence within that interval is directly proportional to interval length.
Example: Used to generate random values.
Comments: Special case of beta distribution.
The density of the geometric mean of n independent uniform(0,1) random variables is:
f(x) = n x^(n-1) [log(1/x^n)]^(n-1) / (n-1)!, for 0 < x < 1.
z_λ = [U^λ - (1-U)^λ]/λ, where U is uniform(0,1), is said to have Tukey's symmetrical λ-distribution.
Log-normal
Application: Permits representation of random variable whose logarithm follows normal distribution. Model for a process arising from many small multiplicative errors. Appropriate when the value of an observed variable is a random proportion of the previously observed value.
In the case where the data are lognormally distributed, the geometric mean acts as a better data descriptor than the mean. The more closely the data follow a lognormal distribution, the closer the geometric mean is to the median, since the log re-expression produces a symmetrical distribution.
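The claim about the geometric mean and the median can be illustrated with simulated lognormal data. This is a sketch under assumed parameters (log-scale mean 0, log-scale standard deviation 1, fixed seed); for these values the median is 1 while the arithmetic mean is pulled up to about 1.65 by the long right tail.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Assumed parameters: the log of each value is normal with mean 0 and sd 1
data = [random.lognormvariate(0.0, 1.0) for _ in range(20000)]

arithmetic_mean = statistics.fmean(data)
geometric_mean = statistics.geometric_mean(data)
median = statistics.median(data)

print(arithmetic_mean, geometric_mean, median)
```

The geometric mean and the median land close together, while the arithmetic mean sits well above both.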
Example: Distribution of sizes from a breakage process; distribution of income size, inheritances and bank deposits; distribution of various biological phenomena; life distribution of some transistor types.
The ratio of two independent log-normally distributed variables is itself log-normal.
Rayleigh
Application: Gives distribution of radial error when the errors in two mutually perpendicular axes are independent and normally distributed around zero with equal variances.
Example: Bomb-sighting problems; amplitude of noise envelope when a linear detector is used.
Comments: Special case of Weibull distribution.
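The radial-error description above can be simulated directly: draw independent normal errors on two perpendicular axes and take the distance from the origin. A Rayleigh variable with per-axis standard deviation sigma has mean sigma * sqrt(pi/2), which the sketch uses as a check. The value sigma = 2 and the seed are assumptions.

```python
import math
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

sigma = 2.0  # assumed common standard deviation of the two axis errors

# Radial error = distance from the origin of a point with normal x and y errors
radial_errors = [
    math.hypot(random.gauss(0.0, sigma), random.gauss(0.0, sigma))
    for _ in range(20000)
]

simulated_mean = statistics.fmean(radial_errors)
rayleigh_mean = sigma * math.sqrt(math.pi / 2)
print(simulated_mean, rayleigh_mean)
```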
Cauchy
Application: Gives distribution of ratio of two independent standardized normal variates.
Example: Distribution of ratio of standardized noise readings; distribution of tan(x) when x is uniformly distributed.
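Both constructions mentioned above can be compared by simulation. The sketch assumes that the angle x is uniform on (-pi/2, pi/2), which the example does not state explicitly. Because a standard Cauchy variable has quartiles at -1 and +1, about half of the draws from either construction should fall within one unit of zero.

```python
import math
import random

random.seed(5)  # fixed seed so the sketch is reproducible
trials = 20000

# Construction 1: ratio of two independent standard normal variates
ratio_draws = [random.gauss(0.0, 1.0) / random.gauss(0.0, 1.0) for _ in range(trials)]

# Construction 2: tangent of an angle uniform on (-pi/2, pi/2) (assumed interval)
tan_draws = [math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(trials)]

def share_within_one(values):
    """Fraction of values whose absolute value is at most 1."""
    return sum(abs(v) <= 1 for v in values) / len(values)

share_ratio = share_within_one(ratio_draws)
share_tan = share_within_one(tan_draws)
print(share_ratio, share_tan)  # both should be close to 0.5
```

Sample means are deliberately not compared here, since the Cauchy distribution has no finite mean and sample averages do not settle down.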
Chi-square
The probability density curve of a chi-square distribution is an asymmetric curve stretching over the positive side of the line and having a long right tail. The form of the curve depends on the value of the degrees of freedom.
Applications: The most widely used applications of the chi-square distribution are:
- Chi-square Test for Association is a (non-parametric, and therefore usable for nominal data) test of statistical significance widely used in bivariate tabular association analysis. Typically, the hypothesis is whether or not two different populations are different enough in some characteristic or aspect of their behavior based on two random samples. This test procedure is also known as the Pearson chi-square test.
Weibull
Application: General time-to-failure distribution due to wide diversity of hazard-rate curves, and extreme-value distribution for minimum of N values from distribution bounded at left.
The Weibull distribution is often used to model "time until failure." In this manner, it is applied in actuarial science and in engineering work.
It is also an appropriate distribution for describing data corresponding to resonance behavior, such as the variation with energy of the cross section of a nuclear reaction or the variation with velocity of the absorption of radiation in the Mossbauer effect.
Example: Life distribution for some capacitors, ball bearings, relays, and so on.
Comments: Rayleigh and exponential distributions are special cases.
Extreme value
Application: Limiting model for the distribution of the maximum or minimum of N values selected from an "exponential-type" distribution, such as the normal, gamma, or exponential.
Example: Distribution of breaking strength of some materials, capacitor breakdown voltage, gust velocities encountered by airplanes, bacteria extinction times.
Student's t
The t distributions were discovered in 1908 by William Gosset, a chemist and statistician employed by the Guinness brewing company. He considered himself a student still learning statistics, so he signed his papers with the pseudonym "Student". Or perhaps he used a pseudonym due to "trade secrets" restrictions imposed by Guinness.
Note that there are different t distributions; they form a class of distributions. When we speak of a specific t distribution, we have to specify the degrees of freedom. The t density curves are symmetric and bell-shaped like the normal distribution and have their peak at 0. However, the spread is more than that of the standard normal distribution. The larger the degrees of freedom, the closer the t-density is to the normal density.
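The way the t density approaches the normal density as the degrees of freedom grow can be seen numerically. The sketch below codes the standard t density formula with the log-gamma function (to avoid overflow for large degrees of freedom) and compares tail heights at x = 3; the evaluation point and the degrees of freedom shown are arbitrary choices.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

for df in (2, 10, 100, 1000):
    print(df, t_pdf(3.0, df))
print("normal", normal_pdf(3.0))
```

The tail height at x = 3 shrinks toward the normal value as the degrees of freedom increase, which is the "more spread" property described above.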
Why Is Everything Priced One Penny Off the Dollar?
Here's a psychological answer. Due to a very limited data processing ability, we humans rely heavily on categorization (e.g. seeing things as "black or white" requires just a binary coding scheme, as opposed to seeing the many shades of gray). Our number system has a major category of 100's (e.g. 100 pennies, 200 pennies, 300 pennies) and there is an affective response associated with these groups: more is better if you are getting them; more is bad if you are giving them. Advertising and pricing take advantage of this limited data processing with prices such as $2.99, $3.95, etc., so that $2.99 carries the affective response associated with the 200 pennies group. Indeed, if you ask people to respond to "how close together" are 271 & 283 versus "how close together" are 291 & 303, the former are seen as closer (there's a lot of methodology set up to dissuade the subjects from simply subtracting the smaller from the larger). Similarly, prejudice, job promotions, competitive sports, and a host of other activities attempt to associate large qualitative differences with what are often minor quantitative differences, e.g. a gold medal in an Olympic swimming event may be milliseconds away from no medal.
Yet another motivation: Psychologically $9.99 might look better than $10.00, but there is a more basic reason too. The assistant has to give you change from your ten dollars, and has to ring the sale up through his/her cash register to get at the one cent. This forces the transaction to go through the books, you get a receipt, and the assistant can't just pocket the $10 him/herself. Mind you, there's nothing to stop a particularly untrustworthy employee going into work with a pocketful of cents.
There's sales tax for that. For either price (at least in the US), you'll have to pay sales tax too. So that solves the problem of opening the cash register. That, plus the security cameras ;).
There has been some research in marketing theory on the consumer's behavior at particular price points. Essentially, these are tied up with buyer expectations based on prior experience. A critical case study in the UK on price pointing of pantyhose (tights) showed that there were distinct demand peaks at buyer-anticipated price points of 59p, 79p, 99p, £1.29 and so on. Demand at intermediate price points was dramatically below these anticipated points for similar quality goods. In the UK, for example, prices of wine are usually set at key price points. The wine retailers also confirm that different prices (even a penny or so different) do result in dramatically different sales volumes.
Other studies showed the opposite, where a reduced price led to reduced sales volumes, with consumers ascribing quality in line with price. However, it has not been fully tested whether sales volume continues to increase with price.
Other similar research turns on the behavior of consumers in response to variations in price. The key issue here is that there is a Just Noticeable Difference (JND) below which consumers will not act on a price increase. This has practical application when increasing charge rates and the like. The JND is typically 5%, and this provides the opportunity for consultants and others to increase prices above prior rates by less than the JND without customer complaint. As an empirical experiment, try overcharging clients by 1, 2, ..., 5, 6% and watch the reaction. Up to 5% there appears to be no negative impact.
Conversely, there is no point in offering a fee reduction of less than 5%, as clients will not recognize the concession you have made. Equally, in periods of price inflation, price rises should be staged so that each individual price rise is kept under 5%, perhaps by raising prices by 4% twice per year rather than by a one-off 8% rise.
A Short History of Probability and Statistics
The original idea of "statistics" was the collection of information about and for the "state". The word statistics derives directly not from any classical Greek or Latin roots, but from the Italian word for state.
The birth of statistics occurred in the mid-17th century. A commoner named John Graunt, a native of London, began reviewing a weekly church publication issued by the local parish clerk that listed the number of births, christenings, and deaths in each parish. These so-called Bills of Mortality also listed the causes of death. Graunt, who was a shopkeeper, organized these data in the forms we now call descriptive statistics, which were published as Natural and Political Observations Made upon the Bills of Mortality. Shortly thereafter, he was elected a member of the Royal Society. Thus, statistics has had to borrow some concepts from sociology, such as the concept of "population". It has been argued that since statistics usually involves the study of human behavior, it cannot claim the precision of the physical sciences.
Probability has a much longer history. Probability is derived from the verb to probe, meaning to "find out" what is not too easily accessible or understandable. The word "proof" has the same origin; a proof provides the necessary details to understand what is claimed to be true.
Probability originated from the study of games of chance and gambling during the sixteenth century. Probability theory was a branch of mathematics studied by Blaise Pascal and Pierre de Fermat in the seventeenth century. Currently, in the 21st century, probabilistic modeling is used to control the flow of traffic through a highway system, a telephone interchange, or a computer processor; to find the genetic makeup of individuals or populations; and in quality control, insurance, investment, and other sectors of business and industry.
New and ever-growing diverse fields of human activity are using statistics; however, it seems that this field itself remains obscure to the public. Professor Bradley Efron expressed this fact nicely: During the 20th century statistical thinking and methodology have become the scientific framework for literally dozens of fields including education, agriculture, economics, biology, and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology, and physics. In other words, we have grown from a small obscure field into a big obscure field.
Daston L. Classical Probability in the Enlightenment. Princeton University Press, 1988.
The book points out that early Enlightenment thinkers could not face uncertainty. A mechanistic, deterministic machine, was the Enlightenment view of the world.
Gillies D. Philosophical Theories of Probability. Routledge, 2000. Covers the classical, logical, subjective, frequency, and propensity views.
Hacking I. The Emergence of Probability. Cambridge University Press, London, 1975. A philosophical study of early ideas about probability, induction and statistical inference.
Peters W. Counting for Something: Statistical Principles and Personalities. Springer, New York, 1987. It teaches the principles of applied economic and social statistics in a historical context. Featured topics include public opinion polls, industrial quality control, factor analysis, Bayesian methods, program evaluation, non-parametric and robust methods, and exploratory data analysis.
Porter T. The Rise of Statistical Thinking, 1820-1900. Princeton University Press, 1986. The author states that statistics has become known in the twentieth century as the mathematical tool for analyzing experimental and observational data. Enshrined by public policy as the only reliable basis for judgments as to the efficacy of medical procedures or the safety of chemicals, and adopted by business for such uses as industrial quality control, it is evidently among the products of science whose influence on public and private life has been most pervasive. Statistical analysis has also come to be seen in many scientific disciplines as indispensable for drawing reliable conclusions from empirical results. This new field of mathematics found so extensive a domain of applications.
Stigler S. The History of Statistics: The Measurement of Uncertainty Before 1900. U. of Chicago Press, 1990. It covers the people, ideas, and events underlying the birth and development of early statistics.
Tankard J. The Statistical Pioneers. Schenkman Books, New York, 1984.
This work provides the detailed lives and times of theorists whose work continues to shape much of modern statistics.
Different Schools of Thought in Statistics
There are a few different schools of thought in statistics. They were introduced sequentially in time, by necessity.
The Birth Process of a New School of Thought
The process of devising a new school of thought in any field has always taken a natural path. Birth of new schools of thought in statistics is not an exception. The birth process is outlined below:
- Given an already established school, one must work within the defined framework.
- A crisis appears, i.e. some inconsistencies in the framework result from its own laws.
- Reluctance to consider the crisis.
- Attempts to accommodate and explain the crisis within the existing framework.
- Conversion of some well-known scientists attracts followers to the new school.
The perception of a crisis in the statistical community calls forth demands for "foundation-strengthening". After the crisis is over, things may look different, and historians of statistics may cast the event as one in a series of steps in "building upon a foundation". So we can read histories of statistics as the story of a pyramid built up layer by layer on a firm base over time.
Other schools of thought are emerging to extend and "soften" the existing theory of probability and statistics. Some "softening" approaches utilize the concepts and techniques developed in the fuzzy set theory, the theory of possibility, and Dempster-Shafer theory.
The following figure illustrates the three major schools of thought; namely, the Classical (attributed to Laplace), Relative Frequency (attributed to Fisher), and Bayesian (attributed to Savage). The arrows in this figure represent some of the main criticisms among the Objective, Frequentist, and Subjective schools of thought. To which school do you belong? Read the conclusion in this figure.
What Type of Statistician Are You?
Further Readings:
Plato, Jan von, Creating Modern Probability. Cambridge University Press, 1994. This book provides a historical point of view on the subjectivist and objectivist schools of thought in probability.
Press S. and J. Tanur, The Subjectivity of Scientists and the Bayesian Approach. Wiley, 2001. Comparing and contrasting the reality of subjectivity in the work of history's great scientists and the modern Bayesian approach to statistical analysis.
Weatherson B. Begging the question and Bayesians, Studies in History and Philosophy of Science. 30(4), 687-697, 1999.
Bayesian, Frequentist, and Classical Methods
The problem with the Classical Approach is that what constitutes an outcome is not objectively determined. One person's simple event is another person's compound event. One researcher may ask, of a newly discovered planet, "what is the probability that life exists on the new planet?" while another may ask "what is the probability that carbon-based life exists on it?"
Bruno de Finetti, in the introduction to his two-volume treatise on Bayesian ideas, clearly states that "Probabilities Do not Exist". By this he means that probabilities are not located in coins or dice; they are not characteristics of things like mass, density, etc.
Some Bayesian approaches consider probability theory as an extension of deductive logic (including dialogue logic, interrogative logic, informal logic, and artificial intelligence) to handle uncertainty. It purports to deduce from first principles the uniquely correct way of representing your beliefs about the state of things, and updating them in the light of the evidence. The laws of probability have the same status as the laws of logic. These Bayesian approaches are explicitly "subjective" in the sense that they deal with the plausibility which a rational agent ought to attach to the propositions he/she considers, "given his/her current state of knowledge and experience." By contrast, at least some non-Bayesian approaches consider probabilities as "objective" attributes of things (or situations) which are really out there (availability of data).
A Bayesian and a classical statistician analyzing the same data will generally reach the same conclusion. However, the Bayesian is better able to quantify the true uncertainty in his analysis, particularly when substantial prior information is available. Bayesians are willing to assign probability distribution function(s) to the population's parameter(s) while frequentists are not.
From a scientist's perspective, there are good grounds to reject Bayesian reasoning. The problem is that Bayesian reasoning deals not with objective, but with subjective probabilities. The result is that any reasoning using a Bayesian approach cannot be publicly checked -- something that makes it, in effect, worthless to science, like non-replicable experiments.
Bayesian perspectives often shed a helpful light on classical procedures. It is necessary to go into a Bayesian framework to give confidence intervals the probabilistic interpretation which practitioners often want to place on them. This insight is helpful in drawing attention to the point that another prior distribution would lead to a different interval.
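The point that a different prior leads to a different interval can be sketched for a simple proportion. The example below is an illustration only, not a method taken from these notes: it assumes 7 successes in 10 trials, a conjugate Beta prior (so the posterior is again a Beta distribution), and two hypothetical priors, one flat and one concentrated near 0.5. The interval is approximated by Monte Carlo draws with `random.betavariate`.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

def credible_interval(prior_a, prior_b, successes, trials, level=0.95, draws=20000):
    """Approximate equal-tailed posterior interval for a proportion.

    Assumes a Beta(prior_a, prior_b) prior and binomial data, so the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    post_a = prior_a + successes
    post_b = prior_b + trials - successes
    samples = sorted(random.betavariate(post_a, post_b) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return lower, upper

# Hypothetical data: 7 successes in 10 trials
flat = credible_interval(1, 1, 7, 10)         # flat prior
skeptical = credible_interval(20, 20, 7, 10)  # prior concentrated near 0.5

print(flat)
print(skeptical)
```

With the same data, the concentrated prior pulls the interval toward 0.5 and narrows it, while the flat prior leaves an interval driven almost entirely by the ten observations.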
A Bayesian may cheat by basing the prior distribution on the data; a Frequentist can base the hypothesis to be tested on the data. For example, the role of a protocol in clinical trials is to prevent this from happening by requiring the hypothesis to be specified before the data are collected. In the same way, a Bayesian could be obliged to specify the prior in a public protocol before beginning a study. In a collective scientific study, this would be somewhat more complex than for Frequentist hypotheses because priors must be personal for coherence to hold.
A suitable quantity that has been proposed to measure inferential uncertainty, i.e., to handle the a priori unexpected, is the likelihood function itself.
If you perform a series of identical random experiments (e.g., coin tosses), the underlying probability distribution that maximizes the probability of the outcome you observed is the one whose probabilities equal the observed relative frequencies of the results.
The likelihood function has the direct interpretation of telling how (relatively) well each possible explanation (model), whether obtained from the data or not, predicts the observed data. If the data happen to be extreme ("atypical") in some way, so that the likelihood points to a poor set of models, this will soon be picked up in the next rounds of scientific investigation by the scientific community. Neither a long-run frequency guarantee nor personal opinions are required.
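As a rough illustration of the likelihood idea for coin tosses, the sketch below compares a few candidate values of the probability of heads by their likelihood relative to the best-fitting value. The data (14 heads in 20 tosses), the candidate values, and the function names are hypothetical and chosen only for this example.

```python
import math

def binomial_log_likelihood(p, heads, tosses):
    """Log-likelihood of heads-probability p, up to a constant in p.

    Requires 0 < p < 1.
    """
    tails = tosses - heads
    return heads * math.log(p) + tails * math.log(1 - p)

def relative_likelihoods(candidates, heads, tosses):
    """Likelihood of each candidate divided by the maximum likelihood.

    The maximum is attained at the observed relative frequency
    heads / tosses (assumed here to be strictly between 0 and 1).
    """
    best_p = heads / tosses
    best = binomial_log_likelihood(best_p, heads, tosses)
    return {p: math.exp(binomial_log_likelihood(p, heads, tosses) - best)
            for p in candidates}

# Hypothetical data: 14 heads in 20 tosses.
for p, rel in relative_likelihoods([0.5, 0.7, 0.9], 14, 20).items():
    print(f"p = {p}: relative likelihood = {rel:.3f}")
```

The candidate equal to the observed frequency (0.7) has relative likelihood 1, and the other candidates are ranked by how well they predict the observed tosses. No prior distribution and no long-run guarantee enters the calculation.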
There is a sense in which the Bayesian approach is oriented toward making decisions and the frequentist hypothesis testing approach is oriented toward science. For example, there may not be enough evidence to show scientifically that agent X is harmful to human beings, but one may be justified in deciding to avoid it in one's diet.
In almost all cases, a point estimate is a continuous random variable. Therefore, the probability that the unknown probability being estimated equals any specific point estimate is really zero. This means that in a vacuum of information, we can make no guess about its value. Even if we have information, we can really only guess at a range for it.
Therefore, in estimating a parameter of a given population, it is necessary that a point estimate be accompanied by some measure of the possible error of the estimate. The widely accepted approach is that a point estimate must be accompanied by some interval about the estimate, with some measure of assurance that this interval contains the true value of the population parameter. For example, the reliability assurance processes in manufacturing industries are based on data-driven information for making product-design decisions.
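A minimal sketch of reporting a point estimate together with an interval is given below. It assumes a large-sample normal approximation for the sample mean (for small samples a t-based interval would usually be preferred); the measurements and the function name mean_with_interval are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_with_interval(sample, level=0.95):
    """Return the sample mean and an approximate confidence interval.

    Uses the normal approximation: estimate +/- z * standard error.
    """
    n = len(sample)
    estimate = mean(sample)
    std_error = stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * std_error
    return estimate, (estimate - margin, estimate + margin)

# Hypothetical measurements of a product dimension.
measurements = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.2]
estimate, (low, high) = mean_with_interval(measurements)
print(f"Point estimate: {estimate:.3f}")
print(f"Approximate 95% interval: ({low:.3f}, {high:.3f})")
```

Raising the level (say, to 0.99) widens the interval: more assurance is bought at the price of a less precise statement.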
Objective Bayesian: There is a clear connection between probability and logic: both appear to tell us how we should reason. But how, exactly, are the two concepts related? Objective Bayesians offer one answer to this question. According to objective Bayesians, probability generalizes deductive logic: deductive logic tells us which conclusions are certain, given a set of premises, while probability tells us the extent to which one should believe a conclusion, given the premises, with certain conclusions being awarded full degree of belief. According to objective Bayesians, the premises objectively (i.e., uniquely) determine the degree to which one should believe a conclusion.
Further Readings:
Bernardo J. and A. Smith, Bayesian Theory, Wiley, 2000.
Congdon P. Bayesian Statistical Modelling, Wiley, 2001.
Corfield D. and J. Williamson, Foundations of Bayesianism. Kluwer Academic Publishers, 2001. Contains Logic, Mathematics, Decision Theory, and Criticisms of Bayesianism.
Lad F. Operational Subjective Statistical Methods. Wiley, 1996. Presents a systematic treatment of subjectivist methods along with a good discussion of the historical and philosophical backgrounds of the major approaches to probability and statistics.
Press S. Subjective and Objective Bayesian Statistics: Principles, Models, and Applications. Wiley, 2002.
Zimmerman H. Fuzzy Set Theory. Kluwer Academic Publishers, 1991. Fuzzy logic approaches to probability (based on L.A. Zadeh and his followers) present a difference between "possibility theory" and probability theory.
Rumor, Belief, Opinion, and Fact
Statistics is the science of decision making under uncertainty, which must be based on facts, not on rumors, personal opinions, or beliefs.
Out of necessity, human rational strategic thinking has evolved to cope with the environment. This rational strategic thinking, which we call reasoning, is another means of making the world calculable, predictable, and more manageable for utilitarian purposes. In constructing a model of reality, factual information is therefore needed to initiate any rational strategic thinking in the form of reasoning. However, we should not confuse facts with beliefs, opinions, or rumors. The following table helps to clarify the distinctions:
Rumor, Belief, Opinion, and Fact