Statistics is the science of collecting, analyzing, presenting, and interpreting data. Governmental needs for census data as well as information about a variety of economic activities provided much of the early impetus for the field of statistics. Currently the need to turn the large amounts of data available in many applied fields into useful information has stimulated both theoretical and practical developments in statistics.

Data are the facts and figures that are collected, analyzed, and summarized for presentation and interpretation. Data may be classified as either quantitative or qualitative. Quantitative data measure either how much or how many of something, and qualitative data provide labels, or names, for categories of like items.

For example, suppose that a particular study is interested in characteristics such as age, gender, marital status, and annual income for a sample of 100 individuals. These characteristics would be called the variables of the study, and data values for each of the variables would be associated with each individual. Thus, the data values of 28, male, single, and $30,000 would be recorded for a 28-year-old single male with an annual income of $30,000. With 100 individuals and 4 variables, the data set would have 100 × 4 = 400 items. In this example, age and annual income are quantitative variables; the corresponding data values indicate how many years and how much money for each individual. Gender and marital status are qualitative variables. The labels male and female provide the qualitative data for gender, and the labels single, married, divorced, and widowed indicate marital status.

Sample survey methods are used to collect data from observational studies, and experimental design methods are used to collect data from experimental studies. The area of descriptive statistics is concerned primarily with methods of presenting and interpreting data using graphs, tables, and numerical summaries.
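The study example above can be illustrated with a minimal sketch. The three records are hypothetical (only the first mirrors the individual described in the text), and the `classify` helper is an assumption made for illustration: it treats a variable as quantitative when every value is numeric, which is a rough rule of thumb rather than a general definition.

```python
# Hypothetical records; the first row mirrors the example in the text.
records = [
    {"age": 28, "gender": "male", "marital_status": "single", "annual_income": 30000},
    {"age": 45, "gender": "female", "marital_status": "married", "annual_income": 52000},
    {"age": 61, "gender": "female", "marital_status": "widowed", "annual_income": 41000},
]


def classify(values):
    """Label a variable quantitative if every value is numeric, else qualitative."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "qualitative"


variables = list(records[0])  # variable names, in recording order
kinds = {name: classify([row[name] for row in records]) for name in variables}

# Number of data items = individuals x variables (3 x 4 here, 100 x 4 in the text).
n_items = len(records) * len(variables)
```

A numeric check of this kind would mislabel categories that happen to be stored as numbers (a postal code, for instance), so in practice the classification is usually declared by the analyst rather than inferred.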
Whenever statisticians use data from a sample (i.e., a subset of the population) to make statements about a population, they are performing statistical inference. Estimation and hypothesis testing are procedures used to make statistical inferences. Fields such as health care, biology, chemistry, physics, education, engineering, business, and economics make extensive use of statistical inference.

Methods of probability were developed initially for the analysis of gambling games. Probability plays a key role in statistical inference; it is used to provide measures of the quality and precision of the inferences. Many of the methods of statistical inference are described in this article. Some of these methods are used primarily for single-variable studies, while others, such as regression and correlation analysis, are used to make inferences about relationships among two or more variables.

Statistics may also be described more narrowly as the science of making valid inferences about the characteristics of a group of persons or objects on the basis of numerical information obtained from a randomly selected sample of the group. There are two broad subdivisions of this subject: descriptive statistics and theoretical statistics.

The principal descriptive quantity derived from sample data is the mean, which is the arithmetic average of the sample data. It serves as the most reliable single measure of the value of a typical member of the sample. If the sample contains a few values that are so large or so small that they have an exaggerated effect on the value of the mean, the sample is more accurately represented by the median, the value that half the sample values fall below and half above. As measures of the dispersion of the values about their mean, the quantities most commonly used are the variance and its square root, the standard deviation.
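The effect of an extreme value on the mean, and the relative stability of the median, can be seen in a small sketch. The income figures are hypothetical and chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical annual incomes for five individuals.
incomes = [28_000, 30_000, 31_000, 33_000, 35_000]

# The same sample with one very large value added.
with_outlier = incomes + [900_000]

mean_base = mean(incomes)          # arithmetic average of the five values
median_base = median(incomes)      # middle value of the sorted sample

mean_out = mean(with_outlier)      # pulled far upward by the single large value
median_out = median(with_outlier)  # average of the two middle values; barely moves
```

In this sketch the mean of the enlarged sample is several times its median, which is the situation in which the text recommends the median as the more representative summary.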
The variance is calculated by determining the mean, subtracting it from each of the sample values (yielding the deviations of the sample values), and then averaging the squares of these deviations. The mean and standard deviation of the sample are used as estimates of the corresponding characteristics of the entire group from which the sample was drawn. They do not, in general, completely describe the distribution of values within either the sample or the parent group; indeed, different distributions may have the same mean and standard deviation. They do, however, provide a complete description of the so-called normal distribution, in which positive and negative deviations from the mean are equally common and small deviations are much more common than large ones. For a normally distributed set of values, a graph showing the dependence of the frequency of the deviations upon their magnitudes is a bell-shaped curve. About 68 percent of the values will differ from the mean by less than the standard deviation, and almost all (about 99.7 percent) will differ by less than three times the standard deviation.

The theory of statistics is grounded in mathematical probability and in idealized concepts of the group under study, called the population, and the sample. The statistician may view the population as a set of balls from which the sample is selected at random, that is, in such a way that each ball has the same chance as every other one for inclusion in the sample. The characteristic of interest in the population is idealized as a physical property of the balls; for example, they may be of two colours, red and blue. As an illustration, suppose one is studying opinions on a certain issue, and the characteristic is described as the favouring of an associated policy. The members of the population having this characteristic may be identified with the red balls, and those not having it may be identified with the blue balls.
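Returning to the variance calculation and the normal-distribution percentages described earlier in this passage, the following sketch works through both. The sample values are hypothetical; `NormalDist` and `pvariance` are from Python's standard `statistics` module (Python 3.8 or later is assumed).

```python
from math import sqrt
from statistics import NormalDist, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample values

# Step 1: determine the mean.
m = sum(values) / len(values)

# Step 2: subtract the mean from each value to get the deviations.
deviations = [v - m for v in values]

# Step 3: average the squared deviations to get the variance.
variance = sum(d * d for d in deviations) / len(values)

# The standard deviation is the square root of the variance.
std_dev = sqrt(variance)

# The library routine that divides by n should give the same variance.
library_variance = pvariance(values)

# Share of a normal distribution within 1 and within 3 standard deviations.
std_normal = NormalDist()  # mean 0, standard deviation 1
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
within_3 = std_normal.cdf(3) - std_normal.cdf(-3)
```

The calculation follows the text in dividing by the number of values. Many textbooks instead divide by one less than the number of values when the variance of a sample is used to estimate that of the population; `statistics.variance` does so.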
The problem under study is usually stated in the form of a question about the proportions of balls having specified colours; for example, one may wish to test whether a majority of the population is in favour of the policy under consideration. The model described above has been studied in the context of probability theory since the 17th century. It has been shown that when the sample is drawn at random, the membership of the sample is governed by the composition of the population according to well-determined laws of probability. Statistics makes use of these laws by devising methods of inferring the composition of the population from that of the sample. The theory of statistics makes it possible to evaluate the performance of a statistical procedure in terms of the proportions of samples leading to a correct conclusion.

Inferences made in statistics are of two types. The first is estimation, which involves the determination, with a possible error due to sampling, of the unknown value of a population characteristic, such as the proportion having a specific attribute or the average value of some numerical measurement. Estimates of population characteristics are generally accompanied by the standard errors of the estimates; these are margins that indicate the likely size of the errors arising from the fact that the estimates are based on random samples and not on a complete population census. The second type of inference is hypothesis testing. It involves defining a hypothesis as one set of possible population values and an alternative as a different set. There are many statistical procedures for determining on the basis of a sample whether the true population characteristic belongs to the set of values in the hypothesis or the alternative.

Statistics is used in every type of scientific work and in much commercial and industrial work.
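The ball model, an estimate with its standard error, and a test of whether a majority favours the policy can be sketched together. Everything specific here is an assumption made for illustration: the population of 10,000 balls with 55 percent red, the sample size of 400, the fixed random seed, and the choice of a one-sided test based on the normal approximation, which is only one of the many procedures the text alludes to.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: red = favours the policy, blue = does not.
population = ["red"] * 5500 + ["blue"] * 4500

# Random sample without replacement: every ball has the same chance of selection.
sample = rng.sample(population, 400)
n = len(sample)

# Estimation: sample proportion and its (approximate) standard error.
p_hat = sample.count("red") / n
std_error = sqrt(p_hat * (1 - p_hat) / n)

# Hypothesis test: hypothesis p = 0.5 against the alternative p > 0.5,
# using the normal approximation to the sampling distribution of p_hat.
z = (p_hat - 0.5) / sqrt(0.5 * 0.5 / n)
p_value = 1 - NormalDist().cdf(z)
```

A small `p_value` would count as evidence for the alternative that a majority is in favour. The standard error formula used here ignores the finite size of the population, a simplification that is usually harmless when the sample is a small fraction of the population.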
For very large populations, the size of the sample needed for standard statistical procedures is entirely independent of the size of the underlying population. This is illustrated in a very dramatic way in general elections for public office. Statisticians are able to make very accurate estimates of the outcome of the election on the basis of very small sample returns.

David R. Anderson, Dennis J. Sweeney, Thomas A. Williams

# STATISTICS

## Meaning of STATISTICS in English

Britannica English vocabulary. Britannica English dictionary. 2012