06-19, 2011

How to deal with data in Stata

Abstract

Table of Contents

1 Introduction

1.1 Research motives

Stata is a software package for data management and statistical analysis, developed by the Computing Resource Center in the United States. Its prominent characteristics are that it occupies little disk space, produces concise output, offers methods that are advanced and fairly comprehensive, and draws attractive graphics that can be called directly by graphics-processing software or by word-processing software such as Word. How to deal with data sets in Stata is the primary aim of this coursework.

1.2 Literature Review

Many people have researched how to use Stata. For example, one web tutorial introduces Stata in six parts: an overview of Stata, creating and using "log" and "do" files, data management, file management, basic data description, and the basics of regression analysis (http://www.princeton.edu/~erp/stata/main.html). Svend Juul (2005) systematically and thoroughly introduced the functions and usage of Stata 8.0; the text is intended mainly for beginners, but knowledge of fundamental Windows functions is necessary. Stata manuals are also distributed throughout the Research Lab (http://rlab.lse.ac.uk/DataService/stata.asp) and serve as a basic guide to Stata.

1.3 Research focuses

The paper focuses on six aspects, as follows. First, Stata's data management capabilities. The amount of data Stata can manage depends on the computer's operating system and on how much expanded memory the computer has. For example, on a computer system with 640 K of memory, Stata version 3.1 can manage 2,400 records × 99 variables, and more as expanded memory is added; Stata version 4.0 can manage 4,800 records × 99 variables. Version 5.0 sets the number of records and variables according to the computer's configuration; for example, on a computer with 32 M of expanded memory it can handle 10 million data values.

The number of records and the number of variables can be traded off against each other: reducing the number of records allows more variables, and reducing the number of variables allows more records. Grouping variables can be converted into indicator (dummy) variables, and string variables can be mapped to numeric codes. Data files can be joined both horizontally and vertically, and row data can be converted to column data, or vice versa. Commands that have already been executed can be recalled and modified. New variables can be generated with numeric or string functions, and data can be read in from the keyboard or from disk.
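The three transformations mentioned above (string-to-code mapping, dummy variables, and converting between column and row layouts) can be sketched in a few lines. This is a plain-Python illustration, not Stata syntax; the records and the helper names (encode, dummies, to_long) are hypothetical and chosen only for this example. In Stata itself, comparable steps are normally carried out with the encode, tabulate (with its generate() option), and reshape commands.

```python
# Illustrative only: plain-Python versions of three data-management steps.
# The records and all function names below are hypothetical.

records = [
    {"id": 1, "region": "north", "y2009": 10, "y2010": 12},
    {"id": 2, "region": "south", "y2009": 7, "y2010": 9},
    {"id": 3, "region": "north", "y2009": 4, "y2010": 5},
]


def encode(values):
    """Map each distinct string to an integer code (1, 2, ...) in sorted order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)), start=1)}
    return [codes[v] for v in values], codes


def dummies(values):
    """Turn one grouping variable into 0/1 indicator columns, one per category."""
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in sorted(set(values))
    }


def to_long(rows, id_key, prefix):
    """Convert wide rows (one column per year) to long rows (one row per id and year)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                year = int(key[len(prefix):])
                long_rows.append({id_key: row[id_key], "year": year, "value": value})
    return long_rows


regions = [r["region"] for r in records]
region_codes, code_book = encode(regions)
region_dummies = dummies(regions)
long_records = to_long(records, "id", "y")
```

Here region_codes holds the numeric codes, code_book records which code stands for which string, region_dummies holds one indicator column per region, and long_records has one row per record and year instead of one column per year.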

Second, Stata's statistical functions. Stata's statistical capabilities are strong: in addition to the traditional methods of statistical analysis, it includes methods developed over roughly the past 20 years, such as Cox proportional hazards regression, exponential and Weibull regression, logistic regression for multinomial and ordered outcomes, Poisson regression, negative binomial and generalized negative binomial regression, and random-effects models.

Specifically, Stata's statistical analysis covers the following areas.

General analysis of numerical data: parameter estimation, t-tests, single-factor and multi-factor analysis of variance, analysis of covariance, interaction-effect models, balanced and unbalanced designs, nested designs, random effects, pairwise comparisons of multiple means, handling of missing data, tests of homogeneity of variance, normality tests, and variable transformations.

General analysis of categorical data: parameter estimation, contingency table analysis (chi-square test, contingency coefficient, exact probability), and epidemiological table analysis.

General analysis of ordinal data: rank transformation, rank sum tests, and rank correlation.

Correlation and regression analysis: simple correlation, partial correlation, canonical correlation, and several dozen regression methods, such as multiple linear regression, stepwise regression, weighted regression, robust regression, two-stage regression, quantile (median) regression, residual analysis, influential-point analysis, curve fitting, and random-effects linear regression models.

Risk analysis: conditional and unconditional logistic regression, logistic regression for multinomial and ordered outcomes, probit regression and other generalized linear models, random-effects logistic regression, and random-effects Poisson regression.

Survival analysis: baseline estimation of survival curves, relative risk estimation, Kaplan-Meier survival curves, life-table analysis, the log-rank test, the Mantel-Haenszel test, the Wilcoxon-Gehan test, the Cox proportional hazards model, Tobit regression for censored normal data, and exponential and Weibull regression.

Other methods: quality control, design efficiency of cluster sampling, evaluation of diagnostic tests, kappa, and so on.
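To make one item from this list concrete, the following is a minimal Python sketch of the two-sample t-test with a pooled variance, which assumes that both groups share the same variance. The function name pooled_t and the sample values are hypothetical and serve only as an illustration; the sketch shows the arithmetic behind the test, not how Stata implements it.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Return (t statistic, degrees of freedom) for a two-sample pooled t-test.

    Assumes equal variances in the two groups (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df


# Hypothetical measurements for two groups.
group_a = [1, 2, 3]
group_b = [2, 3, 4]
t_stat, dof = pooled_t(group_a, group_b)
```

The resulting statistic is compared with a t distribution with dof degrees of freedom to obtain a p-value; that lookup is left out here to keep the sketch within the standard library.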
Third, Stata's graphics functions. Stata's graphics module mainly provides eight basic types of graph: the histogram (histogram), the bar chart (bar), the percentage bar chart (oneway), the pie chart (pie), the scatter plot (twoway), the scatter plot matrix (matrix), the star chart (star), and the median plot. Used skilfully, these graphs meet the statistical graphing needs of the vast majority of users. Some non-graphics commands also provide functions for drawing particular graphs: for example, survival analysis provides survival curves, and regression analysis provides residual plots.

Fourth, Stata's matrix operations. Matrix algebra is an important tool in multivariate statistical analysis, and Stata provides the basic matrix operations that such analysis requires, such as matrix addition, multiplication, inversion, Cholesky decomposition, and the Kronecker product. It also provides some advanced operations, such as eigenvalues, eigenvectors, and singular value decomposition. After certain statistical analysis commands have been run, Stata also makes some system matrices available, such as the vector of estimated coefficients and the covariance matrix of the estimated coefficients. Stata allows matrices of at most 400 × 400, so relying on its matrix facilities for day-to-day statistical analysis is clearly unrealistic, but using them for exercises that improve the teaching of multivariate statistical analysis is undoubtedly helpful.

Stata is statistical analysis software, but it also has strong programming-language features, which give users wide scope for development and application: users can make full use of their own ingenuity and skill and work as freely as they wish. In fact, Stata's ado-files (the advanced statistics part) are themselves written in Stata's own language.
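Two of the basic matrix operations named above, the Cholesky decomposition and the Kronecker product, can be sketched in plain Python to show what they compute. This is an illustration under stated assumptions rather than Stata code: the function names and the small example matrices are hypothetical, and the Cholesky routine assumes its input is symmetric and positive definite.

```python
import math


def cholesky(a):
    """Return the lower-triangular matrix L with L * L^T = a.

    Assumes a is symmetric and positive definite (not checked in this sketch).
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Sum of products of the entries already computed in rows i and j.
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def kronecker(a, b):
    """Return the Kronecker product: each entry a[i][j] becomes the block a[i][j] * b."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Hypothetical example matrices.
A = [[4, 2], [2, 3]]
chol_A = cholesky(A)

B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]
kron_BC = kronecker(B, C)
```

Multiplying chol_A by its transpose reproduces A, and kron_BC is a 4 × 4 matrix made of four 2 × 2 blocks, each a multiple of C.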

2 Case studies

2.1 Data entry, file saving and loading commands, and data management commands

2.2 Descriptive statistics commands and their output

2.3 t-test and single-factor analysis of variance

2.4 Correlation analysis

2.5 T-test and single-factor analysis of variance

Conclusions

References

http://www.princeton.edu/~erp/stata/main.html

Appendices
