Introduction
R, S, SAS, Stata, SPSS, EViews and other statistical and econometric packages all let users run the program in “batch mode”: a sequence of commands is written in advance and submitted to the program as a routine, producing all analytical output in one step.
There are several compelling reasons to keep the code in these routines well organized:
- Routines are often worked on by multiple people.
- Routines are often long and complicated.
- Routines are often reused on other projects.
The following tips will help you keep your code easy to edit and use, easy to share with collaborators, and easy to integrate into future projects.
Data coding
Although data coding doesn’t technically count as a programming step, several data coding practices can help keep your data management and analysis accurate (and as easy as possible).
- Code variables consistently. If you have a series of Yes/No variables, code them all using “yes” and “no”, or “0” and “1”, or “y” and “n”, whichever works best for you. The key is to code them all the same way, so you don’t find yourself wondering which coding scheme you used for which variable. Mixed schemes can easily lead to mistakes.
- Watch cases. Some applications do not differentiate between upper and lower case in string variables; others do. A good practice is to use all lower case for discrete string variables.
- Missing value codes. Use consistent missing value codes. Common codes include “na”, “-9” and “-999”. It is a good idea to use a separate code for missing values instead of leaving them blank. A separate code can clear up confusion regarding whether a data point is actually missing.
- NA/Don’t Know/Refused codes. For survey data, “refused” and “don’t know” are different from “missing”: a missing value means that no answer was given at all, whereas the other two are answers in their own right. It is a good idea to code each of these separately.
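The coding rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name recode_yes_no, the input spellings it accepts, and the codes -9 (missing), -8 (don’t know) and -7 (refused) are hypothetical choices made for this example, not a standard.

```python
# Hypothetical codes chosen for this sketch; pick your own and use them everywhere.
MISSING = -9      # no answer recorded at all
DONT_KNOW = -8    # respondent answered "don't know"
REFUSED = -7      # respondent declined to answer

YES_VALUES = {"yes", "y", "1"}
NO_VALUES = {"no", "n", "0"}


def recode_yes_no(raw):
    """Map one raw survey answer onto a single consistent coding scheme."""
    if raw is None:
        return MISSING
    text = str(raw).strip().lower()   # ignore case and stray whitespace
    if text in ("", "na"):
        return MISSING
    if text in YES_VALUES:
        return 1
    if text in NO_VALUES:
        return 0
    if text in ("don't know", "dont know", "dk"):
        return DONT_KNOW
    if text == "refused":
        return REFUSED
    # Fail loudly instead of silently guessing at an unknown spelling.
    raise ValueError(f"unrecognized answer: {raw!r}")


answers = ["Yes", "n", " Y ", "", "Refused", "Don't know", None, 0]
recoded = [recode_yes_no(a) for a in answers]
print(recoded)
```

Raising an error on an unrecognized spelling is a deliberate choice in this sketch: a value that falls outside the scheme is surfaced immediately instead of being quietly treated as missing.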
Variable naming
Many of the same rules of thumb that apply to data coding also apply to naming conventions within your code. It is important to have consistent naming conventions because doing so prevents confusion and reduces the likelihood of errors. Other tips include:
- Use a consistent naming type. Common types include:
Camelback: VariableName
Underscore: variable_name (Note: current versions of R accept underscores in names, but very old versions did not, so much existing R code uses a “.” instead, as in variable.name)
- Watch out for case sensitivity. Some applications are case sensitive when it comes to variable names, and some aren’t. Using lower-case names across the board might be a good idea if you use multiple systems.
- Use easy-to-understand mnemonics. Some systems limit the length of variable names, which might require using mnemonics such as geoloc instead of geographical_location. Using these mnemonics is fine – it even saves time – but they should be easy to remember and consistent.
- Label your variables. If the system that you’re using provides such a capability, it is a good idea to add variable labels, which attach a short description to each variable mnemonic in the analytical output produced by the system. This reduces the burden of having to remember what many variable names stand for.
- Label your values. If you can, add labels to the values used. For example, with a coding scheme that uses “1” for less, “2” for the same and “3” for more, adding value labels reduces the chance that you’ll mix up what the numbers stand for.
- Keep a codebook. Especially if you are unable to use variable and value labels in your data analysis package, keep a codebook that records the variable names and the meanings of the values.
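Where a package has no built-in labels, a codebook can also live next to the code as a simple lookup structure. The Python sketch below is one possible layout, not a prescribed format: the variable names geoloc and inc_change, their labels and values, and the helper describe are all hypothetical.

```python
# Hypothetical codebook: one entry per variable, holding a variable label
# and the value labels for that variable's codes.
CODEBOOK = {
    "geoloc": {
        "label": "Geographical location",
        "values": {1: "urban", 2: "rural"},
    },
    "inc_change": {
        "label": "Change in income since last year",
        "values": {1: "less", 2: "the same", 3: "more"},
    },
}


def describe(variable, value):
    """Return a readable 'variable label: value label' string for one data point."""
    entry = CODEBOOK[variable]
    # Fall back to the raw value when no value label is recorded.
    value_label = entry["values"].get(value, value)
    return f"{entry['label']}: {value_label}"


print(describe("inc_change", 3))
print(describe("geoloc", 1))
```

Keeping the codebook in one structure means the short mnemonics stay in the analysis code while the full descriptions are looked up only when output is produced.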
Comment your code
Every system that allows coded input also has a special character that indicates a comment. For example:
# this is an R comment
* this is a SAS comment;
Use comments to:
1) Provide structure to your programs by setting out section headers that explain what each large part of the program does. Example:
# # # # # input the data # # # # #
2) Add explanations of important pieces of code:
x=y # set x equal to y
Comments make it easy for you to find parts of code, and for you and others to quickly understand your code in the future.
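Both uses of comments can be seen together in a short script. The sketch below is written in Python, which uses the same # comment character as R. The records, the -9 missing-value code and the variable names are hypothetical and stand in for a real data file.

```python
# ---------- input the data ----------
# Hypothetical inline records stand in for a file that would normally be read here.
records = [
    {"id": 1, "income": 42000},
    {"id": 2, "income": -9},      # -9 is this sketch's missing-value code
    {"id": 3, "income": 58000},
]

# ---------- clean the data ----------
# Drop records whose income is missing so they do not distort the average.
valid = [r for r in records if r["income"] != -9]

# ---------- analyze ----------
# Mean income across the records that remain.
mean_income = sum(r["income"] for r in valid) / len(valid)
print(mean_income)
```

The section headers let a reader jump straight to the input, cleaning or analysis step, while the shorter comments explain why a line is there and not merely what it does.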
Note: This tutorial is available as a downloadable PDF file.