StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
15276163
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
1
CommunityOwnedDate
CreationDate
2013-03-07T16:15:08.543
FavoriteCount
0
LastActivityDate
2013-03-07T16:15:08.543
LastEditDate
LastEditorUserId
0
OwnerUserId
1623007
ParentId
15263883
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
SAS has a lot of options for avoiding repeating code; in that way it's actually a lot like python, although the method for accomplishing it is a little different as you do have a separate compilation step (so you can't just say WHERE like you ask directly). First off, you have the macro variable. If you're just repeating text several times, you can define it in a macro variable, like so: <pre><code>%let condition=X in (1,10,100,1000); </code></pre> Macro variables are treated as if they were text you had written. They do not need quotation marks or other text qualifiers, unless they are intended to contain them as legal code, ie: <pre><code>%let condition=X in ('A','B','C'); </code></pre> would be legal, but <pre><code>%let condition="X in ('A','B','C')"; </code></pre> would probably not be what you want (unless you want that to be evaluated as a string, anyhow). Through macro variables, you also have the ability to generate larger amounts of code in a datastep and then include it. For example, if you have a dataset containing a list of conditions, you could apply them this way: <pre><code>data conditions; format condition $50.; input condition $; datalines4; if x = 15 then y=5; if x = 20 then y=10; if x = 20 and z = 5 then y=15; if x = 20 and z = 10 then y=20; ;;;; run; proc sql; select condition into :condlist separated by ' ' from conditions; quit; data want; set have; &condlist; run; </code></pre> That would take the conditions from "conditions" dataset and push it into a macro variable "&condlist". The PROC SQL call is the easiest way to get it into a macro variable, but there are others; CALL SYMPUT also can do it in a data step, or you can write it to a text file and then %include the text file as code as well. This is more commonly used in advanced programming by generating calls to a macro, with the conditions dataset providing the macro parameters; in this case you might have a macro <pre><code>%macro cond(x=,y=,z=); if x=&x and z=&z then y=&y; %mend cond; </code></pre> Then you could generate calls to cond from a dataset with just x,y,z values: <pre><code>proc sql; select cats('%cond(x=',x,',y=',y,',z=',z,')') into :condlist separated by ' ' from conditions; quit; </code></pre> and use it in the same way. Macro programming in general is a good solution for avoiding code creep; a macro is written once and then can be run multiple times with different parameters. A macro can be anywhere from one line of code (like above) executed inside a data step, to hundreds of lines containing multiple DATA and PROC steps. Macro programming is a complex topic in and of itself, and worth reading more on. <hr> You can also write a function in SAS. PROC FCMP (function compile) allows you to write fairly complex functions and execute them in your data step or even your PROC statements. <a href="http://www.lexjansen.com/pharmasug/2011/tu/pharmasug-2011-tu07.pdf" rel="nofollow">http://www.lexjansen.com/pharmasug/2011/tu/pharmasug-2011-tu07.pdf</a> is a good place to start with FCMP if you have 9.2; if you have 9.3, I haven't seen any papers yet (but there may be some out there) showing the newer things in FCMP. FCMP is fairly new so there are still a lot of changes in each iteration of SAS. Here's an example of FCMP to do your condition: <pre><code>proc fcmp outlib=work.funcs.Test; /* where will the functions be saved */ function condition(x); /* declare a function returning a number */ if x in (1,10,100,1000) then return(1); else return(0); endsub; quit; data have; do x = 1,5,10,20,100,150,1000,1500; output; end; run; options cmplib=work.funcs; data want; set have; if condition(x) then output; run; </code></pre> <hr> You also have the CALL EXECUTE statement, which allows you to directly execute code from a dataset. Using the same CONDITIONS dataset: <pre><code>data _null_; set conditions end=eof; if _n_ = 1 then call execute('data want; set have;'); call execute(condition); if eof then call execute('run;'); run; </code></pre> That would effectively construct a data step that executes immediately following your data null step with the same code as in the macro variable example. Call execute works a little differently, so while in this example there shouldn't be any difference, there are a few issues with timing that can cause problems (or can be advantageous); which you use depends on the circumstance. Particularly for CALL EXECUTE, read up on the documentation and online papers (SUGI papers most commonly) to find out more details. <hr> In addition to directly executing code via macro variables or CALL EXECUTE, you have a lot of other ways of performing tasks to avoid wallpaper code. For example, to more easily perform the if statements above, you might be able to use a format. Formats convert one value to another value; most commonly you might have something like 'DOLLAR6.2' which would give you $3.50 from the number 3.5. However, formats can also be used to replace if-this-then-that expressions. If there were only X and Y (and no Z conditions), then you could do this, given this conditions dataset: <pre><code>data conditions; input x y; datalines; 1 5 2 10 3 20 4 50 5 100 ;;;; run; data for_fmt; set conditions; rename x=start y=label; fmtname='XTOY'; type='i'; *type=i means numeric informat, so numeric to numeric conversion. Informat = to numeric, Format= to character.; run; proc format cntlin=for_fmt; quit; data want; set have; y = input(x,XTOY.); run; </code></pre> There you have one line of code converting x to y. (Of course there is a bit of code setting up the format, but it can be separated from the main code, and included in the set-up portion of your code, like a .h file in c). You also have hash table lookups, which are really helpful when you have more complex conversions - either 1 to many or many to 1. They work just like they sound - you load the hash table into memory and perform lookups. <a href="http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf" rel="nofollow">http://support.sas.com/rnd/base/datastep/dot/hash-getting-started.pdf</a> is one good place to start. <hr> Finally, one good way to avoid repeating code is to use fewer separate datasets. SAS data steps and procedures have the "BY" statement available, which means they treat each different value of the BY variable(s) as effectively a separate dataset. The variable names and lengths need to match, as it is still technically one dataset, but if you have many datasets of similar data, and want to perform the same action to each, you can perform them once with a BY statement rather than multiple times. For example, say you had the dataset SASHELP.CARS. You might want to calculate something separately for each make of car. You could either do: <pre><code>data acura; set sashelp.cars; if make='ACURA'; run; data honda; set sashelp.cars; if make='HONDA'; run; </code></pre> And then run your code on each dataset separately. However, a more SASsy way to do it is to use the BY statement: <pre><code>proc means data=sashelp.cars; by make; var mpg_city mpg_highway; run; </code></pre> Now you get a separate page for each make. You can use the BY statement in data step processing as well; you get variables FIRST.make and LAST.make which tell you if you're on the first record of a new MAKE or the last record of a MAKE (the record just before a change in value), which allow you to do things based on where you are in a dataset's BY group (for example, <code>if first.make then counter=0;</code> would allow you to have a counter that is reset each time you have a new value in <code>make</code>. ) The only caveat for BY groups is you have to sort your dataset by the BY variable prior to using it (or have an index on that variable, or both). This is really helpful for analysis of bootstrap samples or other processes where you have many nearly-identical datasets and perform identical actions on them.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PODefining variables in sas to clean up code
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USJoe
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PODefining variables in sas to clean up code
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
1. COThanks for the VERY complete answer!
 singulars
 PostPostId
 PO
 UserUserId
 USRishi

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.