SAS Common Functions
I. Mathematical functions
ABS(x) Returns the absolute value of x.
MAX(x1,x2,…,xn) Returns the largest of the arguments.
MIN(x1,x2,…,xn) Returns the smallest of the arguments.
MOD(x,y) Returns the remainder when x is divided by y (the result has the same sign as x).
SQRT(x) Returns the square root of x.
ROUND(x,eps) Rounds x to the nearest multiple of the rounding unit eps. For example, ROUND(5654.5654, 0.01) returns 5654.57, and ROUND(5654.5654, 10) returns 5650.
CEIL(x) Returns the smallest integer greater than or equal to x. If x is an integer, the result is x itself; otherwise it is the nearest integer to the right of x on the number line.
FLOOR(x) Returns the largest integer less than or equal to x. If x is an integer, the result is x itself; otherwise it is the nearest integer to the left of x on the number line.
INT(x) Returns the integer part of x, discarding the fractional part (truncation toward zero).
FUZZ(x) Returns the nearest integer if x is within 1E-12 of it; otherwise returns x unchanged.
LOG(x) Returns the natural logarithm of x.
LOG10(x) Returns the common (base-10) logarithm of x.
EXP(x) Returns e raised to the power x.
SIN(x), COS(x), TAN(x) Return the sine, cosine, and tangent of x (x in radians).
ARSIN(y) Returns the inverse of y = sin(x) on the interval [-π/2, π/2]; y must lie in [-1,1].
ARCOS(y) Returns the inverse of y = cos(x) on the interval [0, π]; y must lie in [-1,1].
ATAN(y) Returns the inverse of y = tan(x) on the interval (-π/2, π/2); y can be any real number.
SINH(x), COSH(x), TANH(x) Hyperbolic sine, cosine, and tangent.
ERF(x) Error function.
GAMMA(x) Complete gamma function.
In addition, SAS provides the sign function SIGN, the first and second derivatives of the log-gamma function (DIGAMMA and TRIGAMMA), the complementary error function ERFC, the natural logarithm of the gamma function LGAMMA, the ORDINAL function, the Airy function AIRY and its derivative DAIRY, the Bessel function JBESSEL, the modified Bessel function IBESSEL, and so on.
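A quick way to check the rounding and remainder functions is a small DATA step like the sketch below. The values in the comments follow from the definitions above; the variable names are arbitrary.

```sas
data _null_;
  r1 = round(5654.5654, 0.01);  /* 5654.57: nearest multiple of 0.01 */
  r2 = round(5654.5654, 10);    /* 5650: nearest multiple of 10 */
  c  = ceil(-2.5);              /* -2: smallest integer >= -2.5 */
  f  = floor(-2.5);             /* -3: largest integer <= -2.5 */
  i  = int(-2.5);               /* -2: fractional part discarded */
  m1 = mod(17, 5);              /* 2 */
  m2 = mod(-7, 3);              /* -1: sign follows the first argument */
  z  = fuzz(3 + 1e-13);         /* 3: within 1E-12 of an integer */
  put r1= r2= c= f= i= m1= m2= z=;
run;
```

Note that CEIL, FLOOR, and INT only disagree for non-integers, and that INT and FLOOR differ for negative values.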
II. Array functions
Array functions return the number of elements and the lower and upper bounds of an array, which helps in writing portable programs. Array functions include:
DIM(x) Returns the number of elements in the first dimension of array x (when the lower bound is 1, this equals the upper bound; otherwise the two may differ).
DIMk(x) Returns the number of elements in the k-th dimension of array x (for example, DIM2(x)).
LBOUND(x) Returns the lower bound of the first dimension of array x.
HBOUND(x) Returns the upper bound of the first dimension of array x.
LBOUNDk(x) Returns the lower bound of the k-th dimension of array x.
HBOUNDk(x) Returns the upper bound of the k-th dimension of array x.
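The sketch below, with hypothetical arrays a and m, shows why DIM and HBOUND can differ when the lower bound is not 1, and how looping from LBOUND to HBOUND avoids hard-coding the bounds.

```sas
data _null_;
  array a{2:5} a2-a5;        /* one dimension: lower bound 2, upper bound 5 */
  array m{3,4} m1-m12;       /* two dimensions: 3 rows by 4 columns */
  n    = dim(a);             /* 4 elements, although the upper bound is 5 */
  lo   = lbound(a);          /* 2 */
  hi   = hbound(a);          /* 5 */
  rows = dim1(m);            /* 3 */
  cols = dim2(m);            /* 4 */
  /* Loop over the declared bounds instead of hard-coding 2 and 5 */
  do k = lbound(a) to hbound(a);
    a{k} = k * 10;
  end;
  put n= lo= hi= rows= cols= a2= a5=;
run;
```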
III. Character functions
The most important character functions are:
TRIM(s) Returns string s with trailing blanks removed.
UPCASE(s) Converts all lowercase letters in string s to uppercase.
LOWCASE(s) Converts all uppercase letters in string s to lowercase.
INDEX(s,s1) Returns the position of the first occurrence of s1 in s, or 0 if s1 is not found.
RANK(s) Returns the ASCII code of character s.
BYTE(n) Returns the character whose ASCII code is n.
REPEAT(s,n) Returns s followed by n more copies of s, so s appears n+1 times in the result.
SUBSTR(s,p,n) Extracts a substring of n characters from string s, starting at position p.
TRANWRD(s,s1,s2) Replaces every occurrence of s1 in string s with s2.
Other character functions include COLLATE, COMPRESS, INDEXC, LEFT, LENGTH, REVERSE, RIGHT, SCAN, TRANSLATE, VERIFY, COMPBL, DEQUOTE, INDEXW, QUOTE, SOUNDEX, and TRIMN.
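A minimal DATA step exercising several of the functions above. It assumes an ASCII environment; on EBCDIC systems RANK and BYTE return different values. The variable names are arbitrary.

```sas
data _null_;
  length s $ 8;
  s   = 'ab';                         /* stored as 'ab' plus 6 trailing blanks */
  t1  = '[' || s || ']';              /* [ab      ]: trailing blanks kept */
  t2  = '[' || trim(s) || ']';        /* [ab] */
  u   = upcase('Sas');                /* SAS */
  p   = index('scorecard', 'card');   /* 6 */
  q   = index('scorecard', 'xyz');    /* 0: not found */
  r   = rank('A');                    /* 65 in ASCII */
  b   = byte(66);                     /* B in ASCII */
  rep = repeat('ab', 2);              /* ababab: 'ab' appears 2+1 times */
  put t1= t2= u= p= q= r= b= rep=;
run;
```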
Example 1: SUBSTR
data _null_;
x="1234ABCD";
y=substr(x,3,2);
substr(x,1,2)="EF";
put x= / y=;
run;
Output:
x=EF34ABCD
y=34
Notes:
1. SUBSTR works on character values. If the first argument is numeric, SAS converts it to character automatically before calling SUBSTR, using the BEST12. format; the converted value is right-aligned in 12 characters with leading blanks, so character positions shift. Convert explicitly when the exact characters matter.
2. n must not exceed the number of characters remaining from position p. For example, with s="scorecard" (9 characters), SUBSTR(s,6,5) asks for characters 6 through 10, and SAS writes a note to the log.
3. If n is omitted, SUBSTR extracts all characters from position p to the end. When SUBSTR is used on the left side of an assignment to replace characters, omitting n means all characters of the right-hand value are used as the replacement.
4. For Chinese (double-byte) characters, SUBSTR can split a character and produce garbled output; use KSUBSTR instead. KSUBSTRB extracts by bytes.
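To illustrate note 1, the sketch below (variable names hypothetical) passes a number to SUBSTR directly and then converts it explicitly with PUT and STRIP first.

```sas
data _null_;
  n = 12345;
  /* Implicit conversion: n becomes '       12345' (BEST12., right-aligned), */
  /* so positions 1-2 are blanks; SAS also writes a conversion note.        */
  bad = substr(n, 1, 2);
  bad_is_blank = missing(bad);                  /* 1 */
  /* Explicit conversion with the leading blanks stripped */
  good = substr(strip(put(n, best12.)), 1, 2);  /* 12 */
  put bad_is_blank= good=;
run;
```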
Example 2: TRANWRD
data _null_;
x="ABabCDEFGABCD";
y=tranwrd(x,"AB","ef");
put y=;
run;
Output: y=efabCDEFGefCD
Note:
TRANWRD string substitution is case sensitive.
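If a case-insensitive replacement is needed, one option, not covered in the original text, is PRXCHANGE with the Perl i flag. A sketch comparing the two on the same input:

```sas
data _null_;
  x  = "ABabCDEFGABCD";
  y1 = tranwrd(x, "AB", "ef");          /* efabCDEFGefCD: "ab" is not matched */
  y2 = prxchange('s/ab/ef/i', -1, x);   /* efefCDEFGefCD: i ignores case, -1 replaces all */
  put y1= y2=;
run;
```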
Example 3: COMPRESS
COMPRESS(source <, chars> <, modifiers>)
source Specifies the string from which characters are removed.
chars Specifies the initial list of characters to remove from source. If only source is given, blanks are removed.
modifiers Specifies modifiers that change how the function works, for example:
a Adds alphabetic characters (A–Z, a–z) to the list of characters.
d Adds digits to the list of characters.
f Adds the underscore and English letters (A–Z, a–z).
g Adds graphic characters.
k Keeps the characters in the list instead of removing them.
l Adds lowercase letters (a–z).
n Adds digits, the underscore, and English letters (A–Z, a–z).
p Adds punctuation marks.
s Adds space characters: blank, horizontal tab, vertical tab, carriage return, line feed, and form feed.
t Trims trailing blanks from the source and chars arguments.
u Adds uppercase letters (A–Z).
w Adds printable characters.
x Adds hexadecimal characters.
data _null_;
x="ABabCDEFGABCD";
y=compress(x,"A","l");
put y=;
run;
Output: y=BCDEFGBCD
Notes:
1. With only source, COMPRESS removes blanks.
2. With source and chars, every character listed in chars is removed from source.
3. With source, chars, and modifiers, the k modifier decides whether characters are kept or removed: without k, the characters in chars plus those added by the modifiers are removed; with k, only those characters are kept.
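A sketch of the three calling patterns on a hypothetical phone-number string; note the doubled comma used to omit chars when only modifiers are given.

```sas
data _null_;
  s  = ' a b  c ';
  t  = 'Tel:555-0199';
  c1 = compress(s);           /* abc: only source given, so blanks are removed */
  c2 = compress(t, '-:');     /* Tel5550199: every listed character is removed */
  c3 = compress(t, , 'd');    /* Tel:-: d adds digits to the list, which are removed */
  c4 = compress(t, , 'kd');   /* 5550199: k keeps the listed characters instead */
  put c1= c2= c3= c4=;
run;
```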
Example 4: CAT functions
CAT(A,B): Concatenates A and B, keeping all leading and trailing blanks (same as A||B).
CATS(A,B): Concatenates A and B after removing leading and trailing blanks from each (same as STRIP(A)||STRIP(B)).
CATX("x",A,B): Removes leading and trailing blanks from A and B and inserts the separator "x" between them (same as STRIP(A)||"x"||STRIP(B)).
CATT(A,B): Concatenates A and B after removing trailing blanks from each (same as TRIM(A)||TRIM(B)).
Example:
data _null_;
a = 'am';
b = 'learning';
c = ' SAS';
s1 = cat(a,b,c);
s2 = cats(a,b,c);
s3 = catx('_',a,b,c);
s4 = catt('I',a,b,c);
put s1= / s2= / s3= / s4=;
run;
Output:
s1=amlearning SAS
s2=amlearningSAS
s3=am_learning_SAS
s4=Iamlearning SAS
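Two details worth knowing about the CAT family, shown with hypothetical name variables: CATX skips blank arguments instead of emitting doubled separators, and numeric arguments are converted without the leading blanks that BEST12. conversion would otherwise add.

```sas
data _null_;
  first = 'Ann'; middle = ' '; last = 'Lee';
  full1 = catx(' ', first, middle, last);  /* Ann Lee: blank argument skipped */
  /* TRIM returns one blank for a blank value, so this gives three blanks */
  full2 = trim(first) || ' ' || trim(middle) || ' ' || trim(last);  /* Ann   Lee */
  id    = cats('ID', 7);                   /* ID7: no leading blanks from the number */
  put full1= full2= id=;
run;
```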
IV. Date and time functions
Frequently used date and time functions are:
MDY(m,d,yr) Returns the SAS date value for month m, day d, and year yr. A two-digit year is interpreted according to the YEARCUTOFF= system option.
Statements | Results
mn=8; dy=27; yr=12; birthday=mdy(mn,dy,yr); put birthday mmddyy10.; | 08/27/2012
mn=7; dy=11; yr=12; anniversary=mdy(mn,dy,yr); put anniversary date9.; | 11JUL2012
YEAR(date) Returns the year from SAS date value date.
MONTH(date) Returns the month from SAS date value date.
DAY(date) Returns the day of the month from SAS date value date.
WEEKDAY(date) Returns the day of the week from SAS date value date (1 = Sunday, 2 = Monday, …, 7 = Saturday).
QTR(date) Returns the quarter (1 to 4) from SAS date value date.
HMS(h,m,s) Returns the SAS time value for hour h, minute m, and second s.
DHMS(d,h,m,s) Returns the SAS datetime value for SAS date value d, hour h, minute m, and second s.
DATEPART(dt) Returns the date part (a SAS date value) of SAS datetime value dt.
INTNX(interval,from,n) Returns the SAS date that lies n intervals after the date from. interval can be 'YEAR', 'QTR', 'MONTH', 'WEEK', 'DAY', and so on. For example, INTNX('MONTH','16Dec1997'd,3) returns 1 March 1998. Note that by default it returns the first day of the resulting interval.
INTCK(interval,from,to) Returns the number of interval boundaries crossed between the dates from and to, where interval takes values such as 'MONTH'. For example, INTCK('YEAR','31Dec1996'd,'1Jan1998'd) counts the year boundaries between 31 December 1996 and 1 January 1998 and returns 2, although the two dates are only a little more than one year apart.
Other date and time functions include DATE, TODAY, DATETIME, DATEJUL, JULDATE, HOUR, MINUTE, SECOND, TIME, TIMEPART, etc. For details, see the SAS System Base SAS Software manual and the SAS System SAS/ETS Software manual.
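The sketch below applies the date functions to the dates used in the INTNX and INTCK examples. The optional fourth arguments, the alignment 'e' for INTNX and the method 'c' (continuous) for INTCK, are not mentioned above; as far as we know the INTCK method argument requires SAS 9.2 or later.

```sas
data _null_;
  d  = '16Dec1997'd;
  y  = year(d); mo = month(d); dd = day(d);          /* 1997 12 16 */
  wd = weekday(d);                                   /* 3 = Tuesday (1 = Sunday) */
  q  = qtr(d);                                       /* 4 */
  next_b = intnx('month', d, 3);                     /* 01MAR1998: start of interval */
  next_e = intnx('month', d, 3, 'e');                /* 31MAR1998: aligned to the end */
  n_d = intck('year', '31Dec1996'd, '1Jan1998'd);       /* 2 boundaries crossed */
  n_c = intck('year', '31Dec1996'd, '1Jan1998'd, 'c');  /* 1 complete year elapsed */
  dt   = dhms(d, 14, 30, 0);                         /* 16DEC1997 at 14:30:00 */
  back = datepart(dt);                               /* same value as d */
  put y= mo= dd= wd= q= next_b= date9. next_e= date9. n_d= n_c= dt= datetime18. back= date9.;
run;
```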
V. Probability density and distribution functions
As a statistical computing language, SAS provides many functions related to probability distributions. Density, probability mass, and cumulative distribution functions can all be called in a few uniform forms:
Distribution function value = CDF('distribution', x <, parameter-list>);
Density = PDF('distribution', x <, parameter-list>);
Probability value = PMF('distribution', x <, parameter-list>);
Log density = LOGPDF('distribution', x <, parameter-list>);
Log probability = LOGPMF('distribution', x <, parameter-list>);
CDF computes the cumulative distribution function of the distribution named by 'distribution'. PDF computes the probability density function, and PMF computes the probability mass function of a discrete distribution. LOGPDF and LOGPMF return the natural logarithms of PDF and PMF. Each function is evaluated at x; <, parameter-list> denotes the optional distribution parameters.
Valid distribution names include BERNOULLI, BETA, BINOMIAL, CAUCHY, CHISQUARE, EXPONENTIAL, F, GAMMA, GEOMETRIC, HYPERGEOMETRIC, LAPLACE, LOGISTIC, LOGNORMAL, NEGBINOMIAL, NORMAL or GAUSSIAN, PARETO, POISSON, T, UNIFORM, WALD or IGAUSS, and WEIBULL. Except for T and F, a name can be abbreviated to its first four letters.
For example, PDF('NORMAL', 1.96) returns the density of the standard normal distribution at 1.96 (0.05844), and CDF('NORMAL', 1.96) returns the standard normal distribution function at 1.96 (0.975). For a continuous distribution, PMF returns the same value as PDF.
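A sketch of the uniform calling forms, including distribution parameters and a four-letter abbreviation. For the binomial distribution the parameter order is p, then n; the normal parameters 100 and 15 are arbitrary.

```sas
data _null_;
  d1 = pdf('NORMAL', 1.96);           /* about 0.0584 */
  p1 = cdf('NORMAL', 1.96);           /* about 0.975 */
  p2 = cdf('NORMAL', 110, 100, 15);   /* normal with mean 100, sd 15, evaluated at 110 */
  b  = cdf('BINOMIAL', 3, 0.5, 10);   /* P(Y <= 3), Y ~ B(10, 0.5): 176/1024 = 0.171875 */
  pp = pdf('POISSON', 2, 3);          /* P(Y = 2), lambda = 3: about 0.2240 */
  ld = logpdf('NORM', 0);             /* log(1/sqrt(2*pi)): about -0.9189 */
  put d1= p1= p2= b= pp= ld=;
run;
```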
In addition to the above unified format calls, SAS provides separate density and distribution functions for commonly used distributions.
PROBNORM(x) Standard normal distribution function
PROBT(x,df<,nc>) The distribution function of the t distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.
PROBCHI(x,df<,nc>) The distribution function of the chi-square distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.
PROBF(x,ndf,ddf<,nc>) The distribution function of the F(ndf,ddf) distribution. The optional parameter nc is the noncentrality parameter.
PROBBNML(p,n,m) If the random variable Y follows the binomial distribution B(n,p), this function returns P(Y ≤ m).
POISSON(lambda,n) If Y follows the Poisson distribution with parameter lambda, this function returns P(Y ≤ n).
PROBNEGB(p,n,m) If Y follows the negative binomial distribution with parameters (n,p), this function returns P(Y ≤ m).
PROBHYPR(N,K,n,x<,r>) The distribution function of the hypergeometric distribution. Of N items, K are nonconforming; a sample of n items is drawn, and the function returns the probability that the sample contains at most x nonconforming items. The optional parameter r is the odds ratio (default 1); it expresses how much more likely a nonconforming item is to be drawn than a conforming one.
PROBBETA(x,a,b) The distribution function of the beta distribution with parameters (a,b).
PROBGAM(x,a) The distribution function of the gamma distribution with shape parameter a.
PROBMC Computes probabilities and critical values for multiple-comparison tests of several group means.
PROBBNRM(x,y,r) The distribution function of the standard bivariate normal distribution, where r is the correlation coefficient.
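The dedicated functions agree with the unified CDF calls; the sketch below shows a few of them. Upper-tail p-values are obtained as 1 minus the distribution function.

```sas
data _null_;
  a1 = probnorm(1.96);          /* same as cdf('NORMAL', 1.96) */
  a2 = probbnml(0.5, 10, 3);    /* same as cdf('BINOMIAL', 3, 0.5, 10) = 0.171875 */
  a3 = poisson(3, 2);           /* P(Y <= 2), lambda = 3: about 0.4232 */
  a4 = 1 - probt(2.228, 10);    /* upper-tail t probability: about 0.025 */
  a5 = 1 - probchi(3.841, 1);   /* upper-tail chi-square probability: about 0.05 */
  put a1= a2= a3= a4= a5=;
run;
```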
VI. Quantile functions
A quantile function is the inverse of a cumulative distribution function. Its argument p lies between 0 and 1, and it returns the left (lower-tail) p quantile of the distribution. SAS provides dedicated quantile functions for six common continuous distributions.
PROBIT(p) The left p quantile of the standard normal distribution (the inverse of PROBNORM). The result may be truncated to lie between -8.222 and 7.941.
TINV(p,df<,nc>) The left p quantile of the t distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.
CINV(p,df<,nc>) The left p quantile of the chi-square distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.
FINV(p,ndf,ddf<,nc>) The left p quantile of the F(ndf,ddf) distribution. The optional parameter nc is the noncentrality parameter.
GAMINV(p,a) The left p quantile of the gamma distribution with shape parameter a.
BETAINV(p,a,b) The left p quantile of the beta distribution with parameters (a,b).
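The quantile functions invert the distribution functions, which the round trip at the end of this sketch checks. The commented values are the familiar table critical values.

```sas
data _null_;
  z  = probit(0.975);           /* about 1.960 */
  t  = tinv(0.975, 10);         /* about 2.228 */
  c  = cinv(0.95, 1);           /* about 3.841 */
  f  = finv(0.95, 2, 10);       /* about 4.103 */
  rt = probnorm(probit(0.3));   /* round trip: about 0.3 */
  put z= t= c= f= rt=;
run;
```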
VII. Random number functions
SAS can be used for stochastic simulation; it provides pseudorandom number functions for the common distributions.
1. Uniform random numbers
There are two uniform random number functions. For UNIFORM(seed), seed must be 0 or a five-, six-, or seven-digit odd integer. For RANUNI(seed), seed is any numeric constant less than 2**31-1. Repeated calls to the same random number function within one DATA step return different values, but the same seed reproduces the same sequence in different DATA steps. If seed is 0 or negative, the time of day is used as the seed.
2. Normal random numbers
There are two functions: NORMAL(seed), where seed is 0 or a five-, six-, or seven-digit odd integer, and RANNOR(seed), where seed is any numeric constant.
3. Exponential random numbers
RANEXP(seed), where seed is any numeric constant, generates exponential random numbers with parameter 1 (mean 1). An exponential random number with rate lambda is obtained as RANEXP(seed)/lambda.
In addition, Y = alpha - beta*LOG(RANEXP(seed)) follows the extreme value distribution with location parameter alpha and scale parameter beta, and Y = FLOOR(-RANEXP(seed)/LOG(p)) is a geometric random variable with parameter p.
4. Gamma random numbers
RANGAM(seed,alpha), where seed is any numeric constant and alpha > 0, generates gamma random numbers with shape parameter alpha (scale 1). If X = RANGAM(seed,alpha), then Y = beta*X is a gamma random number with shape parameter alpha and scale parameter beta. If 2*alpha is an integer, Y = 2*X is a chi-square random number with 2*alpha degrees of freedom.
If alpha is a positive integer, Y = beta*X follows the Erlang distribution: it is the sum of alpha independent exponential variables, each with mean beta.
If Y1 = RANGAM(seed,alpha) and Y2 = RANGAM(seed,beta), then Y = Y1/(Y1 + Y2) is a beta random number with parameters (alpha, beta).
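A simulation sketch applying the transformations above. The seed 12345, the data set name sim, and the parameter values are arbitrary choices for illustration; rerunning the step with the same seed reproduces the same numbers.

```sas
data sim;
  do i = 1 to 5;
    u  = ranuni(12345);              /* uniform on (0,1) */
    z  = rannor(12345);              /* standard normal */
    e  = ranexp(12345) / 2;          /* exponential with rate lambda = 2 */
    g  = 1.5 * rangam(12345, 3);     /* gamma: shape 3, scale 1.5 */
    y1 = rangam(12345, 3);
    y2 = rangam(12345, 2);
    bt = y1 / (y1 + y2);             /* beta with parameters (3, 2) */
    output;
  end;
  keep i u z e g bt;
run;

proc print data=sim;
run;
```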
5. Triangular random numbers
RANTRI(seed,h), where seed is any numeric constant and 0 < h < 1, generates triangular random numbers on [0,1] with mode h: the density is 2x/h for 0 ≤ x ≤ h and 2(1-x)/(1-h) for h < x ≤ 1.
6. Cauchy random numbers
RANCAU(seed), where seed is any numeric constant, generates standard Cauchy random numbers with location parameter 0 and scale parameter 1. Y = alpha + beta*RANCAU(seed) is a Cauchy random number with location parameter alpha and scale parameter beta.
7. Binomial random numbers
RANBIN(seed,n,p) generates binomial random numbers with parameters (n,p); seed is any numeric constant.
8. Poisson random numbers
RANPOI(seed,lambda) generates Poisson random numbers with parameter lambda > 0; seed is any numeric constant.
9. General discrete random numbers
RANTBL(seed,p1,…,pn) generates the values 1, 2, …, n with probabilities p1, …, pn respectively.
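A sketch for the discrete generators, using an arbitrary seed (54321), data set name (disc), and sample size; PROC FREQ is used only to confirm that the RANTBL frequencies come out near the requested probabilities.

```sas
data disc;
  do i = 1 to 1000;
    b = ranbin(54321, 10, 0.3);         /* binomial: n = 10, p = 0.3 */
    p = ranpoi(54321, 4);               /* Poisson: lambda = 4 */
    t = rantbl(54321, 0.2, 0.5, 0.3);   /* 1, 2, 3 with probabilities 0.2, 0.5, 0.3 */
    output;
  end;
run;

proc freq data=disc;
  tables t;   /* empirical percentages should be near 20, 50, 30 */
run;
```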
VIII. Sample statistic functions
Sample statistic functions treat their arguments as a sample and compute a statistic from it. They are called as FUNCTION(argument-1, argument-2, …, argument-n) or FUNCTION(OF variable-list). For example, SUM is the summation function: to sum x1, x2, and x3, write SUM(x1, x2, x3) or SUM(OF x1-x3). These functions use only the nonmissing arguments; for example, MEAN ignores missing values when averaging.
The statistical functions of each sample are:
MEAN Mean
MAX Maximum
MIN Minimum
N Number of nonmissing values
NMISS Number of missing values
SUM Sum
VAR Variance
STD Standard deviation
STDERR Standard error of the mean, computed as STD/SQRT(N)
CV Coefficient of variation (as a percentage)
RANGE Range (maximum minus minimum)
CSS Corrected sum of squares (sum of squared deviations from the mean)
USS Uncorrected sum of squares
SKEWNESS Skewness
KURTOSIS Kurtosis
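The sketch below, with hypothetical variables x1-x4 and one missing value, contrasts the + operator with SUM and shows the other statistics computed from the three nonmissing values 1, 3, and 5.

```sas
data _null_;
  x1 = 1; x2 = .; x3 = 3; x4 = 5;
  s_op   = x1 + x2 + x3 + x4;    /* missing: the + operator propagates missing values */
  s_fn   = sum(of x1-x4);        /* 9: the missing value is ignored */
  avg    = mean(x1, x2, x3, x4); /* 3 */
  n_ok   = n(of x1-x4);          /* 3 nonmissing values */
  n_miss = nmiss(of x1-x4);      /* 1 */
  sd     = std(of x1-x4);        /* 2 */
  v      = var(of x1-x4);        /* 4 */
  se     = stderr(of x1-x4);     /* 2/sqrt(3): about 1.1547 */
  rng    = range(of x1-x4);      /* 4 */
  cv_pct = cv(of x1-x4);         /* about 66.67, a percentage */
  c_ss   = css(of x1-x4);        /* 8 */
  u_ss   = uss(of x1-x4);        /* 35 */
  put s_op= s_fn= avg= n_ok= n_miss= sd= v= se= rng= cv_pct= c_ss= u_ss=;
run;
```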