SAS common functions


I. Mathematical functions

ABS(x) Returns the absolute value of x.

MAX(x1,x2,…,xn) Returns the largest of the arguments.

MIN(x1,x2,…,xn) Returns the smallest of the arguments.

MOD(x,y) Returns the remainder of x divided by y.

SQRT(x) Returns the square root of x.

ROUND(x,eps) Rounds x to the nearest multiple of eps. For example, ROUND(5654.5654, 0.01) returns 5654.57, and ROUND(5654.5654, 10) returns 5650.

CEIL(x) Returns the smallest integer greater than or equal to x. When x is an integer the result is x itself; otherwise it is the nearest integer to the right of x.

FLOOR(x) Returns the largest integer less than or equal to x. When x is an integer the result is x itself; otherwise it is the nearest integer to the left of x.

INT(x) Returns x with its decimal part discarded.

FUZZ(x) Returns the nearest integer when x is within 1E-12 of that integer; otherwise returns x unchanged.

LOG(x) Returns the natural logarithm of x.

LOG10(x) Returns the common (base 10) logarithm of x.

EXP(x) Exponential function: returns e raised to the power x.

SIN(x), COS(x), TAN(x) Return the sine, cosine and tangent of x, with x in radians.

ARSIN(y) Inverse of the function y=sin(x) on the interval [-π/2, π/2]; y takes values in [-1,1].

ARCOS(y) Inverse of the function y=cos(x) on the interval [0, π]; y takes values in [-1,1].

ATAN(y) Inverse of the function y=tan(x); y may be any real number and the result lies in the interval (-π/2, π/2).

SINH(x), COSH(x), TANH(x) Hyperbolic sine, cosine and tangent.

ERF(x) Error function.

GAMMA(x) Complete gamma function.

In addition, there are the sign function SIGN, the first derivative of the log-gamma function DIGAMMA, its second derivative TRIGAMMA, the complementary error function ERFC, the natural logarithm of the gamma function LGAMMA, the ORDINAL function, the AIRY function and its derivative DAIRY, the Bessel function JBESSEL, the modified Bessel function IBESSEL, and so on.
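The rounding and truncation functions are easy to confuse for negative arguments. The data step below is a minimal sketch with arbitrary variable names; the values in the comments follow from the definitions given above.

```sas
data _null_;
  r1 = round(5654.5654, 0.01);  /* 5654.57: nearest multiple of 0.01 */
  r2 = round(5654.5654, 10);    /* 5650: nearest multiple of 10 */
  c  = ceil(-2.3);              /* -2: smallest integer >= x */
  f  = floor(-2.3);             /* -3: largest integer <= x */
  i  = int(-2.3);               /* -2: decimal part discarded */
  m  = mod(7, 3);               /* 1: remainder of 7 / 3 */
  put r1= r2= c= f= i= m=;
run;
```

For positive x, INT and FLOOR agree; they differ only when x is negative and not an integer.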

II. Array functions

The array functions return the size and the lower and upper bounds of an array, which helps in writing portable programs. Array functions include:

DIM(x) Returns the number of elements in the first dimension of array x. (When the lower bound is 1, the number of elements equals the upper bound; otherwise the two generally differ.)

DIMk(x) Returns the number of elements in dimension k of array x.

LBOUND(x) Returns the lower bound of the first dimension of array x.

HBOUND(x) Returns the upper bound of the first dimension of array x.

LBOUNDk(x) Returns the lower bound of dimension k of array x.

HBOUNDk(x) Returns the upper bound of dimension k of array x.
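The following sketch illustrates the difference between the element count and the upper bound. The array names, bounds and variable lists are chosen only for illustration.

```sas
data _null_;
  array x{3:7} x3-x7;     /* one dimension, bounds 3 to 7 */
  array m{2,4} m1-m8;     /* two dimensions, 2 rows by 4 columns */
  n1 = dim(x);            /* 5 elements */
  lo = lbound(x);         /* 3 */
  hi = hbound(x);         /* 7 */
  n2 = dim2(m);           /* 4 elements in the second dimension */
  put n1= lo= hi= n2=;
run;
```

Looping with `do i = lbound(x) to hbound(x);` rather than `do i = 1 to dim(x);` keeps a program correct when the lower bound is not 1.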

III. Character functions

The more important character functions are:

TRIM(s) Returns string s with trailing blanks removed.

UPCASE(s) Returns string s with all lowercase letters converted to uppercase.

LOWCASE(s) Returns string s with all uppercase letters converted to lowercase.

INDEX(s,s1) Returns the position of the first occurrence of s1 in s, or 0 if s1 is not found.

RANK(s) Returns the ASCII code of the character s.

BYTE(n) Returns the character whose ASCII code is n.

REPEAT(s,n) Returns s followed by n further copies of s, that is, n+1 copies in total.

SUBSTR(s,p,n) Extracts a substring of n characters from string s, starting at position p.

TRANWRD(s,s1,s2) Returns string s with every occurrence of s1 replaced by s2.

Other character functions include COLLATE, COMPRESS, INDEXC, LEFT, LENGTH, REVERSE, RIGHT, SCAN, TRANSLATE, VERIFY, COMPBL, DEQUOTE, INDEXW, QUOTE, SOUNDEX and TRIMN.
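A few of these functions can be tried together in one data step. This is a small sketch with made-up values; the RANK and BYTE results assume an ASCII system.

```sas
data _null_;
  s    = 'Hello SAS';
  pos  = index(s, 'SAS');   /* 7: position of the first match */
  none = index(s, 'xyz');   /* 0: not found */
  up   = upcase(s);         /* HELLO SAS */
  rep  = repeat('ab', 2);   /* ababab: the original plus 2 repeats */
  code = rank('A');         /* 65 on ASCII systems */
  ch   = byte(66);          /* B on ASCII systems */
  put pos= none= up= rep= code= ch=;
run;
```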

Example 1: substr

data _null_;
  x="1234ABCD";
  y=substr(x,3,2);      /* extract 2 characters starting at position 3 */
  substr(x,1,2)="EF";   /* replace the first 2 characters of x */
  put x= / y=;
run;

Output:

x=EF34ABCD
y=34

Notes:

1. SUBSTR works on character values. If it is given a numeric variable, SAS converts the number to a character value automatically, but the conversion uses the BEST12. format, so the converted value is right-aligned in 12 characters with leading blanks.

2. n must not exceed the number of characters remaining from position p. For example, with s='scorecard' (9 characters), b=substr(s,5,6) asks for one character more than remains, and SAS writes a note to the log.

3. If n is omitted, SAS extracts all characters from position p onward. When SUBSTR is used on the left of an assignment to replace characters, it is safer to specify n explicitly.

4. For Chinese (double-byte) text, SUBSTR may split a character and produce garbled output; use KSUBSTR instead. KSUBSTRB extracts by bytes.

Example 2: tranwrd

data _null_;
  x="ABabCDEFGABCD";
  y=tranwrd(x,"AB","ef");   /* replace every "AB" with "ef" */
  put y=;
run;

Output: y=efabCDEFGefCD

Note:

String substitution with TRANWRD is case sensitive.

Example 3: compress

COMPRESS(source<, chars><, modifiers>)

source Specifies the source string from which characters are removed.

chars Specifies a list of characters which, by default, are removed from source.

modifiers Specifies one or more modifiers that change what the function does:

a Adds letters (A-Z, a-z) to the list of characters (chars).

d Adds digits to the list of characters (chars).

f Adds the underscore and letters (A-Z, a-z) to the list of characters (chars).

g Adds graphic characters to the list of characters (chars).

k Keeps the characters in the list (chars) instead of removing them.

l Adds lowercase letters (a-z).

n Adds digits, the underscore and letters (A-Z, a-z).

p Adds punctuation marks.

s Adds space characters: blank, horizontal tab, vertical tab, carriage return, line feed and form feed.

t Trims trailing blanks.

u Adds uppercase letters (A-Z).

w Adds printable characters.

x Adds hexadecimal characters.

data _null_;
  x="ABabCDEFGABCD";
  y=compress(x,"A","l");   /* remove "A" and all lowercase letters */
  put y=;
run;

Output: y=BCDEFGBCD

Notes:

1. With only source, blanks are removed.

2. With source and chars, the characters listed in chars are removed from source.

3. With source, chars and modifiers, the k modifier decides whether characters are kept or removed. Without k, the characters in chars plus those added by the modifiers are removed.
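The three cases in the notes can be sketched in one data step. The input string is invented for illustration; the second argument is left empty in the last call so that only the modifiers define the character list.

```sas
data _null_;
  x  = 'AB ab 12-34';
  y1 = compress(x);          /* only source: blanks removed, ABab12-34 */
  y2 = compress(x, '-');     /* source and chars: hyphen removed, AB ab 1234 */
  y3 = compress(x, , 'kd');  /* k and d together: keep digits only, 1234 */
  put y1= y2= y3=;
run;
```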

Example 4: cat

CAT(A,B): Concatenates strings A and B, keeping all leading and trailing blanks (same as A||B).

CATS(A,B): Concatenates strings A and B after removing the leading and trailing blanks of each (same as strip(A)||strip(B)).

CATX("x",A,B): Concatenates strings A and B after removing the leading and trailing blanks of each, and inserts the specified string "x" between them (same as strip(A)||"x"||strip(B)).

CATT(A,B): Concatenates strings A and B after removing the trailing blanks of each (same as trim(A)||trim(B)).

Example:

data _null_;
  a = ' am';          /* leading blank, needed for the results shown */
  b = ' learning';    /* leading blank, needed for the results shown */
  c = ' SAS';
  s1 = cat(a,b,c);
  s2 = cats(a,b,c);
  s3 = catx('_',a,b,c);
  s4 = catt('I',a,b,c);
  put s1= / s2= / s3= / s4=;
run;

Output:

s1=am learning SAS
s2=amlearningSAS
s3=am_learning_SAS
s4=I am learning SAS

IV. Date and time functions

Frequently used date and time functions are:

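MDY(m,d,yr) Returns the SAS date value for month m, day d of year yr.

Statements                        Results

mn=8; dy=27; yr=12;
birthday=mdy(mn,dy,yr);
put birthday mmddyy10.;           08/27/2012

mn=7; dy=11; yr=12;
anniversary=mdy(mn,dy,yr);
put anniversary date9.;           11JUL2012

A format is needed in the PUT statement; without one, the date value is printed as a plain number. The two-digit year 12 is read as 2012 under the default YEARCUTOFF setting.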

YEAR(date) Returns the year of the SAS date value date.

MONTH(date) Returns the month of the SAS date value date.

DAY(date) Returns the day of the month of the SAS date value date.

WEEKDAY(date) Returns the day of the week of the SAS date value date (1 = Sunday, …, 7 = Saturday).

QTR(date) Returns the quarter of the SAS date value date.

HMS(h,m,s) Returns the SAS time value for hour h, minute m and second s.

DHMS(d,h,m,s) Returns the SAS datetime value for SAS date value d, hour h, minute m and second s.

DATEPART(dt) Returns the date part of the SAS datetime value dt.

INTNX(interval,from,n) Returns the SAS date that lies n intervals after from. interval can be 'YEAR', 'QTR', 'MONTH', 'WEEK', 'DAY' and so on. For example, INTNX('MONTH','16Dec1997'd,3) returns 1 March 1998. Note that by default the result is always the start of an interval.

INTCK(interval,from,to) Returns the number of interval boundaries between the date from and the date to, where interval is 'MONTH' and so on. For example, INTCK('YEAR','31Dec1996'd,'1Jan1998'd) counts the year intervals between 31 December 1996 and 1 January 1998 and returns 2, although the two dates are only a little more than one year apart.
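The functions in this list can be checked against a known date. The sketch below uses 27 August 2012 (a Monday) together with the INTNX and INTCK examples from the text; all variable names are illustrative.

```sas
data _null_;
  d    = mdy(8, 27, 2012);
  y    = year(d);                  /* 2012 */
  m    = month(d);                 /* 8 */
  dd   = day(d);                   /* 27 */
  wd   = weekday(d);               /* 2: Monday, since 1 = Sunday */
  q    = qtr(d);                   /* 3 */
  dt   = dhms(d, 9, 30, 0);        /* datetime value for 09:30:00 that day */
  same = (datepart(dt) = d);       /* 1: the date part is recovered */
  nx   = intnx('month', '16dec1997'd, 3);            /* start of March 1998 */
  ck   = intck('year', '31dec1996'd, '01jan1998'd);  /* 2 boundaries crossed */
  put y= m= dd= wd= q= same= / nx= date9. ck=;
run;
```

INTCK counts crossings of interval boundaries (here 1 January 1997 and 1 January 1998), which is why the result is 2 rather than 1.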

Other date and time functions include DATE, TODAY, DATETIME, DATEJUL, JULDATE, HOUR, MINUTE, SECOND, TIME, TIMEPART, etc. For details, see the manuals "SAS System: Base SAS Software" and "SAS System: SAS/ETS Software".

V. Density and distribution functions

As a statistical computing language, SAS provides many functions related to probability distributions. Densities, probabilities and cumulative distribution functions can be called in a few unified formats:

Distribution function value = CDF('distribution', x <, parameters>);

Density = PDF('distribution', x <, parameters>);

Probability = PMF('distribution', x <, parameters>);

Log density = LOGPDF('distribution', x <, parameters>);

Log probability = LOGPMF('distribution', x <, parameters>);

CDF computes the distribution function of the distribution named by 'distribution'. PDF computes the density. PMF computes the probability for a discrete distribution. LOGPDF is the natural logarithm of PDF and LOGPMF is the natural logarithm of PMF. Each function is evaluated at the argument x, and <, parameters> stands for an optional list of distribution parameters.

The distribution name can be BERNOULLI, BETA, BINOMIAL, CAUCHY, CHISQUARED, EXPONENTIAL, F, GAMMA, GEOMETRIC, HYPERGEOMETRIC, LAPLACE, LOGISTIC, LOGNORMAL, NEGBINOMIAL, NORMAL or GAUSSIAN, PARETO, POISSON, T, UNIFORM, WALD or IGAUSS, and WEIBULL. Only the first four letters need to be written.

For example, PDF('NORMAL', 1.96) returns the standard normal density at 1.96 (0.05844), and CDF('NORMAL', 1.96) returns the standard normal distribution function at 1.96 (0.975). For a continuous distribution, PMF is the same as PDF.

In addition to the unified calls above, SAS provides separate distribution functions for commonly used distributions.

PROBNORM(x) Standard normal distribution function.

PROBT(x,df<,nc>) Distribution function of the t distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.

PROBCHI(x,df<,nc>) Distribution function of the chi-square distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.

PROBF(x,ndf,ddf<,nc>) Distribution function of the F(ndf,ddf) distribution. The optional parameter nc is the noncentrality parameter.

PROBBNML(p,n,m) If the random variable Y follows the binomial distribution B(n,p), this function returns P(Y ≤ m).

POISSON(lambda,n) Returns P(Y ≤ n) for a Poisson random variable Y with parameter lambda.

PROBNEGB(p,n,m) Returns P(Y ≤ m) for a negative binomial random variable Y with parameters (n,p).

PROBHYPR(N,K,n,x<,r>) Distribution function of the hypergeometric distribution. Suppose N items include K nonconforming items and a sample of n items is drawn; the function returns the probability that the sample contains at most x nonconforming items. The optional parameter r is the odds ratio (default 1): how many times more likely a nonconforming item is to be drawn than a conforming one.

PROBBETA(x,a,b) Distribution function of the beta distribution with parameters (a,b).

PROBGAM(x,a) Distribution function of the gamma distribution with shape parameter a.

PROBMC Computes probabilities and critical values for multiple comparisons of several means.

PROBBNRM(x,y,r) Distribution function of the standard bivariate normal distribution, where r is the correlation coefficient.
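The unified calls and the dedicated functions should agree where they overlap. The sketch below compares them for the standard normal and for a binomial B(10, 0.5); the parameter values are chosen only for illustration, and the comments give approximate values.

```sas
data _null_;
  dens = pdf('NORMAL', 1.96);          /* about 0.05844 */
  p1   = cdf('NORMAL', 1.96);          /* about 0.975 */
  p2   = probnorm(1.96);               /* same value as p1 */
  b1   = cdf('BINOMIAL', 5, 0.5, 10);  /* P(Y <= 5) for Y ~ B(10, 0.5), about 0.623 */
  b2   = probbnml(0.5, 10, 5);         /* same probability as b1 */
  put dens= 8.5 p1= 8.5 p2= 8.5 / b1= 8.5 b2= 8.5;
run;
```

Note the different argument order: CDF('BINOMIAL', m, p, n) takes the evaluation point first, while PROBBNML(p,n,m) takes it last.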

VI. Quantile functions

A quantile function is the inverse of a probability distribution function. Its argument lies between 0 and 1, and it returns the left-tail quantile of the distribution. SAS provides quantile functions for six common continuous distributions.

PROBIT(p) The left p quantile of the standard normal distribution. The result lies between -5 and 5.

TINV(p,df<,nc>) The left p quantile of the t distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.

CINV(p,df<,nc>) The left p quantile of the chi-square distribution with df degrees of freedom. The optional parameter nc is the noncentrality parameter.

FINV(p,ndf,ddf<,nc>) The left p quantile of the F(ndf,ddf) distribution. The optional parameter nc is the noncentrality parameter.

GAMINV(p,a) The left p quantile of the gamma distribution with shape parameter a.

BETAINV(p,a,b) The left p quantile of the beta distribution with parameters (a,b).
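Because a quantile function inverts the corresponding distribution function, applying one after the other should return the starting probability. The following is a brief sketch with commonly used probabilities; the values in the comments are approximate.

```sas
data _null_;
  z    = probit(0.975);     /* about 1.96 */
  back = probnorm(z);       /* 0.975 again */
  t    = tinv(0.975, 10);   /* about 2.228 */
  chi  = cinv(0.95, 1);     /* about 3.841 */
  put z= 8.5 back= 8.5 t= 8.5 chi= 8.5;
run;
```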

VII. Random number functions

SAS can be used for stochastic simulation. It provides pseudorandom number generators for the common distributions.

1. Uniform random numbers

There are two uniform random number functions: UNIFORM(seed), where seed must be the constant 0 or an odd number with 5, 6 or 7 digits, and RANUNI(seed), where seed is any constant less than 2**31-1. Repeated calls to the same random number function within one data step return different values, but the same seed yields the same sequence of random numbers in different data steps. If the seed is 0 or negative, the system clock is used as the seed.

2. Normal random numbers

There are two functions: NORMAL(seed), where seed is 0 or an odd number with 5, 6 or 7 digits, and RANNOR(seed), where seed is any numeric constant.

3. Exponential random numbers

RANEXP(seed), where seed is any numeric value, generates an exponential random number with parameter 1. An exponential random number with rate lambda can be obtained as RANEXP(seed)/lambda.

In addition, if Y = alpha - beta*LOG(RANEXP(seed)), then Y follows the extreme value distribution with location parameter alpha and scale parameter beta. If Y = FLOOR(-RANEXP(seed)/LOG(p)), then Y is a geometric random variable with parameter p.

4. Gamma random numbers

RANGAM(seed,alpha), where seed is any numeric constant and alpha > 0, generates a gamma random number with shape parameter alpha. Let X = RANGAM(seed,alpha); then Y = beta*X is a gamma random number with shape parameter alpha and scale parameter beta. If 2*alpha is an integer, then Y = 2*X is a chi-square random number with 2*alpha degrees of freedom.

If alpha is a positive integer, then Y = beta*X is an Erlang random number, that is, the sum of alpha independent exponential variables each with mean beta.

If Y1 = RANGAM(seed,alpha) and Y2 = RANGAM(seed,beta), then Y = Y1/(Y1 + Y2) is a beta random number with parameters (alpha, beta).

5. Triangular random numbers

RANTRI(seed,h), where seed is any numeric constant and 0 < h < 1. The distribution takes values in [0, 1], with density 2x/h on [0, h] and 2(1-x)/(1-h) on [h, 1].

6. Cauchy random numbers

RANCAU(seed), where seed is any numeric constant, generates a standard Cauchy random number with location parameter 0 and scale parameter 1. Y = alpha + beta*RANCAU(seed) is a general Cauchy random number with location parameter alpha and scale parameter beta.

7. Binomial random numbers

RANBIN(seed,n,p) generates a binomial random number with parameters (n,p); seed is any number.

8. Poisson random numbers

RANPOI(seed,lambda) generates a Poisson random number with parameter lambda > 0; seed is any number.

9. General discrete random numbers

RANTBL(seed,p1,…,pn) generates a discrete random number that takes the values 1, 2, …, n with probabilities p1, …, pn respectively.
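The transformations described in this section can be combined in a small simulation. This is only a sketch: the dataset name sim, the seed 12345 and the parameter values are arbitrary choices, and no output is shown because the generated values depend on the seed.

```sas
data sim;
  lambda = 2; alpha = 10; beta = 3;      /* illustrative parameters */
  do i = 1 to 5;
    u = ranuni(12345);                   /* uniform on (0,1) */
    z = rannor(12345);                   /* standard normal */
    e = ranexp(12345) / lambda;          /* exponential with rate lambda */
    c = alpha + beta * rancau(12345);    /* Cauchy, location alpha, scale beta */
    k = rantbl(12345, 0.2, 0.3, 0.5);    /* 1, 2 or 3 with these probabilities */
    output;
  end;
run;

proc print data=sim;
run;
```

Running the step again with the same positive seed should reproduce the same values, which is convenient when a simulation needs to be repeatable.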

VIII. Sample statistic functions

The sample statistic functions treat their arguments as one sample and compute a statistic of that sample. The calling format is function-name(argument 1, argument 2, …, argument n) or function-name(OF variable list). For example, SUM is the summation function; to sum x1, x2 and x3 you can use SUM(x1,x2,x3) or SUM(OF x1-x3). These functions use only the non-missing arguments, so missing values are excluded when, for example, a mean is computed.

The sample statistic functions are:

MEAN Mean

MAX Maximum

MIN Minimum

N Number of non-missing values

NMISS Number of missing values

SUM Sum

VAR Variance

STD Standard deviation

STDERR Standard error of the mean, computed as STD/SQRT(N)

CV Coefficient of variation

RANGE Range (maximum minus minimum)

CSS Corrected sum of squares (sum of squared deviations from the mean)

USS Uncorrected sum of squares

SKEWNESS Skewness

KURTOSIS Kurtosis
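The handling of missing values is easiest to see with a small sample. In the sketch below (variable names and values are illustrative), x2 is missing, so the statistics are computed from 2, 4 and 6 only.

```sas
data _null_;
  x1 = 2; x2 = .; x3 = 4; x4 = 6;
  total = sum(of x1-x4);     /* 12 */
  avg   = mean(of x1-x4);    /* 4: the missing value is excluded */
  cnt   = n(of x1-x4);       /* 3 non-missing values */
  miss  = nmiss(of x1-x4);   /* 1 missing value */
  sd    = std(of x1-x4);     /* 2 */
  put total= avg= cnt= miss= sd=;
run;
```

By contrast, the expression x1 + x2 + x3 + x4 would be missing here, because arithmetic operators propagate missing values.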
