SAS Formats and Informats have always been very convenient to use. They make it easy to convert, format or validate data.

Recently we were investigating some data validation issues. Some parts of the data validation was done using prxmatch but that was not too easy to maintain (i.e. when the same check had to be performed in multiple locations or for multiple data sources).

That’s when we discovered that you can use a regex with a SAS Informat… Something that was reported as new back in 2012.

Here’s a small example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
PROC FORMAT;
  INVALUE doors (default=50 max=200)
    "/ 2dr/i"   (REGEXP)  = 2
    "/ 4dr/i"   (REGEXP)  = 4
    "s/(.*)([\d])dr?(.*?)$/\2/i"   (REGEXPE)  = _SAME_
    "/V40\s*$/i"   (REGEXP)  = 5
    OTHER = .;
RUN;

PROC SQL;
  SELECT Make, Model, input(Model, doors.) as Doors 
  FROM sashelp.cars
  WHERE calculated doors > 4;
QUIT;

In this small example, a new informat called doors is created. It is used to parse the Model name of the cars defined in the SASHELP.CARS table to extract the number of doors. It will first look for the value 2dr for cars with 2 doors, 4dr for cars with 4 doors. Of course, this is not practical if you have many combinations. The alternative is to use the substitution to extract the part of the string that matters. The 3rd test is looking for a digit followed by the d and optionally an r, if it matches, it will keep the digit. So this 3rd version would also cover the first two tests (it is just a more generic version of the same test). Last is a sample for the exceptions… when there is no indication about the number of doors.

Once you have to informat you are free to use it anywhere… for example in a PROC SQL.

More Info ? There is some great content on the SAS Website…