Five sets of flat files are built from the CRSP Stock and Index databases. A number of our subscribers have long requested alternate formats of our databases in order to automate or streamline their processes and easily ingest CRSP data into them. These flat files are intended for these purposes.
SIZ – 1925 US Stock and Indexes
SAZ – 1925 US Stock
SXZ – 1962 US Stock and Indexes
S6Z – 1962 US Stock
SFZ - 1925 US Indexes
File Extensions:
SAS: *.sas7bdat
ASCII: *.dat
R: *.rds
Stock Files
File Names
Delist – Daily Return Information
sfz_del
Delist – Monthly Return Information
sfz_mdel
Distributions
sfz_dis
Name History
sfz_nam
NASDAQ History
sfz_ndi
Security Header Information
sfz_hdr
Shares History
sfz_shr
Time Series – Daily Primary
sfz_dp_dly
Time Series – Daily Secondary
sfz_ds_dly
Time Series - Monthly
sfz_mind
Stock and Index Files*
File Names
Index Membership
sfz_mbr
Portfolio Membership - Daily
sfz_portd
Portfolio Membership - Monthly
sfz_portm
* Available in 1925 and 1962 US Stock & Indexes, and 1925 US Indexes, but are NOT available in 1925 and 1962 US Stock.
Index Files
File Names
Index Header
sfz_indhdr
Rebalance
sfz_rb
Time Series – Daily Index
sfz_dind
Time Series – Monthly Index
sfz_mind
Comparisons between the legacy CRSPAccess Stock and Index databases and the CRSP flat files will reveal differences, outlined here:
Derived Data
The flat files provide all of the underlying data that can be used to generate derived data items, but do not include the derived items themselves. Examples of derived data that TSQuery and ts_print can generate that are not found in the flat include excess and cumulative returns, last non-missing, recent, previous period versions of time series data items.
Double Precision
CRSPAccess stored most items, including commonly used Price (PRC), Ask or High Price (ASKHI), and Bid or Low Price (BIDLO), as 4-byte floating point numbers. The SAS format defaults to storing those items as 8-byte (double precision) floating point numbers. For the majority of historical values, this does not result in any difference. However, the conversion to SAS’s double precision, makes more apparent a known issue of phantom precision related to decimal pricing. For example, IBM’s price on 11/27/2015 of $138.46 will, if displayed with five decimal places, show $138.46001. The ASCII version, by outputting only seven significant digits in scientific notation, will have just 1.384600E+02, and no phantom precision.
CRSP is re-engineering our back office to correct the phantom precision and increase the real precision throughout the Stock and Indexes flat files in the coming year.
9-Character CUSIP
Subscribers have long-requested the addition of 9-character cusips to our databases. The flat files include them in the name- and header history files.
CRSP Total Market
The investable CRSP Total Market Index is included in the index files associated with the stock and index database. Daily price-only and total returns, levels, and counts are included.
Volumes differences
The introduction of double precision allows for the monthly volumes to be stored in same unit (1 share) as daily volumes rather than as a unit of 100 shares. So monthly volumes in the flat files are now stored as one hundred times the value in legacy CRSPAccess. The change also allowed for four missing daily volumes for Citigroup in 2009 and 2010 to be replaced with actual values that were in excess of two billion, and for those monthly volumes to be recalculated.
KYPERMNO
CALDT
New VOL
MCALDT
New MVOL
70519
08/05/2009
2,674,463,281
08/31/2009
22,798,732,177
70519
12/17/2009
3,772,638,437
12/31/2009
15,021,795,593
70519
12/18/2009
2,813,697,156
70519
12/07/2010
3,267,829,406
08/31/2009
13,427,190,606
Data Category
Each data item in each table is associated with a Data Category. The Data Category identifies the type of item and the associated formats for use with SAS, ASCII, R, and SQL.
Data Category
Description
SAS Type
SAS Format
ASCII
SQL Type
R Type
DESCRIPTION
Wide Character Field Containing Text Iinformation
CHARACTER
$W.
VARCHAR(W)
TEXT
ID
Field Containing an Alphanumeric Identifier
CHARACTER
$W.
VARCHAR(W)
TEXT
FLAG
Alphanumeric Field Containing a Code Value
CHARACTER
$W.
VARCHAR(W)
TEXT
NAME
Alphanumeric Field for Names
CHARACTER
$W.
VARCHAR(W)
TEXT
DATE
Date Field
NUMERIC
YYYYMMDD10.
YYYY-MM-DD
DATE
DATE
START DATE
Start of a Date Range - Paired with an End Date
NUMERIC
YYYYMMDD10.
YYYY-MM-DD
DATE
DATE
END DATE
End of a Date Range - Paired with a Start Date
NUMERIC
YYYYMMDD10.
YYYY-MM-DD
DATE
DATE
TIMESTAMP
Date and Time, Including Seconds
NUMERIC
DATETIME
YYYY-MM-DD HH:MM:SS
DATETIME
DATETIME
AMOUNT
Fixed Point Number
NUMERIC
W.N
WW.NNNNN
DECIMAL (P,N)
FLOAT
RATIO
Calculated Floating Point Number
NUMERIC
PERCENTW.N
1.234567890123E+12
FLOAT
FLOAT
VALUE
Field with a Wide Range of Values
NUMERIC
E20.
1.234567890123E+12
FLOAT (OR DECIMAL)
FLOAT
CODE
Integer Field that Represents One or More Characteristics
NUMERIC
W.
NNNN
INT
INT
KEY
Integer Field that is Used as a Key
NUMERIC
W.
NNNNNNN
INT
INT
NUMBER
Integer Value < 2,000,000,000
NUMERIC
W.
NNNNNNNNNN
INT
INT
QUANTITY
Integer Field with Some Values in Excess of 2,000,000,000
NUMERIC
COMMAW.
NNNNNNNNNNNNN
BIGINT
FLOAT
Missing Values
In CRSPAccess, missing values were indicated by the use of defined non-null values (e.g -99 for returns and 0 for prices). In the flat files, missing values are now represented by null values.
Missing values are displayed as follows:
ASCII missing values, regardless of type, is an empty string
SAS missing values are an empty string for character fields and SAS missing (displayed as a.) for numberic fields
R missing values are an empty string for character fields, and ‘R’ missing (displayed as N/A) for float, int, date and datatime.
Five sets of flat files are built from the CRSP Stock and Index databases. A number of our subscribers have long requested alternate formats of our databases in order to automate or streamline their processes and easily ingest CRSP data into them. These flat files are intended for these purposes.
File Extensions:
* Available in 1925 and 1962 US Stock & Indexes, and 1925 US Indexes, but are NOT available in 1925 and 1962 US Stock.
Comparisons between the legacy CRSPAccess Stock and Index databases and the CRSP flat files will reveal differences, outlined here:
Derived Data
The flat files provide all of the underlying data that can be used to generate derived data items, but do not include the derived items themselves. Examples of derived data that TSQuery and ts_print can generate that are not found in the flat include excess and cumulative returns, last non-missing, recent, previous period versions of time series data items.
Double Precision
CRSPAccess stored most items, including commonly used Price (PRC), Ask or High Price (ASKHI), and Bid or Low Price (BIDLO), as 4-byte floating point numbers. The SAS format defaults to storing those items as 8-byte (double precision) floating point numbers. For the majority of historical values, this does not result in any difference. However, the conversion to SAS’s double precision, makes more apparent a known issue of phantom precision related to decimal pricing. For example, IBM’s price on 11/27/2015 of $138.46 will, if displayed with five decimal places, show $138.46001. The ASCII version, by outputting only seven significant digits in scientific notation, will have just 1.384600E+02, and no phantom precision.
CRSP is re-engineering our back office to correct the phantom precision and increase the real precision throughout the Stock and Indexes flat files in the coming year.
9-Character CUSIP
Subscribers have long-requested the addition of 9-character cusips to our databases. The flat files include them in the name- and header history files.
CRSP Total Market
The investable CRSP Total Market Index is included in the index files associated with the stock and index database. Daily price-only and total returns, levels, and counts are included.
Volumes differences
The introduction of double precision allows for the monthly volumes to be stored in same unit (1 share) as daily volumes rather than as a unit of 100 shares. So monthly volumes in the flat files are now stored as one hundred times the value in legacy CRSPAccess. The change also allowed for four missing daily volumes for Citigroup in 2009 and 2010 to be replaced with actual values that were in excess of two billion, and for those monthly volumes to be recalculated.
Data Category
Each data item in each table is associated with a Data Category. The Data Category identifies the type of item and the associated formats for use with SAS, ASCII, R, and SQL.
Missing Values
In CRSPAccess, missing values were indicated by the use of defined non-null values (e.g -99 for returns and 0 for prices). In the flat files, missing values are now represented by null values.
Missing values are displayed as follows: