Synthetic Data FAQ

This document answers frequently asked questions about synthetic data for the APIs that provide access to Medicare claims data.

Please see the Synthetic Data Guide for more information.

How do I access the synthetic data? How is the data useful to my organization? What exactly does the data contain?

See the Synthetic Data Guide.

A new set of synthetic data was released. What will happen to the other types of synthetic data? Is it being replaced?

All previously released sets of synthetic data will persist indefinitely; they are not replaced. New sets of synthetic data will instead provide additional synthetic beneficiaries. Any combination of synthetic data can be used going forward.

What are the differences between the most recent set of synthetic data and the old sets?

See the Release History section of the Synthetic Data Guide.

How often are new data sets loaded?

The current plan is to release a new set of synthetic data each quarter, adding new synthetic beneficiaries and claims every time. This lets each new set take advantage of any improvements in data generation quality (and any new fields) made since the previous release.

Are field values in the synthetic data consistent with real values and expected formats?

Yes, they are consistent with expected formats. We are also working to make them consistent with real records and values, as far as is reasonable.

Does the synthetic data contain at least one record with each possible code for all coded values?

No. The data covers only a subset of the possible codes for coded values.
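Because only some codes appear, it can help to know which codes your own test runs never exercise. The following is a minimal sketch, not part of any of the APIs: it assumes you have already collected the codes observed in the synthetic data and have your own list of all possible codes for a field. The function name and the example codes are hypothetical.

```python
from typing import Iterable, List


def uncovered_codes(observed_codes: Iterable[str], all_codes: Iterable[str]) -> List[str]:
    """Return the codes in all_codes that never appear in observed_codes, sorted."""
    return sorted(set(all_codes) - set(observed_codes))


# Hypothetical example: codes seen in a synthetic sample vs. the full code set.
observed = ["A", "B", "A"]
possible = ["A", "B", "C", "D"]
print(uncovered_codes(observed, possible))
```

Any code reported here is one that your application has not been tested against using synthetic data alone.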

Does the synthetic data contain all possible fields?

Not yet, though the coverage of fields should improve over time. As of August 2021, the synthetic data includes at most the following percentages of possible fields (any given record may contain fewer):

Field type	Maximum percentage of fields included
Beneficiary	27%
Carrier	79%
Durable Medical Equipment (DME)	84%
Home Health Agency (HHA)	63%
Hospice	66%
Inpatient	83%
Outpatient	75%
Part D Events	93%
Skilled Nursing Facility (SNF)	73%
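Since any given record may contain fewer fields than the maximums above, one way to see what a particular sample actually populates is to measure it. The following is a rough sketch under stated assumptions: each record has already been flattened into a dictionary keyed by field name, and the field names in the example are hypothetical rather than taken from the real data dictionary.

```python
from typing import Any, Dict, Iterable, Mapping


def field_coverage(
    records: Iterable[Mapping[str, Any]], expected_fields: Iterable[str]
) -> Dict[str, float]:
    """Return, for each expected field, the fraction of records that populate it."""
    expected = list(expected_fields)
    counts = {name: 0 for name in expected}
    total = 0
    for record in records:
        total += 1
        for name in expected:
            # Treat missing keys and explicit None values as "not populated".
            if record.get(name) is not None:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in expected}
    return {name: counts[name] / total for name in expected}


# Hypothetical flattened records and field names, for illustration only.
sample = [{"field_a": 1, "field_b": None}, {"field_a": 2}]
print(field_coverage(sample, ["field_a", "field_b", "field_c"]))
```

A field with a coverage of 0.0 is one that code paths in your application will not see when run against that sample.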

Do the data types for fields in the synthetic data always match the data types in the production data (string, bool, integer)?

Yes, they should.
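If you want to confirm this for the fields you depend on, a simple check can flag values whose type differs from what you expect. This is only a sketch: it assumes records are parsed into dictionaries of plain Python values, and the field names and expected types shown are hypothetical.

```python
from typing import Any, List, Mapping


def type_mismatches(record: Mapping[str, Any], expected_types: Mapping[str, type]) -> List[str]:
    """Return the names of populated fields whose value is not exactly the expected type."""
    mismatched = []
    for name, expected in expected_types.items():
        value = record.get(name)
        if value is None:
            # Absent fields are a coverage question, not a type question.
            continue
        # bool is a subclass of int in Python, so compare exact types
        # rather than using isinstance().
        if type(value) is not expected:
            mismatched.append(name)
    return mismatched


# Hypothetical record and expectations, for illustration only.
record = {"id": "1", "count": True, "flag": False, "amount": 3}
expected = {"id": str, "count": int, "flag": bool, "amount": int, "note": str}
print(type_mismatches(record, expected))
```

The exact-type comparison is a deliberate choice here so that a boolean appearing where an integer is expected is reported rather than silently accepted.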

Does the size distribution of each EOB in synthetic data generally match the size of production EOBs?

Not necessarily. Improvements are continually being made so that new data more closely resembles production data in various ways, including the distribution of claim types in EOBs.

Where can I ask questions not answered here?

You can join the Google Groups for any APIs you access and ask there:

Blue Button 2.0 (BB2.0): https://groups.google.com/g/developer-group-for-cms-blue-button-api
Beneficiary Claims Data API (BCDA): https://groups.google.com/forum/#!forum/bc-api
Data at the Point of Care (DPC): https://groups.google.com/forum/#!forum/dpc-api
Medicare Claims Data to Part D Sponsors (AB2D): https://groups.google.com/g/ab2d-api
