fix: πŸ› Floats ending in .0 now being parsed as float type #874

Open · wants to merge 1 commit into main
Conversation

ncarvajalc
Contributor

Float numbers ending in .0 were being parsed as integers in an OpenSearch instance when using the @searchable directive. If this data was the first uploaded to the instance, search queries wouldn't work as expected. The change creates a JSON file that is uploaded to the Lambda function alongside the Python streaming function. The JSON contains the schema data types of all @searchable model fields. A modified version of the Python streaming function parses the DynamoDB event fields into the correct data types described in the schema.

βœ… Closes: #866, #371

Description of changes

Changes
  1. graphql-searchable-transformer.ts
    This file had the following change:

Original file

// creates streaming lambda
const lambda = createLambda(
  stack,
  context.api,
  parameterMap,
  lambdaRole,
  domain.domainEndpoint,
  isProjectUsingDataStore,
  region,
);

New file

// creates streaming lambda
const lambda = createLambda(
  stack,
  context.api,
  parameterMap,
  lambdaRole,
  domain.domainEndpoint,
  isProjectUsingDataStore,
  region,
  this.searchableObjectTypeDefinitions, // Added parameter
);

Notice that this.searchableObjectTypeDefinitions was added as a parameter. This was needed because searchableObjectTypeDefinitions holds the schema the user wants to create, with each model's corresponding attributes and data types.

  2. create-streaming-lambda.ts
    The createLambda function resides in this file; it had to be changed to create a JSON file with the following structure:
{
	"model1name": {
		"attr1": "dataType1",
		"attr2": "dataType2",
		...
		"attrN": "dataTypeN"
	},
	"model2name": {
		"attr1": "dataType1",
		"attr2": "dataType2",
		...
		"attrN": "dataTypeN"
	},
	...
	"modelNname": {
		"attr1": "dataType1",
		"attr2": "dataType2",
		...
		"attrN": "dataTypeN"
	}
}

This is later used to map the schema data types to the corresponding Python data types, which solves the bug described in the issue.
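As a hypothetical illustration, a schema with a single @searchable Todo model could produce a file like the following (the model and field names here are made up, not taken from the PR):

{
	"todo": {
		"id": "ID",
		"title": "String",
		"rating": "Float",
		"completed": "Boolean"
	}
}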

To create it, the searchableObjectTypeDefinitions was added as a parameter:

export const createLambda = (
  stack: Stack,
  apiGraphql: GraphQLAPIProvider,
  parameterMap: Map<string, CfnParameter>,
  lambdaRole: IRole,
  endpoint: string,
  isProjectUsingDataStore: boolean,
  region: string,
  searchableObjectTypeDefinitions: { node: ObjectTypeDefinitionNode; fieldName: string }[],
): IFunction

Inside that function, just before the return statement, a call to a new helper function was added:

generateSchemaDataTypes(searchableObjectTypeDefinitions);

return apiGraphql.host.addLambdaFunction(
    ...

The helper function looks as follows:

const generateSchemaDataTypes = (
  searchableObjectTypeDefinitions: { node: ObjectTypeDefinitionNode; fieldName: string }[],
): void => {
  const schemaDataTypes: SchemaDataTypes = {};
  for (const def of searchableObjectTypeDefinitions) {
    const modelName = def.node.name.value.toLowerCase();

    const attributeTypes: AttributeTypes = {};
    def.node.fields?.forEach((f) => {
      attributeTypes[f.name.value] = findNamedType(f.type);
    });
    schemaDataTypes[modelName] = attributeTypes;
  }

  // Paths to export JSON file and lambda function script
  const libPath = path.join(__dirname, '..', '..', 'lib');
  const schemaPath = path.join(libPath, DATA_TYPE_SCHEMA_FILENAME);
  const streamingFunctionPath = path.join(libPath, STREAMING_FUNCTION_FILENAME);
  fs.writeFileSync(schemaPath, JSON.stringify(schemaDataTypes));

  // Zip the file
  const zip = new AdmZip();
  zip.addLocalFile(schemaPath);
  zip.addLocalFile(streamingFunctionPath);
  zip.writeZip(path.join(libPath, STREAMING_LAMBDA_ZIP_FILENAME));
};
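The SchemaDataTypes and AttributeTypes types used above are not shown in this excerpt; a minimal sketch of what they presumably look like:

// Assumed shapes, inferred from how the objects are populated above
type AttributeTypes = { [attributeName: string]: string }; // attribute name -> GraphQL type name
type SchemaDataTypes = { [modelName: string]: AttributeTypes }; // model name -> its attribute map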

Every model first needs to be analyzed to find its name and its attributes with their data types. Then, for each field in the model, the data type must be extracted. To do that, a helper function findNamedType was created. The function looks like this:

const findNamedType = (typeNode: TypeNode): string => {
  switch (typeNode.kind) {
    case 'NamedType':
      return typeNode.name.value;
    case 'ListType':
    case 'NonNullType':
      return findNamedType(typeNode.type);
    default:
      throw new Error(`Unknown type ${typeNode}`);
  }
};
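As a hypothetical illustration (the variable name below is made up), a field declared as [Float!]! yields a NonNullType β†’ ListType β†’ NonNullType β†’ NamedType chain, which findNamedType unwraps recursively:

import { TypeNode } from 'graphql';

// AST for a field declared as [Float!]!, outermost wrapper first
const ratingsType: TypeNode = {
  kind: 'NonNullType',
  type: {
    kind: 'ListType',
    type: {
      kind: 'NonNullType',
      type: { kind: 'NamedType', name: { kind: 'Name', value: 'Float' } },
    },
  },
};

findNamedType(ratingsType); // => 'Float'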

Recursion was needed because the TypeNode type permits nested TypeNodes, according to its definition:

// ast.d.ts

// Name

export interface NameNode {
  readonly kind: 'Name';
  readonly loc?: Location;
  readonly value: string;
}

...

// Type Reference

export type TypeNode = NamedTypeNode | ListTypeNode | NonNullTypeNode;

export interface NamedTypeNode {
  readonly kind: 'NamedType';
  readonly loc?: Location;
  readonly name: NameNode;
}

export interface ListTypeNode {
  readonly kind: 'ListType';
  readonly loc?: Location;
  readonly type: TypeNode;
}

export interface NonNullTypeNode {
  readonly kind: 'NonNullType';
  readonly loc?: Location;
  readonly type: NamedTypeNode | ListTypeNode;
}

Notice that the nesting always ends in a NamedTypeNode: unlike ListTypeNode and NonNullTypeNode, it contains no nested TypeNode. When the NamedTypeNode is reached, the function extracts the attribute's data type and returns it.

After every field and its data type have been extracted from a model, they are recorded under the model's name, and this is repeated for every model. A JSON file is then generated and written to the lib folder. Finally, it is zipped together with the python_streaming_function.py file so it can be uploaded to the Lambda function.

  3. package.json
    The package.json needed to be changed because python_streaming_function.py was previously zipped at yarn build time; now it is simply copied into lib, and the zipping is done by generateSchemaDataTypes.

Original file

"scripts": {
    "build": "tsc && cd streaming-lambda && bestzip --force node ../lib/streaming-lambda.zip python_streaming_function.py",
    "watch": "tsc -w",
    "clean": "rimraf ./lib",
    "test": "jest"
  },

New file

"scripts": {
    "build": "tsc && cd streaming-lambda && cp python_streaming_function.py ../lib",
    "watch": "tsc -w",
    "clean": "rimraf ./lib",
    "test": "jest"
  },

The adm-zip dependency was also added.
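A sketch of the corresponding package.json entries; the version ranges shown are placeholders, not taken from the PR:

"dependencies": {
    "adm-zip": "^0.5.0"
},
"devDependencies": {
    "@types/adm-zip": "^0.4.0"
}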

  4. python_streaming_function.py
    This file had to be changed to read the JSON file shipped with the Lambda and map values to the corresponding GraphQL types found in the schema, which solves the issue of floats ending in .0 being parsed as int.
    These new functions were added:
# Custom mapper to match Python types to GraphQL Schema types
def map_to_gql_types(fields):
    mapped_fields = {}
    # Get GraphQL schema types
    schema_types = getGQLSchema()
    for field in fields:
        v = fields[field]
        data_type = schema_types[field] if field != '__typename' else None
        # DynamoDB hands 2.0 over as the int 2; restore the Float declared in the schema
        if data_type == 'Float' and isinstance(v, int):
            mapped_fields[field] = float(v)
        else:
            mapped_fields[field] = v
    return mapped_fields

# Gets schema from json file
def getGQLSchema():
    with open('schema_datatypes.json') as f:
        schema = json.load(f)
    return schema

The map_to_gql_types function can be extended to solve other issues, such as when an AWSDate with a timezone is used under the @searchable directive. For now it only fixes the issue of Float values being parsed as int. The function takes the deserialized DynamoDB event fields and compares them with the field data types in the JSON file. If it finds a mismatch between data types, it converts the value to the correct type found in the schema.

Issue #, if available

This PR closes issues #866 and #371

Description of how you validated changes

Built a sample app without the changes and the same app with the changes, and tested both locally and in the cloud. Before the changes, whenever a record with a Float value ending in .0 was sent, the index created in the OpenSearch instance would get the data type long, and when using the searchable queries, both through the app and directly through AppSync, the {eq: number} filter wouldn't work as expected. After the changes, the index created in the OpenSearch instance gets the data type float and the searchable queries work as expected.

This was tested using different schemas with different data types and making use of relationships and other models with and without the @searchable directive.

All previous tests run successfully, but no new tests were created.

Checklist

  • PR description included
  • yarn test passes
  • Tests are changed or added
  • Relevant documentation is changed or added (and PR referenced)
  • New AWS SDK calls or CloudFormation actions have been added to relevant test and service IAM policies

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
