fix(spanner_dbapi): replace insecure pickle with json for partition deserialization #17014
sinhasubham wants to merge 1 commit into
Code Review
This pull request replaces the use of pickle with json for serializing and deserializing partition IDs to mitigate security risks associated with insecure deserialization. It introduces _serialize_value and _deserialize_value helper functions to handle specific types like bytes, datetime, and protobuf messages. Review feedback points out that MessageToDict defaults to camelCase, which could break compatibility with code expecting snake_case, and suggests using preserving_proto_field_name=True. Additionally, the reviewer noted that protobuf messages are currently deserialized as dictionaries rather than original message objects, which may lead to issues with nested field types.
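The tagged-JSON approach described in the summary can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the PR's actual code: the function names `serialize_value` and `deserialize_value`, the `"__type__"` tag layout for bytes and datetime, and the sample `partition` dictionary are all assumptions made for the example.

```python
import base64
import json
from datetime import datetime, timezone


def serialize_value(val):
    """Convert values that json cannot encode into tagged dictionaries (hypothetical layout)."""
    if isinstance(val, bytes):
        return {"__type__": "bytes", "value": base64.b64encode(val).decode("ascii")}
    if isinstance(val, datetime):
        return {"__type__": "datetime", "value": val.isoformat()}
    if isinstance(val, (list, tuple)):
        # Note: tuples come back as lists after a JSON round trip.
        return [serialize_value(v) for v in val]
    if isinstance(val, dict):
        return {k: serialize_value(v) for k, v in val.items()}
    return val


def deserialize_value(val):
    """Reverse serialize_value; only known tags are interpreted, everything else is plain data."""
    if isinstance(val, list):
        return [deserialize_value(v) for v in val]
    if isinstance(val, dict):
        tag = val.get("__type__")
        if tag == "bytes":
            return base64.b64decode(val["value"])
        if tag == "datetime":
            return datetime.fromisoformat(val["value"])
        return {k: deserialize_value(v) for k, v in val.items()}
    return val


# Hypothetical partition payload used only to exercise the round trip.
partition = {
    "token": b"\x00\xff",
    "read_time": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "params": [1, "a"],
}
encoded = json.dumps(serialize_value(partition))
decoded = deserialize_value(json.loads(encoded))
```

Unlike pickle, decoding here can only ever produce the handful of types the deserializer explicitly knows about.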
    return {
        "__type__": "protobuf",
        "class": val.__class__.__name__,
        "value": MessageToDict(val._pb),
MessageToDict converts Protobuf field names to camelCase by default. If the existing code expects snake_case (which is standard for Python Protobuf objects and would have been preserved by the previous pickle implementation), this will cause KeyError or AttributeError in downstream code. Using preserving_proto_field_name=True ensures compatibility with Pythonic naming conventions.
    -        "value": MessageToDict(val._pb),
    +        "value": MessageToDict(val._pb, preserving_proto_field_name=True),
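The naming difference the reviewer describes can be observed with any message that has a snake_case field. The sketch below uses `FieldDescriptorProto`, which ships with the protobuf package, purely as a stand-in for a Spanner message. It assumes the `protobuf` package is installed, and the field values are made up.

```python
from google.protobuf.descriptor_pb2 import FieldDescriptorProto
from google.protobuf.json_format import MessageToDict

# Stand-in message with a snake_case field (type_name); not a Spanner type.
msg = FieldDescriptorProto(name="id", type_name="example.Key")

# Default behaviour: field names are converted to lowerCamelCase.
default_keys = MessageToDict(msg)

# With the flag from the suggestion: original proto field names are kept.
preserved_keys = MessageToDict(msg, preserving_proto_field_name=True)

print(sorted(default_keys))
print(sorted(preserved_keys))
```

Code that later looks up `type_name` in the resulting dictionary would only find it in the second form.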
    return {
        "__type__": "protobuf",
        "class": val.__class__.__name__,
        "value": MessageToDict(val),
MessageToDict converts Protobuf field names to camelCase by default. If the existing code expects snake_case (which is standard for Python Protobuf objects and would have been preserved by the previous pickle implementation), this will cause KeyError or AttributeError in downstream code. Using preserving_proto_field_name=True ensures compatibility with Pythonic naming conventions.
    -        "value": MessageToDict(val),
    +        "value": MessageToDict(val, preserving_proto_field_name=True),
    elif t == "protobuf":
        return _deserialize_value(val["value"])
The class information stored during serialization is currently ignored, and Protobuf messages are deserialized as dictionaries. This is a breaking change from the previous pickle-based implementation which restored the original message objects. Furthermore, nested bytes or Timestamp fields within these messages will remain as strings (base64 or ISO format) because MessageToDict performs these conversions and they are not automatically reversed by the current _deserialize_value logic. Consider using google.protobuf.json_format.ParseDict if the original message types must be restored.
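One way the ParseDict suggestion might look is sketched below. The allowlist `ALLOWED_MESSAGES` and the helpers `serialize_message` and `deserialize_message` are hypothetical names, and `UninterpretedOption` (bundled with protobuf, and containing a bytes field) merely stands in for a real Spanner message. The sketch assumes the `protobuf` package is installed and that the stored class name is resolved through an explicit allowlist rather than a dynamic import.

```python
import json

from google.protobuf.descriptor_pb2 import UninterpretedOption
from google.protobuf.json_format import MessageToDict, ParseDict

# Hypothetical allowlist: only message classes named here may be rebuilt.
ALLOWED_MESSAGES = {"UninterpretedOption": UninterpretedOption}


def serialize_message(msg):
    """Tag a protobuf message so its class can be looked up again on load."""
    return {
        "__type__": "protobuf",
        "class": type(msg).__name__,
        "value": MessageToDict(msg, preserving_proto_field_name=True),
    }


def deserialize_message(payload):
    """Rebuild the original message type; reject class names outside the allowlist."""
    cls = ALLOWED_MESSAGES.get(payload["class"])
    if cls is None:
        raise ValueError(f"unexpected message class: {payload['class']!r}")
    # ParseDict reverses the conversions MessageToDict applied (e.g. base64 -> bytes).
    return ParseDict(payload["value"], cls())


original = UninterpretedOption(identifier_value="partition", string_value=b"\x00\x01")
payload = json.loads(json.dumps(serialize_message(original)))
restored = deserialize_message(payload)
```

In the intermediate `payload`, the bytes field is a base64 string; after `ParseDict` it is bytes again, which is the gap the review comment points at.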
This PR resolves a critical Insecure Deserialization vulnerability (potential Remote Code Execution) in the spanner_dbapi module [b/510871112]. Previously, the module utilized pickle.loads() to decode partition IDs provided by users via the RUN PARTITION statement, creating a direct vector for arbitrary code execution attack payloads.

We have fully eliminated pickle usage in this module and migrated to standard json serialization.
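To illustrate why unpickling user-supplied data is dangerous, here is a deliberately harmless sketch. The `Payload` class is hypothetical, and the callable it smuggles in is just the builtin `len`; a real attack would name something far less benign. The point is only that the pickle stream, not the receiving code, chooses what gets called during `pickle.loads()`.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker-controlled object."""

    def __reduce__(self):
        # pickle stores "call len('abc')" instead of the object's state.
        return (len, ("abc",))


blob = pickle.dumps(Payload())

# Loading runs the callable named in the stream and returns its result,
# so the caller gets 3 back rather than a Payload instance.
result = pickle.loads(blob)

# json.loads never calls anything named by the input; the same bytes are
# simply rejected as invalid JSON.
try:
    json.loads(blob)
    json_rejected = False
except ValueError:
    json_rejected = True
```

With JSON, the worst a malformed partition ID can do at decode time is raise an error or yield unexpected plain data, which the caller can then validate.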