This post is a bit unusual and is a story about debugging. We explore how the Go YAML library (gopkg.in/yaml.v3) works internally by trying to figure out why unmarshaling and marshaling back a CloudFormation (CFN) template in Go doesn’t work by default.

func main() {
	var cf interface{}
	yaml.Unmarshal([]byte(`
myMadeUpResource:
  Properties:
    splitTest: !Split [ "|" , "a||c|" ]
`), &cf)
	out, _ := yaml.Marshal(&cf)
	fmt.Println(string(out))
	// Prints:
	// myMadeUpResource:
	//   Properties:
	//       splitTest:
	//         - '|'
	//         - a||c|
}

If you have a keen eye, you might notice that the intrinsic CFN function, !Split, is dropped from the output!

The library parses the YAML document into a tree of yaml.Node objects. Nodes have a few interesting fields:

  • The Kind represents the high-level type of a node. For example, a scalar, a map, or a sequence.
  • The Tag is additional information about the data type of the node that’s mostly useful when the Kind is a scalar. For example, !!str, !!int, !!seq, or !!map.
  • The Value is only populated if a node is a scalar otherwise its empty.
  • Finally, Content holds the child nodes and is only populated if the Kind is a map or sequence.

To visualize this information, let’s see how splitTest is rendered in the tree:

(Kind=mapping, Tag=!!map, Value="")
├── (Kind=scalar,   Tag=!!str,  Value="splitTest", Content=nil)
└── (Kind=sequence, Tag=!Split, Value="") # 2 nodes in Content below.
    ├── (Kind=scalar, Tag="!!str", Value="|",     Content=nil)
    └── (Kind=scalar, Tag="!!str", Value="a||c|", Content=nil) 

From the tree, we have two observations:

  1. Named sequences and maps are represented as a pair of nodes. The first node is the name of the sequence and a scalar, splitTest, and the second node holds the elements of the sequence.
  2. !Split shows up under the node’s Tag field. Normally, nodes with a “sequence” Kind have the !!seq tag. However, “!” is a special character, and thanks to it !Split gets parsed as local tag [1] instead.

Unfortunately, since we marshaled the template into an empty interface we lose all this rich information.
From decode.go#L722-725, the sequence ["|", "a||c|"] gets decoded to a []interface{}.

case reflect.Interface:
	// No type hints. Will have to use a generic sequence.
	iface = out
	out = settableValueOf(make([]interface{}, l))

And then the slice gets assigned to a string map map[string]interface{}, where the key is the string "splitTest" and value is the slice decode.go#L801-820.

if d.unmarshal(n.Content[i+1], e) {
	// k is "splitTest" and is unmarshaled from n.Content[i]
	// e is the []interface{} described above
	out.SetMapIndex(k, e) 
}

This is why when we marshal back cf into a YAML document, the intrinsic function is ignored! The map lost the local tag information about !Split. Mystery solved □.

So what now?

Luckily, gopkg.in/yaml.v3 is pretty awesome and we can unmarshal a YAML document directly to a yaml.Node.

func main() {
	var cf yaml.Node
	yaml.Unmarshal([]byte(`
myMadeUpResource:
  Properties:
    splitTest: !Split [ "|" , "a||c|" ]
`), &cf)
	out, _ := yaml.Marshal(&cf)
	fmt.Println(string(out))
	// Prints:
	// myMadeUpResource:
	//   Properties:
	//     splitTest: !Split ["|", "a||c|"]
}

This marshals the output as we want 🙌! This is because when we unmarshal to a yaml.Node no information is lost unlike the empty interface. Afterwards, when it marshals a sequence node, if the Tag is not the default !!seq (encode.go#L451-L456), it writes it to the output (encode.go#L477):

...
case SequenceNode:
	// rtag is set to "!!seq"
	rtag = seqTag 
}
if rtag == stag { 
	// stag is "!Split" and not equal to "!!set" 
	// so we keep the tag as "!Split" instead.
	tag = ""
}
...
// tag is not empty, so write it.
e.must(yaml_sequence_start_event_initialize(&e.event, []byte(node.Anchor), []byte(tag), tag == "", style))

Takeaways

While trying to figure out what happens when you (un)marshal a CFN template, I found reading the code and visualizing the data helped me form hypotheses and the debugger test them.

  1. Read the code.
    Glancing at yaml.Unmarshal, I learned about the yaml.Node data structure and the d.unmarshal method that holds the main decoding logic.
  2. Visualize the data.
    After printing the AST, I observed that named sequences and maps seemed to be rendered as a pair of nodes. So I searched in the codebase for "+1", that led me to confirm my hypothesis and understand how maps are unmarshaled.
  3. Use the debugger.
    The debugger is extremely helpful to figure out which code path is taken during execution and what the variables in the scope hold.

[1] https://camel.readthedocs.io/en/latest/yamlref.html#tags