Optimize Python C extension readStruct for nested structs

This code path already has an optimization for the top-level struct: instead of calling `klass(**kwargs)`, we start from an empty instance and set each decoded field on it with `PyObject_SetAttr`.

This was done because Python's keyword argument matching costs O(n) comparisons per argument, so building a struct with n fields through `klass(**kwargs)` costs O(n²), which is expensive for structs with many fields.
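
A pure-Python sketch of the two construction strategies, assuming a generated-style class whose `__init__` takes every field as a keyword with a `None` default (`Point`, `FIELD_NAMES`, `build_slow` and `build_fast` are hypothetical names for illustration). It only shows that both paths produce the same object; the speedup in this PR comes from doing the `build_fast` steps from C, where there is no interpreter loop overhead to hide the saved keyword matching.

```python
class Point:
    """Hypothetical stand-in for a generated Thrift struct."""

    def __init__(self, x=None, y=None, z=None):
        self.x = x
        self.y = y
        self.z = z


FIELD_NAMES = ("x", "y", "z")


def build_slow(klass, values):
    # Slow path: collect decoded fields into a dict, then let the interpreter
    # match every keyword against __init__'s parameter names.
    kwargs = dict(zip(FIELD_NAMES, values))
    return klass(**kwargs)


def build_fast(klass, values):
    # Fast path: construct with no arguments (every field defaults to None),
    # then assign each decoded field directly.
    obj = klass()
    for name, value in zip(FIELD_NAMES, values):
        setattr(obj, name, value)
    return obj


slow = build_slow(Point, (1, 2, 3))
fast = build_fast(Point, (1, 2, 3))
print(vars(slow) == vars(fast))  # True
```

The fast path still runs `__init__` once, but with no keyword matching; each field present in the payload is then written a second time by `setattr`, and absent fields keep their `None` default.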

However, nested structs decoded inside the C extension did not get this optimization and still went through the slow `klass(**kwargs)` path.

This change applies the same fast path to mutable nested structs: we create the instance up front with a no-argument constructor (`klass()`) and set each decoded field directly with `PyObject_SetAttr`, exactly as the top-level struct is handled.
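
A Python sketch of the recursive version, assuming a simplified, hypothetical spec format (a dict mapping each field name to `None` for a scalar, or to `(klass, subspec)` for a nested struct) and a dict-shaped payload standing in for the wire data. The real `thrift_spec` tuples and binary decoding are not modeled; the point is only that every level, not just the top one, is built with `klass()` plus `setattr`.

```python
class Inner:
    """Hypothetical nested struct."""

    def __init__(self, value=None):
        self.value = value


class Outer:
    """Hypothetical top-level struct holding an Inner."""

    def __init__(self, name=None, inner=None):
        self.name = name
        self.inner = inner


# Hypothetical simplified spec: None marks a scalar field,
# (klass, subspec) marks a nested struct field.
INNER_SPEC = {"value": None}
OUTER_SPEC = {"name": None, "inner": (Inner, INNER_SPEC)}


def read_struct(klass, spec, data):
    """Decode a dict-shaped payload into klass, recursing into nested structs."""
    obj = klass()  # no-arg constructor at every nesting level
    for field_name, field_type in spec.items():
        if field_name not in data:
            continue  # leave the constructor's default in place
        raw = data[field_name]
        if field_type is None:
            value = raw
        else:
            sub_klass, sub_spec = field_type
            value = read_struct(sub_klass, sub_spec, raw)
        setattr(obj, field_name, value)
    return obj


outer = read_struct(Outer, OUTER_SPEC, {"name": "a", "inner": {"value": 42}})
print(outer.name, type(outer.inner).__name__, outer.inner.value)  # a Inner 42
```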

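Immutable structs continue to use the kwargs path, since their generated `__setattr__` blocks attribute mutation.

A struct is treated as immutable if it is a `TFrozenBase` subclass (checked with `PyObject_IsSubclass`, with the class reference cached in a function-local static) or if its type's `tp_setattro` is not `PyObject_GenericSetAttr`, which catches classes that define their own `__setattr__`, such as those generated with the `python.immutable` annotation.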
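
A rough Python-level analogue of the immutability check and the resulting dispatch. `TFrozenBase` here is a local stand-in, not an import of `thrift.protocol.TBase.TFrozenBase`, and the three struct classes are hypothetical; the immutable ones assume an `__init__` that bypasses the raising `__setattr__` via `object.__setattr__`, which is why the kwargs path still works for them. Comparing `klass.__setattr__` against `object.__setattr__` approximates the C-level `tp_setattro != PyObject_GenericSetAttr` test for ordinary Python classes.

```python
class TFrozenBase:
    """Local stand-in for thrift.protocol.TBase.TFrozenBase (assumption)."""


class FrozenPoint(TFrozenBase):
    """Frozen-style struct: only __init__ may set fields."""

    def __init__(self, x=None):
        object.__setattr__(self, "x", x)

    def __setattr__(self, name, value):
        raise TypeError("can't modify immutable instance")


class AnnotatedImmutable:
    """Not a TFrozenBase subclass, but __setattr__ still raises."""

    def __init__(self, x=None):
        object.__setattr__(self, "x", x)

    def __setattr__(self, name, value):
        raise TypeError("can't modify immutable instance")


class Mutable:
    def __init__(self, x=None):
        self.x = x


def is_immutable(klass):
    # Mirrors: PyObject_IsSubclass(klass, TFrozenBase)
    #          || tp_setattro != PyObject_GenericSetAttr
    return issubclass(klass, TFrozenBase) or klass.__setattr__ is not object.__setattr__


def build(klass, fields):
    if is_immutable(klass):
        return klass(**fields)  # kwargs path: the constructor sets the fields
    obj = klass()  # fast path: no-arg constructor, then setattr per field
    for name, value in fields.items():
        setattr(obj, name, value)
    return obj


for klass in (FrozenPoint, AnnotatedImmutable, Mutable):
    print(klass.__name__, is_immutable(klass), build(klass, {"x": 1}).x)
# FrozenPoint True 1
# AnnotatedImmutable True 1
# Mutable False 1
```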

I had Claude generate benchmarks covering several nesting scenarios. They show up to a 3.6x speedup for deeply nested struct hierarchies and no impact on flat structs.

Note the test change. I asked about changing the test to look more like what the codegen produces (https://github.com/apache/thrift/pull/3349/changes#r3026288637). Currently, this fails because it's not marked as frozen but has an erroring `__setattr__`.
diff --git a/lib/py/src/ext/protocol.tcc b/lib/py/src/ext/protocol.tcc
index b517c38..123ca69 100644
--- a/lib/py/src/ext/protocol.tcc
+++ b/lib/py/src/ext/protocol.tcc
@@ -874,17 +874,46 @@
 template <typename Impl>
 PyObject* ProtocolBase<Impl>::readStruct(PyObject* output, PyObject* klass, PyObject* spec_seq) {
   int spec_seq_len = PyTuple_Size(spec_seq);
-  bool immutable = output == Py_None;
+  bool immutable = false;
   ScopedPyObject kwargs;
+  ScopedPyObject created_output;
   if (spec_seq_len == -1) {
     return nullptr;
   }
 
-  if (immutable) {
-    kwargs.reset(PyDict_New());
-    if (!kwargs) {
-      PyErr_SetString(PyExc_TypeError, "failed to prepare kwargument storage");
-      return nullptr;
+  if (output == Py_None) {
+    static PyObject* TBaseModule = nullptr;
+    static PyObject* TFrozenBase = nullptr;
+    if (!TFrozenBase) {
+      if (!TBaseModule) {
+        TBaseModule = PyImport_ImportModule("thrift.protocol.TBase");
+      }
+      if (!TBaseModule) {
+        return nullptr;
+      }
+      TFrozenBase = PyObject_GetAttrString(TBaseModule, "TFrozenBase");
+      if (!TFrozenBase) {
+        return nullptr;
+      }
+    }
+    // Immutable structs are produced by two codegen paths:
+    //   1. "frozen2" mode: classes inherit from TFrozenBase
+    //   2. "python.immutable" annotation: classes get a __setattr__ that raises TypeError
+    int is_frozen = PyObject_IsSubclass(klass, TFrozenBase);  // -1 on error, exception set
+    if (is_frozen < 0) return nullptr;
+    immutable = is_frozen || reinterpret_cast<PyTypeObject*>(klass)->tp_setattro != PyObject_GenericSetAttr;
+    if (immutable) {
+      kwargs.reset(PyDict_New());
+      if (!kwargs) {
+        PyErr_SetString(PyExc_TypeError, "failed to prepare kwargument storage");
+        return nullptr;
+      }
+    } else {
+      created_output.reset(PyObject_CallObject(klass, nullptr));
+      if (!created_output) {
+        return nullptr;
+      }
+      output = created_output.get();
     }
   }