Wim Onderbeke

Adding metadata to Pydantic fields

Recently at Qargo, we implemented functionality to convert Pydantic models to XML. This led us to xsdata and its extension xsdata-pydantic. While exploring the xsdata-pydantic source code, I discovered an elegant pattern for storing additional metadata about XML structure (such as whether a field is an element or an attribute) directly on Pydantic fields.

This pattern has applications beyond XML serialization: you might use it for database column metadata, API documentation, custom validation rules, or any scenario where you need to attach extra information to your model fields. In this post, I'll share two approaches for implementing it: extending FieldInfo and using Python's type annotations. The code examples assume Pydantic 2.10 and Python 3.10 or later (they use the X | None union syntax).

Approach 1: Extending FieldInfo

The first approach, used by xsdata-pydantic, involves creating a custom FieldInfo class. This follows traditional object-oriented design by extending Pydantic’s FieldInfo with your custom attributes. Here’s how it works:

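from typing import Any
from pydantic import fields

class XMLFieldInfo(fields.FieldInfo):
    # Extra attribute carrying the XML-specific information.
    xml_metadata: dict[str, Any] | None

    # FieldInfo declares __slots__, so the new attribute needs a slot of its own.
    __slots__ = ("xml_metadata",)

    def __init__(self, metadata: dict[str, Any] | None = None, **kwargs: Any):
        # Standard Field arguments (alias, default, ...) are handled by FieldInfo.
        super().__init__(**kwargs)
        self.xml_metadata = metadata
        # Also expose the metadata in the generated JSON schema.
        # Note: this replaces any json_schema_extra passed in via kwargs.
        self.json_schema_extra = {
            "xml_metadata": metadata,
        }

def XMLField(
    metadata: dict[str, Any] | None = None,
    **kwargs: Any,
) -> Any:
    # Returning Any lets "first_name: str = XMLField(...)" type-check,
    # the same trick Pydantic's own Field() uses.
    return XMLFieldInfo(
        metadata=metadata,
        **kwargs,
    )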

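Notice the use of __slots__. Pydantic's FieldInfo already declares __slots__ to stay memory-efficient, so the subclass declares its own slot for the new attribute; without it, Python would quietly add a __dict__ to every field instance. The __init__ also copies the metadata into json_schema_extra, so it shows up in the model's generated JSON schema. The separate XMLField function serves as a factory, mirroring Pydantic's own Field() function: its Any return type keeps the type checker happy when the result is assigned to a field annotated as str.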
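
To see why the subclass needs its own __slots__, here is a minimal stdlib-only sketch. SlottedBase is a hypothetical stand-in for FieldInfo; the two subclasses differ only in whether they redeclare __slots__.

```python
class SlottedBase:
    # Hypothetical stand-in for FieldInfo: all attributes live in fixed slots.
    __slots__ = ("alias",)


class InfoWithoutSlots(SlottedBase):
    # No __slots__ here, so Python quietly adds a per-instance __dict__.
    pass


class InfoWithSlots(SlottedBase):
    # Redeclaring __slots__ adds one more slot and keeps instances __dict__-free.
    __slots__ = ("xml_metadata",)


loose = InfoWithoutSlots()
tight = InfoWithSlots()
print(hasattr(loose, "__dict__"))  # True
print(hasattr(tight, "__dict__"))  # False

tight.xml_metadata = {"type": "Element"}  # fine: there is a slot for it
try:
    tight.xml_metadta = {}  # misspelled: no slot and no __dict__ to fall back on
except AttributeError:
    print("AttributeError")
```

A side effect worth knowing: with __slots__ all the way up the class hierarchy, assigning to a misspelled attribute fails loudly instead of silently creating a new one.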

You can then use XMLField just like Pydantic’s built-in Field:

from pydantic import BaseModel

class Person(BaseModel):
    first_name: str = XMLField(
        metadata={
            "type": "Element",
            "required": True,
            "name": "first_name",
        },
        alias="first_name",
    )

Accessing the metadata is straightforward through the model's built-in model_fields mapping:

first_name_field = Person.model_fields["first_name"]
print(first_name_field.xml_metadata)
# Output: {'type': 'Element', 'required': True, 'name': 'first_name'}

The Trade-off

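While this approach is clean and follows familiar OO patterns, it comes with a significant drawback: because XMLField forwards everything through **kwargs: Any, you lose typing and auto-completion for all of Pydantic's standard Field arguments (alias, default, description, and so on). Your IDE can no longer suggest them or flag a misspelled keyword, making it easier to introduce bugs. The approach also subclasses FieldInfo and calls its constructor directly instead of going through the public Field() function, which is especially problematic when upgrading Pydantic, since those internals may change between versions.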

Approach 2: Using Type Annotations

The second approach leverages Python’s Annotated type, introduced in Python 3.9 (available via typing_extensions for earlier versions). This system allows you to attach metadata directly to type hints:

from dataclasses import dataclass
from typing import Annotated
from pydantic import BaseModel

@dataclass
class XMLMetadata:
    type: str
    required: bool
    name: str

class Person(BaseModel):
    first_name: Annotated[
        str,
        XMLMetadata(
            type="Element",
            required=True,
            name="first_name",
        ),
    ]

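Using a dataclass for the metadata gives it a fixed, typed structure that IDEs and type checkers understand, unlike the free-form dict in the first approach. Pydantic keeps any Annotated items it doesn't recognize, such as XMLMetadata here, in the field's metadata list.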
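
Annotated itself is plain standard-library typing, not a Pydantic feature. This sketch uses a regular class (no Pydantic) to show where the metadata lives: typing.get_type_hints with include_extras=True keeps the Annotated wrapper, and get_args splits it into the base type and the metadata objects. Pydantic does a similar unpacking when it builds a model's fields.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass
class XMLMetadata:
    type: str
    required: bool
    name: str


class Person:
    # A plain class: the Annotated hint is just stored in __annotations__.
    first_name: Annotated[str, XMLMetadata(type="Element", required=True, name="first_name")]


# include_extras=True keeps the Annotated wrapper instead of collapsing it to str.
hint = get_type_hints(Person, include_extras=True)["first_name"]
base_type, *extras = get_args(hint)

print(get_origin(hint) is Annotated)  # True
print(base_type)  # <class 'str'>
print(extras)  # [XMLMetadata(type='Element', required=True, name='first_name')]
print(get_type_hints(Person)["first_name"])  # <class 'str'>
```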

Retrieving the metadata requires iterating through the field’s metadata items:

first_name_field = Person.model_fields["first_name"]
xml_metadata = next(
    (
        metadata_item
        for metadata_item in first_name_field.metadata
        if isinstance(metadata_item, XMLMetadata)
    ),
    None,  # returned instead of raising StopIteration when nothing matches
)
print(xml_metadata)
# Output: XMLMetadata(type='Element', required=True, name='first_name')

Composing with Other Annotations

One powerful aspect of this approach is how well it composes with Pydantic’s existing annotation system:

from pydantic import AfterValidator, Field

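def validate_xml_name(value: str) -> str:
    # Rough check: str.isidentifier() is stricter than XML's Name rules
    # (it rejects "-", "." and ":"), so some valid XML names are refused.
    if not value.isidentifier():
        raise ValueError("Invalid XML element name")
    return value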

class Document(BaseModel):
    root_element: Annotated[
        str,
        Field(min_length=1),
        AfterValidator(validate_xml_name),
        XMLMetadata(
            type="Element",
            required=True,
            name="root",
        ),
    ]

Keeping the length constraint, the custom validator, and the XML metadata in a single annotation puts everything about the field in one place, which makes the schema immediately understandable.
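
Because nested Annotated types are flattened, the validation rules can also be pulled into a reusable alias and combined with field-specific metadata. XMLName is a hypothetical alias name; the sketch repeats XMLMetadata and validate_xml_name so it runs on its own, and checks that both validation and the metadata lookup still work.

```python
from dataclasses import dataclass
from typing import Annotated

from pydantic import AfterValidator, BaseModel, Field, ValidationError


@dataclass
class XMLMetadata:
    type: str
    required: bool
    name: str


def validate_xml_name(value: str) -> str:
    # Same rough approximation as above: stricter than XML's Name rules.
    if not value.isidentifier():
        raise ValueError("Invalid XML element name")
    return value


# Hypothetical reusable alias bundling the validation rules, without XML metadata.
XMLName = Annotated[str, Field(min_length=1), AfterValidator(validate_xml_name)]


class Document(BaseModel):
    # Annotated[Annotated[str, ...], XMLMetadata(...)] flattens into one Annotated.
    root_element: Annotated[XMLName, XMLMetadata(type="Element", required=True, name="root")]


doc = Document(root_element="invoice")
root_field = Document.model_fields["root_element"]
xml = next(m for m in root_field.metadata if isinstance(m, XMLMetadata))
print(doc.root_element, xml.name)  # invoice root

try:
    Document(root_element="not valid")  # the space fails validate_xml_name
except ValidationError as exc:
    print(exc.error_count())  # 1
```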

Conclusion

Custom metadata in Pydantic fields opens up powerful patterns for building domain-specific frameworks. While the FieldInfo extension approach offers a familiar object-oriented pattern, the Annotated approach provides a better developer experience, with full typing support and cleaner composition.

The key insight from exploring xsdata-pydantic is that Pydantic’s flexibility allows us to extend models with domain-specific information without compromising the core validation functionality. Whether you’re building XML serializers, ORM mappings, or API documentation generators, these patterns provide a solid foundation for metadata-driven development.