Encoding, Validation, and Why the Order Matters — ASVS 5.0 V1 & V2

OWASP ASVS 5.0 splits two concepts that are often lumped together under "input handling" into separate chapters: V1 covers encoding, escaping, and injection prevention; V2 covers validation, sanitization, and business logic. The separation is deliberate. These mechanisms operate at different points in the request lifecycle and serve different purposes — conflating them leads to gaps where both are applied half-heartedly and neither is applied correctly.

V1 and V2: What the Split Actually Means

V1 is about how you safely use data when passing it to an interpreter — a browser renderer, a SQL engine, an OS shell. The question it answers is: given that this data will reach an interpreter, how do you ensure it cannot alter the structure of the command or document it's embedded in?

V2 is about whether data makes sense for your application in the first place. Does this field contain a valid email address? Does this order quantity stay within bounds? Does this checkout flow follow the expected sequence of steps?

The ASVS control objective for V1 states this directly: "Input validation serves as a defense-in-depth mechanism to protect against unexpected or dangerous content. However, since its primary purpose is to ensure that incoming content matches functional and business expectations, requirements related to this can be found in the Validation and Business Logic chapter."

In plain terms: validation reduces attack surface and enforces business rules. Encoding prevents injection. You need both, and mixing up which one does what is how you end up with applications that validate inputs but still have SQL injection — or that encode outputs but accept business logic that an attacker can bend to their advantage.

V1: Context-Aware Output Encoding

The core concept in V1 is that output encoding must match the interpreter that will consume the data. A single "sanitize everything" function applied uniformly is insufficient because different interpreters treat special characters differently.

What "context" means

Consider the string O'Brien. In an HTML text node, the apostrophe is harmless. In an HTML attribute delimited by single quotes, it breaks out of the attribute value. In a SQL query built by string concatenation, it is a syntax character that can terminate a string literal. In a JavaScript string literal, it has yet another interpretation. The same character, four different risk profiles depending on context.

ASVS 1.2.1 requires that output encoding for an HTTP response, HTML document, or XML document be relevant to the specific context: HTML element content, HTML attribute values, CSS, or HTTP headers each have different encoding requirements. 1.2.2 extends this to URLs — query and path parameters require percent-encoding or base64url encoding, and the requirement explicitly prohibits dynamically constructed URLs that allow javascript: or data: protocols. 1.2.3 covers JavaScript and JSON output, where the characters that matter — forward slashes, angle brackets, Unicode line terminators — differ from HTML contexts.
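As a rough illustration of context-specific encoding, the sketch below runs the same string through encoders for four contexts using only Python's standard library. The helper names are hypothetical, and a real application would normally rely on its template engine's auto-escaping rather than hand-rolled helpers.

```python
import html
import json
from urllib.parse import quote


def encode_html_text(value: str) -> str:
    # HTML element content: escape &, < and >; quotes are harmless here.
    return html.escape(value, quote=False)


def encode_html_attribute(value: str) -> str:
    # Quoted HTML attribute value: also escape " and '.
    return html.escape(value, quote=True)


def encode_url_component(value: str) -> str:
    # Query or path parameter: percent-encode everything except
    # letters, digits and the characters _ . - ~
    return quote(value, safe="")


def encode_js_string(value: str) -> str:
    # JavaScript/JSON string literal. json.dumps escapes quotes, backslashes,
    # control characters and (by default) all non-ASCII characters.
    # The extra replacements keep sequences such as "</script>" from
    # terminating an inline <script> block.
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


name = "O'Brien"
print(encode_html_text(name))       # O'Brien
print(encode_html_attribute(name))  # O&#x27;Brien
print(encode_url_component(name))   # O%27Brien
print(encode_js_string(name))       # "O'Brien"
```

The apostrophe is left alone in the text-node encoder, escaped in the attribute and URL encoders, and wrapped in a double-quoted literal for JavaScript: one input, four different correct outputs.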

Encoding order matters

V1.1 establishes an architectural constraint that is easy to overlook: decoding must happen exactly once and must happen before any validation or sanitization step. Encoding output must happen as the final step before the data reaches the interpreter, not earlier.

The problem with early encoding is double-encoding. If you HTML-encode user input at ingress, the database stores &lt;script&gt; rather than <script>; re-encoding at output then produces &amp;lt;script&amp;gt;, and the user sees literal entity text instead of the intended characters. Ordering mistakes on the decoding side are worse: if validation runs on data that is still encoded, an attacker can encode the payload (for example, as %3Cscript%3E) so that it passes the check, and a later decoding step restores it just before it reaches the interpreter.

The rule: data at rest stays in its original, unencoded form. Encoding happens at the point of output, immediately before the interpreter sees it.
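Both ordering failures can be reproduced in a few lines. This is a minimal sketch using the standard library; the validation rule and function names are hypothetical and chosen only to make the effect visible.

```python
import html
from typing import Optional
from urllib.parse import unquote


def contains_path_traversal(segment: str) -> bool:
    # Hypothetical validation rule: reject separators and parent references.
    return "/" in segment or ".." in segment


# 1) Encoding too early: the stored value is already HTML-encoded,
#    so encoding again at output double-encodes it.
raw = "<script>"
stored_early = html.escape(raw)             # &lt;script&gt;
rendered_twice = html.escape(stored_early)  # &amp;lt;script&amp;gt;


def accept_wrong_order(submitted: str) -> Optional[str]:
    # Wrong: validate the still-encoded value, then decode it.
    if contains_path_traversal(submitted):
        return None
    return unquote(submitted)


def accept_right_order(submitted: str) -> Optional[str]:
    # Right: decode exactly once into canonical form, then validate.
    decoded = unquote(submitted)
    if contains_path_traversal(decoded):
        return None
    return decoded


payload = "%2e%2e%2fetc%2fpasswd"
print(accept_wrong_order(payload))  # ../etc/passwd
print(accept_right_order(payload))  # None
```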

Injection Prevention: The V1.2 Requirements

ASVS groups injection prevention under encoding because the correct solution is almost always output encoding or parameterization — treating data as data rather than as part of an instruction structure.

SQL injection

Requirement 1.2.4: database queries must use parameterized queries, ORMs, or entity frameworks. This applies to SQL, HQL, NoSQL, Cypher, and stored procedures.

# Vulnerable — string concatenation lets user input alter query structure
query = f"SELECT * FROM users WHERE username = '{username}'"

# Correct: the value is bound as a parameter, so it cannot change the query structure
cursor.execute("SELECT * FROM users WHERE username = ?", (username,))
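To see the difference end to end, the following sketch runs both variants against an in-memory SQLite database. The table, rows, and function names are made up for the demonstration; the `?` placeholder style is the one used by Python's `sqlite3` module, and other drivers use different placeholder syntax.

```python
import sqlite3


def find_user_vulnerable(conn: sqlite3.Connection, username: str) -> list:
    # String concatenation: the input becomes part of the SQL text.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_parameterized(conn: sqlite3.Connection, username: str) -> list:
    # Bound parameter: the input is only ever treated as a value.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
conn.commit()

payload = "nobody' OR '1'='1"
print(len(find_user_vulnerable(conn, payload)))  # 2: every row is returned
print(find_user_parameterized(conn, payload))    # []: no user has that name
```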

The ASVS adds a note that is worth reading carefully: parameterized queries are not always sufficient. Table names, column names, and ORDER BY clauses cannot be parameterized in most database drivers. Including user-supplied data in these positions — even escaped — produces either a broken query or an injectable one. The correct approach for dynamic column or table names is an allowlist of permitted values validated server-side.

# User-controlled ORDER BY — cannot be parameterized
ALLOWED_SORT_COLUMNS = {"created_at", "price", "name"}
if sort_col not in ALLOWED_SORT_COLUMNS:
    sort_col = "created_at"
query = f"SELECT * FROM products ORDER BY {sort_col}"
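The same allowlist idea extends to the sort direction, which is also a keyword rather than a value. A small sketch, wrapping the logic above in a hypothetical `build_product_query` helper:

```python
ALLOWED_SORT_COLUMNS = {"created_at", "price", "name"}
ALLOWED_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def build_product_query(sort_col: str, direction: str = "asc") -> str:
    # Only values taken from the allowlists ever reach the SQL text;
    # anything else falls back to a safe default.
    column = sort_col if sort_col in ALLOWED_SORT_COLUMNS else "created_at"
    order = ALLOWED_DIRECTIONS.get(direction.lower(), "ASC")
    return f"SELECT * FROM products ORDER BY {column} {order}"


print(build_product_query("price", "DESC"))
# SELECT * FROM products ORDER BY price DESC
print(build_product_query("price; DROP TABLE products", "sideways"))
# SELECT * FROM products ORDER BY created_at ASC
```

Falling back to a default is one design choice; rejecting the request outright is equally valid and makes probing more visible in logs.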

OS command injection

Requirement 1.2.5 requires that OS calls use parameterized OS queries or contextual output encoding. The canonical fix is to avoid shell invocation entirely by passing arguments as a list rather than a shell string:

# Vulnerable — shell=True means the string is interpreted by /bin/sh
subprocess.run(f"convert {user_filename} output.png", shell=True)

# Correct — arguments passed as list; no shell interpolation
subprocess.run(["convert", user_filename, "output.png"])

If shell invocation is genuinely unavoidable, every argument that contains user-controlled content must be escaped for the shell context — not HTML-encoded, not URL-encoded, but shell-escaped.
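A sketch of both options follows. To stay runnable without ImageMagick installed, the list-based call invokes the Python interpreter itself and simply echoes its argument; `build_shell_command` is a hypothetical helper, and `shlex.quote` targets POSIX shells only.

```python
import shlex
import subprocess
import sys


def echo_argument(value: str) -> str:
    # Argument list, no shell: the value arrives as a single argv entry,
    # whatever characters it contains.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


def build_shell_command(filename: str) -> str:
    # Fallback when a shell string is unavoidable: shell-escape the argument.
    return f"convert {shlex.quote(filename)} output.png"


hostile = "photo.png; rm -rf /"
print(echo_argument(hostile))        # photo.png; rm -rf /
print(build_shell_command(hostile))  # convert 'photo.png; rm -rf /' output.png
```

Neither approach stops a value that begins with `-` from being read as an option by the target program; that is a separate check, typically handled by validating the filename.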

LDAP and XPath injection

Requirements 1.2.6 and 1.2.7 extend the same principle to LDAP directories and XPath queries. LDAP injection works because LDAP filter syntax uses special characters (*, (, ), \, \x00) that can alter filter semantics if embedded unescaped. XPath is similarly vulnerable when user input is concatenated into XPath expressions.

For XPath, requirement 1.2.7 calls for query parameterization or precompiled queries, which many XML libraries support through XPath variables. For LDAP, requirement 1.2.6 is less prescriptive; the usual approach is to escape input per the LDAP encoding rules (RFC 4515 for search filters, RFC 4514 for distinguished names) or to use an LDAP library that handles this automatically.
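For LDAP search filters, the escaping itself is small enough to show in full. This is a hand-written sketch of the RFC 4515 rules for illustration only; in practice a library helper (for example `escape_filter_chars` in ldap3 or python-ldap) is preferable. The `uid` attribute and function names are assumptions.

```python
# Characters that are special inside an LDAP search filter value,
# mapped to their backslash-hex escapes.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    # Each character is mapped independently, so the backslashes introduced
    # by an escape are never escaped a second time.
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_uid_filter(uid: str) -> str:
    return f"(uid={escape_ldap_filter_value(uid)})"


print(build_uid_filter("jsmith"))    # (uid=jsmith)
print(build_uid_filter("*)(uid=*"))  # (uid=\2a\29\28uid=\2a)
```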

Template injection

Requirement 1.3.7 addresses server-side template injection (SSTI), which is often overlooked because the mechanism is application-specific rather than database-specific. (It is numbered under V1.3, but it belongs with the injection classes discussed here.) Many template engines, including Jinja2, Twig, Freemarker, and Pebble, evaluate expressions found in the template source. If user input is incorporated into the template string itself (rather than passed as a variable to a static template), an attacker can inject expressions of their own, such as {{ 7*7 }} in Jinja2, which the engine evaluates and which can often be escalated to arbitrary code execution.

# Vulnerable — user input is part of the template source
template = env.from_string(f"Hello, {user_input}!")

# Correct — user input is passed as a variable to a static template
template = env.get_template("greeting.html")
template.render(name=user_input)

The ASVS requirement is direct: templates must not be constructed from untrusted input. Where that is unavoidable, any untrusted content included in template construction must be sanitized or strictly validated.
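The same failure mode can be shown without a template engine. The sketch below uses Python's built-in `str.format` as a stand-in (it is not Jinja2, and it cannot execute arbitrary code), but it shows how input that becomes part of the template can reach data the author never meant to expose. The `Account` class and its fields are invented for the example.

```python
class Account:
    def __init__(self, name: str, api_key: str) -> None:
        self.name = name
        self.api_key = api_key


def greet_vulnerable(user_template: str, account: Account) -> str:
    # The user controls the template, so they control which attributes are read.
    return user_template.format(account=account)


def greet_safe(display_name: str) -> str:
    # The template is static; user input is only ever a substituted value
    # and is not re-parsed for placeholders.
    return "Hello, {name}!".format(name=display_name)


account = Account("alice", "s3cr3t-key")
print(greet_vulnerable("Hello, {account.api_key}!", account))  # Hello, s3cr3t-key!
print(greet_safe("{account.api_key}"))                         # Hello, {account.api_key}!
```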

CSV and formula injection

Requirement 1.2.10 is less well-known but practically significant for any application that exports data to spreadsheet formats. When a cell value begins with =, +, -, @, a tab, or a null character, spreadsheet applications treat it as a formula. An attacker who can place a value like =HYPERLINK("http://attacker.com","Click me") into exported data can embed malicious formulas in files sent to other users.

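The fix has two parts. For CSV output, follow the quoting rules in RFC 4180: fields containing commas, double quotes, or line breaks are wrapped in double quotes, and embedded double quotes are doubled. In addition, the ASVS requires that any field value starting with one of the characters listed above be escaped with a leading single quote, so that the spreadsheet treats it as text. This is a presentation-layer concern that is easy to miss during development and difficult to retrofit later.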
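A minimal export helper along these lines might look as follows. The single-quote prefix is the escaping step the requirement calls for; the `csv` module takes care of the quoting of commas, double quotes, and line breaks. The function names are hypothetical.

```python
import csv
import io

# First characters that make a spreadsheet interpret a cell as a formula.
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\0")


def neutralize_cell(value: str) -> str:
    # Prefix risky values with a single quote so they are treated as text.
    if value.startswith(FORMULA_TRIGGERS):
        return "'" + value
    return value


def export_rows(rows) -> str:
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow([neutralize_cell(str(cell)) for cell in row])
    return buffer.getvalue()


print(export_rows([["Alice", '=HYPERLINK("http://attacker.com","Click me")']]))
# "Alice","'=HYPERLINK(""http://attacker.com"",""Click me"")"
```

Note that this also prefixes legitimate values such as negative numbers stored as text ("-5"), so numeric columns are usually better exported from typed values than from user-supplied strings.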

V1.3: Sanitization — When Encoding Isn't Enough

Encoding preserves the semantic meaning of the input while making it safe for the target context. Sanitization removes or transforms content — and sometimes changes what it means. The ASVS treats sanitization as a fallback when encoding is not feasible.

The canonical case is rich text from a WYSIWYG editor. When users submit formatted HTML, you cannot HTML-encode the entire input, because the intentional markup would then be displayed as literal text instead of being rendered. Instead, parse the HTML and permit only the elements and attributes on an explicit allowlist, stripping everything else. Requirement 1.3.1 requires this to be done with a well-known and secure HTML sanitization library or framework feature (DOMPurify client-side; server-side equivalents such as nh3 for Python, since the once-common bleach library is now deprecated) rather than a bespoke regex.

SVG deserves special mention. Requirement 1.3.4 requires that user-supplied SVG be sanitized to allow only drawing elements and attributes, explicitly blocking <script> elements and <foreignObject> — which can embed arbitrary HTML and JavaScript inside an SVG file served from your origin.

Requirement 1.3.2 addresses eval() and dynamic code execution features: avoid them. Where avoidance is impossible, sanitize input before passing it. This applies equally to Spring Expression Language, Python's exec(), JavaScript's Function() constructor, and similar mechanisms in other runtimes.
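In Python, one common way to avoid `eval()` when the input is supposed to be plain data is `ast.literal_eval`, which accepts only literals (numbers, strings, lists, dicts, and so on) and refuses function calls or attribute access. The wrapper name below is hypothetical, and very large or deeply nested input can still exhaust resources, so length limits remain advisable.

```python
import ast


def parse_literal_setting(text: str):
    # Accepts Python literals only; anything executable is rejected.
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a plain literal: {text!r}") from exc


print(parse_literal_setting("[1, 2, 3]"))  # [1, 2, 3]

try:
    parse_literal_setting("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```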

V2: Input Validation and Allowlists

Where V1 is about safe use of data, V2 is about ensuring data meets expectations before it enters processing logic at all.

Allowlists over blocklists

Requirement 2.2.1 requires positive validation — allowlists of permitted values, patterns, and ranges — rather than blocklist approaches that try to identify and reject known-bad inputs.

An allowlist defines exactly what is acceptable. A blocklist attempts to enumerate all possible bad inputs, a set that is always incomplete. A blocklist for SQL injection that strips DROP, DELETE, and -- will miss D\nROP, Unicode homoglyphs, and comment syntax variations. An allowlist that accepts only digits for an order quantity field doesn't need to consider any of these.

# Blocklist — fragile, incomplete by definition
if "DROP" in user_input.upper() or "--" in user_input:
    reject()

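# Allowlist: accepts only what you expect (one to six ASCII digits)
import re
# [0-9] rather than \d, because \d also matches non-ASCII Unicode digits
if not re.fullmatch(r"[0-9]{1,6}", user_input):
    reject()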

Requirement 2.2.2 states a principle that warrants emphasis: client-side validation is a usability control, not a security control. Any validation that runs as JavaScript in the browser can be bypassed by an attacker who sends HTTP requests directly. All authoritative validation must happen at a trusted server-side layer.

Contextual consistency

Requirement 2.2.3 adds a dimension that single-field validation misses: combinations of fields must be checked for logical consistency. An address form that validates city and postal code independently is weaker than one that verifies the postal code actually belongs to the stated city. A date range where the end date precedes the start date should be rejected even if both dates individually pass format validation. This kind of cross-field validation is explicitly in scope for ASVS Level 2.
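The date-range case can be sketched as a two-stage check: each field is parsed on its own, then the pair is checked together. The function name and the 30-night limit are assumptions standing in for whatever the application's documented rules say.

```python
from datetime import date


def validate_booking(start_raw: str, end_raw: str, max_nights: int = 30):
    # Stage 1, per-field: each value must be a valid ISO date.
    # date.fromisoformat raises ValueError on malformed input.
    start = date.fromisoformat(start_raw)
    end = date.fromisoformat(end_raw)

    # Stage 2, cross-field: the combination must make sense.
    if end <= start:
        raise ValueError("end date must be after start date")
    if (end - start).days > max_nights:
        raise ValueError("stay exceeds the maximum length")
    return start, end


print(validate_booking("2025-03-01", "2025-03-05"))
# (datetime.date(2025, 3, 1), datetime.date(2025, 3, 5))
```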

V2.3: Business Logic Security

Business logic vulnerabilities are distinct from injection and data validation issues. They don't exploit parser behavior or bypass input filters — they exploit the application's own rules by using legitimate functionality in unintended ways.

Step sequencing

Requirement 2.3.1 requires that multi-step business flows enforce sequential processing. Users must not be able to skip steps. The practical implication: server-side state must track where in a workflow a user legitimately is, and any attempt to jump to a later step without completing earlier ones must be rejected.

In a checkout flow where step 3 is payment and step 4 is order confirmation, an application that advances based on a client parameter rather than server-side state can be bypassed by submitting the step 4 request directly. The server must validate that step 3 completed before permitting step 4.
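One way to picture the server-side state is a small per-session tracker that only accepts the next expected step. This is a sketch with invented step names; a real application would persist this state in its session store and tie it to the authenticated user.

```python
CHECKOUT_STEPS = ("cart", "shipping", "payment", "confirmation")


class OutOfSequenceError(Exception):
    """Raised when a step is requested before its predecessors completed."""


class CheckoutSession:
    def __init__(self) -> None:
        # Index of the next step the server is willing to process.
        self._next_index = 0

    def complete_step(self, step: str) -> None:
        if self._next_index >= len(CHECKOUT_STEPS):
            raise OutOfSequenceError("checkout already finished")
        expected = CHECKOUT_STEPS[self._next_index]
        if step != expected:
            raise OutOfSequenceError(f"expected {expected!r}, got {step!r}")
        self._next_index += 1


session = CheckoutSession()
session.complete_step("cart")
session.complete_step("shipping")
try:
    session.complete_step("confirmation")  # skips payment
except OutOfSequenceError as err:
    print("rejected:", err)  # rejected: expected 'payment', got 'confirmation'
```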

Price tampering and quantity manipulation

Price tampering illustrates how business logic vulnerabilities interact with insufficient server-side validation. Consider a shopping cart implementation that stores the item price in a hidden form field, or in a client-accessible JavaScript variable, and uses that price for order totaling. An attacker who can modify the request — through a proxy, browser developer tools, or a crafted direct request — can set any price they choose.

The correct design is to never trust the client for pricing. Prices are retrieved server-side from the product catalog at checkout time, using the product ID (which the client provides) but never the price (which only the server should know). The client can tell the server what to purchase; it should not be telling the server how much it costs.

Quantity manipulation follows the same pattern. A server that accepts a quantity of -1 and processes a refund, or accepts 1000000 and attempts to reserve inventory, has failed to enforce business logic limits. Requirement 2.3.2 requires that these limits — maximum order quantities, minimum transfer amounts, rate caps — be implemented according to documented specifications.
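Put together, server-side order totaling might look like the sketch below. The catalog contents, the per-line quantity limit, and the request shape are all assumptions for illustration; the point is that a client-supplied price field, if present, is simply never read.

```python
from decimal import Decimal

# Hypothetical server-side catalog; in practice this is a database lookup.
CATALOG = {"sku-100": Decimal("19.99"), "sku-200": Decimal("5.00")}
MAX_QUANTITY_PER_LINE = 10  # assumed documented business limit


def price_order(lines) -> Decimal:
    total = Decimal("0")
    for line in lines:
        product_id = line["product_id"]
        quantity = line["quantity"]
        if product_id not in CATALOG:
            raise ValueError(f"unknown product: {product_id!r}")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(quantity, bool) or not isinstance(quantity, int):
            raise ValueError("quantity must be an integer")
        if not 1 <= quantity <= MAX_QUANTITY_PER_LINE:
            raise ValueError("quantity outside permitted range")
        # The price always comes from the catalog, never from the request.
        total += CATALOG[product_id] * quantity
    return total


tampered = [{"product_id": "sku-100", "quantity": 2, "price": "0.01"}]
print(price_order(tampered))  # 39.98
```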

Atomicity

Requirement 2.3.3 requires that business logic operations succeed in their entirety or roll back to the previous consistent state. A funds transfer that debits one account before crediting another must not leave the system in a state where the debit occurred without the credit. Database transactions or equivalent compensating logic are the mechanism. Partial success states are exploitable: an attacker who can reliably trigger a failure between the two writes can leave funds debited but never credited or, if the implementation credits first, credited but never debited, which creates money from nothing.
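The transfer example maps directly onto a database transaction. The sketch below uses SQLite through Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back if an exception escapes the block. The schema and function names are invented for the demonstration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, source_id: int, target_id: int, amount: int) -> None:
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commit if the block succeeds, roll back if it raises
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, source_id),
        )
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, target_id),
        )
        if debit.rowcount != 1 or credit.rowcount != 1:
            raise LookupError("unknown account")


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = make_demo_db()
transfer(conn, 1, 2, 30)
print(balances(conn))  # {1: 70, 2: 80}
try:
    transfer(conn, 1, 999, 10)  # target account does not exist
except LookupError:
    pass
print(balances(conn))  # {1: 70, 2: 80}: the debit was rolled back
```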

The Interaction with Frontend (V3) and API (V4) Concerns

V1 and V2 define requirements that apply regardless of delivery mechanism, but some concerns are amplified at the frontend or API layer.

V3 reinforces V1's XSS requirements by requiring that text displayed in the browser use safe DOM APIs (createTextNode, textContent) rather than innerHTML. CSP provides an additional layer but does not replace output encoding: a policy can be missing, permissive (for example, allowing 'unsafe-inline'), or bypassable, and even a strict policy does not stop injected markup from altering the page's content and forms.

V4 notes that input validation requirements apply equally to API endpoints and identifies schema validation (JSON Schema, XML Schema) as the most effective mechanism for full validation coverage of HTTP APIs — every incoming request checked against a formal definition before business logic sees it. DTD validation is explicitly prohibited due to XXE risk: a crafted DOCTYPE can cause the XML parser to fetch external URLs or read local files. ASVS 1.5.1 requires XML parsers to be configured with external entity resolution disabled.

Putting It Together

The reason ASVS separates V1 and V2 is that they answer different questions at different points in the request lifecycle:

1. Input arrives. Validate against business expectations (V2): format, type, range, logical consistency.
2. Input is processed. Apply business logic constraints (V2): is this operation permitted given documented rules?
3. Data reaches an interpreter. Apply context-appropriate encoding or parameterization (V1): SQL parameters, HTML encoding, shell argument lists.
4. Output is rendered. Where encoding is insufficient (rich HTML, SVG, user-controlled templates), sanitize via an appropriate library (V1.3).

Validation at step 1 reduces attack surface and enforces business correctness but does not make step 3 unnecessary: a legitimate email address can still contain characters that need HTML encoding when displayed in a browser. Encoding at step 3 prevents injection but does not enforce that business logic was respected — an encoded price manipulation is still a price manipulation.

Both mechanisms are necessary. Neither substitutes for the other.