What Are HTML Entities?
HTML entities are special sequences of characters used to represent reserved characters and symbols in HTML documents. Because characters like <, >, and & have special meaning in HTML markup (they delimit tags and begin entity references), writing them literally in page content risks having them parsed as markup instead of displayed as text. HTML entities provide a safe escape mechanism: instead of writing the literal character, you write a symbolic code that the browser renders as the intended character.
Every HTML entity begins with an ampersand (&) and ends with a semicolon (;). Between these delimiters, the entity is identified either by a human-readable name (a named entity) or by the character’s Unicode code point expressed in decimal (numeric entity) or hexadecimal (hex entity).
Why Character Escaping Matters
Proper character escaping is a fundamental requirement for any web application. Without it, user-supplied content that contains angle brackets or ampersands can break the page layout or, far worse, introduce security vulnerabilities. When a browser encounters an unescaped <script> tag inside user content, it executes the enclosed JavaScript — this is the basis of Cross-Site Scripting (XSS), one of the most common and dangerous web vulnerabilities.
By encoding all untrusted output with HTML entities, developers ensure that the browser treats the content as text rather than markup. Most modern template engines and frameworks perform this escaping by default (React, Angular, Vue, ERB templates in Rails, and Jinja2 when autoescaping is enabled, as it is for HTML templates in Flask), but understanding how it works under the hood is essential for security audits, debugging rendering issues, and working with raw HTML strings.
Named vs Numeric vs Hex Entities
HTML supports three syntactic forms for entities, each with its own trade-offs:
Named Entities
Named entities use a mnemonic alias defined by the HTML specification. Examples include &amp; for the ampersand, &lt; for the less-than sign, &copy; for the copyright symbol, and &mdash; for the em dash. Named entities are the most readable and maintainable form, but the HTML spec defines a limited (though large) set of names. If a character does not have a named entity, you must fall back to a numeric or hex reference.
Numeric (Decimal) Entities
Numeric entities reference a character by its Unicode code point in decimal notation: &#38; represents U+0026 (the ampersand), &#169; represents U+00A9 (copyright symbol). Numeric entities can represent any Unicode character, including those without named aliases. They are slightly less readable than named entities but more portable and universally supported.
Hexadecimal Entities
Hex entities use the &#x prefix followed by the code point in hexadecimal: &#x26; for the ampersand, &#xA9; for the copyright symbol. Developers who work frequently with Unicode code charts often prefer hex notation because Unicode tables are organized by hexadecimal code points. Like numeric entities, hex entities can represent any Unicode character.
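As a rough illustration of how the three forms relate, the sketch below derives each one from a single character. Python is used here only for convenience, and the helper name `entity_forms` is hypothetical. It relies on the standard library's `html.entities.codepoint2name` table, which covers only the HTML 4 names, so characters named only in HTML5 come back without a named form.

```python
import html
from html.entities import codepoint2name  # HTML 4.01 names only, e.g. 169 -> "copy"


def entity_forms(char: str) -> dict:
    """Return the named (if any), decimal, and hex references for one character."""
    code_point = ord(char)
    name = codepoint2name.get(code_point)  # None when no HTML 4 name exists
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{code_point};",
        "hex": f"&#x{code_point:X};",
    }


forms = entity_forms("©")
print(forms)

# Every available form decodes back to the same character.
for reference in forms.values():
    if reference is not None:
        assert html.unescape(reference) == "©"
```

Whichever form appears in the source, the decoded result is identical, so the choice is about readability and compatibility, not behavior.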
Common HTML Entities Reference
The five characters that must always be escaped in HTML content are:
- & (ampersand) → &amp; or &#38;
- < (less-than) → &lt; or &#60;
- > (greater-than) → &gt; or &#62;
- " (double quote) → &quot; or &#34;
- ' (single quote) → &apos; or &#39;
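The five replacements can be written as a small lookup table. This is a minimal sketch, not a substitute for a framework's built-in escaping; `ESCAPE_TABLE` and `escape_html` are illustrative names, and the numeric form is used for the single quote.

```python
# All five replacements happen in one pass, so the "&" inside a freshly
# produced "&lt;" is never escaped a second time.
ESCAPE_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


def escape_html(text: str) -> str:
    """Replace the five reserved characters with entity references."""
    return text.translate(ESCAPE_TABLE)


print(escape_html('<a href="x">Tom & Jerry\'s</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

A chain of separate `replace` calls would also work, but only if the ampersand is handled first; the single-pass table avoids that ordering pitfall.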
Beyond these five, common entities include &nbsp; (non-breaking space), &copy; (copyright), &reg; (registered trademark), &trade; (trademark), &mdash; (em dash), &ndash; (en dash), &hellip; (ellipsis), and &euro; (euro sign). The full list defined by the HTML5 specification contains over 2,200 named character references.
XSS Prevention and HTML Escaping
Cross-Site Scripting (XSS) occurs when an attacker injects malicious scripts into web pages viewed by other users. The most common vector is storing or reflecting user input that is not properly escaped. For example, if a comment field allows a user to submit the text <script>alert('XSS')</script> and the application renders it without escaping, every visitor’s browser will execute the script.
The primary defense is output encoding: before inserting any user-controlled value into an HTML context, replace all special characters with their entity equivalents. This ensures the browser interprets the value as display text, not as executable code. Modern frameworks do this automatically in their template systems, but developers must remain vigilant when using raw HTML insertion APIs such as innerHTML or equivalent framework-specific methods that bypass automatic escaping.
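A minimal sketch of output encoding for the comment example, using Python's standard `html.escape`. The `render_comment` function and the `comment` class name are hypothetical, and the sketch covers only the HTML text context; values placed in scripts, styles, or URLs need context-specific encoding.

```python
import html


def render_comment(comment: str) -> str:
    """Wrap an untrusted comment in a paragraph, escaping it for an HTML text context."""
    # html.escape replaces &, < and >, plus both quote characters by default.
    return f'<p class="comment">{html.escape(comment)}</p>'


payload = "<script>alert('XSS')</script>"
print(render_comment(payload))
# <p class="comment">&lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;</p>
```

The browser displays the payload as visible text because no `<script>` element ever reaches the parser.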
HTML Entities in Email
HTML email clients have notoriously inconsistent rendering engines. Some older email clients do not support the full range of named entities, which is why best practice for HTML email is to use numeric or hex entities for special characters. The &amp;, &lt;, &gt;, and &quot; named entities are universally supported, but less common ones like &mdash; may not render correctly in all clients. When in doubt, prefer the numeric form (&#8212;) for maximum compatibility.
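One way to produce numeric references for an email body is Python's `xmlcharrefreplace` error handler, which emits a decimal reference for every character the ASCII codec cannot encode. The sketch below assumes the input is plain text, not markup, and `to_numeric_entities` is an illustrative name.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Escape reserved characters, then turn every non-ASCII character into &#NNNN;."""
    escaped = html.escape(text, quote=True)
    # Anything outside ASCII becomes a decimal character reference.
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# "\u2014" is the em dash discussed above.
print(to_numeric_entities("Q&A \u2014 café €5"))
# Q&amp;A &#8212; caf&#233; &#8364;5
```

The result is pure ASCII, so it survives transports and clients that mishandle the message's declared encoding.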
Character Encoding: UTF-8 and ASCII
In the early days of the web, many pages used ASCII or ISO-8859-1 encoding, which could only represent 128 or 256 characters respectively. HTML entities were essential for displaying characters outside these limited sets. With the widespread adoption of UTF-8 (which can encode every Unicode character; the code space has 1,114,112 code points, of which the 2,048 surrogates are not characters and are excluded), it is now possible to include most characters directly in HTML source without entities.
However, entities remain necessary for the five reserved HTML characters and are still preferred in several situations: when the source file’s encoding is uncertain, when working with legacy systems, when a character is difficult to type on a standard keyboard, or when clarity in the source code is more important than brevity (e.g., using &copy; makes the intent clearer than pasting the literal © glyph).
Special Characters in HTML Forms
When users submit form data containing special characters, the browser percent-encodes the values and places them in the URL query string (for GET requests) or, with the default application/x-www-form-urlencoded content type, in the request body (for POST requests). The server must decode these values and then re-encode them as HTML entities when rendering them back into the page. Failing to do so is one of the most common sources of XSS vulnerabilities. Always remember: input validation and output encoding are complementary defenses, and output encoding is the more critical of the two.
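The decode-then-escape round trip might look like the following sketch. The query string is a made-up example, and a real application would normally receive already-decoded values from its web framework; the point is that the two steps are separate and the HTML escaping happens last, at output time.

```python
import html
from urllib.parse import parse_qs

# Hypothetical raw query string as it might arrive from a submitted form.
raw_query = "name=%3Cb%3ETom%3C%2Fb%3E+%26+Jerry"

# Step 1: URL-decode the submitted value (parse_qs also turns "+" into a space).
fields = parse_qs(raw_query)
name = fields["name"][0]  # "<b>Tom</b> & Jerry"

# Step 2: HTML-escape just before the value is written into the page.
greeting = f"<p>Hello, {html.escape(name)}!</p>"
print(greeting)
# <p>Hello, &lt;b&gt;Tom&lt;/b&gt; &amp; Jerry!</p>
```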
HTML Entities vs URL Encoding
HTML entity encoding and URL encoding (percent-encoding) serve different purposes and should not be confused:
- HTML entities (&amp;, &lt;) are used inside HTML documents to represent reserved characters in the HTML context.
- URL encoding (%26, %3C) is used inside URLs to represent characters that have special meaning in the URL syntax (such as &, =, ?, /).
When an HTML page contains a URL with query parameters, the ampersands in the URL must be HTML-entity-encoded as &amp; in the href attribute. For example: <a href="/search?q=hello&amp;page=2">. Omitting this encoding can cause the HTML parser to misinterpret &page as an entity reference. URL parameters themselves should use encodeURIComponent for proper percent-encoding.
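The two layers can be applied in sequence: percent-encode the parameter values first, then HTML-escape the finished URL when it is placed in the attribute. The sketch below uses Python's `urllib.parse.urlencode` as a stand-in for encodeURIComponent (it encodes spaces as `+`, which differs slightly); `search_link` is an illustrative name.

```python
import html
from urllib.parse import urlencode


def search_link(query: str, page: int) -> str:
    """Build an <a> tag: percent-encode the parameters, then HTML-escape the URL."""
    url = "/search?" + urlencode({"q": query, "page": page})  # URL layer
    return f'<a href="{html.escape(url)}">results</a>'  # HTML layer


print(search_link("hello & goodbye", 2))
# <a href="/search?q=hello+%26+goodbye&amp;page=2">results</a>
```

The ampersand inside the search term becomes `%26` because it is data, while the ampersand separating the parameters becomes `&amp;` because it is URL syntax embedded in HTML.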
Frequently Asked Questions
What happens if I don’t escape HTML characters?
Unescaped characters can break your page layout (e.g., a stray < starts a new tag) and open the door to XSS attacks. At minimum, always escape the five reserved characters: &, <, >, ", and '.
Should I encode all non-ASCII characters?
If your page uses UTF-8 encoding (the encoding the HTML5 specification requires for new documents), you do not need to encode non-ASCII characters like accented letters, CJK characters, or emoji. They will display correctly as long as the file is actually saved as UTF-8 and the page declares it, for example with a <meta charset="utf-8"> tag. However, encoding them can improve compatibility with legacy systems and email clients.
Is &apos; valid in HTML?
&apos; is defined in XML and XHTML but was not part of the HTML4 specification. It is valid in HTML5 and supported by all modern browsers. For maximum backward compatibility, you can use the numeric form &#39; instead.
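Some escaping libraries sidestep the question by never emitting the named form. As one example, Python's standard `html.escape` writes the single quote as a hex reference, while `html.unescape` accepts all three spellings:

```python
import html

# The escaper emits a numeric (hex) reference for the single quote.
print(html.escape("it's"))  # it&#x27;s

# When decoding, the named, decimal, and hex forms are all accepted.
for reference in ("&apos;", "&#39;", "&#x27;"):
    assert html.unescape(reference) == "'"
```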
Can HTML entities be nested?
No. HTML entities cannot be nested. A double-escaped reference like &amp;lt; does not produce a less-than sign; it produces the literal text &lt;. The browser decodes entities in a single pass, so each entity resolves independently.
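The single-pass behavior is easy to reproduce outside the browser. This short sketch uses Python's `html` module as a stand-in for the browser's decoder and shows that each decoding pass removes exactly one layer of escaping:

```python
import html

once = html.escape("<")    # "&lt;"
twice = html.escape(once)  # "&amp;lt;" (escaped a second time by mistake)

# One decoding pass undoes exactly one layer of escaping.
assert html.unescape(twice) == "&lt;"
assert html.unescape(html.unescape(twice)) == "<"
print(once, twice)
```

Text that shows up on a page as a visible `&lt;` or `&amp;` is usually a sign that a value was escaped twice somewhere in the pipeline.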
Is my data safe when using this tool?
Yes. This tool runs entirely in your browser using JavaScript. No data is sent to any server. You can verify this by opening your browser’s developer tools and monitoring the Network tab while using the tool — you will see zero outgoing requests containing your input.