HTML Hypertext Markup Language - the Fundamentals (1)

Anatomy of HTML

<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Metadata: character encoding, title, description -->
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta name="description" content="Brief description of the page">

  <!-- Title shown in the browser tab -->
  <title>My Web Page</title>

  <!-- Link external CSS stylesheet -->
  <link rel="stylesheet" href="styles.css">

  <!-- Optional internal CSS -->
  <style>
    body {
      font-family: Arial, sans-serif;
    }
  </style>
</head>

<body>
  <!-- Visible content of the page -->
  <header>
    <h1>Welcome to My Website</h1>
  </header>

  <nav>
    <!-- Navigation links -->
    <ul>
      <li><a href="#">Home</a></li>
      <li><a href="#">About</a></li>
    </ul>
  </nav>

  <main>
    <!-- Main content area -->
    <section>
      <h2>About This Page</h2>
      <p>This is a sample paragraph in the main section.</p>
    </section>
  </main>

  <footer>
    <!-- Footer content -->
    <p>&copy; 2025 My Website</p>
  </footer>

  <!-- Link external JavaScript file (best practice: before closing </body>) -->
  <script src="script.js"></script>

  <!-- Optional inline JavaScript -->
  <script>
    console.log('Page loaded');
  </script>
</body>
</html>

  1. <!DOCTYPE html>: Declares the document type as HTML so that the browser renders the page in standards mode.
  2. <html lang="">: The root element that wraps the whole HTML page. For internationalization and accessibility, we usually specify the language with lang (en, zh, fr, ja, ar, es).
  3. <head>: Contains information about the page rather than its content: metadata, the title, and internal or external CSS. Nothing in the <head> is rendered in the page itself.
    • <meta>: Metadata such as the character encoding (usually UTF-8), viewport settings, and SEO data.
    • <title>: The text shown in the browser tab and in search engine results.
    • <link>: A reference to an external CSS style sheet or a browser tab icon.
    • <style>: Internal CSS.
    • <script>: A reference to JavaScript. A plain <script> in the <head> blocks HTML parsing while it downloads and runs, so for better performance we add the defer attribute (download in parallel, run after the document is parsed) or the async attribute (download in parallel, run as soon as the script is ready).
  4. <body>: Contains everything that is displayed on the page. It usually includes <header>, <nav>, <main>, <section>, and <footer>.
  5. <script>: External or inline JS, traditionally placed at the bottom of the <body>.

White space rule

In HTML element content, any run of whitespace characters (spaces, tabs, line breaks) is collapsed into a single space when the page is rendered, no matter how much whitespace we type.

<p>White space rule</p>
<p>White
space
rule</p>
<!--These two paragraphs are rendered the same in your browser!-->

Tip: Reading the innerHTML of an element from JavaScript returns the whitespace exactly as it was written in the source. This may produce unexpected results if your code assumes the whitespace has been collapsed the way the browser renders it.
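When the whitespace itself matters, one option is the <pre> element, which is not covered above. As a small sketch (the text content is made up for illustration), compare how a <p> and a <pre> render the same characters:

```html
<!-- Rendered on one line, with single spaces between the words -->
<p>Collapsed:    one    two
three</p>

<!-- Rendered with the extra spaces and the line break kept -->
<pre>Preserved:    one    two
three</pre>
```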


Character references

As in almost all programming languages, certain characters have special meanings and are treated as part of the syntax instead of plain text. In Python, for example, to print "" from a double-quoted string we need to escape the quotes: print("\"\"").

HTML also has reserved characters that are part of the markup, not normal text. If you want these characters to show up in the browser instead of being interpreted, you must use HTML character references:

Literal char | Char ref
< | &lt;
> | &gt;
" | &quot;
' | &apos;
& | &amp;
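As a short illustration of the table above (the sentences are invented for the example), the references are written in the source and the browser displays the literal characters:

```html
<!-- Displays: In HTML, you define a paragraph using the <p> element. -->
<p>In HTML, you define a paragraph using the &lt;p&gt; element.</p>

<!-- Displays: Tom & Jerry said "hi" -->
<p>Tom &amp; Jerry said &quot;hi&quot;</p>
```

Without the references, the first line would be parsed as a paragraph containing a second, nested <p> tag instead of showing the text.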

HTML Comments

In most code editors, we can press Ctrl + / to conveniently comment out lines of code, and this works for HTML too.

In HTML, a comment starts with <!-- and ends with -->.

<!--This is a line of comment.-->

The Head of HTML

Metadata and <meta>

Metadata: don’t be put off by the fancy name. Metadata is simply data that describes other data; here, the HTML file is the data being described (for example its author, its date, and its encoding standard).

Adding Metadata:

  • Adding character encoding:
    <meta charset="utf-8" />

    utf-8 is a universal character set that includes pretty much any character from any human language. Declaring it tells the browser how to decode the page's text correctly, which matters for internationalization.

  • Adding more metadata:
    <meta name="author" content="Ian Zhang"/>
    <meta name="description" content="HTML fundamentals" />
    name="?" | Data contained | Still used for SEO?
    author | The author of the page | Optional, not used for ranking
    description | A description of the site | Yes, used for the search result snippet
    keywords | Keywords for the site | Usually ignored by search engines
    generator | The software or platform used to create the page | Usually removed for security reasons
    revisit-after | Asks the search engine to crawl the page again after a certain period of time | Usually ignored by search engines

    These tags are mostly used for SEO purposes and do not directly affect the file or how it renders.

Important Meta Tags

  • Viewport: This tag tells the browser how wide the page should be, relative to the screen, and how zooming and scaling should work

    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    • width= : can be set to a specific number of pixels, like width=600 (only positive values from 1 to 10000 are accepted), or to the special value width=device-width.
      This value establishes the vw unit in CSS.
    • height= : same as width, but controls the vertical axis; the special value is height=device-height.
      This value establishes the vh unit in CSS.
    • initial-scale= : controls the zoom level when the page is first loaded. It ranges from 0.1 to 10, and the default is 1.
    • minimum-scale= / maximum-scale= : the minimum/maximum zoom level allowed; the defaults are 0.1 and 10 respectively.
    • user-scalable= : controls whether zooming is allowed on the page. Accepted values are 0, 1, no, and yes.
    • interactive-widget= : specifies the effect that interactive UI widgets, such as a virtual keyboard, have on the page’s viewports.
      Valid values: resizes-visual, resizes-content, or overlays-content. Default: resizes-visual.
  • Robots: Defines the crawling and indexing behavior that web crawlers should use with the page.

    <meta name="robots" content="noindex" />

    content=:

    • index: the default behaviour; allows robots to index the page.
    • noindex: asks robots not to index the page.
    • follow: also a default behaviour; allows robots to follow links on the page.
    • nofollow: the opposite of follow.
    • all: equivalent to index + follow; used by Google.
    • none: equivalent to noindex + nofollow; used by Google.

    This only affects cooperative robots; malicious ones can simply ignore it. In addition, a robot still has to access the page to read the meta tag, so we can consider using a robots.txt file to reduce bandwidth.
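Several of the values above can be combined in one content attribute, separated by commas. A minimal sketch, assuming a page that should stay out of search results entirely:

```html
<!-- Ask cooperative crawlers not to index this page and not to follow its links -->
<meta name="robots" content="noindex, nofollow">
```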


<link> tag

  • Using the <link> tag, we can add a custom icon for the site.
<link rel="icon" href="favicon.ico" type="image/x-icon" />
  • Linking style sheets.
<link rel="stylesheet" href="styles.css">

These two are the most common uses of the <link> tag; however, it is also used to preload resources and for many other purposes.
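As a rough sketch of the preload use mentioned above (the file names are placeholders, not files from this tutorial), rel="preload" asks the browser to fetch a resource early, and the as attribute says what kind of resource it is:

```html
<!-- Fetch the style sheet early; it is applied by the regular stylesheet link below -->
<link rel="preload" href="styles.css" as="style">
<link rel="stylesheet" href="styles.css">

<!-- Fetch a script early; it still needs a <script> tag to actually run -->
<link rel="preload" href="script.js" as="script">
```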


Applying JS using <script> tag

In the <head>, we usually use the <script> tag to load an external JS script.

The <script> tag usually has two attributes:

  • src=: contains the path to the script
  • defer: a boolean attribute that lets the browser download the script in parallel and run it only after the HTML has been parsed

There are several strategies for running a JS script only after the HTML elements are parsed:

  • Place the <script> at the bottom of the <body> instead of the <head>. However, the browser only starts downloading the script once it reaches that point, so depending on the internet connection and the length of the HTML file, this may cause some latency before the script is applied.
  • Use <script type="module">, so that the browser treats the script as a module, which is deferred by default and executed after all the HTML is parsed.
  • Use the defer attribute on an external script in the <head>, or wrap the code in a DOMContentLoaded event listener. The async attribute also downloads the script in parallel, but it runs the script as soon as it is ready, which may be before parsing finishes. Unless we want to support very old browsers, using type="module" is sufficient.
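The strategies above could look roughly like this in a <head>. This is only a sketch: analytics.js and main.js are hypothetical file names, and a real page would normally pick one approach rather than use all four.

```html
<head>
  <!-- Classic script: downloaded in parallel, executed after parsing, in document order -->
  <script src="script.js" defer></script>

  <!-- Independent script: executed as soon as it has downloaded, possibly before parsing ends -->
  <script src="analytics.js" async></script>

  <!-- Module script: deferred by default, so no defer attribute is needed -->
  <script type="module" src="main.js"></script>

  <!-- Inline code that waits for the DOM explicitly -->
  <script>
    document.addEventListener('DOMContentLoaded', () => {
      console.log('DOM ready');
    });
  </script>
</head>
```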

The body of HTML

Emphasis

In HTML5, we generally prefer <em> over <i> and <strong> over <b>, because <em> and <strong> carry semantic meaning (emphasis and importance) instead of merely describing italic or bold text.

Here’s the best rule you can remember: It’s only appropriate to use <b>, <i>, or <u> to convey a meaning traditionally conveyed with bold, italics, or underline when there isn’t a more suitable element; and there usually is. Consider whether <strong>, <em>, <mark>, or <span> might be more appropriate.

Use the elements based on their semantics, not their appearance!
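To make the rule concrete, here is a small sketch with invented sentences, where each element is chosen for what the text means rather than how it looks:

```html
<!-- Importance: the reader must not miss this -->
<p>You <strong>must</strong> save your work before closing the tab.</p>

<!-- Stress emphasis: changes how the sentence is read aloud -->
<p>I <em>really</em> like this editor.</p>

<!-- Highlight for reference, such as a search match -->
<p>The search matched <mark>HTML</mark> in three places.</p>

<!-- No emphasis intended: a phrase in another language, traditionally set in italics -->
<p>The menu listed <i lang="fr">crème brûlée</i> as the dessert.</p>
```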


Void Elements

Some elements, called void elements, have no closing tag and consist of a single tag:

  • <img>
  • <br>
  • <hr>
  • <input>
  • <link>
  • <meta>

Attributes

Typically, an attribute looks like this:

<tagname attribute="value"></tagname>

We can use attributes to add identifiers, define an element’s behaviour, or apply inline CSS styles:

  • class or id identifiers id="", class=""
  • define behaviour
    <input type="text"> 
    <!-- this defines that the input is text-->
    <!-- you can also set the "type" attribute to "password", "email", etc.-->
  • inline css style=""
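The three uses listed above might look like this together. The id, class names, and text are made up for illustration:

```html
<!-- id should be unique in the page; class can be shared by many elements -->
<p id="intro" class="note highlight">Hello</p>

<!-- type and placeholder define behaviour; required is a boolean attribute with no value -->
<input type="email" placeholder="you@example.com" required>

<!-- style applies inline CSS to this element only -->
<p style="color: red; font-weight: bold;">Warning</p>
```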

List

Ordered and unordered list

  • ordered list <ol>
  • unordered list <ul>

We always use <li> for list items:

<ul>
  <li>This is an unordered list</li>
  <li>This is an unordered list</li>
</ul>

<ol>
  <li>This is an ordered list</li>
  <li>This is an ordered list</li>
</ol>
  • This is an unordered list
  • This is an unordered list
  1. This is an ordered list
  2. This is an ordered list

Nested list

To create a nested list, we simply add a new <ul> or <ol> inside a list item:

<ul>
  <li>The first layer</li>
  <li>The second layer
    <ol>
      <li>The second layer item</li>
    </ol>
  </li>
</ul>
  • The first layer
  • The second layer
    1. The second layer item

Description list

The purpose of description lists is to mark up a set of items and their associated descriptions, such as terms and definitions.

Description lists use a different wrapper than the other list types — <dl>; in addition each term is wrapped in a <dt> (description term) element, and each description is wrapped in a <dd> (description definition) element.

<dl>
  <dt>The HTML</dt>
  <dd>A language that serves as the main skeleton of a website</dd>
  <dt>The CSS</dt>
  <dd>Defines the styling of HTML skeleton</dd>
</dl>
The HTML
A language that serves as the main skeleton of a website
The CSS
Defines the styling of HTML skeleton

To put it plainly, <dt> is like an item in the first layer of a nested list, <dd> is like an item in the second layer, and <dl> is like the <ol> or <ul> that wraps everything up.
Tip: there can be multiple descriptions for one term, just like there can be multiple items in a nested list.
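A minimal sketch of the tip, with wording invented for the example: one <dt> is followed by two <dd> elements, and both descriptions belong to that term.

```html
<dl>
  <dt>HTML</dt>
  <dd>A markup language for structuring web pages</dd>
  <dd>The skeleton that CSS styles</dd>
</dl>
```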