Aug 7, 2019

VAST (Video Ad Serving Template) Explained

How do digital video ads work? Great question. Let's find out.

What is VAST?

VAST (Video Ad Serving Template) is a specification used by ad platforms and video players to deliver and play back digital video ads.

The flow of VAST is simple: a publisher's video player calls a VAST tag from an ad server or SSP and receives a VAST response.

VAST responses are XML documents. Each XML document contains metadata that players use to play back an ad and to notify interested parties about the ad transaction.

Sound nerdy? We are just getting started, Champ.

You might be all worried that you do not know what XML is, but that’s ok! Just think about it as a language used to communicate data between computer programs. XML is a way to structure and deliver data.

How does VAST work?

A video ad player can request VAST from an ad server by calling a VAST tag. A VAST tag is a URL provided by an ad server that looks something like this:

Let’s break this tag down to understand what is happening here.

Domain / Server

If you were mailing a letter, you would need an address. The domain name within a VAST tag is like the city and street name on your letter.

An ad player calls over the internet to a domain name, which then points to an ad server that will ultimately return a VAST response.

But much like a city and street name, the domain only provides a general location. Our example VAST tag points to the Ad Tech Explained ad server (

Partner / Publisher Identifier

Typically there will be a publisher or partner identifier within the tag. In this tag it is 1234. The partner identifier will tell an ad server exactly which publisher is making this VAST request. To continue our metaphor — if the domain points to the city and street, the ID points to the house.


Ad servers use parameters to collect information about the environment a potential advertisement will run on. Think of the parameters as the actual contents of our metaphorical letter. The ad server uses this information to select which ad it should return to this partner.

In our example, we are telling the ad server that the ad is displaying on the world-renowned website in a player that is 1920 pixels wide and 1080 pixels tall, and the user’s IP address is

These values are valuable pieces of metadata that allow an ad server to target a specific website, player size, or even a user location using IP geolocation.
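To make the pieces of a tag concrete, here is a minimal sketch that assembles one from a server address, a partner ID, and a set of parameters. Everything in it is an assumption for illustration: the `build_vast_tag` helper, the `adserver.example.com` host, the parameter names (`url`, `w`, `h`, `ip`), and the IP address are hypothetical placeholders, not the values from the example tag. Real ad servers each define their own parameter names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_vast_tag(base_url: str, partner_id: str, params: dict) -> str:
    """Join the ad server address, partner ID, and parameters into one URL."""
    # urlencode escapes characters that are not safe inside a query string.
    return f"{base_url}/{partner_id}?{urlencode(params)}"


# Hypothetical values: example.com and 203.0.113.7 are reserved for documentation.
tag = build_vast_tag(
    "https://adserver.example.com/vast",
    "1234",
    {"url": "https://www.example.com", "w": 1920, "h": 1080, "ip": "203.0.113.7"},
)
print(tag)
# https://adserver.example.com/vast/1234?url=https%3A%2F%2Fwww.example.com&w=1920&h=1080&ip=203.0.113.7

# On the other end, an ad server can read the same values back out of the query string.
received = parse_qs(urlsplit(tag).query)
print(received["w"], received["h"])  # ['1920'] ['1080']
```

The partner ID sits in the path here, but some servers carry it as just another query parameter. Either way, the server ends up with the same three ingredients: where to go, who is asking, and what the environment looks like.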

Once an ad server selects an ad, it returns an XML VAST response to the video player.

Think of calling a URL in your web browser — when you go to a web page, you are requesting data from a server (actually many servers) that will return everything your browser needs to display a web page, including HTML, text, images, and more. An ad server sends data back in the raw markup language of XML, which a video player then uses to run an ad.

So essentially, the video player is like, “Yo, what up ad server. I need an ad, here’s what's up,” and the ad server is all like, “Hold on man, I got you, lemme find something. Ok, here you go, take this.”

The "this" is the XML VAST response.

VAST Response Walkthrough

Once the video player receives the XML, it reads the metadata contained in the VAST document and extracts the values required to play back an ad.

The video platform also needs to notify the various ad platforms involved in the ad transaction. The XML document consists of items called elements and attributes that contain all the information a video player needs to play an ad and inform ad platforms about what is going on.
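If elements and attributes still feel abstract, the sketch below reads a few of them with Python's standard XML parser. The `SAMPLE_VAST` document is a heavily trimmed, made-up stand-in (the ad ID, ad system name, and title are all invented), so treat it as a shape to look at rather than a complete, valid VAST response.

```python
import xml.etree.ElementTree as ET

# A made-up, trimmed-down document; a real VAST response carries many more elements.
SAMPLE_VAST = """<VAST version="4.2">
  <Ad id="demo-ad">
    <InLine>
      <AdSystem>Demo Ad Server</AdSystem>
      <AdTitle>Demo Video Ad</AdTitle>
    </InLine>
  </Ad>
</VAST>"""

root = ET.fromstring(SAMPLE_VAST)

# An element has a name (its tag) and may carry attributes inside the opening tag.
print(root.tag)             # VAST
print(root.get("version"))  # 4.2

# Elements nest inside each other, so a path walks down to the one we want.
ad = root.find("Ad")
title = root.findtext("Ad/InLine/AdTitle")
print(ad.get("id"), "/", title)  # demo-ad / Demo Video Ad
```

A video player does the same thing at a larger scale: it walks the tree, picks out the elements it cares about, and reads their attributes and text.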

Check out this sample VAST response to understand each of these items and how they make up a final digital video advertisement.

There are many items within a VAST response, but we will focus on the most important. If you are interested in learning about all VAST fields and masochism is your thing, download the latest 4.2 specification put out by the IAB on their website.

Media Files / Creatives

For now, we will focus on possibly the most important value in the VAST response, the actual video file, or in industry parlance, the creative.

This video file is the ad itself and what the user views in the video player. If we comb through our VAST document, we will see a media file element, which sounds like what we are trying to find. Anything wrapped in less-than (<) and greater-than (>) characters is an element tag, so let’s look at the <MediaFiles> element:

    <MediaFiles>
        <MediaFile id="5241" delivery="progressive" type="video/mp4" bitrate="2000" width="1280" height="720" minBitrate="1500" maxBitrate="2500" scalable="1" maintainAspectRatio="1" codec="0"><![CDATA[]]></MediaFile>
        <MediaFile id="5244" delivery="progressive" type="video/mp4" bitrate="1000" width="854" height="480" minBitrate="700" maxBitrate="1500" scalable="1" maintainAspectRatio="1" codec="0"><![CDATA[]]></MediaFile>
        <MediaFile id="5246" delivery="progressive" type="video/mp4" bitrate="600" width="640" height="360" minBitrate="500" maxBitrate="700" scalable="1" maintainAspectRatio="1" codec="0"><![CDATA[]]></MediaFile>
    </MediaFiles>

A lot is going on here but let’s break it down.

An element starts with an opening tag containing the element name (ex. <MediaFiles>) and ends with a closing tag that has a slash in front of that name (ex. </MediaFiles>). So we can see that there are multiple <MediaFile> elements within the main <MediaFiles> element. Multiple media file elements mean that there are several different video files to choose from. Why is this?

The answer lies within the attributes of the various <MediaFile> elements. Inside each opening tag, after the MediaFile element name, you will see attribute names and associated values such as width, height, bitrate, and more.

These attributes provide information about each creative file. A video player can read them and select the video file best suited to the environment in which it intends to play the ad.

For example, if a user plays the ad on a mobile device, the player should select the smallest video file so that playback does not eat up their monthly data cap. However, if the user played the ad on a 4K connected TV in their living room, the video player should select the highest-quality file available.

The video player can look at the height, width, and bitrate to understand more about a file to select the right creative, which is why attributes are crucial — they provide extra information about a particular element.

If a user wants to watch the video in HD quality on their laptop, the video player should select the highest quality video available.
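Here is one way that selection could look in code. This is a sketch under stated assumptions, not how any particular player does it: the `pick_media_file` helper and its rule (take the highest-bitrate file that fits the player width and a bitrate budget, else fall back to the smallest file) are hypothetical. The IDs, sizes, and bitrates come from the three media files above, with the other attributes left out for brevity.

```python
import xml.etree.ElementTree as ET

# The three renditions from the sample response, reduced to the attributes used here.
MEDIA_FILES_XML = """<MediaFiles>
  <MediaFile id="5241" type="video/mp4" bitrate="2000" width="1280" height="720"/>
  <MediaFile id="5244" type="video/mp4" bitrate="1000" width="854" height="480"/>
  <MediaFile id="5246" type="video/mp4" bitrate="600" width="640" height="360"/>
</MediaFiles>"""


def pick_media_file(media_files, player_width, max_bitrate_kbps):
    """Return the best rendition that fits the player, or the smallest one if none fit."""
    candidates = [
        mf for mf in media_files
        if int(mf.get("width")) <= player_width
        and int(mf.get("bitrate")) <= max_bitrate_kbps
    ]
    if candidates:
        # Among files that fit, prefer the highest quality.
        return max(candidates, key=lambda mf: int(mf.get("bitrate")))
    # Nothing fits, so fall back to the lightest file available.
    return min(media_files, key=lambda mf: int(mf.get("bitrate")))


media_files = ET.fromstring(MEDIA_FILES_XML).findall("MediaFile")

mobile_choice = pick_media_file(media_files, player_width=640, max_bitrate_kbps=800)
tv_choice = pick_media_file(media_files, player_width=3840, max_bitrate_kbps=10000)
print(mobile_choice.get("id"))  # 5246
print(tv_choice.get("id"))      # 5241
```

Real players weigh more than this, such as the MIME type they can decode and the measured connection speed, but the idea is the same: read the attributes, then choose.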

The next step is to look between <MediaFile> and </MediaFile> to find the URL that leads directly to the video file.

The value of the <MediaFile> element is a URL that points to an MP4 video file hosted on an external server.
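A small sketch of pulling that URL out of the element follows. The `cdn.example.com` address is a hypothetical placeholder, not the creative URL from the sample response. The CDATA wrapper just tells the XML parser to treat the contents as plain text, so the parser hands back the URL as the element's text.

```python
import xml.etree.ElementTree as ET

# Hypothetical creative URL; example.com is reserved for documentation.
MEDIA_FILE_XML = """<MediaFile id="5241" delivery="progressive" type="video/mp4" width="1280" height="720">
  <![CDATA[https://cdn.example.com/ads/demo-720p.mp4]]>
</MediaFile>"""

media_file = ET.fromstring(MEDIA_FILE_XML)

# The text often carries surrounding whitespace and line breaks, so strip it.
creative_url = (media_file.text or "").strip()
print(creative_url)  # https://cdn.example.com/ads/demo-720p.mp4
```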

If you copy this path and paste it into your web browser address bar, your browser will play back the video file in its native video player, and you can view the ad, but no one will know. How sad.

So now that the player has the creative to play back, how does anyone know it was delivered? That’s where tracking events come in!

Tracking Events

    <Tracking event="start"></Tracking>
    <Tracking event="firstQuartile"></Tracking>
    <Tracking event="midpoint"></Tracking>
    <Tracking event="thirdQuartile"></Tracking>
    <Tracking event="complete"></Tracking>

Publisher video players can notify ad servers and other partners when a specific event has occurred. More specifically, there are events for when the ad starts, when it reaches 25% (firstQuartile), 50% (midpoint), and 75% (thirdQuartile) of its duration, and when it completes (complete).

When a video player reaches any of these events during playback, the video player should call the URL provided in that specific tracking event. These URLs are often referred to as "tracking pixels" or "pixels" for short. Calling the URL sends a signal to an ad server or platform that the event has occurred, and they can now increment that metric for this specific ad by 1.
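To show the timing logic, here is a rough sketch of how a player might decide which events are due as playback advances. The `events_to_fire` helper and the 30-second ad are assumptions for illustration. A real player would also request the tracking URL for each event it returns, which this sketch leaves out.

```python
# Each event name paired with the fraction of the ad that must have played.
QUARTILE_THRESHOLDS = [
    ("start", 0.0),
    ("firstQuartile", 0.25),
    ("midpoint", 0.50),
    ("thirdQuartile", 0.75),
    ("complete", 1.0),
]


def events_to_fire(position_s, duration_s, already_fired):
    """Return events newly reached at this playback position, each at most once."""
    if duration_s <= 0:
        return []
    progress = position_s / duration_s
    due = [
        name for name, threshold in QUARTILE_THRESHOLDS
        if progress >= threshold and name not in already_fired
    ]
    # Remember what fired so the same pixel is not called twice.
    already_fired.update(due)
    return due


fired = set()
print(events_to_fire(0.0, 30.0, fired))   # ['start']
print(events_to_fire(8.0, 30.0, fired))   # ['firstQuartile']
print(events_to_fire(10.0, 30.0, fired))  # []
print(events_to_fire(30.0, 30.0, fired))  # ['midpoint', 'thirdQuartile', 'complete']
```

The `already_fired` set matters: each event should be counted once per ad play, even if the player checks its position many times per second.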

It is typical to see multiple tracking pixels for the same event from different ad tech vendors in a VAST response. Many parties may be involved in an ad transaction, and each one must count the events for a particular ad if they want to prove it ran and collect that cheddar.
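The sketch below groups pixels by event so a player can call every vendor's URL when that event happens. The two hosts (`adserver.example.com` and `vendor.example.net`), their URLs, and the `pixels_by_event` helper are hypothetical and stand in for whichever parties are involved in a given transaction.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical pixels from two made-up vendors.
TRACKING_XML = """<TrackingEvents>
  <Tracking event="start"><![CDATA[https://adserver.example.com/track?e=start]]></Tracking>
  <Tracking event="start"><![CDATA[https://vendor.example.net/pixel?e=start]]></Tracking>
  <Tracking event="complete"><![CDATA[https://adserver.example.com/track?e=complete]]></Tracking>
</TrackingEvents>"""


def pixels_by_event(tracking_events):
    """Map each event name to the list of tracking URLs to call for it."""
    grouped = defaultdict(list)
    for tracking in tracking_events.findall("Tracking"):
        url = (tracking.text or "").strip()
        if url:  # skip empty Tracking elements
            grouped[tracking.get("event")].append(url)
    return dict(grouped)


pixels = pixels_by_event(ET.fromstring(TRACKING_XML))
for url in pixels["start"]:
    print("would call:", url)
```

When the ad starts, the player calls both "start" URLs, so the ad server and the second vendor each get to count the same event.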

And voila! You now have the building blocks to form a basic understanding of how VAST works.

To summarize, VAST is an XML template returned by an ad server or SSP when a video player calls a VAST tag. The video player then reads specific parts of the XML document in the response and extracts the metadata required to facilitate a video ad transaction. The video player plays back the creative and notifies the platforms involved in the transaction when specific events occur. Easy peasy!
