Anatomy of an Amazon EC2 Resource ID

September 21st, 2009 |  Published in Analysis  |  49 Comments

New technique enables observation of EC2 usage, uncovers stunning data

Each time you allocate a resource using EC2 – an instance, a volume or a snapshot – you receive a unique identifier. This is the EC2 resource ID. Have you ever wondered what this ID represents? Well, I did. After noticing similarities between the IDs of resources requested in close succession, I started digging.

The outcome of this digging is a definition of the components that make up an EC2 resource ID. The marvel is that this definition allows us to externally count the number of resources provisioned within a certain time frame – enabling us for the first time to observe EC2’s usage patterns. For example, we can count how many instances are launched on a certain day, in a given EC2 region.

Before continuing, I’d like to emphasize that these findings are circumstantial. While the patterns are indisputable, there remain unknowns and quirks that remind us that such “black box” observation has its limits. Note also that we can estimate how many new resources are created but not how many are already active, how many were later deleted, etc. The total number of servers running on EC2 remains a mystery.

Results

In one 24-hour period measured in September 2009, the estimation indicated the following volume of usage on Amazon EC2’s us-east-1 region:

50,242 instances requested

12,840 EBS volumes requested

30,925 EBS snapshots requested

41,121 reservations requested

Disambiguation: a reservation in this context is an atomic launch of one or more instances. This does not imply a reserved instance. For example, if you launch 1 instance, you get 1 instance ID and 1 reservation ID; if you launch 2 instances in one command, you get 2 instance IDs and still 1 reservation ID.

These numbers are impressive, to say the least. Even more impressive is a small hint, lurking between the numbers, that implies that just over the past month Amazon crossed a significant threshold (see below for more details):

8.4 million EC2 instances launched (since EC2’s debut).

UPDATE (Oct 7th 2009): RightScale applied the findings to the two years’ worth of data they have in their systems. Based on that data, they estimate the number of instances launched is actually 15.5 million! They also plotted the numbers over two years – worth checking out.

Anatomy of a Resource ID

So how were the numbers above calculated? To find out, let’s decompose an EC2 resource ID. After comparing hundreds of IDs, this opaque identifier turned out to be a little more transparent than you’d expect.

Type

The most trivial of the fields, the type is one of the following values, depending on the resource type:

i – instance

r – reservation

vol – EBS volume

snap – EBS snapshot

ami – Amazon machine image

aki – Amazon kernel image

ari – Amazon ramdisk image

Inner ID

The Inner ID is a 16-bit counter of resources allocated. Each time a resource is requested, the Inner ID increments by one. For instance and reservation IDs, it increments by two (i.e., these Inner IDs are always even). Instead of counting from 0-FFFF as you’d expect, the Inner ID uses the following cycle:

4000-7FFF

0000-3FFF

C000-FFFF

8000-BFFF

(This cycle can be easily normalized by XORing with 4000.) When the Inner ID has exhausted its space, a new series begins (see below) and the cycle restarts.
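The normalization can be sketched in a few lines of Python. This is a minimal illustration, assuming the Inner ID is handled as a 16-bit integer; the function name normalize_inner_id is hypothetical and not part of any EC2 tooling.

```python
def normalize_inner_id(inner_id: int) -> int:
    """Map an observed 16-bit Inner ID onto its position in the cycle."""
    if not 0 <= inner_id <= 0xFFFF:
        raise ValueError("Inner ID must fit in 16 bits")
    # XOR with 0x4000 flips a single bit, which reorders the quarters so that
    # the observed sequence 4000-7FFF, 0000-3FFF, C000-FFFF, 8000-BFFF
    # becomes a plain 0000-FFFF count.
    return inner_id ^ 0x4000


# First value of each quarter of the cycle, in the order listed above.
cycle_starts = [0x4000, 0x0000, 0xC000, 0x8000]
normalized_starts = [normalize_inner_id(v) for v in cycle_starts]
print([f"{v:04X}" for v in normalized_starts])  # ['0000', '4000', '8000', 'C000']
```

After normalization the four quarters appear in ascending order, so two Inner IDs from the same series can be compared by simple subtraction.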

Series Marker

For a given resource type, there is one active 8-bit Series ID. This Series ID, however, is not embedded directly into the resource ID. Instead, it is XORed with the leftmost 8 bits of the Inner ID. The result, which I call the Series Marker, is embedded in the ID to the left of the Inner ID.

For example, for a resource ID whose Series Marker is a7 and whose Inner ID begins with 42, the Series ID would be e5 = a7 XOR 42.

Series IDs usually decrement by one each time the Inner ID completes a cycle. I say “usually” because while this is the most common behavior, from time to time Series IDs seem to jump around in a pattern which is yet to be explained.

UPDATE (Oct 7th 2009): RightScale contributed the missing piece: to normalize a series ID, XOR with E5 – this irons out the “jumps” I noticed perfectly.
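A short Python sketch of the Series ID extraction follows. It assumes the eight hex digits after the type prefix are laid out as Superseries Marker (1 byte), Series Marker (1 byte), Inner ID (2 bytes), which is how the fields are described in this post. The sample ID i-31a74258 is reconstructed from the post’s two XOR examples and should be treated as illustrative; the function names are hypothetical.

```python
import string

SERIES_NORMALIZER = 0xE5  # constant taken from the RightScale update


def split_hex(resource_id: str) -> str:
    """Return the 8 hex digits after the type prefix ('i-31a74258' -> '31a74258')."""
    _, _, hex_part = resource_id.partition("-")
    if len(hex_part) != 8 or not all(c in string.hexdigits for c in hex_part):
        raise ValueError(f"expected 8 hex digits in {resource_id!r}")
    return hex_part


def series_id(resource_id: str) -> int:
    """Recover the Series ID: Series Marker XOR leftmost byte of the Inner ID."""
    hex_part = split_hex(resource_id)
    series_marker = int(hex_part[2:4], 16)
    inner_high = int(hex_part[4:6], 16)
    return series_marker ^ inner_high


def normalized_series_id(resource_id: str) -> int:
    """Apply the XOR-with-E5 normalization to the recovered Series ID."""
    return series_id(resource_id) ^ SERIES_NORMALIZER


print(f"{series_id('i-31a74258'):02x}")    # e5
print(normalized_series_id("i-31a74258"))  # 0
```

One possible reading of the update: if the normalized value simply counts 0, 1, 2, 3, the raw Series IDs would read e5, e4, e7, e6, that is, mostly decrementing with an occasional jump upward. This follows from the XOR arithmetic alone and is not confirmed by the data in the post.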

Superseries Marker

For a given resource type, there is one active 8-bit Superseries ID. Like the Series ID, the Superseries ID is not embedded directly into the resource ID. Instead, it is XORed with the rightmost 8 bits of the Inner ID. The result – the Superseries Marker – is the leftmost byte of the resource ID.

For example, for a resource ID whose Superseries Marker is 31 and whose Inner ID ends with 58, the Superseries ID would be 69 = 31 XOR 58.

The Superseries ID changes so rarely that originally I had assumed it was some kind of checksum. This would have been odd, as it would limit the total available IDs to 2^24 = 16.8 million. Until very recently, the Superseries ID for all resource types (instances, images, volumes, snapshots, etc.) was 69 in the us-east-1 region (for eu-west-1 the Superseries ID is 74). These days, new instances use the Superseries ID 68. This subtle change, unnoticed by the industry, may hint at an astonishing achievement: 8.4 million instances launched since EC2’s debut! (Instance IDs are even, so 8.4M = 16.8M / 2.)

UPDATE (Oct 7th 2009): RightScale suggested normalizing the Superseries ID by XORing with 69. With this technique, the Superseries ID for us-east-1 was 0, and the recent change incremented it to 1 (68 XOR 69 = 1).
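The Superseries arithmetic can be checked with a small Python sketch. It assumes the Superseries Marker is the first byte after the type prefix and the Inner ID is the last two bytes, as described in this section. The ID i-31a74258 is reconstructed from the post’s worked examples, i-284f3a41 is the instance ID quoted by a commenter on this post, and the function and constant names are hypothetical.

```python
US_EAST_1_BASE = 0x69  # normalization constant suggested for us-east-1


def superseries_id(resource_id: str) -> int:
    """Recover the Superseries ID: Superseries Marker XOR rightmost byte of the Inner ID."""
    _, _, hex_part = resource_id.partition("-")
    if len(hex_part) != 8:
        raise ValueError(f"expected 8 hex digits in {resource_id!r}")
    superseries_marker = int(hex_part[0:2], 16)
    inner_low = int(hex_part[6:8], 16)
    return superseries_marker ^ inner_low


def normalized_superseries_id(resource_id: str, base: int = US_EAST_1_BASE) -> int:
    """XOR with the regional base so that the first Superseries reads as 0."""
    return superseries_id(resource_id) ^ base


# Capacity of one Superseries: 8-bit Series ID x 16-bit Inner ID.
ids_per_superseries = 2 ** 24                            # 16,777,216
instance_ids_per_superseries = ids_per_superseries // 2  # even IDs only: 8,388,608

print(f"{superseries_id('i-31a74258'):02x}")    # 69
print(f"{superseries_id('i-284f3a41'):02x}")    # 69
print(normalized_superseries_id("i-31a74258"))  # 0
print(0x68 ^ US_EAST_1_BASE)                    # 1
```

Both sample IDs decode to Superseries ID 69, which is consistent with the us-east-1 value reported above.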

Regions

Note that since each EC2 region is a completely separate system, the IDs in each region are independent of each other.

Counting Resources

Now that we have an idea of what an ID represents, how do we use that knowledge to estimate the number of resources provisioned by EC2 in a given time frame? The process is quite straightforward, and can be applied to time frames ranging from minutes up to weeks, months and years.

During the 24-hour period measured, one resource of each type was requested from EC2 every hour. In practice this means an instance was launched, an EBS volume was created and an EBS snapshot taken. The IDs that EC2 assigned to these resources were recorded, along with the time of their creation (as indicated in the timestamp returned from EC2 itself). Finally, the resources were released (instance terminated, volume and snapshot deleted) in order to minimize expenses. This process repeated every hour, which seems to be frequent enough so as not to miss any series rollovers.

The results – IDs and timestamps – were then analyzed using a combination of scripts and Excel spreadsheets. The Superseries, Series and Inner IDs were extracted from the resource IDs. Finally, the IDs were normalized and combined to yield a single number – a number that represents the continuum of resource IDs.

With this number, it’s plain sailing to measure or plot how many resources EC2 provisioned between any two samples.
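The counting step can be sketched in Python as follows. The post does not spell out how the normalized fields were combined, so the bit layout used here (normalized Superseries ID in the top 8 bits, normalized Series ID in the next 8, normalized Inner ID in the low 16) is an assumption, as are the function names. The first sample ID is reconstructed from the post’s XOR examples; the other two are constructed purely for illustration.

```python
import string


def continuum(resource_id: str, superseries_base: int = 0x69) -> int:
    """Combine the normalized fields of a resource ID into one 32-bit position.

    Assumed layout (not confirmed by the post):
    superseries << 24 | series << 16 | inner.
    """
    _, _, hex_part = resource_id.partition("-")
    if len(hex_part) != 8 or not all(c in string.hexdigits for c in hex_part):
        raise ValueError(f"expected 8 hex digits in {resource_id!r}")
    raw = int(hex_part, 16)
    superseries_marker = raw >> 24
    series_marker = (raw >> 16) & 0xFF
    inner = raw & 0xFFFF

    # Undo the XOR masking, then apply the normalization constants.
    superseries = (superseries_marker ^ (inner & 0xFF)) ^ superseries_base
    series = (series_marker ^ (inner >> 8)) ^ 0xE5
    inner_normalized = inner ^ 0x4000
    return (superseries << 24) | (series << 16) | inner_normalized


def resources_between(first_id: str, second_id: str, step: int = 1) -> int:
    """Estimate how many resources were provisioned between two samples.

    Use step=2 for instance and reservation IDs, whose Inner IDs advance by two.
    """
    return (continuum(second_id) - continuum(first_id)) // step


# Hypothetical samples within one series, then across a series rollover.
print(resources_between("i-31a74258", "i-31a64358", step=2))  # 128
print(resources_between("i-31a74258", "i-69a44000", step=2))  # 32468
```

Both samples must come from the same region and resource type, since each region and type keeps its own counters.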

Summary

The analysis, measurements and description above are based purely on observation. I cannot make any guarantees as to the accuracy of the technique. Even with confidence regarding the analysis of an ID, whether or not we can use that to infer overall usage is open to debate. In theory, Amazon could be allocating resources internally for various purposes. Is this performed on a scale large enough to throw the figures off course? Only time (and Amazon) will tell.

Final word: if you have any insights, corrections or additions to this research – please feel free to jump into the conversation or email me. I’ll be sure to give credit in updates or future posts.

Thanks to Eric Hammond, Nati Shalom, Avner Algom and Peter Weinstein of the IGT for reading drafts of this.

Responses

Shlomo says:

September 21st, 2009 at 1:11 pm (#)

Guy,

Thanks for this fantastic analysis of these opaque IDs.

Can you share the original spreadsheet data so others can try to crack the remaining unknowns?

Guy Rosen says:

September 21st, 2009 at 2:41 pm (#)

@Shlomo,

Thanks for the compliments. I will consider sharing my IDs, although perhaps it might be something that is better to send in private to any interested parties.

The significant gap remains the Series IDs – as I said, sometimes they jump instead of just decreasing by 1 as expected. In the day sampled for the chart this happened once, and I had to “correct” in order for the graph to line up.

In previous data sets I collected this also happened: the Series ID jumps up by a few numbers instead of just decrementing.

BTW, if you have various logs with IDs and timestamps from your own activity, it would be great to receive additional confirmation of the findings.

Tweets that mention Anatomy of an Amazon EC2 Resource ID :: Jack of all Clouds — Topsy.com says:

September 21st, 2009 at 3:17 pm (#)

[…] This post was mentioned on Twitter by garnaat, Guy Rosen, Carl Brooks, Shlomo Swidler and others. Shlomo Swidler said: RT @guyro: How to measure @AmazonAWS adoption by analyzing #EC2 ID numbers – http://bit.ly/4zHHAk […]

mike says:

September 21st, 2009 at 4:11 pm (#)

where exactly are you obtaining the id’s from?

Guy Rosen says:

September 21st, 2009 at 4:12 pm (#)

@mike – I simply provision instances/volumes/snapshots from my own EC2 account.

mike says:

September 21st, 2009 at 4:12 pm (#)

disregard my previous comment – it’s early and i missed that part ;/

mike says:

September 21st, 2009 at 4:13 pm (#)

great insight btw – thanks!

CloudBzz says:

September 21st, 2009 at 5:43 pm (#)

Guy – this is amazing analysis. Have you been able to uncover the instance ID pattern in other clouds, or just Amazon?

Dmitriy says:

September 21st, 2009 at 5:56 pm (#)

Phenomenal depth – thank you very much for this research.

I have been casually observing IDs and also noticed their tendency to be even. But lately (say within a month) I have been noticing more odd numbers.

It also looked to me like the EU region started having more odd IDs sooner than the US – which makes me think a better randomization of IDs was in the works for some time and was deployed to EU first.

Just FYI.

caleb says:

September 21st, 2009 at 6:13 pm (#)

I agree with most of what you say, but I have a couple of long running instances that have an odd INNER id.

i-284f3a41

for example

99% of my instances however, are even.

Guy Rosen says:

September 21st, 2009 at 6:51 pm (#)

@CloudBzz – have not ventured this deep into other cloud providers YET 🙂

dubek says:

September 21st, 2009 at 7:20 pm (#)

I guess it’s another case of the German Tank Problem:

http://en.wikipedia.org/wiki/German_tank_problem

Very nice work!

Guy Rosen says:

September 21st, 2009 at 8:02 pm (#)

@dubek – wow, I learned something today! Luckily it’s easier for us: our “tanks” are imprinted with dates of manufacture!

Amazon’s Cloud Computing service in rapid expansion | EstratégiasIT says:

September 22nd, 2009 at 10:31 am (#)

[…] EC2, Amazon’s Cloud Computing service, is expanding rapidly. One study estimates that 50,000 new instances are launched per day. Technology consultant Guy Rosen […]

Four short links: 22 September 2009 | Tech-monkey.info Blogs says:

September 22nd, 2009 at 12:22 pm (#)

[…] EC2 Usage Guessed From Sequential IDs: The Superseries ID changes so rarely that originally I had assumed it was some kind of checksum. This would have been odd as it limits the total available IDs to 2^24 = 16.8 million. Up to very recently, the Superseries ID for all resource types – instances, images, volumes, snapshots, etc. – was 69 in the us-east-1 region (for eu-west-1 the Superseries ID is 74). These days, new instances use the Superseries ID 68. This subtle change, unnoticed by the industry, may hint at an astonishing achievement: 8.4 million instances launched since EC2’s debut! (Instance IDs are even so 8.4M = 16.8M / 2.) (via mattb on delicious) […]

Corey Leong (coreyleong) ‘s status on Tuesday, 22-Sep-09 12:07:06 UTC – Identi.ca says:

September 22nd, 2009 at 2:07 pm (#)

[…] helpful article on ec2’s resource id’s http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/ […]

Twitted by wespennest says:

September 23rd, 2009 at 11:40 am (#)

[…] This post was Twitted by wespennest […]

Amazon launches 50,000 new instances in one day, 8.4 million in total | Server in den Wolken says:

September 23rd, 2009 at 12:11 pm (#)

[…] Anatomy of an Amazon EC2 Resource ID […]

Philipp says:

September 23rd, 2009 at 8:44 pm (#)

Thank you for this great article. I linked to it in my blog, but since it is in German only I thought it might be nice to let you know through a comment.

I found your blog through this article and will definitely keep it in my feedreader. Keep up the good work.

Greetings from Germany.

Philipp

egrep-cloud-cambrian-watch-2009-09-24 « すでにそこにある雲 says:

September 24th, 2009 at 12:46 pm (#)

[…] Anatomy of an Amazon EC2 Resource ID :: Jack of all Clouds – 50,000 instances created in a single day? Amazing […]

Erik Giberti says:

September 27th, 2009 at 3:50 am (#)

Guy,

Thanks for posting this, although I’m not positive this is entirely accurate. I recently ran into an issue using Elastic Load Balancing that would make me think otherwise.

I was leveraging AutoScaling behind an ELB, but wasn’t letting AutoScaling put the machine in automatically (probably a mistake). As such – if a machine was terminated abruptly, either by an administrator or AutoScaling and the shutdown sequence failed to run the de-register code I would end up with instances in the ELB that were no longer assigned.

Usually within a day or two a new instance will fire up with the same instance id. The ELB then saw this instance as active and so started sending requests to the instance that was no longer owned by me. I find it hard to believe the entire ID space and the entire series marker had cycled through in less than 48 hours. I think there may be more to this than you’ve uncovered thus far.

Erik

Amazon EC2 Shows Amazing Growth | CloudAve says:

September 27th, 2009 at 10:28 am (#)

[…] the state of the cloud, has done some research on the resource identifier used by Amazon EC2 and come up with some interesting stats. I thought I will add it here at Cloud Ave for the benefit of our readers. During one 24 hour period […]

Chase says:

September 28th, 2009 at 3:48 pm (#)

hey Guy,

uve been quoted here:

http://bigdatamatters.com/bigdatamatters/2009/09/private-cloud-eucalyptus.html

btw, its a nice blog post

vipin sahu says:

September 30th, 2009 at 2:45 pm (#)

nice analysis thanks for the info