Hi! I’m Dan, and I run Draftsim, a website that helps people practice by drafting in a simulated environment against seven “bots” programmed with artificial intelligence. Along with some key collaborators, I’ve been working on a big project this year that I’m excited to finally share with you.
Have you ever wondered what a draft format looks like in one picture? Or what makes a format really “synergy dependent”? You’re in luck!
This past year we’ve been collecting anonymous data from the thousands of drafts that occur each day on Draftsim. It’s finally time to take a look at what we can learn from all that sweet, sweet data.
I owe all the credit for this project to Arseny Khakhalin (thank you Reddit for helping us connect!) and Bobby Mills from Bard College in New York. They helped me collect all the data on the site, do the analysis, and create these awesome visualizations.
How much data are we talking about? Well, a lot.
Here’s how many human drafts we analyzed for each format:
- M19 Part I – 108,000 drafts (July 2018)
- M19 Part II – 48,000 drafts (August 2018)
- DOM – 51,000 drafts (April 2018)
- RIX (including XLN) – 10,000 drafts (February 2018)
In all, that’s actually over 200,000 drafts!
We simply captured the format being drafted, the cards presented to the user in the pack, and the card the user picked given his or her current card pool.
For each draft, we looked at the final pile of cards a user ended up with, and dutifully marked which cards were drafted together in a table that reflected all possible “card pairs.” After going through thousands of drafts, we compared these co-occurrences of cards in final draft pools to numbers that you could expect to get by chance, as if all drafts were completely random. Obviously, some pairs of cards were drafted together way more often than you’d expect by random chance, while some cards were drafted together less often.
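To make the counting step concrete, here is a minimal Python sketch of the idea. It is not the code we ran (the real analysis was done in R), and it makes two assumptions of its own: extra copies of a card within one pool are ignored, and "chance" means the two cards show up in pools independently of each other. The function name `pair_lift` and the toy card names are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_lift(pools):
    """Return {(card_a, card_b): observed / expected} for pairs seen together.

    Assumption: "expected by chance" means the two cards appear in pools
    independently, so expected = count(a) * count(b) / number_of_pools.
    """
    n = len(pools)
    card_counts = Counter()
    pair_counts = Counter()
    for pool in pools:
        cards = sorted(set(pool))  # assumption: ignore duplicate copies
        card_counts.update(cards)
        pair_counts.update(combinations(cards, 2))

    lift = {}
    for (a, b), observed in pair_counts.items():
        expected = card_counts[a] * card_counts[b] / n
        lift[(a, b)] = observed / expected
    return lift


# Toy pools with placeholder card names (not real draft data).
pools = [
    ["red_a", "green_a", "artifact"],
    ["red_a", "green_a"],
    ["white_a", "blue_a", "artifact"],
    ["white_a", "blue_a"],
]
lift = pair_lift(pools)
print(lift[("green_a", "red_a")])   # above 1: together more often than chance
print(lift[("artifact", "red_a")])  # about 1: together roughly as often as chance
```

A ratio above 1 means the pair is drafted together more often than chance would predict. Pairs that never share a pool simply do not appear in the result, which is the same as a ratio of 0.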
At this point we used a standard statistical technique called multidimensional scaling to visualize the allegiances of different cards in a nice plot. To do so, we imagined that each card is represented by a point, and distances between these points depend on how often they are drafted together. Cards with a synergy were assigned small distances between them, while cards that are only drafted together by chance were assigned large distances.
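As a rough illustration of that last step, the sketch below turns "drafted together more often than chance" ratios into distances. The formula 1 / (1 + ratio) is purely an assumption picked for this example, and the function name `distance_matrix` is hypothetical. Any rule that shrinks the distance as the ratio grows would serve the same purpose.

```python
def distance_matrix(cards, lift):
    """Build a symmetric distance matrix from observed/expected ratios.

    Assumption: distance = 1 / (1 + ratio), so a never-paired couple sits
    at distance 1.0 and strong synergies approach 0.
    """
    n = len(cards)
    dist = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(cards):
        for j, b in enumerate(cards):
            if i != j:
                # Accept the pair in either order; a missing pair counts as 0.
                ratio = lift.get((a, b), lift.get((b, a), 0.0))
                dist[i][j] = 1.0 / (1.0 + ratio)
    return dist


# Placeholder ratios: a+b have synergy, a+c are at chance, b+c never meet.
toy_lift = {("a", "b"): 3.0, ("a", "c"): 1.0}
dist = distance_matrix(["a", "b", "c"], toy_lift)
print(dist[0][1], dist[0][2], dist[1][2])
```

With these toy ratios the synergy pair ends up closest, the chance pair in the middle, and the pair that never appears together farthest apart.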
We then used cmdscale, a function in the statistical computing language R (the whole analysis was performed in R), and asked it to find an arrangement of points in 2D or 3D space that would match these predefined pairwise distances as closely as it could. We then visualized that arrangement with ggplot (for 2D plots) or with the rgl package (for 3D) to get the pictures you’ll see below.
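For the curious, cmdscale performs what is usually called classical multidimensional scaling. The Python sketch below shows the same idea using only the standard library. It is not our R code. It finds the leading eigenvectors with simple power iteration, which is an assumption made to keep the example self-contained and only works when the leading eigenvalues are positive. The function name `classical_mds` and the toy points are hypothetical.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical MDS on a symmetric distance matrix (list of lists)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances to get an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):  # power iteration for the top eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if eigval <= 0.0:
            break  # no further positive dimension to extract
        scale = math.sqrt(eigval)
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


def euclid(p, q):
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(p, q)))


# Toy check: four corners of a 4-by-3 rectangle, described only by distances.
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
dist = [[euclid(p, q) for q in pts] for p in pts]
coords = classical_mds(dist, k=2)
```

The recovered coordinates can come out rotated or mirrored compared with the original points, but the pairwise distances are preserved. That is also why the axes of plots like ours carry no meaning of their own.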
Results and Analysis
So enough with the talk about how we got there – let’s see what we found out! We’ll start with the recent M19 data, mostly from August.
On this plot, each point represents a card. Red cards are red, green ones are green, and so on, except that the white cards are gold colored. Colorless and artifact cards are gray. All multicolored cards, regardless of guild, are shown in purple for simplicity’s sake.
If two cards are often drafted together, they are shown close to each other on this plot. If the cards are almost never drafted together, they will be very far from each other on the plot.
Note also that the X and Y axes are not really interpretable; they are just a way to plot the points, but there is no deep meaning to them.
If you’re wondering which card each dot represents, a similar picture that has each point labeled can be found here.
As you can see, it is not easy to represent a complex drafting landscape in a 2D image, so a card may be in the center both if it is drafted with any color (like most artifact cards), and also if it is drafted in decks of two opposing colors (like Gruul or Izzet on this plot). But at least you can spot things that are suspicious and investigate them.
Plotting in 3D might be more informative, but the trouble with 3D plots is that you still need to show them on 2D screens, which really limits their utility. Here’s a 3D projection of M19 drafts: pretty, but hard to use.
The first thing we can see here is that cards of the same color are drafted together, which seems obvious. But even then, exceptions prove the rule, and we have quite a few interesting exceptions.
For one, the black cards and the white cards (the latter shown as gold dots) don’t form distinct clusters, which hints that in M19 drafts, players tend to pair W with B more often than with any other color, and vice versa.
The reality is a bit more complicated: white and black cards actually do form distinct clusters, but these clusters are closer to each other than any other two color clusters. So when the multidimensional scaling tried to show all cards on a 2D picture, and looked for some complexity to sacrifice, it sort of folded them together. But if you look at the 3D graph earlier, you’ll see that the white and black clusters are actually distinct. This is an artifact of the 2D visualization method, but it is an artifact that appeared for a reason.
We can also hunt for individual points that don’t belong to the cloud. See that red dot on the very left, far from the main red cluster? That’s Sarkhan’s Unsealing, a card that benefits from large, beefy creatures, and even though it is strictly red, players treated it as a Gruul (RG) card. Or look at that green card that is closer to artifact cards than to other green ones: that’s Scapeshift, which is completely unplayable. Some Draftsim users like to raredraft, however, so it was drafted as if it were color agnostic.
What about that lonely white card on the far right? It is Aethershield Artificer, which pumps artifact creatures, making it an obvious synergy payoff for the blue/white Artifacts Matter deck.
Enough about relationships, what about power?
We can also mine draft results for other, more straightforward types of information. For example, which cards are the absolute winners that are always picked first? Here’s the top five list:
- Sarkhan, Fireblood
- Palladia-Mors, the Ruiner
- Cleansing Nova
- Sarkhan’s Unsealing
- Lathliss, Dragon Queen
This is interesting. While these are mostly great pack-one-pick-ones, they’re certainly not what I have rated as the top five cards in the set. Sarkhan is especially strange, but people love planeswalkers, and the card is worse than it looks.
And what cards don’t see any love from drafters? Here are the bottom five, the cards most frequently picked last (shown from worst to slightly less bad):
Apparently, “tower” cards are bad.
Do all draft formats have similar shapes? Not at all. The Ixalan and Rivals of Ixalan formats are special here, because their synergies are based not only on color pairs but also, to a large degree, on the tribes within those colors. So the plot ends up looking completely different:
Here we have Vampires on the left (with Anointed Deacon and multicolored Legion Lieutenant leading the charge), Merfolk on the right, and the red guys on the top — Pirates and Dinosaurs alike. What is the half-ring of multicolored cards surrounding the red cloud? Pirates are on the left (leaning towards the black Pirates), and Dinosaurs on the right (towards green, and to a lesser extent, white creatures).
Look how closely WB Vampires and UG Merfolk are clustered. These are the two tribes in the set that were represented by only two colors, so they end up grouped very closely together.
Contrast that with Grixis Pirates (B/R/U) and Naya Dinosaurs (G/W/R). Their cards are much more spread out and less tightly correlated than those of the two-color tribes.
Red is in a special place because the color “belongs” to neither Vampires nor Merfolk. When you pick a red card, assuming an equal archetype distribution, it has an equal chance of ending up in any of four tribal color pairs: a Pirate deck (RB or UR) or a Dinosaur deck (GR or RW). Black is different: there you can end up only in Pirates (RB or UB) or Vampires (WB), if you stick to the prescribed tribal archetypes and groupings of synergy.
In XLN/RIX, GB and UW were “tribeless,” making their synergies — if they existed at all — much less pronounced. It makes sense that we don’t see them bumping up against each other here.
Dominaria seems to have the most separation between colors of all three. Perhaps this has to do with the high individual power level of many of the cards in the set. Moreover, in DOM, artifacts do not form an amorphous cloud in the center, but make a cluster of their own, very close to U/W/B, which happen to be the colors that care much more about historic triggers.
Not too surprisingly, the blue cards that pull closer to red are Wizards synergy cards (e.g., Naban, Dean of Iteration, Vodalian Arcanist, and Wizard’s Retort), and the ones that are closer to white are either artifact/historic payoffs or enablers (The Antiquities War, Artificer’s Assistant, Zahid, Djinn of the Lamp, Karn’s Temporal Sundering, Sentinel of the Pearl Trident).
I’m not sure how to explain why white is so much more tightly clustered than red, however. This possibly indicates replaceability or a lack of synergy among white’s cards, or perhaps a tendency of the white cards to be drafted only with each other.
Bonus: Just for fun, here are the other two 3D visualizations that we generated for RIX and DOM:
Where To Go From Here? Some Exciting Future Possibilities…
So this is all really cool, but what can we do with it?
Training Draftsim’s AI to use this data and capture synergy
Synergy in Magic is a very difficult thing to measure because it shifts from format to format. In one block, R&D may introduce a new mechanic that explicitly drives synergy (such as historic, kicker, or cycling), but another might primarily use tribes, such as in Ixalan. And let’s not even get started with cards that are “better for aggro decks” or “control” decks. After all, most decks in limited are just some flavor of midrange, right?
Rather than individually scoring or programming all these factors for each set, it would make a lot of sense to enlist Draftsim’s wonderful user base to help train the algorithm to recognize synergistic pairings of cards.
We could then in turn train the AI to make more realistic decisions based on the clusters of cards that are frequently drafted together.
Encapsulate the shifting “draft metagame”
It would be fascinating to track how cards and color pairs become progressively more or less overvalued during the course of the format. You might see certain color pairs move around as archetypes go in and out of vogue or one color gets recognized as “bad.” And, like in constructed, a draft metagame seems like it would eventually settle somewhere.
And wouldn’t it be really fun to be able to quantitatively measure what cards people “slept on” at the beginning of a format? Or to assign archetypes and see how our understanding of what cards belong in them changes? You could then use this information to improve your card and archetype evaluation methods.
In fact, this leads into our next article where we’ll do a quantitative analysis of how much a format changes over the course of a month. Here’s a little taste of what we’ll be looking at:
That’s right – we have before and after data that we can use to measure changes. Double the data means double the fun!
Clustering decks instead of cards
This whole article we’ve been looking at defining a format by the relationships between individual cards. However, we can also zoom out and examine the end product of your draft: your final collection of drafted cards.
By looking at the features of these collections, we can use unsupervised learning algorithms to group decks into archetypes. This will give us a clear picture of the kinds of decks that you will likely face in a draft metagame, the key cards in those decks, and frequency of those decks being drafted.
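We haven’t built this yet, so treat the following as a sketch of what it could look like rather than a description of our method. It assumes each deck is summarized by the share of its cards in each color (W, U, B, R, G) and groups decks with plain k-means, one common unsupervised learning algorithm. The feature choice, the function names, and the toy decks are all hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each deck to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Toy decks as color shares in W, U, B, R, G order (not real draft data).
decks = [
    [0.5, 0.0, 0.5, 0.0, 0.0],  # white-black
    [0.6, 0.0, 0.4, 0.0, 0.0],  # white-black
    [0.0, 0.0, 0.0, 0.5, 0.5],  # red-green
    [0.0, 0.0, 0.0, 0.4, 0.6],  # red-green
    [0.0, 0.5, 0.0, 0.5, 0.0],  # blue-red
    [0.0, 0.6, 0.0, 0.4, 0.0],  # blue-red
]
labels, centers = kmeans(decks, k=3)
print(labels)
```

Color shares alone would only rediscover color pairs. Finding real archetypes would need richer features, such as which specific cards a deck contains.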
Not only would this help all of us learn a ton about drafting, we could start exploring draft bot “personalities.” Maybe one bot at your table is leaning towards drafting something aggressive, while another bot just can’t wait to Cancel every bomb you draft. These personalities could very well mimic the same drafting preferences you see from people at your local game store.
One thing that really surprised me was the shapes of the formats, even in the more meaningful 3D visualizations. RIX looks incredibly different from the more “normal” M19 and Dominaria formats. Perhaps by looking at these shapes and patterns, we can quickly figure out some important or valuable points about approaching a format. As we get more data from different sets, more patterns should emerge.
This is such a treasure trove of data that I’m sure there are many ways we could use this methodology to learn more about Magic and about its different draft formats. Let us know what other awesome things you think we could do or what else you’d like to see measured and analyzed!