Sequencing the DNA of Real Estate: An AI-Driven Approach for Comparing Assets

Posted by Or Hiltch Jul 16, 2018

The meeting point of advanced data science and raw computational power allows for previously unheard-of market-view accuracy. Skyline AI CTO Or Hiltch breaks down today's advances, and why commercial real estate will never be the same again.

During the course of a dinner with the real estate vanguard, you’re bound to hear about the importance of location, location, location. It won’t be long before size and age of the asset come up, either.

Indeed, when analyzing a potential real estate deal, the analysis usually starts with establishing a peer group of comparable properties ("comps") of similar size and age, all in the same area as the subject property.

The logic behind comp analysis is that similar properties should behave the same: if the building across the street sold for $X, and your building is roughly the same as that building (similar size, year built, rents, etc.), then your building should sell for roughly the same price, with some adjustments where needed. But with today's advances in AI and data science, must we still rely on linear comparison analyses as the best method of understanding an asset?


Sequencing Real Estate DNA: 3 views of geo-spatial data layers over San Francisco. Source: Skyline AI

Unsupervised Learning Algorithms - Using Artificial Intelligence for Clustering Comps

Machine learning, a fundamental concept of AI research since the field’s inception, is the study of computer algorithms that improve automatically through experience. Machine learning models are trained using datasets of historical data with the aim of achieving certain tasks: prediction of values, classification, anomaly detection, and more.


Sale price (dollars per unit) by Virtual Neighborhood

A subclass of machine learning is the set of unsupervised learning algorithms. Contrary to supervised learning algorithms, which learn from historical datasets that are "tagged" according to some logic or target, unsupervised learning algorithms learn from "unlabeled" data; that is to say, the model is asked to perform tasks over the data without any a priori knowledge about it.

Instead, the algorithm finds correlations between the features comprising the dataset: topographical properties, cash flow data, transaction history, financing terms, data about restaurants and bars, workplaces, commute times, and more. Understanding these kinds of deep relationships between so many factors over a period of decades is beyond the processing ability of even the most capable human analyst. Moreover, these correlations often run against so-called "common sense," and so they are never sought out in the first place.
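As a minimal sketch of this idea, a k-means clustering loop can group properties using nothing but their features, with no labels supplied. The three features and all values below are invented for illustration; a real system would use thousands of signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a few of the many signals described above
# (rent per unit, year built, commute minutes); all values are illustrative.
cluster_a = rng.normal([1200, 1995, 35], [50, 3, 4], size=(20, 3))
cluster_b = rng.normal([2100, 2015, 12], [60, 2, 3], size=(20, 3))
X = np.vstack([cluster_a, cluster_b])

# Standardize so no single feature dominates the distance metric.
X = (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: no labels are given; groups emerge from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each property to its nearest cluster center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # Move each center to the mean of its assigned properties.
        centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return labels

labels = kmeans(X, k=2)
```

Properties that end up with the same label form one "virtual neighborhood" of comps, regardless of whether they sit on the same street.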

Clustering of Skyline AI Virtual Neighborhoods

Skyline AI Virtual Neighborhoods

Skyline AI uses the most comprehensive commercial real estate dataset in the industry, mining data from over 100 different sources and analyzing over 10,000 different attributes for each asset, going back 50 years. Powered by natural language processing and high-performance data infrastructure, all of the data is compiled into one large "data lake" and then cross-validated to ensure accuracy.

This enables our data science teams to design and train an ensemble of machine learning models, including unsupervised ones, to form "Virtual Neighborhoods": clusters of properties deemed similar according to thousands of different signals in the data, some of which represent deeply hidden correlations, which can then be used in advanced comp analysis.

Say that we are looking at a multifamily property in the Atlanta-Fulton submarket. We would like to construct a peer group and perform rent comp analysis to assess the property's current condition in terms of rent and value-add potential.

In a traditional comp analysis, we let an experienced human real estate analyst source the comps. The analysis ends up containing 36 comparable assets, all within an 8-mile radius of the property, roughly from the same property class (B- to B++), built between certain years (1999–2018), with certain occupancy levels, and so on.


Rule-based peer group construction as performed by a human analyst: 36 comps
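The analyst's rule-based filter amounts to a handful of hard cutoffs. A minimal sketch in pandas, where all column names and property values are made up for illustration:

```python
import pandas as pd

# Illustrative universe of candidate properties; values are invented.
universe = pd.DataFrame({
    "asset_id":       [1, 2, 3, 4, 5],
    "distance_miles": [2.1, 7.5, 9.4, 4.0, 7.9],
    "property_class": ["B", "B+", "A", "B-", "B++"],
    "year_built":     [2005, 1997, 2010, 2015, 2001],
    "occupancy":      [0.95, 0.91, 0.97, 0.88, 0.93],
})

# Hard, rule-based cutoffs mirroring the analyst's criteria above.
comps = universe[
    (universe["distance_miles"] <= 8)
    & universe["property_class"].isin(["B-", "B", "B+", "B++"])
    & universe["year_built"].between(1999, 2018)
    & (universe["occupancy"] >= 0.90)
]
```

Any property that fails even one cutoff is excluded outright, which is exactly why this approach can miss comparable assets that a clustering model would catch.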

The comparison in this case is for two-bedroom units, and two charts show where our property sits relative to its peer group. On both charts, a grid shows the location of our property (the yellow dot) vs. its peer group (the blue dots): the left chart plots rent per unit vs. year built, and the right one rent per square foot vs. year built.

But where would we expect our property to sit based on those comps? One way to answer this is with linear regression. In this process, our previous observations, such as the rent values of the comps, are plotted against some other dimension (for example, year built). Then, using an iterative algorithm called gradient descent, we fit a line through those samples ("the fitted line") such that an error function is minimized, the error representing the distance between the line and the peer-group rents.

Once the regression is done, draw a vertical line from the yellow dot (representing our property) to the fitted line; the point where they meet is where we would "expect" our property to sit based on the comps.
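This fit-then-expect procedure can be sketched with plain NumPy gradient descent. All rent and year values below are invented for illustration, and a closed-form least-squares fit would produce the same line; the iterative version simply mirrors the description above.

```python
import numpy as np

# Illustrative comps: year built vs. rent per unit (values are made up).
year = np.array([1999, 2003, 2007, 2011, 2015, 2018], dtype=float)
rent = np.array([1050, 1120, 1200, 1290, 1380, 1450], dtype=float)

# Center and scale the x-axis so gradient descent converges quickly.
x = (year - year.mean()) / year.std()

# Fit rent ~ w * x + b by minimizing mean squared error.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    pred = w * x + b
    err = pred - rent               # distance between the line and comp rents
    w -= lr * 2 * (err * x).mean()  # gradient of MSE w.r.t. the slope
    b -= lr * 2 * err.mean()        # gradient of MSE w.r.t. the intercept

# The fitted line gives the rent we'd "expect" for a given year built.
subject_year = 2010.0
expected_rent = w * (subject_year - year.mean()) / year.std() + b
```

Comparing `expected_rent` with the subject property's actual rent is what tells us whether the asset looks expensive or cheap relative to its peer group.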

Following this type of traditional comp analysis, one may conclude that the asset is a bit more expensive than its average peer. But is that really the case?


The iterative process of fitting a line to a set of observations using gradient descent

Eliminating Cognitive Bias

If we let a clustering algorithm construct the peer group instead of a human analyst, we eliminate the potential for cognitive bias and let the data speak for itself:


AI-constructed peer group: 412 comps

In this case, it turns out there are roughly 10x more similar properties that can be discovered by using machine learning for comp analysis. With the enhanced comp set, our asset (the yellow dot) actually falls below the fitted green line; that is to say, the asset's rent per two-bedroom unit is below the expected value.

In this example, we can see that by leveraging an enhanced comp set constructed by the AI, we were able to reach a fundamentally different conclusion: an asset that initially appeared expensive to rent was actually renting below its peer group.

The DNA of Real Estate

When two properties appear to share the same "property DNA," AI-driven analysis can yield a great deal of insight into the future performance of one asset based on the performance of the other. Remarkably, this is true even for properties in entirely different states. This is exactly where certain types of artificial intelligence algorithms can be applied to create enormous value beyond the reach of any human analyst.

Real estate properties and their prices are complex entities. For this reason, as we saw in the peer-group example, traditional methods that rely on just a few obvious, intuitive indicators do not always work well.

The prices of real estate involve complex interactions among many different input features, which behave differently in different parts of the input space. Letting an AI mastermind, composed of both supervised and unsupervised models, learn and understand the nature of these inputs and interactions can produce far greater results across multiple domains of real estate investment analysis.

Over the last few years, we’ve seen AI disrupt a number of traditional industries, and the real estate market should be no different. We believe that the power of Skyline AI’s technology to understand vast amounts of data that affect real estate behavior will unlock billions of dollars in untapped value.