MQnomix® – RealEstate & Analytics


Why Is Predictive Modeling So Important for Real Estate Development?


Triggered by fortuitous (extreme) events, the transformation of the entire business landscape is profound. The large-scale metamorphosis from brick and mortar to digital is nothing short of a quantum leap in the way the business ecosystem works its way toward a new state of equilibrium. Make no mistake, the path toward it is convoluted and fraught with perils, leaving entrepreneurs with almost no margin for error on this one-of-a-kind journey, which requires them to

be nimble,

get access to relevant & timely data,

have the tools and talent (e.g. Predictive Modeling, AI, etc.) to solve the right kind of problems.

All of these, and more, are cardinal to the digital competency backbone.

But where does this plethora of must-have traits leave the real-estate industry?

For those interested, the Chief Strategist of VTS, a digital real estate platform, answers some of these concerns. Moreover, as a boutique Data and Decision-Modeling consulting firm, we believe there is no better way to showcase the benefits of Predictive Modeling for the Real Estate industry than to briefly describe how our clients can benefit from it.

Although brief, the case study brings home the point that, in this hypercompetitive and intricate business environment, access to real-time data combined with Predictive Modeling helps investors and entrepreneurs alike stay ahead of the curve.

The Problem

Doing the right thing in the residential space could mean a lot of things: building the right type of home (in retrospect, that’s an easy pick), installing intelligent devices (IoT), finding the right mix between green and living areas, or choosing between an individual and a community pool. Standard variables like zip code, number of bedrooms and bathrooms, etc., are still at play when it comes to getting it right from the very beginning.

Commercial real estate (CRE) is no different. The extreme events wrought havoc upon CRE development. The ensuing chain of events led to a completely different approach to office development, plagued by WFH (working from home) policies that now play a major role in the demand for office space.

And, as always, business success is about managing the scarcest of resources, and that is time! Yes, time and timing, not necessarily money, are what will eventually make the difference between success and failure. Some may disagree, but just think about it for a moment…

Questions like “how fast can we build the project?”, “how fast can we get the financing we need?” or, perhaps even more important, “how fast can we sell the project?” are about building, financing and selling projects within a certain time frame! Without the time perspective, nothing else seems to make sense. Actually, it would be rather odd if the time variable did not play the most important role. Therefore, our business case centers on modeling the time variable. In concrete terms, the Time-to-Sell is the outcome variable of interest.

Variables & Data Cleansing

After obtaining the relevant data and a good grasp of the variables’ definitions and meanings, including some of their pitfalls (!), the next step is a data cleansing process: removing or replacing incoherent or missing data, extracting the outliers and treating them as a separate data group, and so forth. This process is tedious, but without it we are bound to fall into the GIGO (garbage-in-garbage-out) trap, which renders the whole analysis useless… So be forewarned!

Data Modeling & Models

The data modeling part is about using specific techniques, based on specialized programming languages like SAS, Python, SQL, etc., that transform the original data into the right kind of data. For example, the original data set was altered by using “Do Loops” such that, instead of having just a single row per real estate unit, we were able to keep track of the time period in which a unit was sold. In addition, we used simple but effective variable standardization techniques that helped us obtain a parsimonious model. To keep things as simple as possible, but no simpler, a parsimonious model is of the essence!
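
To make the two transformations above more tangible, here is a minimal Python sketch. It is not the code used in the project: the listings, the four-month window, the field names and the one-row-per-unit-per-month layout are all assumptions made for illustration, and the sample standard deviation is just one common choice for standardization.

```python
from statistics import mean, stdev

# Hypothetical listings: month in which each unit sold (None = unsold in the window)
listings = [
    {"id": "A", "sold_in_month": 2},
    {"id": "B", "sold_in_month": None},
    {"id": "C", "sold_in_month": 4},
]
MAX_MONTH = 4  # assumed length of the observation window, in months


def expand_to_unit_months(listing, max_month=MAX_MONTH):
    """Return one row per month the unit was on the market; sold=1 only in the sale month."""
    sold_in = listing["sold_in_month"]
    last_month = max_month if sold_in is None else sold_in
    return [
        {"id": listing["id"], "month": m, "sold": int(m == sold_in)}
        for m in range(1, last_month + 1)
    ]


def standardize(values):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


unit_months = [row for listing in listings for row in expand_to_unit_months(listing)]
print(len(unit_months))        # 2 + 4 + 4 = 10 rows
print(standardize([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```

The loop plays the role that a “Do Loop” would play in SAS: each unit contributes one row for every month it was still for sale, which is the shape a discrete-time model needs.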

Now, since Time-to-Sell is the outcome variable of interest, the obvious choice is to model it using the Survival Analysis (SA) body of theory (BoT). Within the vast SABoT, which encompasses continuous-time and discrete-time models, there are established, well-known functions that can also be successfully applied to modeling discrete-time variables.
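
For readers new to discrete-time survival models, the sketch below shows how monthly hazards (the probability of selling in a given month, provided the unit is still unsold) translate into a survival curve and into the probability of selling in each month. The hazard values are hypothetical and are not estimates from our data.

```python
def survival_from_hazards(hazards):
    """Turn per-period hazards into (survival curve, probability of sale in each period)."""
    survival, sale_prob = [], []
    still_unsold = 1.0
    for h in hazards:
        sale_prob.append(still_unsold * h)  # unsold so far, then sold in this period
        still_unsold *= 1.0 - h             # unsold through the end of this period
        survival.append(still_unsold)
    return survival, sale_prob


hazards = [0.10, 0.20, 0.30, 0.25]  # hypothetical monthly hazards, months 1..4
survival, sale_prob = survival_from_hazards(hazards)
print([round(s, 3) for s in survival])   # [0.9, 0.72, 0.504, 0.378]
print([round(p, 3) for p in sale_prob])  # [0.1, 0.18, 0.216, 0.126]
```

The sale probabilities plus the final survival value add up to 1, since a unit is either sold in one of the four months or still unsold at the end.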

Outcome Variables & Graphs

In what follows, a host of outcome graphs and explanations thereof are presented.


From the graph of the standardized MOM (months-on-market) values, it becomes clear that the higher the MOM_stdz value, the lower the probability of selling a property fast. As a side note, as suggested by MLS (Multiple Listing Service, a real estate platform), manipulation of MOM should come as no surprise, since lowering the value of this covariate lends the property, ceteris paribus, a “desirability” status when, in fact, it is less desirable, or even not desirable at all…

Odds Ratio Estimates

In the above ODDS ratio estimates table, a one-unit increase in the variable’s value (e.g. from MOM_stdz=1 to MOM_stdz=2) decreases the ODDS of selling a property fast to 16% of the ODDS at MOM_stdz=1. As one can see, when the MOM value is increased incrementally, there is a sharp decrease in the selling ODDS, thus making a strong case for a spot-on price valuation of a property earlier in the selling process rather than later.
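
Since an odds ratio acts on odds rather than on probabilities, a small worked example may help. The sketch below applies the 0.16 odds ratio for MOM_stdz to a baseline probability; the 50% baseline is a hypothetical value chosen only to keep the arithmetic easy to follow.

```python
def prob_to_odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def apply_odds_ratio(p, odds_ratio):
    """Probability after a one-unit increase in a covariate with the given odds ratio."""
    return odds_to_prob(prob_to_odds(p) * odds_ratio)


baseline = 0.50  # hypothetical probability of selling fast at MOM_stdz=1
print(round(apply_odds_ratio(baseline, 0.16), 3))  # 0.138
```

In this example, odds of 1 to 1 shrink to 0.16 to 1, so the probability of selling fast drops from 50% to roughly 14%.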


Discount= round((1-(closeprice/listprice))*100,0.01);
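
The statement above defines the Discount as the percentage by which the close price falls below the list price. A rough Python equivalent is sketched below; the function name and the prices are hypothetical, and Python’s `round` may treat exact ties differently from the SAS `ROUND` function.

```python
def discount_pct(close_price, list_price):
    """Discount off the list price, in percent, rounded to two decimals."""
    return round((1 - close_price / list_price) * 100, 2)


print(discount_pct(285_000, 300_000))  # 5.0  (sold 5% below the list price)
print(discount_pct(306_000, 300_000))  # -2.0 (sold 2% above the list price)
```

A negative value therefore means the property closed above its list price.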

Odds Ratio Estimates

Again, for the purpose of analyzing the output, we are exclusively working with standardized values for all variables included in the model, except for the Event_Month, which is considered Categorical in this Survival Model.

Now, let’s look at the Discount_stdz graph above. What becomes obvious is that the Discount_stdz covariate has a sizable impact once it exceeds the 0 (zero) threshold value, given that the other two covariates, MOM_stdz and Event_Month, are set to the mean and to the base value (=max.), respectively.

Regarding the ODDS Ratio displayed in the above table, if the variable Discount_stdz is increased by 1 unit, the ODDS of selling a real estate property fast are 3.375 times higher. It makes sense that a price discount (applied to the List price) should have a positive impact upon the Probability of selling fast, whereas the impact of a price increase upon the Prob(Selling a Property Fast) should be negative, i.e. the Probability tends to zero. Moreover, the graph highlights that a discount value above the mentioned threshold has an exponential-like impact. However, not all things are created equal. As we shall see, the effectiveness of a discount policy depends upon the concrete circumstances of other factors.
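
The exponential-like shape follows from how odds ratios compound. The sketch below assumes a logistic-type model in which the coefficient is the natural log of the reported odds ratio, so that every additional standardized unit of discount multiplies the ODDS by the same factor; the function name is ours.

```python
import math

ODDS_RATIO = 3.375            # reported odds ratio for Discount_stdz
beta = math.log(ODDS_RATIO)   # implied coefficient on the log-odds scale


def odds_multiplier(delta_units):
    """Factor by which the odds change when Discount_stdz moves by delta_units."""
    return math.exp(beta * delta_units)


print(round(beta, 4))                 # 1.2164
print(round(odds_multiplier(2), 3))   # 11.391 (two units of discount)
print(round(odds_multiplier(-1), 3))  # 0.296  (one unit the other way)
```

Two units of discount multiply the ODDS by 3.375 squared, while moving one unit in the opposite direction cuts them to less than a third.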


Now, it is time to get a better picture of the combined effect of Event_Month and Discount_stdz upon the Probability of selling a property fast, given that MOM_stdz equals its average value.

The Event_Month variable takes on values from 1 to 4, where each value represents the actual month in which the property was sold, whereas the meaning of Discount_stdz has already been explained.

So, what are the most important aspects of the above graph?

First and foremost, the discount in itself does not have much of an impact upon the Probability of selling a property fast if the selling event takes place in the first month, Event_Month=1 (with MOM_stdz at its mean value).

However, in order to pull this off in such a short period of time, we must have a spot-on valuation.

That is, we need a separate, parsimonious(!) and highly significant model that tells us what the correct asking price should be. For a buy-to-let property, we replace the asking price with the rental (lease) value.

Next, the impact of the Discount_stdz values upon the Probability of selling a property fast is relatively similar across the other Event_Month values (2 to 4), given the same average MOM_stdz. In a nutshell, the curves for Event_Month=2 to 4 lie rather close together. It is noteworthy that they all display an exponential-like increase once they get over the threshold value.

Based on the available data, the impact of a discount is highest when Event_Month=3. However, this may well be an artifact of the rather small data set used for this example.


Now, what about the above graph? Based upon the information within it, we conjecture that the Probability of selling a property fast depends upon the Discount value applied, as a function of the number of months the property has been active (MOM), given that all the properties were sold after more than 90 days but fewer than 120 days (i.e. Event_Month=4).

It is worth mentioning that a standardized MOM (MOM_stdz) value of -1.97 corresponds to a MOM value of 0 months, whereas a MOM_stdz value of 0.933 corresponds to a MOM value of 4 months. Consequently, the 5 (five) curves above correspond to MOM=0 to 4 months.
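
The two pairs of values quoted above are enough to back out the mean and standard deviation behind MOM_stdz, assuming the usual (value - mean) / standard deviation standardization. Because the inputs are rounded, the results in the sketch below are approximations, and the function name is ours.

```python
def fit_standardization(raw_a, z_a, raw_b, z_b):
    """Recover (mean, standard deviation) from two raw/standardized pairs."""
    sd = (raw_b - raw_a) / (z_b - z_a)
    mu = raw_a - z_a * sd
    return mu, sd


# MOM=0 months <-> MOM_stdz=-1.97 and MOM=4 months <-> MOM_stdz=0.933
mu, sd = fit_standardization(0, -1.97, 4, 0.933)
print(round(mu, 2), round(sd, 2))  # 2.71 1.38

# Implied MOM_stdz for the intermediate curves (MOM = 1, 2, 3 months)
for months in (1, 2, 3):
    print(months, round((months - mu) / sd, 2))  # -1.24, -0.52, 0.21
```

Under that assumption, the average property in this sample spent roughly 2.7 months on the market, with a standard deviation of about 1.4 months.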

Based on these curves, one thing that stands out is that the greatest impact of the Discount is for the blue curve, which has a MOM value of 0. That is, given that a property is sold after 4 months, the longer an RE property is available on the market, the higher the impact of the Discount upon the Probability of selling that property fast (i.e., during the fourth month).

A property that is available for a longer time period puts additional pressure on owners/managers to offer additional discounts. Eventually, the property is going to be sold (rented/leased), but only if it is correctly priced given its location, living area, amenities, proximity to shopping malls and schools, and other exogenous conditions (see note 1).

Last but not least, one of the most important graphs is presented below. The ROC curve shows how well the model separates the properties that sold fast from those that did not, and the area under the curve summarizes the predictive power of the model. For a parsimonious model and a small data set, getting a ROC (area under the curve) value of 0.918 is very promising indeed (1.0 is max.)!
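
For intuition, the area under the ROC curve equals the share of (sold fast, not sold fast) pairs in which the model gives the higher predicted probability to the property that did sell fast. The sketch below computes it that way on a handful of hypothetical predictions; it is not the routine that produced the 0.918 value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = sold fast) and predicted probabilities
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

A value of 0.5 would mean the ranking is no better than a coin toss, whereas 1.0 means every fast seller is ranked above every slow one.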

Is there anything else that we can do to enhance our modeling capabilities?! In fact, there is!

What we can do, based on the model we have described, is obtain the predicted probabilities of the dependent variable (e.g. Time-to-Sell, Time-to-Lease, etc.) for a portfolio of properties and feed them into Monte Carlo Simulation (MCS) models. One example of such an MCS model could be a probabilistic estimation of a portfolio’s Cash-Flow. Since the dependent variable has a corresponding probability for each possible scenario, we can use these probabilities for a probabilistic best-case/worst-case estimation of the cash-flow. This approach is superior to its deterministic counterpart, because it yields a range of outcomes with probabilities attached rather than a single point estimate.
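
A bare-bones version of such a simulation is sketched below. Everything in it is hypothetical: the three-property portfolio, the prices, the monthly sale probabilities (which in practice would come from the survival model), the three-month horizon and the function names. It also treats the properties as independent, which is a simplification.

```python
import random

# Hypothetical portfolio: sale price and predicted P(sale in month 1..4) per unit
portfolio = [
    {"price": 300_000, "p_sale_by_month": [0.10, 0.18, 0.22, 0.13]},
    {"price": 450_000, "p_sale_by_month": [0.05, 0.10, 0.20, 0.25]},
    {"price": 250_000, "p_sale_by_month": [0.30, 0.25, 0.15, 0.10]},
]


def simulate_cash_in(portfolio, horizon_months, rng):
    """One scenario: total sale proceeds received within the horizon."""
    total = 0.0
    for unit in portfolio:
        draw, cumulative = rng.random(), 0.0
        for month, p in enumerate(unit["p_sale_by_month"], start=1):
            cumulative += p
            if draw < cumulative:  # the unit sells in this month
                if month <= horizon_months:
                    total += unit["price"]
                break
    return total


def run_simulation(portfolio, horizon_months=3, n_runs=10_000, seed=42):
    """Repeat the scenario many times and summarize the cash-flow distribution."""
    rng = random.Random(seed)
    draws = sorted(simulate_cash_in(portfolio, horizon_months, rng) for _ in range(n_runs))

    def pick(q):
        return draws[int(q * (n_runs - 1))]

    return {"worst_case_5pct": pick(0.05), "median": pick(0.50), "best_case_95pct": pick(0.95)}


print(run_simulation(portfolio))
```

The 5th and 95th percentiles stand in for the worst-case and best-case scenarios mentioned above; a deterministic estimate would return a single number in their place.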

Interested in finding out more about the benefits of modeling and analytics? Tell us what your particular business problem is, and we will be ready to help you with our expertise.


1. These variables had a significant impact during the pre-Covid-19 period. They still play a role but, at this point in time, it remains unclear how important they are.
