The State of AI in Real Estate

Townhomes located at 6th and G Street, SW in the Southwest Waterfront neighborhood of Washington, D.C. The US Department of Housing and Urban Development headquarters is in the background. Attribution: dbking, CC BY 2.0, via Wikimedia Commons

26 August 2021

From the desk of Laura Norén

Using predictive statistical models in real estate has been an active data science application space for at least a decade, but its importance has ticked up recently as the pandemic and low mortgage interest rates inspired the great real estate reshuffling of 2020-2021. Because the sales value of a home is usually seen as the most important characteristic for buyers, sellers, and agents, and because there are many (but not an infinite number of) attributes that can impact the sales price, real estate is a rich problem space for a statistician or data scientist.

Real Estate is the Bedrock for Wealth Formation in the U.S.

For context, pre-pandemic research consistently found that home ownership was a key contributor to wealth in the US: the average homeowner has 40 times the wealth of the average renter (New American, 2019). Even if the wealth associated with home equity is bracketed out, the median homeowner still has more than 20 times the wealth of the median renter (US Census, 2019 using 2017 data). For typical working, middle, and upper-middle class families, owning a home is the single biggest contributor to their net worth. The outsize contribution that home ownership makes to intergenerational wealth transfers has implications not just for individuals and families, but for communities and the way the social fabric is woven. According to Pew Research, overall home ownership has ticked up over the past 12-18 months, but the increase has been unevenly distributed. Older people (65+) and whites are more likely to enter home ownership for the first time than younger people and non-white racial/ethnic groups (Richard Fry, Pew Research, 2021). Understanding how real estate pricing models work may or may not help explain trends in wealth disparities, but the pricing models certainly influence the experience of buying and renting homes for individuals.

Data-driven real estate companies

Zillow, RedFin, and Compass are all US companies that have amplified the public face of statistical modeling in residential real estate. Zillow famously introduced the Zestimate - an estimate of the current value of any US home - as part of the listing information presented to consumers. Stan Humphries, former UVa faculty and now head of analytics at Zillow, noted in 2017 that the Zestimate once had a “median error rate [that declined] from 14 percent down to around 5 percent” (Humphries, 2017). The current error rate is reported at 1.9% for homes currently on the market and 6.9% for all other homes (Zillow, 2021). Zillow defines an error as being more than 5% off the eventual sale price. RedFin offers a similar estimate and reports an error rate of 2.67% (RedFin, 2021). Compass’s key tools are for real estate agents, not consumers, but also focus on modeling sales values.
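For readers who want to see how figures like these are typically computed, here is a minimal sketch in Python. It assumes the reported rate is a median absolute percentage error and also computes the share of estimates landing within a 5% threshold. The exact definitions Zillow and RedFin use are not spelled out in this article, and the function names and all numbers below are made up for illustration.

```python
from statistics import median


def absolute_pct_errors(estimates, sale_prices):
    """Absolute error of each estimate, as a percentage of the sale price."""
    return [abs(est - sale) / sale * 100 for est, sale in zip(estimates, sale_prices)]


def median_error_rate(estimates, sale_prices):
    """Median absolute percentage error (assumed definition, see text)."""
    return median(absolute_pct_errors(estimates, sale_prices))


def share_within(estimates, sale_prices, threshold_pct=5.0):
    """Fraction of estimates that land within threshold_pct of the sale price."""
    errors = absolute_pct_errors(estimates, sale_prices)
    return sum(e <= threshold_pct for e in errors) / len(errors)


# Hypothetical estimates and eventual sale prices for five homes.
estimates = [310_000, 495_000, 702_000, 255_000, 1_180_000]
sale_prices = [300_000, 500_000, 650_000, 250_000, 1_200_000]

print(f"median error rate: {median_error_rate(estimates, sale_prices):.2f}%")
print(f"share within 5%: {share_within(estimates, sale_prices):.0%}")
```

A median is less sensitive than a mean to the handful of unusual homes that are badly mispriced, which may be part of why it is the headline number.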

Estimating the price of a home in an area with many similar homes - built at a similar time using similar materials with similar updates within one standard deviation of the median home price for that metro area - is a tractable problem. This is the type of house for which RedFin and Zillow are able to provide accurate price estimates. There’s enough transaction volume of similar homes over a short period of months to feed models with many, many variables (square footage, level of finish, paint colors, tree coverage, corner vs. interior lot, busy street vs. side street, landscaping, crime rates, rating of nearby K-12 schools, etc.). For homes in areas with very few transactions, large differences between homes, homes more than two standard deviations above the median sales price for the area, or for homes that have sustained damage from recent natural disasters or fires, providing accurate estimates is more challenging. Zestimates, for instance, appear to be less accurate for homes in areas where flood risk is increasing due to climate change.

Real estate estimates and climate change

Ed Kearns, the former Chief Data Officer of the US National Oceanic and Atmospheric Administration who is now the Chief Data Officer of First Street Foundation, a startup “correcting an asymmetry of information” about flood risk to real estate, shared that RedFin has been posting First Street’s flood risk scores on consumer-facing listings for several months (FirstStreet, 2021). There is no national code of ethics that guides realtors to disclose flood risks or any other type of risk related to climate change, so the presence of a flood risk score on RedFin’s listing summary may be the only information a potential buyer receives. Motivated consumers can use the sea level rise property-impact estimator maintained by the Union of Concerned Scientists, which uses the Zillow Transaction and Assessment Dataset (ZTRAX), or the Flood Factor risk estimator at firststreet.org. The UCS warns that “more than 300,000 of today's coastal homes, with a collective market value of about $117.5 billion today, are at risk of chronic inundation in 2045—a timeframe that falls within the lifespan of a 30-year mortgage issued today.” Kearns noted that adding data about climate risk to real estate listings is complicated by the fact that what’s good for buyers - knowing which properties are at highest risk - may upset sellers.

First Street Foundation recently announced a new partnership focused on wildfire risk to housing structures: “to build a climate-adjusted, property specific wildfire risk model, analyzing the risk to American homeowners of wildfire” (First Street Foundation, 2021). The “risk will be a function of burn probability, fire intensity, ember spread, and the estimated vulnerability of the buildings on the property,” and the resulting scores will hopefully be available on typical listings on commercial real estate sites.

I spoke with five sources for this article, including Kearns and data scientists working for real estate and banking firms. When properties have high risk scores, sellers and sellers’ agents have a harder time selling their homes, so acting as a data scientist in real estate is not always purely about model accuracy and adding meaningful variables to the dataset. The climate change perspective adds another layer of complexity. It hasn’t yet been clear which types of consumers, if any, are factoring climate risk into their sales and purchasing decisions or which climate risks are more or less critical. Until recently, this murkiness has kept the intersection of climate change and real estate more academic than commercial.

Data scientists who work in real estate firms may not always get to model the most alarming questions facing us in the Anthropocene, but they are working on interesting questions nonetheless.

The rental market

The most common uses of data science in real estate are on the home sales side, but there is work happening on the rental side, too. Improving the accuracy of sales models - adding features to the model, cleaning data, getting more data - is typically a major priority, but there are other considerations, such as preventing listings from bearing any resemblance to used car adverts.

New York’s Manhattan, as seen from Brooklyn, where data scientists worked closely with UX/UI designers to show base rents during the early months of the COVID pandemic when the rental market dipped precipitously. Photo credit: Laura Norén.

In New York during the spring and summer of 2020, a sizeable portion of the city’s renters decided to leave. Because they left all at once, vacancy rates topped 5% -- far higher than usual -- and caused agents and landlords to offer concessions to attract the renters who were available. These concessions were usually free months of rent. Free rent sounds like a boon for renters, but it is one of the ways landlords and agents often keep the upper hand, even in difficult renters’ markets. Offering free months of rent is different from offering a reduced base rent. The trouble is that real estate listings typically have only a single field for “rent”. In an ideal world, a listing does not look like a used car advert where it’s hard to tell what the final cost is going to be.
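The gap between a base rent and the rent a tenant effectively pays after free months comes down to simple arithmetic. The sketch below uses the common definition of net effective rent (total rent paid over the lease, divided by the number of months in the lease). The function name and dollar figures are illustrative assumptions, not numbers from any listing.

```python
def net_effective_rent(base_rent, lease_months, free_months):
    """Average monthly cost over the lease once free months are counted.

    Assumes the tenant pays base_rent in every month that is not free.
    """
    if lease_months <= 0:
        raise ValueError("lease_months must be positive")
    if not 0 <= free_months <= lease_months:
        raise ValueError("free_months must be between 0 and lease_months")
    total_paid = base_rent * (lease_months - free_months)
    return total_paid / lease_months


# Hypothetical listing: $3,000 base rent, 12-month lease, one month free.
base = 3000
effective = net_effective_rent(base, lease_months=12, free_months=1)
print(f"base rent: ${base:,.2f}")
print(f"net effective rent: ${effective:,.2f}")
```

In this example a listing could advertise either $3,000 or $2,750 in a single “rent” field. The monthly check is written for the base rent, and a renewal is generally negotiated from the base rent rather than the net effective figure.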

Why don’t landlords simply offer a lower base rent? Because the base rent may impact the landlord’s mortgage and refinancing options - base rents matter more than concessions. They’re more durable over time. Plus, keeping base rents high allows landlords to take a small, one-time hit in rent rolls during a temporary market downturn without resetting base rents. Already in May 2021, “the median rent, including concessions, was $3,037 a month, up 8.8 percent from the previous month, the biggest monthly increase in nearly a decade” according to data from Douglas Elliman (Velsey, June 2021). 

Photo credit: Laura Norén

Real project: Net effective rent detector

A data scientist at a real estate firm spoke to me off the record about one of his pandemic projects: develop an NLP model to detect listings where the base rent and the net effective rent were different, then ensure the UI displayed the base rent. Prior to the change, many agents posting to the site put the net effective rent into the one “rent” field and either buried the base rent at the end of a block of text or left it unclear whether the rent listed was the base rent before or after concessions had been calculated.
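The details of that model were not shared, so the following is only a rough illustration of the task, not the firm’s approach. It swaps in a simple rule-based detector built on regular expressions in place of a trained NLP model. The patterns, function names, and sample listings are all hypothetical.

```python
import re

# Hypothetical phrases that suggest a concession is baked into the listed rent.
CONCESSION_PATTERNS = [
    re.compile(r"\bnet[\s-]+effective\b", re.IGNORECASE),
    re.compile(r"\b(?:\d+|one|two|three)\s+months?\s+free\b", re.IGNORECASE),
    re.compile(r"\bfree\s+months?\b", re.IGNORECASE),
]

# Hypothetical pattern for a base ("gross") rent stated in the free text.
GROSS_RENT = re.compile(r"\bgross\s+rent\s*(?:is|of|:)?\s*\$\s?([\d,]+)", re.IGNORECASE)


def mentions_concession(description):
    """True if the description contains any concession phrase."""
    return any(p.search(description) for p in CONCESSION_PATTERNS)


def extract_stated_base_rent(description):
    """Return the gross rent quoted in the text as an int, or None if absent."""
    match = GROSS_RENT.search(description)
    return int(match.group(1).replace(",", "")) if match else None


def needs_review(rent_field, description):
    """Flag listings whose rent field may hold a net effective rent."""
    if not mentions_concession(description):
        return False
    stated = extract_stated_base_rent(description)
    # Flag when no base rent is stated, or when it differs from the rent field.
    return stated is None or stated != rent_field


listing_a = (2750, "Sunny 1BR. Price shown is net effective with one month "
                   "free on a 12-month lease. Gross rent is $3,000.")
listing_b = (3000, "Quiet 1BR near the park, laundry in building.")
listing_c = (3000, "2 months free! Gross rent $3,000.")

for rent, text in (listing_a, listing_b, listing_c):
    print(rent, needs_review(rent, text))
```

Rules like these would miss creative phrasing and typos, which is presumably one reason a learned model was preferred for the real project.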

Short term versus long term data science - Why rent pricing, not flood risk?

The experience of data scientists working in real estate brings up a fundamental difference between data scientists working in any commercial venture and those working in academia and non-profits. The goals data scientists have in commercial contexts are often necessarily short term. In this case, there was a big exogenous shock to the real estate market and data scientists had to respond to that shock. Building models that led to UI changes so renters could get a more accurate sense of their rent was the change that needed to happen, quickly. Giving renters clarity in a changing market is a reasonable, straight-shooting kind of goal. The renters’ needs were the priority even though the agents paid the listing fees. This is evidence that business models do not always dictate what data scientists do. 

Data scientists working in academia and research foundations are likely the ones who will continue to sort out how climate change intersects with the real estate market. If home ownership continues to be the bedrock of wealth formation in the US, it is imperative that we move deliberately and holistically to ensure a sustainable, equitable future. Commercial actors can and do help - Zillow makes transaction data available to researchers, RedFin and realtor.com include flood risk scores. As much as commercial actors are typically touted for their capacity to innovate, in this case they are unlikely to be the source of innovative solutions on their own. They face short term disincentives - angry sellers and sellers’ agents, little room for error in a highly visible sector, lawsuits - so they are unlikely to lead, but they can play a substantially helpful role in collaboration with organizations like First Street Foundation and the Union of Concerned Scientists.
