How to Build a Reliable CRE Comps Database (And Why Most Firms Don't Have One)

Ask any acquisitions analyst where their comp data comes from, and you'll get a version of the same answer: "A bit of CoStar, a few broker emails, some deals we've done in the market before." That's not a comps database. It's a collection of partially remembered data points held together by institutional memory. When the analyst leaves, the data leaves with them.

We've been working with mid-market CRE deal teams for long enough to know that the absence of a maintained comps database is one of the most consistent and underappreciated structural weaknesses in how firms underwrite acquisitions. Here's why most firms don't have one — and what it takes to build something that actually holds up.

Why Most Firms Don't Have a Real Comps Database

The fundamental problem isn't data availability. CoStar, CBRE, JLL, and a handful of other sources publish enough transaction data to support a usable comps database for most US markets. The problem is ownership and maintenance.

Building a comps database requires someone to decide what gets in, in what format, with what level of verification. It requires a consistent taxonomy: what counts as a "comparable" for a 50,000 SF Class B office building in the Atlanta metro? Is a 35,000 SF lease in the same submarket a comp? What about an 80,000 SF transaction in an adjacent submarket? These definitional decisions are judgment calls, and they need to be made explicitly and consistently. Otherwise you end up with a dataset that looks like a comps database but produces meaningless results when you query it.
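As a rough illustration, those definitional decisions can be written down as an explicit rule instead of being left to each analyst. The sketch below is hypothetical: the `CompRule` fields, the placeholder submarket names, and the 30,000 to 70,000 SF band are assumptions chosen for illustration, not a recommended standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CompRule:
    """Hypothetical comparability rule; every threshold here is an assumption."""
    asset_class: str
    building_class: str
    submarkets: frozenset[str]  # submarkets accepted as comparable
    min_sf: int
    max_sf: int


def is_comparable(rule: CompRule, asset_class: str, building_class: str,
                  submarket: str, size_sf: int) -> bool:
    """Return True only if the candidate meets every criterion in the rule."""
    return (
        asset_class == rule.asset_class
        and building_class == rule.building_class
        and submarket in rule.submarkets
        and rule.min_sf <= size_sf <= rule.max_sf
    )


# Illustrative rule for a 50,000 SF Class B office subject:
# same submarket only, size within 30,000 to 70,000 SF.
rule = CompRule("office", "B", frozenset({"Submarket A"}), 30_000, 70_000)

# 35,000 SF lease in the same submarket: passes every criterion.
print(is_comparable(rule, "office", "B", "Submarket A", 35_000))
# 80,000 SF transaction in another submarket: fails on submarket and size.
print(is_comparable(rule, "office", "B", "Submarket B", 80_000))
```

Whether the adjacent submarket or the wider size band should count is exactly the judgment call described above. The point of encoding it is only that the answer is the same every time the database is queried.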

Maintenance is the other obstacle. Comp data decays. A lease transaction from 2020 may have been market-representative when it was signed but is now 40-60% below current market rent in many industrial submarkets. Including that comp in a 2025 underwriting model without applying appropriate context produces a lowball rent assumption that distorts your NOI and your valuation. Either you update your database regularly, or you track vintage dates closely enough to know which records are still relevant.

Both of those things — initial build and ongoing maintenance — require dedicated time. Mid-market firms with small analyst teams simply don't prioritize it. Every hour spent building database infrastructure is an hour not spent on active deals. In a market with active deal flow, the database keeps getting deferred.

The Real Cost of Not Having One

The cost shows up in three ways, and they compound.

Inconsistent rent assumptions across deals. When different analysts pull comps from different sources on different days, even deals in the same submarket can produce materially different rent assumptions. We've seen a spread of 8-12% in stabilized rent estimates for similar assets underwritten by the same team within the same quarter, purely because the comp research was done independently each time. That inconsistency makes it impossible to build a reliable track record of how your underwriting assumptions perform against actual outcomes.

Slow diligence cycles. Rebuilding comp research from scratch on every deal is one of the biggest time sinks in the underwriting process. An analyst spending 4 hours on comp research for a 30,000 SF office deal isn't adding 4 hours of value — they're recreating work that could be reused from a prior deal if it had been properly documented and stored. In markets where your team regularly underwrites similar asset types, the reuse rate should be high. It almost never is.

Dependence on brokers for market knowledge. When your comp research depends primarily on broker-provided data, you're getting comps selected by the broker, which is to say comps that support the broker's view of market rent. That's not necessarily dishonest, but it's biased. Brokers naturally surface comps that support deal activity. A database of verified transactions gives your team an independent reference point that doesn't depend on the sell-side relationship.

What a Reliable CRE Comps Database Requires

A reliable comps database isn't just a spreadsheet of transactions. It has several structural components that most informal comp collections lack:

A defined scope. What asset classes, geographies, and size ranges does the database cover? A database that tries to cover everything produces results too broad to be useful for any specific deal. Define your core target markets and asset classes explicitly. For a firm focused on office and industrial in the Southeast, that means a different scope than a firm covering mixed-use in gateway cities.
A transaction verification standard. Not all published lease data is equally reliable. Broker-reported comps are sometimes based on rumor and anecdote, especially for private transactions in smaller submarkets. Your database should distinguish between verified transactions (confirmed by both parties, with documented lease terms) and market-estimate data. Those two categories should not be mixed without flagging.
A canonical field schema. Every transaction record should include: property address, asset class, building class (A/B/C), year built, transaction date, lease type (NNN, gross, modified gross), lease term, base rent PSF, TI allowance, free rent period, and effective rent PSF. When fields are missing, they should be explicitly null — not blank. A blank and a null are different things when you're querying the database.
A vintage weighting methodology. Transactions don't stay relevant forever. A 5-year-old transaction should typically be weighted less than a 12-month-old transaction for purposes of establishing current market rent, unless there's a specific reason to look at historical trend. Build your weighting logic in explicitly; don't leave it to the analyst querying the data to decide on the fly.
A regular update cadence. Quarterly is the minimum. Monthly is better for markets where you have active deal flow. The update process needs to be owned by someone, with a clear process for sourcing new transactions. Without this, databases that start strong get stale within 18 months.
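
To make the schema and weighting components more concrete, here is a minimal sketch of a record type with explicit nulls and an age-based weight. It is an illustration under stated assumptions, not a prescribed method: the `LeaseComp` class, the exponential decay, and the 24-month half-life are all hypothetical choices, and a firm could just as reasonably use stepped weights or a hard cutoff.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LeaseComp:
    """One transaction record following the field list above (names are assumed)."""
    property_address: str
    asset_class: str
    building_class: str          # "A", "B", or "C"
    transaction_date: date
    lease_type: str              # "NNN", "gross", or "modified gross"
    base_rent_psf: float
    verified: bool               # verified transaction vs. market estimate
    # Missing values are stored as None (explicit null), never "" or 0.
    year_built: Optional[int] = None
    lease_term_months: Optional[int] = None
    ti_allowance_psf: Optional[float] = None
    free_rent_months: Optional[int] = None
    effective_rent_psf: Optional[float] = None


def vintage_weight(comp: LeaseComp, as_of: date,
                   half_life_months: float = 24.0) -> float:
    """Exponential decay by age; the 24-month half-life is an assumption."""
    age_months = ((as_of.year - comp.transaction_date.year) * 12
                  + (as_of.month - comp.transaction_date.month))
    return 0.5 ** (max(age_months, 0) / half_life_months)


def weighted_rent_psf(comps: list[LeaseComp], as_of: date) -> float:
    """Vintage-weighted average of base rent PSF across the given comps."""
    if not comps:
        raise ValueError("no comps supplied")
    weights = [vintage_weight(c, as_of) for c in comps]
    weighted_sum = sum(w * c.base_rent_psf for w, c in zip(weights, comps))
    return weighted_sum / sum(weights)


# Made-up records: one 12 months old at $30 PSF, one 5 years old at $20 PSF.
comps = [
    LeaseComp("100 Example St", "office", "B", date(2024, 6, 1), "NNN", 30.0, True),
    LeaseComp("200 Example St", "office", "B", date(2020, 6, 1), "NNN", 20.0, True),
]
print(round(weighted_rent_psf(comps, date(2025, 6, 1)), 2))
```

With a 24-month half-life, the 12-month-old comp carries four times the weight of the 5-year-old comp, so the blended figure is $28.00 PSF instead of the $25.00 simple average. Fields that were never supplied, such as `ti_allowance_psf` here, stay `None` so that a query can tell "unknown" apart from "zero".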

Sourcing: Where Verified Comp Data Actually Comes From

Reliable comp data has a few primary sources, each with different coverage profiles and reliability characteristics.

CoStar is the most widely used US source for commercial lease transaction data, with coverage across most major metros. Coverage quality varies by market: in larger metros like NYC, LA, Chicago, and Dallas, CoStar's lease database is fairly complete for larger transactions. In secondary markets, coverage thins significantly, and smaller private deals often don't appear until months after signing. CoStar's data requires an ongoing subscription and is priced for institutional users, making it the standard for larger shops but a significant cost for smaller operators.

Broker comp reports provide deal-specific intel that doesn't always appear in aggregated databases, especially for off-market transactions or in markets where CoStar coverage is thin. The tradeoff is bias: as noted above, broker-provided comps are not neutral. Use them as supplementary data, not primary.

Internal transaction history is often overlooked as a source. If your firm has acquired or managed properties in a target submarket over the past several years, your own lease execution data is verified, detailed, and directly relevant. Building a structured record of internal transactions is the easiest win in comps database construction — the data exists, it just isn't organized.

Public filings for REIT-owned properties include lease disclosure data that can provide verified comp support for submarkets where the REIT has meaningful concentration. This source requires extraction effort but produces high-quality verified data.

At Tenantvein, we aggregate and normalize lease transaction data from multiple sources into a database covering 2 million-plus CRE lease transactions across 50 major US metros. The normalization process applies the same canonical schema and verification standards described above, which is what makes the data queryable and usable in underwriting models directly — rather than requiring an analyst to manually filter and assess comp quality on each use.

How to Actually Build This at a Mid-Market Firm

If you're building from scratch, the practical approach is narrower than you might think. Don't try to build a national database in year one. Build a deep, well-maintained database covering your 3-5 primary target submarkets. Depth beats breadth when the data is going to drive real underwriting decisions.

Start with your internal transaction history. Export every lease you've executed in the past 5 years across your own acquisitions, dispositions, and renewals. Standardize the fields. That's your baseline. Then pull CoStar data for the same submarkets and same time period. Flag every record as CoStar-sourced or internally verified. Run a spot-check on a sample of CoStar records against broker intel or county records to assess accuracy in your specific markets.
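
The flag-and-sample step might look something like the sketch below. It assumes the records have already been exported into plain dictionaries by whatever means your data license allows; it does not call any CoStar API. The field names, the source labels, and the sample size are hypothetical.

```python
import random


def tag_source(records, source):
    """Return copies of the records with an explicit source flag attached."""
    return [{**record, "source": source} for record in records]


def spot_check_sample(records, source, k, seed=0):
    """Pick up to k records from one source for manual verification."""
    pool = [r for r in records if r["source"] == source]
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(pool, min(k, len(pool)))


# Made-up records standing in for an internal export and a vendor export.
internal = tag_source(
    [{"address": "100 Example St", "base_rent_psf": 27.5}],
    "internal_verified",
)
costar = tag_source(
    [{"address": f"{n} Sample Ave", "base_rent_psf": 26.0 + n} for n in range(1, 6)],
    "costar",
)

database = internal + costar
to_check = spot_check_sample(database, "costar", k=2)
for record in to_check:
    print(record["address"], record["source"])
```

Because the source flag travels with each record, the outcome of the spot-check can later be stored next to it instead of living in an analyst's notes.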

The whole initial build, done properly, takes one analyst about 3 weeks for a 3-submarket scope. After that, maintenance is 4-6 hours per quarter per submarket — pulling new CoStar data, checking it against any internal transactions from the period, and updating the vintage weighting logic if market conditions have shifted.

A well-maintained comps database for your 3 primary submarkets is worth more than a poorly-maintained one that nominally covers 30 markets. Reliability is the only metric that matters at the point of use.

The payoff is deal-level efficiency and organizational knowledge retention. Analysts working in a market for which the firm has a maintained comps database spend 60-75% less time on comp research per deal than analysts starting from scratch. More importantly, the comp assumptions are consistent across deals in the same market, which makes deal-versus-deal comparison and post-acquisition performance attribution actually meaningful.

When to Supplement With a Third-Party Data Source

Building your own database and using a third-party source aren't mutually exclusive. In fact, for most mid-market firms, the right answer is both: maintain a curated internal database for your core markets, and supplement with a broader data source for deals in markets where you don't have depth.

The key thing to look for in a third-party comps source is data provenance — can you see where each transaction record came from, and how it was verified? A comps database that aggregates data without documentation of source and verification status gives you false confidence. You don't know whether the $28 PSF comp you're citing in your underwriting came from a verified lease execution or from a broker estimate that was never confirmed.
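
One simple way to keep provenance visible at the point of use is to report rents grouped by verification status, so that verified leases and unconfirmed estimates are never silently blended. The sketch below is illustrative only; the `source` and `verification_status` field names and their values are assumptions, not any vendor's actual schema.

```python
from collections import defaultdict
from statistics import mean


def rent_by_verification(comps):
    """Summarize rent PSF per verification status; missing status becomes 'unknown'."""
    groups = defaultdict(list)
    for comp in comps:
        status = comp.get("verification_status") or "unknown"
        groups[status].append(comp["rent_psf"])
    return {
        status: {"count": len(rents), "mean_rent_psf": mean(rents)}
        for status, rents in groups.items()
    }


# Made-up comps: one broker estimate, two verified leases, one with no provenance.
comps = [
    {"rent_psf": 28.0, "source": "broker_estimate", "verification_status": "unverified"},
    {"rent_psf": 26.0, "source": "internal_lease", "verification_status": "verified"},
    {"rent_psf": 27.0, "source": "public_filing", "verification_status": "verified"},
    {"rent_psf": 29.0, "source": None, "verification_status": None},
]

summary = rent_by_verification(comps)
for status, stats in summary.items():
    print(status, stats)
```

In this made-up data the verified comps average $26.50 PSF, while the $28 and $29 figures sit in the unverified and unknown groups, which is the distinction an analyst needs to see before citing a number.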

That provenance question is something we take seriously at Tenantvein. Every comps record in our database carries a source tag and a verification status. When you query for comparables, you see the underlying data quality, not just the number. That makes it possible for analysts to apply appropriate weight to each comp — not because we've made the judgment call for them, but because we've given them the information to make it themselves.

The firms with the best deal outcomes aren't necessarily the ones with the biggest data budgets. They're the ones who know which data they trust, why they trust it, and how to use it consistently. That's what a real comps database enables. It's less glamorous than a new analytics product, but in our experience, it's one of the highest-return investments a mid-market CRE acquisitions team can make.
