ICP scoring 101

Key takeaways

ICP scoring and lead scoring are not the same thing. ICP scoring runs on accounts and is firmographic-first. Lead scoring runs on contacts and adds behavior. Conflating them produces a model that does neither job well.
A working ICP scorecard has four layers. Firmographic, technographic, intent, and behavioral. Each layer answers a different question. The weights between layers matter more than the weights inside any one layer.
Calibrate the scorecard from closed-won and closed-lost data, not from instinct. Pull the last 50 wins and 100 losses. The attributes that separate them, weighted by how strongly they separate them, are the scorecard.
The qualified threshold is a capacity question, not an accuracy question. Set the threshold where the volume of qualified accounts matches what your SDR or AE team can actually work.
Scorecards drift. Plan a quarterly review with the rules for when to retrain, when to add a layer, and when to retire an attribute.

This playbook covers ICP scoring end to end: what it is, how it differs from lead scoring, how to build a scorecard, how to calibrate it from data, where to set the threshold, and how to maintain it over time. For the basics of lead scoring, see how to score B2B leads. This document goes deeper into the ICP side specifically and works through a complete worked example.

ICP scoring vs lead scoring: why the distinction matters

Most B2B teams use the term "lead scoring" to mean any model that ranks prospects. That language collapses two different jobs into one and produces models that work for neither.

ICP scoring answers: is this company the kind of company we close? The unit of analysis is the account. The inputs are firmographic, technographic, and account-level intent signals. The output is a score that determines whether the account is worth working at all.

Lead scoring answers: is this person showing buying behavior right now? The unit of analysis is the contact. The inputs are mostly behavioral: email opens, page visits, demo requests, trial activity. The output is a score that determines how fast the contact should be routed and to whom.

	ICP scoring	Lead scoring
Unit	Account (company)	Contact (person)
Primary inputs	Firmographic, technographic, account intent	Behavioral, engagement
Time horizon	Stable (months)	Dynamic (days or hours)
Decision it drives	Should we work this account at all?	How fast and to whom?
Owned by	Marketing ops or RevOps	Sales ops or RevOps
Review cadence	Quarterly	Monthly

The two scores work in sequence, not in parallel. An account passes the ICP filter first. Only then do the contacts inside that account get lead-scored. A high-engagement contact at a low-fit account is rarely a deal. A high-fit account with no engagement is a prospecting target, not a hot lead.

Treating them as one score is the most common scoring mistake in B2B. The fix is to run them as two stages: an account-level qualifier, then a contact-level prioritizer.
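As a rough illustration of the two-stage structure, the sketch below gates on an account-level score first and only then ranks contacts. The class names, field names, and the 70-point default are assumptions made for the example, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    email: str
    lead_score: int  # contact-level, mostly behavioral


@dataclass
class Account:
    name: str
    icp_score: int  # account-level fit
    contacts: list[Contact]


def route(account: Account, icp_threshold: int = 70) -> list[Contact]:
    """Stage 1: gate on account fit. Stage 2: rank contacts by lead score."""
    if account.icp_score < icp_threshold:
        # Poor-fit account: no contact is routed, however engaged.
        return []
    return sorted(account.contacts, key=lambda c: c.lead_score, reverse=True)


low_fit = Account("LowFit", 35, [Contact("a@lowfit.test", 95)])
high_fit = Account("HighFit", 75, [Contact("a@highfit.test", 20),
                                   Contact("b@highfit.test", 60)])
print(route(low_fit))                         # empty list
print([c.email for c in route(high_fit)])     # highest lead score first
```

The point of the design is that the lead score never gets a chance to override the account gate: a contact with a lead score of 95 at a 35-point account is simply not routed.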

The four-layer ICP scorecard

A working ICP scorecard has four layers, each answering a different question.

Layer 1: Firmographic

Static company attributes. Does the company match the size, industry, geography, and stage of the customers you close?

The five firmographic attributes worth scoring:

Company size. Headcount or revenue band. Most B2B SaaS has a 10x-wide working band (50 to 500 employees, or $5M to $50M revenue). Inside the band is the target; outside is not.
Industry / vertical. The 3 to 5 verticals where you have closed deals. Verified verticals score higher than adjacent ones. Untested verticals score zero by default.
Geography. Where your support, contracts, and data residency work. Geographies outside that footprint score zero.
Stage of growth. Recent funding, recent leadership hires, recent product launches signal active buying windows. A company that hired a new VP Sales last month is in a buying window for sales tools.
Department headcount. The size of the team that would use the product. A 200-person company with a 3-person sales team is not the same buyer as a 200-person company with a 30-person sales team.

Layer 2: Technographic

Tools the company runs. Technographic fit is the most underused layer in B2B scoring and consistently predicts close rate.

Common technographic signals:

CRM. Salesforce, HubSpot, Pipedrive. If your product integrates natively with one, prospects on that CRM close 2 to 3 times faster than prospects on others.
Adjacent tooling. Sales engagement, marketing automation, data warehouse, communications platform. Overlap with your stack means lower switching cost and shorter evaluations.
Cloud infrastructure. AWS, Azure, GCP. Relevant for products with infrastructure dependencies.
Incumbent tools in your category. A prospect already using a competitor is a different conversation than a prospect using nothing. Sometimes faster (they have budget allocated), sometimes slower (they have switching costs). Both matter and both should score.

Data sources for technographic data include BuiltWith, Wappalyzer, HG Insights, and platforms like Clay or ReachIQ Data Enrichment that aggregate across sources.

Layer 3: Intent

Signals that the account is in a buying window. Intent data sits at the account level (which is why it belongs in the ICP scorecard, not the lead scorecard).

Working intent signals:

Third-party intent surge. Bombora, G2 Buyer Intent, and TrustRadius flag when a company is researching your category. Score these as a binary: "in market this month" or "not".
Job postings. A company hiring 5 SDRs is buying sales tools. A company hiring engineers in a specific stack is buying tools in that stack.
Funding events. A Series B last month means budget cycles are opening for the next 90 days.
Leadership changes. A new VP in a relevant function is typically given 60 to 90 days to evaluate tools.
Product launches. A company launching a new product line is open to tools that support the launch.

Intent signals decay. A funding announcement from 9 months ago is no longer an intent signal. Build a half-life into the intent layer: 60 to 120 days for most signals.
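One common way to implement a half-life is exponential decay: the signal keeps its full points on day zero and loses half of its remaining value every half-life. The function name and the 90-day default below are assumptions chosen from inside the 60 to 120 day range; this is a sketch, not the only workable decay curve.

```python
def decayed_points(base_points: float, age_days: float,
                   half_life_days: float = 90.0) -> float:
    """Exponential decay: worth half the base points after one half-life."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return base_points * 0.5 ** (age_days / half_life_days)


print(decayed_points(10, 0))    # fresh signal: 10.0
print(decayed_points(10, 90))   # one half-life old: 5.0
print(decayed_points(10, 270))  # roughly 9 months old: 1.25
```

With a 90-day half-life, the 9-month-old funding announcement is worth about an eighth of its original points, which is close enough to zero that it no longer moves an account across a threshold.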

Layer 4: Behavioral (account-level)

Aggregated behavior of contacts inside the account, rolled up to the account. This is where the line with lead scoring blurs, and you want to be careful: account-level behavior belongs in the ICP scorecard; contact-level behavior belongs in the lead scorecard.

Account-level behavioral signals:

Multiple contacts engaged. Three contacts from one account opening emails is a different signal than one contact opening five times. Score the breadth, not just the depth.
Pricing page visits from the account. Pricing page traffic from an IP block matched to the account counts, regardless of which specific contact generated it.
Demo requests from the account in the last 12 months. A prior demo, even from a different contact, signals the account has been in evaluation before.
Webinar or event attendance. The account was represented at an event you ran.
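Scoring breadth rather than depth comes down to counting distinct contacts per account instead of counting events. A minimal sketch, assuming engagement events arrive as (account domain, contact email) pairs; the 3-contact cutoff and 8 points mirror the "3+ contacts engaged" attribute used in the worked example in this playbook.

```python
from collections import defaultdict


def engaged_contacts_by_account(events):
    """events: iterable of (account_domain, contact_email), one per engagement."""
    contacts = defaultdict(set)
    for domain, email in events:
        contacts[domain].add(email)  # a set ignores repeat opens by one person
    return {domain: len(emails) for domain, emails in contacts.items()}


def breadth_points(distinct_contacts: int, min_contacts: int = 3,
                   points: int = 8) -> int:
    """All-or-nothing points once enough distinct contacts have engaged."""
    return points if distinct_contacts >= min_contacts else 0


events = (
    [("acme.test", f"{name}@acme.test") for name in ("ana", "bo", "cy")]
    + [("solo.test", "zed@solo.test")] * 5  # one contact, five opens
)
counts = engaged_contacts_by_account(events)
print(counts)
print({d: breadth_points(n) for d, n in counts.items()})
```

The account with three distinct contacts earns the points; the account where one person opened five times does not.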

Assigning weights between layers

The mistake most teams make is weighting attributes inside a layer (industry vs geography) before deciding on weights between layers (firmographic vs intent). The between-layer weights matter more.

A working starting point for B2B SaaS:

Layer	Weight	Why
Firmographic	40 percent	The most stable predictor. If firmographic fit is poor, nothing else matters.
Technographic	25 percent	Strong predictor of close rate, underused. Worth weighting up if your product is integration-heavy.
Intent	20 percent	Time-sensitive. High when present, decays fast.
Behavioral (account-level)	15 percent	Late-funnel signal. Strong when present, but rarely present, so it gets the smallest share.

These weights shift by company. A company selling a horizontal tool to many verticals weights firmographic lower (the verticals matter less). A company selling a vertical-specific tool weights firmographic higher (the wrong vertical is a non-starter).

The diagnostic: pull the last 30 closed-won deals. Score them retrospectively with the proposed weights. The wins should score above the threshold. If they do not, the weights are off. Adjust until 80 to 90 percent of historical wins score above the threshold.

A worked example

The clearest way to make this concrete is a full scorecard, run against a hypothetical account.

The scorecard

Suppose you sell a sales engagement tool to mid-market B2B SaaS companies in North America. Your ICP scorecard looks like:

Layer	Attribute	Possible scores	Weight
Firmographic (40%)	Headcount 50 to 500	0 or 10	10
	Industry is B2B SaaS	0, 5, or 10	10
	Geography is US/CA	0 or 10	10
	Sales team headcount 5+	0 or 5	5
	Recent funding (12mo)	0 or 5	5
Technographic (25%)	Uses Salesforce or HubSpot	0, 5, or 10	10
	Has a marketing automation tool	0 or 5	5
	Currently uses competitor tool	0 or 10	10
Intent (20%)	Third-party intent surge	0 or 10	10
	Hiring SDRs (last 60 days)	0 or 5	5
	New VP Sales (last 90 days)	0 or 5	5
Behavioral (15%)	3+ contacts engaged	0 or 8	8
	Pricing page visits from domain	0 or 7	7

Maximum possible: 100 points. The thresholds: 70+ qualified, 50 to 69 worth working with caveats, below 50 disqualify.

Account A: scores well

Headcount 220 (in band): 10
Industry is B2B SaaS: 10
Geography is US: 10
Sales team is 18 people: 5
Series B raised 4 months ago: 5
Uses Salesforce: 10
Has Marketo: 5
Uses Outreach (competitor): 10
No third-party intent surge: 0
Hiring 3 SDRs: 5
New VP Sales 6 weeks ago: 5
2 contacts engaged: 0
No pricing page visits: 0

Total: 75. Qualified. This account goes to the active outbound list with high priority.
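The scorecard and Account A's tally can be reproduced in a few lines, which is a useful sanity check that the layer weights really sum to 100. The dictionary keys are shorthand invented for this sketch; the point values are taken from the scorecard table.

```python
# Maximum points per attribute, grouped by layer (values from the scorecard table).
SCORECARD = {
    "firmographic": {"headcount_in_band": 10, "industry": 10, "geography": 10,
                     "sales_team_5_plus": 5, "recent_funding": 5},
    "technographic": {"crm": 10, "marketing_automation": 5, "competitor_tool": 10},
    "intent": {"intent_surge": 10, "hiring_sdrs": 5, "new_vp_sales": 5},
    "behavioral": {"contacts_engaged_3_plus": 8, "pricing_page_visits": 7},
}


def score_account(points):
    """Return (per-layer subtotals, total). `points` maps attribute -> points."""
    subtotals = {}
    for layer, attributes in SCORECARD.items():
        subtotal = 0
        for attribute, max_points in attributes.items():
            awarded = points.get(attribute, 0)  # missing data scores zero
            if not 0 <= awarded <= max_points:
                raise ValueError(f"{attribute}: {awarded} outside 0..{max_points}")
            subtotal += awarded
        subtotals[layer] = subtotal
    return subtotals, sum(subtotals.values())


account_a = {
    "headcount_in_band": 10, "industry": 10, "geography": 10,
    "sales_team_5_plus": 5, "recent_funding": 5,
    "crm": 10, "marketing_automation": 5, "competitor_tool": 10,
    "intent_surge": 0, "hiring_sdrs": 5, "new_vp_sales": 5,
    "contacts_engaged_3_plus": 0, "pricing_page_visits": 0,
}
subtotals, total = score_account(account_a)
print(subtotals)  # firmographic 40, technographic 25, intent 10, behavioral 0
print(total)      # 75, matching the tally above
```

Keeping the per-layer subtotals alongside the total makes it easy to see why an account qualified: here the account clears 70 on firmographic and technographic fit plus two hiring signals, with no behavioral signal at all.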

Account B: scores poorly

Headcount 22 (below band): 0
Industry is e-commerce: 0
Geography is Brazil: 0
Sales team is 2 people: 0
No recent funding: 0
Uses HubSpot: 10
No marketing automation: 0
Uses no competitor tool: 0
Third-party intent surge: 10
No hiring activity: 0
No new VP: 0
4 contacts engaged: 8
Pricing page visits: 7

Total: 35. Disqualify. Even though intent and behavior are high, the firmographic fit is wrong. This is exactly the case where lead-scoring-as-ICP-scoring would have routed this account to sales and wasted cycles.

Account C: in the middle

Headcount 90 (in band): 10
Industry is logistics SaaS (adjacent, not top 3): 5
Geography is US: 10
Sales team is 8: 5
No recent funding: 0
Uses Pipedrive (not Salesforce or HubSpot): 5
No marketing automation: 0
Uses no competitor tool: 0
No third-party intent: 0
Hiring 1 SDR: 5
No new VP: 0
1 contact engaged: 0
No pricing page visits: 0

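Total: 40. Firmographic fit is solid (30 of 40), but the other layers contribute almost nothing, so the account lands below the 50-point line and is disqualified under the thresholds above. It is a near miss, not a clear no: a third-party intent surge alone (+10) would lift it to 50 and into the nurture band. Keep it on the quarterly re-check list, which catches accounts whose intent layer changes.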

Calibrating the scorecard from closed-won and closed-lost data

Instinct is a bad input to a scorecard. The team's intuition about who closes is biased toward the most memorable deals, not the typical ones. Calibrate from data.

The 6-step calibration process

Pull the last 50 closed-won deals. Export every firmographic, technographic, and intent attribute you can find on each. This is the "winners" set.
Pull the last 100 closed-lost or disqualified opportunities. Same export. This is the "losers" set.
For each attribute, calculate the win rate. What fraction of accounts with attribute X closed? What fraction without it closed? An attribute is predictive if the gap between the two is wide.
Rank attributes by predictive lift. Lift is the difference in win rate between accounts with the attribute and accounts without. Attributes with lift over 15 percentage points go in the scorecard. Below 5 percentage points, the attribute is noise.
Assign weights proportional to lift. An attribute with 30 points of lift gets twice the weight of one with 15 points of lift.
Back-test against the historical set. Score the 50 wins and 100 losses with the new scorecard. The wins should score above the threshold 80 to 90 percent of the time; the losses should score below the threshold 70 to 80 percent of the time. If not, adjust weights until they do.
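Steps 3 to 5 can be sketched as follows. The deal structure (a dict with a "won" flag and an "attrs" mapping of booleans) is an assumption for the example, and the 15-point cutoff is treated as inclusive so that the "30 points gets twice the weight of 15 points" case works as described.

```python
def win_rate(deals, attribute, present):
    """Win rate among deals where the attribute is present (or absent)."""
    subset = [d for d in deals if bool(d["attrs"].get(attribute)) == present]
    if not subset:
        return None  # cannot estimate a rate from zero deals
    return sum(d["won"] for d in subset) / len(subset)


def lift_points(deals, attribute):
    """Lift in percentage points: win rate with the attribute minus without."""
    with_attr = win_rate(deals, attribute, True)
    without_attr = win_rate(deals, attribute, False)
    if with_attr is None or without_attr is None:
        return None
    return (with_attr - without_attr) * 100


def weights_from_lift(lifts, min_lift=15.0, total_points=100.0):
    """Keep attributes at or above min_lift; split points in proportion to lift."""
    kept = {a: v for a, v in lifts.items() if v is not None and v >= min_lift}
    total = sum(kept.values())
    return {a: total_points * v / total for a, v in kept.items()} if total else {}


# Toy data: 10 deals with the attribute (6 won), 10 without (2 won).
deals = (
    [{"won": i < 6, "attrs": {"uses_crm": True}} for i in range(10)]
    + [{"won": i < 2, "attrs": {"uses_crm": False}} for i in range(10)]
)
print(round(lift_points(deals, "uses_crm"), 1))   # 40.0 points of lift
print(weights_from_lift({"a": 30.0, "b": 15.0, "c": 3.0}))
```

The proportional weights are left unrounded here; in practice they would be rounded to whole points and nudged so they sum to the scorecard maximum. With only 150 deals, a lift estimate for a rare attribute rests on a handful of accounts, so treat small subsets with caution.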

This is not statistical modeling in the formal sense. It is a heuristic that works because B2B sales has small samples and the model has to be interpretable. A logistic regression on 50 wins and 100 losses tends to overfit once more than a handful of attributes are included; the heuristic above is cruder but holds up better on new accounts.

What the calibration usually reveals

Three patterns are nearly universal in our experience reviewing scorecards:

One or two attributes do most of the work. The 2 highest-lift attributes typically explain 60 to 70 percent of the win rate variance. The rest of the scorecard fine-tunes around the edges. This is useful to know: if data is missing for most of the scorecard but you have the top 2 attributes, you can still score the account usefully.

Several "obvious" attributes turn out to be noise. Company age, founding location, glossy logos, and similar attributes feel like they should matter but rarely have predictive lift in the calibration. Drop them. Cleaner scorecards run faster and are easier to maintain.

One or two attributes you did not expect turn out to be strongly predictive. Often a specific technographic signal or a specific job-posting pattern. Add these as their own layer entries.

Where to draw the qualified threshold

The threshold is not an accuracy question. It is a capacity question.

The logic: a working SDR can handle 60 to 100 active accounts at a time. An AE can run 15 to 25 discovery calls per week. The threshold should produce a flow of qualified accounts that matches what the team can actually work, not a flow optimized for some abstract notion of "fit".

The capacity-first method

Calculate weekly account capacity for the team. For example, 4 SDRs at 75 accounts each, refreshing 30 percent of their list per month, need 90 new qualified accounts per month, or about 21 per week.
Score your full prospect universe with the scorecard.
Set the threshold so the number of accounts above the threshold roughly matches the weekly capacity.
If the threshold required to match capacity is below 50 percent of the maximum possible score, the ICP is too broad and the scorecard is not discriminating enough. Tighten the ICP definition or add stricter attributes.
If the threshold required is above 80 percent of the maximum, the team is under-resourced for the ICP. Either expand the ICP or add headcount.
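The capacity-first method can be sketched as: compute how many new accounts the team can absorb, rank the scored universe, and read the threshold off the account at that rank. Function names are hypothetical, and the capacity figure is expressed per refresh period, whatever period the refresh rate is measured over.

```python
def new_account_capacity(sdrs: int, accounts_per_sdr: int,
                         refresh_rate: float) -> int:
    """New qualified accounts the team can absorb per refresh period."""
    return round(sdrs * accounts_per_sdr * refresh_rate)


def capacity_threshold(scores: list[float], capacity: int) -> float:
    """Score of the capacity-th best account; accounts at or above it qualify.

    Ties at the cutoff can push the qualified count slightly over capacity.
    """
    if not scores or capacity < 1:
        raise ValueError("need at least one score and a positive capacity")
    ranked = sorted(scores, reverse=True)
    return ranked[min(capacity, len(ranked)) - 1]


def diagnose(threshold: float, max_score: float = 100) -> str:
    """Apply the 50 percent and 80 percent sanity checks from the method."""
    share = threshold / max_score
    if share < 0.5:
        return "ICP too broad: tighten the definition or add stricter attributes"
    if share > 0.8:
        return "team under-resourced: expand the ICP or add headcount"
    return "threshold in the workable range"


print(new_account_capacity(4, 75, 0.30))       # 4 x 75 x 0.30 = 90
universe = list(range(1, 101))                 # toy universe: scores 1..100
threshold = capacity_threshold(universe, 30)
print(threshold, diagnose(threshold))          # 71, workable range
```

Because the threshold is derived from rank rather than chosen directly, it moves automatically when either the prospect universe or the team's capacity changes.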

The output of this exercise is a threshold that produces the right volume of qualified accounts to keep the team working without overflow. A threshold that produces 3 times the team's capacity is the same as no threshold; accounts pile up in the queue and the system breaks down.

The two-tier threshold

Many teams find that a single threshold is too binary. A two-tier model works better:

Tier 1 (70 percent of max or above): Active outbound. SDRs work these accounts within 2 weeks of qualification.
Tier 2 (50 to 69 percent of max): Nurture and review monthly. These accounts get marketing touches and re-score against intent signals each month. An account that adds an intent surge moves to Tier 1.
Below 50 percent: Disqualified. Quarterly re-check only.
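The tiering rule reduces to two comparisons on the score as a share of the maximum. This sketch assumes that a score of exactly 70 percent counts as Tier 1, in line with the "70+ qualified" cutoff used in the worked example; the tier labels are placeholders.

```python
def tier(score: float, max_score: float = 100) -> str:
    """Map a score to a tier by its share of the maximum possible score."""
    pct = 100 * score / max_score
    if pct >= 70:
        return "tier_1_active_outbound"
    if pct >= 50:
        return "tier_2_nurture"
    return "disqualified"


print(tier(75))        # Account A from the worked example: tier 1
print(tier(35))        # Account B: disqualified
print(tier(62))        # nurture
print(tier(62 + 10))   # same account after a 10-point intent surge: tier 1
```

Expressing the cutoffs as a percentage of the maximum means the same function keeps working if attributes are added or removed and the maximum is no longer 100.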

This is the structure most working B2B teams use. The labels vary (A/B/C accounts, hot/warm/cold) but the underlying mechanic is the same.

How ReachIQ ICP Matching automates this

Building and maintaining a scorecard manually is operationally heavy. The list pulls, the scoring, the calibration, the threshold tuning, and the quarterly review add up to 10 to 20 hours of RevOps time per month for a mid-market B2B SaaS team.

ReachIQ ICP Matching handles the layers automatically:

Pulls firmographic and technographic data continuously from connected sources.
Layers third-party intent signals (where available) and behavioral signals from your sending and CRM activity.
Applies the weights you set, with sensible defaults calibrated to B2B SaaS norms.
Surfaces the top-scoring accounts into the outbound sequence automatically, refreshing daily.
Flags scorecard drift (accounts that should have scored high but did not, or vice versa, based on what your sales team actually closed).

The point is not that the platform replaces the thinking. The team still owns the ICP definition, the weights, and the threshold. The platform removes the operational overhead of running the scorecard every day.

Maintaining the scorecard quarterly

Scorecards drift. The product changes, the market shifts, the team learns. A scorecard that was right last quarter is wrong this quarter in small but compounding ways. Plan a quarterly review.

The quarterly review checklist

Pull the latest 30 closed-won deals. How many scored above the threshold under the current scorecard? If below 80 percent, the scorecard is missing something the team is closing on.
Pull the latest 30 closed-lost deals. How many scored above the threshold (false positives)? If above 20 percent, the scorecard is letting in bad fits.
Pull the disqualified-fast queue. Accounts the team got into discovery on but disqualified within 14 days. These are the most expensive misses; the team invested time. Investigate the common patterns.
Review the technographic data quality. Technographic data is the most likely to drift (companies switch tools, vendors rebrand). Validate the top 5 technographic attributes against a current sample.
Check the intent-signal half-life. Are intent surges from 90 days ago still scoring at or near full weight? If yes, the decay function is too slow. Tighten it.
Update the threshold. If team capacity has changed (headcount up or down), the threshold has to move with it.
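The first two checklist items are simple proportions and are easy to script. A minimal sketch, assuming the retrospective scores for the latest closed-won and closed-lost deals are already available as lists; the field names in the result are made up for the example.

```python
def share_above(scores, threshold):
    """Fraction of scores at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s >= threshold for s in scores) / len(scores)


def quarterly_review(won_scores, lost_scores, threshold=70):
    """Flag the two failure modes from the checklist."""
    win_coverage = share_above(won_scores, threshold)
    false_positive_rate = share_above(lost_scores, threshold)
    return {
        "win_coverage": win_coverage,
        "false_positive_rate": false_positive_rate,
        # Under 80 percent of wins qualifying: the scorecard is missing something.
        "missing_signal": win_coverage < 0.80,
        # Over 20 percent of losses qualifying: the scorecard lets in bad fits.
        "letting_in_bad_fits": false_positive_rate > 0.20,
    }


won = [80] * 24 + [60] * 6    # 24 of 30 wins scored above the threshold
lost = [75] * 7 + [40] * 23   # 7 of 30 losses scored above the threshold
report = quarterly_review(won, lost)
print(report)
```

In this toy data the wins pass (exactly 80 percent qualify) while the losses fail (about 23 percent qualify), so the review would focus on which attributes are letting poor-fit accounts through.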

When to retrain vs when to tune

A small adjustment (one weight up or down by 10 percent, one attribute added or removed) is tuning. Tune freely.

A full retrain (recalibrating the whole scorecard against fresh historical data) is heavier and should happen every 6 to 12 months, or whenever the company has crossed a stage transition (a new product, a new vertical, a new pricing model, a 2x growth in ACV).

Both have their place. The mistake is doing neither and letting the scorecard rot quietly. A rotting scorecard does not look broken; it just gradually sends sales after the wrong accounts. The diagnostic is the win-rate drift among accounts the scorecard called qualified. Down 5 percentage points quarter over quarter is the signal to retrain.
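The drift diagnostic can be checked mechanically each quarter. The sketch below takes (won, qualified) counts for two consecutive quarters and uses exact fractions so that a drop of exactly 5 percentage points is not lost to floating-point rounding; the function names are hypothetical.

```python
from fractions import Fraction


def win_rate_pp(won: int, qualified: int) -> Fraction:
    """Win rate among qualified accounts, in percentage points."""
    if qualified <= 0:
        raise ValueError("qualified must be positive")
    return Fraction(100 * won, qualified)


def needs_retrain(previous, current, drop_pp=5) -> bool:
    """previous and current are (won, qualified) counts for consecutive quarters."""
    return win_rate_pp(*previous) - win_rate_pp(*current) >= drop_pp


print(needs_retrain((12, 40), (10, 40)))  # 30% down to 25%: True
print(needs_retrain((12, 40), (11, 40)))  # 30% down to 27.5%: False
```

With quarterly deal counts this small, a single deal can move the rate by several points, so a flag from this check is a prompt to look closer rather than proof of drift.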

Common pitfalls in ICP scoring

Pitfall 1: Conflating ICP and lead scoring. Treating contact-level behavior as the same as account-level fit produces a model that routes hot inbound leads from poor-fit accounts to sales. Sales burns cycles, win rates drop, and the team blames the model rather than the architecture.

Pitfall 2: Scoring on too many attributes. A 25-attribute scorecard sounds rigorous and is actually noisier than a 7-attribute scorecard. The top 5 to 8 attributes do most of the work; the rest add noise without lift.

Pitfall 3: Building from instinct, not data. Most first-version scorecards are written from sales leader intuition. That intuition is usually 50 to 70 percent right and 30 to 50 percent wrong. Calibrating against closed-won and closed-lost data fixes the gap.

Pitfall 4: No decay on intent signals. Intent signals that never expire produce false positives. Build a half-life into every intent attribute. 60 to 120 days for most signals; 30 days for fast-moving ones like website visits.

Pitfall 5: Threshold set in a vacuum. Setting the threshold without checking team capacity produces either a queue that overflows (threshold too low) or a team that runs out of work (threshold too high). Match the threshold to capacity, not to abstract fit.

Pitfall 6: No review cadence. A scorecard with no scheduled review drifts for 18 months and then has to be rebuilt from scratch. A quarterly 90-minute review catches drift before it compounds.

FAQ

What is the difference between ICP scoring and lead scoring?

ICP scoring runs at the account level and is firmographic-first; it answers whether the company is the kind you close. Lead scoring runs at the contact level and is behavioral-first; it answers whether the person is showing buying intent. The two run in sequence: ICP scoring qualifies the account, then lead scoring prioritizes the contacts inside it.

How many attributes should an ICP scorecard have?

5 to 10 attributes is the working range. Below 5 and the scorecard does not discriminate; above 10 and it is noisier than it is useful. The top 5 to 8 attributes typically explain 80 percent of the variance. The rest are decoration.

How do I know if my ICP scorecard is working?

Two diagnostics. First, score the last 30 closed-won deals retrospectively. 80 percent or more should score above your qualified threshold. Second, score the last 30 closed-lost deals. No more than 20 percent should score above the threshold. If either number is off, the weights or the threshold need adjustment.

How often should I update the scorecard?

Tune quarterly (small weight adjustments, attribute add/remove). Retrain every 6 to 12 months, or whenever the company crosses a stage transition (new product, new vertical, new pricing model, 2x growth in ACV). A scorecard that has not been reviewed in 12 months has almost certainly drifted.

Does account-level scoring work for SMB?

Yes, with a tighter weighting toward firmographic and intent and away from technographic and behavioral. SMB has less technographic data available (smaller companies have smaller tech stacks) and less behavioral signal (fewer contacts engaging). The four-layer structure still works; the weights shift.

Can ICP scoring be fully automated?

The data collection and the scoring application can be automated, and platforms like ReachIQ ICP Matching handle that. The ICP definition, the weights, and the threshold still need human ownership. The mistake is treating the automation as the whole job; the platform removes the operational overhead, not the strategic thinking.