Intermediate12 min readObject-Oriented Foundationslive prototype

equals(), hashCode() & Identity

Reference identity (same object?) vs value equality (same contents?), and the hashCode contract that makes hash-based collections work.

The idea

What it is

Two objects can hold exactly the same data and still not be 'equal' — it all depends on what equal means. There are two very different questions hiding behind that one word: are these the same object? (identity) and do these hold the same contents? (value equality). Mixing them up is the source of some of the most baffling bugs in object-oriented code.

Picture two printed copies of the same boarding pass. They show identical details — same name, same seat, same flight — but they are two separate pieces of paper. Asking "is this literally the same sheet of paper?" is identity. Asking "do these two passes say the same thing?" is value equality. Most of the time you care about the second question, but the default behavior in most languages answers the first.

The one rule that ties it all together

If two objects are equal, they must produce the same hashCode. Break that one rule and your objects start vanishing inside HashSets and HashMaps — found one moment, lost the next.

Mechanics

How it works

Two kinds of "equal"

When you compare two objects, you have to decide which question you're asking:

  • Reference / identity equalityare these two variables pointing at the very same object in memory? In Java this is p1 == p2; in Python it's p1 is p2. It compares addresses, not contents, so two separately created objects are never identity-equal even if every field matches.
  • Value equalitydo these two objects hold the same contents? This is p1.equals(p2) in Java, p1 == p2 in Python, p1 == p2 (operator overload) in C++. By default, most languages make value equality fall back to identity — so until you override it, equals just checks == and reports false for two distinct objects.

That default is the trap. You create Point(2, 3) twice, you mean them to be the same point, but the language compares addresses and says they're different.

Overriding equals to compare by value

To make two equal-looking objects actually count as equal, you override the equality method to compare the fields you care about instead of the memory address:

Point.java
@Override
public boolean equals(Object o) {
    if (this == o) return true;            // same object → trivially equal
    if (!(o instanceof Point p)) return false;
    return this.x == p.x && this.y == p.y; // compare by VALUE
}

Why hashCode must agree with equals

Hash-based collections like HashSet and HashMap don't scan every element to find a match — that would be slow. Instead they call hashCode() to compute a number, use it to pick a bucket, and only compare objects within that bucket using equals(). So a lookup is really two steps: which bucket? (hashCode), then is it really here? (equals).

That two-step dance only works if hashCode and equals agree. If two objects are equals-equal but land in different buckets because their hash codes differ, the collection looks in the wrong bucket, never finds the match, and treats them as different. That's why the contract demands they move together.

The equals / hashCode contract

A correct equals must satisfy four properties, and hashCode must stay consistent with it:

  • Reflexivex.equals(x) is always true. An object equals itself.
  • Symmetric — if x.equals(y) is true, then y.equals(x) must be true too. Equality can't depend on which side you start from.
  • Transitive — if x.equals(y) and y.equals(z), then x.equals(z) must hold. Equality chains together.
  • Consistent — repeated calls return the same answer as long as the objects don't change. No random results.
  • Equal ⇒ equal hash — if x.equals(y), then x.hashCode() == y.hashCode(). This is the rule that keeps hash collections working.
  • *Unequal may collide — the reverse is not required: two unequal objects are allowed to share a hashCode. That's a normal collision*, and equals resolves it inside the bucket. Good hash functions just keep collisions rare.

The classic trap: override one but not both

If you override equals to compare by value but forget hashCode (or vice-versa), your two equal points compute different hash codes, land in different buckets, and the HashSet never realizes they're duplicates — so a 'duplicate' sneaks in and set.contains(p) returns false for a point you know you added. Always override the two together. This is exactly the bug the prototype reproduces.

What breaks in HashSet / HashMap

Concretely, with equals overridden but hashCode left as the default (identity-based), here's the failure: you add p1 = Point(2,3), then add p2 = Point(2,3). They're equals-equal, so the set should keep just one. But their default hash codes differ, so they hash to different buckets; the set never compares them with equals, and you end up with two 'equal' points in a set that's supposed to hold no duplicates. Later, set.contains(new Point(2,3)) may return false even though the point is clearly in there — it's looking in the wrong bucket.

How each language spells it

  • Java — override boolean equals(Object) and int hashCode() together. Objects.equals(...) and Objects.hash(...) make this easy and null-safe.
  • Python — define __eq__(self, other) for value equality and __hash__(self) for hashing. Note: defining __eq__ without __hash__ makes the class unhashable (Python sets __hash__ = None), so you can't put it in a set or use it as a dict key until you add __hash__.
  • C++ — overload operator== for value equality, and specialize std::hash<Point> so the type can be a key in std::unordered_set / std::unordered_map.
  • JavaScript — there is only reference equality (=== / == compare object references). There's no equals/hashCode hook and Set/Map key on reference identity, so for value equality you must compare fields manually (or key your Map by a serialized string like `${x},${y}`).

Let the language write it for you

Modern value types generate both methods automatically: Java records (record Point(int x, int y) {}), Kotlin/Scala data classes, Python's `@dataclass` (and @dataclass(frozen=True) to make it hashable), and C# records all derive a correct, field-based equals and hashCode for you. Prefer these for immutable value types and you side-step the whole trap.

Interactive prototype

See it. Build it. Break it.

A sandboxed, hands-on simulation — no setup, no install. Play with it as you read.

About this simulation

Two Point objects hold the same values (2, 3) but live at different addresses. Flip Override equals() + hashCode() and watch the IDENTITY checks change, then press Add both to a HashSet — with the override OFF a duplicate sneaks in (size 2); with it ON the set recognizes the duplicate (size 1).

Hands-on

Try these yourself

Open the prototype above, predict what happens, then verify.

try 01

See identity vs value, side by side

With the toggle OFF, read the IDENTITY panel. p1 == p2 is false (different addresses — reference check), p1.equals(p2) is also false (default falls back to identity), and the two hashCode() chips differ. Now flip Override equals() + hashCode() ON: == stays false (it's still two different objects), but .equals() flips to true and both hash codes become equal. That's value equality switching on.

try 02

Reproduce the HashSet bug — then fix it

Leave the toggle OFF and press Add both to a HashSet. Watch p1 and p2 fall into two different buckets and the set size land on 2 — a duplicate sneaked in. Now flip the toggle ON and press it again: p2 hashes to the same bucket, equals recognizes it as a duplicate, and the size drops to 1. Same data, correct behavior — only because hashCode and equals now agree.

try 03

Make them genuinely differ

Edit a coordinate so the two points are no longer the same value (say change p2 to (2, 9)). Re-run the checks: now even with the override ON, .equals() is false and the hash codes differ legitimately — and the HashSet correctly holds 2 points. This shows the override isn't 'forcing' equality; it's comparing the actual contents.

In practice

When to use it — and what trips people up

When to override equals & hashCode

Override them whenever your class is a value type — an object that's defined by what it holds, not which instance it is. Money, coordinates, a date, an ID wrapper, a colour: two of these with identical fields are the same thing, and you'll want to compare them by value, deduplicate them in a Set, or use them as Map/dict keys. The moment a type ends up as a key or inside a hash-based collection, correct equals and hashCode are mandatory.

Entities keep identity; values get value-equality

Not every class should compare by value. A User with a database ID is an entity — two rows with the same name are still different users, so identity (the ID) is the right notion of 'same'. Value objects (Money, Point, Range) are the ones that earn a field-based equals/hashCode. And keep the fields used in hashCode immutable — see the cons.

What it gives you

  • Deduplication just works — equal objects collapse to one inside a HashSet/set, instead of piling up as sneaky duplicates.
  • Objects become reliable map keysmap.get(new Point(2,3)) finds the value you stored under an equal-but-different instance.
  • Value comparisons read naturally — a.equals(b) (or a == b) answers 'same contents?' instead of the rarely-useful 'same object?'.
  • Fast lookups stay fast — a correct hashCode spreads objects across buckets so hash collections keep their O(1) average.

Common mistakes

  • Overriding one but not the other — value equals with a default hashCode (or vice-versa) silently breaks every hash collection; the #1 bug here.
  • Mutable fields in hashCode — if you mutate a field the hash depends on after inserting the object, it moves buckets and gets 'lost' inside the Set/Map.
  • Using identity when you meant value — relying on == / is for value types so two equal objects compare as different (and pile up as duplicates).
  • A broken contract — an equals that isn't symmetric or transitive (a common slip when comparing across subclasses) makes collections behave unpredictably.

Reference

Code & further reading

A minimal reference implementation and pointers worth bookmarking.

// JS/TS has ONLY reference equality — no equals()/hashCode() hook,
// and Set/Map key on object identity. So compare by value yourself.
class Point {
  constructor(public readonly x: number, public readonly y: number) {}

  // structural compare — the helper JS doesn't give you
  equals(o: Point): boolean {
    return o instanceof Point && this.x === o.x && this.y === o.y;
  }

  // a stable string "hash" usable as a Map/Set key
  key(): string {
    return `${this.x},${this.y}`;
  }
}

const p1 = new Point(2, 3);
const p2 = new Point(2, 3);

p1 === p2;        // false  — different objects (reference equality)
p1.equals(p2);    // true   — same contents (manual value compare)

// A native Set keys by reference, so it would keep BOTH.
// Key by value instead, using p.key():
const set = new Map<string, Point>();
set.set(p1.key(), p1);
set.set(p2.key(), p2);   // same key → overwrites, no duplicate
set.size;                // 1  — p2 recognized as equal to p1

References & further reading

5 sources

Knowledge check

Did it land?

Quick questions, answers revealed on submit. Nothing is scored or saved.

question 01 / 05

You create p1 = new Point(2,3) and p2 = new Point(2,3) and have overridden equals/hashCode to compare by value. What do p1 == p2 and p1.equals(p2) return in Java?

question 02 / 05

You override equals on Point to compare by value but forget to override hashCode. You add two equal points to a HashSet. What happens?

question 03 / 05

Which part of the equals/hashCode contract is actually required?

question 04 / 05

Why is it dangerous to compute hashCode from a field you later mutate while the object is inside a HashMap?

question 05 / 05

In JavaScript, you put two distinct objects {x:2,y:3} into a Set. How many elements does the Set contain, and why?

0/5 answered