Develow
A proof-of-work rating for AI-era engineers

We make apps for real-world problems

FSD skill rating for AI-native full-stack developers. Made for AI-assisted interviews on HackerRank, CodeSignal, and CoderPad.

Free to start · Mac · Windows · Linux

FSD Rating 1760 ± 95 · Product Engineer

Develow IDE: Shopping Cart I · Rated · ⏱ 12:40
Files: cart.py, store.py

    def apply_discount(self, code):
        rule = self._lookup(code)
        return self.total - rule.amount

Failing: 8 / 12 tests passed

Powered by the latest models from

Anthropic · OpenAI

Always up to date with the frontier models from Anthropic and OpenAI.

Grounded in verified research

The research behind the rating

Every number below traces to a primary source — the same standard the rating holds itself to. This is what the science says about measuring developer skill, and how we built for it.

.42

No hiring signal predicts job performance with a validity above roughly .42, and structured, rubric-scored evaluation is what comes closest. Every diff here is graded against a hidden rubric.

Sackett, Zhang, Berry & Lievens · J. Applied Psychology (2022)

0.905

Median correlation between live Elo difficulty and gold-standard IRT difficulty once 50 developers have attempted a problem. After just 5, it is already 0.70.

Pankiewicz · ICCE (2020)

68.3%

Share of SWE-bench tasks experts threw out as flawed. Problem quality decides what a score means — every rated Develow problem passes an audit gate first.

OpenAI · SWE-bench Verified (2024)

26.2%

Best AI model's solve rate on $1M of real freelance full-stack work. Cross-layer depth is what still separates engineers — so that's what we test.

Miserendino et al. · SWE-Lancer (2025)

Self-assessment is broken. We measure shipped work.

In a randomized trial — 16 experienced open-source developers, 246 real tasks — the same people who felt faster with AI were measurably slower. A rating has to come from the stopwatch and the shipped diff, not the vibes.

Perceived: +20% faster. What developers believed, even after the study.

Measured: 19% slower. What the stopwatch measured on the same tasks.

METR randomized controlled trial (2025) · arXiv:2507.09089

Learning is the same story — unless you use AI to build understanding

Learned by hand-coding: 67%
Leaned on AI for answers: 50%

Comprehension quiz scores after learning a new library. The exception: participants who used AI to ask why — conceptual questions, explanations — retained as much as hand-coders. That's the behavior the rating's AI-orchestration signal rewards.

Anthropic randomized controlled trial (2026) · arXiv:2601.20245

One number, honestly reported

Your rating climbs as your uncertainty shrinks

Every rated challenge updates your FSD Rating against the problem's learned difficulty. The shaded band is your confidence interval — it tightens the more you prove.

Chart: FSD Rating (axis 1200 to 1800) over rated challenges completed, from Day 1 to Day 90, ending at 1760 ± 95.

Illustrative trajectory. The shaded band is the ± rating deviation — it narrows as more rated challenges are completed. Ratings stay provisional until roughly 10–20 rated challenges across 3+ skill areas.
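How the band narrows can be sketched with the standard Glicko-1 deviation update. This is a minimal illustration, not Develow's code: the page only calls the method "Glicko-style", the rating itself is held fixed at 1500 to isolate the ± band, every problem is assumed to sit at 1550 ± 50, and the `is_provisional` thresholds (10 challenges, 3 skill areas) are hypothetical values taken from the low end of the range above.

```python
import math

# Glicko-1 scale constant: q = ln(10) / 400
Q = math.log(10) / 400
START_RD = 350.0  # new accounts start at ±350


def g(rd: float) -> float:
    """Glicko-1 attenuation factor for an opponent's (here, a problem's) deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_problem: float, rd_problem: float) -> float:
    """Glicko-1 expected score of a player rated r against the problem."""
    return 1.0 / (1.0 + 10 ** (-g(rd_problem) * (r - r_problem) / 400))


def shrink_rd(r: float, rd: float, r_problem: float, rd_problem: float) -> float:
    """Rating deviation after one rated attempt (Glicko-1, single game)."""
    e = expected(r, r_problem, rd_problem)
    d_squared = 1.0 / (Q**2 * g(rd_problem) ** 2 * e * (1.0 - e))
    return math.sqrt(1.0 / (1.0 / rd**2 + 1.0 / d_squared))


def is_provisional(rated_count: int, skill_areas: int,
                   min_count: int = 10, min_areas: int = 3) -> bool:
    """Hypothetical rule: provisional until both thresholds are met."""
    return rated_count < min_count or skill_areas < min_areas


# Band width after each of 20 rated challenges (rating held at 1500).
rd = START_RD
history = [rd]
for _ in range(20):
    rd = shrink_rd(1500, rd, 1550, 50)
    history.append(rd)
print([round(x) for x in history[::5]])
```

Each attempt adds information, so the deviation falls quickly at first and then more slowly. Full Glicko-1 also widens the band again after a period of inactivity, which this sketch leaves out.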

Problems get rated too

We learn how hard a problem is — fast

Every attempt rates the problem back. After five developers try a newly published challenge, its live difficulty already agrees with gold-standard psychometrics at r ≈ 0.70 — by fifty it's 0.905. Until then, the difficulty label says provisional.

Chart: median correlation with the IRT reference, by developers who have attempted the problem. Live Elo difficulty: 5 devs 0.702 · 10 devs 0.784 · 20 devs 0.852 · 50 devs 0.905, plotted against a naive solve-rate baseline. Inset: a new problem starts at ~1550 ± 200, provisional (tier seed), seeded from Silver tier.

Median correlation between online Elo difficulty and an IRT graded-response reference, by number of learners sampled per task — Pankiewicz, ICCE 2020 (RunCode: 50,055 attempts). Baseline shows the two published endpoints of naive solve-rate ranking.
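A minimal sketch of the problem side of the update, assuming a plain Elo step: each attempt is treated as a match between developer and problem, so a grade above expectation lowers the problem's difficulty and a grade below it raises it. The Silver seed (1550 ± 200) and the 50-attempt mark come from this page; the `Problem` class, the `k_max / sqrt(attempts + 1)` step schedule, and the shrinking ± band are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass

# Tier seeds: only Silver (~1550 ± 200) is shown on the page.
TIER_SEEDS = {"Silver": (1550.0, 200.0)}
CALIBRATED_AFTER = 50  # attempts before the "provisional" label is dropped (assumed)


@dataclass
class Problem:
    difficulty: float
    seed_rd: float
    attempts: int = 0

    @classmethod
    def seeded(cls, tier: str) -> "Problem":
        difficulty, rd = TIER_SEEDS[tier]
        return cls(difficulty=difficulty, seed_rd=rd)

    @property
    def rd(self) -> float:
        # Hypothetical: the ± band narrows like 1/sqrt(n) as attempts accumulate.
        return self.seed_rd / math.sqrt(1 + self.attempts)

    @property
    def provisional(self) -> bool:
        return self.attempts < CALIBRATED_AFTER


def expected_grade(dev_rating: float, difficulty: float) -> float:
    """Same logistic curve as the user-side rule: E = 1 / (1 + 10^((Rp - Ru) / 400))."""
    return 1.0 / (1.0 + 10 ** ((difficulty - dev_rating) / 400))


def rate_problem(p: Problem, dev_rating: float, grade: float, k_max: float = 40.0) -> None:
    """Update difficulty after one attempt; grade is in [0, 1]."""
    e = expected_grade(dev_rating, p.difficulty)
    k = k_max / math.sqrt(p.attempts + 1)  # big moves early, small moves later
    p.difficulty -= k * (grade - e)  # beating expectation means the problem was easier
    p.attempts += 1


problem = Problem.seeded("Silver")
rate_problem(problem, dev_rating=1760, grade=1.0)
print(round(problem.difficulty), round(problem.rd), problem.provisional)  # → 1541 141 True
```

The shrinking step size mirrors the idea on this page: early attempts move a new problem a lot, and later ones only fine-tune a label that is already close.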

Built for the AI era

Four dimensions the rating actually measures

AI changed software development. The engineers who thrive aren't the ones typing the most code — they're the ones who can direct AI and ship with confidence. The rating measures exactly that.

1

Understand systems

Read the problem and the codebase first — know the fundamentals well enough to navigate.

2

Direct AI effectively

Let AI do the heavy lifting while you point it at what actually matters.

3

Verify solutions

Run the tests and confirm the fix holds — because you know what correct looks like.

4

Ship with confidence

Submit production-ready work — and be able to explain exactly why it's right.

Not one skill

A full-stack fingerprint, not a single score

One scalar can't say "great at React, shaky on auth." The rating is multidimensional, so a weak layer shows — and the public number is penalized for being lopsided.


Sample profile

FSD 1760 ± 95

AI Orchestration: 1860 ± 100
Frontend: 1810 ± 90
Backend / API: 1740 ± 110
Testing: 1700 ± 120
Security / Auth: 1680 ± 130
Database: 1510 ± 160

Where you stand

See exactly where you rank

Sample rating: 1760 (Product Engineer), placing you in the top 18% of rated developers.
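One way to turn a rating into a "top X%" figure, sketched under the assumption that the percentage counts every rated developer at or above your rating, you included. The population below is made up for illustration.

```python
from bisect import bisect_left


def top_percent(my_rating: float, ratings: list[float]) -> float:
    """Percentage of rated developers whose rating is at or above my_rating."""
    ordered = sorted(ratings)
    at_or_above = len(ordered) - bisect_left(ordered, my_rating)
    return 100.0 * at_or_above / len(ordered)


# Hypothetical population: 100 developers rated 1000, 1010, ..., 1990.
population = [1000 + 10 * i for i in range(100)]
print(top_percent(1820, population))  # → 18.0
```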

Under the hood

How a submission becomes a rating

Grading is server-side against a hidden reference solution — you can't self-grade, and brute-forcing submissions decays their value.

Step 1

Graded 0–100

Your diff is judged against a hidden reference + rubric by the server-side grader.

Step 2

Continuous Elo

That score updates your rating against the problem's learned difficulty — one attempt at a time, and every rated attempt counts.

Step 3

± Uncertainty

A Glicko-style confidence interval tightens as you complete rated challenges — and can't be farmed by tanking.

Step 4

Skill vector

Only the skills a problem exercises move — weighted by its rubric.

Step 5

ML calibration

Offline models recalibrate difficulty and recommend your next challenge.
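Steps 1, 2, and 4 can be sketched together. This is an illustration under stated assumptions, not Develow's grader: the 0–100 grade is rescaled to a 0–1 score, the headline rating takes a plain Elo step with a hypothetical K of 20, and each skill the rubric touches takes its own Elo step scaled by that skill's rubric weight. The rubric weights are invented; the starting skills are the sample profile above.

```python
def expected_grade(rating: float, difficulty: float) -> float:
    """E = 1 / (1 + 10^((Rp - Ru) / 400))."""
    return 1.0 / (1.0 + 10 ** ((difficulty - rating) / 400))


def apply_attempt(fsd: float, skills: dict[str, float], difficulty: float,
                  grade_0_100: float, rubric: dict[str, float], k: float = 20.0):
    """Return the updated headline rating and skill vector after one rated attempt."""
    score = grade_0_100 / 100.0  # Step 1: graded 0-100, rescaled to 0-1
    new_fsd = fsd + k * (score - expected_grade(fsd, difficulty))  # Step 2: Elo step

    new_skills = dict(skills)  # Step 4: skills outside the rubric keep their value
    for skill, weight in rubric.items():
        e = expected_grade(skills[skill], difficulty)
        new_skills[skill] = skills[skill] + k * weight * (score - e)
    return new_fsd, new_skills


sample = {"AI Orchestration": 1860, "Frontend": 1810, "Backend / API": 1740,
          "Testing": 1700, "Security / Auth": 1680, "Database": 1510}
# Hypothetical rubric for a backend + database problem (weights sum to 1).
rubric = {"Backend / API": 0.6, "Database": 0.4}

fsd, skills = apply_attempt(1760, sample, difficulty=1550, grade_0_100=90, rubric=rubric)
print(round(fsd, 1), {s: round(v, 1) for s, v in skills.items()})
```

Giving each skill its own expected grade means a weak layer gains more from the same 90/100 than a strong one, which is one plausible reading of "weighted by its rubric."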

Try the update rule

You vs. the problem — feel the math

Your rating and the problem's learned difficulty set an expected grade; beating expectation moves you up. The sample below comes from the live update rule, not a mock.

Your rating: 1760 ± 95
Your uncertainty (± RD): ± 95

New accounts start wide (±350), so expect bigger swings until you're calibrated.

Problem difficulty: 1550
Problem calibration: high. Hundreds of attempts have pinned this problem's difficulty, so beating it means something.

Expected grade: 77%
Ace it: +4
Bomb it: -13

E = 1 / (1 + 10^((Rₚ − Rᵤ) / 400)) — a 1500-rated dev solves a 1500-rated problem about half the time, the same difficulty semantics Codeforces uses. Your step size scales with your own uncertainty and shrinks against uncalibrated problems.
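The formula can be checked against the sample numbers. A minimal sketch: `expected_grade` is the page's formula as written, but the step size is our assumption. We guessed K = 64 × RD / 350 (so the ±350 starting band gives the biggest steps) and damped it with Glicko-1's g(RD) factor for uncalibrated problems, using a hypothetical problem RD of 30 to stand in for "hundreds of attempts". That guess happens to reproduce the 77%, +4, and -13 in the sample, but it is not a published Develow formula.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scale constant


def expected_grade(r_user: float, r_problem: float) -> float:
    """E = 1 / (1 + 10^((Rp - Ru) / 400)), as stated on the page."""
    return 1.0 / (1.0 + 10 ** ((r_problem - r_user) / 400))


def g(rd: float) -> float:
    """Glicko-1 attenuation: near 1 for well-calibrated problems, smaller for uncertain ones."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def rating_change(r_user: float, rd_user: float, r_problem: float,
                  rd_problem: float, grade: float) -> float:
    """Rating delta for a grade in [0, 1] (hypothetical step-size rule)."""
    k = 64.0 * rd_user / 350.0  # assumed: step scales with your own uncertainty
    k *= g(rd_problem)          # shrink the step against uncalibrated problems
    return k * (grade - expected_grade(r_user, r_problem))


e = expected_grade(1760, 1550)
ace = rating_change(1760, 95, 1550, 30, 1.0)
bomb = rating_change(1760, 95, 1550, 30, 0.0)
print(f"{e:.0%} {ace:+.0f} {bomb:+.0f}")  # → 77% +4 -13
```

The asymmetry falls out of the formula: at a 77% expected grade there is little room to beat expectation and a lot of room to miss it.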

Learn freely, prove deliberately

Practice mode and Rated mode

Practice mode

Learn freely. Build the reps.

Unlimited attempts
Hints and AI fully encouraged
Reveal the reference solution anytime
Little to no rating impact

Rated mode

Prove it. This is what counts.

Time-boxed with hidden tests
Limited submissions — brute force decays
Server-side grading, no self-grading
Moves your FSD Rating and skill vector
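"Brute force decays" can be pictured as a multiplier on each extra rated submission. The page doesn't publish the rule, so the 15% decay per retry and the three-submission cap below are hypothetical.

```python
def decayed_grade(raw_grade: float, submission: int,
                  decay: float = 0.85, max_submissions: int = 3) -> float:
    """Grade that counts toward the rating for the n-th rated submission (1-based).

    Hypothetical rule: every retry after the first is worth 15% less, and
    submissions beyond the cap are rejected.
    """
    if not 1 <= submission <= max_submissions:
        raise ValueError(f"submission must be between 1 and {max_submissions}")
    return raw_grade * decay ** (submission - 1)


for n in range(1, 4):
    print(n, round(decayed_grade(100, n), 2))
```

A multiplicative decay keeps the first honest attempt worth the most while still letting a genuine fix on a retry count for something.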

Speed is key

A rating that proves you can crack the AI interview

We compiled real-world problems asked by big tech, put them in the exact interview IDE and AI tools used at software companies, and turned your performance into a credible, shareable rating.

HackerRank · CoderPad · CodeSignal

Logos shown to reference the coding-interview formats we grew up on. Not affiliated with or endorsed by HackerRank, CoderPad, or CodeSignal.

Where speed meets maintainability

Download the free Develow IDE

Learn test-driven development with DVL AI — the methodology engineers use to ship 10x faster. Never get stuck again.

Investigate the codebase
Find the root cause
Generate a plan
Explain the architecture
Implement a solution
Teach you how it works

Real work, not exercises

Every rated challenge moves your number

Each mission is a real engineering job, scoped to a focused rep — across React, Node, Python, FastAPI, Go, Postgres and more.

Free question

Get a free Amazon question (2026)

A real Amazon full-stack debugging OA — MovieDB I. Drop your email and we'll send it over, free.

No spam. Unsubscribe anytime.

A clear path to product engineering

No tutorial hell. Just momentum.

  1. Month 1 · Day 1

    Build Products

    Learn how modern applications are structured and shipped. Get hands-on with frontend, backend, databases, and Docker through real, runnable projects.

  2. Month 2 · Day 30

    Ship Features

    Move beyond tutorials. Add functionality, fix bugs, and navigate real codebases — watching your FSD Rating climb with every rated challenge.

  3. Month 3 · Day 60

    Think Like An Engineer

    Work through realistic product challenges and AI-assisted workflows. Walk into any interview with a rating that proves you've already done the work.

Real-world rated challenges
Product engineering families
Top stacks: MERN, PERN, FastAPI, Go

Pricing

Upgrade for the full 90-day roadmap

Free

$0 / forever
  • Download the Develow IDE
  • Run sample problems
  • Provisional FSD Rating
  • Full problem library (not included)
  • Full rated-mode scoring (not included)

Develow Pro

Most popular
$89.99 / year (regularly $119.99)

That's 25% off — under $7.50/mo, billed yearly.

  • The full 90-day product engineering roadmap
  • Rated mode + full skill ratings
  • Browse by company
  • AI search across the catalog
  • Everything in Free
Upgrade to Develow Pro →

Cancel anytime. Secure checkout via Stripe.

Built for every background

Frequently asked questions

Whether you come from software, design, business, or product — Develow is built to accelerate your learning rate. Speed is the whole point.

It's a single, honestly reported number (with a ± confidence interval) that reflects how well you ship correct, verified changes in real codebases across the full stack. It starts provisional and sharpens as you complete rated challenges.

Yes. Missions start small and DVL Agent works right beside you — it can investigate the codebase, explain the architecture, and teach you how a fix works. You learn by shipping real software from day one instead of grinding abstract puzzles.

Absolutely. Designers already think in systems and user outcomes. Develow turns that instinct into shipped features — you'll learn just enough of the stack to build, with AI handling the boilerplate so you move fast.

Perfect fit. Product and business people who can direct AI to ship working software are the new force multipliers. Missions teach you to scope problems, build features, and verify they work — the core of Product Engineering.

Start building your rating today

Work through real-world problems with DVL Agent, earn your FSD Rating, and build the skills companies actually pay for.