Changelog
See what we've been working on. New features and improvements every month.
Homepage hero pill leaks bold markdown
lib/changelog.ts parser rewritten around a single splitOnSeparator walker that finds the first em-dash (—), en-dash (–), or hyphen (-) outside parentheses; cleanMarkdown now strips bold and inline-code markers consistently for both titles and descriptions. Fixes three real-world regressions on v3.0.11 cards: zod 4.3.6 → 4.4.3 — … (now titled zod 4.3.6 → 4.4.3), @sentry/nextjs 10.50.0 → 10.51.0 — … (now titled @sentry/nextjs 10.50.0 → 10.51.0), and react-email 6.0.4 → 6.0.5 (CLI dev tool only — production rendering uses …) (no longer mis-split inside the parenthetical aside; rendered as a single title-only card).
6 regression tests added in tests/changelog.test.ts covering bold-prefix-with-trailing-text, bold+code prefix, bold inside descriptions, em-dash inside parens, multi-bold lines without a top-level dash, and hyphenated package names like lucide-react. Suite is 23/23 green; full project test run 616/616.
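A minimal sketch of the paren-aware split described above, not the actual lib/changelog.ts code. The result shape and the rule that a plain hyphen only counts when it has a space on both sides are assumptions; the second is inferred from the lucide-react test case.

```typescript
// Hypothetical reimplementation; the real parser may differ in details.
interface SplitResult {
  title: string;
  description: string | null;
}

const EM_DASH = "\u2014";
const EN_DASH = "\u2013";

function splitOnSeparator(line: string): SplitResult {
  let depth = 0; // current parenthesis nesting level
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === "(") depth++;
    else if (ch === ")") depth = Math.max(0, depth - 1);
    if (depth > 0) continue; // separators inside parentheses are ignored

    const isDash = ch === EM_DASH || ch === EN_DASH;
    // Assumption: a hyphen separates only when surrounded by spaces,
    // so hyphenated package names stay in the title.
    const isSpacedHyphen = ch === "-" && line[i - 1] === " " && line[i + 1] === " ";
    if (isDash || isSpacedHyphen) {
      const description = line.slice(i + 1).trim();
      return {
        title: line.slice(0, i).trim(),
        description: description.length > 0 ? description : null,
      };
    }
  }
  return { title: line.trim(), description: null }; // title-only card
}

function cleanMarkdown(text: string): string {
  // Strip bold markers and inline-code backticks.
  return text.replace(/\*\*/g, "").replace(/`/g, "");
}
```

Walking the string once with a depth counter keeps the parenthetical aside intact without needing a separate regex per separator type.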
AI SDK trio
ai 6.0.168 → 6.0.174, @ai-sdk/openai 3.0.53 → 3.0.58, @ai-sdk/react 3.0.170 → 3.0.176. ai@6.0.171 ships an MCP prototype-pollution fix; ai@6.0.170 adds allowSystemInMessages (we use the dedicated system: parameter in lib/feeds/digest-intro.ts, so no warning fires). No call-site changes required.
zod 4.3.6 → 4.4.3
only minor in window is 4.4.0 (.merge() with refinements throws, z.url() rejects malformed forms, base64 rejects whitespace, JSON Schema strips redundant id). No call sites trigger the new tightenings; OpenAPI spec output is byte-identical (gen:openapi --check passes).
AWS SDK
@aws-sdk/client-s3 and @aws-sdk/s3-request-presigner 3.1038.0 → 3.1041.0. Daily releases targeting unrelated services; no S3 client or presigner changes.
@sentry/nextjs 10.50.0 → 10.51.0
internal "span-first APIs" migration for edge/server processors, span streaming filter, Turbopack detection attribute. No withSentryConfig or instrumentation hook changes.
@tanstack/react-query 5.100.6 → 5.100.9
devtools-only patches (Angular theme option, onClose type fix, NoInfer restore on persister TQueryKey). No runtime semantics changed.
@scalar/api-reference-react 0.9.29 → 0.9.32
bundled engine moves to @scalar/api-reference@1.55.1, deepObject query-param display fix, internal slugger refactor.
Fumadocs
fumadocs-core/fumadocs-ui 16.8.5 → 16.8.7. TOC hotfix in fumadocs-ui and Clerk-style RTL layout fix in 16.8.6.
@stripe/stripe-js 9.3.1 → 9.4.0
adds ReleaseTrain type and hashedValue overload to handleNextAction. (No source imports — server-side stripe@22.1.0 handles all flows; checkout uses Stripe-hosted redirect.)
react-hook-form 7.74.0 → 7.75.0
dirtyFields now prunes empty-false nodes, useWatch no longer re-renders on unrelated field validation, TypeScript 6.0 support added. We only read formState.errors and formState.isValid, so the dirtyFields shape change is inert.
lucide-react ^1.12.0 → ^1.14.0
additive icons only (waves-vertical, repeat-off). No renames or removals across our 90 import sites.
react-email 6.0.4 → 6.0.5 (CLI dev tool only — production rendering uses @react-email/render@2.0.8 pinned separately).
nanoid 5.1.9 → 5.1.11
fixes a regression where requesting a very large ID broke generation.
react-qr-code 2.0.18 → 2.0.21, ufo 1.6.3 → 1.6.4 (withoutBase() slash-collapse fix), use-stick-to-bottom 1.1.3 → 1.1.4.
biome ^2.4.13 → ^2.4.14
adds three new lint rules (useTestHooksOnTop, noReactStringRefs, useMathMinMax); none auto-fire on this codebase.
postcss 8.5.12 → 8.5.13
postcss-scss comment regression fix (we don't use SCSS, no impact).
AWS SDK @aws-sdk/client-s3 and @aws-sdk/s3-request-presigner 3.1033.0 → 3.1038.0 (additive client updates, retry-on-200 fix).
Stripe Node 22.0.2 → 22.1.0
pinned API version moves to 2026-04-22.dahlia. New enum values across BalanceTransaction.type, PaymentMethod.type, tax-id types, and invoice/subscription payment_method_types (now includes pix/upi). Adds parseEventNotificationAsync and async stack-trace preservation.
@stripe/stripe-js 9.2.0 → 9.3.1
adds contactDetails element and PMC update support.
@sentry/nextjs 10.49.0 → 10.50.0
additive integrations (Effect v4 beta, Hono Bun) and LCP/console fixes.
tRPC 11.16.0 → 11.17.0 across @trpc/{client,server,react-query}
fixes React 19 proxy coercion in createInnerProxy and adds subscription inference helpers.
@tanstack/react-query 5.99.2 → 5.100.6
adds retryOnMount callback, fixes hydration for resolved promises and infinite-query AbortSignal propagation.
better-auth 1.6.5 → 1.6.9
preserves response headers when APIError is thrown, fixes Partitioned cookie forwarding, organization team field inference, and OAuth profile email fallback.
Fumadocs fumadocs-core/fumadocs-ui 16.8.1 → 16.8.5, fumadocs-mdx 14.3.1 → 14.3.2
Shiki lazy-mode language fix, locale page-tree fix, schema type narrowing.
@scalar/api-reference-react 0.9.24 → 0.9.29
bundled @scalar/api-reference engine moves 1.49 → 1.53 (kepler theme position:fixed flare fix, several SSR/hydration fixes, lazy rendering, pre/post request scripts).
react-hook-form 7.73.1 → 7.74.0
adds setValues API; fixes nested-field unregister and valueAsNumber NaN coercion.
react-email 5.2.10 → 6.0.4 (CLI dev tool only — @react-email/components and @react-email/render imports continue to work).
@react-email/render 2.0.7 → 2.0.8
strips nul bytes from renderToPipeableStream output to prevent multi-byte char truncation in feed-digest emails.
lucide-react ^1.8.0 → ^1.12.0
new icons, ESM bundles switch to .mjs.
@marsidev/react-turnstile 1.5.0 → 1.5.1
fixes params being passed to turnstile.execute().
biome ^2.4.12 → ^2.4.13
noUselessEscapeInRegex v-flag fix and organizeImports declare-module sort fix.
postcss 8.5.10 → 8.5.12
fixes "reading any file via user-generated CSS" and nested-bracket parsing performance.
Lower-frequency worker schedules
job enrichment and company resolution now run hourly in low-cost MVP mode, and embedding generation runs every 6 hours instead of every 5 minutes.
Lower-frequency Vercel cron schedules
monitor checks now run twice daily, push/email stay hourly, and TTL expiry moves to a midnight UTC daily slot.
Worker documentation synced
README and agent docs now reflect the low-cost schedules for the active pipelines.
Vercel Cron route compatibility
internal cron routes now expose GET handlers in addition to POST so Vercel Cron can invoke them directly while keeping CRON_SECRET auth.
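As an illustration, the shared auth check behind both handlers could look like the sketch below. It assumes Vercel Cron sends the secret as an Authorization: Bearer header; the helper names and result shape are hypothetical, and in a Next.js route file both the GET and POST exports would delegate to the same function.

```typescript
// Hypothetical helpers; not the project's actual route code.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  if (!cronSecret) return false; // fail closed when the secret is not configured
  return authorizationHeader === `Bearer ${cronSecret}`;
}

interface CronResult {
  status: number;
  body: { ok: boolean; error?: string };
}

// One handler shared by GET (Vercel Cron) and POST (manual or legacy callers).
function handleCron(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
  job: () => void,
): CronResult {
  if (!isAuthorizedCronRequest(authorizationHeader, cronSecret)) {
    return { status: 401, body: { ok: false, error: "unauthorized" } };
  }
  job();
  return { status: 200, body: { ok: true } };
}
```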
Nested source company names
company resolution now safely extracts names from raw company objects such as { company: { name: "Atari" } }, including slug-conflict recovery.
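A rough sketch of defensive name extraction for payloads like the one above. The function name, the accepted shapes, and the depth limit are assumptions for illustration; slug-conflict recovery is not shown.

```typescript
// Hypothetical helper: accepts a plain string, { name }, or { company: ... }.
function extractCompanyName(raw: unknown, depth = 0): string | null {
  if (depth > 3) return null; // guard against deeply nested or cyclic input
  if (typeof raw === "string") {
    const trimmed = raw.trim();
    return trimmed.length > 0 ? trimmed : null;
  }
  if (typeof raw !== "object" || raw === null) return null;

  const record = raw as Record<string, unknown>;
  const name = record["name"];
  if (typeof name === "string" && name.trim().length > 0) return name.trim();
  if ("company" in record) return extractCompanyName(record["company"], depth + 1);
  return null;
}
```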
Greenhouse ATS domain detection
job-boards.greenhouse.io is now treated as an ATS host instead of a company domain in both app and worker normalizers.
Company enrichment test fixtures
stale enrichment tests now account for the current daily-limit query shape.
Zod-driven OpenAPI generator
scripts/generate-openapi.ts builds public/api/openapi.json from Zod schemas via zod-openapi@5.4.6. Wired into prebuild (Vercel regenerates on every deploy) and lint-staged (auto-regen + restage on schema/route edits).
Drift-detection tests (tests/openapi/)
13 Vitest unit tests cover: spec is valid OpenAPI 3.1, all 18 operationId values present, route↔spec parity (walks app/api/v1/, excluding cron), deterministic output, magic-string + limit validation regression coverage.
Documented query params
GET /search/suggest, /feeds/{id}/jobs, /feeds/{id}/export, /companies/export, /companies/{id}/jobs had inline private query parsing. Now centralised in schemas/api-v1-queries.ts and surfaced in the spec.
Documented filter fields
salary_unit, source_name, status, processing_status, tech_stack were already validated by feedFiltersSchema but absent from the published spec. The generated spec now reflects what the API actually accepts.
?since=last_poll regression caught pre-ship
earlier in this branch, switching since to z.string().datetime() would have broken the magic string used by /feeds/{id}/jobs and /feeds/{id}/export to fetch jobs since the last poll. Schema is now z.union([z.literal("last_poll"), z.string().datetime()]). Regression tests in tests/openapi/queries.test.ts.
/api/openapi.json 500 in dev mode
Next.js was complaining about a conflicting public file and route handler at the same path. Removed the redundant CORS-header route handler; Scalar fetches same-origin so CORS was never needed. The static file at public/api/openapi.json now serves directly with sane default cache headers.
/api/v1/templates envelope consistency
list response now includes request_id like every other v1 endpoint.
Spec version drift
info.version is now read from package.json, killing the manual sync step that produced the v2.4.0/v3.0.7 mismatch.
Feed response shape: snake_case + secret stripped
POST /feeds, GET /feeds, GET /feeds/{id}, PATCH /feeds/{id} previously returned raw Drizzle rows: camelCase fields (deliveryMode, deliveryFrequency, createdAt, updatedAt, lastDeliveryAt, lastPollAt) and the webhook signing secret. Now they return snake_case via the new formatFeedForApi() shaper, consistent with jobs/companies. Breaking for any consumer reading data.deliveryMode etc.; switch to data.delivery_mode. Public response fields are now: id, name, filters, delivery_mode, delivery_frequency, url, enabled, template_id, last_delivery_at, last_poll_at, created_at, updated_at. Fields explicitly stripped from responses: secret (security, write-only), organizationId (internal multi-tenant key), events, failureCount, emailSchedule, emailFormat, emailRecipient, lastEmailAt (internal state and undocumented features).
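An allowlist-style shaper along these lines would produce the public shape listed above. This is a sketch only: the row types and the ISO-string date serialization are assumptions, and the real formatFeedForApi() may differ.

```typescript
// Assumed internal row shape, limited to fields named in this entry.
interface FeedRow {
  id: string;
  name: string;
  filters: Record<string, unknown>;
  deliveryMode: string;
  deliveryFrequency: string;
  url: string | null;
  enabled: boolean;
  templateId: string | null;
  lastDeliveryAt: Date | null;
  lastPollAt: Date | null;
  createdAt: Date;
  updatedAt: Date;
  secret: string | null; // write-only, must never reach a response
  organizationId: string; // internal multi-tenant key
}

interface ApiFeed {
  id: string;
  name: string;
  filters: Record<string, unknown>;
  delivery_mode: string;
  delivery_frequency: string;
  url: string | null;
  enabled: boolean;
  template_id: string | null;
  last_delivery_at: string | null;
  last_poll_at: string | null;
  created_at: string;
  updated_at: string;
}

const toIso = (d: Date | null): string | null => (d ? d.toISOString() : null);

// Copies only named public fields, so new internal columns stay private by default.
function formatFeedForApi(row: FeedRow): ApiFeed {
  return {
    id: row.id,
    name: row.name,
    filters: row.filters,
    delivery_mode: row.deliveryMode,
    delivery_frequency: row.deliveryFrequency,
    url: row.url,
    enabled: row.enabled,
    template_id: row.templateId,
    last_delivery_at: toIso(row.lastDeliveryAt),
    last_poll_at: toIso(row.lastPollAt),
    created_at: row.createdAt.toISOString(),
    updated_at: row.updatedAt.toISOString(),
  };
}
```

Building the response from an explicit field list, rather than deleting known-sensitive keys, means a forgotten field fails closed.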
GET /feeds/{id} stats.last_delivery_at removed
same value is now at the top-level data.last_delivery_at. The stats object only carries deliveries_sent.
?limit= validation
was permissive (clamped out-of-range to bounds, fell back to default on NaN, truncated decimals). Now strict: out-of-range, non-integer, or empty limit returns 422 with a clear error instead of silently coercing.
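To make the strict behaviour concrete, here is a minimal sketch of such a validator. The bounds (1 to 100), the default of 20, and the choice to apply the default only when the parameter is absent are assumptions; the entry does not state them.

```typescript
type LimitResult =
  | { ok: true; value: number }
  | { ok: false; status: 422; message: string };

// Assumed bounds and default, for illustration only.
const MIN_LIMIT = 1;
const MAX_LIMIT = 100;
const DEFAULT_LIMIT = 20;

function parseLimit(raw: string | null): LimitResult {
  if (raw === null) return { ok: true, value: DEFAULT_LIMIT }; // parameter absent
  // Digits only: rejects "", "1.5", "abc", and "-1" instead of coercing them.
  if (!/^\d+$/.test(raw)) {
    return { ok: false, status: 422, message: "limit must be an integer" };
  }
  const value = Number(raw);
  if (value < MIN_LIMIT || value > MAX_LIMIT) {
    return {
      ok: false,
      status: 422,
      message: `limit must be between ${MIN_LIMIT} and ${MAX_LIMIT}`,
    };
  }
  return { ok: true, value };
}
```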
?since= validation
was permissive (any string new Date() could parse). Now strict ISO 8601 datetime, with the literal "last_poll" accepted as a magic value. Date-only strings like 2024-01-01 return 422.
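The union of a magic literal and a strict datetime can be sketched without Zod as follows. The regex assumes a full datetime with a UTC Z suffix and optional fractional seconds, which is how z.string().datetime() behaves by default as far as I know; the type and function names are hypothetical.

```typescript
type Since =
  | { kind: "last_poll" }
  | { kind: "datetime"; value: Date }
  | { kind: "invalid" };

// Assumption: UTC "Z" suffix required, no numeric offsets, no date-only strings.
const ISO_DATETIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

function parseSince(raw: string): Since {
  if (raw === "last_poll") return { kind: "last_poll" }; // magic value checked first
  if (!ISO_DATETIME.test(raw)) return { kind: "invalid" };
  const value = new Date(raw);
  // The regex checks shape only; Date catches impossible values such as month 13.
  if (Number.isNaN(value.getTime())) return { kind: "invalid" };
  return { kind: "datetime", value };
}
```

A caller would map the invalid case to a 422 response and resolve last_poll to the feed's stored poll timestamp.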
Error envelope schema
published spec now correctly nests errors and retry_after inside error.{} (matches lib/api/v1/errors.ts runtime shape).
Status code for invalid format on /feeds/{id}/export
was 400 bad_request, now 422 validation_error. Consistent with every other v1 validation error.
Spec serving
public/api/openapi.json is now a generated artifact, not hand-edited. Editing it directly fails CI via the gen:openapi:check lint hook. Regenerate with bun run gen:openapi.
Dependency updates (#171, #172)
Biome lint warnings (#171)
low-risk refactors (unused vars/params, optional chaining, safe type casts) to bring the repo back to a clean lint state.
Repository doc drift (#173)
command docs now distinguish headless Playwright runs from the Playwright UI, cron route docs match the actual /api/v1/... POST endpoints, the embedding worker schedule notes the real 5-minute cron + daily token budget, and the granular email scheduling design doc is linked from the main repo entry points.
22-package dependency patch sweep (#168)
better-auth 1.6.1→1.6.3, @tanstack/react-query ^5.96.2→^5.99.0, @sentry/nextjs ^10.47.0→^10.48.0, ai ^6.0.154→^6.0.161, @ai-sdk/openai ^3.0.52→^3.0.53, @ai-sdk/react ^3.0.156→^3.0.163, @stripe/stripe-js 9.1.0→9.2.0, resend 6.10.0→6.11.0, @react-email/components 1.0.11→1.0.12, @react-email/render 2.0.5→2.0.6, @scalar/api-reference-react ^0.9.20→^0.9.22, @aws-sdk/client-s3 + @aws-sdk/s3-request-presigner ^3.1027.0→^3.1030.0, fumadocs-core / fumadocs-ui 16.7.11→16.7.16, fumadocs-mdx 14.2.11→14.2.14, @fumadocs/content-collections 1.2.7→1.2.9, @biomejs/biome ^2.4.10→^2.4.12, @types/node 25.5.2→25.6.0, autoprefixer 10.4.27→10.5.0, dotenv 17.4.1→17.4.2. Every bump audited against our call sites before merging (see PR #168 body). 554 unit tests still pass.
16-package pre-release dep bump (unreleased, commit d15b038)
next 16.2.2→16.2.3 (security fix), react / react-dom 19.2.4→19.2.5, better-auth 1.6.0→1.6.1, stripe 22.0.0→22.0.1, ai ^6.0.149→^6.0.154, @ai-sdk/openai ^3.0.51→^3.0.52, @ai-sdk/react ^3.0.151→^3.0.156, @aws-sdk/client-s3 + presigner ^3.1025.0→^3.1027.0, fumadocs-core / fumadocs-ui 16.7.10→16.7.11, @next/bundle-analyzer 16.2.2→16.2.3, @testcontainers/postgresql + testcontainers 11.13.0→11.14.0, postcss 8.5.8→8.5.9.
Lucide + Vitest (unreleased, commit a82a434)
lucide-react 1.7.0→^1.8.0, vitest 4.1.3→^4.1.4.
packageManager
bun@1.3.11 → bun@1.3.12.
Documentation drift
README.md now reflects 34 DB tables (was 33), stale LICENSE reference removed (file deleted in 36d4003). CLAUDE.md corrects API-key attribution — we use a custom resolver in lib/feeds/api-keys.ts, NOT the Better Auth apiKey plugin (which is not registered in lib/auth/index.ts) (#169). CLAUDE.md, ROADMAP.md, and the historical v2.7.5 CHANGELOG.md entry corrected from "19 retryable patterns" to "18" (matches code: 18 regex patterns in lib/pipeline/enrichment-retry.ts). ROADMAP.md test count updated to 554 across 40 files (was 508 across 36).
CI reproducibility (#168)
.github/workflows/deploy-enrichment-worker.yml now pins bun-version: 1.3.12 on oven-sh/setup-bun to match packageManager, preventing drift between local and CI Bun versions.
cropperjs 1.6.2
v2 is a Web Components rewrite. react-cropper@2.3.3 is unmaintained (last commit Sep 2023) and still pins cropperjs ^1.5.13. The CSS path, getCroppedCanvas(), and ref pattern we use are gone in v2. Upstream issue requesting v2 support has no activity. Tracked in TODOS as P2.
drizzle-orm 0.45.2 / drizzle-kit 0.31.10
these ARE the latest stable on the 0.x line. 1.0.0-beta introduces a migration-system overhaul; plan a dedicated PR when 1.0 GAs.
stripe 22.0.1
22.1.0-beta is a public preview of an unreleased Stripe API version. We just migrated to v22 in v3.0.2.
ai / @ai-sdk/* majors
v7 beta / v4 beta pre-stable, ESM-only. Wait for GA.
react-email (CLI) 5.2.10
v6 is a major restructuring (consolidates per-component packages) and needs a dedicated migration PR.
next-themes 0.4.6
Bun patch retained until upstream pacocoursey/next-themes#386 merges.
554 tests across 40 files
34 user tables (lib/db/schema/tables.ts + __drizzle_migrations)
15 active scrapers (+1 archived: JobDataAPI)
18 OpenAPI operations across 15 paths
30 feed filter fields (Zod feedFiltersSchema)
18 retryable enrichment error patterns
5-level job monitoring pipeline
L0 (time expiry), L1 (HEAD status), L1.5 (bot/CAPTCHA detection), L2 (keyword scan), L3 (AI classifier). Easy cases caught for free, AI only for ambiguous pages (~10% of checks)
L1.5 bot detector
detects Cloudflare challenges, reCAPTCHA/hCaptcha, 403 forbidden, JS-only shells, empty pages. Flags blocked jobs for future L4 (Apify rendering)
L2 keyword scanner
30+ regex patterns for closed/filled/expired job pages plus active confirmation patterns. Catches ~55% of status changes at zero cost
Fixed L3 AI classifier
replaced broken dynamic import("zod") with static Zod schema, limited input to 2K chars (was 50K), added structured output with reason field
Adaptive backoff
6h for fresh jobs (0-3 days), 12h for settled (3-14 days), 24h for stale (14-30 days), 72h for very old (30+ days). Max 8 attempts
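The schedule above maps naturally onto a small lookup. In this sketch the function names are hypothetical and the tier boundaries are treated as lower-bound inclusive (a job exactly 3 days old counts as settled), which the entry leaves ambiguous.

```typescript
const MAX_MONITOR_ATTEMPTS = 8;
const HOUR_MS = 60 * 60 * 1000;

// Hours to wait before the next check, based on job age in days.
function monitorIntervalHours(jobAgeDays: number): number {
  if (jobAgeDays < 3) return 6; // fresh
  if (jobAgeDays < 14) return 12; // settled
  if (jobAgeDays < 30) return 24; // stale
  return 72; // very old
}

// Returns null once the attempt budget is spent.
function nextCheckAt(now: Date, jobAgeDays: number, attempts: number): Date | null {
  if (attempts >= MAX_MONITOR_ATTEMPTS) return null;
  return new Date(now.getTime() + monitorIntervalHours(jobAgeDays) * HOUR_MS);
}
```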
Structured monitoring detail
monitorLevel and monitorDetail columns on jobTable track which level caught each job and what keyword matched
monitor_log table
schema created for per-check audit trail (writes not yet wired in the pipeline)
Monitor API richer stats
blocked count and levelBreakdown in response
Monitor pipeline rewrite
monitor-jobs.ts now runs L0→L1→L1.5→L2→L3 sequentially, short-circuiting at the first conclusive level. ~90% of checks resolved without AI
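The short-circuit can be sketched as a loop over ordered levels that stops at the first conclusive verdict. Types and names here are illustrative assumptions, and the checks are synchronous for brevity; the real levels do network and AI calls and would be async.

```typescript
type Verdict = "active" | "closed" | "blocked" | "inconclusive";

interface MonitorLevel {
  name: "L0" | "L1" | "L1.5" | "L2" | "L3";
  check: (jobUrl: string) => Verdict;
}

interface MonitorOutcome {
  verdict: Verdict;
  monitorLevel: string | null; // which level produced the verdict
}

function runMonitorLevels(jobUrl: string, levels: MonitorLevel[]): MonitorOutcome {
  for (const level of levels) {
    const verdict = level.check(jobUrl);
    // First conclusive answer wins; later (more expensive) levels never run.
    if (verdict !== "inconclusive") return { verdict, monitorLevel: level.name };
  }
  return { verdict: "inconclusive", monitorLevel: null };
}
```

Ordering the levels from cheapest to most expensive is what keeps the AI classifier reserved for the ambiguous remainder.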
Monitor route
logs and returns blocked + levelBreakdown for observability
Broken Zod import in AI classifier
import("zod") dynamic import never worked; replaced with static z.object() schema
L3 AI classifier
now strips HTML tags before sending page content to the AI (previously it sent raw HTML, which was mostly <head> metadata)
L2 keyword patterns
bounded greedy .* to .{0,80} to prevent false positives matching across long pages
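A small illustration of why the bound matters. Both patterns below are hypothetical stand-ins, not entries from the real L2 list.

```typescript
// Unbounded: "position" and "filled" may be arbitrarily far apart on the page.
const UNBOUNDED = /position.*filled/i;
// Bounded: at most 80 characters may separate the two words.
const BOUNDED = /position.{0,80}filled/i;

function looksClosed(pageText: string, pattern: RegExp): boolean {
  return pattern.test(pageText);
}

// A long page where the two words are unrelated and far apart.
const longPage =
  "This position is open." +
  " lorem".repeat(50) +
  " Please make sure all form fields are filled.";

// A genuine closure notice.
const closedPage = "This position has been filled.";
```

With these inputs the unbounded pattern matches both pages, while the bounded one matches only the genuine closure notice.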
L1.5 bot detector
cached regex matches to avoid double scans; gated empty-page check behind content-type to avoid false positives on JSON/binary responses
Monitor pipeline
handle 405/501 HEAD rejection by proceeding to GET; wrap URL parsing in try/catch; map private_host/invalid_url to monitorStatus: "no_url"; use inArray instead of raw SQL
Apify schedule two-way sync
toggling a source on/off in the admin UI now enables/disables the corresponding Apify schedule via their API. Previously the toggle only updated the database, leaving actors running on Apify's cron
Webhook rejection for inactive sources
webhooks from paused or archived sources are now rejected at the handler level (HTTP 200 with source_inactive status). No more silent data ingestion from disabled sources
Cache invalidation for non-active sources
the webhook handler's source cache no longer stores paused/archived sources, so pause takes effect immediately instead of after 60s TTL
Source resolution
resolveSource() now returns both name and status (was name-only). Queries all sources regardless of status (was filtered to active-only)
Admin jobs: 4 new filter dropdowns
Salary Unit, Source, Status, Processing. Source/Status/Processing are admin-only. Full DB/API/UI filter parity (30 feed filter fields)
Clickable stat cards
Total/Active/Closed/Draft cards on admin jobs page now filter the table by status. Active state with highlighted border
Job detail: 8 new fields
status badge, salary unit, first scraped date, last updated date, source type, external job ID, processing status, processed date, tags (jsonb). Full DB/UI parity (46 fields in detail select)
Admin jobs default view
now shows only enriched jobs (processing_status=completed) instead of all 52K jobs. Hides "Untitled" draft/failed jobs from default browse. Explicit filter dropdowns or stat card clicks override the default
Feed filter schema
status and processing_status use Zod enums matching Postgres enums (type-safe at API boundary)
Fuzzy search fallback
now respects the same status/enrichment filter as the main query (was previously unfiltered)
Hydration error
CompanyLogo fallback changed from <div> to <span> to fix React hydration warning when rendered inside SheetDescription's <p> tag
Stripe SDK v22
upgraded from v21.0.1. Migrated type annotations in checkout and subscription billing code (SessionCreateParams → Parameters utility type, UpcomingInvoice → Invoice, StripeError → InstanceType). Pinned to exact 22.0.0 to prevent silent API version drift
16 dependency updates
AI SDK (ai 6.0.145, @ai-sdk/openai 3.0.50, @ai-sdk/react 3.0.147), AWS SDK 3.1023.0, Scalar API docs 0.9.19 (fixes production crash), react-hook-form 7.72.1 (fixes dirty state bugs), fumadocs 16.7.10, TanStack React Query 5.96.2, slugify 1.6.9, Playwright 1.59.1, TypeScript types, dotenv 17.4.0
company_ats table (#154)
DB-driven ATS platform connections. Links companies to ATS slugs (1:many, nullable companyId). Unique constraint on (platform, slug). Status tracking (active/empty/dead), discoveredVia provenance, job count + last scraped timestamps
Internal scraper API (#154)
3 CRON_SECRET-protected endpoints replacing per-scraper Apify KV stores: GET /api/internal/scraper/slugs (read by platform/status), POST /api/internal/scraper/slugs/report (upsert scrape results), POST /api/internal/scraper/slugs/discover (submit new slugs)
Admin ATS Connections card (#154)
inline add/delete UI on company detail page with platform-specific board URL generation (8 ATS platforms), status badges, external links, job count display
ATS connections in public API (#154)
ats_connections array in GET /v1/companies/:id response
Greenhouse slug seed script (#154)
scripts/seed-greenhouse-slugs.ts imports 7,101 pre-discovered slugs from Apify KV store into company_ats
Greenhouse scraper (#154)
migrated from Apify KV store to DB-driven slug management via internal API. Reads slugs from API, reports results back, discovers new slugs via Common Crawl. Seed slugs merged into current run's poll list for immediate scraping
Greenhouse actor input schema (#156)
added api_base and cron_secret fields to Apify input schema. cron_secret is required and marked as secret. Apify schedule updated with prod credentials
Admin company router (#154)
createAtsConnection uses upsert (onConflictDoUpdate on platform+slug) to handle linking existing seed slugs to companies
Admin sources page (#157)
replaced static status Badge with interactive Switch toggle (optimistic UI). Removes redundant pause/play action button. Both desktop and mobile layouts updated
DB schema
33 user tables (up from 32). New company_ats table with indexes on (platform, status) and companyId
v3 pipeline architecture
scrapers do transport only, AI extracts all fields. Design doc: docs/designs/v3-architecture.md
externalId-based dedup
{source_name}_{source_job_id} replaces content hash + fuzzy title/company matching. Exists → UPDATE (reset processingStatus if raw data changed), new → INSERT
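The upsert rule can be sketched against an in-memory map standing in for the job table. The types, the helper names, and the use of JSON.stringify to detect raw-data changes (which is sensitive to key order) are assumptions for illustration.

```typescript
interface IncomingItem {
  source_name: string;
  source_job_id: string;
  url: string;
  raw: unknown;
}

interface StoredJob {
  externalId: string;
  url: string;
  rawJson: string;
  processingStatus: "pending" | "completed";
}

function buildExternalId(sourceName: string, sourceJobId: string): string {
  return `${sourceName}_${sourceJobId}`;
}

function upsertJob(store: Map<string, StoredJob>, item: IncomingItem): "inserted" | "updated" {
  const externalId = buildExternalId(item.source_name, item.source_job_id);
  const rawJson = JSON.stringify(item.raw);
  const existing = store.get(externalId);

  if (!existing) {
    store.set(externalId, { externalId, url: item.url, rawJson, processingStatus: "pending" });
    return "inserted";
  }
  existing.url = item.url;
  if (existing.rawJson !== rawJson) {
    existing.rawJson = rawJson;
    existing.processingStatus = "pending"; // raw data changed, so re-enrich
  }
  return "updated";
}
```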
Source-aware enrichment prompts
buildExtractionPrompt(rawJson, sourceName) tells GPT-4o-mini what ATS format it's looking at. Raw JSON >50K chars gets truncated with title+description fallback
Set reconciliation closure
lib/pipeline/closure.ts diffs source_job_id sets between scrapes. 2-miss grace period via consecutiveMisses column. Gated on webhook_log.status = 'completed'
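A minimal sketch of the reconciliation step, assuming a job closes on its second consecutive miss and that the counter resets whenever the job reappears. The types and function name are hypothetical.

```typescript
interface TrackedJob {
  sourceJobId: string;
  consecutiveMisses: number;
  status: "active" | "closed";
}

const MISS_GRACE = 2; // assumed: close on the second consecutive miss

function reconcileClosures(
  jobs: TrackedJob[],
  seenIds: Set<string>,
  scrapeCompleted: boolean,
): void {
  // A partial or failed scrape must not count as evidence that jobs disappeared.
  if (!scrapeCompleted) return;

  for (const job of jobs) {
    if (job.status !== "active") continue;
    if (seenIds.has(job.sourceJobId)) {
      job.consecutiveMisses = 0; // seen again, reset the grace counter
      continue;
    }
    job.consecutiveMisses += 1;
    if (job.consecutiveMisses >= MISS_GRACE) job.status = "closed";
  }
}
```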
90-day TTL expiry cron
app/api/v1/cron/expire-jobs/ marks jobs with no scrape activity >90 days as expired
Job status module
lib/pipeline/job-status.ts extracted from deleted lib/enrichment/
Scraper v3 contract tests
106 static analysis tests validating all 15 scrapers output {source_job_id, source_name, url, raw}
New tests
ingest dedup (21), webhook validation (20), enrichment prompts (13), enrichment pipeline (15), process-jobs (37), closure (12), scraper contract (106)
All 15 scrapers migrated to v3 format
output {source_job_id, source_name, url, raw} only. ~3,000 lines of convertToRawJob() field mapping deleted
Enrichment worker
v3 path reads rawSourcedJobData + sourceName from claimed jobs. Legacy fallback for pre-v3 jobs using description/title string
Webhook handler
validates source_job_id presence, rejects items without it, tracks rejected count in webhook log. Runs closure reconciliation on completed scrape runs
Jobvite re-enabled
now v3 format with all other scrapers (was disabled at 24 jobs)
Test count
508 tests across 36 files (up from 379 across 34)
DB schema
consecutiveMisses integer column on job table, closed + expired added to JobStatus enum
Safe dependency refresh
updated Next.js to 16.2.2, @next/bundle-analyzer to 16.2.2, Playwright to 1.59.0, Fumadocs to 16.7.9, TanStack React Query to 5.96.1, and the React Email patch trio (@react-email/components, @react-email/render, @react-email/tailwind)
Documentation sync
aligned README, quickstart, AI/chat docs, source inventory, and enrichment docs with the current v2.7.5 codebase and routing structure
lib/enrichment/
entire directory deleted (extract.ts, prompt.ts, schema.ts, index.ts). CF Worker is sole enrichment path
/api/internal/ingest
deleted, webhook handler is sole ingest entry point
/api/internal/process
deleted, CF Worker is sole enrichment path (no Vercel fallback)
lib/pipeline/process-jobs.ts
Vercel-side enrichment fallback deleted
Content hash dedup
replaced by externalId-based upsert
Fuzzy title+company matching
replaced by externalId-based upsert
Per-scraper field mapping
~3,000 lines of convertToRawJob(), convertListingToRawJob(), convertDetailToRawJob() functions across all 15 scrapers
Auth E2E smoke spec
updated Playwright auth checks to match current auth metadata (Create an account) and the actual card-title DOM structure used by the auth UI
Oracle HCM Cloud job source (#149)
reverse-engineered Oracle Recruiting Cloud REST API (/hcmRestApi/resources/latest/recruitingCEJobRequisitions). No auth required. Multi-tenant ATS architecture with KV state persistence across 11 FA pod variants. Dual discovery: Common Crawl (20 indexes × 11 pods) + Wayback Machine CDX fallback. DNS wildcard pollution filtering for Wayback results (4-char slug validation). 535 active companies discovered, 32K+ jobs in pool. Input-sanitized siteNumber via VALID_SITE_RE. Actor 0oejH2uT5fnHgeUtc, 12h schedule (6am/6pm UTC), webhook 1Xwc4iQIAO3yDTuCd
Enrichment retry system (#150)
new lib/pipeline/enrichment-retry.ts with classifyEnrichmentFailure() (18 retryable error patterns: 500, 429, quota, timeout, network errors), buildFailureMetadata() (exponential backoff: 15min → 1h → 6h → 24h, max 7 retries), getRetryCountFromMetadata(). Both lib/pipeline/process-jobs.ts and workers/enrichment/src/process-jobs.ts now requeue retryable failed jobs before each batch, prioritize retry jobs in claim queries
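The backoff ladder could be expressed as below. This is a sketch under stated assumptions: retries past the fourth stay at 24h, and the five patterns shown are illustrative stand-ins for the 18 in the real module. The helper names are hypothetical.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// 15min -> 1h -> 6h -> 24h; later retries are assumed to stay at 24h.
const RETRY_DELAYS_MS = [15 * MINUTE_MS, 1 * HOUR_MS, 6 * HOUR_MS, 24 * HOUR_MS];
const MAX_RETRIES = 7;

// Illustrative subset covering the categories named above.
const RETRYABLE_PATTERNS: RegExp[] = [
  /\b500\b/,
  /\b429\b/,
  /quota/i,
  /timeout|timed out/i,
  /network|ECONNRESET/i,
];

function isRetryableError(message: string): boolean {
  return RETRYABLE_PATTERNS.some((pattern) => pattern.test(message));
}

// retryCount is the number of retries already made; null means give up.
function nextRetryAt(now: Date, retryCount: number): Date | null {
  if (retryCount >= MAX_RETRIES) return null;
  const index = Math.min(retryCount, RETRY_DELAYS_MS.length - 1);
  const delay = RETRY_DELAYS_MS[index] ?? 24 * HOUR_MS;
  return new Date(now.getTime() + delay);
}
```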
Writing category expansion (#150)
lib/feeds/filter-builder.ts now expands writing/content category filters across 4 fields (title, occupational_category, department, skills) using word-boundary regex matching. New constants: WRITING_CATEGORY_TOKENS, WRITING_TITLE_SIGNALS (13 terms), WRITING_OCCUPATIONAL_CATEGORY_SIGNALS (16 terms), WRITING_DEPARTMENT_SIGNALS, WRITING_SKILL_SIGNALS
Writing-aware enrichment prompts (#150)
both lib/enrichment/prompt.ts and workers/enrichment/src/prompt.ts updated to prefer specific writing-family categories (Technical Writing, Copywriting, Editorial, Communications, SEO, Medical Writing, Journalism) over generic "Marketing" or null
Writing re-enrichment script (#150)
scripts/requeue-writing-reenrichment.ts targets active writing jobs with blank/generic occupational_category and requeues them for enrichment with updated prompts. Supports --dry-run and --limit flags
New tests (#150)
tests/feeds/category-matching.test.ts (5 tests for writing category expansion), tests/pipeline/enrichment-retry.test.ts (2 tests for failure classification + retry metadata), updated tests/pipeline/process-jobs.test.ts (retry requeue + priority ordering), updated workers/enrichment/src/process-jobs.test.ts (152 additions for worker retry logic)
Multi-Tenant ATS Pattern in skill docs
148-line section added to .claude/skills/new-job-source/SKILL.md covering architecture, KV state, 3-source discovery (CC + Wayback + robots.txt), time budgets, unknown classification caps, scrape modes, 15K ingest cap
Scraper count
14 active sources (up from 13), 15 total actors in apify-scrapers/
Active sources in DB
oracle added with actor ID 0oejH2uT5fnHgeUtc
Test count
379 tests across 34 files (up from 370 across 32)
Enrichment error handling
both pipelines now use buildFailureMetadata() + classifyEnrichmentFailure() instead of hardcoded { lastError, failedAt } objects. Failed jobs get retryable, retryCount, nextRetryAt, lastErrorType metadata
Job claim priority
retry-due jobs are prioritized ahead of content-role priority and chronological ordering in both lib and worker pipelines
Biome lint cleanup
removed 3 biome-ignore comments from filter-builder.ts
JazzHR job source
HTML scraping of {slug}.applytojob.com career pages + JSON-LD JobPosting extraction from detail pages. 2,696 seed company slugs discovered via crt.sh, Brave, Google, Wayback Machine, and Common Crawl. Self-healing pattern: classifies boards as active/empty/dead, rotates dead sample daily. Seed-only fast path for quick test runs (3s vs 10+ min for full store). Actor L8IXtCnjtPNUzKy7w, 12h schedule, webhook wired to ingest pipeline. JazzHR (Employ Inc) powers SMB hiring — complements Lever (mid-market) and Jobvite (enterprise, disabled)
Scraper count
13 active sources (up from 12), 14 total actors in apify-scrapers/. 109K+ active jobs
Active sources in DB
jazzhr added with actor ID L8IXtCnjtPNUzKy7w
Concentrix job source (#133)
reverse-engineered WordPress REST API (Talkpush JDQ plugin at jdq/v1/search). ~1,933 jobs, no auth, paginated. Apply URLs → Workday. Type normalization (full_time → "Full Time", permanent → "Full Time"). Actor MAfceoYD9uN48sR0P, 12h schedule
Teleperformance job source (#134)
reverse-engineered Umbraco CMS REST API (/Umbraco/Api/Careers/GetCareersBase). ~2,192 jobs, no auth, paginated. Apply URLs → iCIMS via short.sg. Actor xnUX4LT7uyMJgyhgi, 12h schedule
Scraper count
12 active sources (up from 10), 13 total actors in apify-scrapers/
Feed filter security hardening
escapeRegex() prevents regex injection in category/title filters, escapeIlike() prevents wildcard injection in company/location ILIKE filters. Zod schema enforces ^[a-zA-Z0-9 /&,._-]+$ regex allowlist on category values. Client-side validation on filter editor with inline error messages
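For reference, escaping helpers of this kind are typically short. The bodies below are a sketch that reuses the names from this entry, not the project's actual implementation; the ILIKE helper assumes backslash is the escape character, which is the PostgreSQL default.

```typescript
// Escape every regex metacharacter so user input matches literally.
function escapeRegex(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape LIKE/ILIKE wildcards (% and _) and the escape character itself.
function escapeIlike(input: string): string {
  return input.replace(/[\\%_]/g, "\\$&");
}

// Allowlist from this entry, applied to category values before they reach a query.
const CATEGORY_ALLOWLIST = new RegExp("^[a-zA-Z0-9 /&,._-]+$");

function isValidCategory(value: string): boolean {
  return CATEGORY_ALLOWLIST.test(value);
}
```

Escaping and allowlisting are complementary: the allowlist rejects suspicious values early, and escaping protects any path that still interpolates user input into a pattern.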
Feed filter UX improvements
lazy export (tab renders on first visit), notification form field persistence, deduplication of filter arrays, filter preview count before save, create-feed page validates filters with same Zod schema
Stored XSS prevention
company_size and company_funding_stage feed filters validated at save time (not just query time)
Scraper limits removed
all 10 active scrapers now default to Infinity (no artificial cap). Input schemas have no maximum constraint on limit field. Apify schedule inputs no longer pass a limit. The only real constraints are: date freshness (max_age_days/max_age_hours), content-hash dedup, and 15K webhook ingestion cap. Jobvite schedule disabled (dead source)
DynamiteJobs max_age_days
schedule input corrected from 1 to 7 days
FTS category matching
feed category filter changed from broken content_search @@ ... full-text approach to occupational_category ~* ... regex matching (10x more accurate for writing/content jobs)
Email digest cron
category filter in cron route updated to match tRPC regex approach (was still using old FTS path)
Invalid regex crash
invalid regex patterns in feed filters now caught with try/catch instead of crashing the query
CC+PDL+browser enrichment pipeline
replaces Apollo as primary company enrichment source. 4-layer approach: PDL free dataset (35M companies, industry/size/location/LinkedIn), tech detection from Common Crawl WARC records (4,036 Wappalyzer + CSP patterns), AI extraction via GPT-4o-mini (description/keywords/socials), agent-browser fallback for SPAs
Company discovery from Common Crawl
1,282 companies extracted from Ashby ATS pages in CC March 2026 crawl, 1,186 seeded to DB with 16 domain backfills
Tech stack detector (scripts/cc-extract/lib/tech-detector.ts)
7-layer detection: Wappalyzer rules (2,809 technologies) + CSP header parsing + cookies + preconnect/DNS-prefetch + inline script globals + external host detection + data attributes. Relevance filter drops noise (HTTP/3, HSTS, jQuery, embedded video)
10 enrichment scripts in scripts/cc-extract/
discover, seed, enrich-from-pdl, enrich-from-cc, backfill-pdl-state, fix-pdl-casing, tech detection tests, AI enrichment tests
Wappalyzer detection rules
3,931 technology definitions in data/wappalyzer/ (MIT license, Lissy93 fork)
Dependencies
updated 4 packages: AWS SDK 3.1020, @stripe/stripe-js 9.0.1, resend 6.10
CF Worker Phase 2 disabled
Apollo enrichment removed from cron pipeline. Worker still handles Phase 1 (resolve), Phase 3 (dedup), Phase 4 (stats). Enrichment now runs locally via scripts/cc-extract/
Admin enrichment dashboard
Apollo budget card replaced with CC+PDL provider info card. Removed unused Progress import
Admin enrichment router
updated worker health type to match new stats endpoint (removed dailyApiCalls/dailyLimit)
Shell injection in fetchLive
replaced execSync with execFileSync + domain regex validation (CC data is untrusted)
--limit arg parsing
was returning NaN when flag omitted, silently processing 0 companies
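A sketch of a parser that avoids this failure mode by distinguishing an omitted flag from a malformed one. The function name and the choice to throw on bad input are assumptions.

```typescript
// Returns undefined when --limit is omitted, meaning "no cap".
function parseLimitArg(argv: string[]): number | undefined {
  const index = argv.indexOf("--limit");
  if (index === -1) return undefined;

  // Number(undefined) is NaN, so a flag with no value is rejected here too.
  const value = Number(argv[index + 1]);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error("--limit expects a positive integer");
  }
  return value;
}
```

Failing loudly on a bad value is safer than letting NaN flow into a loop bound, where every comparison against it is false and nothing gets processed.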
3,692 companies enriched via PDL (industry, size, founded, city, state, country, LinkedIn)
3,475 states backfilled, 5,943 values title-cased
209 companies enriched via CC+AI (tech stack, description, keywords, socials)
3,545 stale Apollo "failed" statuses corrected to "enriched"
Total cost: ~$0.05 in OpenAI credits
Dependencies
updated 8 packages: AWS SDK 3.1019, tRPC 11.16, Turnstile 1.5, Scalar 0.9.17, TypeScript 6.0.2
TypeScript 6.0 migration
removed deprecated baseUrl, added explicit types field, simplified lib (dom now includes dom.iterable)
Company logo component (components/companies/company-logo.tsx)
reusable CompanyLogo with Logo.dev CDN (img.logo.dev), retina support (2x), graceful fallback to building icon. Replaces inline logos across 5 views
Admin company CRUD
create/edit/delete companies via CompanyFormDialog (identity, location, financials, social, tech stack). Soft-delete via mergedIntoId pattern, duplicate detection on DB constraint
Company detail enhancements
enrichment snapshot diffs (before/after with field completeness), funding events table, identity/financials/meta admin cards, Facebook social link, multi-industry badges
ATS landscape research (ATS.md)
survey of 318 ATS platforms by job volume with P0/P1 gap analysis
Common Crawl extraction plan (CC-EXTRACT.md)
Web Data Commons extraction strategy (4.3M jobs across 63K domains)
Company discovery scripts (scripts/)
SourceStack extraction, domain backfill, discovered company seeding utilities
Workday parallel discovery
24 Common Crawl indexes queried in parallel (batches of 10) + robots.txt site enumeration (1,214 tenants probed in 37s, found 2,043 hidden sites). Known companies grew from ~1.7K to 4,337, with 1,122 unique active boards and 254K total jobs
Workday time budget
graceful shutdown before actor timeout with periodic state saves, unknown company cap (200/run), priority-ordered poll list
SmartRecruiters concurrent fetching
batches of 5 parallel detail fetches with AbortSignal.timeout(15s), pagination guard (MAX_PAGES=50), named KV store for company persistence
PW concurrent fetching
3 parallel detail fetches with 500ms batch delay, AbortSignal.timeout, error-safe discovery via Promise.allSettled
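The batching approach in the SmartRecruiters and PW entries above can be sketched roughly as follows. Function names, the default timeout, and the result shape are assumptions; only the batch size, batch delay, AbortSignal.timeout, and Promise.allSettled come from the entries.

```typescript
// Split a list into fixed-size batches.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch detail pages in parallel batches. A failed or timed-out request is
// recorded in `failed` instead of rejecting the whole run.
async function fetchDetails(
  urls: string[],
  batchSize = 3,
  batchDelayMs = 500,
  timeoutMs = 15_000, // assumed default
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  const batches = toBatches(urls, batchSize);
  for (let b = 0; b < batches.length; b++) {
    const batch = batches[b];
    const settled = await Promise.allSettled(
      batch.map(async (url) => {
        const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
        if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
        return res.text();
      }),
    );
    settled.forEach((result, i) => {
      if (result.status === "fulfilled") ok.push(result.value);
      else failed.push(batch[i]);
    });
    // Pause between batches, but not after the last one.
    if (b < batches.length - 1) await sleep(batchDelayMs);
  }
  return { ok, failed };
}
```

Promise.allSettled never rejects, so an AbortError from one request shows up as a rejected result rather than an unhandled exception.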
Scraper KV persistence
all 5 stateful scrapers (SmartRecruiters, Workday, PW, Greenhouse, Lever) now use named Apify KV stores. Previously used per-run default stores, losing all classification state between runs
SmartRecruiters timeout
was timing out at 30min with sequential fetching, now completes in ~53s
Workday timeout
was rediscovering from scratch every run (broken KV), timing out at 5min. Now persists 4,337 companies and completes scrape runs in ~90s
PW timeout
was timing out at 10min with sequential 2s-delay fetching, now completes in ~201s
PW crash
unhandled AbortError from listing page fetch crashed the entire actor
Workday retry with backoff
transient errors (timeout, 429, 502-504) get 1 retry with exponential backoff. 5 consecutive failures auto-downgrade company to dead
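A sketch of the retry policy described above. The base delay, the cap, and the CompanyHealth shape are hypothetical; the transient status codes and the five-failure threshold come from the entry.

```typescript
// Status codes the entry lists as transient: 429 and 502-504.
function isTransientStatus(statusCode: number): boolean {
  return statusCode === 429 || (statusCode >= 502 && statusCode <= 504);
}

// Exponential backoff: base * 2^attempt, capped. Base and cap are assumed values.
function backoffMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const MAX_CONSECUTIVE_FAILURES = 5;

// Hypothetical per-company health record.
interface CompanyHealth {
  consecutiveFailures: number;
  state: "active" | "dead";
}

// A success resets the streak; the fifth consecutive failure downgrades to dead.
function recordOutcome(health: CompanyHealth, succeeded: boolean): CompanyHealth {
  if (succeeded) return { ...health, consecutiveFailures: 0 };
  const consecutiveFailures = health.consecutiveFailures + 1;
  return {
    consecutiveFailures,
    state: consecutiveFailures >= MAX_CONSECUTIVE_FAILURES ? "dead" : health.state,
  };
}
```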
PostgreSQL array filter
technology_names filter used invalid SQL, now uses proper ARRAY[]::text[] cast
Apollo enrichment
skip early if no domain (was returning 422 from name-only fallback)
CompanyLogo stale state
reset errored state when domain prop changes
Admin company domain
preprocess empty string to null (unique index conflict prevention)
Form NaN validation
annualRevenue/totalFunding validated with Number.isFinite before submit
Migration breakpoint
added missing statement-breakpoint marker on second ALTER
Database schema
annual_revenue and total_funding upgraded from integer to bigint (supports billion-scale values)
Job detail queries
include companyDomain via company table join for logo rendering
API response
adds enrichment_error, apollo_id, normalizedName to company responses
Removed .cursorrules (project uses CLAUDE.md for AI guidance)
Admin Enrichment Command Center
dedicated /dashboard/admin/enrichment page with stats cards (5 status counts), Apollo daily budget bar with color thresholds (sage/amber/red), worker health card, pipeline run history table, and merge audit log
Enrichment card on admin company detail
status badge, data completeness bar (22 fields), error display, Apollo ID, re-enrich button with 1-hour cooldown
Enrichment status column + filter on admin companies DataTable with DESIGN.md-aligned badge colors
Re-enrich All Non-Enriched bulk action
resets all failed+skipped companies to pending with confirmation dialog
Worker /stats endpoint
returns last run, daily API calls, daily limit, environment (CRON_SECRET auth)
Pipeline run log
enrichment_run_log table tracks every worker execution with phase-by-phase metrics, auto-prunes after 7 days
Enrichment snapshots
enrichment_snapshot table stores pre-re-enrich field values for diff view
enrichment_error column on company table
worker writes failure reason on error, clears on success and skip
EnrichmentStatusBadge component
shared badge with semantic colors per DESIGN.md (sage for enriched, warning for pending, info for processing, error for failed, neutral for skipped)
ENRICHMENT_FIELDS constant
22-field list used for completeness scoring and snapshot building
Domain/website_url preserved during enrichment
COALESCE prevents Apollo from NULLing Phase 1 resolve data when it doesn't return primary_domain, avoiding duplicate companies on next resolve cycle
Apollo budget metric
daily limit now counts all companies that entered processing today (not just 'enriched'), preventing over-provisioning
Apollo provider failure model
API errors now throw (map to 'failed' status) instead of returning null; only missing org data maps to 'skipped'
processing status added to EnrichmentStatus enum
was used by worker but missing from Drizzle schema
enrichmentStatus filter validated
constrained to z.enum of 5 valid statuses instead of accepting arbitrary strings
Progress bar fallback color
fixed for Tailwind v4 (var(--primary) not hsl(var(--primary)))
Re-enrich/diff guard for merged companies
added mergedIntoId IS NULL filter to prevent operating on soft-deleted duplicates
Worker run logging
both cron and manual /process triggers now write pipeline results to enrichment_run_log
Worker health proxy
implemented as tRPC workerHealth procedure (admin-only, 5s timeout) instead of separate API route
18 dependencies updated
security patches (drizzle-orm SQL injection fix, vitest CVE), major bumps (stripe v21, @stripe/stripe-js v9), and routine patches across AI SDK, AWS SDK, tRPC, Sentry, lucide-react, recharts, fumadocs, Biome, Scalar
drizzle-orm 0.45.2
patches SQL injection in sql.identifier() / sql.as() (CWE-89), used in lib/feeds/facets.ts
vitest 4.1.2
resolves CVE in flatted dependency
CRON_SECRET required
changed from optional to required in env schema, preventing "Bearer undefined" auth bypass on cron/webhook endpoints when secret is unset
API key revocation org-scoped
revokeApiKey WHERE clause now includes organizationId, closing cross-tenant IDOR that could disable another org's keys
Worker auth deny-by-default
all 3 CF workers (enrichment, company-enrichment, embedding-generation) now return 401 when CRON_SECRET is missing instead of skipping auth entirely
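To illustrate the "Bearer undefined" bypass and the deny-by-default fix from the entries above, a minimal sketch follows. Both function names are hypothetical, and a production check would likely also use a constant-time comparison.

```typescript
// Vulnerable shape: with the secret unset, the expected value becomes the
// literal string "Bearer undefined", which any caller can send.
function isAuthorizedUnsafe(authHeader: string | null, secret: string | undefined): boolean {
  return authHeader === `Bearer ${secret}`;
}

// Deny-by-default: a missing or empty secret rejects every request.
function isAuthorized(authHeader: string | null, secret: string | undefined): boolean {
  if (!secret) return false;
  return authHeader === `Bearer ${secret}`;
}
```

Making CRON_SECRET required in the env schema catches the misconfiguration at startup. The runtime guard covers workers that read the variable directly.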
GitHub Actions SHA-pinned
actions/checkout@v4, oven-sh/setup-bun@v2, cloudflare/wrangler-action@v3 pinned to full commit SHAs for supply chain hardening
Embedding worker lockfile
committed bun.lock for deterministic builds
Rate limiting TODO
added P1 security TODO for real API rate limiting (current implementation emits stub headers only, with no enforcement)
Faceted search counts
GET /feeds/:id/jobs?facets=workplace_type,industry returns value/count distributions per field. Typesense-style: each facet excludes its own filter condition so users see "what else is available"
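The "each facet excludes its own filter" rule can be sketched in memory as below. This models only the counting semantics, not the SQL the endpoint runs; the types and function names are assumptions.

```typescript
type JobRow = Record<string, string>;
type FacetFilters = Record<string, string[]>; // field -> accepted values

// A row passes when it satisfies every filter except the one on `skipField`.
function passesFilters(row: JobRow, filters: FacetFilters, skipField?: string): boolean {
  return Object.entries(filters).every(
    ([field, values]) => field === skipField || values.length === 0 || values.includes(row[field]),
  );
}

// For each requested facet, count values over rows filtered by all *other* fields.
function computeFacets(
  rows: JobRow[],
  filters: FacetFilters,
  facetFields: string[],
): Record<string, Record<string, number>> {
  const facets: Record<string, Record<string, number>> = {};
  for (const field of facetFields) {
    const counts: Record<string, number> = {};
    for (const row of rows) {
      if (!passesFilters(row, filters, field)) continue;
      const value = row[field];
      if (value === undefined) continue;
      counts[value] = (counts[value] ?? 0) + 1;
    }
    facets[field] = counts;
  }
  return facets;
}
```

With a workplace_type=remote filter active, the workplace_type facet still reports counts for onsite and hybrid, while the industry facet counts only remote jobs.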
Search highlighting
?include=highlights adds <mark>-tagged snippets in title and description via PostgreSQL ts_headline(). HTML stripped before highlighting for XSS safety
Search synonym expansion
common job terms automatically expand with OR groups (e.g. "frontend" matches "front-end", "ui"; "engineer" matches "developer", "programmer"). 30 synonym groups seeded, admin CRUD via tRPC
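A rough sketch of the expansion step, using only the two example groups named above. Treating groups as symmetric (any member expands to the whole group) and returning nested arrays rather than a tsquery string are assumptions.

```typescript
// Two illustrative groups taken from the examples above; the seeded set has 30.
const SYNONYM_GROUPS: string[][] = [
  ["frontend", "front-end", "ui"],
  ["engineer", "developer", "programmer"],
];

// Map every term to its full group for constant-time lookup.
const SYNONYM_INDEX = new Map<string, string[]>();
for (const group of SYNONYM_GROUPS) {
  for (const term of group) SYNONYM_INDEX.set(term, group);
}

// Each query token becomes an OR group; the caller ANDs the groups together.
function expandQuery(query: string): string[][] {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => SYNONYM_INDEX.get(token) ?? [token]);
}
```

For example, expandQuery("Frontend Engineer rust") yields three groups: the frontend group, the engineer group, and ["rust"] unchanged.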
Admin synonym management
admin.synonyms tRPC router for CRUD operations on synonym groups with cache invalidation
OpenAPI spec
documented facets param, highlights in include, relevance_score, highlights response fields, exclude_domains filter, synonym behavior
Facet counts respect since filter
temporal conditions now passed to facet queries so incremental polls return consistent facet counts
Fuzzy fallback uses raw search
synonym-expanded query no longer breaks pg_trgm similarity matching in dashboard browsers
Synonym cache TTL reset on failure
prevents thundering herd of DB retries when the database is temporarily unavailable
Feed search relevance scoring
REST feed jobs endpoint returns relevance_score and orders by ts_rank_cd when feed has a search filter
Companies similar endpoint
fixed Drizzle array serialization for technology_names && overlap operator (was passing raw array, now uses sql.join with explicit ARRAY[]::text[] cast)
Feed search pagination safety
cursor-based pagination and lastPollAt checkpoint are now disabled when search is active, preventing silent data loss from mixing relevance ordering with time-based cursors
Full-text search
replaced ILIKE with Postgres tsvector + GIN indexes for ranked, stemmed search across jobs (title/company/description weighted A/B/C) and companies (name/industry)
Fuzzy search fallback
pg_trgm trigram similarity for typo-tolerant matching in dashboard browsers when FTS returns zero results
Relevance ordering
search results in admin and org dashboards now rank by ts_rank_cd relevance instead of chronological
Similar jobs API
GET /api/v1/jobs/{id}/similar returns top 10 semantically similar jobs via pgvector cosine distance
Search suggestions API
GET /api/v1/search/suggest?q= returns up to 5 fuzzy-matched job title suggestions
Embedding generation worker
Cloudflare Worker (5min cron) generates OpenAI text-embedding-3-small vectors for active jobs with daily token budget
DRY company filters
extracted buildRestCompanyConditions() to deduplicate REST API company filter logic
Search field max length increased to 200 chars (feeds + companies schemas)
REST companies API and export routes refactored to use shared filter builder
API response includes optional relevance_score when search is active
Companies browser rewrite
replaced custom grid/table view with shared DataTable component, nuqs URL state persistence, faceted multi-select filters (industry, country, funding stage), secondary toolbar (employee range, has active jobs checkbox), sortable columns, and offset pagination with total count
Admin companies dashboard
full admin companies list + detail pages reusing shared CompaniesPageClient and CompanyDetailClient with context="admin". Admin sidebar nav updated with Companies link
Two-way job↔company linking
company detail page shows Jobs tab with clickable job cards linking to internal job pages. Job detail pages link company name back to company detail
Shared filter builder
extracted lib/companies/filter-builder.ts with companyListInput Zod schema and buildCompanyConditions() to DRY admin + org company routers
Total funding formatting
company detail page now formats stored values as raw dollar amounts ($484.4M) instead of treating them as millions ($484.4T)
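A small sketch of the unit mix-up, assuming a compact formatter like the hypothetical formatFunding below. The stored value is already in dollars, so scaling it by a million before formatting produces the trillion-scale label.

```typescript
// Format a raw dollar amount with a compact suffix.
function formatFunding(dollars: number): string {
  const units: Array<[number, string]> = [
    [1e12, "T"],
    [1e9, "B"],
    [1e6, "M"],
    [1e3, "K"],
  ];
  for (const [threshold, suffix] of units) {
    if (dollars >= threshold) return `$${(dollars / threshold).toFixed(1)}${suffix}`;
  }
  return `$${dollars}`;
}

const storedDollars = 484_400_000;
formatFunding(storedDollars);             // "$484.4M"
formatFunding(storedDollars * 1_000_000); // "$484.4T" (the old behavior)
```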
Company jobs link
job cards in company detail Jobs tab now link to internal job page (/dashboard/.../jobs/{id}) instead of external apply URL
Companies page crash
Radix Select crashed on load when a filter value resolved to an empty string. Fixed by passing undefined instead of "" and filtering empty strings from filter option queries
Apollo enrichment throughput
raised batch size (10 → 40) and daily limit (50 → 5000), unlocking ~480 company enrichments/hour — approaching Apollo's 600/hour rate limit
Companies API
6 REST endpoints (/v1/companies, /v1/companies/:id, /v1/companies/:id/jobs, /v1/companies/:id/similar, /v1/companies/trending, /v1/companies/export) with cursor pagination, search, and filtering by industry, size, funding stage, tech stack, and country
Company table
new company entity with 30+ columns for identity, enrichment data, and derived stats. company_merge_audit table for dedup audit trail. company_id FK on job table
3-tier entity resolution
deterministic name/domain matching (Tier 1), Apollo enrichment signal merging (Tier 2), and LLM dedup placeholder (Tier 3). Handles ATS domain filtering (57+ known platforms), normalized name matching, and domain-first enrichment ordering
Apollo Organization Enrichment
CF Worker enriches pending companies via Apollo.io API with daily credit limits, FOR UPDATE SKIP LOCKED concurrency control, and automatic Tier 2 dedup on apolloId/domain conflicts
CF Worker pipeline
4-phase cron every 5 min: Resolve (500 jobs/tick) → Enrich (10/tick via Apollo) → LLM Dedup (future) → Stats. Deployed at workers/company-enrichment/
Companies dashboard
list page with DataTable, search, faceted filters (industry, country, funding stage), sortable columns, and offset pagination. Detail page with tabs (Overview, Jobs, Similar). Admin companies pages with full parity. Trending widget on org dashboard home
Company feed filters
4 new feed filter fields: company_size, company_funding_stage, company_tech_stack, company_id. These filters only match jobs at enriched companies (documented semantics)
Hiring velocity
on-the-fly computation of 7d/30d/90d job counts with trend detection (accelerating/stable/decelerating)
Company normalization
pure functions for name normalization (suffix stripping, word-boundary safe), domain extraction (ATS filtering), and slug generation. Shared between Next.js app and CF Worker
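A sketch of word-boundary-safe suffix stripping and slug generation. The suffix list and function names are hypothetical, and this version handles ASCII names only.

```typescript
// Hypothetical suffix list; the project's actual list is not shown here.
const LEGAL_SUFFIXES = ["inc", "llc", "ltd", "corp", "co", "gmbh"];

// Suffixes are removed only as whole trailing words preceded by whitespace or
// a comma, so "Cisco" keeps its trailing "co".
const SUFFIX_RE = new RegExp(`(?:[\\s,]+(?:${LEGAL_SUFFIXES.join("|")})\\.?)+$`, "i");

function normalizeCompanyName(companyName: string): string {
  return companyName
    .trim()
    .replace(SUFFIX_RE, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

function slugify(companyName: string): string {
  return normalizeCompanyName(companyName).replace(/ /g, "-");
}
```

Because these are pure functions with no runtime dependencies, the same module can be imported by both the Next.js app and the CF Worker.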
OpenAPI spec
6 new company endpoints added to openapi.json with Company, CompanyBrief, and HiringVelocity schemas
42 unit tests
normalization (10), domain extraction (8), slug generation (5), and velocity computation (6) plus API and filter tests
ATS domain list expanded
from 28 to 57+ known ATS/HRIS platforms to prevent wrong company enrichments (added Taleo, ADP, Workday, Oracle, SAP, Personio, etc.)
Enrichment ordering
domain-first priority (ORDER BY (domain IS NOT NULL) DESC) enriches companies with a known domain first, improving Apollo match rate from ~0% for name-only lookups to ~90%
Granular email scheduling
new feed_notification_config table replaces the rigid daily/weekly email schedule. Supports per-recipient configs with independent frequencies (every 6h, every 12h, daily, weekly), preferred delivery hour + timezone, and minimum job threshold
Smart auto-frequency
adaptive "auto" mode sends based on job volume: 50+ jobs at 6h intervals, 20+ at 6h, 5+ at 12h, 1+ daily fallback. 6-hour floor caps sends at 4/day
Preferred hour + timezone
schedule emails for a specific hour in the recipient's timezone using native Intl.DateTimeFormat (no external libs). Handles DST transitions gracefully
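One way to read the recipient's local hour with the built-in Intl API is sketched below. The function names and the "due when hours match" rule are assumptions about how the hourly cron might use it.

```typescript
// Current hour (0-23) in an IANA timezone, using only the built-in Intl API.
function hourInTimezone(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hourCycle: "h23",
  }).formatToParts(instant);
  const hourPart = parts.find((part) => part.type === "hour");
  if (!hourPart) throw new Error(`No hour part for timezone ${timeZone}`);
  return Number(hourPart.value);
}

// A config is due when the recipient's local hour equals the preferred hour.
function isDue(instant: Date, timeZone: string, preferredHour: number): boolean {
  return hourInTimezone(instant, timeZone) === preferredHour;
}
```

Because the offset is resolved per instant, the same UTC time maps to a different New York hour in January (UTC-5) than in July (UTC-4), with no manual DST handling.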
Minimum job threshold
minJobCount column skips digest sends when fewer than N new jobs match, preventing low-value emails
Channel-aware table design
feed_notification_config supports future Slack/webhook channels via channel column (only email implemented now)
Multi-recipient data model
one feed can have N notification configs with different recipients, frequencies, and formats
Parallel cron processing
email-export cron rewritten with Promise.allSettled batches of 5 concurrent sends, maxDuration=300
Per-config idempotency
keys include configId + hour (notif-{configId}-{YYYY-MM-DDTHH}) enabling sub-daily sends without duplicates
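The key format above could be built like this. Bucketing by the UTC hour via toISOString() is an assumption; the function name is hypothetical.

```typescript
// Build the per-config, per-hour key: notif-{configId}-{YYYY-MM-DDTHH}.
function notificationIdempotencyKey(configId: string, instant: Date): string {
  const hourBucket = instant.toISOString().slice(0, 13); // "YYYY-MM-DDTHH" in UTC
  return `notif-${configId}-${hourBucket}`;
}
```

Two cron invocations within the same hour produce the same key, so a retry cannot double-send, while the next hourly tick gets a fresh key.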
Neon driver unique constraint detection
Postgres error code 23505 is embedded in the error message string by the Neon serverless driver, not as a .code property. Duplicate notification config attempts now correctly show "This email is already configured" instead of a 500 error
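A hedged sketch of a detector that covers both error shapes. The function name is hypothetical, and substring matching on the message is a pragmatic fallback that could in principle match unrelated text containing "23505".

```typescript
const UNIQUE_VIOLATION = "23505"; // SQLSTATE for unique_violation

// True for drivers that expose `.code` and for ones that only mention the
// SQLSTATE in the message text (the Neon serverless behavior described above).
function isUniqueViolation(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, message } = err as { code?: unknown; message?: unknown };
  if (code === UNIQUE_VIOLATION) return true;
  return typeof message === "string" && message.includes(UNIQUE_VIOLATION);
}
```

A mutation can catch the insert error, call isUniqueViolation, and return the "already configured" message instead of letting the error surface as a 500.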
Cron schedule
email-export changed from daily (0 8 * * *) to hourly (0 * * * *) to support sub-daily frequencies
Email delivery tracking
extracted trackEmailDelivery() helper from 3 duplicated call sites (cron markdown, cron HTML, sendDigest mutation). Added configId FK on email_delivery table
tRPC mutations
added createNotificationConfig, updateNotificationConfig, deleteNotificationConfig with org-scope auth. Legacy updateEmailSchedule retained as wrapper
Feed detail UI
email schedule card replaced with notification config manager (config list, frequency dropdown with 5 options, add/remove)
Vercel enrichment cron
Removed /api/internal/process from vercel.json crons. CF Worker (workers/enrichment/) is the sole enrichment path — 50 jobs/5 min with atomic claiming. Vercel route kept as manual fallback trigger
pg SSL deprecation warning suppressed
Filtered pg-connection-string v2.12.0 SECURITY WARNING about sslmode=require being treated as verify-full. Vercel's Neon integration auto-sets this value and it cannot be changed
content-collections config migrated
Renamed collections to content in defineConfig per content-collections v0.14.0 breaking change
next-themes React 19 script warning patched
Bun patch on next-themes@0.4.6 skips rendering ThemeScript on the client (script only needs SSR for FOUC prevention). Matches upstream PR pacocoursey/next-themes#386
next-themes patch
patches/next-themes@0.4.6.patch should be removed when next-themes releases a fix (watch for v0.4.7+ or PR #386 merge)
Dependencies updated
Patch bumps for 8 packages: @ai-sdk/openai 3.0.48 (Bun fetch retry fix), @ai-sdk/react 3.0.139, ai 6.0.137 (gateway model catalog), @aws-sdk/client-s3 3.1015.0 (error deserialization fix), @aws-sdk/s3-request-presigner 3.1015.0, @tanstack/react-query 5.95.2 (NodeJS.Timeout type leak fix), fumadocs-core 16.7.5 (TOC observation rewrite), fumadocs-ui 16.7.5
Careers page
Removed /careers route, section components, navigation links (header + footer), and sitemap entry. Page contained hardcoded placeholder positions, not real listings
Component render contract tests
9 new tests verify React.memo wrappers on Button and all Card subcomponents
Feed detail actions
Dropdown menu on feed detail page with Rename, Pause/Resume, and Delete actions
SessionProvider context memoized
Context value wrapped in useMemo to prevent tree-wide re-renders on every parent update
JobsBrowser memoization overhaul
browseInput, columns, facetedFilterEntries, and columnFilters wrapped in useMemo; event handlers (commitSalary*, commitCompany, handleFiltersChange, handleSearchQueryChange, handleSaveAsFeed) wrapped in useCallback
Button wrapped in React.memo
Skips CVA variant recomputation when props unchanged
Card subcomponents wrapped in React.memo
All 7 exports (Card, CardHeader, CardTitle, CardDescription, CardAction, CardContent, CardFooter) memoized
Feed detail handlers use useCallback
handleCopy, handleDownload stabilized
OrganizationSwitcher Icon extracted to useCallback
Prevents inline component recreation on every render
NiceModal pattern for feed actions
Rename and delete dialogs use NiceModal.create() / ConfirmationModal instead of manual useState, consistent with codebase conventions
Sidebar context missing dependencies
Added setOpenMobile, setWidth, setIsDraggingRail to useMemo dependency array, fixing potential stale closure bug
Column filters dependency chain
Replaced fragile spread-as-deps pattern with explicit useMemo chain, removed eslint-disable comment
Delete clears detail cache
Removes cached feed data on delete so browser back button shows "Feed not found" instead of a stale ghost page
Rename trim comparison
Compares trimmed input against current name so whitespace-only changes are correctly rejected
Delete double error toast
Removed duplicate onError handler that caused two overlapping toasts on delete failure (ConfirmationModal already handles the error path)
Email delivery tracking
New email_delivery table tracks every email sent via Resend with status updates from webhooks (delivered, bounced, complained, failed). Unique index on resend_message_id prevents duplicates
Resend webhook endpoint
POST /api/webhooks/resend receives Svix-signed delivery status events, verifies signatures, and updates email delivery records
Email stats in feed overview
4-column stat grid now shows Emails Sent, Last Email, Webhook Deliveries, and Failures with tabular-nums and border-destructive on failure state
RESEND_WEBHOOK_SECRET env var
Supports webhook signature verification
sendFeedDigestEmail returns messageId
Enables delivery tracking in both cron and manual send paths
Cron email export tracks deliveries
Both markdown and HTML format paths insert into email_delivery table (best-effort with try/catch)
Dashboard sendDigest tracks deliveries
Manual sends record email delivery with error-resilient DB insert
Feeds list badge accounts for email deliveries
Badge now queries email_delivery table for counts and last-sent dates per feed. Cascades: webhook date → email date → email count → "No deliveries yet" (PR #98)
"Last Email" stat derives from actual sends
Feed detail stat card now shows max(email_delivery.createdAt) instead of the cron cursor (lastEmailAt), so manual sends are reflected immediately (PR #98)
Jobvite Apify scraper
Reverse-engineered hidden XML feed at app.jobvite.com/CompanyJobs/Xml.aspx?c={companyId}. One request returns all jobs per company with full HTML descriptions, category, jobtype, region, location, and dates. 548 company slugs from Common Crawl, self-healing KV store, CompanyId extraction from page JS. Total sources: 12
Dependency patches
Bumped content-collections, t3-env, react-query to latest patch versions
Removed newsletter/ folder
Marked TODO done, cleaned stale deps
Digest email template polish
neutral title (was sage), removed horizontal dividers, tighter workplace tags (inline with meta row), lighter card borders (neutral-100), smaller footer
Job limit picker
configurable 1-200 (default 20) for generateDigest and sendDigest, number input in Digest tab
Dark mode audit
all elements have semantic CSS classes with explicit prefers-color-scheme: dark overrides; added job-divider class to card borders
sendDigest no longer advances cron cursor
manual "Send Now" was updating lastEmailAt, causing scheduled cron to skip unsent jobs. Manual sends are now one-off with no side effects on cron scheduling
workplaceLabel test synced
test had stale copy of old workplaceTag function (returned { label, color } instead of string)
Footer link contrast
neutral-300 (1.6:1) → neutral-400 (2.7:1) for readability
Unused FileText import removed from feed-detail-client
CLAUDE.md
updated template description from "sage accent" to "neutral palette"
Feed digest email template
Rich React Email template for scheduled feed digests. Job cards with salary emphasis, workplace tags (sage pill for Remote), deadline countdown, skills, and empty state handling. Uses pixelBasedPreset for email client compatibility, dark mode via prefers-color-scheme, sage accent from DESIGN.md
AI-generated digest intro
GPT-4o-mini generates a 1-sentence contextual summary for each digest email. 10s timeout, maxOutputTokens: 100, static fallback on error. Via AI SDK v6 generateText()
exclude_domains feed filter
Block jobs from specific domains in any feed. Domain-boundary-safe ILIKE matching (won't block smart.com when blocking art.com). Capped at 20 domains per feed. Uses isNull() for proper NULL handling
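The domain-boundary rule can be shown in memory as below. The real filter is an ILIKE condition in SQL; keeping jobs with a missing or unparseable URL is an assumption modeled on the isNull() note, and the function names are hypothetical.

```typescript
// True when `host` is the blocked domain itself or one of its subdomains.
function isBlockedHost(host: string, blockedDomain: string): boolean {
  const h = host.toLowerCase();
  const b = blockedDomain.toLowerCase();
  return h === b || h.endsWith(`.${b}`);
}

const MAX_EXCLUDED_DOMAINS = 20;

// Keep a job when its URL is missing or unparseable, or matches no blocked domain.
function keepJob(jobUrl: string | null, excluded: string[]): boolean {
  if (excluded.length > MAX_EXCLUDED_DOMAINS) throw new Error("at most 20 domains per feed");
  if (jobUrl === null) return true;
  let host: string;
  try {
    host = new URL(jobUrl).hostname;
  } catch {
    return true;
  }
  return !excluded.some((domain) => isBlockedHost(host, domain));
}
```

The leading dot in the suffix check is what stops art.com from matching smart.com.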
Idempotency keys for digest emails
feed-digest-{feedId}-{date} prevents duplicate sends on cron retries via Resend SDK native idempotencyKey parameter
33 new unit tests
digest-intro.test.ts (4 tests: AI fallback, empty jobs, error handling), exclude-domains.test.ts (11 tests: SQL generation, domain format validation, schema constraints), digest-email-helpers.test.ts (18 tests: deadline labels, workplace tags, meta content)
Digest preview tab
new dashboard tab in feed detail page. Generate → preview (HTML iframe or markdown source) → copy to clipboard → download as file → send now to any email. Two new tRPC mutations: generateDigest (render-only) and sendDigest (with 1-min per-feed rate limit)
Email export cron upgraded
HTML format now uses React Email template with AI intro instead of basic formatJobsHtml(). Markdown format preserved
toDigestJob() extracted into shared lib/feeds/digest-job.ts
used by both cron and dashboard tRPC (DRY)
formatSalary() and formatLocation() exported from job-formatter.ts for reuse across cron and future consumers
newsletter/ standalone app
All functionality (Kadoa scrapers, OpenAI inline processing, React Email template, domain blacklist, streaming UI) replaced by native Bordfeed infrastructure. Can be safely deleted
Workday Apify scraper
Reverse-engineered Workday's CXS API (POST /wday/cxs/{tenant}/{site}/jobs). 1,765 company sites discovered via Common Crawl across 22 wd variants. Self-healing KV store, per-job detail fetch for full descriptions, 500 job cap per company. Total sources: 11
YCombinator Apify scraper
Scrapes YC's Work at a Startup job listings. 217 hiring companies with full job descriptions
Poets & Writers (PW) Apify scraper
Sitemap discovery + RDFa extraction for literary job postings
Lever Apify scraper
Self-healing KV store pattern (like Greenhouse). 2,714 company slugs from Common Crawl, daily scrape + monthly discovery mode. Seed slugs in Apify KV store mnmz5EL4jZcUGE6pC. Total sources: 8
RemoteOK Apify scraper
Dynamic tag discovery from sitemap (~250 tags), per-tag API fan-out, ID-based dedup. Respects 1s crawl-delay. Employment type inference from tags (Contract, Part Time, Freelance, Internship). Locale-safe salary formatting via Intl.NumberFormat
Intra-batch dedup
Identical payloads within the same batch now deduplicated via in-memory hash set (previously all inserted)
Null byte sanitization
Strip \x00 and control chars before DB insert (PostgreSQL rejects them, causing silent INSERT failures)
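A minimal sanitizer along these lines. Keeping tab, newline, and carriage return is an assumption, as are the function names.

```typescript
// NUL plus the other C0 control characters, keeping tab (\x09), newline (\x0A),
// and carriage return (\x0D).
const CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/g;

function sanitizeText(value: string): string {
  return value.replace(CONTROL_CHARS, "");
}

// Apply the sanitizer to every string field of a flat record before insert.
function sanitizeRecord<T extends Record<string, unknown>>(record: T): T {
  const cleaned: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    cleaned[key] = typeof value === "string" ? sanitizeText(value) : value;
  }
  return cleaned as T;
}
```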
15K ingestion cap
Datasets exceeding 15K items auto-capped and marked partial instead of silently timing out
Stuck webhook recovery
Stale received entries (>5min) auto-cleaned to error on each webhook invocation
Greenhouse actor limit
Default capped at 10K (safe margin under 15K tested ceiling)
DynamiteJobs, SmartRecruiters, Greenhouse Apify scrapers
3 new job source actors in apify-scrapers/, deployed to Apify with webhooks + schedules. Total sources: 6 (was 3). Greenhouse alone covers 19K+ jobs across 7,101 company slugs
/new-job-source skill
Codified workflow for building future Apify scrapers
Chunked batch INSERT
Ingest pipeline chunks inserts (500 rows/chunk) to handle 15K+ job ingestion without hitting Postgres query size limits
Webhook processing for large batches
Switched from after() to inline processing to prevent premature function termination on large datasets. Increased maxDuration to 300s (Vercel Pro ceiling)
Ingest input sanitization
Strip HTML tags from titles/companies, cap title length at 500 chars, handle empty strings as "Untitled"
Apify scrapers excluded from TypeScript build
apify-scrapers/ added to tsconfig exclude list
Dependency updates
@tanstack/react-query 5.91.3→5.94.5, fumadocs-core/ui 16.7.3→16.7.4 (TOC CSS fix), react-hook-form 7.71.2→7.72.0
cropperjs 1→2 (react-cropper incompatible — need to swap to react-easy-crop)
Deduplicated jobs pages
Admin and org jobs now share the same rich browser component (components/jobs/), detail sheet, and detail page via a context prop pattern
Admin jobs page upgraded
Now uses the full-featured filterable DataTable browser (was a basic table with pagination), plus stats cards
Admin job detail route
Added /dashboard/admin/jobs/[id] detail page (was org-only)
Shared select objects
Extracted jobBrowseSelect and jobDetailSelect to lib/jobs/selects.ts, reused by both admin and org routers
Admin jobs router
Added browse, filterOptions, and get endpoints matching org shape (admin sees all statuses, not just active)
fumadocs-core 16.7.2 → 16.7.3
Sidebar scroll fix
fumadocs-ui 16.7.2 → 16.7.3
Home layout navigation menu redesign
recharts 2.15.4 → 3.8.0
Major version upgrade. Fixed chart.tsx tooltip/legend types for v3 compatibility
Removed react-resizable-panels
Unused dependency and wrapper component deleted. Re-add via bunx shadcn@latest add resizable if needed
better-auth 1.4.17 → 1.5.5
Auth framework minor-version upgrade with breaking changes: renamed email confirmation callback, updated error code types, fixed session/user type inference for plugin-added fields (activeOrganizationId, twoFactorEnabled)
Worker Zod 3 → 4
Upgraded enrichment worker's Zod dependency from v3.23 to v4.3.6, aligning with root app. Import path changed to zod/v4 for explicit version pinning.
53 dependency upgrades
Next.js 16.1.6→16.2.1, tRPC 11.9→11.14, AI SDK 6.0.57→6.0.134, Sentry 10.37→10.45, motion 12.29→12.38, Tailwind CSS 4.1→4.2, Vitest 4.0→4.1, Biome 2.3→2.4, fumadocs 16.4→16.7, shikijs/rehype 3.21→4.0, react-dropzone 14→15, AWS SDK 3.975→3.1014, plus ~30 patch bumps
Worker AI SDK sync
Enrichment worker @ai-sdk/openai and ai packages synced to match root versions
Vercel Analytics/Speed Insights v2
Upgraded from v1 to v2 (React API unchanged)
Dropzone rejection handling TODO
Tracked react-dropzone v15 follow-up for upload error feedback in TODOS.md
Upload dropzone rejection handling
Both avatar and logo upload flows now validate file type (images only) and size (5MB max), show toast errors for rejected files, and guard against empty drops opening the crop modal
Worker CI uses bun
deploy-enrichment-worker.yml now uses oven-sh/setup-bun and bun install instead of npm, matching the project's package manager
Removed stale package-lock.json
Deleted npm lockfile (58 mismatches with package.json) and added packageManager: bun@1.3.11 to package.json for explicit bun selection
Apify Console link
External link button on each Apify source row opens the actor's Apify Console page directly. Desktop icon button + mobile labeled button. Only visible for sources with an actor ID
Run Now button
Trigger Apify actor runs directly from the admin Sources page. Rocket icon button for Apify-type sources in both desktop table and mobile card layouts. Backend triggerApifyRun function with full error handling
6 new triggerApifyRun tests
Full coverage: not found, non-Apify rejection, missing token, success with runId, API error response, network failure
Bulletproof Apify webhook handler
Refactored to respond 200 immediately and process via Next.js after(), eliminating Apify's 30-second timeout risk. Includes paginated dataset fetch (1000 items/page), timeout tracking, and auto-retry once on failure
Webhook idempotency
New webhook_log table with unique apifyRunId index prevents duplicate processing. Race conditions handled via insert constraint catch
Pipeline failure notifications
Dual-channel alerts (Resend email + Slack webhook) fire on FAILED/ABORTED/TIMED_OUT events and all-error ingests. Fire-and-forget with graceful degradation
Admin Sources page
Dashboard at /dashboard/admin/sources with source CRUD (create, edit, archive, toggle active/paused), 4 stats cards, webhook log table with status/source filters, retry button for failed runs. Includes toast notifications, AlertDialog confirmation for destructive actions, inline form validation, accessible icon buttons, and mobile card layout
DB-driven source resolution
Webhook handler resolves Apify actor IDs to source names from job_source table (with 60s in-memory TTL cache) instead of hardcoded map. Sources managed via admin UI
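A sketch of a small in-memory TTL cache like the one mentioned above, with an injectable clock so expiry can be tested. The class and the resolveSourceName helper are hypothetical; the 60s TTL comes from the entry.

```typescript
// Minimal TTL cache. The clock is injectable so tests can control time.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical lookup: actor ID -> source name, loading from the DB on a miss.
function resolveSourceName(
  cache: TtlCache<string>,
  actorId: string,
  loadFromDb: (id: string) => string | undefined,
): string | undefined {
  const cached = cache.get(actorId);
  if (cached !== undefined) return cached;
  const loaded = loadFromDb(actorId);
  if (loaded !== undefined) cache.set(actorId, loaded);
  return loaded;
}
```

A 60s TTL means a source renamed or added in the admin UI is picked up by the webhook handler within a minute, without a DB query on every webhook.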
Apify run traceability
apifyRunId column on jobTable links every ingested job back to its exact Apify actor run
26 new pipeline tests
15 webhook handler tests (auth, validation, idempotency, event routing, source mapping) + 11 notification tests (channel selection, failure resilience, content formatting)
DESIGN.md
Formalized design system: Industrial/Utilitarian aesthetic, Geist Sans typography, sage accent, compact admin density, motion and component conventions
Webhook handler now handles ACTOR.RUN.FAILED, ACTOR.RUN.ABORTED, and ACTOR.RUN.TIMED_OUT events (previously ignored all non-success events)
ingestJobBatch accepts optional IngestOptions with apifyRunId passthrough for traceability
Enrichment pipeline now writes clean descriptions
AI extraction already produced a clean description field but the UPDATE query never wrote it back to the DB. Raw RSS/HTML from ingest stayed in the description column forever. Fixed in both CF Worker and lib pipeline
stripHtml hardened for double-encoded HTML
RSS feeds often contain &lt;strong&gt; (HTML entities inside HTML). Now runs two-pass strip: decode entities, then strip revealed tags. Also strips RSS metadata preambles (Title/Category/Posted headers)
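To illustrate the double-encoding problem, here is a small sketch of a two-pass strip. It is one interpretation of the description above, not the actual stripHtml: the function names are hypothetical, the entity table is deliberately tiny, and the preamble removal is left out.

```typescript
// Assumed minimal entity table; a real implementation would cover many more.
const ENTITIES: Record<string, string> = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&amp;": "&",
};

// Decodes each entity exactly once, so "&amp;lt;" becomes "&lt;", not "<".
function decodeEntities(input: string): string {
  return input.replace(/&(?:lt|gt|quot|#39|amp);/g, (m) => ENTITIES[m] ?? m);
}

function stripTags(input: string): string {
  return input.replace(/<[^>]*>/g, "");
}

function stripHtmlTwoPass(input: string): string {
  const firstPass = stripTags(input);        // remove literal tags
  const decoded = decodeEntities(firstPass); // reveals tags that were entity-encoded
  return stripTags(decoded).trim();          // second pass removes the revealed tags
}

const cleaned = stripHtmlTwoPass(
  "<p>We need &lt;strong&gt;senior&lt;/strong&gt; devs &amp; designers</p>",
);
// cleaned === "We need senior devs & designers"
```

A single strip-then-decode pass would leave the inner strong tags in the output, which is the failure mode the entry describes.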
Enrichment prompt updated
AI instructed to return description as clean plain text, preserving original wording verbatim — no summarizing or rephrasing, only stripping markup and encoding artifacts
Job detail views
Side panel sheet on row click in jobs browser + full-page detail view at /dashboard/organization/jobs/[id] with 100% API field parity (salary, location, career, skills, languages, description, dates & source)
Shared DetailRow component
Reusable label-value display component at components/ui/custom/detail-row.tsx, used across sheet and full page views
stripHtml utility
Converts HTML job descriptions to clean text with newline preservation, entity decoding, and whitespace normalization
formatSalaryFull utility
Formats salary ranges with currency/unit defaults (USD, /year)
18 new unit tests
Full coverage for stripHtml (11 tests) and formatSalaryFull (7 tests)
Feed cards
Added Clock icon next to delivery timestamps, skeleton loading states
Feed detail
Stat card icons (Send, Clock, AlertTriangle), skeleton loading, copy-to-clipboard button for filter JSON
API keys
Click-to-copy key prefix, standardized empty state icon size (h-12 w-12)
Jobs router
get procedure now filters by active status, consistent with browse
Cloudflare Workers enrichment pipeline
Replaces Vercel cron-based enrichment with a dedicated CF Worker at workers/enrichment/. Processes 50 jobs every 5 minutes (vs 10 jobs/15 min on Vercel). No 60s timeout limit
Atomic job claiming
FOR UPDATE SKIP LOCKED prevents race conditions between concurrent worker runs. processingRunId guard prevents stale overwrites
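The claiming step can be sketched as a pair of parameterized PostgreSQL queries. The table name `job` and the columns `status`, `processing_run_id`, `locked_at`, and `created_at` are assumptions made for illustration; only FOR UPDATE SKIP LOCKED and the processingRunId guard come from the entry above.

```typescript
interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

// Claim up to batchSize pending jobs for this run. Rows already locked by a
// concurrent worker are skipped instead of waited on.
function buildClaimQuery(runId: string, batchSize: number): SqlQuery {
  return {
    text: `
      UPDATE job
         SET processing_run_id = $1, locked_at = now()
       WHERE id IN (
         SELECT id FROM job
          WHERE status = 'pending' AND processing_run_id IS NULL
          ORDER BY created_at
          LIMIT $2
          FOR UPDATE SKIP LOCKED
       )
      RETURNING id`,
    values: [runId, batchSize],
  };
}

// Write results back only if this run still owns the job, so a run whose
// lock was released as stale cannot overwrite newer data.
function buildWriteBackQuery(jobId: string, runId: string, description: string): SqlQuery {
  return {
    text: `
      UPDATE job
         SET description = $3, status = 'enriched', processing_run_id = NULL
       WHERE id = $1 AND processing_run_id = $2`,
    values: [jobId, runId, description],
  };
}
```

Because the subquery locks rows and the outer UPDATE stamps them in one statement, two workers started at the same moment receive disjoint sets of job IDs.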
Configurable concurrency
Worker enriches jobs in parallel chunks (default 5 concurrent). Batch size and concurrency configurable via env vars
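A rough sketch of chunked parallelism follows. `processInChunks` is a hypothetical helper, and reading "parallel chunks" as "run N jobs at once, wait for all of them, then start the next N" is an assumption about how the worker behaves.

```typescript
async function processInChunks<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  const results: R[] = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const chunk = items.slice(i, i + concurrency);
    // Promise.all preserves input order and rejects as soon as one item fails.
    const chunkResults = await Promise.all(chunk.map((item) => worker(item)));
    results.push(...chunkResults);
  }
  return results;
}
```

With a batch of 50 and the default concurrency of 5, this would run ten chunks back to back. A worker that must survive individual failures could swap `Promise.all` for `Promise.allSettled` and inspect each outcome.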
Stale lock auto-cleanup
Crashed workers' locks are automatically released after a configurable timeout (default 10 min)
GitHub Action auto-deploy
Worker auto-deploys on push to main when workers/enrichment/** changes via cloudflare/wrangler-action@v3
AI SDK v6 token cleanup
Removed as any casts from token usage. Uses inputTokens/outputTokens directly. Captures new v6 fields: reasoningTokens, cachedInputTokens
Enrichment throughput
50 jobs/5 min (600/hour) vs previous 40 jobs/hour on Vercel cron. 15x improvement
Neon HTTP driver
Worker uses @neondatabase/serverless HTTP driver (not WebSocket) for Cloudflare Workers compatibility
Root tsconfig exclusion
workers/ excluded from root TypeScript compilation so Vercel builds don't fail on worker-only dependencies
Live changelog
Homepage hero pill and /changelog page now show real release history from CHANGELOG.md instead of boilerplate placeholder content
Full date display
Changelog page now shows "March 19, 2026" instead of abbreviated "March 2026"
Changelog parser
Build-time parser at lib/changelog.ts handles 5 markdown bullet formats with server-only guard
Total tests
149 passing (13 suites), including 17 new changelog parser tests
CSV formula injection (CWE-1236)
CSV export now prefixes cells starting with =, +, @, - with a tab character to prevent Excel/Sheets formula injection
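The mitigation above can be sketched in a few lines. The helper names are hypothetical, and quoting any cell that contains the tab prefix is an extra precaution assumed here rather than something the entry states.

```typescript
// Leading characters that spreadsheet applications may treat as a formula.
const FORMULA_TRIGGERS = ["=", "+", "@", "-"];

// Prefix risky cells with a tab so they are treated as text.
function neutralizeFormula(cell: string): string {
  return FORMULA_TRIGGERS.indexOf(cell.charAt(0)) !== -1 ? "\t" + cell : cell;
}

// Neutralize, then apply CSV quoting: wrap in double quotes when the cell
// contains a quote, comma, line break, or tab, and double any embedded quotes.
function toCsvCell(cell: string): string {
  const safe = neutralizeFormula(cell);
  return /[",\r\n\t]/.test(safe) ? '"' + safe.replace(/"/g, '""') + '"' : safe;
}

function toCsvRow(cells: readonly string[]): string {
  return cells.map(toCsvCell).join(",");
}
```

For example, a job title of `=HYPERLINK("http://evil.example","Apply")` would be written as quoted text with a leading tab instead of being evaluated when the export is opened.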
URL scheme validation
HTML and Markdown formatters now block javascript: and data: URL schemes in job apply_url fields, preventing XSS in exports and emails
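A scheme check along these lines could back the formatters. This is a sketch under assumptions: `isSafeApplyUrl` and `safeHref` are hypothetical names, and falling back to "#" for a blocked URL is an illustrative choice.

```typescript
const BLOCKED_SCHEMES = ["javascript:", "data:"];

function isSafeApplyUrl(raw: string): boolean {
  // Browsers tolerate leading whitespace and embedded tabs or newlines in a
  // scheme, so remove control characters and spaces before comparing.
  const normalized = raw.replace(/[\u0000-\u0020]/g, "").toLowerCase();
  return !BLOCKED_SCHEMES.some((scheme) => normalized.indexOf(scheme) === 0);
}

// Value to place in an href attribute or Markdown link target.
function safeHref(raw: string): string {
  return isSafeApplyUrl(raw) ? raw : "#";
}
```

A stricter variant would invert the check and allow only http: and https:, which also covers schemes nobody thought to block.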
Webhook secret leak
tRPC feed endpoints no longer return the HMAC webhook secret to the frontend on any operation (list, get, create, update, email schedule)
Export credit timing
Export API now checks credit balance before fetching data, preventing charges on failed queries
Dashboard export cursor
Dashboard export no longer incorrectly filters by email delivery cursor, which caused it to return incomplete results after cron runs
Type-safe status enum
Replaced "active" as any casts with JobStatus.active enum constant across 3 files
Ingest dedup performance
Secondary title+company dedup now uses batched OR queries (chunks of 50) instead of N+1 per-job queries
Total tests
132 passing (12 suites)
Dashboard export + scheduled email delivery
Feed detail page with Export tab. Download matching jobs as HTML, Markdown, CSV, or JSON. Configure daily/weekly email delivery via Resend — Bordfeed emails formatted newsletter content to your inbox
Feed detail page
Click any feed card to see overview stats, filters, and the new Export tab. Breadcrumb navigation back to feeds list
Export API endpoint
GET /v1/feeds/:id/export?format=json|csv|html|markdown returns up to 1000 jobs in a single response without pagination. Supports since=last_poll for incremental exports
since=last_poll auto-cursor
Server-side cursor tracks your last poll/export. GET /v1/feeds/:id/jobs?since=last_poll returns only jobs added since your last API call, eliminating client-side state tracking
include=html_snippet
GET /v1/feeds/:id/jobs?include=html_snippet adds a pre-formatted HTML block per job for newsletter use, with XSS sanitization
Job formatter utility
lib/feeds/job-formatter.ts with HTML, Markdown, CSV formatters. All user-sourced fields sanitized to prevent XSS
Secondary dedup
Ingest pipeline now catches semantic duplicates via case-insensitive title + company match, in addition to existing content hash dedup
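One way to picture the secondary check is a normalized title + company key. The sketch below is illustrative only: `dedupKey` and `dropSemanticDuplicates` are hypothetical, and collapsing whitespace goes slightly beyond the case-insensitive match the entry describes.

```typescript
interface JobLike {
  title: string;
  company: string;
}

function dedupKey(job: JobLike): string {
  const norm = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");
  // The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
  return norm(job.title) + "\u0000" + norm(job.company);
}

// Keeps the first occurrence of each title + company pair.
function dropSemanticDuplicates<T extends JobLike>(jobs: readonly T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const job of jobs) {
    const key = dedupKey(job);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(job);
    }
  }
  return unique;
}
```

This catches the same posting arriving from two sources with different markup, which a content hash over the raw payload would treat as two distinct jobs.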
Email export cron
POST /api/v1/cron/email-export runs daily at 08:00 UTC via Vercel Cron. Sends formatted jobs to configured recipients
17 new tests
Job formatter test suite covering HTML, Markdown, CSV output, XSS sanitization, empty states, and edge cases
ROADMAP.md
Rewritten to reflect 4-phase plan: Data Flow (done) → Dogfood+DX (current) → Monetization → Scale. SDK removed from roadmap per CEO plan decision
OpenAPI spec
Updated to v1.3.0 with export endpoint, since and include parameters documented
Feed cards
Now clickable, navigate to feed detail page
Total tests
125 passing (12 suites)
OpenAPI 3.1 spec + Scalar interactive docs
Full API specification at public/api/openapi.json covering all 11 public endpoints. Interactive docs at /docs/api via @scalar/api-reference-react with kepler theme. API route serves spec with CORS headers
Dashboard metrics
Replaced achromatic boilerplate email stats with real Bordfeed metrics: Active Feeds, Credits Remaining, API Keys, Webhook Deliveries. New organization.stats.dashboard tRPC procedure with org-scoped parallel queries
OpenAPI route
GET /api/openapi.json serves the spec with caching and CORS for Scalar client-side rendering
Enrichment cron frequency
Now runs every 15 minutes instead of every 2 hours. Clears the enrichment backlog of 1,253 jobs in ~31 hours instead of ~10 days — jobs reach your feed much faster
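The backlog estimate can be checked with a little arithmetic. The sketch assumes 10 jobs per cron run, a figure inferred from the "10 jobs/15 min" rate quoted elsewhere in this changelog; `hoursToClear` is a hypothetical helper.

```typescript
function hoursToClear(backlog: number, jobsPerRun: number, minutesBetweenRuns: number): number {
  const jobsPerHour = jobsPerRun * (60 / minutesBetweenRuns);
  return backlog / jobsPerHour;
}

// Every 2 hours: 5 jobs/hour, so 1253 / 5 = 250.6 hours, roughly 10.4 days.
const hoursBefore = hoursToClear(1253, 10, 120);
// Every 15 minutes: 40 jobs/hour, so 1253 / 40 is about 31.3 hours.
const hoursAfter = hoursToClear(1253, 10, 15);
const daysBefore = hoursBefore / 24;
```

Both results line up with the ~10 days and ~31 hours quoted in the entry.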
Org selection cards
Cards now have a visible shadow and a cleaner hover effect so it's obvious which org you're selecting
Boilerplate dashboard charts
Removed dashboard-demo-charts.tsx (640 lines of fake email metrics data), now replaced by real org-scoped stats
Avatar and org logo upload failing
S3 signed upload URL was generated with ContentType: "image/jpeg" hardcoded, but the image cropper outputs PNG. S3 rejected the PUT due to content type mismatch. Made contentType a required parameter throughout the upload chain (schema → tRPC → S3) so the signed URL always matches the actual upload
Dashboard empty: tRPC 503 errors
getBaseUrl() used stale per-deployment Vercel URL for tRPC batch requests. Added NEXT_PUBLIC_VERCEL_PROJECT_PRODUCTION_URL as stable fallback before the ephemeral deployment URL. Also set NEXT_PUBLIC_SITE_URL env var in Vercel production (#60)
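A fallback chain of this kind might be structured as below. This is a sketch, not the project's getBaseUrl: the function name `resolveBaseUrl`, the exact precedence, the use of NEXT_PUBLIC_VERCEL_URL for the ephemeral deployment URL, and the localhost default are all assumptions. Vercel's URL variables do not include a protocol, so one is prepended.

```typescript
type Env = Record<string, string | undefined>;

function stripTrailingSlash(url: string): string {
  return url.replace(/\/+$/, "");
}

function resolveBaseUrl(env: Env): string {
  // 1. Explicitly configured site URL (includes the protocol).
  const siteUrl = env["NEXT_PUBLIC_SITE_URL"];
  if (siteUrl) return stripTrailingSlash(siteUrl);

  // 2. Stable production domain of the Vercel project.
  const productionHost = env["NEXT_PUBLIC_VERCEL_PROJECT_PRODUCTION_URL"];
  if (productionHost) return "https://" + productionHost;

  // 3. Ephemeral per-deployment URL, which goes stale after the next deploy.
  const deploymentHost = env["NEXT_PUBLIC_VERCEL_URL"];
  if (deploymentHost) return "https://" + deploymentHost;

  // 4. Local development.
  return "http://localhost:3000";
}
```

Passing the environment in as an argument keeps the precedence easy to test; in an application the caller would typically pass `process.env`.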
Enrichment schema: OpenAI 100% failure
status: z.string().default("active") produced a JSON schema without status in the required array, causing OpenAI structured output to reject every request. Changed to .nullable() with default handled in process-jobs.ts (#56)
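The failure is easier to see on the generated JSON Schema itself. The two schemas below are hand-written approximations of what a schema generator might emit for the `.default()` and `.nullable()` variants (an assumption, not captured output), and `missingFromRequired` is a hypothetical helper that lists properties absent from `required`.

```typescript
interface ObjectSchema {
  type: "object";
  properties: Record<string, unknown>;
  required?: string[];
}

// Properties that a strict structured-output validator would flag because
// they are not listed in `required`.
function missingFromRequired(schema: ObjectSchema): string[] {
  const required = schema.required ?? [];
  return Object.keys(schema.properties).filter((key) => required.indexOf(key) === -1);
}

// Approximation of the schema with a defaulted field: `status` is optional.
const withDefault: ObjectSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    status: { type: "string", default: "active" },
  },
  required: ["title"],
};

// Approximation after the fix: `status` is required but may be null.
const withNullable: ObjectSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    status: { type: ["string", "null"] },
  },
  required: ["title", "status"],
};
```

Running a check like this in a unit test would surface the mismatch before a deploy rather than as rejected API requests.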
Feed templates returning 0 results
AI enrichment produces free-form categories ("Product Design", "Full-Stack Programming") but filter builder used exact inArray match against slug-style template values ("design", "full-stack"). Changed to case-insensitive ilike partial matching. Same fix for workplace_type (#57)
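The mismatch and the fix can be reproduced in memory. `ilikeContains` below is a hypothetical approximation of an ILIKE '%value%' filter; real ILIKE also treats % and _ in the pattern as wildcards, which this sketch ignores.

```typescript
// Case-insensitive substring match, approximating ILIKE '%needle%'.
function ilikeContains(value: string, needle: string): boolean {
  return value.toLowerCase().indexOf(needle.toLowerCase()) !== -1;
}

// True when the job's category matches any of the template's values.
function matchesTemplate(category: string, templateValues: readonly string[]): boolean {
  return templateValues.some((t) => ilikeContains(category, t));
}

const templateValues = ["design", "full-stack"];

// Old behaviour: an exact match never finds the free-form category.
const exactHit = templateValues.indexOf("Product Design") !== -1;
// New behaviour: a partial, case-insensitive match does.
const partialHit = matchesTemplate("Product Design", templateValues);
```

The trade-off is looser matching: a value such as "design" would also match a category like "Redesign Consultant", so template values need to be reasonably specific.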
Homepage hero copy
replaced achromatic boilerplate with Bordfeed messaging: "The API for job data. Fresh, structured, delivered." CTAs: "Start Building" / "View Pricing". Pill links to /changelog (#58)
Features section copy
"Everything your feed needs" with 20+ Filter Fields and Webhook Delivery cards (#59)
Stats section copy
real product metrics: 11 endpoints, 20+ filters, 6 templates, 50+ fields (#59)
FAQ section copy
5 Bordfeed-specific questions replacing generic SaaS boilerplate (#59)
Apify webhook adapter
POST /api/internal/apify-webhook receives Apify run metadata, fetches dataset items from Apify API, passes raw data to shared ingest logic. Actor→source mapping for Workable, WWR, JobDataAPI
Shared ingest module
extracted lib/pipeline/ingest-jobs.ts used by both /api/internal/ingest and /api/internal/apify-webhook
Pipeline tests
60 new tests across 4 suites (ingest, process, monitor, push). Total: 108 tests / 11 suites
SSRF protection in monitor
blocks private/internal IPs before fetching source URLs
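A simplified version of such a check is sketched below. It is not the project's isPrivateHost: the name `isPrivateIPv4Host` is hypothetical, only IPv4 literals and "localhost" are covered, and a production check would also need to resolve hostnames and handle IPv6.

```typescript
function isPrivateIPv4Host(host: string): boolean {
  if (host.toLowerCase() === "localhost") return true;

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return false; // not an IPv4 literal; DNS resolution is out of scope here

  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a > 255 || b > 255 || Number(m[3]) > 255 || Number(m[4]) > 255) return false;

  return (
    a === 0 ||                          // 0.0.0.0/8 "this network"
    a === 10 ||                         // 10.0.0.0/8 private
    a === 127 ||                        // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||         // 169.254.0.0/16 link-local, incl. cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168)            // 192.168.0.0/16 private
  );
}
```

The monitor would call a check like this on the hostname of each source URL and refuse to fetch when it returns true.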
AI job classification in monitor
wires up classifyJobStatus() to detect filled/closed jobs returning HTTP 200
Batch ingest
single SELECT + INSERT replaces N+1 per-job queries
Max batch size guard
rejects payloads >200 jobs with 413
Pipeline reliability TODOs
concurrent push protection (P2), batch push at scale (P3)
skills column
migrated from text to text[] array with arrayOverlaps filter (migration: 20260319082018_damp_vulcan.sql)
Content hash
JSON keys sorted before hashing for deterministic dedup across actor runs
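Key sorting matters because JSON.stringify follows insertion order, so the same job serialized by two actor runs can produce different strings and therefore different hashes. A minimal sketch of a stable serializer follows; `stableStringify` is a hypothetical helper, and the hash function applied afterwards is not specified in this changelog.

```typescript
// Serializes JSON-like data with object keys in sorted order at every level.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null"; // mirror JSON.stringify inside arrays
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) {
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj)
    .filter((key) => obj[key] !== undefined) // JSON.stringify drops undefined fields
    .sort();
  return "{" + keys.map((key) => JSON.stringify(key) + ":" + stableStringify(obj[key])).join(",") + "}";
}

const runA = stableStringify({ title: "Engineer", meta: { remote: true, city: "Oslo" } });
const runB = stableStringify({ meta: { city: "Oslo", remote: true }, title: "Engineer" });
// runA === runB, so hashing either string yields the same content hash.
```

Array order is left untouched on purpose, since reordering list values could change their meaning.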
Processing lock
now filters by specific job IDs (inArray) instead of locking all pending jobs
Race condition in job processing lock
concurrent cron runs could process same jobs twice
isPrivateHost() exported from webhook-validation for reuse in monitor SSRF checks
Pre-push git hook (tsc --noEmit)
failed on generated content-collections types; Vercel build already gates type errors
Feed-Centric API
complete rewrite on achromatic SaaS boilerplate
Feeds API (public REST)
POST/GET/PATCH/DELETE /v1/feeds, GET /v1/feeds/:id/jobs (poll with cursor pagination), GET /v1/health
Feed filter builder
20+ strict filter fields (category, workplace_type, country_code, career_level, salary range, languages, skills, industry, visa_sponsorship, and more)
Feed templates
6 pre-built templates (Remote Writing, AI/ML Engineering, Growth Marketing EU, DevRel, Remote Design, Full-Stack Remote US)
Webhook delivery
HMAC-SHA256 signing, SSRF protection, retry with backoff
Feed tRPC router
dashboard management (list, create, update, delete, templates)
Job admin tRPC router
pipeline management (list, stats, status updates, bulk actions)
Dashboard pages
feed list with cards, admin jobs table with stats
Database schema
job table (50+ fields), feed table, webhook delivery table, feed template table
Zod schemas
strict validation with unknown field rejection
Unit tests
48 tests across 7 suites (content hash, salary, pagination, webhook validation, webhook signing, Zod schemas, utils)
Credit-based billing
feed poll and webhook delivery consume credits (Starter $9.99/1K, Growth $49/10K, Scale $299/100K)
Design doc
docs/designs/feed-centric-api.md
Foundation
replaced v0.5.4 codebase with achromatic SaaS boilerplate (Better Auth + orgs + Stripe + Resend + Sentry + marketing pages)
Product model
enrichment is now internal-only, feeds are THE product
API architecture: hybrid
tRPC for dashboard, REST for public /v1/* API
Billing
API keys and feeds scoped to organizations (multi-tenant)