
How to Save a Website Before It Goes Offline (Step-by-Step)

A service is shutting down. A client’s old site is about to be replaced. A forum you’ve used for years posted a goodbye notice. Once the server is switched off, that content is gone — and the Wayback Machine may or may not have caught the pages you actually need.

If you’re reading this with a deadline, here’s the short version: don’t try to save pages one at a time. Capture the whole site in one pass, then sort it out later. This guide shows you how, from the fastest option to the most thorough.


First: Figure Out How Much Time You Have

Your method depends on your deadline.

  • Hours left: Capture the entire site automatically with a crawler. Don’t hand-save pages — you’ll miss things.
  • A few days: Crawl the site, then go back and manually grab anything behind a login or paywall.
  • Just one or two pages: Your browser’s built-in save is enough (covered at the end).

The biggest mistake people make is saving the homepage, feeling done, and discovering later that the pages they actually needed — a specific thread, a documentation section, an account history — were never captured.


The Fastest Way: Capture the Whole Site at Once

When a site is going offline, you want every page preserved before the lights go out. Saving them individually doesn’t scale — a medium site has dozens or hundreds of URLs.

A crawler does it in one pass. Site2pdf.online takes the site’s URL, follows its internal links, and captures every page it finds — exporting the whole thing as a single PDF, individual images, or a ZIP. Because it runs a real headless browser, JavaScript-rendered pages and dynamically loaded content get captured as they actually appear, not as empty templates.

For a site that’s disappearing, this is the difference between a complete archive and a folder of random pages. Start the crawl, let it run, and review what you got while you still have time to fill gaps.
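To make "follows its internal links" concrete, here is a minimal sketch of a link-following crawl in Python. It is an illustration only, not how Site2pdf.online is implemented. It downloads raw HTML with the standard library and does not render JavaScript, so it will miss content that a headless browser would catch. The names crawl, internal_links, and fetch are hypothetical, and the 500-page cap is an arbitrary default.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return absolute, fragment-free links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        same_site = parts.scheme in ("http", "https") and parts.netloc == host
        if same_site and absolute not in found:
            found.append(absolute)
    return found


def fetch(url):
    """Download one page as text (plain HTTP fetch, no JavaScript rendering)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=500):
    """Breadth-first walk over internal links; returns {url: html}."""
    pages = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load and keep going
        pages[url] = html
        for link in internal_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling crawl("https://example.org/") would return a dictionary of every reachable page's HTML, keyed by URL. Passing your own fetch_page function lets you swap in a different downloader without touching the crawl logic.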


What a Crawler Can’t Reach (and How to Handle It)

Automated capture has limits. Plan around them before the site goes dark.

Login-protected content. A crawler only sees what a logged-out visitor sees. For account pages, private messages, or members-only areas, log in yourself and save those pages manually with your browser.

Paywalled articles. Same problem — if you can’t see it without paying, neither can the tool. Capture these by hand while logged in.

Downloadable files. PDFs, spreadsheets, and media attached to a site aren’t always followed as links. Make a list of the important files and download them directly.
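One way to build that list is to scan a saved page's HTML for links that end in a file extension. The sketch below is a rough starting point. The extension set is an assumption you should adjust to the site, and file_links is a hypothetical helper name. It only finds files referenced in href or src attributes, so anything served through a script or form will not show up.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumed set of "important" file types; edit to match the site.
FILE_EXTENSIONS = {".pdf", ".xlsx", ".xls", ".csv", ".docx", ".zip", ".mp3", ".mp4"}


class FileRefCollector(HTMLParser):
    """Collects every href and src attribute value in a page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append(value)


def file_links(page_url, html, extensions=FILE_EXTENSIONS):
    """Return absolute URLs whose path ends in one of the given extensions."""
    collector = FileRefCollector()
    collector.feed(html)
    found = []
    for ref in collector.refs:
        absolute = urljoin(page_url, ref)
        # Look only at the path so query strings like ?v=2 are ignored.
        suffix = PurePosixPath(urlparse(absolute).path).suffix.lower()
        if suffix in extensions and absolute not in found:
            found.append(absolute)
    return found
```

Run it over each page you have saved and merge the results into one download list.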

Pages with no internal links pointing to them. Orphan pages — old URLs not linked from the current navigation — won’t be discovered by following links. If you know specific URLs matter, save them individually too.
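If the site publishes a sitemap.xml (an assumption, since many sites do not), it can serve as a list of known URLs to compare against what your crawl actually captured. The sketch below parses sitemap XML using the standard sitemaps.org namespace and reports any URL missing from the crawl. The function names are hypothetical, and a bookmarks export or old link list would work just as well as the input to missing_from_crawl.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def missing_from_crawl(known_urls, crawled_urls):
    """Return known URLs the crawl did not capture, in their original order."""
    crawled = set(crawled_urls)
    return [url for url in known_urls if url not in crawled]
```

Anything missing_from_crawl returns is a candidate orphan page to save individually. URLs that differ only by a trailing slash will be reported as missing, so expect a few false positives.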


Don’t Rely Only on the Wayback Machine

The Internet Archive’s Wayback Machine is the default fallback, and it’s genuinely useful — but it’s not a safety net you should count on for a site that’s shutting down.

It captures pages on its own schedule, so recent updates may not be archived. It often misses pages behind forms, JavaScript-heavy content, and anything not well linked. And you can’t control when it crawls — by the time you request a capture, the site may already be down.

You can submit a URL manually at archive.org’s “Save Page Now” tool, and it’s worth doing for key pages as a backup. But treat it as a second copy, not your primary archive. The Internet Archive’s own help docs explain what it does and doesn’t capture.
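If you have more than a few key pages, you can generate the submission links in bulk instead of pasting them one by one. The sketch below builds a "Save Page Now" link for a URL and a lookup link for the Wayback availability API, then reads the closest snapshot out of that API's JSON reply. The URL patterns and response fields reflect my understanding of the Internet Archive's public endpoints, so confirm them against its documentation before relying on this. The function names are hypothetical, and the code makes no network requests itself.

```python
import json
from urllib.parse import urlencode


def save_page_now_url(page_url):
    """Link that asks the Wayback Machine to capture page_url when opened."""
    return "https://web.archive.org/save/" + page_url


def availability_query_url(page_url):
    """Link that asks whether an archived snapshot of page_url exists."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def closest_snapshot(response_text):
    """Parse an availability reply; return (timestamp, snapshot_url) or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["timestamp"], closest["url"]
    return None
```

Checking availability after you submit is worthwhile: a submitted URL is not the same as a confirmed snapshot.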


For Just One or Two Pages: Use Your Browser

If you only need to preserve a handful of specific pages, your browser already has the tools.

Save as PDF: Press Ctrl+P (Windows) or Cmd+P (Mac), then choose Save as PDF as the destination. This preserves the layout and keeps links clickable.

Save the full page (HTML): Press Ctrl+S (Windows) or Cmd+S (Mac) and choose Webpage, Complete. This downloads the HTML plus images and styling into a folder you can open offline later.

Full-page screenshot: In Firefox, right-click → Take Screenshot → Save full page. In Chrome, open DevTools, press Ctrl+Shift+P (Cmd+Shift+P on Mac), and run “Capture full size screenshot”.

These are quick and reliable for individual pages — they just don’t scale to an entire site.


Capture Methods Compared

Method                    | Speed            | Whole site | Handles JS | Best for
Crawler (Site2pdf.online) | Fast             | Yes        | Yes        | Saving an entire site fast
Wayback Machine           | Slow / uncertain | Partial    | Partial    | Backup copy
Browser Save as PDF       | Manual           | No         | Partial    | A few key pages
Save HTML (Ctrl+S)        | Manual           | No         | Partial    | Offline single pages

After You’ve Captured It

Once the site is saved, do two things while you can still cross-check against the live version:

  1. Spot-check the important pages. Open your archive and confirm the pages that actually matter came through complete — not cut off, not blank.
  2. Store a second copy. Keep the archive in at least two places (local drive plus cloud). A backup you can’t find later is the same as no backup.
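Both checks can be partly automated once the archive is a folder on disk, for example an extracted ZIP or a set of HTML saves. The sketch below flags files that are empty or suspiciously small, and compares SHA-256 checksums between your primary folder and its backup. The 1 KB threshold and the function names are assumptions, and a size check is no substitute for opening the pages that matter.

```python
import hashlib
from pathlib import Path


def checksum_manifest(archive_dir):
    """Map each file's relative path to the SHA-256 hash of its contents."""
    root = Path(archive_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def suspicious_files(archive_dir, min_bytes=1024):
    """List files smaller than min_bytes, which may be blank or cut off."""
    root = Path(archive_dir)
    return [
        path.relative_to(root).as_posix()
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.stat().st_size < min_bytes
    ]


def copies_match(primary_dir, backup_dir):
    """True when both folders hold the same files with identical contents."""
    return checksum_manifest(primary_dir) == checksum_manifest(backup_dir)
```

Run suspicious_files first while the live site is still up, so you can re-capture anything it flags. Run copies_match after copying to the second location to confirm the backup is complete.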

FAQ

How do I save an entire website before it shuts down?

Use a tool that crawls internal links and captures every page automatically — saving them one at a time won’t scale. Site2pdf.online takes a URL, follows the site’s links, and exports the whole site as a PDF, images, or a ZIP in one pass. For login- or paywall-protected pages, save those manually in your browser while logged in.

Will the Wayback Machine save the site automatically?

Not reliably. The Internet Archive crawls on its own schedule and may not capture recent pages, JavaScript content, or pages behind forms. You can submit individual URLs through its “Save Page Now” feature as a backup, but don’t depend on it as your only archive of a site that’s about to disappear.

Can I save a website that requires a login?

Yes, but not with an automated crawler — it only sees logged-out content. Log in yourself and save the protected pages manually using your browser’s Save as PDF (Ctrl+P) or Save Page (Ctrl+S) options.

What’s the best format to archive a website in?

PDF is the most portable and readable for sharing or reference. A full HTML save preserves the site’s structure for offline browsing. Images (PNG/JPG) are best when exact visual appearance matters — for example, as evidence. Many people keep a PDF for everyday use plus an HTML copy as a structural backup.


Act Before the Clock Runs Out

When a site is going offline, time is the only thing that matters. Capture the whole site first with a crawler, manually grab anything behind a login, and keep the Wayback Machine as a secondary copy. If you need to preserve an entire site right now, site2pdf.online captures every page in one pass — before it’s gone for good.

