8(a) Business Development Program
SBA program for socially and economically disadvantaged small businesses. 9-year program providing federal contracting access, mentorship, and training. Stricter eligibility than other set-asides.
A reference glossary covering 177 terms across artificial intelligence, machine learning, SaaS architecture, custom software, web development, SEO/AEO/AIO/GEO, SDVOSB federal contracting, and compliance. Maintained by That AI Guy. Each term has a stable URL fragment for citation.
Americans with Disabilities Act, Title III applied to commercial websites. Lacks specific technical standards but DOJ accepts WCAG AA as a benchmark. ADA lawsuits common.
AI systems that operate semi-autonomously through tool use, planning, and reasoning. Enables multi-step workflows like 'research this topic and write a report' or 'fix this bug across the codebase'.
A commercial SEO platform and competitor to Semrush. Excellent backlink index, plus keyword research and content gap analysis. $99-$1000/mo.
An LLM-powered system that can take actions through tools (web search, file systems, APIs, code execution) to accomplish multi-step tasks autonomously. Includes Claude Code, ChatGPT operator-style systems, and open frameworks like LangGraph.
Being cited as a source by an AI engine (ChatGPT, Claude, Perplexity, Gemini) when answering user questions. The new center of gravity in search visibility.
Structuring a site to be cited by ChatGPT, Claude, Perplexity, Gemini, and other LLM-based search engines. Includes llms.txt, factual density, citation worthiness, and embedding distinctiveness.
The visible clickable text of a link. Descriptive anchor text helps users and search engines understand the destination. Avoid 'click here' and 'read more.'
Optimizing for direct answer engines (featured snippets, People Also Ask, knowledge panels) and voice assistants (Alexa, Siri, Google Assistant). Often emphasizes Q&A structure, schema, and entity authority.
AI safety lab founded in 2021. Creator of the Claude family (Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 4.x Opus/Sonnet/Haiku) and Constitutional AI methodology. API at anthropic.com.
Apple's ARM-based processors (M1, M2, M3, M4). High performance per watt. Increasingly used for local AI inference because of unified memory architecture. Powers MEGAMIND.
A defined way for software to talk to other software. Typically HTTP/REST or GraphQL. APIs let you integrate services like Stripe billing, Twilio SMS, OpenAI, or your own internal tools.
Computer systems that perform tasks typically requiring human intelligence — language understanding, decision making, perception, and reasoning. In small business contexts, AI usually refers to the use of large language models like GPT-4 or Claude to automate or enhance specific workflows.
A multi-framework static site generator. Renders content as plain HTML by default, hydrates only what needs interactivity. Excellent for content-heavy sites and blogs.
A method in neural networks for weighting how strongly each part of an input influences each part of an output. Allows transformers to focus on relevant context when generating each token.
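The weighting described above can be illustrated with scaled dot-product attention, the variant used in transformers. This is a minimal pure-Python sketch for small lists of floats; the function names and the toy vectors are illustrative assumptions, and real models use batched tensor libraries.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query matches each key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Toy input: one query that matches the first key more closely than the second.
result = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

In the toy call, the output leans toward the first value vector because the query aligns with the first key.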
Code that performs a repetitive task without human intervention — file processing, data sync, scheduled report generation. Often written in Python, Node.js, or Bash.
A link from another website pointing to yours. Quality (authority, relevance) matters more than quantity. Toxic backlinks (spam, paid) can harm rankings.
Microsoft's webmaster tool for Bing search. Provides similar data to GSC. Supports IndexNow for immediate URL submission. Required since Microsoft Copilot is Bing-integrated.
Best Matching 25 — a classic keyword-based ranking function used in search engines. Powers Elasticsearch and OpenSearch. Often combined with semantic search in hybrid retrieval.
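As a rough illustration of how BM25 scores documents, the sketch below implements the commonly cited formula with whitespace tokenization. The parameter defaults (k1=1.5, b=0.75) and the smoothed IDF variant are assumptions chosen for the example; search engines apply their own analyzers and tuning.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates and is normalized by document length.
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores

docs = ["the cat sat", "dogs bark loudly", "cat cat cat nap"]
scores = bm25_scores("cat", docs)
```

Documents that never mention the query term score zero, and repeated mentions raise the score with diminishing returns.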
Schema markup for site navigation showing the path from homepage to current page. Renders as a hierarchy in Google search results.
The official, preferred URL for a piece of content. Set with <link rel="canonical">. Critical for preventing duplicate content issues. Self-referencing canonicals are recommended on canonical pages.
A 1-2 page document summarizing a federal contractor's core competencies, past performance, certifications, NAICS codes, and contact info. Required for any federal pursuit.
California Consumer Privacy Act / California Privacy Rights Act. Gives California residents rights over their personal data similar to GDPR. CPRA expanded CCPA in 2023.
A prompting technique where the LLM is asked to reason step-by-step before producing an answer. Improves accuracy on math, logic, and complex reasoning. Many modern models do CoT automatically when needed.
Splitting documents into smaller pieces before embedding. Strategies: fixed-size, sentence-based, paragraph-based, semantic, recursive. Chunk size affects retrieval quality dramatically.
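The fixed-size strategy is the simplest to sketch. The example below splits on whitespace and overlaps neighboring chunks so that a sentence cut at a boundary still appears whole in one of them. The function name, word-based sizing, and default sizes are assumptions for illustration; production pipelines often count tokens rather than words.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks

sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
# pieces == ["0 1 2 3", "3 4 5 6", "6 7 8 9"]
```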
Continuous Integration / Continuous Deployment. Automated testing and deployment pipelines triggered by code changes. Tools: GitHub Actions, GitLab CI, CircleCI, Vercel.
Any mention of a business's NAP across the web. Structured citations: directory listings (Yelp, BBB, Yellowpages). Unstructured: news articles, blog mentions. Both improve local authority.
Anthropic's web crawlers. ClaudeBot is the search/citation crawler; anthropic-ai is the broader user-agent. Allow both for Claude citations.
Serverless JavaScript/TypeScript functions that run on Cloudflare's edge network. Fast cold start, global deployment. Powers many production AI integrations.
Cybersecurity Maturity Model Certification, a DoD requirement for contractors handling controlled unclassified information (CUI). The original CMMC 1.0 model had Levels 1-5; CMMC 2.0 consolidated these into Levels 1-3. Level 2 most common for SDVOSB AI/SaaS work.
A systematic review of all content on a site for quality, performance, and relevance. Identifies pages to keep, improve, consolidate, or remove.
A global network of servers caching static assets close to users. Reduces latency, increases availability. Cloudflare, Fastly, AWS CloudFront, Bunny CDN, Akamai are major providers.
The maximum number of tokens an LLM can process in one prompt. GPT-4-Turbo has 128K tokens; Claude 3.5 Sonnet has 200K; some newer models offer windows of up to 1M tokens. Larger windows allow more documents in RAG.
A broad change to Google's ranking algorithm released several times per year. Recent: March 2024 (largest ever), August 2024, November 2024, December 2024, March 2025. Impacts can take weeks to settle.
Three Google performance metrics: LCP (Largest Contentful Paint, target < 2.5s), INP (Interaction to Next Paint, target < 200ms), CLS (Cumulative Layout Shift, target < 0.1). A page-experience ranking signal; poor scores can hold back otherwise strong pages.
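A small sketch of checking measurements against the three targets listed above. The dictionary keys and the function name are hypothetical, and the measured values would come from a tool such as a field-data report.

```python
# Targets from the definition above; key names are illustrative.
TARGETS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}

def check_core_web_vitals(measured):
    """Return, per metric, whether the measured value meets its target."""
    return {name: measured[name] <= limit for name, limit in TARGETS.items()}

report = check_core_web_vitals({"lcp_s": 2.1, "inp_ms": 350, "cls": 0.05})
# report == {"lcp_s": True, "inp_ms": False, "cls": True}
```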
A time-based job scheduler in Unix-like systems. Runs scripts at specified intervals (every hour, daily at 3am, etc.). Cloud equivalents: AWS EventBridge, GitHub Actions schedule, Vercel Cron.
The amount of unexpected layout shift while a page loads. Target under 0.1. Improved with explicit image dimensions, reserved space for ads, and avoiding late-loading content.
Software built specifically for one organization's needs rather than purchased off the shelf. Examples: internal tools, custom dashboards, automation scripts, data pipelines, custom APIs.
A series of steps that move and transform data from source to destination. ETL (Extract-Transform-Load) or ELT (Extract-Load-Transform). Tools: Airflow, Prefect, Dagster, dbt.
Containerization platform. Packages applications with their dependencies into portable containers. Industry standard for deployment.
Experience, Expertise, Authoritativeness, Trustworthiness — Google's framework for evaluating content sources. Critical for YMYL queries. Implemented through author bylines, credentials, sameAs, citations, and editorial signals.
A numerical vector representation of text where similar meanings produce similar vectors. Used in semantic search to find relevant documents by meaning rather than keyword match. Common dimensions are 384, 768, 1024, 1536.
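Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, not output from any real model.
query = [0.9, 0.1, 0.0]
documents = {
    "invoice help": [0.8, 0.2, 0.1],
    "hiking trails": [0.0, 0.1, 0.9],
}
ranked = sorted(documents, key=lambda name: cosine_similarity(query, documents[name]),
                reverse=True)
```

Here `ranked` puts "invoice help" first because its vector points in nearly the same direction as the query.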
How important an entity is within a piece of content based on NLP analysis. Per-page entity engineering helps Google understand what each page is primarily about.
The process of pulling data from source systems, transforming it (cleaning, joining, enriching), and loading it into a destination database or warehouse.
Schema for marking up question-and-answer content. Can produce rich results in search. Use only when the page genuinely answers questions.
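One way to produce this markup is to generate the JSON-LD from question-and-answer pairs. The sketch below follows the schema.org FAQPage, Question, and Answer types; the function name and the sample pair are illustrative assumptions.

```python
import json

def faq_jsonld(pairs):
    """Build FAQPage JSON-LD text from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

markup = faq_jsonld([("What is chunking?", "Splitting documents before embedding.")])
```

The resulting string would typically be placed in the page inside a JSON-LD script element.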
A direct answer extracted from a webpage and shown at the top of Google results. Position zero. Triggered by question queries, achieved through clear, structured Q&A content.
Federal Risk and Authorization Management Program — security standard for cloud services used by federal agencies. Low/Moderate/High impact levels. Long, expensive certification.
Providing the LLM with several worked examples in the prompt before asking it to handle a new case. Helps the model match a desired output format, style, or reasoning pattern.
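A minimal sketch of assembling such a prompt. The "Input:" and "Output:" labels and the function name are assumptions; any consistent format works as long as the new case follows the same pattern as the examples.

```python
def build_few_shot_prompt(instruction, examples, new_input):
    """Join an instruction, worked examples, and the new case into one prompt."""
    parts = [instruction, ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")
    # End with an open "Output:" so the model completes the pattern.
    parts.append(f"Input: {new_input}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great service, fast reply.", "positive"), ("Never arrived.", "negative")],
    "Works exactly as described.",
)
```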
Training an existing LLM on additional data to specialize its behavior or knowledge. More expensive than RAG and harder to update. Often misused — for most use cases RAG is faster, cheaper, and easier to maintain.
A pricing model with a free tier and paid upgrades. Common in SaaS to drive viral adoption. Risk: users stay on free tier; signal quality drops.
A capability where an LLM outputs structured JSON specifying which tool/function to invoke with which arguments. Enables agents to interact with external systems. OpenAI calls it function calling, Anthropic calls it tool use.
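On the application side, the structured output has to be parsed and routed to real code. The sketch below assumes a simplified call shape of {"name": ..., "arguments": {...}}; the actual payload formats differ between vendors, and `get_weather` is a hypothetical stub.

```python
import json

def get_weather(city):
    """Hypothetical tool; a real one would call a weather API."""
    return f"Sunny in {city}"

# Registry of functions the model is allowed to invoke.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(raw_json):
    """Parse a model-emitted tool call and run the matching function."""
    call = json.loads(raw_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

answer = dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Austin"}}')
# answer == "Sunny in Austin"
```

Restricting calls to an explicit registry keeps the model from invoking arbitrary code.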
General Data Protection Regulation — EU law on personal data. Requires lawful basis for processing, data subject rights (access, deletion), breach notification within 72 hours.
Optimizing for generative AI search results — specifically Google AI Overviews and similar synthesized answer surfaces. A subset of AIO focused on Google's SGE-derived experiences.
Distributed version control system. Standard for source code management. GitHub, GitLab, Bitbucket are hosted providers.
The largest hosted Git provider. Owned by Microsoft. Provides repos, pull requests, issues, Actions (CI/CD), Pages (static hosting), and Copilot.
Google's generative answer feature appearing above organic results. Synthesizes information from multiple sources with citations. Major shift in search visibility — appearing here is increasingly critical.
Formerly Google My Business. The free listing system for local businesses on Google Search and Maps. Critical for local SEO. Verified ownership unlocks insights and review responses.
Google's family of multimodal LLMs. Powers Google AI Overviews, Bard (now Gemini), and Vertex AI. Strong at long context (1M+ tokens) and code.
Google's free webmaster tool. Provides indexing status, crawl errors, query data, click-through rate, and Core Web Vitals. Required for any serious SEO effort.
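A robots.txt product token that controls whether Google may use a site's content for Bard/Gemini training. Distinct from Googlebot (which is for Search); it is a control token rather than a separate crawler. Allow Google-Extended to be considered for Gemini training.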
OpenAI's web crawler used to gather training data for its models. Allow it in robots.txt to be included in LLM training; blocking it reduces that inclusion. ChatGPT search citations rely on a separate crawler, OAI-SearchBot.
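Whether a given crawler is allowed can be checked with Python's standard-library robots.txt parser. The robots.txt content and example.com URLs below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl everything except /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the given user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

public_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/glossary")
private_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/private/notes")
```

With this file, `public_ok` is True and `private_ok` is False.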
A query language for APIs developed by Meta. Clients request exactly the data they need. More flexible than REST for complex frontends but more complex to operate.
General Services Administration — runs federal-wide acquisition programs. The GSA Schedule (formerly Schedule 70 for IT) is a multiple award contract pre-vetting vendors for federal agencies.
When an LLM generates plausible-sounding but factually incorrect output. Occurs because LLMs predict statistically likely tokens, not verify facts. Mitigated by RAG, citation requirements, and fact-checking layers.
A website built directly in HTML, CSS, and JavaScript without a CMS or builder. Faster, lighter, and more controllable than WordPress or Squarespace. Tradeoff: requires a developer for content updates.
A content management system that exposes content via API but doesn't render it. Frontend is decoupled, can be built with any framework. Examples: Sanity, Contentful, Strapi, Payload.
Google's framework for evaluating whether content was created primarily for users (helpful) or for search engines (low-quality). Asks Who/How/Why questions.
Health Insurance Portability and Accountability Act — US law protecting health information. Requires Business Associate Agreements (BAA), encryption, audit logs, and breach notification.
SaaS used across many industries (CRM, project management, file storage). Larger market but harder to differentiate. Examples: Salesforce, Notion, Slack.
Schema for marking up step-by-step instructional content. Each step gets a HowToStep type with name, text, and optional image. Often used with totalTime and estimatedCost.
An HTML attribute, set on <link rel="alternate"> elements (or in HTTP headers or sitemaps), specifying the language and region of a page. Enables Google to serve the right version to the right audience. Common values: en-US, es-US, en-GB, x-default.
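These annotations are commonly written as link elements in the page head, one per language version. The sketch below generates them from a mapping; the function name and example.com URLs are illustrative assumptions.

```python
from html import escape

def hreflang_links(versions):
    """Render one <link rel="alternate"> tag per language/region version."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in versions.items()
    )

tags = hreflang_links({
    "en-US": "https://example.com/",
    "es-US": "https://example.com/es/",
    "x-default": "https://example.com/",
})
```

Each language version of a page should carry the same full set of tags, including one pointing at itself.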
The 2015 update to HTTP. Multiplexed streams, header compression, server push. Faster than HTTP/1.1 for sites with many resources. Required for modern SEO.
The 2022 update to HTTP. Built on QUIC (UDP-based) instead of TCP. Faster connection setup, better mobile performance. Adoption growing across modern hosting.
Historically Underutilized Business Zone, an SBA program for businesses in distressed areas. Set-aside category for federal contracts.
Open AI hub hosting hundreds of thousands of models, datasets, and demos. Provides Transformers library, Inference Endpoints, and Spaces. Standard repository for open source ML.
Combining keyword search (BM25) with semantic search (embeddings) to get the best of both. Often paired with a reranker for final ranking. Standard for production RAG.
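The definition above does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the only approach; k=60 is a conventional default, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs; higher combined score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_results = ["a", "b", "c"]   # e.g. from BM25
semantic_results = ["b", "c", "a"]  # e.g. from embedding search
fused = reciprocal_rank_fusion([keyword_results, semantic_results])
# fused == ["b", "a", "c"]
```

RRF uses only rank positions, so the keyword and semantic scores never need to be put on a common scale.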
A Next.js feature that combines static generation with on-demand revalidation. Pages are static but can be regenerated when content changes. Best of static + dynamic.
An open protocol for instant URL submission to search engines. Bing, Yandex, and Naver participate. Submit URL on publish/update; engines respond within minutes.
Google Patent US 11,995,114 B2. A measure of how much new information a piece of content provides relative to existing search results. Original research, contrarian analysis, first-hand experience score high.
The time from a user interaction (click, tap, key press) to the next paint update. Target under 200ms. Improved with smaller JavaScript bundles, breaking up long tasks, and using Web Workers.
A link from one page on a site to another. Distributes PageRank, provides crawl paths, and helps thematic clustering. Hub-and-spoke architecture is the dominant pattern.
Software used inside an organization (not by customers). Common types: admin dashboards, data entry forms, reporting tools, automation panels. Often built fast with React + a backend or with Retool.
JavaScript + APIs + Markup. An architecture pattern where static HTML is served from CDN, dynamic features come from JavaScript and external APIs. Faster, more secure, and easier to scale than traditional server-rendered apps.
A compact, URL-safe token format used for authentication. Contains claims signed with a secret or private key. Common in API authentication and stateless session management.
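To show what "signed claims" means in practice, the sketch below builds and verifies an HS256 token using only the standard library. It is for illustration, assuming a shared secret; it does not check expiry or other registered claims, and production code should use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the token format expects."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _signature(signing_input: str, secret: str) -> str:
    digest = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return _b64url(digest)

def sign_jwt(claims: dict, secret: str) -> str:
    """Return header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    header_b64 = _b64url(json.dumps(header, separators=(",", ":")).encode())
    payload_b64 = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header_b64}.{payload_b64}"
    return f"{signing_input}.{_signature(signing_input, secret)}"

def verify_jwt(token: str, secret: str) -> dict:
    """Check the signature and return the claims; raise ValueError if invalid."""
    header_b64, payload_b64, signature = token.split(".")
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, signature):
        raise ValueError("invalid signature")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign_jwt({"sub": "user-1"}, "demo-secret")
claims = verify_jwt(token, "demo-secret")
```

The claims are only encoded, not encrypted, so anyone holding the token can read them; the signature only proves they were not altered.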
JavaScript Object Notation for Linked Data — the format Google recommends for schema.org markup. Embedded in