pynydus.engine.pipeline

Spawning pipeline.

Resolves Nydusfile directives, loads sources, runs gitleaks and Presidio on file text, invokes the platform spawner, optionally runs LLM refinement, then builds manifest and Egg records.

Pipeline steps:

1. Resolve base egg (FROM directive)
2. Read source files
3. Redaction (file filtering, secret scan, PII redaction)
4. Parse sources via spawner connector
5. Build structured records (skills, memory, secrets)
6. Merge with base egg (FROM + SOURCE)
7. LLM refinement (optional)
8. Post-processing (custom labels, memory exclusions)
9. Package egg

Module Contents

Classes

PipelineContext

Mutable context passed through each pipeline phase.

Functions

spawn

Run the spawning pipeline.

ensure_gitleaks_if_needed

Raise if gitleaks is required but not installed.

_resolve_base_egg

If FROM is present, load and merge the base egg.

_is_registry_ref

Check if a base egg reference looks like a registry ref (name:version).

_pull_registry_egg

Pull a registry egg to a temp file and return its path.

_read_source_files

Read source files into independent per-group dicts.

_read_files_from_path

Read text files matching patterns from a directory.

_filter_files_by_patterns

Remove files whose keys match any exclude glob.

_scan_secrets_gitleaks

Replace secrets with {{SECRET_NNN}} placeholders via gitleaks.

_redact_pii

Replace PII with {{PII_NNN}} placeholders via Presidio.

_get_spawner

Return the spawner connector for the given agent type.

_parse_sources

Parse redacted files, dispatching each source group to its own spawner.

_build_skills_module_from_parse

Convert ParseResult skills into AgentSkill objects.

_build_mcp_module_from_parse

Build McpModule from ParseResult’s raw MCP config dicts.

_build_memory_module_from_parse

Convert ParseResult memory into MemoryRecord objects.

_merge_skills

Combine base egg skills with freshly extracted skills, re-numbering IDs.

_merge_mcp

Merge MCP server configs from base egg and parsed source.

_merge_memory

Combine base egg memory with freshly extracted memory, re-numbering IDs.

_merge_secrets

Combine base egg secrets with extracted secrets, deduplicating by name.

_apply_custom_labels

Override memory record labels based on source_store pattern matching.

_drop_memory_records_with_excluded_labels

Remove memory records whose label is listed in excluded.

_package_egg

Construct the final Egg with manifest and neutral metadata fields.

_stash_apm

Find and return apm.yml content from source files (passthrough).

_resolve_a2a_card

Return A2A card: passthrough from source or generate from egg.

_embed_spec_snapshots

Load spec markdown files and build the snapshots dict with manifest.json.

_generate_standards_artifacts

Generate A2A card, AGENTS.md, spec snapshots, and stash apm.yml.

Data

API

pynydus.engine.pipeline.logger

‘getLogger(…)’

class pynydus.engine.pipeline.PipelineContext

Mutable context passed through each pipeline phase.

All Nydusfile fields are front-loaded here at the start of the pipeline. No phase should reach back into NydusfileConfig.

nydusfile_dir: pathlib.Path

None

sources: list[pynydus.engine.nydusfile.SourceDirective]

‘field(…)’

base_egg: str | None

None

merge_ops: list[pynydus.engine.nydusfile.MergeOp]

‘field(…)’

redact: bool

True

excluded_memory_labels: list[pynydus.common.enums.MemoryLabel]

‘field(…)’

custom_labels: dict[str, str]

‘field(…)’

source_remove_globs: list[str]

‘field(…)’

agent_type: pynydus.common.enums.AgentType | None

None

llm_config: pynydus.llm.LLMTierConfig | None

None

spawn_log: list[dict]

‘field(…)’
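
The field listing above suggests a dataclass whose list and dict fields use default factories, so that each context owns its own containers. The following is a minimal sketch of that shape, not the library's definition: the class name PipelineContextSketch is hypothetical, the enum and directive types are replaced by plain strings and dicts, and treating nydusfile_dir as the only required field is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class PipelineContextSketch:
    """Hypothetical stand-in for PipelineContext with simplified types."""

    nydusfile_dir: Path  # assumed to be required
    sources: list[dict] = field(default_factory=list)
    base_egg: str | None = None
    merge_ops: list[dict] = field(default_factory=list)
    redact: bool = True
    excluded_memory_labels: list[str] = field(default_factory=list)
    custom_labels: dict[str, str] = field(default_factory=dict)
    source_remove_globs: list[str] = field(default_factory=list)
    agent_type: str | None = None
    llm_config: dict | None = None
    spawn_log: list[dict] = field(default_factory=list)


# Two contexts never share a log, because each gets a fresh list.
ctx_a = PipelineContextSketch(nydusfile_dir=Path("."))
ctx_b = PipelineContextSketch(nydusfile_dir=Path("."))
ctx_a.spawn_log.append({"phase": "read", "files": 3})
```

Using default_factory rather than a shared mutable default keeps one spawn's log entries from leaking into another.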

pynydus.engine.pipeline.spawn(config: pynydus.engine.nydusfile.NydusfileConfig, *, nydusfile_dir: pathlib.Path, llm_config: pynydus.llm.LLMTierConfig | None = None) -> tuple[pynydus.api.schemas.Egg, dict[str, str], dict[str, list[dict]]]

Run the spawning pipeline.

This is the single entry point for spawning. It checks prerequisites, such as ensure_gitleaks_if_needed, before any file is read or redacted.

Args:
    config: Parsed Nydusfile (sources, FROM, merge ops, redaction flags).
    nydusfile_dir: Directory containing the Nydusfile (resolves relative paths).
    llm_config: Optional LLM tier for spawn Step 7 refinement.

Returns:
    (egg, raw_artifacts, logs): the spawned Egg, redacted source file contents, and pipeline log entries (e.g. {"spawn_log": [...]}).

Raises:
    NydusfileError: If the Nydusfile is invalid (e.g. multiple SOURCE lines).
    GitleaksNotFoundError: When redaction requires gitleaks but it is missing.

pynydus.engine.pipeline.ensure_gitleaks_if_needed(config: pynydus.engine.nydusfile.NydusfileConfig) -> None

Raise if gitleaks is required but not installed.

Secret scanning is required when REDACT is true (the default) and at least one SOURCE directive is present. FROM-only spawns and REDACT false pipelines skip file-level scanning entirely.

Args: config: Parsed Nydusfile configuration.

Raises: GitleaksNotFoundError: When scanning is required but gitleaks is not found.
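
The rule described above (scan only when REDACT is true and a SOURCE is present) can be sketched as a small predicate plus a PATH lookup. This is an illustration under assumptions: the names scanning_required, ensure_gitleaks_sketch, and GitleaksNotFoundErrorSketch are hypothetical, and using shutil.which to locate the binary is a guess at how the check might be done, not a description of the library's code.

```python
import shutil


class GitleaksNotFoundErrorSketch(RuntimeError):
    """Hypothetical stand-in for GitleaksNotFoundError."""


def scanning_required(redact: bool, source_count: int) -> bool:
    """Scanning applies only when redaction is on and a SOURCE exists."""
    return redact and source_count > 0


def ensure_gitleaks_sketch(
    redact: bool, source_count: int, binary: str = "gitleaks"
) -> None:
    """Raise if scanning is required but the binary is not on PATH."""
    if not scanning_required(redact, source_count):
        return  # FROM-only or REDACT false: nothing to check
    if shutil.which(binary) is None:
        raise GitleaksNotFoundErrorSketch(f"{binary} not found on PATH")
```

Checking before any file is read means a missing tool fails the spawn early, rather than after unredacted content has already been loaded.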

pynydus.engine.pipeline._resolve_base_egg(ctx: pynydus.engine.pipeline.PipelineContext) -> tuple[pynydus.api.schemas.EggPartial | None, pynydus.common.enums.AgentType | None]

If FROM is present, load and merge the base egg.

Returns (partial, agent_type): partial is the merged base egg, agent_type is the base egg’s manifest agent type.

pynydus.engine.pipeline._is_registry_ref(ref: str) -> bool

Check if a base egg reference looks like a registry ref (name:version).
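
One way to tell a name:version reference apart from a filesystem path is a single anchored regular expression. The sketch below is an assumption about what "looks like" could mean; the pattern, the allowed characters, and the function name looks_like_registry_ref are all hypothetical and may differ from the real check.

```python
import re

# Hypothetical pattern: a name, one colon, then a version. Both parts must
# start with an alphanumeric character, which rules out "./relative" paths.
_REGISTRY_REF = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._/-]*:[A-Za-z0-9][A-Za-z0-9._-]*$"
)


def looks_like_registry_ref(ref: str) -> bool:
    """Return True when ref has the shape name:version."""
    return _REGISTRY_REF.match(ref) is not None
```

A reference such as "my-agent:1.2.0" matches, while "./eggs/base" and "my-agent:" do not.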

pynydus.engine.pipeline._pull_registry_egg(ref: str) -> str

Pull a registry egg to a temp file and return its path.

pynydus.engine.pipeline._read_source_files(ctx: pynydus.engine.pipeline.PipelineContext) -> list[tuple[pynydus.common.enums.AgentType, pathlib.Path, dict[str, str]]]

Read source files into independent per-group dicts.

Returns a list of (agent_type, source_root, files) tuples. Because a Nydusfile allows at most one SOURCE directive, the list holds at most one tuple. Each group’s files dict is keyed by bare filename and is independent of the others; no merging is performed here.

pynydus.engine.pipeline._read_files_from_path(root: pathlib.Path, patterns: list[str]) -> dict[str, str]

Read text files matching patterns from a directory.
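
A pattern-driven reader of this kind can be sketched with pathlib globbing. The details here are assumptions rather than documented behavior: the function name read_text_files is hypothetical, keys are paths relative to root, files are decoded as UTF-8, and files that fail to decode are skipped.

```python
from pathlib import Path


def read_text_files(root: Path, patterns: list[str]) -> dict[str, str]:
    """Collect UTF-8 text files under root that match any glob pattern."""
    files: dict[str, str] = {}
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if not path.is_file():
                continue  # ignore directories that happen to match
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                continue  # assumed policy: skip non-text content
            files[path.relative_to(root).as_posix()] = text
    return files
```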

pynydus.engine.pipeline._filter_files_by_patterns(files: dict[str, str], patterns: list[str]) -> dict[str, str]

Remove files whose keys match any exclude glob.
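
Exclude-glob filtering over an in-memory file dict can be sketched with the standard fnmatch module. Whether the library uses fnmatch semantics is an assumption, and the function name filter_files and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_files(files: dict[str, str], patterns: list[str]) -> dict[str, str]:
    """Return a new dict without the keys matching any exclude glob."""
    return {
        name: text
        for name, text in files.items()
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    }


sample = {"notes.md": "a", ".env": "b", "logs/run.log": "c"}
kept = filter_files(sample, ["*.env", "logs/*"])
```

Here fnmatchcase is chosen over fnmatch so that matching is case-sensitive on every platform. Note that in fnmatch patterns "*" also matches path separators, which differs from shell globbing.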

pynydus.engine.pipeline._scan_secrets_gitleaks(files: dict[str, str], ctx: pynydus.engine.pipeline.PipelineContext, *, start_index: int = 1) -> tuple[dict[str, str], list[pynydus.api.schemas.SecretRecord], int]

Replace secrets with {{SECRET_NNN}} placeholders via gitleaks.

Writes scannable files to a temp directory, runs gitleaks, maps findings back to in-memory dict keys. Ignored (binary) files pass through unchanged.

Returns (redacted_files, credential_records, next_index).

pynydus.engine.pipeline._redact_pii(files: dict[str, str], ctx: pynydus.engine.pipeline.PipelineContext, *, start_index: int = 1) -> tuple[dict[str, str], list[pynydus.api.schemas.SecretRecord], int]

Replace PII with {{PII_NNN}} placeholders via Presidio.

Returns (redacted_files, pii_records, next_index) so callers can chain the counter across multiple groups.
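
The counter-chaining contract (take start_index, hand back next_index) can be illustrated without Presidio. In this sketch a simple email regex stands in for the real PII detector, which is a deliberate substitution; the function name redact_emails, the record fields, and the three-digit zero padding of NNN are assumptions.

```python
import re

# Stand-in detector: a rough email pattern, NOT what Presidio does.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(files: dict[str, str], *, start_index: int = 1):
    """Replace matches with {{PII_NNN}} and return the next free index."""
    index = start_index
    redacted: dict[str, str] = {}
    records: list[dict] = []
    for name, text in files.items():

        def _substitute(match: re.Match) -> str:
            nonlocal index
            placeholder = f"{{{{PII_{index:03d}}}}}"
            records.append({"placeholder": placeholder, "file": name})
            index += 1
            return placeholder

        redacted[name] = EMAIL.sub(_substitute, text)
    return redacted, records, index


# Chain the counter across two groups so placeholders stay unique.
first, first_records, next_index = redact_emails({"a.md": "mail ada@example.com now"})
second, second_records, final_index = redact_emails(
    {"b.md": "x bob@example.org"}, start_index=next_index
)
```

Passing next_index from one call into the next keeps placeholder numbers unique across groups.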

pynydus.engine.pipeline._get_spawner(agent_type: pynydus.common.enums.AgentType)

Return the spawner connector for the given agent type.

pynydus.engine.pipeline._parse_sources(source_groups: list[tuple[pynydus.common.enums.AgentType, pathlib.Path, dict[str, str]]], ctx: pynydus.engine.pipeline.PipelineContext) -> pynydus.api.raw_types.ParseResult

Parse redacted files, dispatching each source group to its own spawner.

Each group’s dict is already redacted: it is passed directly to the spawner with bare filename keys.

pynydus.engine.pipeline._build_skills_module_from_parse(parse_result: pynydus.api.raw_types.ParseResult, agent_type: pynydus.common.enums.AgentType) -> pynydus.api.schemas.SkillsModule

Convert ParseResult skills into AgentSkill objects.

pynydus.engine.pipeline._build_mcp_module_from_parse(parse_result: pynydus.api.raw_types.ParseResult) -> pynydus.api.schemas.McpModule

Build McpModule from ParseResult’s raw MCP config dicts.

pynydus.engine.pipeline._build_memory_module_from_parse(parse_result: pynydus.api.raw_types.ParseResult, agent_type: pynydus.common.enums.AgentType) -> pynydus.api.schemas.MemoryModule

Convert ParseResult memory into MemoryRecord objects.

pynydus.engine.pipeline._merge_skills(base: pynydus.api.schemas.SkillsModule, extracted: pynydus.api.schemas.SkillsModule) -> pynydus.api.schemas.SkillsModule

Combine base egg skills with freshly extracted skills, re-numbering IDs.
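
Concatenating two record lists and re-numbering their IDs can be sketched as follows. The dict-based records, the function name merge_and_renumber, and the "skill_NNN" ID format are assumptions for illustration; the real SkillsModule and its ID scheme may look different.

```python
def merge_and_renumber(
    base: list[dict], extracted: list[dict], prefix: str = "skill"
) -> list[dict]:
    """Concatenate base then extracted, assigning fresh sequential IDs."""
    merged = []
    for position, item in enumerate([*base, *extracted], start=1):
        # Copy each record so the inputs are left untouched.
        merged.append({**item, "id": f"{prefix}_{position:03d}"})
    return merged


base_skills = [{"id": "skill_001", "name": "search"}]
new_skills = [{"id": "skill_001", "name": "summarize"}]
merged_skills = merge_and_renumber(base_skills, new_skills)
```

Re-numbering avoids ID collisions when both inputs started counting from 1, as they do in this example.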

pynydus.engine.pipeline._merge_mcp(base: pynydus.api.schemas.McpModule, extracted: pynydus.api.schemas.McpModule) -> pynydus.api.schemas.McpModule

Merge MCP server configs from base egg and parsed source.

pynydus.engine.pipeline._merge_memory(base: pynydus.api.schemas.MemoryModule, extracted: pynydus.api.schemas.MemoryModule) -> pynydus.api.schemas.MemoryModule

Combine base egg memory with freshly extracted memory, re-numbering IDs.

pynydus.engine.pipeline._merge_secrets(base: pynydus.api.schemas.SecretsModule, extracted: pynydus.api.schemas.SecretsModule) -> pynydus.api.schemas.SecretsModule

Combine base egg secrets with extracted secrets, deduplicating by name.
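
Deduplicating by name can be sketched with a dict keyed on the name. The description does not say which record wins on a collision, so this sketch assumes the base egg's record is kept; the function name merge_secrets_by_name and the record fields are hypothetical.

```python
def merge_secrets_by_name(base: list[dict], extracted: list[dict]) -> list[dict]:
    """Merge two secret lists, keeping the first record seen per name."""
    by_name: dict[str, dict] = {}
    for record in [*base, *extracted]:
        # setdefault keeps the earlier record (assumed: base wins).
        by_name.setdefault(record["name"], record)
    return list(by_name.values())


base_secrets = [{"name": "API_KEY", "origin": "base"}]
new_secrets = [
    {"name": "API_KEY", "origin": "source"},
    {"name": "DB_PASSWORD", "origin": "source"},
]
merged_secrets = merge_secrets_by_name(base_secrets, new_secrets)
```

Because Python dicts preserve insertion order, the merged list keeps base records first, followed by any new names from the source.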

pynydus.engine.pipeline._apply_custom_labels(memory: pynydus.api.schemas.MemoryModule, custom_labels: dict[str, str], spawn_log: list[dict] | None = None) -> None

Override memory record labels based on source_store pattern matching.

pynydus.engine.pipeline._drop_memory_records_with_excluded_labels(memory: pynydus.api.schemas.MemoryModule, excluded: list[pynydus.common.enums.MemoryLabel], spawn_log: list[dict] | None = None) -> pynydus.api.schemas.MemoryModule

Remove memory records whose label is listed in excluded.

pynydus.engine.pipeline._package_egg(ctx: pynydus.engine.pipeline.PipelineContext, skills: pynydus.api.schemas.SkillsModule, mcp: pynydus.api.schemas.McpModule, memory: pynydus.api.schemas.MemoryModule, secrets: pynydus.api.schemas.SecretsModule, parse_result: pynydus.api.raw_types.ParseResult | None = None) -> pynydus.api.schemas.Egg

Construct the final Egg with manifest and neutral metadata fields.

pynydus.engine.pipeline._stash_apm(groups: list[tuple[pynydus.common.enums.AgentType, pathlib.Path, dict[str, str]]], ctx: pynydus.engine.pipeline.PipelineContext) -> str | None

Find and return apm.yml content from source files (passthrough).

pynydus.engine.pipeline._resolve_a2a_card(groups: list[tuple[pynydus.common.enums.AgentType, pathlib.Path, dict[str, str]]], egg: pynydus.api.schemas.Egg, ctx: pynydus.engine.pipeline.PipelineContext) -> dict | None

Return A2A card: passthrough from source or generate from egg.

pynydus.engine.pipeline._embed_spec_snapshots(ctx: pynydus.engine.pipeline.PipelineContext) -> dict[str, str] | None

Load spec markdown files and build the snapshots dict with manifest.json.

pynydus.engine.pipeline._generate_standards_artifacts(egg: pynydus.api.schemas.Egg, groups: list[tuple[pynydus.common.enums.AgentType, pathlib.Path, dict[str, str]]], ctx: pynydus.engine.pipeline.PipelineContext) -> pynydus.api.schemas.Egg

Generate A2A card, AGENTS.md, spec snapshots, and stash apm.yml.

Mutates nothing. Returns a new Egg via model_copy().