Commands¶
This page contains all Makora CLI command documentation.
Command Sections¶
- Authentication
- Generate & Optimize
- Jobs & Sessions
- Kernels & Results
- Evaluate
- Profile
- Expert Generate
- Search
- Plugin Install
Authentication Commands¶
makora login¶
Authenticate with the Makora API. Stores credentials locally for all subsequent commands.
Usage¶
# Interactive — prompts for token
makora login
# Non-interactive — pass token directly
makora login --token YOUR_TOKEN
Options¶
| Option | Type | Description |
|---|---|---|
| `--token` | string | API token (skips interactive prompt) |
| `--user` | string | Username (optional) |
| `--url` | string | Override the Makora API URL |
| `--quiet` | flag | Disable interactive prompts |
Where to Get a Token¶
- Go to https://generate.makora.com/tokens
- Log in or create an account
- Create a new token or copy an existing one
Credential Storage¶
Tip
Credentials are stored as a plain text file. For CI/CD pipelines, use the MAKORA_USER_FILE environment variable to point to a credentials file managed by your secrets system.
Credentials are saved to ~/.makora/user by default. Override this with the MAKORA_USER_FILE environment variable.
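The override behaves like a standard path-resolution fallback. A minimal Python sketch of that logic (illustrative only, not the CLI's actual source; the function name `credential_path` is made up):

```python
import os
from pathlib import Path

def credential_path() -> Path:
    # MAKORA_USER_FILE, if set, overrides the default ~/.makora/user location
    override = os.environ.get("MAKORA_USER_FILE")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".makora" / "user"
```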
Examples¶
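For example, in a CI job the token can come from a secret (the variable name `CI_MAKORA_TOKEN` is a placeholder for whatever your secrets system provides):

```shell
# Interactive login
makora login

# Non-interactive login in CI
makora login --token "$CI_MAKORA_TOKEN" --quiet
```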
makora logout¶
Remove stored credentials.
Usage¶
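The command takes no arguments:

```shell
makora logout
```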
Deletes the credential file at ~/.makora/user.
makora info¶
Display version information, login status, and environment variable settings.
Usage¶
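Run it with no arguments:

```shell
makora info
```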
Output¶
Makora version: 0.1.0
Repo: makora-cli
Commit: abc123
Logged in as: user@example.com
Env Variable Value Default
MAKORA_AUTH_URL https://be.stage.makora.com/api/v1/ https://be.stage.makora.com/api/v1/
MAKORA_NO_RICH
MAKORA_URL https://generate.stage.makora.com https://generate.stage.makora.com
MAKORA_USER_FILE ~/.makora/user ~/.makora/user
Environment Variables¶
Note
These variables are only needed for advanced use cases like pointing at a staging server or custom credential paths. Most users won't need to change them.
| Variable | Default | Description |
|---|---|---|
| `MAKORA_URL` | `https://generate.stage.makora.com` | Base URL for the Makora Generate API |
| `MAKORA_USER_FILE` | `~/.makora/user` | Path to the credential file |
| `MAKORA_AUTH_URL` | `https://be.stage.makora.com/api/v1/` | Base URL for the authentication API |
| `MAKORA_NO_RICH` | (empty) | Set to any value to disable Rich text formatting |
Generate & Optimize¶
makora generate¶
Run generation on a problem file for optimization. Makora validates the file, then creates an optimization session that generates progressively faster kernels.
Usage¶
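`--file` and `--device` are required; everything else is optional:

```shell
makora generate --file <problem.py> --device <DEVICE> [options]
```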
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `--file` | path | required | Path to the problem file |
| `--device` | enum | required | Target device (H100, H200, B200, L40S, MI300X, Adreno 830, Adreno 750, Hexagon v79, Hexagon v75) |
| `--language` | enum | device default | Kernel language (cuda, triton, cutedsl, hip, opencl, ripple) |
| `--label` | string | `""` | Label for the session (visible in `makora jobs`) |
| `--atol` | float | `0.01` | Absolute tolerance for correctness validation (see Tolerances) |
| `--rtol` | float | `0.01` | Relative tolerance for correctness validation (see Tolerances) |
| `--fix` | flag | `false` | Enable automatic fix suggestions for validation errors |
| `--instr` | path(s) | none | Path(s) to instruction files providing optimization context |
| `--url` | string | none | Override the Makora API URL |
How It Works¶
When you run makora generate, the following happens:
- Validation — Your problem file is uploaded and validated:
- Compilation check
- Preparing objects for execution
- Benchmarking to establish baseline performance
- Session creation — If validation passes, an optimization session starts
- Kernel generation — The platform generates and benchmarks optimized kernels in the background
The --fix Flag¶
Tip
Always try --fix when a run fails validation. It can automatically correct common issues like missing imports, wrong class names, or tensor device placement.
If validation fails, use --fix to get automatic fix suggestions:
When a fix is available, Makora shows the suggested changes and asks if you want to accept them. If you accept, the fixed code is run again automatically.
Without --fix, a failing run prints a hint:
Hint: try generating with --fix to get automatic fix suggestions:
makora generate --file problem.py --device H100 --fix
Instruction Files (--instr)¶
The --instr flag is how you steer the optimization agent. Makora's optimizer is an AI agent that generates and iterates on kernel code — instruction files let you inject your own expert knowledge into that process. Think of it as pair-programming with the agent: you bring the domain expertise, it brings the implementation speed. This is also where you'd provide an existing kernel implementation if you want the agent to start from a particular baseline instead of a blank slate.
This is your opportunity to nudge the agent toward specific optimization strategies, low-level techniques, or hardware-specific tricks that you know will work for your problem. Without instructions, the agent explores on its own. With instructions, you can point it directly at the approach you want.
Multiple instruction files can be combined:
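For example, assuming the flag is repeated once per file (the file names here are placeholders):

```shell
makora generate --file problem.py --device H100 \
  --instr general-hints.txt --instr h100-hints.txt
```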
Instruction files are plain text. Their contents are concatenated and passed as context to the optimization agent.
What to Put in Instruction Files¶
You can include anything that helps the agent write better kernels:
- Specific optimization techniques — "Use double buffering with shared memory" or "Apply register tiling with an 8x8 thread tile"
- Low-level intrinsics — "Use `__ldg()` for read-only global memory loads" or "Use the warp shuffle `__shfl_sync()` for the reduction"
- Memory access patterns — "The input matrices are always power-of-2 aligned, so you can assume 128-byte aligned loads"
- Architecture-specific knowledge — "On H100, the L2 cache is 50MB — the working set fits entirely in L2"
- Algorithmic hints — "This is a tall-skinny matmul (M>>N), so parallelize along M and use a serial reduction along K"
- Constraints — "Do not use `torch.compile`" or "The solution must be a single fused kernel"
- Reference implementations — Paste in a known-good approach from a paper or library and tell the agent to build on it
Example: Guiding a Matrix Multiply with Expert CUDA Knowledge¶
Say you're optimizing a matrix multiply and you know from experience that on H100, the key to peak throughput is using cp.async to overlap global-to-shared-memory copies with computation, combined with warp-specialized persistent kernels (the approach used by CUTLASS 3.x).
Create a file h100-matmul-hints.txt:
Use an asynchronous warp-specialized persistent kernel design for this matmul:
1. Partition warps into producer and consumer roles. Producer warps issue
cp.async (or TMA on H100) to load tiles from global memory into shared
memory. Consumer warps compute on the previously loaded tiles using
tensor core mma instructions (m16n8k16 for fp32 accum).
2. Use multi-stage software pipelining with at least 3 shared memory buffers
so that loads, computes, and stores can overlap across pipeline stages.
3. Use the following tiling:
- Thread block tile: 128x256xK
- Warp tile: 64x64xK
- Use ldmatrix (PTX: ldmatrix.sync.aligned.m8n8.x4) for shared-to-register
loads to feed the tensor cores efficiently.
4. Use inline PTX for the cp.async instructions:
asm volatile("cp.async.cg.shared.global [%0], [%1], %2;" :: "r"(smem_ptr), "l"(gmem_ptr), "n"(16));
asm volatile("cp.async.commit_group;");
asm volatile("cp.async.wait_group %0;" :: "n"(stages - 2));
5. Epilogue: use vectorized 128-bit stores (float4) to write the result
tile back to global memory with full memory coalescing.
Run with the instruction file:
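For example (the problem file name is a placeholder):

```shell
makora generate --file matmul_problem.py --device H100 --language cuda \
  --instr h100-matmul-hints.txt
```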
Instead of exploring broadly, the agent will focus on implementing the specific warp-specialized persistent kernel approach you described — and it can often get there much faster than discovering this strategy on its own.
Tips¶
Getting the most out of instruction files
- Be specific. "Make it faster" doesn't help. "Use 128x128 thread block tiles with 8 pipeline stages" does.
- Include code snippets. If you know the exact PTX or intrinsic call, paste it in. The agent can incorporate it directly.
- Combine with `--language`. If your instructions reference CUDA intrinsics, make sure you're running with `--language cuda`. If they reference Triton `tl.dot` tuning, use `--language triton`.
- Iterate. Check results with `makora kernels`, then refine your instructions and run generate again.
Examples¶
# Basic run on H100
makora generate --file problem.py --device H100
# Generate with Triton on H100
makora generate --file problem.py --device H100 --language triton
# Generate on AMD MI300X
makora generate --file problem.py --device MI300X
# Generate with a label and fix suggestions
makora generate --file problem.py --device H100 --label "matmul-v2" --fix
# Generate with custom tolerances
makora generate --file problem.py --device H100 --atol 1e-3 --rtol 1e-3
# Generate with instruction context
makora generate --file problem.py --device H100 --instr optimization-hints.txt
Output¶
Device: H100
Language: cuda
✓ Validation passed
Compilation: passed
Preparation: passed
Benchmarking: passed (1.234 ms)
Session created!
Session ID: a1b2c3d4
Problem ID: e5f6a7b8
Monitor progress with: makora jobs
Jobs & Sessions¶
makora jobs¶
List all your optimization sessions and their current status.
Usage¶
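Run it with no arguments, optionally adding `--fast`:

```shell
makora jobs [--fast]
```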
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `--fast` | flag | `false` | Skip fetching extra data (device, speedup) for faster output |
Output Columns¶
| Column | Description |
|---|---|
| Session ID | First 8 characters of the session UUID |
| Status | Current status (running, completed, failed, stopped, etc.) |
| Label | Session label (set with --label when running makora generate), truncated to 20 characters |
| Device | Target device (omitted with --fast) |
| vs torch.compile | Best speedup vs torch.compile baseline (omitted with --fast) |
| Started | Relative time since session started |
Examples¶
# List all jobs with full details
makora jobs
# Quick listing (skip device/speedup lookups)
makora jobs --fast
Output¶
Jobs
Session ID Status Label Device vs torch.compile Started
a1b2c3d4 ● running matmul-v2 H100 1.94x 5m ago
e5f6a7b8 ● completed conv-test L40S 2.31x 1h ago
c9d0e1f2 ● failed - MI300X - 3h ago
makora stop¶
Stop a running optimization session.
Usage¶
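Pass the session UUID (or a unique prefix):

```shell
makora stop <job_uuid>
```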
Arguments¶
| Argument | Description |
|---|---|
| `job_uuid` | The UUID (or UUID prefix) of the session to stop |
Tip
UUID prefix matching is supported — you only need enough characters to uniquely identify the session. In most cases the first 4-8 characters are enough.
Examples¶
# Stop using full UUID
makora stop a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Stop using prefix (must be unique)
makora stop a1b2c3d4
# Stop using short prefix
makora stop a1b2
Output¶
Kernels & Results¶
makora kernels¶
View the optimized kernels generated by an optimization session.
Usage¶
# List all kernels for a session
makora kernels <session_id>
# View a specific kernel's code and performance
makora kernels <session_id> <kernel_id>
# Save kernel code to a file
makora kernels <session_id> <kernel_id> -o <output_file>
Arguments¶
| Argument | Description |
|---|---|
| `session_id` | Session ID or prefix |
| `kernel_id` | (Optional) Kernel ID or prefix — shows code and performance details |
Options¶
| Option | Type | Description |
|---|---|---|
| `-o, --output` | path | Save kernel code to a file instead of printing it |
Prefix matching
Both session_id and kernel_id support prefix matching — you only need enough characters to uniquely identify the target. The first 4-8 characters usually suffice.
Listing Kernels¶
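Pass just a session ID to get the kernel list:

```shell
makora kernels <session_id>
```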
Output Columns¶
| Column | Description |
|---|---|
| Attempt | Which optimization attempt generated this kernel |
| Kernel ID | First 8 characters of the kernel UUID |
| Name | Kernel name (truncated to 15 characters) |
| Status | Evaluation status (completed, failed, or close-miss with tolerance info) |
| Time | Execution time (with unit) |
| vs torch.compile | Speedup compared to torch.compile baseline |
Example Output¶
Kernels for a1b2c3d4 (matmul-v2)
Attempt Kernel ID Name Status Time vs torch.compile
1 f1e2d3c4 kernel_v1 ● completed 0.523 ms 1.82x
2 b5a6c7d8 kernel_v2 ● completed 0.491 ms 1.94x
3 a9b0c1d2 kernel_v3 ● failed - -
Viewing Kernel Code¶
Displays the full kernel source code with syntax highlighting, followed by performance metrics:
── kernel_v2 (b5a6c7d8) ──
Kernel time: 0.491 ms
Reference eager: 1.234 ms
torch.compile: 0.952 ms
vs eager: 2.51x
vs torch.compile: 1.94x
Saving Kernel Code¶
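Use `-o` to write the kernel source to a file instead of printing it:

```shell
makora kernels <session_id> <kernel_id> -o solution.py
```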
Examples¶
# List all kernels from a session
makora kernels a1b2c3d4
# View best kernel's code
makora kernels a1b2c3d4 b5a6c7d8
# Save kernel for evaluation
makora kernels a1b2c3d4 b5a6c7d8 -o solution.py
makora refcode¶
View the original reference code (problem file) that was used for a session.
Usage¶
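Pass the session ID, optionally with an output path:

```shell
makora refcode <session_id> [-o <output_file>]
```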
Arguments¶
| Argument | Description |
|---|---|
| `session_id` | Session ID or prefix |
Options¶
| Option | Type | Description |
|---|---|---|
| `-o, --output` | path | Save reference code to a file |
Examples¶
# View the original problem code
makora refcode a1b2c3d4
# Save it to a file
makora refcode a1b2c3d4 -o original_problem.py
Evaluate¶
makora evaluate¶
Benchmark an optimized kernel against a reference implementation on remote hardware. Returns execution times and speedup.
Usage¶
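Pass the reference file followed by the optimized file:

```shell
makora evaluate <reference_file> <optimized_file> [--device <DEVICE>]
```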
Arguments¶
| Argument | Description |
|---|---|
| `reference_file` | Path to the reference/problem file |
| `optimized_file` | Path to the optimized solution file |
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `-d, --device` | string | `L40S` | See all devices in Supported Hardware |
| `--url` | string | none | Override the Makora API URL |
Device names are case-insensitive.
Output¶
Evaluating code...
✓ Evaluation successful!
Benchmark Results:
Reference time: 1.234567 ms
Solution time: 0.491234 ms
Speedup: 2.51x
Examples¶
# Evaluate on default device (L40S)
makora evaluate problem.py solution.py
# Evaluate on H100
makora evaluate problem.py solution.py --device H100
# Evaluate on AMD MI300X
makora evaluate problem.py solution.py --device MI300x
makora check¶
Tip
Use makora check to validate your problem file before committing to a full generate. It catches errors quickly without creating an optimization session.
Validate a problem file without starting an optimization session. Runs compilation, preparation, and benchmarking checks.
Usage¶
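Pass the problem file, optionally with a target device:

```shell
makora check <file> [--device <DEVICE>]
```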
Arguments¶
| Argument | Description |
|---|---|
| `file` | Path to the problem file to validate |
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `--device` | enum | `H100` | Target device for validation (H100, H200, B200, L40S, MI300X, etc.) |
Examples¶
# Validate on default device (H100)
makora check problem.py
# Validate for a specific device
makora check problem.py --device L40S
makora check problem.py --device MI300X
Output¶
Shows validation results including compilation, preparation, and benchmarking status. If validation fails, error logs are displayed.
Profile¶
makora profile¶
Profile an optimized kernel on remote hardware. While makora evaluate tells you how fast your kernel is, makora profile tells you why — returning hardware counters, occupancy data, Nsight Systems and Nsight Compute traces, and even the generated SASS assembly so you can see exactly what the GPU is doing.
Use profiling when you need to diagnose performance bottlenecks, verify that your optimization strategy is working at the hardware level, or gather data to inform your next round of makora generate ... --instr hints.
Currently, only the NVIDIA H100 is supported by the profiler.
Usage¶
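Same argument shape as `makora evaluate`:

```shell
makora profile <reference_file> <optimized_file>
```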
Arguments¶
| Argument | Description |
|---|---|
| `reference_file` | Path to the reference/problem file |
| `optimized_file` | Path to the optimized solution file |
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `-d, --device` | string | `H100` | Currently, only H100 is supported |
| `--url` | string | none | Override the Makora API URL |
What You Get Back¶
Profiling runs in full mode, which returns the most comprehensive data available. For each GPU kernel launched by your code, the output can include:
Raw Metrics¶
Hardware performance counters and execution statistics:
Metrics:
duration_ns: 491234
registers_per_thread: 32
shared_memory_bytes: 8192
grid_size: [128, 1, 1]
block_size: [256, 1, 1]
occupancy: 0.75
Interpreting raw metrics
These are the numbers you need to diagnose performance. For example:
- High register count (e.g., 128+ per thread) → low occupancy, consider reducing register pressure
- Low occupancy → not enough warps to hide memory latency, try smaller tile sizes or less shared memory per block
- Short duration but many kernel launches → launch overhead is significant, consider fusing kernels
Details Page¶
Detailed kernel execution breakdown from the profiler, including timing per operation, memory throughput, and compute utilization.
Nsight Systems (nsys) Report¶
The full Nsight Systems trace output showing the timeline of GPU activity — kernel launches, memory transfers, synchronization points, and idle gaps. This is the same data you'd get from running nsys profile locally, but executed on remote hardware.
Additional Data in the API Response¶
Note
The data below is available through the API response even if not all fields are printed by the CLI. Use the API directly if you need access to SASS assembly or annotated source.
The profiling API also captures:
| Data | Description |
|---|---|
| CUDA source | The compiled CUDA source code as seen by the profiler |
| SASS assembly | The actual GPU assembly (SASS) that ran on the hardware — the ground truth of what your kernel compiled to |
| Annotated source | Source code annotated with profiling data (hotspots, stall reasons) |
| Torch trace | PyTorch execution trace for understanding the operator-level breakdown |
Example Output¶
Profiling code...
Profiling successful!
Profiled 2 kernel(s):
--- Kernel 1 ---
Metrics:
duration_ns: 491234
registers_per_thread: 32
shared_memory_bytes: 49152
grid_size: [128, 1, 1]
block_size: [256, 1, 1]
Details:
Compute Throughput: 78.3%
Memory Throughput: 45.2%
Achieved Occupancy: 75.0%
Warp Execution Eff: 98.4%
Nsys Report:
Time(%) Total Time (ns) Instances Avg (ns) Kernel Name
------- --------------- --------- --------- -----------
85.2% 491234 1 491234 matmul_kernel
14.8% 85432 1 85432 elementwise_add
--- Kernel 2 ---
...
When to Use Profile vs Evaluate¶
| | `makora evaluate` | `makora profile` |
|---|---|---|
| Purpose | Get speedup number | Understand why it's fast/slow |
| Speed | Fast | Slower (runs profiling tools) |
| Output | Reference time, solution time, speedup | Hardware counters, nsys trace, SASS, source annotations |
| Use when | Checking if your kernel is faster | Diagnosing bottlenecks, planning next optimization |
Recommended workflow
- `evaluate` first to see the speedup
- If the speedup isn't what you expected, `profile` to find out why
- Use profiling data to write better `--instr` hints for your next run
Examples¶
# Profile on default device (H100)
makora profile problem.py solution.py
# Profile on H100
makora profile problem.py solution.py --device H100
Expert Generate¶
makora expert-generate¶
Generate a single optimized GPU kernel using AI-powered expert optimization patterns. Unlike makora generate which runs a full optimization loop, this command generates a single optimized kernel and prints the code to stdout.
Usage¶
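Pass the kernel file to optimize:

```shell
makora expert-generate <file> [options]
```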
Arguments¶
| Argument | Description |
|---|---|
| `file` | Path to the kernel file to optimize |
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `-p, --problem` | path | none | Path to the reference/problem file for additional context |
| `-d, --device` | string | `L40S` | See full device list in Supported Hardware |
| `-l, --language` | string | `cuda` | Target language (cuda, triton, cutedsl, hip, opencl, ripple). Must be compatible with the selected device. |
| `--speedup` | float | none | Current speedup vs baseline (provides context for further optimization) |
| `--url` | string | none | Override the Makora API URL |
Output¶
Piping to a file
Kernel code goes to stdout and status messages go to stderr, so you can pipe directly to a file with > solution.py without capturing log noise.
The generated kernel code is printed to stdout. Status messages and summaries go to stderr. This makes it easy to pipe the output to a file:
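For example:

```shell
# Kernel code goes to solution.py; status messages stay on the terminal
makora expert-generate kernel.py --problem problem.py > solution.py
```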
Examples¶
# Generate optimized CUDA kernel for L40S (default)
makora expert-generate kernel.py
# Generate with problem file context
makora expert-generate kernel.py --problem problem.py
# Generate Triton kernel for H100
makora expert-generate kernel.py --device H100 --language triton
# Generate HIP kernel for MI300X
makora expert-generate kernel.py --device MI300X --language hip
# Provide current speedup for context
makora expert-generate kernel.py --problem problem.py --speedup 1.5
# Pipe output directly to a file
makora expert-generate kernel.py --problem problem.py > solution.py
Output Example¶
# stderr:
Generating optimized kernel...
Summary: Applied tiling and shared memory optimization for matrix multiplication
# stdout:
import torch
import torch.nn as nn
from torch.utils.cpp_extension import load_inline
cuda_source = """
// ... optimized CUDA kernel code
"""
class ModelNew(nn.Module):
...
Search¶
Search tools for finding GPU documentation, optimization snippets, and technical references.
makora document-search¶
Search Makora's document database for GPU programming references.
Usage¶
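Pass the query as a quoted string:

```shell
makora document-search "<query>" [-n <max_entries>]
```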
Arguments¶
| Argument | Description |
|---|---|
| `query` | Search query string |
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `-n, --max-entries` | int | `5` | Maximum number of documents to return (1–49) |
| `--url` | string | none | Override the Makora API URL |
Examples¶
# Search for shared memory documentation
makora document-search "CUDA shared memory bank conflicts"
# Get more results
makora document-search "matrix multiplication optimization" --max-entries 10
Output¶
Searching documents...
Found 3 document(s):
--- Document 1 ---
id: abc123
score: 0.92
meta: {"source": "cuda_guide", "section": "shared_memory"}
content:
[Document content...]
--- Document 2 ---
...
Companion CLI: makora-skills¶
Note
The commands below require the makora-skills package, which is separate from the main makora CLI. Install it with pip install makora-skills.
The makora-skills package provides additional search commands. Install it separately:
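Using pip:

```shell
pip install makora-skills
```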
makora search-snippets¶
Search for GPU code optimization snippets and techniques.
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `-n, --max-entries` | int | `5` | Maximum number of results |
| `-l, --language` | string | `cuda` | Programming language filter |
| `-a, --architecture` | string | none | GPU architecture filter (e.g., H100, MI300X) |
Examples¶
# Search for CUDA optimization snippets
makora search-snippets "matrix multiplication tiling"
# Search for Triton snippets for H100
makora search-snippets "fused attention kernel" --language triton --architecture H100
# Get more results
makora search-snippets "memory coalescing" --max-entries 10
makora search-docs¶
Search for GPU documentation and API references.
Options¶
| Option | Type | Default | Description |
|---|---|---|---|
| `-n, --max-entries` | int | `5` | Maximum number of results |
| `-l, --language` | string | none | Programming language filter |
| `-a, --architecture` | string | none | GPU architecture filter (e.g., H100, MI300X) |
Examples¶
# Search for documentation
makora search-docs "warp shuffle instructions"
# Filter by architecture
makora search-docs "memory hierarchy" --architecture MI300X
# Filter by language
makora search-docs "kernel launch configuration" --language cuda
Plugin Install¶
makora install¶
Install the Makora plugin for supported platforms.
Usage¶
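Currently the only valid target is `claude`:

```shell
makora install claude
```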
Arguments¶
| Argument | Description |
|---|---|
| `target` | Platform to install for. Currently only `claude` is supported. |
makora install claude¶
Installs the Makora plugin into Claude Code, giving Claude access to GPU optimization tools directly in your coding sessions.
Login required
You must be logged in (makora login) before running this command. The installer needs your credentials to configure the plugin.
What It Does¶
- Removes any previously cached Makora plugin
- Installs `makora-plugin` as a Claude Code MCP server
- Registers available Makora skills for Claude to use
Available Plugin Commands After Install¶
Once installed, Claude Code gains access to:
- Evaluate — Benchmark optimized code against reference implementations
- Generate — Generate optimized GPU kernels from problem descriptions
- Optimize — Iteratively optimize CUDA/Triton kernels
- Search docs — Search GPU documentation and API references
- Search snippets — Find GPU optimization code snippets