Process **REMOTE FILES** or script BULK TOOL EXECUTIONS using Python code IN A REMOTE SANDBOX. If you can see the data in chat, DON'T USE THIS TOOL.
**ONLY** use this when processing **data stored in a remote file** or when scripting bulk tool executions.
DO NOT USE
- When the complete response is already inline/in-memory, or you only need quick parsing, summarization, or basic math.
USE WHEN
- You need to parse/analyze tool outputs saved to a remote file in the sandbox, or to script multi-tool chains there.
- You need bulk or repeated executions of known Composio tools (e.g., adding a label to 100 emails).
- You need to call APIs via proxy_execute when no Composio tool exists for that API.
OUTPUTS
- Returns a compact result or, if too long, artifacts under `/home/user/.code_out`.
IMPORTANT CODING RULES:
1. Stepwise Execution: Split work into small steps. Save intermediate outputs in variables or temporary files under `/tmp/`, then call RUBE_REMOTE_WORKBENCH again for the next step. This improves composability and avoids timeouts.
2. Notebook Persistence: The workbench is a persistent Jupyter notebook cell: variables, functions, imports, and in-memory state from previous executions are preserved in the notebook's history and remain available for reuse in later calls. A few helper functions are also available (see ENV & HELPERS).
3. Parallelism & Timeout (CRITICAL): There is a hard timeout of 4 minutes, so the code must complete within that window. Prioritize PARALLEL execution using ThreadPoolExecutor with suitable concurrency for bulk operations - e.g., call run_composio_tool or invoke_llm in parallel across rows to maximize efficiency.
3.1 If the data is large, split into smaller batches and call the workbench multiple times to avoid timeouts.
4. Checkpoints: Implement checkpoints (in memory or in files) so that long runs can be resumed from the last completed step; a sketch follows this list.
5. Schema Safety: Never assume the response schema for run_composio_tool unless it is already known from previous tool calls. To inspect a schema, either run a simple request **outside** the workbench via RUBE_MULTI_EXECUTE_TOOL or use the invoke_llm helper.
6. LLM Helpers: Always use the invoke_llm helper for summaries, analysis, or field extraction on results. This is a smart LLM that will give much better results than any ad-hoc filtering.
7. Avoid Meta Loops: To avoid cycles, do not use run_composio_tool to call RUBE_MULTI_EXECUTE_TOOL or other COMPOSIO_* meta tools; only use it for app tools.
8. Pagination: Use when data spans multiple pages. Keep fetching pages with the returned next_page_token or cursor until none remains (see the pagination sketch under Best Practices). Parallelize page fetches if the tool supports page_number.
9. No Hardcoding: Never hardcode data in code. Always load it from files or tool responses, iterating to construct intermediate or final inputs/outputs.
10. If the final output is in a workbench file, use upload_local_file so the user can download it - never expose the raw workbench file path to the user. Prefer uploading useful artifacts after the task is complete.
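Example (a minimal sketch of rules 1, 3.1 and 4, assuming a job list was saved to `/tmp/jobs.json` by an earlier step; the checkpoint path and batch size are illustrative, and GMAIL_SEND_EMAIL is used only because it appears in the examples below):
import json, os

CHECKPOINT = "/tmp/send_checkpoint.json"  # illustrative checkpoint file
BATCH = 50  # keep each workbench call well under the 4-minute limit

with open("/tmp/jobs.json") as f:  # assumed input prepared by an earlier step
    jobs = json.load(f)

# Resume from the last completed index if a previous call timed out.
start = 0
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT) as f:
        start = json.load(f)["done"]

end = min(start + BATCH, len(jobs))
for i in range(start, end):
    job = jobs[i]
    result, error = run_composio_tool("GMAIL_SEND_EMAIL", {
        "to": job["recipient"], "subject": job["subject"], "body": job["body"]
    })
    if error:
        print(f"row {i} failed: {error}")
    with open(CHECKPOINT, "w") as f:
        json.dump({"done": i + 1}, f)  # checkpoint after every row

print(f"processed rows {start}..{end - 1} of {len(jobs)}")
Within a batch you would normally still parallelize the calls as shown under Best Practices; the sequential loop above just keeps the checkpoint logic clear.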
ENV & HELPERS:
- Home directory: `/home/user`.
- NOTE: Helper functions are already initialized in the workbench - DO NOT import or redeclare them:
- `run_composio_tool(tool_slug: str, arguments: dict) -> tuple[Dict[str, Any], str]`: Execute a known Composio **app** tool (from RUBE_SEARCH_TOOLS). Do not invent names; match the tool's input schema. Suited for loops/parallel/bulk over datasets.
i) run_composio_tool returns JSON with a top-level "data" key. Parse carefully; the structure may be nested.
- `invoke_llm(query: str) -> tuple[str, str]`: Invoke an LLM for semantic tasks. Pass at most 200k characters of input.
i) Prompting guidance: When building prompts for invoke_llm, prefer f-strings or plain concatenation. Braces inside interpolated values pass through unchanged, but literal braces written in an f-string or str.format template must be doubled ({{ }}).
ii) Define the exact JSON schema you want in the prompt and batch items into smaller groups to stay within the token limit (see the schema example in the invoke_llm section below).
- `upload_local_file(*file_paths) -> tuple[Dict[str, Any], str]`: Upload workbench files to Composio S3/R2 storage. Use this to make any generated files/artifacts from the workbench available for download.
- `proxy_execute(method, endpoint, toolkit, query_params=None, body=None, headers=None) -> tuple[Any, str]`: Call a toolkit API directly when no Composio tool exists. Only one toolkit can be invoked with proxy_execute per workbench call.
- `web_search(query: str) -> tuple[str, str]`: Search the web for information.
- `smart_file_extract(sandbox_file_path: str, show_preview: bool = True) -> tuple[str, str]`: Extract text from files in the sandbox (e.g., PDF, image); see the sketch after this list.
- Workbench comes with comprehensive Image Processing (PIL/Pillow, OpenCV, scikit-image), PyTorch ML libraries, Document and Report handling tools (pandoc, python-docx, pdfplumber, reportlab), and standard Data Analysis tools (pandas, numpy, matplotlib) for advanced visual, analytical, and AI tasks.
All helper functions return a tuple (result, error). Always check error before using result.
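Example (a minimal sketch combining smart_file_extract with invoke_llm; the PDF path is an assumed file already present in the sandbox):
text, err = smart_file_extract("/home/user/report.pdf")  # assumed example file
if err:
    print("extract error:", err)
else:
    # Trim to stay comfortably inside invoke_llm's 200k-character input limit.
    summary, err2 = invoke_llm(f"Summarize the key findings of this document:\n{text[:150000]}")
    if err2:
        print("llm error:", err2)
    else:
        print("Summary:", summary)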
## Python Helper Functions for LLM Scripting
### run_composio_tool(tool_slug, arguments)
Executes a known Composio app tool via the backend API. Do NOT call COMPOSIO_* meta tools with it, to avoid cyclic calls.
def run_composio_tool(tool_slug: str, arguments: Dict[str, Any]) -> tuple[Dict[str, Any], str]
# Returns: (tool_response_dict, error_message)
# Success: ({"data": {actual_data}}, "") - Note the top-level data
# Error: ({}, "error_message") or (response_data, "error_message")
result, error = run_composio_tool("GMAIL_FETCH_EMAILS", {"max_results": 1, "user_id": "me"})
if error:
print("GMAIL_FETCH_EMAILS error:", error); return
email_data = result.get("data", {})
print("Fetched:", email_data)
### invoke_llm(query)
Calls an LLM for reasoning, analysis, and semantic tasks. Pass at most 200k characters of input.
def invoke_llm(query: str) -> tuple[str, str]
# Returns: (llm_response, error_message)
resp, error = invoke_llm("Summarize the key points from this data")
if not error:
    print("LLM:", resp)

# Example: analyze a tool response with the LLM
tool_resp, err = run_composio_tool("GMAIL_FETCH_EMAILS", {"max_results": 5, "user_id": "me"})
if not err:
    parsed = tool_resp.get("data", {})
    resp, err2 = invoke_llm(f"Analyze these emails and summarize: {parsed}")
    if not err2:
        print("LLM Gmail Summary:", resp)
# TIP: batch prompts to reduce LLM calls.
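Example (a sketch of the prompting guidance from ENV & HELPERS: explicit JSON schema, doubled braces for literals in the f-string template, and a small batch; the schema fields and the assumed `messages` structure are illustrative - inspect the real response first):
import json

resp_data, err = run_composio_tool("GMAIL_FETCH_EMAILS", {"max_results": 10, "user_id": "me"})
if not err:
    messages = (resp_data.get("data") or {}).get("messages") or []  # structure assumed; verify per rule 5
    prompt = f"""Classify each email below.
Return ONLY a JSON array where each element is {{"subject": str, "category": "urgent" | "fyi" | "other"}}.
Emails: {json.dumps(messages)[:150000]}"""
    answer, err2 = invoke_llm(prompt)
    if not err2:
        print(answer)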
### upload_local_file(*file_paths)
Uploads sandbox files to Composio S3/R2 storage. Single files upload directly; multiple files are auto-zipped.
Use this to make any generated artifacts from the sandbox available for download.
def upload_local_file(*file_paths) -> tuple[Dict[str, Any], str]
# Returns: (result_dict, error_string)
# Success: ({"s3_url": str, "uploaded_file": str, "type": str, "id": str, "s3key": str, "message": str}, "")
# Error: ({}, "error_message")
# Single file
result, error = upload_local_file("/path/to/report.pdf")
# Multiple files (auto-zipped)
result, error = upload_local_file("/home/user/doc1.txt", "/home/user/doc2.txt")
if not error:
    print("Uploaded:", result["s3_url"])
### proxy_execute(method, endpoint, toolkit, query_params=None, body=None, headers=None)
Direct API call to a connected toolkit service.
def proxy_execute(
    method: Literal["GET","POST","PUT","DELETE","PATCH"],
    endpoint: str,
    toolkit: str,
    query_params: Optional[Dict[str, str]] = None,
    body: Optional[object] = None,
    headers: Optional[Dict[str, str]] = None,
) -> tuple[Any, str]
# Returns: (response_data, error_message)
# Example: GET request with query parameters
query_params = {"q": "is:unread", "maxResults": "10"}
data, error = proxy_execute("GET", "/gmail/v1/users/me/messages", "gmail", query_params=query_params)
if not error:
    print("Success:", data)
### web_search(query)
Searches the web via Exa AI.
def web_search(query: str) -> tuple[str, str]
# Returns: (search_results_text, error_message)
results, error = web_search("latest developments in AI")
if not error:
    print("Results:", results)
## Best Practices
### Error-first pattern and Defensive parsing (print keys while narrowing)
res, err = run_composio_tool("GMAIL_FETCH_EMAILS", {"max_results": 5})
if err:
    print("error:", err)
elif isinstance(res, dict):
    print("res keys:", list(res.keys()))
    data = res.get("data") or {}
    print("data keys:", list(data.keys()))
    msgs = data.get("messages") or []
    print("messages count:", len(msgs))
    for m in msgs:
        print("subject:", m.get("subject", "<missing>"))
### Parallelize (4-min sandbox timeout)
Adjust concurrency so all tasks finish within 4 minutes.
import concurrent.futures

MAX_CONCURRENCY = 10  # Adjust as needed

def send_bulk_emails(email_list):
    def send_single(email):
        result, error = run_composio_tool("GMAIL_SEND_EMAIL", {
            "to": email["recipient"], "subject": email["subject"], "body": email["body"]
        })
        if error:
            print(f"Failed {email['recipient']}: {error}")
            return {"status": "failed", "error": error}
        return {"status": "sent", "data": result}

    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as ex:
        futures = [ex.submit(send_single, e) for e in email_list]
        for f in concurrent.futures.as_completed(futures):
            results.append(f.result())
    return results

email_list = [{"recipient": f"user{i}@example.com", "subject": "Test", "body": "Hello"} for i in range(1000)]
results = send_bulk_emails(email_list)