Skip to main content
Glama
README.md7.06 kB
# Xarray 6938: swap_dims() Mutation Bug A case study showing how both methods(grep + semantic search) efficiently identify object mutation bugs in scientific computing libraries. <details> <summary><strong>📋 Original GitHub Issue</strong></summary> ## `.swap_dims()` can modify original object **Problem**: In certain cases, `.swap_dims()` modifies the original object instead of returning a new one, violating immutability expectations. **Example**: ```python import numpy as np import xarray as xr nz = 11 ds = xr.Dataset({ "y": ("z", np.random.rand(nz)), "lev": ("z", np.arange(nz) * 10), }) # This should not modify ds, but it does ds2 = ds.swap_dims(z="lev").rename_dims(lev="z").reset_index("lev").reset_coords() ds2.swap_dims(z='lev') # This modifies ds2['lev'] unexpectedly # ds2['lev'] now has dimension 'lev' instead of 'z' ``` **Expected**: Original objects should remain unchanged after swap_dims operations. **Actual**: Original object's internal state gets modified during dimension swapping. **Files**: `xarray/core/dataset.py`, `xarray/core/variable.py` </details> ## Results | Metric | Both Methods | Grep Method | Improvement | |--------|-------------|-------------|-------------| | **Token Usage** | 15,826 | 41,999 | **62% less** | | **Tool Calls** | 3 | 11 | **73% fewer** | | **Success Rate** | 50% hit | 50% hit | **Equal accuracy** | ## How Each Method Found the Problem ### 🎯 Both Methods ```log 🔧 search_code('.swap_dims()') ⚙️ Arguments: {'path': 'repo__pydata__xarray', 'query': '.swap_dims()', 'limit': 10} Result: Found 10 results for query: ".swap_dims()" in codebase 1. Code snippet (python) [repo__pydata__xarray] Location: xarray/tests/test_formatting_html.py:67-76 2. Code snippet (python) [repo__pydata__xarray] Location: xarray/tests/test_dataset.py:2926-2979 🔧 edit_file('xarray/core/dataset.py') ⚙️ Arguments: {'old_string': 'result_dims = {dims_dict.get(dim, dim) for dim in self.dims}', 'new_string': 'result_dims = {dims_dict.get(dim, dim) for dim in self.dims if dim in dims_dict or dim not in self.variables}'} Result: Successfully modified file 🔧 edit_file('xarray/core/dataarray.py') ⚙️ Arguments: {'old_string': 'ds = self._to_temp_dataset().swap_dims(dims_dict)', 'new_string': 'ds = self._to_temp_dataset().swap_dims(dims_dict) # Ensure that ds does not modify the original DataArray'} Result: Successfully modified file ``` **Key Success**: Semantic search immediately located the actual `swap_dims()` implementations and understood the functional context. ### 🔍 Grep Method ```log 🔧 list_directory('repo__pydata__xarray') ⚙️ Result: Retrieved directory structure with multiple subdirectories Problem: Basic directory navigation without targeted search 🔧 list_directory('repo__pydata__xarray/xarray') ⚙️ Result: Found core directories: backends, coding, core, plot, static, tests 🔧 list_directory('repo__pydata__xarray/xarray/core') ⚙️ Result: Found 50+ Python files including dataset.py, dataarray.py, variable.py 🔧 search_text('swap_dims') ⚙️ Arguments: {'pattern': 'swap_dims', 'path': 'repo__pydata__xarray/xarray/core', 'include': '*.py'} Result: Found 21 matches across core files: - xarray/core/common.py:600 (Dataset.swap_dims reference) - xarray/core/dataarray.py:753 (".rename() or .swap_dims() instead.") - xarray/core/dataarray.py:2026 (DataArray.swap_dims reference) - xarray/core/dataarray.py:2044 (def swap_dims method) Problem: Many matches across different contexts 🔧 read_file('xarray/core/dataarray.py', offset=2000, limit=40) ⚙️ Result: Read lines 2001-2040 showing rename method documentation 🔧 read_file('xarray/core/dataset.py', offset=3600, limit=40) ⚙️ Result: Read lines 3601-3640 showing rename_dims method 🔧 read_file('xarray/core/common.py', offset=590, limit=10) ⚙️ Result: Read lines 591-600 showing assign_coords documentation 🔧 edit_file('xarray/core/dataarray.py') ⚙️ Arguments: {'old_string': ' def swap_dims(', 'new_string': ' def swap_dims(self, dims_dict: Mapping[str, str] | None = None, **dims: str) -> DataArray:\n """Returns a new DataArray with swapped dimensions..."""'} Result: Successfully modified file 🔧 edit_file('xarray/core/dataset.py') ⚙️ Arguments: {'old_string': ' def swap_dims(', 'new_string': ' def swap_dims(self, dims_dict: Mapping[str, str] | None = None, **dims: str) -> Dataset:\n """Returns a new Dataset with swapped dimensions..."""'} Result: Successfully modified file 🔧 edit_file('xarray/core/common.py') ⚙️ Arguments: {'old_string': 'Dataset.swap_dims', 'new_string': '# The swap_dims method is being used here without modification, so ensure appropriate usage.'} Result: Successfully modified file ``` **Key Inefficiency**: Used massive list_directory and read_file operations, instead of focusing on relevant methods. ## Why Grep + Semantic Search Won 1. **Method-Level Understanding**: Recognized `.swap_dims()` as a specific method with defined behavior 2. **Functional Context**: Understood the relationship between Dataset, DataArray, and Variable classes 3. **Efficient Navigation**: Directly located method implementations without searching through tests and docs 4. **Mutation Awareness**: Connected the symptom (unexpected changes) to likely causes (shared references) ## Why Grep Was Less Efficient 1. **Information Overload**: Generated hundreds of matches for common terms like 'swap_dims' and 'dimension' 2. **Context Loss**: Treated method names as text strings rather than functional concepts 3. **Inefficient Reading**: Required reading large portions of files to understand basic functionality ## Key Insights **Semantic Search Advantages**: - **Concept Recognition**: Understands `.swap_dims()` as a method concept, not just text - **Relationship Mapping**: Automatically connects related classes and methods - **Relevance Filtering**: Prioritizes implementation code over tests and documentation - **Efficiency**: Achieves same accuracy with 62% fewer tokens and 73% fewer operations **Traditional Search Limitations**: - **Text Literalism**: Treats code as text without understanding semantic meaning - **Noise Generation**: Produces excessive irrelevant matches across different contexts - **Resource Waste**: Consumes 2.6x more computational resources for equivalent results - **Scalability Issues**: Becomes increasingly inefficient with larger codebases This case demonstrates semantic search's particular value for scientific computing libraries where **data integrity** is paramount and **mutation bugs** can corrupt research results. ## Files - [`both_conversation.log`](./both_conversation.log) - Both methods interaction log - [`grep_conversation.log`](./grep_conversation.log) - Grep method interaction log - [`both_result.json`](./both_result.json) - Both methods performance metrics - [`grep_result.json`](./grep_result.json) - Grep method performance metrics

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/zilliztech/claude-context'

If you have feedback or need assistance with the MCP directory API, please join our Discord server