Commit 2320055

Merge pull request #319 from luiztauffer/icephys
Icephys - WIP
2 parents d371ed8 + d80c32a

File tree

12 files changed: +1254 −1 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -1,11 +1,15 @@
 # Changes
 
+## June 6, 2025
+- Added SequentialRecordingsTable plugin for NWB files to visualize SequentialRecordingsTable neurodata type
+
 ## June 10, 2025
 - Added optional dandisets parameter to DANDI semantic search for filtering results to specific datasets
 
 ## June 5, 2025
 - Modernized Python package structure with pyproject.toml configuration. Removed legacy setup.py, setup.cfg, and setup.cfg.j2 files
 - Added option for using local neurosift server with the CLI
+- Fixed CORS policy in local file access server to allow any localhost port for development
 
 ## May 20, 2025
 - Added support for resolving NWB file URLs from dandiset path in NwbPage
```
Lines changed: 220 additions & 0 deletions

# NWB Data Reading in Neurosift

This document explains how Neurosift reads and processes NWB (Neurodata Without Borders) files, including the architecture, optimizations, and technical details relevant for contributors.

## Overview

Neurosift uses a multi-layered approach to read NWB files efficiently in the browser. The system supports both traditional HDF5 files and optimized LINDI (Linked Data Interface) files, with intelligent format detection and performance optimizations.

## Architecture Components

### 1. Entry Point (`src/pages/NwbPage/NwbPage.tsx`)
- Handles URL processing and format detection
- Manages DANDI API integration for asset resolution
- Coordinates LINDI optimization attempts

### 2. HDF5 Interface Layer (`src/pages/NwbPage/hdf5Interface.ts`)
- Central abstraction for all file operations
- Implements caching and request deduplication
- Manages authentication and error handling
- Provides React hooks for component integration

### 3. Remote File Access (`src/remote-h5-file/`)
- Core file reading implementation
- Supports multiple file formats (HDF5, LINDI)
- Handles HTTP range requests and chunking
- Web Worker integration for non-blocking operations

## Data Flow

```
URL Input → Format Detection → LINDI Check → File Access → Caching → Visualization
     ↓              ↓               ↓              ↓           ↓           ↓
NwbPage.tsx → hdf5Interface → tryGetLindiUrl → RemoteH5File* → Cache → Plugins
```

### Step-by-Step Process

1. **URL Resolution**: Convert DANDI paths to direct download URLs
2. **LINDI Detection**: Check for optimized `.lindi.json` or `.lindi.tar` files
3. **File Access**: Use the appropriate reader (HDF5 or LINDI)
4. **Data Loading**: Lazily load only the required data, with chunking
5. **Caching**: Store results to avoid redundant requests
6. **Visualization**: Pass data to type-specific plugins
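The LINDI detection in step 2 amounts to classifying the URL before choosing a reader. A minimal sketch (the helper name `detectFormat` is an assumption for illustration, not the actual Neurosift function):

```typescript
// Hypothetical sketch (not the actual Neurosift implementation):
// classify a URL as LINDI or plain HDF5 by its extension before
// selecting the appropriate reader.
type NwbFormat = "lindi-json" | "lindi-tar" | "hdf5";

const detectFormat = (url: string): NwbFormat => {
  const path = new URL(url).pathname;
  if (path.endsWith(".lindi.json")) return "lindi-json";
  if (path.endsWith(".lindi.tar")) return "lindi-tar";
  return "hdf5"; // default: traditional HDF5 read via range requests
};
```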
## File Format Support

### Traditional HDF5 Files
- **Access Method**: HTTP range requests via Web Workers
- **Worker URL**: `https://tempory.net/js/RemoteH5Worker.js`
- **Chunk Size**: 100KB default (configurable)
- **Limitations**: Slower metadata access; requires full header parsing

### LINDI Files (Optimized)
- **Format**: JSON-based reference file system
- **Metadata**: Instant access to the full HDF5 structure
- **Data Storage**: References to external URLs or embedded chunks
- **Location**: `https://lindi.neurosift.org/[dandi|dandi-staging]/dandisets/{id}/assets/{asset_id}/nwb.lindi.json`
- **Tar Support**: `.lindi.tar` files containing both metadata and data

## Performance Optimizations

### 1. LINDI Priority System
```typescript
if (isDandiAssetUrl(url) && currentDandisetId && tryUsingLindi) {
  const lindiUrl = await tryGetLindiUrl(url, currentDandisetId);
  if (lindiUrl) return { url: lindiUrl }; // 10-100x faster metadata access
}
```

### 2. Lazy Loading Strategy
- **Groups**: Load structure on demand
- **Datasets**: Load metadata separately from data
- **Data**: Load only when a visualization requires it

### 3. HTTP Range Requests
- Load only the required byte ranges from large files
- Configurable chunk sizes for optimal network usage
- Automatic retry logic for failed requests
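The chunking behind these range requests can be sketched as follows. The helper names and the exact chunking policy are assumptions for illustration, though the 100KB default matches the documented chunk size:

```typescript
// Illustrative sketch only; helper names are hypothetical, not the actual
// remote-h5-file API. Split a byte span into fixed-size chunks and build
// the corresponding HTTP Range headers.
const DEFAULT_CHUNK_SIZE = 100 * 1024; // 100KB, matching the documented default

const chunkRanges = (
  start: number,
  end: number, // exclusive
  chunkSize: number = DEFAULT_CHUNK_SIZE,
): [number, number][] => {
  const ranges: [number, number][] = [];
  for (let s = start; s < end; s += chunkSize) {
    ranges.push([s, Math.min(s + chunkSize, end)]);
  }
  return ranges;
};

// HTTP Range headers use inclusive end offsets
const rangeHeader = ([s, e]: [number, number]): string => `bytes=${s}-${e - 1}`;
```

Each range would then be fetched with something like `fetch(url, { headers: { Range: rangeHeader(r) } })`, with retries on failure.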
### 4. Multi-Level Caching
- **In-Memory**: Groups, datasets, and data results
- **Request Deduplication**: Prevent duplicate network calls
- **Status Tracking**: Monitor ongoing operations
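Request deduplication can be sketched as a map of in-flight promises keyed by request; this is an illustrative pattern, not the actual `hdf5Interface` code:

```typescript
// Hypothetical sketch of request deduplication: concurrent requests for
// the same key share one in-flight promise, so the underlying fetcher
// runs only once.
const inFlight = new Map<string, Promise<unknown>>();

const dedup = <T>(key: string, fetcher: () => Promise<T>): Promise<T> => {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const p = fetcher().finally(() => inFlight.delete(key)); // clean up when settled
  inFlight.set(key, p);
  return p;
};
```

A natural key here is `url + path + JSON.stringify(options)`, so identical group or dataset requests coalesce.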
### 5. Web Workers
- Non-blocking file operations
- Prevents UI freezing during large data loads
- Single worker by default (configurable)

## Technical Limits and Constraints

### Data Size Limits
```typescript
const maxNumElements = 1e7; // 10 million elements maximum
if (totalSize > maxNumElements) {
  throw new Error(`Dataset too large: ${formatSize(totalSize)} > ${formatSize(maxNumElements)}`);
}
```

### Slicing Constraints
- At most 3 dimensions can be sliced simultaneously
- Slice parameters must be valid integers
- Format: `[[start, end], [start, end], ...]`
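A hedged sketch of validating slice parameters in the documented format (the function name is hypothetical, not the actual Neurosift validator):

```typescript
// Illustrative validation of slices in the documented
// [[start, end], [start, end], ...] format.
type Slice = [number, number][];

const validateSlice = (slice: Slice): boolean => {
  if (slice.length > 3) return false; // at most 3 dimensions sliced at once
  return slice.every(
    ([start, end]) =>
      Number.isInteger(start) &&
      Number.isInteger(end) &&
      start >= 0 &&
      end >= start,
  );
};
```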
### Authentication Requirements
- DANDI API key required for embargoed datasets
- Automatic detection of authentication errors
- User notification system for access issues

## Key Implementation Details

### Core Functions

#### `getHdf5Group(url, path)`
- Returns the HDF5 group structure with subgroups and datasets
- Implements caching to avoid redundant requests
- Used for building file hierarchy views

#### `getHdf5Dataset(url, path)`
- Returns dataset metadata (shape, dtype, attributes)
- Does not load the actual data
- Essential for understanding data structure before loading

#### `getHdf5DatasetData(url, path, options)`
- Loads the actual array data, with optional slicing
- Supports cancellation via `Canceler` objects
- Handles BigInt conversion for compatibility

### React Integration
```typescript
// Hook-based API for components
const group = useHdf5Group(url, "/acquisition");
const dataset = useHdf5Dataset(url, "/data/timeseries");
const { data, errorMessage } = useHdf5DatasetData(url, "/data/values");
```

### Error Handling
- Network timeout handling (3-minute default)
- Authentication error detection and user notification
- Graceful fallbacks for failed LINDI attempts
- CORS issue mitigation strategies
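The 3-minute timeout could be implemented with a promise race, and authentication errors detected from HTTP status codes. This is an illustrative sketch under those assumptions, not the actual error-handling code:

```typescript
// Hypothetical sketch: wrap any promise with the documented
// 3-minute default timeout.
const withTimeout = <T>(
  promise: Promise<T>,
  ms: number = 3 * 60 * 1000, // 3-minute default
): Promise<T> => {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer!));
};

// Hypothetical classification of status codes that should trigger the
// "provide a DANDI API key" notification (e.g. embargoed datasets).
const isAuthError = (status: number): boolean =>
  status === 401 || status === 403;
```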
## DANDI Integration

### Asset URL Resolution
```typescript
// Convert DANDI paths to download URLs
const response = await fetch(
  `https://api.dandiarchive.org/api/dandisets/${dandisetId}/versions/${version}/assets/?glob=${path}`
);
const data = await response.json();
const assetId = data.results[0].asset_id;
const downloadUrl = `https://api.dandiarchive.org/api/assets/${assetId}/download/`;
```

### LINDI URL Construction
```typescript
const aa = staging ? "dandi-staging" : "dandi";
const lindiUrl = `https://lindi.neurosift.org/${aa}/dandisets/${dandisetId}/assets/${assetId}/nwb.lindi.json`;
```
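The URL construction above can be folded into one testable helper; the function name is hypothetical, but the URL template matches the documented LINDI location:

```typescript
// Hypothetical helper wrapping the documented LINDI URL template.
const buildLindiUrl = (
  dandisetId: string,
  assetId: string,
  staging: boolean = false,
): string => {
  const host = staging ? "dandi-staging" : "dandi"; // staging vs production archive
  return `https://lindi.neurosift.org/${host}/dandisets/${dandisetId}/assets/${assetId}/nwb.lindi.json`;
};
```

In practice the caller would issue a HEAD or GET request to this URL and fall back to the plain HDF5 reader if it does not exist.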
## Contributing Guidelines

### Adding New File Formats
1. Implement the `RemoteH5FileX` interface in `src/remote-h5-file/lib/`
2. Add format detection logic in `hdf5Interface.ts`
3. Update `getMergedRemoteH5File` for multi-file support
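Step 1 might start from a minimal reader like the following. The interface subset and method names shown are assumptions for illustration only; the real `RemoteH5FileX` interface is defined in `src/remote-h5-file/lib/`:

```typescript
// Hypothetical, heavily reduced sketch of a reader; NOT the actual
// RemoteH5FileX interface, just an illustration of the shape a new
// format implementation might take.
interface MinimalRemoteFile {
  getGroup(
    path: string,
  ): Promise<{ path: string; subgroups: string[] } | undefined>;
}

// Toy implementation backed by an in-memory map, useful as a test double.
class InMemoryFile implements MinimalRemoteFile {
  constructor(private groups: Record<string, string[]>) {}
  async getGroup(path: string) {
    const subgroups = this.groups[path];
    return subgroups ? { path, subgroups } : undefined;
  }
}
```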
### Performance Considerations
- Always prefer LINDI files when available
- Implement proper caching for new data types
- Use Web Workers for CPU-intensive operations
- Consider memory usage for large datasets

### Testing Large Files
- Test with files >1GB to verify that chunking works
- Verify LINDI fallback mechanisms
- Test authentication flows with embargoed data
- Check error handling for network failures

### Plugin Development
- Use the provided hooks (`useHdf5Group`, `useHdf5Dataset`, etc.)
- Implement proper loading states and error handling
- Consider data slicing for large arrays
- Follow lazy loading patterns
## Debugging and Monitoring

### Status Bar Integration
The system provides real-time statistics in the status bar:
- `numGroups / numDatasets / numDatasetDatas`: Operation counters
- Loading indicators for active operations
- Error notifications for failed requests

### Console Logging
- LINDI detection attempts and results
- Authentication error details
- Performance metrics and timing
- Cache hit/miss information

### Common Issues
1. **CORS Errors**: Usually resolved by LINDI files or proper headers
2. **Authentication Failures**: Check DANDI API key configuration
3. **Large Dataset Errors**: Implement proper slicing
4. **Worker Loading Failures**: Verify CDN accessibility

## Future Improvements

### Potential Optimizations
- Implement progressive loading for very large datasets
- Add compression support for data transfers
- Enhance caching with persistence across sessions
- Improve error recovery mechanisms

### Format Extensions
- Support for additional HDF5-compatible formats
- Enhanced LINDI features (compression, encryption)
- Integration with cloud storage providers
- Real-time streaming capabilities

This architecture enables Neurosift to efficiently handle NWB files ranging from megabytes to gigabytes while providing responsive user interactions and comprehensive error handling.

python/neurosift/local-file-access-js/src/index.js

Lines changed: 7 additions & 1 deletion

```diff
@@ -10,7 +10,13 @@ if (!dir) {
 console.info('Serving files in', dir)
 
 // Allow CORS from neurosift.app flatironinstitute.github.io and localhost:3000
-const allowedOrigins = ['https://neurosift.app', 'https://flatironinstitute.github.io', 'http://localhost:3000', 'http://localhost:4200']
+const allowedOrigins = [
+  'https://neurosift.app',
+  'https://flatironinstitute.github.io',
+  'http://localhost:3000',
+  'http://localhost:4200',
+  'http://localhost:5173' // local dev server for neurosift
+]
 app.use((req, resp, next) => {
   const origin = req.get('origin')
   const allowedOrigin = allowedOrigins.includes(origin) ? origin : undefined
```
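The changelog entry for this commit describes the CORS fix as allowing any localhost port for development. One way to generalize the explicit origin list is a check like this sketch (illustrative only, not the committed code):

```javascript
// Hedged sketch: accept the known production origins plus any
// http://localhost:<port> origin, instead of enumerating dev ports.
const isAllowedOrigin = (origin) => {
  const allowedOrigins = [
    'https://neurosift.app',
    'https://flatironinstitute.github.io',
  ];
  if (allowedOrigins.includes(origin)) return true;
  return /^http:\/\/localhost:\d+$/.test(origin); // any localhost port
};
```

The middleware would then set `Access-Control-Allow-Origin` to the request origin whenever `isAllowedOrigin(origin)` is true.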
Lines changed: 129 additions & 0 deletions

```tsx
import React from "react";
import { useHdf5Group } from "@hdf5Interface";

type Props = {
  nwbUrl: string;
  path: string;
  objectType: "group" | "dataset";
  onOpenObjectInNewTab?: (path: string) => void;
  width?: number;
  height?: number;
};

const IntracellularRecordingsTableView: React.FC<Props> = ({
  nwbUrl,
  path,
  width = 500,
  height = 400,
}) => {
  const group = useHdf5Group(nwbUrl, path);

  if (!group) {
    return <div>Loading IntracellularRecordingsTable...</div>;
  }

  return (
    <div
      style={{
        width,
        height,
        padding: "10px",
        overflow: "auto",
        fontFamily: "monospace",
        fontSize: "14px",
      }}
    >
      <h3 style={{ margin: "0 0 15px 0", color: "#333" }}>
        IntracellularRecordingsTable
      </h3>

      <div style={{ marginBottom: "15px" }}>
        <strong>Path:</strong> {path}
      </div>

      <div style={{ marginBottom: "15px" }}>
        <strong>Attributes:</strong>
      </div>

      <div
        style={{
          backgroundColor: "#f5f5f5",
          padding: "10px",
          borderRadius: "4px",
          border: "1px solid #ddd",
        }}
      >
        {Object.keys(group.attrs).length === 0 ? (
          <div style={{ color: "#666", fontStyle: "italic" }}>
            No attributes found
          </div>
        ) : (
          Object.entries(group.attrs).map(([key, value]) => (
            <div
              key={key}
              style={{
                marginBottom: "8px",
                display: "flex",
                flexDirection: "column",
              }}
            >
              <div style={{ fontWeight: "bold", color: "#555" }}>{key}:</div>
              <div
                style={{
                  marginLeft: "10px",
                  color: "#333",
                  wordBreak: "break-word",
                }}
              >
                {typeof value === "object"
                  ? JSON.stringify(value, null, 2)
                  : String(value)}
              </div>
            </div>
          ))
        )}
      </div>

      {/* Show basic structure info */}
      <div style={{ marginTop: "20px" }}>
        <strong>Structure:</strong>
      </div>
      <div
        style={{
          backgroundColor: "#f9f9f9",
          padding: "10px",
          borderRadius: "4px",
          border: "1px solid #ddd",
          marginTop: "5px",
        }}
      >
        <div>Subgroups: {group.subgroups.length}</div>
        <div>Datasets: {group.datasets.length}</div>

        {group.subgroups.length > 0 && (
          <div style={{ marginTop: "10px" }}>
            <div style={{ fontWeight: "bold" }}>Subgroups:</div>
            {group.subgroups.map((sg) => (
              <div key={sg.path} style={{ marginLeft: "10px" }}>
                {sg.name}
              </div>
            ))}
          </div>
        )}

        {group.datasets.length > 0 && (
          <div style={{ marginTop: "10px" }}>
            <div style={{ fontWeight: "bold" }}>Datasets:</div>
            {group.datasets.map((ds) => (
              <div key={ds.path} style={{ marginLeft: "10px" }}>
                {ds.name} ({ds.dtype}, shape: {JSON.stringify(ds.shape)})
              </div>
            ))}
          </div>
        )}
      </div>
    </div>
  );
};

export default IntracellularRecordingsTableView;
```
