
Classify Edge Artifacts into Hierarchical Categories
Source:R/classifyEdgeArtifacts.R
classifyEdgeArtifacts.RdClassifies detected artifacts based on their location (edge vs. interior)
and size (large vs. small). This function works downstream of detectEdgeArtifacts().
Usage
classifyEdgeArtifacts(
spe,
qc_metric = "sum_umi",
samples = "sample_id",
min_spots = 20,
name = "edge_artifact",
exclude_slides = NULL
)Arguments
- spe
A SpatialExperiment object that has been processed with
detectEdgeArtifacts().- qc_metric
Character string specifying the QC metric column used for validation (default: "sum_umi"). Note: This column must exist, but is not directly used for classification logic.
- samples
Character string specifying the sample ID column name (default: "sample_id").
- min_spots
Minimum number of spots for an artifact to be classified as "large" (default: 20).
- name
Character string matching the
nameargument used indetectEdgeArtifacts()(default: "edge_artifact").- exclude_slides
Character vector of slide IDs to exclude from edge detection (default: NULL). Spots on these slides will be forced to FALSE for edge artifacts.
Value
A SpatialExperiment object with additional classification columns in colData:
[name]_true_edges: Logical, indicating edge artifacts after applying slide exclusions.[name]_classification: Character, containing hierarchical categories:"not_artifact": Normal, high-quality spots."large_edge_artifact": Clusters > min_spots touching the slide boundary."small_edge_artifact": Clusters <= min_spots touching the slide boundary."large_interior_artifact": Clusters > min_spots in the tissue interior."small_interior_artifact": Clusters <= min_spots in the tissue interior.
Details
Parameter Recommendations:
The min_spots threshold should scale with platform resolution to
represent similar physical artifact sizes:
Standard Visium (55µm bins):
min_spots = 20-40Physical area: ~0.06-0.12 mm²
Rationale: Artifacts <20 spots likely represent isolated noise
Typical edge artifacts: 50-200 spots
VisiumHD 16µm bins:
min_spots = 100-200Physical area: ~0.026-0.051 mm² (comparable to Visium)
Rationale: Higher density requires proportionally higher threshold
Typical edge artifacts: 500-2000 bins
VisiumHD 8µm bins:
min_spots = 400-800Physical area: ~0.026-0.051 mm² (comparable to Visium)
Rationale: 4× density of 16µm bins requires 4× threshold
Typical edge artifacts: 2000-8000 bins
Practical Guideline: For VisiumHD data, start with:
min_spots = 20 × (55µm / bin_size)²
This formula maintains constant physical area thresholds across resolutions.
Context-Specific Tuning:
Increase
min_spotsfor noisy data (prevents over-flagging small clusters)Decrease
min_spotsfor high-quality data (captures smaller genuine artifacts)Visualize intermediate results to validate threshold appropriateness
Examples
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#> Loading required package: S4Vectors
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: Seqinfo
#> Loading required package: Biobase
#> Welcome to Bioconductor
#>
#> Vignettes contain introductory material; view with
#> 'browseVignettes()'. To cite Bioconductor, see
#> 'citation("Biobase")', and for packages 'citation("pkgname")'.
#>
#> Attaching package: ‘Biobase’
#> The following object is masked from ‘package:MatrixGenerics’:
#>
#> rowMedians
#> The following objects are masked from ‘package:matrixStats’:
#>
#> anyMissing, rowMedians
library(S4Vectors)
# --- Create a Mock SPE with "Detected" Results ---
# We simulate the output that detectEdgeArtifacts() would produce
n_spots <- 5
spe <- SpatialExperiment(
colData = DataFrame(
sample_id = "sample01",
sum_umi = rep(100, n_spots),
edge_artifact_edge = c(TRUE, TRUE, FALSE, FALSE, FALSE),
edge_artifact_problem_id = c("Cluster1", "Cluster1", "Cluster2", "Cluster3", NA),
edge_artifact_problem_size = c(50, 50, 10, 30, 0)
)
)
# Review the mock scenarios:
# Spot 1: Edge=TRUE, Size=50 -> Expect "large_edge_artifact"
# Spot 2: Edge=TRUE, Size=50 -> Expect "large_edge_artifact"
# Spot 3: Edge=FALSE, Size=10 -> Expect "small_interior_artifact"
# Spot 4: Edge=FALSE, Size=30 -> Expect "large_interior_artifact"
# Spot 5: Edge=FALSE, Size=0 -> Expect "not_artifact"
# --- Run Classification ---
# Set threshold to 20 to separate large/small
spe <- classifyEdgeArtifacts(spe, min_spots = 20, name = "edge_artifact")
#> Classifying artifacts spots...
#> Classification added: edge_artifact_classification
#>
#> Classification summary:
#> large_edge_artifact: 2 spots
#> large_interior_artifact: 1 spots
#> not_artifact: 1 spots
#> small_interior_artifact: 1 spots
table(spe$edge_artifact_classification)
#>
#> large_edge_artifact large_interior_artifact not_artifact
#> 2 1 1
#> small_interior_artifact
#> 1
colData(spe)[, c("edge_artifact_problem_size", "edge_artifact_classification")]
#> DataFrame with 5 rows and 2 columns
#> edge_artifact_problem_size edge_artifact_classification
#> <numeric> <character>
#> 1 50 large_edge_artifact
#> 2 50 large_edge_artifact
#> 3 10 small_interior_artif..
#> 4 30 large_interior_artif..
#> 5 0 not_artifact