---
title: "Getting started with logrittr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with logrittr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Motivation

In SAS, every DATA step prints a log that tells you exactly what happened:

```
NOTE: There were 120000 observations read from WORK.SALES.
NOTE: 7153 observations were deleted.
NOTE: The data set WORK.SALES has 112847 observations and 11 variables.
```

R's `dplyr` pipelines are silent by default. `logrittr` fills that gap with
`%>=%`, a pipe operator that logs the following at each step:

- row count before/after (with signed delta)
- column count before/after (with signed delta)
- column names added or dropped
- elapsed time

There is no function masking, and the only dependencies are `cli` and
`stringr`, which handle coloring and formatting in the console.

> **Tip**: for Fira Code users with ligatures enabled, `%>=%` renders as a
> single wide arrow, similar to a regular pipe.

## Installation

```{r install}
install.packages("logrittr", repos = "https://guillaumepressiat.r-universe.dev")

# alternatively
# remotes::install_github("GuillaumePressiat/logrittr")
```

## Basic usage

```{r basic}
library(logrittr)
library(dplyr)

iris %>=%
  as_tibble() %>=%
  filter(Sepal.Length < 5) %>=%
  mutate(rn = row_number()) %>=%
  group_by(Species) %>=%
  summarise(n = n_distinct(rn))
```

```
── iris [rows: 150 cols: 5] ────────────────────────────────────────────
ℹ as_tibble()                     rows: 150 +0    cols: 5 +0  [ 0.0 ms]
ℹ filter(Sepal.Length < 5)        rows: 22 -128   cols: 5 +0  [ 1.0 ms]
ℹ mutate(rn = row_number())       rows: 22 +0     cols: 6 +1  [ 1.0 ms]
  added: rn
ℹ group_by(Species)               rows: 22 +0     cols: 6 +0  [ 1.0 ms]
ℹ summarise(n = n_distinct(rn))   rows: 3 -19     cols: 2 -4  [ 1.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, and 2 other
  added: n
```

`%>=%` is fully composable with `|>` and `%>%`.
Use it only where you want visibility, and fall back to a regular pipe for the
rest.

## Nested pipelines

When `%>=%` appears inside an argument (e.g. inside `semi_join()`), the nested
steps are automatically prefixed with `>` so they stand out from the main
pipeline:

```{r nested}
iris %>=%
  as_tibble() %>=%
  filter(Sepal.Length < 5) %>=%
  mutate(rn = row_number()) %>=%
  semi_join(
    iris %>% as_tibble() %>=% filter(Species == "setosa"),
    by = "Species"
  ) %>=%
  group_by(Species) %>=%
  summarise(n = n_distinct(rn))
```

```
── iris [rows: 150 cols: 5] ─────────────────────────────────────
ℹ as_tibble()                                            rows: 150 +0    cols: 5 +0  [ 1.0 ms]
ℹ filter(Sepal.Length < 5)                               rows: 22 -128   cols: 5 +0  [ 1.0 ms]
ℹ mutate(rn = row_number())                              rows: 22 +0     cols: 6 +1  [ 1.0 ms]
  added: rn
ℹ > filter(Species == "setosa")                          rows: 50 -100   cols: 5 +0  [ 2.0 ms]
ℹ semi_join(iris %>% as_tibble() %>=% filter(Species ==  rows: 20 -2     cols: 6 +0  [ 32.0 ms]
  "setosa"), by = "Species")
ℹ group_by(Species)                                      rows: 20 +0     cols: 6 +0  [ 1.0 ms]
ℹ summarise(n = n_distinct(rn))                          rows: 1 -19     cols: 2 -4  [ 1.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, and 2 others
  added: n
```

## Options

All display options are controlled via `logrittr_options()`. Called without
arguments, it returns the current values:

```{r options}
logrittr_options()
#> $wrap_width
#> [1] 52
#>
#> $big_mark
#> [1] " "
#>
#> $lang
#> [1] "en"
#>
#> $max_cols
#> [1] 5
```

### Language

Switch to French with `lang = "fr"` (the metrics line uses `lignes` instead of
`rows`):

```{r lang}
logrittr_options(lang = "fr")

iris %>=%
  select(Species, Sepal.Length, Sepal.Width) %>=%
  filter(Sepal.Length > 5)
```

```
── iris [lignes: 150 cols: 5] ─────────────────────────────────────────
ℹ select(Species, Sepal.Length, Sepal.Width)   lignes: 150 +0    cols: 3 -2  [ 3.0 ms]
  dropped: Petal.Length, Petal.Width
ℹ filter(Sepal.Length > 5)                     lignes: 118 -32   cols: 3 +0  [ 1.0 ms]
```

### Thousands separator

```{r bigmark}
logrittr_options(lang = "en", big_mark = ",")

big <- data.frame(x = seq_len(1e6), y = rnorm(1e6))

big %>=%
  filter(x > 500000)
```

```
── big [rows: 1,000,000 cols: 2] ────────────────────────────────────────────
ℹ filter(x > 5e+05)  rows: 500,000 -500000  cols: 2 +0  [ 11.0 ms]
```

With `big_mark = "_"`, the same step prints:

```
── big [rows: 1_000_000 cols: 2] ────────────────────────────────────────────
ℹ filter(x > 5e+05)  rows: 500_000 -500000  cols: 2 +0  [ 11.0 ms]
```

### Column name truncation

When a `select()` or a join adds or drops many columns, only the first
`max_cols` names are shown, which keeps the log readable:

```{r maxcols}
logrittr_options(max_cols = 2, lang = "en")

iris %>=%
  as_tibble() %>=%
  select(Species, Sepal.Length)
```

```
── iris [rows: 150 cols: 5] ───────────────────────────────────────────────
ℹ as_tibble()                     rows: 150 +0    cols: 5 +0  [ 0.0 ms]
ℹ select(Species, Sepal.Length)   rows: 150 +0    cols: 2 -3  [ 1.0 ms]
  dropped: Sepal.Width, Petal.Length, and 1 other
```

Use `max_cols = Inf` to always display all names:

```{r maxcols_inf}
logrittr_options(max_cols = Inf)
```

### Restoring defaults

`logrittr_options()` invisibly returns the previous values, which makes it easy
to restore the state after a temporary change:

```{r restore}
old <- logrittr_options(lang = "fr", big_mark = ",")

# ... work ...

do.call(logrittr_options, old) # restore previous state
```

## Using logrittr with lumberjack

If you already use the `lumberjack` package, `logrittr_logger` plugs directly
into its `%L>%` pipe. It produces the same console output as `%>=%`, and you
keep access to all lumberjack features (`run_file()`, custom loggers, etc.).

```{r lumberjack}
library(lumberjack)
library(dplyr)

iris %L>%
  start_log(log = logrittr_logger$new(), label = "Iris Example") %L>%
  as_tibble() %L>%
  filter(Sepal.Length < 5) %L>%
  mutate(rn = row_number()) %L>%
  group_by(Species) %L>%
  summarise(n = n_distinct(rn)) %L>%
  dump_log(stop = TRUE)
```

```
── Iris Example [rows: 150 cols: 5] ───────────────────────────────────────
ℹ as_tibble()                     rows: 150 +0    cols: 5 +0  [ NA ms]
ℹ filter(Sepal.Length < 5)        rows: 22 -128   cols: 5 +0  [ 47.0 ms]
ℹ mutate(rn = row_number())       rows: 22 +0     cols: 6 +1  [ 5.0 ms]
  added: rn
ℹ group_by(Species)               rows: 22 +0     cols: 6 +0  [ 11.0 ms]
ℹ summarise(n = n_distinct(rn))   rows: 3 -19     cols: 2 -4  [ 4.0 ms]
  dropped: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, rn
  added: n
✔ Log from Iris Example step written to ~/Documents/GitHub/logrittr/Iris Example_simple.csv
```

The first step always shows `NA ms` because lumberjack does not provide a start
time: elapsed time is measured as the interval between consecutive steps.
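The timing rule above can be illustrated with a small base-R sketch. The
`make_step_timer()` helper below is hypothetical and is not part of logrittr or
lumberjack: it only remembers when it was last called, so, like a logger that
receives consecutive steps without a start time, its first reading is `NA` and
every later reading is the interval since the previous call.

```r
# Hypothetical helper, for illustration only: not part of logrittr or lumberjack.
# It remembers the time of the previous call, so elapsed time is the interval
# between consecutive calls, and the first call has nothing to compare against.
make_step_timer <- function() {
  last <- NULL
  function() {
    now <- proc.time()[["elapsed"]]  # real elapsed seconds since R started
    elapsed_ms <- if (is.null(last)) NA_real_ else (now - last) * 1000
    last <<- now  # remember this step's time for the next call
    elapsed_ms
  }
}

tick <- make_step_timer()
tick()            # first step: NA, no previous step to measure from
Sys.sleep(0.05)
tick()            # milliseconds elapsed since the previous call
```

In this sketch the `NA` comes only from the missing starting point: seeding
`last` with a known start time would give the first step a value as well.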