diff --git a/Chapter_3.md b/Chapter_3.md
new file mode 100644
index 0000000..dccf98c
--- /dev/null
+++ b/Chapter_3.md
@@ -0,0 +1,1268 @@
+# Chapter 3: Data Transformation
+
+
+``` r
+library(dplyr)
+```
+
+
+ Attaching package: 'dplyr'
+
+ The following objects are masked from 'package:stats':
+
+ filter, lag
+
+ The following objects are masked from 'package:base':
+
+ intersect, setdiff, setequal, union
+
+``` r
+library(nycflights13)
+library(tidyverse)
+```
+
+ ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
+ ✔ forcats 1.0.0 ✔ readr 2.1.5
+ ✔ ggplot2 3.5.2 ✔ stringr 1.5.1
+ ✔ lubridate 1.9.4 ✔ tibble 3.3.0
+ ✔ purrr 1.0.4 ✔ tidyr 1.3.1
+
+ ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
+ ✖ dplyr::filter() masks stats::filter()
+ ✖ dplyr::lag() masks stats::lag()
+ ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
+
+Use `glimpse(flights)` to inspect a data frame that ships with a
+package: it lists every column together with its type and first few
+values.
+
+# dplyr Basics
+
+Every dplyr verb takes a data frame as its first argument. The
+subsequent arguments typically describe which columns to operate on,
+using the variable names (without quotes). The output is always a new
+data frame. The pipe (`|>`) chains multiple verbs together and can be
+read as “then”.
+
+``` r
+flights |>
+ filter(dest == "IAH") |>
+ group_by(year, month, day) |>
+ summarize(
+ arr_delay = mean(arr_delay, na.rm = TRUE)
+ )
+```
+
+ `summarise()` has grouped output by 'year', 'month'. You can override using the
+ `.groups` argument.
+
+ # A tibble: 365 × 4
+ # Groups: year, month [12]
+ year month day arr_delay
+
+ 1 2013 1 1 17.8
+ 2 2013 1 2 7
+ 3 2013 1 3 18.3
+ 4 2013 1 4 -3.2
+ 5 2013 1 5 20.2
+ 6 2013 1 6 9.28
+ 7 2013 1 7 -7.74
+ 8 2013 1 8 7.79
+ 9 2013 1 9 18.1
+ 10 2013 1 10 6.68
+ # ℹ 355 more rows
+
+# Rows
+
+`filter()` changes which rows are present without changing their order;
+it keeps rows based on their values. `arrange()` changes the order of
+the rows without changing which are present. `distinct()` finds rows
+with unique values.
+
+``` r
+flights |>
+ filter(dep_delay > 120)
+```
+
+ # A tibble: 9,723 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 848 1835 853 1001 1950
+ 2 2013 1 1 957 733 144 1056 853
+ 3 2013 1 1 1114 900 134 1447 1222
+ 4 2013 1 1 1540 1338 122 2020 1825
+ 5 2013 1 1 1815 1325 290 2120 1542
+ 6 2013 1 1 1842 1422 260 1958 1535
+ 7 2013 1 1 1856 1645 131 2212 2005
+ 8 2013 1 1 1934 1725 129 2126 1855
+ 9 2013 1 1 1938 1703 155 2109 1823
+ 10 2013 1 1 1942 1705 157 2124 1830
+ # ℹ 9,713 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+Conditions use `!=` (not equal to) and `==` (equal to). Combine them
+with `&` or `,` to indicate “and” (both conditions must hold), or with
+`|` to indicate “or” (either condition may hold).
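+
+A minimal sketch of how these operators combine, using a hypothetical
+four-row tibble named `toy` (made up for illustration, not part of
+nycflights13) so the results are easy to check by hand. It also shows a
+common slip: `month == 1 | 2` is valid R but does not mean “month is 1
+or 2”.
+
+``` r
+library(dplyr)
+
+# Hypothetical toy data, invented for this example
+toy <- tibble(
+  month = c(1, 1, 2, 3),
+  day   = c(1, 2, 1, 1)
+)
+
+# A comma between conditions behaves like &
+with_comma <- toy |> filter(month == 1, day == 1)
+with_amp   <- toy |> filter(month == 1 & day == 1)
+
+# | keeps rows that satisfy either condition
+either <- toy |> filter(month == 1 | month == 2)
+
+# %in% is shorthand for a chain of == joined by |
+shorthand <- toy |> filter(month %in% c(1, 2))
+
+# Pitfall: 2 is coerced to TRUE, so the condition is TRUE for every row
+# and nothing is filtered out
+slip <- toy |> filter(month == 1 | 2)
+
+nrow(either)  # 3
+nrow(slip)    # 4
+```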
+
+``` r
+# Flights that departed on February 1
+flights |>
+ filter(month == 2 & day == 1)
+```
+
+ # A tibble: 926 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 2 1 456 500 -4 652 648
+ 2 2013 2 1 520 525 -5 816 820
+ 3 2013 2 1 527 530 -3 837 829
+ 4 2013 2 1 532 540 -8 1007 1017
+ 5 2013 2 1 540 540 0 859 850
+ 6 2013 2 1 552 600 -8 714 715
+ 7 2013 2 1 552 600 -8 919 910
+ 8 2013 2 1 552 600 -8 655 709
+ 9 2013 2 1 553 600 -7 833 815
+ 10 2013 2 1 553 600 -7 821 825
+ # ℹ 916 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+# Flights that departed in January or February
+flights |>
+ filter(month == 1 | month == 2)
+```
+
+ # A tibble: 51,955 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 517 515 2 830 819
+ 2 2013 1 1 533 529 4 850 830
+ 3 2013 1 1 542 540 2 923 850
+ 4 2013 1 1 544 545 -1 1004 1022
+ 5 2013 1 1 554 600 -6 812 837
+ 6 2013 1 1 554 558 -4 740 728
+ 7 2013 1 1 555 600 -5 913 854
+ 8 2013 1 1 557 600 -3 709 723
+ 9 2013 1 1 557 600 -3 838 846
+ 10 2013 1 1 558 600 -2 753 745
+ # ℹ 51,945 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+# A shorter way to select flights that departed in January or February
+flights |>
+ filter(month %in% c(1, 2))
+```
+
+ # A tibble: 51,955 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 517 515 2 830 819
+ 2 2013 1 1 533 529 4 850 830
+ 3 2013 1 1 542 540 2 923 850
+ 4 2013 1 1 544 545 -1 1004 1022
+ 5 2013 1 1 554 600 -6 812 837
+ 6 2013 1 1 554 558 -4 740 728
+ 7 2013 1 1 555 600 -5 913 854
+ 8 2013 1 1 557 600 -3 709 723
+ 9 2013 1 1 557 600 -3 838 846
+ 10 2013 1 1 558 600 -2 753 745
+ # ℹ 51,945 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+jan1 <- flights |>
+ filter(month == 1 & day == 1)
+```
+
+``` r
+flights |>
+ arrange(year, month, day, dep_time)
+```
+
+ # A tibble: 336,776 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 517 515 2 830 819
+ 2 2013 1 1 533 529 4 850 830
+ 3 2013 1 1 542 540 2 923 850
+ 4 2013 1 1 544 545 -1 1004 1022
+ 5 2013 1 1 554 600 -6 812 837
+ 6 2013 1 1 554 558 -4 740 728
+ 7 2013 1 1 555 600 -5 913 854
+ 8 2013 1 1 557 600 -3 709 723
+ 9 2013 1 1 557 600 -3 838 846
+ 10 2013 1 1 558 600 -2 753 745
+ # ℹ 336,766 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+# Remove duplicate rows, if any
+flights |>
+ distinct()
+```
+
+ # A tibble: 336,776 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 517 515 2 830 819
+ 2 2013 1 1 533 529 4 850 830
+ 3 2013 1 1 542 540 2 923 850
+ 4 2013 1 1 544 545 -1 1004 1022
+ 5 2013 1 1 554 600 -6 812 837
+ 6 2013 1 1 554 558 -4 740 728
+ 7 2013 1 1 555 600 -5 913 854
+ 8 2013 1 1 557 600 -3 709 723
+ 9 2013 1 1 557 600 -3 838 846
+ 10 2013 1 1 558 600 -2 753 745
+ # ℹ 336,766 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+flights |>
+ count(origin, dest, sort = TRUE)
+```
+
+ # A tibble: 224 × 3
+ origin dest n
+
+ 1 JFK LAX 11262
+ 2 LGA ATL 10263
+ 3 LGA ORD 8857
+ 4 JFK SFO 8204
+ 5 LGA CLT 6168
+ 6 EWR ORD 6100
+ 7 JFK BOS 5898
+ 8 LGA MIA 5781
+ 9 JFK MCO 5464
+ 10 EWR BOS 5327
+ # ℹ 214 more rows
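+
+`distinct()` also accepts column names, in which case it keeps one row
+per unique combination of those columns, while `count()` reports how
+many rows each combination has. The sketch below illustrates the
+difference on a hypothetical tibble named `routes`; its values are
+invented for the example and are not taken from `flights`.
+
+``` r
+library(dplyr)
+
+# Hypothetical data, invented for this example
+routes <- tibble(
+  origin = c("JFK", "JFK", "LGA", "JFK"),
+  dest   = c("LAX", "LAX", "ATL", "SFO"),
+  delay  = c(5, 12, -3, 40)
+)
+
+# One row per unique origin/dest pair; the other columns are dropped
+pairs <- routes |> distinct(origin, dest)
+
+# .keep_all = TRUE keeps the first full row found for each pair
+first_rows <- routes |> distinct(origin, dest, .keep_all = TRUE)
+
+# count() gives the number of rows per pair, largest first
+pair_counts <- routes |> count(origin, dest, sort = TRUE)
+
+nrow(pairs)       # 3
+first_rows$delay  # 5 -3 40
+pair_counts$n[1]  # 2
+```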
+
+# Exercises pt 1 of 3
+
+# Question 1
+
+``` r
+# Flights that arrived two or more hours late
+flights |>
+  filter(arr_delay >= 120)
+```
+
+``` r
+flights |>
+ filter(month %in% c(7, 8, 9))
+```
+
+ # A tibble: 86,326 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 7 1 1 2029 212 236 2359
+ 2 2013 7 1 2 2359 3 344 344
+ 3 2013 7 1 29 2245 104 151 1
+ 4 2013 7 1 43 2130 193 322 14
+ 5 2013 7 1 44 2150 174 300 100
+ 6 2013 7 1 46 2051 235 304 2358
+ 7 2013 7 1 48 2001 287 308 2305
+ 8 2013 7 1 58 2155 183 335 43
+ 9 2013 7 1 100 2146 194 327 30
+ 10 2013 7 1 100 2245 135 337 135
+ # ℹ 86,316 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+flights |>
+ filter(carrier %in% c("UA", "AA", "DL"))
+```
+
+ # A tibble: 139,504 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 517 515 2 830 819
+ 2 2013 1 1 533 529 4 850 830
+ 3 2013 1 1 542 540 2 923 850
+ 4 2013 1 1 554 600 -6 812 837
+ 5 2013 1 1 554 558 -4 740 728
+ 6 2013 1 1 558 600 -2 753 745
+ 7 2013 1 1 558 600 -2 924 917
+ 8 2013 1 1 558 600 -2 923 937
+ 9 2013 1 1 559 600 -1 941 910
+ 10 2013 1 1 559 600 -1 854 902
+ # ℹ 139,494 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+flights |>
+ filter(arr_delay > 120, dep_delay <= 0)
+```
+
+ # A tibble: 29 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 27 1419 1420 -1 1754 1550
+ 2 2013 10 7 1350 1350 0 1736 1526
+ 3 2013 10 7 1357 1359 -2 1858 1654
+ 4 2013 10 16 657 700 -3 1258 1056
+ 5 2013 11 1 658 700 -2 1329 1015
+ 6 2013 3 18 1844 1847 -3 39 2219
+ 7 2013 4 17 1635 1640 -5 2049 1845
+ 8 2013 4 18 558 600 -2 1149 850
+ 9 2013 4 18 655 700 -5 1213 950
+ 10 2013 5 22 1827 1830 -3 2217 2010
+ # ℹ 19 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+``` r
+flights |>
+ filter(dep_delay >= 60, dep_delay - arr_delay > 30)
+```
+
+ # A tibble: 1,844 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 1 2205 1720 285 46 2040
+ 2 2013 1 1 2326 2130 116 131 18
+ 3 2013 1 3 1503 1221 162 1803 1555
+ 4 2013 1 3 1839 1700 99 2056 1950
+ 5 2013 1 3 1850 1745 65 2148 2120
+ 6 2013 1 3 1941 1759 102 2246 2139
+ 7 2013 1 3 1950 1845 65 2228 2227
+ 8 2013 1 3 2015 1915 60 2135 2111
+ 9 2013 1 3 2257 2000 177 45 2224
+ 10 2013 1 4 1917 1700 137 2135 1950
+ # ℹ 1,834 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+# Question 2
+
+``` r
+flights |>
+ arrange(desc(dep_delay))
+```
+
+ # A tibble: 336,776 × 19
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 1 9 641 900 1301 1242 1530
+ 2 2013 6 15 1432 1935 1137 1607 2120
+ 3 2013 1 10 1121 1635 1126 1239 1810
+ 4 2013 9 20 1139 1845 1014 1457 2210
+ 5 2013 7 22 845 1600 1005 1044 1815
+ 6 2013 4 10 1100 1900 960 1342 2211
+ 7 2013 3 17 2321 810 911 135 1020
+ 8 2013 6 27 959 1900 899 1236 2226
+ 9 2013 7 22 2257 759 898 121 1026
+ 10 2013 12 5 756 1700 896 1058 2020
+ # ℹ 336,766 more rows
+ # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>
+
+# Question 3
+
+``` r
+flights |>
+ mutate(speed = distance / (air_time / 60)) |>
+ arrange(desc(speed))
+```
+
+ # A tibble: 336,776 × 20
+ year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
+
+ 1 2013 5 25 1709 1700 9 1923 1937
+ 2 2013 7 2 1558 1513 45 1745 1719
+ 3 2013 5 13 2040 2025 15 2225 2226
+ 4 2013 3 23 1914 1910 4 2045 2043
+ 5 2013 1 12 1559 1600 -1 1849 1917
+ 6 2013 11 17 650 655 -5 1059 1150
+ 7 2013 2 21 2355 2358 -3 412 438
+ 8 2013 11 17 759 800 -1 1212 1255
+ 9 2013 11 16 2003 1925 38 17 36
+ 10 2013 11 16 2349 2359 -10 402 440
+ # ℹ 336,766 more rows
+ # ℹ 12 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
+ #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
+ #   hour <dbl>, minute <dbl>, time_hour <dttm>, speed <dbl>