Data Science 1 - Programmieren & Visualisieren
Saskia Otto
Universität Hamburg, IMF
Winter semester 2022/2023
A complete overview of the most important functions is available here:
This cheatsheet, like all other DSB cheatsheets, can be found on Moodle.
Imagine you have a vector of type 'character' that contains months:
Using a character string for this variable poses two problems:
→ with factor()
Pass the ARGUMENT levels a vector of all valid factor levels in the desired order.
To display the set of valid factor levels, use the FUNCTION levels():
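A minimal sketch of what factor() and levels() do for a month variable. The vector `months_chr` and the deliberate typo "Jam" are made up for illustration; `month.abb` is R's built-in constant of English month abbreviations. The two issues shown (alphabetical sorting and unnoticed typos) are commonly cited drawbacks of character vectors and are assumed here, since the slide's own list is not part of this text.

```r
# Hypothetical example data: months stored as plain character strings
months_chr <- c("Dec", "Apr", "Jan", "Mar")

# Sorting a character vector is alphabetical, not chronological
sort(months_chr)

# Define the valid levels in calendar order and create the factor
months_fct <- factor(months_chr, levels = month.abb)

# Sorting the factor now follows the order of the levels
sort(months_fct)

# Show the set of valid factor levels
levels(months_fct)

# Values that are not among the levels (e.g. a typo) silently become NA
with_typo <- factor(c("Dec", "Jam"), levels = month.abb)
with_typo
```

Defining the levels once and reusing them keeps the ordering consistent across all vectors that store the same variable.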
… passed via the labels argument.
If categorical variables are ordinal-scaled, it makes sense to create an ordered factor with the argument ordered = TRUE.
→ The main difference from a regular factor is the class and the way certain methods and model-fitting functions treat it.
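As a small illustration of an ordered factor, the following sketch uses a made-up vector `size` with three ordinal categories. It shows the additional class and that comparison operators become meaningful.

```r
# Hypothetical ordinal variable with three categories
size <- factor(
  c("low", "high", "medium", "low"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

# An ordered factor carries the additional class "ordered"
class(size)

# Comparisons respect the order of the levels
below_high <- size < "high"
below_high

# The largest category that occurs in the data
largest <- max(size)
largest
```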
With cut(x, breaks), continuous variables can be converted into factors.
fct_relevel()
fct_inorder()
fct_infreq()
fct_rev()
fct_reorder()
fct_collapse()
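The following sketch tries cut() and several of the forcats helpers listed above on small made-up vectors (`depth` and `f` are hypothetical). It assumes the 'forcats' package is installed and only inspects the resulting level order.

```r
library(forcats)

# cut(): turn a continuous variable into a factor of intervals
depth <- c(2, 15, 48, 73, 120)
depth_class <- cut(depth, breaks = c(0, 10, 50, 200))
levels(depth_class)

# A small factor; the default level order is alphabetical: a, b, c
f <- factor(c("b", "a", "c", "a", "a", "c"))

lv_rev      <- levels(fct_rev(f))                         # reversed order
lv_infreq   <- levels(fct_infreq(f))                      # most frequent level first
lv_inorder  <- levels(fct_inorder(f))                     # order of first appearance
lv_relevel  <- levels(fct_relevel(f, "c"))                # move "c" to the front
lv_collapse <- levels(fct_collapse(f, ab = c("a", "b")))  # merge "a" and "b"
```

None of these functions changes the values themselves; they only change which levels exist and in which order, which is what controls the axis order in a plot.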
p <- bshydro15 %>% ggplot() +
guides(x = guide_axis(angle = 45)) +
theme(text = element_text(size = 16))
p1 <- p + geom_bar(aes(month)) + ggtitle("Default setting")
p2 <- p + geom_bar(aes(fct_rev(month))) + ggtitle("fct_rev()")
p3 <- p + geom_bar(aes(fct_infreq(month))) + ggtitle("fct_infreq()")
p4 <- p + geom_bar(aes(fct_collapse(
month,
q1 = c("Jan", "Feb", "Mar"),
q2 = c("Apr", "May", "Jun"),
q3 = c("Jul", "Aug", "Sep"),
q4 = c("Oct", "Nov", "Dec")
))) +
xlab("annual quarter") +
ggtitle("fct_collapse()")
gridExtra::grid.arrange(p1,p2,p3,p4, nrow=2)
To see the current date and time of your operating system, look at Sys.Date() and Sys.time() in R:
→ first comes year-month-day (yyyy-mm-dd), then the time (hh:mm:ss), and then the time zone (CET = Central European Time).
?Ops.Date
):
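A short base R sketch of Sys.Date() and Sys.time() together with the kind of date arithmetic documented in ?Ops.Date. The variable names are illustrative, and the printed values depend on the day and system on which the code is run.

```r
# Current date and current date-time of the operating system
today <- Sys.Date()
now <- Sys.time()

# The two objects belong to different classes
class(today)  # "Date"
class(now)    # "POSIXct" "POSIXt"

# Arithmetic on dates works in units of days
next_week <- today + 7
days_between <- as.numeric(next_week - today)
days_between

# Dates can also be compared
today < next_week
```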
Date → represents calendar dates only
… declared as <date>
POSIXct (internal vector) and POSIXlt (internal list) → represent calendar dates and the time to the nearest second, including time zones (see also ?DateTimeClasses)
… declared as <dttm>
… (Date or POSIXct) is, however, useful for extracting individual time components later (e.g. month, year, etc.).
ymd(), ydm(), mdy(), myd(), dmy(), dym()
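To illustrate how the parsing functions are chosen, the sketch below writes the same day in four made-up notations and parses each with the function whose name matches the order of year (y), month (m) and day (d). It assumes the 'lubridate' package is installed.

```r
library(lubridate)

# The same calendar day in four different notations
d1 <- ymd("2015-07-26")   # year-month-day
d2 <- ydm("2015/26/07")   # year-day-month
d3 <- mdy("07/26/2015")   # month-day-year
d4 <- dmy("26.07.2015")   # day-month-year

# All four results are Date objects for the same day
class(d1)
c(d1, d2, d3, d4)
```

The separators between the components are detected automatically, so only the order of the components has to be specified.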
Simply combine one of the 6 date functions (ymd(), ydm(), mdy(), myd(), dmy(), dym()) with:
_h if you only have the hour
_hm if you have hour and minutes
_hms for hour-minute-second (hour:min:sec)
If the time zone is not UTC (the default), specify the tz argument:
[1] "2021-11-17 12:11:00 CET"
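A call along the following lines would produce output like the one above; the exact input string is an assumption reconstructed from that output. The sketch also shows the effect of the tz argument and that valid time zone names can be checked against OlsonNames(). It assumes the 'lubridate' package is installed.

```r
library(lubridate)

# Without tz, the time is interpreted as UTC
t_utc <- ymd_hm("2021-11-17 12:11")

# With tz, the same clock time is interpreted in the given time zone
t_cet <- ymd_hm("2021-11-17 12:11", tz = "CET")
t_cet

# Query the time zone of each object
tz(t_utc)
tz(t_cet)

# 12:11 CET is one hour earlier than 12:11 UTC
hours_apart <- as.numeric(difftime(t_utc, t_cet, units = "hours"))
hours_apart

# Valid time zone names are listed by OlsonNames()
"Europe/Berlin" %in% OlsonNames()
```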
[1] "Africa/Abidjan" "Africa/Accra"
[3] "Africa/Addis_Ababa" "Africa/Algiers"
[5] "Africa/Asmara" "Africa/Asmera"
[7] "Africa/Bamako" "Africa/Bangui"
[9] "Africa/Banjul" "Africa/Bissau"
[11] "Africa/Blantyre" "Africa/Brazzaville"
[13] "Africa/Bujumbura" "Africa/Cairo"
[15] "Africa/Casablanca" "Africa/Ceuta"
[17] "Africa/Conakry" "Africa/Dakar"
[19] "Africa/Dar_es_Salaam" "Africa/Djibouti"
[21] "Africa/Douala" "Africa/El_Aaiun"
[23] "Africa/Freetown" "Africa/Gaborone"
[25] "Africa/Harare" "Africa/Johannesburg"
[27] "Africa/Juba" "Africa/Kampala"
[29] "Africa/Khartoum" "Africa/Kigali"
[31] "Africa/Kinshasa" "Africa/Lagos"
[33] "Africa/Libreville" "Africa/Lome"
[35] "Africa/Luanda" "Africa/Lubumbashi"
[37] "Africa/Lusaka" "Africa/Malabo"
[39] "Africa/Maputo" "Africa/Maseru"
[41] "Africa/Mbabane" "Africa/Mogadishu"
[43] "Africa/Monrovia" "Africa/Nairobi"
[45] "Africa/Ndjamena" "Africa/Niamey"
[47] "Africa/Nouakchott" "Africa/Ouagadougou"
[49] "Africa/Porto-Novo" "Africa/Sao_Tome"
[51] "Africa/Timbuktu" "Africa/Tripoli"
[53] "Africa/Tunis" "Africa/Windhoek"
[55] "America/Adak" "America/Anchorage"
[57] "America/Anguilla" "America/Antigua"
[59] "America/Araguaina" "America/Argentina/Buenos_Aires"
[61] "America/Argentina/Catamarca" "America/Argentina/ComodRivadavia"
[63] "America/Argentina/Cordoba" "America/Argentina/Jujuy"
[65] "America/Argentina/La_Rioja" "America/Argentina/Mendoza"
[67] "America/Argentina/Rio_Gallegos" "America/Argentina/Salta"
[69] "America/Argentina/San_Juan" "America/Argentina/San_Luis"
[71] "America/Argentina/Tucuman" "America/Argentina/Ushuaia"
[73] "America/Aruba" "America/Asuncion"
[75] "America/Atikokan" "America/Atka"
[77] "America/Bahia" "America/Bahia_Banderas"
[79] "America/Barbados" "America/Belem"
[81] "America/Belize" "America/Blanc-Sablon"
[83] "America/Boa_Vista" "America/Bogota"
[85] "America/Boise" "America/Buenos_Aires"
[87] "America/Cambridge_Bay" "America/Campo_Grande"
[89] "America/Cancun" "America/Caracas"
[91] "America/Catamarca" "America/Cayenne"
[93] "America/Cayman" "America/Chicago"
[95] "America/Chihuahua" "America/Ciudad_Juarez"
[97] "America/Coral_Harbour" "America/Cordoba"
[99] "America/Costa_Rica" "America/Creston"
[101] "America/Cuiaba" "America/Curacao"
[103] "America/Danmarkshavn" "America/Dawson"
[105] "America/Dawson_Creek" "America/Denver"
[107] "America/Detroit" "America/Dominica"
[109] "America/Edmonton" "America/Eirunepe"
[111] "America/El_Salvador" "America/Ensenada"
[113] "America/Fort_Nelson" "America/Fort_Wayne"
[115] "America/Fortaleza" "America/Glace_Bay"
[117] "America/Godthab" "America/Goose_Bay"
[119] "America/Grand_Turk" "America/Grenada"
[121] "America/Guadeloupe" "America/Guatemala"
[123] "America/Guayaquil" "America/Guyana"
[125] "America/Halifax" "America/Havana"
[127] "America/Hermosillo" "America/Indiana/Indianapolis"
[129] "America/Indiana/Knox" "America/Indiana/Marengo"
[131] "America/Indiana/Petersburg" "America/Indiana/Tell_City"
[133] "America/Indiana/Vevay" "America/Indiana/Vincennes"
[135] "America/Indiana/Winamac" "America/Indianapolis"
[137] "America/Inuvik" "America/Iqaluit"
[139] "America/Jamaica" "America/Jujuy"
[141] "America/Juneau" "America/Kentucky/Louisville"
[143] "America/Kentucky/Monticello" "America/Knox_IN"
[145] "America/Kralendijk" "America/La_Paz"
[147] "America/Lima" "America/Los_Angeles"
[149] "America/Louisville" "America/Lower_Princes"
[151] "America/Maceio" "America/Managua"
[153] "America/Manaus" "America/Marigot"
[155] "America/Martinique" "America/Matamoros"
[157] "America/Mazatlan" "America/Mendoza"
[159] "America/Menominee" "America/Merida"
[161] "America/Metlakatla" "America/Mexico_City"
[163] "America/Miquelon" "America/Moncton"
[165] "America/Monterrey" "America/Montevideo"
[167] "America/Montreal" "America/Montserrat"
[169] "America/Nassau" "America/New_York"
[171] "America/Nipigon" "America/Nome"
[173] "America/Noronha" "America/North_Dakota/Beulah"
[175] "America/North_Dakota/Center" "America/North_Dakota/New_Salem"
[177] "America/Nuuk" "America/Ojinaga"
[179] "America/Panama" "America/Pangnirtung"
[181] "America/Paramaribo" "America/Phoenix"
[183] "America/Port_of_Spain" "America/Port-au-Prince"
[185] "America/Porto_Acre" "America/Porto_Velho"
[187] "America/Puerto_Rico" "America/Punta_Arenas"
[189] "America/Rainy_River" "America/Rankin_Inlet"
[191] "America/Recife" "America/Regina"
[193] "America/Resolute" "America/Rio_Branco"
[195] "America/Rosario" "America/Santa_Isabel"
[197] "America/Santarem" "America/Santiago"
[199] "America/Santo_Domingo" "America/Sao_Paulo"
[201] "America/Scoresbysund" "America/Shiprock"
[203] "America/Sitka" "America/St_Barthelemy"
[205] "America/St_Johns" "America/St_Kitts"
[207] "America/St_Lucia" "America/St_Thomas"
[209] "America/St_Vincent" "America/Swift_Current"
[211] "America/Tegucigalpa" "America/Thule"
[213] "America/Thunder_Bay" "America/Tijuana"
[215] "America/Toronto" "America/Tortola"
[217] "America/Vancouver" "America/Virgin"
[219] "America/Whitehorse" "America/Winnipeg"
[221] "America/Yakutat" "America/Yellowknife"
[223] "Antarctica/Casey" "Antarctica/Davis"
[225] "Antarctica/DumontDUrville" "Antarctica/Macquarie"
[227] "Antarctica/Mawson" "Antarctica/McMurdo"
[229] "Antarctica/Palmer" "Antarctica/Rothera"
[231] "Antarctica/South_Pole" "Antarctica/Syowa"
[233] "Antarctica/Troll" "Antarctica/Vostok"
[235] "Arctic/Longyearbyen" "Asia/Aden"
[237] "Asia/Almaty" "Asia/Amman"
[239] "Asia/Anadyr" "Asia/Aqtau"
[241] "Asia/Aqtobe" "Asia/Ashgabat"
[243] "Asia/Ashkhabad" "Asia/Atyrau"
[245] "Asia/Baghdad" "Asia/Bahrain"
[247] "Asia/Baku" "Asia/Bangkok"
[249] "Asia/Barnaul" "Asia/Beirut"
[251] "Asia/Bishkek" "Asia/Brunei"
[253] "Asia/Calcutta" "Asia/Chita"
[255] "Asia/Choibalsan" "Asia/Chongqing"
[257] "Asia/Chungking" "Asia/Colombo"
[259] "Asia/Dacca" "Asia/Damascus"
[261] "Asia/Dhaka" "Asia/Dili"
[263] "Asia/Dubai" "Asia/Dushanbe"
[265] "Asia/Famagusta" "Asia/Gaza"
[267] "Asia/Harbin" "Asia/Hebron"
[269] "Asia/Ho_Chi_Minh" "Asia/Hong_Kong"
[271] "Asia/Hovd" "Asia/Irkutsk"
[273] "Asia/Istanbul" "Asia/Jakarta"
[275] "Asia/Jayapura" "Asia/Jerusalem"
[277] "Asia/Kabul" "Asia/Kamchatka"
[279] "Asia/Karachi" "Asia/Kashgar"
[281] "Asia/Kathmandu" "Asia/Katmandu"
[283] "Asia/Khandyga" "Asia/Kolkata"
[285] "Asia/Krasnoyarsk" "Asia/Kuala_Lumpur"
[287] "Asia/Kuching" "Asia/Kuwait"
[289] "Asia/Macao" "Asia/Macau"
[291] "Asia/Magadan" "Asia/Makassar"
[293] "Asia/Manila" "Asia/Muscat"
[295] "Asia/Nicosia" "Asia/Novokuznetsk"
[297] "Asia/Novosibirsk" "Asia/Omsk"
[299] "Asia/Oral" "Asia/Phnom_Penh"
[301] "Asia/Pontianak" "Asia/Pyongyang"
[303] "Asia/Qatar" "Asia/Qostanay"
[305] "Asia/Qyzylorda" "Asia/Rangoon"
[307] "Asia/Riyadh" "Asia/Saigon"
[309] "Asia/Sakhalin" "Asia/Samarkand"
[311] "Asia/Seoul" "Asia/Shanghai"
[313] "Asia/Singapore" "Asia/Srednekolymsk"
[315] "Asia/Taipei" "Asia/Tashkent"
[317] "Asia/Tbilisi" "Asia/Tehran"
[319] "Asia/Tel_Aviv" "Asia/Thimbu"
[321] "Asia/Thimphu" "Asia/Tokyo"
[323] "Asia/Tomsk" "Asia/Ujung_Pandang"
[325] "Asia/Ulaanbaatar" "Asia/Ulan_Bator"
[327] "Asia/Urumqi" "Asia/Ust-Nera"
[329] "Asia/Vientiane" "Asia/Vladivostok"
[331] "Asia/Yakutsk" "Asia/Yangon"
[333] "Asia/Yekaterinburg" "Asia/Yerevan"
[335] "Atlantic/Azores" "Atlantic/Bermuda"
[337] "Atlantic/Canary" "Atlantic/Cape_Verde"
[339] "Atlantic/Faeroe" "Atlantic/Faroe"
[341] "Atlantic/Jan_Mayen" "Atlantic/Madeira"
[343] "Atlantic/Reykjavik" "Atlantic/South_Georgia"
[345] "Atlantic/St_Helena" "Atlantic/Stanley"
[347] "Australia/ACT" "Australia/Adelaide"
[349] "Australia/Brisbane" "Australia/Broken_Hill"
[351] "Australia/Canberra" "Australia/Currie"
[353] "Australia/Darwin" "Australia/Eucla"
[355] "Australia/Hobart" "Australia/LHI"
[357] "Australia/Lindeman" "Australia/Lord_Howe"
[359] "Australia/Melbourne" "Australia/North"
[361] "Australia/NSW" "Australia/Perth"
[363] "Australia/Queensland" "Australia/South"
[365] "Australia/Sydney" "Australia/Tasmania"
[367] "Australia/Victoria" "Australia/West"
[369] "Australia/Yancowinna" "Brazil/Acre"
[371] "Brazil/DeNoronha" "Brazil/East"
[373] "Brazil/West" "Canada/Atlantic"
[375] "Canada/Central" "Canada/Eastern"
[377] "Canada/Mountain" "Canada/Newfoundland"
[379] "Canada/Pacific" "Canada/Saskatchewan"
[381] "Canada/Yukon" "CET"
[383] "Chile/Continental" "Chile/EasterIsland"
[385] "CST6CDT" "Cuba"
[387] "EET" "Egypt"
[389] "Eire" "EST"
[391] "EST5EDT" "Etc/GMT"
[393] "Etc/GMT-0" "Etc/GMT-1"
[395] "Etc/GMT-10" "Etc/GMT-11"
[397] "Etc/GMT-12" "Etc/GMT-13"
[399] "Etc/GMT-14" "Etc/GMT-2"
[401] "Etc/GMT-3" "Etc/GMT-4"
[403] "Etc/GMT-5" "Etc/GMT-6"
[405] "Etc/GMT-7" "Etc/GMT-8"
[407] "Etc/GMT-9" "Etc/GMT+0"
[409] "Etc/GMT+1" "Etc/GMT+10"
[411] "Etc/GMT+11" "Etc/GMT+12"
[413] "Etc/GMT+2" "Etc/GMT+3"
[415] "Etc/GMT+4" "Etc/GMT+5"
[417] "Etc/GMT+6" "Etc/GMT+7"
[419] "Etc/GMT+8" "Etc/GMT+9"
[421] "Etc/GMT0" "Etc/Greenwich"
[423] "Etc/UCT" "Etc/Universal"
[425] "Etc/UTC" "Etc/Zulu"
[427] "Europe/Amsterdam" "Europe/Andorra"
[429] "Europe/Astrakhan" "Europe/Athens"
[431] "Europe/Belfast" "Europe/Belgrade"
[433] "Europe/Berlin" "Europe/Bratislava"
[435] "Europe/Brussels" "Europe/Bucharest"
[437] "Europe/Budapest" "Europe/Busingen"
[439] "Europe/Chisinau" "Europe/Copenhagen"
[441] "Europe/Dublin" "Europe/Gibraltar"
[443] "Europe/Guernsey" "Europe/Helsinki"
[445] "Europe/Isle_of_Man" "Europe/Istanbul"
[447] "Europe/Jersey" "Europe/Kaliningrad"
[449] "Europe/Kiev" "Europe/Kirov"
[451] "Europe/Kyiv" "Europe/Lisbon"
[453] "Europe/Ljubljana" "Europe/London"
[455] "Europe/Luxembourg" "Europe/Madrid"
[457] "Europe/Malta" "Europe/Mariehamn"
[459] "Europe/Minsk" "Europe/Monaco"
[461] "Europe/Moscow" "Europe/Nicosia"
[463] "Europe/Oslo" "Europe/Paris"
[465] "Europe/Podgorica" "Europe/Prague"
[467] "Europe/Riga" "Europe/Rome"
[469] "Europe/Samara" "Europe/San_Marino"
[471] "Europe/Sarajevo" "Europe/Saratov"
[473] "Europe/Simferopol" "Europe/Skopje"
[475] "Europe/Sofia" "Europe/Stockholm"
[477] "Europe/Tallinn" "Europe/Tirane"
[479] "Europe/Tiraspol" "Europe/Ulyanovsk"
[481] "Europe/Uzhgorod" "Europe/Vaduz"
[483] "Europe/Vatican" "Europe/Vienna"
[485] "Europe/Vilnius" "Europe/Volgograd"
[487] "Europe/Warsaw" "Europe/Zagreb"
[489] "Europe/Zaporozhye" "Europe/Zurich"
[491] "Factory" "GB"
[493] "GB-Eire" "GMT"
[495] "GMT-0" "GMT+0"
[497] "GMT0" "Greenwich"
[499] "Hongkong" "HST"
[501] "Iceland" "Indian/Antananarivo"
[503] "Indian/Chagos" "Indian/Christmas"
[505] "Indian/Cocos" "Indian/Comoro"
[507] "Indian/Kerguelen" "Indian/Mahe"
[509] "Indian/Maldives" "Indian/Mauritius"
[511] "Indian/Mayotte" "Indian/Reunion"
[513] "Iran" "Israel"
[515] "Jamaica" "Japan"
[517] "Kwajalein" "Libya"
[519] "MET" "Mexico/BajaNorte"
[521] "Mexico/BajaSur" "Mexico/General"
[523] "MST" "MST7MDT"
[525] "Navajo" "NZ"
[527] "NZ-CHAT" "Pacific/Apia"
[529] "Pacific/Auckland" "Pacific/Bougainville"
[531] "Pacific/Chatham" "Pacific/Chuuk"
[533] "Pacific/Easter" "Pacific/Efate"
[535] "Pacific/Enderbury" "Pacific/Fakaofo"
[537] "Pacific/Fiji" "Pacific/Funafuti"
[539] "Pacific/Galapagos" "Pacific/Gambier"
[541] "Pacific/Guadalcanal" "Pacific/Guam"
[543] "Pacific/Honolulu" "Pacific/Johnston"
[545] "Pacific/Kanton" "Pacific/Kiritimati"
[547] "Pacific/Kosrae" "Pacific/Kwajalein"
[549] "Pacific/Majuro" "Pacific/Marquesas"
[551] "Pacific/Midway" "Pacific/Nauru"
[553] "Pacific/Niue" "Pacific/Norfolk"
[555] "Pacific/Noumea" "Pacific/Pago_Pago"
[557] "Pacific/Palau" "Pacific/Pitcairn"
[559] "Pacific/Pohnpei" "Pacific/Ponape"
[561] "Pacific/Port_Moresby" "Pacific/Rarotonga"
[563] "Pacific/Saipan" "Pacific/Samoa"
[565] "Pacific/Tahiti" "Pacific/Tarawa"
[567] "Pacific/Tongatapu" "Pacific/Truk"
[569] "Pacific/Wake" "Pacific/Wallis"
[571] "Pacific/Yap" "Poland"
[573] "Portugal" "PRC"
[575] "PST8PDT" "ROC"
[577] "ROK" "Singapore"
[579] "Turkey" "UCT"
[581] "Universal" "US/Alaska"
[583] "US/Aleutian" "US/Arizona"
[585] "US/Central" "US/East-Indiana"
[587] "US/Eastern" "US/Hawaii"
[589] "US/Indiana-Starke" "US/Michigan"
[591] "US/Mountain" "US/Pacific"
[593] "US/Samoa" "UTC"
[595] "W-SU" "WET"
[597] "Zulu"
attr(,"Version")
[1] "2022g"
You can switch between the two formats using as_date() and as_datetime() (information may be lost, however):
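The loss of information can be seen in a small round trip. This is a sketch with a made-up timestamp and assumes the 'lubridate' package is installed.

```r
library(lubridate)

# A date-time with a time of day
dt <- ymd_hms("2015-07-26 18:16:00")

# Converting to a date drops the time component
d <- as_date(dt)
d

# Converting back cannot restore the time; it is set to midnight UTC
back <- as_datetime(d)
back

hour(dt)    # 18
hour(back)  # 0
```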
'lubridate' offers many more functions that make working with dates and times easier, e.g.:
%--%
→ creates intervals
as.duration()
→ computes the duration of an interval
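A minimal sketch of both helpers, using two made-up dates and assuming the 'lubridate' package is installed. Dividing the duration by ddays(1) expresses it as a number of days.

```r
library(lubridate)

start <- ymd("2015-07-03")
end <- ymd("2015-07-26")

# %--% creates an interval between two points in time
span <- start %--% end
span

# as.duration() converts the interval into an exact length in seconds
dur <- as.duration(span)
dur

# Express the duration in days
n_days <- dur / ddays(1)
n_days
```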
The following vector of data type 'POSIXct' is to be converted to data type 'Date':
The following tibble contains several date and date-time variables, all of which are stored as data type 'character':
# A tibble: 10 × 5
x1 x2 x3 x4 x5
<chr> <chr> <chr> <chr> <chr>
1 2015-07-26 2015/26/07 Jul 26, 2015 2015-07-26 18:16:00 26.07.2015T18:16
2 2015-07-03 2015/03/07 Jul 03, 2015 2015-07-03 16:42:00 03.07.2015T16:42
3 2015-09-01 2015/01/09 Sep 01, 2015 2015-09-01 04:45:00 01.09.2015T04:45
4 2015-06-03 2015/03/06 Mar 03, 2015 2015-06-03 08:08:00 03.06.2015T08:08
5 2015-10-15 2015/15/10 Oct 15, 2015 2015-10-15 21:05:00 15.10.2015T21:05
6 2015-10-01 2015/01/10 Oct 01, 2015 2015-10-01 10:38:00 01.10.2015T10:38
7 2015-03-14 2015/14/03 Mar 14, 2015 2015-03-14 05:55:00 14.03.2015T05:55
8 2015-02-25 2015/25/02 Jun 25, 2015 2015-02-25 08:12:00 25.02.2015T08:12
9 2015-06-03 2015/03/06 Sep 03, 2015 2015-06-03 11:09:00 03.06.2015T11:09
10 2015-09-14 2015/14/09 Jul 14, 2015 2015-09-14 05:38:00 14.09.2015T05:38
tbl <- mutate(tbl,
x1 = ymd(x1), x2 = ydm(x2),
x3 = mdy(x3), x4 = ymd_hms(x4),
x5 = dmy_hm(x5) )
tbl # check
# A tibble: 10 × 5
x1 x2 x3 x4 x5
<date> <date> <date> <dttm> <dttm>
1 2015-07-26 2015-07-26 2015-07-26 2015-07-26 18:16:00 2015-07-26 18:16:00
2 2015-07-03 2015-07-03 2015-07-03 2015-07-03 16:42:00 2015-07-03 16:42:00
3 2015-09-01 2015-09-01 2015-09-01 2015-09-01 04:45:00 2015-09-01 04:45:00
4 2015-06-03 2015-06-03 2015-03-03 2015-06-03 08:08:00 2015-06-03 08:08:00
5 2015-10-15 2015-10-15 2015-10-15 2015-10-15 21:05:00 2015-10-15 21:05:00
6 2015-10-01 2015-10-01 2015-10-01 2015-10-01 10:38:00 2015-10-01 10:38:00
7 2015-03-14 2015-03-14 2015-03-14 2015-03-14 05:55:00 2015-03-14 05:55:00
8 2015-02-25 2015-02-25 2015-06-25 2015-02-25 08:12:00 2015-02-25 08:12:00
9 2015-06-03 2015-06-03 2015-09-03 2015-06-03 11:09:00 2015-06-03 11:09:00
10 2015-09-14 2015-09-14 2015-07-14 2015-09-14 05:38:00 2015-09-14 05:38:00
From column x4, the following components are now to be extracted: year, month, day of the month, hour
tbl2 <- transmute(tbl,
Jahr = year(x4),
Monat = month(x4),
Monatstag = mday(x4),
Stunde = hour(x4)
)
tbl2 # check
# A tibble: 10 × 4
Jahr Monat Monatstag Stunde
<dbl> <dbl> <int> <int>
1 2015 7 26 18
2 2015 7 3 16
3 2015 9 1 4
4 2015 6 3 8
5 2015 10 15 21
6 2015 10 1 10
7 2015 3 14 5
8 2015 2 25 8
9 2015 6 3 11
10 2015 9 14 5
Let us look at a famous quote by Albert Einstein:
[1] "character"
[1] 4
→ The 'character' vector einstein contains 4 elements, or more precisely 4 character strings.
Even if you are not planning an elaborate text analysis, it can be helpful to know how to handle and process character strings in R!
The functions share the prefix str_, so that you can quickly select the appropriate function from the dropdown list in RStudio.
str_to_lower()
str_to_upper()
str_to_sentence()
str_to_title()
[1] "the difference" "between stupidity" "and genius is that"
[4] "genius has its limits."
[1] "THE DIFFERENCE" "BETWEEN STUPIDITY" "AND GENIUS IS THAT"
[4] "GENIUS HAS ITS LIMITS."
[1] "The difference" "Between stupidity" "And genius is that"
[4] "Genius has its limits."
[1] "The Difference" "Between Stupidity" "And Genius Is That"
[4] "Genius Has Its Limits."
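The outputs above can be reproduced with a sketch like the following. The content of `einstein` is not shown in this text and is reconstructed here from the printed results, so treat it as an assumption. The 'stringr' package is assumed to be installed.

```r
library(stringr)

# Reconstructed from the printed outputs (assumption)
einstein <- c("The difference", "between stupidity",
              "and genius is that", "genius has its limits.")

class(einstein)   # "character"
length(einstein)  # 4

lower    <- str_to_lower(einstein)     # all lower case
upper    <- str_to_upper(einstein)     # all upper case
sentence <- str_to_sentence(einstein)  # first letter of each string capitalised
title    <- str_to_title(einstein)     # first letter of each word capitalised

title
```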
sepal.length sepal.width petal.length petal.width species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
str_trim()
→ Removes white space at the start and end of a character string.
'data.frame': 34 obs. of 3 variables:
$ station : chr "ER1 " "ER2 " "ER3 " "ER4 " ...
$ zinc : chr "BACK " "HIGH " "HIGH " "MEDIUM " ...
$ diversity: num 2.27 1.25 1.15 1.62 1.7 0.63 2.05 1.98 1.04 2.19 ...
[1] "BACK " "HIGH " "MEDIUM " "LOW "
[1] "BACK" "HIGH" "MEDIUM" "LOW"
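The trimming step shown in the two outputs above can be sketched as follows. The vector `zinc` is typed in by hand here to mirror the printed values, and the `side` argument is shown as an optional extra. The 'stringr' package is assumed to be installed.

```r
library(stringr)

# Category labels with trailing white space, as in the output above
zinc <- c("BACK ", "HIGH ", "MEDIUM ", "LOW ")

# Remove white space on both sides (the default)
zinc_clean <- str_trim(zinc)
zinc_clean

# Remove white space on one side only
left_only <- str_trim("  padded  ", side = "left")
left_only
```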
str_c(..., sep = "", collapse = NULL)
→ Joins 2 or more scalars or vectors (of any data type!) element-wise into a single character vector or character string (!).
[1] "A + B"
[1] "The difference between stupidity and genius is that genius has its limits."
[1] "A.1, B.2, C.3"
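Calls along these lines would produce the three outputs above. The exact calls are not part of this text, so they are an assumption, as is the reconstructed `einstein` vector. The sketch shows the difference between `sep` (placed between the inputs, element by element) and `collapse` (joins the resulting elements into one string).

```r
library(stringr)

# sep: inserted between the inputs for each element
ab <- str_c("A", "B", sep = " + ")
ab

# collapse: joins all elements of the result into a single string
einstein <- c("The difference", "between stupidity",
              "and genius is that", "genius has its limits.")
quote_full <- str_c(einstein, collapse = " ")
quote_full

# Both together: combine element-wise first, then collapse
combined <- str_c(c("A", "B", "C"), 1:3, sep = ".", collapse = ", ")
combined
```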
Descriptive statistics table
iris |> mutate(
Species = str_c("Iris ", Species)) |>
group_by(Species) |>
summarise(
slm = round(mean(Sepal.Length), 1),
sls = round(sd(Sepal.Length), 2),
slms = str_c(slm, sls, sep=" ± ")
) |>
select(Species, slms) |>
knitr::kable(
col.names = c("Species",
"Mean Sepal Length (± st.dev)")
) # creates tables in LaTeX, HTML
Species | Mean Sepal Length (± st.dev) |
---|---|
Iris setosa | 5 ± 0.35 |
Iris versicolor | 5.9 ± 0.52 |
Iris virginica | 6.6 ± 0.64 |
str_sub(string, start, end)
→ Extracts and replaces substrings from a character vector.
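A brief sketch of both uses of str_sub(), extraction and replacement, on a made-up string. Negative positions count from the end of the string. The 'stringr' package is assumed to be installed.

```r
library(stringr)

city <- "Hamburg"

# Extract by position from the start
first3 <- str_sub(city, 1, 3)
first3

# Negative positions count backwards from the end
last4 <- str_sub(city, -4, -1)
last4

# Replace a substring via the assignment form
str_sub(city, 1, 1) <- "h"
city
```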
str_detect()
str_detect(string, pattern)
→ Identifies character strings that match a given pattern and returns a logical vector.
Adapted from the RegEx cheatsheet by Ian Kopacka.
The anchors ^ and $ stand for the start and end of a character string.
[1] FALSE TRUE TRUE FALSE FALSE
[1] FALSE FALSE FALSE TRUE TRUE
[1] FALSE TRUE TRUE FALSE FALSE
[1] FALSE FALSE TRUE FALSE FALSE
[1] FALSE TRUE FALSE FALSE TRUE FALSE
[1] FALSE FALSE FALSE TRUE FALSE FALSE
[1] FALSE FALSE FALSE FALSE FALSE TRUE
[1] TRUE TRUE TRUE FALSE FALSE FALSE
[1] TRUE TRUE TRUE FALSE FALSE FALSE
[1] FALSE FALSE FALSE FALSE FALSE FALSE
When searching for several patterns, use the OR operator | inside parentheses, e.g. (abc|xyz), or a character class in square brackets, e.g. [abc], which matches any one of the listed single characters.
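The anchors, the OR operator and a character class can be compared on a small made-up vector. This sketch is not the example behind the logical vectors printed above, and it assumes the 'stringr' package is installed.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "grape")

# ^ anchors the pattern at the start of the string
starts_a <- str_detect(fruit, "^a")
starts_a

# $ anchors the pattern at the end of the string
ends_e <- str_detect(fruit, "e$")
ends_e

# OR operator in parentheses: matches "an" or "ea"
an_or_ea <- str_detect(fruit, "(an|ea)")
an_or_ea

# Character class: matches a single "b" or a single "g"
b_or_g <- str_detect(fruit, "[bg]")
b_or_g
```

Note that a character class always matches exactly one character, whereas the alternatives inside parentheses can be patterns of any length.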
str_count()
→ Counts the number of matches of the pattern in a character string.
str_which()
→ Returns the positions of the character strings in which the pattern was found (wrapper function for which(str_detect(string, pattern))).
str_subset()
→ Selects character strings that contain a given pattern (wrapper function for x[str_detect(x, pattern)]).
str_replace()
[1] "Cruise" "Station"
[3] "Type" "yyyy.mm.ddThh.mm"
[5] "Latitude..degrees_north." "Longitude..degrees_east."
[7] "Bot..Depth..m." "PRES..db."
[9] "TEMP..deg.C." "PSAL..psu."
[11] "DOXY..ml.l."
names(hydro) <- names(hydro) |>
str_to_lower() |>
str_replace(pattern = "\\.$", replacement = "") |>
str_replace_all(pattern = "\\.+", replacement = "_")
names(hydro)
[1] "cruise" "station" "type"
[4] "yyyy_mm_ddthh_mm" "latitude_degrees_north" "longitude_degrees_east"
[7] "bot_depth_m" "pres_db" "temp_deg_c"
[10] "psal_psu" "doxy_ml_l"
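The counting, locating, subsetting and replacing helpers can also be tried on a small made-up vector. This sketch assumes the 'stringr' package is installed. It also shows that str_replace() changes only the first match per string, while str_replace_all() changes every match, which is why the column-name cleanup above uses both.

```r
library(stringr)

fruit <- c("apple", "banana", "pear")

# Number of matches per string
n_a <- str_count(fruit, "a")
n_a

# Positions of the strings that contain the pattern
pos_an <- str_which(fruit, "an")
pos_an

# The matching strings themselves
with_p <- str_subset(fruit, "p")
with_p

# Replace only the first match in each string
first_only <- str_replace(fruit, "a", "A")
first_only

# Replace every match in each string
all_matches <- str_replace_all(fruit, "a", "A")
all_matches
```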
str_split(string, pattern, n = Inf, simplify = FALSE)
→ Splits a character string into parts based on a specific pattern:
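A small sketch of the two return formats, using two species names typed in by hand. By default str_split() returns a list with one element per input string. With simplify = TRUE it returns a character matrix, from which a column can be picked directly. The 'stringr' package is assumed to be installed.

```r
library(stringr)

species <- c("Lamna nasus", "Cetorhinus maximus")

# Default: a list with one character vector per input string
parts_list <- str_split(species, " ")
parts_list

# simplify = TRUE: a character matrix with one row per input string
parts_mat <- str_split(species, " ", n = 2, simplify = TRUE)
parts_mat

# The first column holds the genus, the second the species epithet
genus <- parts_mat[, 1]
genus
```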
sharks_fb
From the sharks_fb dataset in the R package 'marinedata', we want to filter all species that also occur in the pelagic zone and then store the genus name in a separate column:
data(sharks_fb, package = "marinedata")
# The habitat type information is stored here:
levels(sharks_fb$DemersPelag)
[1] "pelagic" "benthopelagic" "demersal" "reef-associated"
[5] "bathypelagic" "bathydemersal" "pelagic-oceanic" "pelagic-neritic"
sharks_fb |> filter(str_detect(DemersPelag, "pelagic")) |>
select(Species, DemersPelag) |>
mutate(Genus = str_split(Species, " ", n = 2, simplify = TRUE)[ ,1])
# A tibble: 324 × 3
Species DemersPelag Genus
<chr> <fct> <chr>
1 Lamna nasus pelagic-oceanic Lamna
2 Cetorhinus maximus pelagic-oceanic Cetorhinus
3 Somniosus microcephalus benthopelagic Somniosus
4 Squalus acanthias benthopelagic Squalus
5 Echinorhinus cookei benthopelagic Echinorhinus
6 Centrophorus tessellatus benthopelagic Centrophorus
7 Centroscyllium granulatum bathypelagic Centroscyllium
8 Centroscyllium nigrum benthopelagic Centroscyllium
9 Etmopterus brachyurus pelagic-oceanic Etmopterus
10 Etmopterus bullisi pelagic-oceanic Etmopterus
# … with 314 more rows
[1] "Germany" "England" "Italy" "France" "Sweden" "Norway" "Portugal"
[1] "GER" "ENG" "ITA" "FRA" "SWE" "NOR" "POR"
How many species belong to the genus Carcharhinus (requiem sharks), i.e. are contained in the sharks_fb tibble and thus in the fishbase.org database?
You have the following vector x and now want to remove the leading zeros (i.e. turn "file_001.csv" into "file_1.csv", ...):
[1] "file_001.csv" "file_002.csv" "file_003.csv" "file_004.csv" "file_005.csv"
[6] "file_006.csv" "file_007.csv" "file_008.csv" "file_009.csv" "file_010.csv"
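One possible approach to the leading-zero task, offered as a sketch rather than as the intended solution: match the underscore together with all zeros that directly follow it and put back only the underscore. The vector `x` is rebuilt here with sprintf() so that the example is self-contained. The 'stringr' package is assumed to be installed.

```r
library(stringr)

# Rebuild the vector shown above: file_001.csv ... file_010.csv
x <- sprintf("file_%03d.csv", 1:10)

# "_0+" matches the underscore plus one or more zeros right after it;
# replacing the match with "_" removes only the leading zeros
x_clean <- str_replace(x, "_0+", "_")
x_clean
```

Because the pattern requires the zeros to follow the underscore directly, the zero in "file_10.csv" is kept.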
Then test your knowledge in the following final quiz…
For further questions: saskia.otto(at)uni-hamburg.de
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except for borrowed figures, which are marked with their source.
Course website: Data Science 1