If you need to select all the flights whose origin is 'JFK' and whose carrier is 'AA':

    dat11 = mydata[origin == "JFK" & carrier == "AA"]

data.table uses a binary search algorithm that makes data manipulation faster. Binary search is an efficient algorithm for finding a value in a sorted list of values. It repeatedly halves the portion of the list that could contain the value until the value you are searching for is found.

Suppose you have a list of seven values in a variable and you are searching for the value 20. Working on the sorted values, we first calculate the middle value, i.e. 10. Since 20 is greater than 10, it must be somewhere after 10, so we can ignore all the values that are lower than or equal to 10 and repeat the step on the remaining values. Without this algorithm, we would have to look for 20 through the whole list of seven values.

It is important to set a key in your dataset, which tells the system that the data is sorted by the key column. For example, suppose you have each employee's name, address, salary, designation, department and employee ID. We can use 'employee ID' as a key to search for a particular employee.

In this case, we are setting 'origin' as a key in the dataset mydata.

    setkey(mydata, origin)

Note: It sorts the data table by the column 'origin'.

You don't need to refer to the key column when you apply a filter. You can compare the performance of the filtering process with and without a key. In the timing comparison, the real (elapsed) time shows that filtering with a key is about twice as fast as filtering without one.

We can also set keys on multiple columns, as done below for the columns 'origin' and 'dest'.

    setkey(mydata, origin, dest)

Filtering while setting keys on Multiple Columns

When you filter on this two-column key, the first value is matched against the first key column 'origin' (for example "JFK") and the second value against the second key column 'dest' (for example "MIA").
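The halving procedure described above can be sketched in base R. This is only an illustration: the seven values below are an assumption chosen so that 10 is the middle of the sorted list, and `binary_search` is a hypothetical helper, not something data.table exposes.

```r
# Illustrative values only (an assumption): seven numbers that include 20.
values <- c(5, 10, 7, 20, 3, 13, 26)

# Binary search over a sorted numeric vector.
# Returns the position of `target` in `sorted_values`, or NA if it is absent.
binary_search <- function(sorted_values, target) {
  low <- 1L
  high <- length(sorted_values)
  while (low <= high) {
    mid <- (low + high) %/% 2L
    if (sorted_values[mid] == target) {
      return(mid)
    } else if (sorted_values[mid] < target) {
      low <- mid + 1L   # target can only be to the right of mid
    } else {
      high <- mid - 1L  # target can only be to the left of mid
    }
  }
  NA_integer_
}

sorted_values <- sort(values)                 # binary search needs sorted input
position <- binary_search(sorted_values, 20)  # first midpoint is 10, then 20
```

Each pass discards half of the remaining candidates, which is why a sorted (keyed) column can be searched in far fewer comparisons than a full scan.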
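As a rough sketch of keyed filtering, the example below builds a small hypothetical table (`flights`, with made-up rows that only borrow the tutorial's column names) and filters it first with an ordinary condition, then through a one-column key, then through a two-column key. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data that borrows the tutorial's column names.
flights <- data.table(
  origin  = c("LGA", "JFK", "EWR", "JFK", "LGA", "JFK"),
  dest    = c("MIA", "MIA", "LAX", "LAX", "ORD", "MIA"),
  carrier = c("AA", "AA", "UA", "DL", "AA", "DL")
)

# Without a key: the condition names the column explicitly.
scan_result <- flights[origin %in% c("JFK", "LGA")]

# Setting a key sorts the table by 'origin' (done by reference).
setkey(flights, origin)

# With a key: the values are matched against the key column,
# so the filter does not mention 'origin' at all.
keyed_result <- flights[.(c("JFK", "LGA"))]

# Two key columns: the first value is matched against 'origin',
# the second against 'dest'.
setkey(flights, origin, dest)
jfk_to_mia <- flights[.("JFK", "MIA")]
```

On a table this small the timings are meaningless; on a large table, wrapping the keyed and unkeyed filters in `system.time()` is one way to compare them.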
In the data.table package, the fread() function is available to read data from your computer or from a web page. It is equivalent to the read.csv() function of base R.

The dataset used here contains 253K observations and 17 columns. It holds information about flights' arrival and departure times, delays, cancellations and destinations in the year 2014. The column names include:

    "year" "month" "day" "dep_time" "dep_delay" "arr_time" "arr_delay"
    "cancelled" "carrier" "tailnum" "flight" "origin" "dest" "air_time"

Suppose you need to select only the 'origin' column. You can use the code below:

    dat1 = mydata[, origin] # returns a vector

The above line of code returns a vector, not a data.table. To get the result in data.table format, run the code below:

    dat1 = mydata[, .(origin)] # returns a data.table

Keeping a column based on column position

A column can also be selected by its position, for example the second column of mydata.

The following code tells R to select the 'origin', 'year', 'month' and 'hour' columns.

    dat3 = mydata[, .(origin, year, month, hour)]

Keeping multiple columns based on column position

You can keep the second through fourth columns in the same way, by passing the range of positions.

Suppose you want to include all the variables except one column. This can easily be done by adding the ! sign (which implies negation in R) in front of the column selection.

You can use the %like% operator to find a pattern. It is similar to base R's grepl() function, SQL's LIKE operator and SAS's CONTAINS function.

You can rename variables with the setnames() function. In the following code, we are renaming the variable 'dest' to 'Destination'.

    setnames(mydata, c("dest"), c("Destination"))

To rename multiple variables, you can simply list the variables on both sides. Run this on a table that still has a 'dest' column, because after the previous call 'dest' has already been renamed.

    setnames(mydata, c("dest","origin"), c("Destination", "origin.of.flight"))

Suppose you are asked to find all the flights whose origin is 'JFK'.

    dat8 = mydata[origin == "JFK"]

Select Multiple Values

Filter all the flights whose origin is either 'JFK' or 'LGA'.

    dat9 = mydata[origin %in% c("JFK", "LGA")]

The following program selects all the flights whose origin is neither 'JFK' nor 'LGA'.

    dat10 = mydata[!origin %in% c("JFK", "LGA")]
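The column operations described above can be tried on a small stand-in table. This is a sketch under stated assumptions: the rows are invented, the column dropped with `!` ('origin') and the pattern passed to `%like%` ("dep") are example choices, and the renaming is done on a copy so the original names stay available.

```r
library(data.table)

# Hypothetical toy table; only the column names come from the tutorial.
mydata <- data.table(
  year      = c(2014L, 2014L, 2014L),
  month     = c(1L, 1L, 2L),
  dep_delay = c(14L, -3L, 2L),
  origin    = c("JFK", "LGA", "EWR"),
  dest      = c("LAX", "MIA", "ORD")
)

origin_vec  <- mydata[, origin]                  # a character vector
origin_dt   <- mydata[, .(origin)]               # a one-column data.table
second_col  <- mydata[, 2, with = FALSE]         # column by position
two_to_four <- mydata[, 2:4, with = FALSE]       # columns 2 through 4
by_name     <- mydata[, .(origin, year, month)]  # several columns by name

# Assumed example column: drop 'origin' by negating the selection with !
no_origin <- mydata[, !c("origin"), with = FALSE]

# Assumed example pattern: keep columns whose names contain "dep"
dep_names <- names(mydata)[names(mydata) %like% "dep"]
dep_cols  <- mydata[, dep_names, with = FALSE]

# Rename on a copy, because setnames() changes the table by reference
renamed <- copy(mydata)
setnames(renamed, c("dest", "origin"), c("Destination", "origin.of.flight"))
```

Working on `copy(mydata)` is a deliberate choice here: `setnames()` modifies its argument in place, so later code that still refers to 'dest' or 'origin' would otherwise fail.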
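The row filters described above can be checked the same way. The table below is a hypothetical stand-in with invented rows; only the column names 'origin' and 'carrier' are taken from the tutorial.

```r
library(data.table)

# Hypothetical toy rows; only the column names come from the tutorial.
mydata <- data.table(
  origin  = c("JFK", "LGA", "EWR", "JFK", "EWR"),
  carrier = c("AA", "AA", "UA", "DL", "AA")
)

jfk_only    <- mydata[origin == "JFK"]                    # a single value
jfk_or_lga  <- mydata[origin %in% c("JFK", "LGA")]        # either value
not_jfk_lga <- mydata[!origin %in% c("JFK", "LGA")]       # neither value
jfk_and_aa  <- mydata[origin == "JFK" & carrier == "AA"]  # both conditions
```

Note that `!` has lower precedence than `%in%`, so `!origin %in% c(...)` negates the whole membership test rather than the column.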
How to Install and Load the data.table Package
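A minimal sketch of the usual way to install and load a CRAN package such as data.table is shown below. The `requireNamespace()` guard is an added convenience, not part of the original tutorial; it simply skips the download when the package is already present.

```r
# Install data.table from CRAN only if it is not already available,
# then attach it for the current session.
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}
library(data.table)
```

Installation is needed once per R library, while `library(data.table)` has to be run in every new session.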