MapReduce Input Split And Custom Input Format
Aug 9, 2018
To customize how a job's input is split, implement the InputFormat interface.
Interface InputFormat<K,V>
- Validate the input-specification of the job, i.e. check the job's input configuration before it takes effect.
- Split up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper. The input files (stored as blocks in HDFS) are converted into logical chunks of type InputSplit.
- Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper. The RecordReader produces the key/value pairs of an InputSplit (the unit the Mapper handles on each call) and feeds them to the Mapper.
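The three responsibilities above can be sketched in plain Java without any Hadoop dependency. This is a simplified model, not the Hadoop API: the class `InputSplitSketch`, its nested `Split` and `Record` types, and the methods `validate`, `getSplits`, and `readRecords` are all hypothetical names chosen for illustration. The sketch assumes newline-delimited input held in memory and assigns each line to the split that contains its first byte, which is one common way to handle lines that straddle a split boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified, Hadoop-free model of the three InputFormat responsibilities.
// All names here are hypothetical and exist only for this illustration.
final class InputSplitSketch {

    // Stand-in for an InputSplit: a logical byte range of the input.
    static final class Split {
        final int start;
        final int length;
        Split(int start, int length) { this.start = start; this.length = length; }
    }

    // One key/value pair handed to the mapper: byte offset and line text.
    static final class Record {
        final int key;
        final String value;
        Record(int key, String value) { this.key = key; this.value = value; }
    }

    // Responsibility 1: validate the input specification.
    static void validate(byte[] data, int splitSize) {
        if (data == null || data.length == 0) {
            throw new IllegalArgumentException("input must not be empty");
        }
        if (splitSize <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
    }

    // Responsibility 2: cut the input into logical splits of at most splitSize bytes.
    static List<Split> getSplits(byte[] data, int splitSize) {
        validate(data, splitSize);
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, data.length - start)));
        }
        return splits;
    }

    // Responsibility 3: turn one split into key/value records.
    static List<Record> readRecords(byte[] data, Split split) {
        List<Record> records = new ArrayList<>();
        int pos = split.start;
        int end = split.start + split.length;
        // A split that begins mid-line skips ahead to the next line start;
        // the partial line belongs to the previous split.
        if (pos != 0) {
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }
        // Emit every line whose first byte lies inside this split, reading
        // past the split end if needed to finish the last line.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            String line = new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8);
            records.add(new Record(pos, line));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        for (Split split : getSplits(data, 8)) {
            for (Record r : readRecords(data, split)) {
                System.out.println(split.start + " -> (" + r.key + ", " + r.value + ")");
            }
        }
    }
}
```

In a real job, as far as the `org.apache.hadoop.mapreduce` API goes, the second and third roles correspond to overriding `getSplits` and `createRecordReader` on an `InputFormat` subclass. Note that splits here are purely logical: a record may extend past the byte range of its split, so the reader, not the split, decides where a record ends.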