MapReduce Input Split And Custom Input Format

Wang William (WJWang)
Aug 9, 2018

Customize how input is split by implementing the InputFormat interface.

Interface InputFormat<K,V>

  1. Validate the job's input (i.e. check its configuration) before the job runs.
    Validate the input-specification of the job.
  2. Convert the input files (stored as blocks in HDFS) into input splits (logical chunks of type InputSplit); each input split is then assigned to its own Mapper.
    Split-up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
  3. Provide a RecordReader implementation that turns an InputSplit into key/value pairs (the unit the Mapper actually handles on each call) and feeds them to the Mapper.
    Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper.
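
The three responsibilities above can be sketched in plain Java without any Hadoop dependency. This is only a toy model under stated assumptions: `ToyInputFormat`, `Split`, `Record`, `getSplits` and `readRecords` are hypothetical names invented for illustration, not Hadoop classes, and the "file" is an in-memory string split by character offset. The record reader follows the usual line-oriented convention of skipping a leading partial line and reading past the split end to finish the last line, so each line should reach exactly one mapper.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an InputFormat. None of these types are Hadoop classes.
class ToyInputFormat {

    // A logical chunk of the input: an offset range, not a copy of the data.
    static final class Split {
        final int start;
        final int length;
        Split(int start, int length) { this.start = start; this.length = length; }
    }

    // One key/value pair handed to a mapper: (offset of the line, line text).
    static final class Record {
        final int key;
        final String value;
        Record(int key, String value) { this.key = key; this.value = value; }
    }

    // Responsibilities 1 and 2: validate the input spec, then cut it into splits.
    static List<Split> getSplits(int inputLength, int splitSize) {
        if (inputLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("invalid input specification");
        }
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < inputLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, inputLength - start)));
        }
        return splits;
    }

    // Responsibility 3: a record reader that turns one split into key/value pairs.
    static List<Record> readRecords(String data, Split split) {
        List<Record> records = new ArrayList<>();
        int pos = split.start;
        int end = split.start + split.length;
        if (split.start != 0) {
            // Skip the first (possibly partial) line; the previous split's reader owns it.
            int nl = data.indexOf('\n', pos);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        // Emit every line that starts at or before the split end, even if it runs past it.
        while (pos <= end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            records.add(new Record(pos, data.substring(pos, lineEnd)));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alpha\nbeta\ngamma\n";
        for (Split split : getSplits(data.length(), 7)) {
            for (Record r : readRecords(data, split)) {
                System.out.println(split.start + " -> (" + r.key + ", " + r.value + ")");
            }
        }
    }
}
```

In a real job the same shape would come from implementing Hadoop's InputFormat and returning InputSplit and RecordReader objects; the sketch only illustrates how split boundaries and record boundaries are decided separately.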
